Human Molecular Genetics Advance Access originally published online on November 3, 2005
Human Molecular Genetics 2005 14(24):3837-3845; doi:10.1093/hmg/ddi408

© The Author 2005. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org

Evaluation of the RET regulatory landscape reveals the biological relevance of a HSCR-implicated enhancer

Elizabeth A. Grice1, Erin S. Rochelle1, Eric D. Green3, Aravinda Chakravarti1 and Andrew S. McCallion1,2,*

1McKusick-Nathans Institute of Genetic Medicine, 2Department of Comparative Medicine, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA and 3Genome Technology Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, USA

* To whom correspondence should be addressed at: McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, BRB Room 449, 733 N. Broadway, Baltimore, MD 21205, USA. Tel: +1 4432875624; Fax: +1 4106148600; Email: amccalli@jhmi.edu

Received September 2, 2005; Accepted October 26, 2005

GenBank accession nos.†


    ABSTRACT
Evolutionary sequence conservation is now a relatively common approach for the prediction of functional DNA sequences. However, the fraction of conserved non-coding sequences with regulatory potential is still unknown. In this study, we focus on elucidating the regulatory landscape of RET, a crucial developmental gene within which we have recently identified a regulatory Hirschsprung disease (HSCR) susceptibility variant. We report a systematic examination of conserved non-coding sequences (n=45) identified in a 220 kb interval encompassing RET. We demonstrate that most of these conserved elements are capable of enhancer or suppressor activity in vitro, and the majority of the elements exert cell type-dependent control. We show that discrete sequences within regulatory elements can bind nuclear protein in a cell type-dependent manner that is consistent with their identified in vitro regulatory control. Finally, we focused our attention on the enhancer implicated in HSCR to demonstrate that this element drives reporter expression in cell populations of the excretory system, central nervous system (CNS) and peripheral nervous system (PNS), consistent with expression of the endogenous RET protein. Importantly, this sequence also modulates expression in the enteric nervous system, consistent with its proposed role in HSCR.


    INTRODUCTION
The ability to impute function from analysis of DNA sequence alone remains an immense challenge in human genetics. Although protein-coding sequences can be predicted with relative ease, our understanding of the nature and identity of functional non-coding DNA is still rather limited. To a first approximation, functional DNA sequences can be predicted based upon evolutionary sequence conservation; functional regions are less tolerant of nucleotide substitution than non-functional (neutral) sequences (1), and thus evolve more slowly. Consistent with this hypothesis, coding sequences may be readily identified based on evolutionary conservation. Interestingly, of the 5% of the human genome that is estimated to be evolving more slowly than the neutral rate (2), less than one-third actually encodes protein. The remainder, conserved non-coding sequences, are commonly predicted to regulate temporal, spatial and quantitative aspects of gene expression (2,3), among other roles. However, unlike coding sequences, there is no vocabulary beyond conservation to guide the prediction of biological relevance of non-coding sequences (4). Consequently, there is significant interest in identifying and characterizing functional non-coding sequences.

The availability of an increasing number of vertebrate genome sequences, along with the development of bioinformatic analysis tools, has made the comparison of large genomic sequence intervals a feasible approach for the identification of putative regulatory sequences. However, despite increasing numbers of sequences identified through comparative sequence analysis, only a subset of conserved non-coding sequences identified at a handful of loci have been functionally characterized (5–13). The paucity of functional data for non-coding sequences represents a substantial impediment to evaluation of the potential role of non-coding variation in human disease. Although non-coding variation is predicted to play a significant role in common human disease (4,14,15), only ~1% of known human disease-associated mutations occur in regulatory sequences, localizing predominantly within minimal promoter regions (16). Until recently, mutation detection in non-coding regions was almost exclusively restricted to sequences adjacent to the transcription start site; several classic examples of regulatory mutations have been identified in this way. Consequently, this number probably represents a gross underestimate of disease-causing non-coding mutations.

RET is a crucial developmental gene that encodes a receptor tyrosine kinase essential for normal embryonic development and neuronal maintenance. The protein is expressed throughout the CNS, PNS and excretory system during embryogenesis, mediating signals influencing cell proliferation, differentiation, migration and apoptosis (17). RET is the major susceptibility gene in Hirschsprung disease (HSCR), a relatively common congenital disorder in which both non-coding and coding mutations are predicted to underlie disease susceptibility (14,18). We recently identified an enhancer sequence at RET, which contains a relatively common HSCR susceptibility mutation (15). Although the HSCR-associated allele reduces in vitro enhancer activity 6-fold compared with the non-associated (wild-type) allele (15), the biological relevance of this enhancer sequence is unknown. Furthermore, the fraction of conserved non-coding sequences at this locus with regulatory potential is unknown. There are, in fact, few existing reports of comprehensive functional evaluation of conserved non-coding sequences of even a single locus, impeding attempts to examine association between non-coding variation and disease.

Here, we report a systematic examination of conserved non-coding sequences identified in a 220 kb interval encompassing RET. By employing a cell-based reporter assay, we demonstrate that most amplicons containing identified human RET conserved non-coding elements are capable of enhancer activity in vitro, and the majority of the elements exert cell type-dependent control. We also demonstrate that a selected subset of regulatory elements can bind nuclear protein in a cell type-dependent manner consistent with their in vitro activity. Importantly, the most striking neuronal enhancer identified in this study is the one we have implicated in HSCR based on association-genetic analysis in human patients (15). We report the in vivo function of this enhancer (MCS +9.7) via transgenesis in mouse, demonstrating that it exerts regulatory control consistent with the endogenous RET protein. This enhancer drives reporter expression in cell populations of the excretory system, CNS and PNS and, specifically, in the digestive tract during embryogenesis in a manner consistent with its proposed role in HSCR.


    RESULTS
Comparative sequence analysis identifies multi-species conserved sequences at RET
To identify conserved non-coding sequences at RET, we compared genomic sequence of an ~350 kb segment encompassing human RET (chr10: 42754810–43104810; UCSC hg17) with sequence from the orthologous intervals in 12 non-human vertebrates (chimpanzee, baboon, cow, pig, cat, dog, rat, mouse, chicken, zebrafish, Fugu and Tetraodon). The generation of genomic sequences used in this study has been described previously (15). Sequences were first aligned to the human reference sequence using AVID (19) and then visualized using the mVISTA tool (http://genome.lbl.gov/vista/index.shtml) (20,21) under established parameters of ≥70% identity, ≥100 bp (6). These criteria identified a total of 132 sequences conserved between the human reference and at least one other non-primate vertebrate. Forty-eight conserved sequences physically overlap exons of RET or other predicted genes. The remaining 84 conserved sequences are likely non-coding because no matching cDNA sequence or open-reading frame ≥20 amino acids in length was detected. Identification of these sequences was restricted to alignments between mammalian orthologs. All conserved sequence elements also overlapped predictions made by the method of Margulies et al. (22), a quantitative algorithm that uses phylogeny to calculate the probability of observing a given number of sequence identities at each base position. Although additional conserved sequences may be identified under modified criteria or through additional algorithms (23,24), the approaches we have used are both validated and established methods that clearly identify sequences evolving more slowly than the neutral rate (25).
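The ≥70% identity over ≥100 bp criterion can be illustrated with a simple sliding-window scan over a pairwise alignment. The following is a minimal sketch only, not the AVID or mVISTA implementation: the function name `conserved_windows`, the treatment of gaps as mismatches, and the use of alignment columns as coordinates are all assumptions made for illustration.

```python
def conserved_windows(ref, other, window=100, min_identity=0.70):
    """Return merged (start, end) alignment intervals in which every
    `window`-column window reaches `min_identity` (hypothetical helper)."""
    if len(ref) != len(other):
        raise ValueError("aligned sequences must have equal length")
    n = len(ref)
    if n < window:
        return []
    # 1 where both sequences carry the same base; gap columns never match
    match = [int(a == b and a != "-")
             for a, b in zip(ref.upper(), other.upper())]
    hits = []
    running = sum(match[:window])  # matches in the first window
    for start in range(n - window + 1):
        if start > 0:
            # slide the window by one column
            running += match[start + window - 1] - match[start - 1]
        if running / window >= min_identity:
            end = start + window
            if hits and start <= hits[-1][1]:
                hits[-1] = (hits[-1][0], end)  # extend the previous interval
            else:
                hits.append((start, end))
    return hits


# Toy alignment: 100 identical columns followed by 100 mismatched columns
toy_ref = "A" * 200
toy_other = "A" * 100 + "C" * 100
toy_hits = conserved_windows(toy_ref, toy_other)
```

In this toy case the qualifying windows overlap and merge into one interval that extends 30 columns past the identical block, because a 100-column window tolerates up to 30 mismatches at the 70% threshold.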

In vitro analysis of RET multi-species conserved sequences
We directly tested the hypothesis that conserved non-coding sequences regulate gene expression by examining their potential to function as enhancers or repressors in vitro. To focus our efforts on conserved non-coding sequences present in multiple vertebrates, we prioritized a selection of those present in three or more non-primate mammals (n=45 of 84 non-coding sequences). We refer to these as multi-species conserved sequences (MCSs). Because the boundaries of functional non-coding elements are not well defined, MCSs were frequently amplified in groups of two or more elements (Table 1); the average size of amplicons was 1.9 kb. We amplified 18 regions, encompassing all identified MCSs, from human genomic DNA; primer sequences are listed in Supplementary Material, Table S1. These amplicons were subcloned in the context of the SV40 promoter and luciferase; completed constructs were termed pDSma_RET_MCS*, where * denotes the distance (kb) and relative position (– or +; 5' or 3', respectively) from the RET transcription start site. We selected two cell lines for these assays: the RET-expressing neuroblastoma cell line Neuro-2A, and the epithelial cell line HeLa, in which RET is not expressed. When transiently transfected into Neuro-2A cells, >80% (15/18) of MCS constructs (pDSma_RET_MCS*) demonstrated increased luciferase reporter expression compared with a control vector (pDSma_promoter) in which luciferase expression was driven by the SV40 promoter fragment alone (Fig. 1A). These results suggest that the majority of non-coding RET MCS amplicons may play a role in regulating gene expression. We next determined whether such regulatory activity was consistent with tissue-dependent RET regulatory control by conducting the same assays in the HeLa cell line. Consistent with the absence of RET expression in HeLa cells, <17% (3/18) of MCS-containing constructs (pDSma_RET_MCS*) demonstrated luciferase expression that was greater than the control pDSma_promoter construct. However, 67% (12/18) of MCS constructs appeared to actively repress luciferase expression in HeLa cells. All assays were conducted in triplicate and were consistent upon repetition. These data are consistent with the tissue-dependent nature of RET regulation. Importantly, we likewise examined the regulatory potential of constructs containing non-conserved sequences (NCSs) (n=10). These sequences failed to drive luciferase expression at levels significantly greater than the control (pDSma_promoter; Fig. 1A) in either of the above cell types. NCS sequences, tabulated in Supplementary Material, Table S2, were distributed evenly throughout the 220 kb interval and ranged in size from 0.5 to 1.6 kb.
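As a small worked check of the construct naming convention and the reported proportions, the sketch below builds construct labels from a signed distance in kb and recomputes the three percentages from the counts given in the text. The helper names `construct_name` and `percent` are hypothetical, and the exact label formatting (an ASCII sign immediately after "MCS") is an assumption.

```python
def construct_name(offset_kb):
    """Label a construct by its signed distance (kb) from the RET
    transcription start site (formatting is an assumption)."""
    return f"pDSma_RET_MCS{offset_kb:+g}"


def percent(part, whole):
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


names = [construct_name(x) for x in (-32, -8.7, 9.7)]

# Counts taken from the text: 18 amplicons tested in each cell line
neuro2a_enhanced = percent(15, 18)  # Neuro-2A, above promoter-only control
hela_enhanced = percent(3, 18)      # HeLa, above promoter-only control
hela_repressed = percent(12, 18)    # HeLa, below promoter-only control
```

The recomputed values (83.3%, 16.7% and 66.7%) agree with the ">80%", "<17%" and "67%" figures quoted above.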


Table 1. Description of non-coding amplicons encompassing identified MCSs
 


Figure 1. In vitro characterization of conserved non-coding sequences at RET. (A) Luciferase expression of 45 MCSs contained within 18 amplicons. pDSma_RET_MCS* constructs were analyzed in Neuro-2A (blue bars) and HeLa (red bars) cell lines. Expression values are normalized against expression of the promoter-only construct (pDSma_promoter; thin dotted line). Likewise, 10 NCSs were analyzed in the same assay. Additional controls included pDSma_basic and pDSma_control constructs; the former contained a luciferase ORF in the absence of a promoter, and the latter comprised SV40 promoter and enhancer sequences in combination with a luciferase ORF. All assays were conducted in triplicate and were consistent upon replication; error bars report standard error in each instance. (B) mVISTA plot comparing human reference sequence with orthologous sequence from 8 mammals (window shown=50 kb/350 kb). Green-highlighted regions designate RET MCS sequences that were capable of driving luciferase expression ≥4-fold compared with the promoter alone (see thick dotted line in Fig. 1A). Additionally, orange-highlighted regions designate RET MCS sequences localized to intron 1, wherein peak transmission distortion (by TDT) occurs in HSCR (15). Colored peaks indicate sequence conservation of ≥70% identity and ≥100 nucleotides. Red, non-coding; blue, RET exons. (C) EMSAs using Neuro-2A cell extract with 30–50mer oligonucleotides located within RET MCSs. E–, no extract; E+, extract added; E+ & C, extract and competing unlabeled oligonucleotide added. (D) Corresponding EMSAs were also performed for each oligonucleotide using HeLa nuclear extract.

 
Identified MCSs demonstrate sequence-specific binding of nuclear protein
Regulatory sequences commonly mediate their effect upon binding transcription factors that instruct their regulatory control. Thus, we hypothesized that discrete sequences within regulatory MCSs would bind nuclear protein in a sequence-specific and cell type-dependent manner consistent with their behavior in the above described luciferase assay. To test this postulate, we conducted electrophoretic mobility shift assays (EMSAs) on selected sequences within a subset of MCSs. First, we prioritized MCS amplicons demonstrating the greatest magnitude of effect in luciferase assays performed in neuronal (Neuro-2A) cells, under the assumption that in vitro activity provides a reasonable surrogate for in vivo functional potential. Specifically, we selected MCSs capable of driving luciferase expression at levels ≥4-fold higher than the promoter-only construct (n=5; MCS –32; MCS –8.7; MCS –5.2; MCS –1.3; MCS +9.7). These sequences are highlighted (green) in Figure 1B. Secondly, we additionally selected all MCSs localizing to the genetic interval previously implicated in HSCR susceptibility (RET intron 1; n=3) (14,15,26–28), under the assumption that one or more of these MCSs may be relevant to HSCR. These sequences (MCS +2.8; MCS +5.1; MCS +9.7) are highlighted in orange in Figure 1B.
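The two selection criteria overlap at one amplicon, so the combined set contains seven amplicons rather than eight. The short sketch below makes that bookkeeping explicit; the variable names are illustrative, and ASCII hyphens stand in for the minus signs used in the amplicon names.

```python
# Amplicons driving luciferase >=4-fold over the promoter-only construct
strong_enhancers = {"MCS -32", "MCS -8.7", "MCS -5.2", "MCS -1.3", "MCS +9.7"}

# Amplicons located in RET intron 1, the HSCR-implicated interval
intron1 = {"MCS +2.8", "MCS +5.1", "MCS +9.7"}

selected = strong_enhancers | intron1   # union of both criteria
shared = strong_enhancers & intron1     # amplicons meeting both criteria

# Order by signed distance (kb) from the transcription start site
ordered = sorted(selected, key=lambda name: float(name.split()[1]))
```

Here `shared` contains only MCS +9.7, the amplicon that satisfies both criteria.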

Conserved non-coding sequences are reported to be enriched for functional transcription factor binding sites (TFBSs) (9). We examined MCSs within selected amplicons for known TFBSs using TESS (Transcription Element Search Site, URL: http://www.cbil.upenn.edu/tess) as described in Materials and Methods. Identified TFBSs are tabulated in Supplementary Material, Table S3. In light of these analyses, we synthesized oligonucleotides (30–50mer, sequences tabulated in Supplementary Material, Table S4) to include complete consensus sequences for predicted TFBSs and examined their potential to bind nuclear protein using EMSAs. Protein binding was determined to be sequence-specific if excess unlabeled oligonucleotide was able to displace labeled oligonucleotide (E+ & C lanes, Fig. 1C or D). In total, 17/25 oligonucleotides, corresponding to 7/7 MCS amplicons, demonstrated sequence-specific binding to Neuro-2A nuclear protein (Fig. 1C). Only 5/25 oligonucleotides, corresponding to 2/7 MCS amplicons, demonstrated sequence-specific binding of HeLa nuclear protein (Fig. 1D). The majority of the oligonucleotides readily bound nuclear extract from HeLa cells; however, as the bound protein was not significantly displaced by excess unlabeled oligonucleotide, the binding was not sequence-specific and is likely to be an artifact of the assay. Additionally, multiple shifting bands were frequently observed when oligonucleotides were incubated with HeLa nuclear protein. These multiple bands likely represent oligonucleotides bound by full or partial protein complexes. However, these patterns were not consistent with sequence-specific interactions, as they were not competed away by excess unlabeled oligonucleotide. Importantly, these data are also consistent with the regulatory control of gene expression observed for the corresponding MCS amplicons (MCS –32; MCS –8.7) in the above luciferase assays, as summarized in Table 2. MCS –32 and MCS –8.7 were the only two MCSs capable of enhancing luciferase expression in HeLa cells, and they were the only two MCSs that bound nuclear extract from HeLa cells in a sequence-specific manner.


Table 2. RET MCSs exert cell type-dependent control
 
Consistent with all of the above postulates, the recently identified HSCR-susceptibility variant localizes to enhancer sequence MCS +9.7, which demonstrates the greatest magnitude of effect of all examined MCSs in driving luciferase expression (Fig. 1A). Furthermore, the identified variant lies within an examined oligonucleotide sequence demonstrating cell type-dependent protein binding. However, although the HSCR-susceptibility variant lies within a predicted SRF-binding site (Supplementary Material, Table S3) and within one nucleotide of a predicted retinoic acid receptor (RARα1, RARβ, RARγ) site, a potential role for these sites remains unclear in the absence of a full understanding of the biological relevance of MCS +9.7. For this, as for any identified regulatory MCS, such an understanding ultimately necessitates analysis in vivo.

Transgenic analysis of MCS +9.7 regulatory control
To begin to determine the in vivo role of regulatory MCSs identified at this locus, we prioritized MCS +9.7, having previously demonstrated significant association between HSCR susceptibility and an MCS +9.7 variant (15). To test MCS +9.7 for its ability to spatially and temporally modulate gene expression, we subcloned the MCS +9.7 amplicon into a β-galactosidase (LacZ) reporter vector in the context of the mouse heat shock protein 68 (hsp68) promoter. The transgenic construct was injected into fertilized mouse oocytes, and multiple stable transgenic lines (G0) were identified (n=4). We then established timed matings to facilitate examination of the resulting G1 embryos at time points overlapping the critical period of RET activity during embryogenesis (10.5–14.5 dpc, days post coitum).

MCS +9.7 drives LacZ reporter expression in a manner consistent with many aspects of the temporal and spatial expression of RET (17,29–31). Most notably, LacZ expression is detected within the external gut loops at 12.5 dpc (Fig. 2A), consistent with RET expression in the enteric nervous system during embryogenesis (Fig. 2B) and its proposed role in HSCR (15). By 14.5 dpc, reporter signal is detected throughout the length of the gut, consistent with RET expression during the colonization of the gut by neural crest-derived enteric ganglia (29,31). Reporter expression is also detected in the developing sensory and autonomic ganglia of the trunk of MCS +9.7 transgenic embryos at all time points observed (10.5–14.5 dpc). At 10.5 dpc, faint and diffuse LacZ signal was detected in the spinal cord in positions consistent with truncal neural crest émigrés immediately prior to their condensation to form the dorsal root ganglia (DRG) (data not shown) (32). At later time points (12.5–14.5 dpc), LacZ staining intensified, becoming punctate (Fig. 2C and E), consistent with RET expression in DRG (Fig. 2D). The identity of this cell population was confirmed in a transverse section through the trunk of a 12.5 dpc transgenic embryo, which illustrates localized staining in the DRG population. This expression pattern is also accompanied by reporter signal in the more ventral portion of the spinal cord, consistent with RET expression in the motor neuron column (Fig. 2F). This ventrally localized domain of staining extends along the entire anterior–posterior axis of the spinal cord and into the hindbrain (data not shown). Additionally, MCS +9.7 transgenic embryos displayed LacZ staining in the brain. Staining was localized to the midbrain, the pons and the forebrain (Fig. 2C, G and I). Indicated in Figure 2G is LacZ staining in the trigeminal ganglia (V) and the optic ganglia (II). This staining pattern is consistent with the reported role for RET in the development of all cranial ganglia (29). LacZ staining is also detected in the cells of the nasal epithelium (Fig. 2I) (31) as well as in the forelimbs and hind limbs of MCS +9.7 transgenic embryos beginning at 12.5 dpc; staining in the limbs is predominantly localized to the mesenchyme between the digits (Fig. 2H), consistent with expression of the RET co-receptors GFRα1 and GFRα2 (31). These data were consistent among multiple (3/4) examined transgenic lines.



Figure 2. MCS +9.7 demonstrates regulatory control consistent with RET expression. (A) LacZ expression in the external gut loop of a 12.5 dpc embryo. (B) Expression of Ret in the external gut is indicated by in situ hybridization to Ret antisense probe. (C) LacZ staining in forebrain and DRG is indicated in a 12.5 dpc whole-mount embryo. (D) Expression of Ret in a whole-mount 12.5 dpc embryo as detected by in situ hybridization. Forebrain and DRG are indicated. (E) Medial view of LacZ expression in the thoracic DRG of a sagittally bisected 14.5 dpc embryo. (F) Transversely bisected view of stained thoracic motor neuron columns and DRG in a 14.5 dpc embryo. (G) Medial view of a sagittally bisected 14.5 dpc embryo head. Prominent staining corresponds to trigeminal ganglia (V) and optic ganglia (II). Staining is also indicated in forebrain. (H) Limb staining in a 14.5 dpc embryo. Strongest LacZ expression is localized to the mesenchyme between the digits. (I) Ventral view of a 12.5 dpc whole-mount embryo. Staining in forebrain, limb and nasal epithelium is indicated. (J) Corresponding in situ hybridization with Ret antisense probe. fb, forebrain; drg, dorsal root ganglia; lb, limb; ne, nasal epithelium; g, gut; mn, motor neurons; v, trigeminal cranial ganglia (V); ii, optic cranial ganglia (II).

 

    DISCUSSION
Until recently, sequences have been broadly categorized as genic or non-genic and coding or non-coding. However, these definitions are inadequate to provide complete functional annotation of genomes. Comparisons of orthologous sequences between genomes have already uncovered regions of predicted functional constraint. Such regions comprise known genes, previously unknown protein-coding genes, regulatory RNAs and non-coding regulatory sequences. Recent analysis of the complete human and mouse genomes demonstrated that 40% of their sequences could be aligned at the nucleotide level but only 5% appeared to be under active selection and therefore predicted to be functional (2). Notably, conserved non-coding elements comprise more than twice as much of the human genome as protein-coding sequences (2,8). Importantly, the vast number of conserved non-coding sequences makes comprehensive determination of their function particularly challenging. An as yet undetermined fraction of these sequences contributes to control of temporal, spatial and quantitative aspects of gene expression (3). Despite their predicted role in common inherited human disorders, our ability to associate non-coding variation with disease is hampered by an incomplete understanding of the identity and composition of regulatory sequences. Systematic functional evaluation of conserved non-coding sequences will represent a significant step towards understanding this role.

Consistent with previous reports, we demonstrate that sequences conserved in multiple mammals are frequently regulatory (6,10,33,34). Our data suggest that most MCS amplicons examined at RET function as enhancers, often in a tissue-dependent manner. The vast majority of examined sequences enhanced luciferase expression in neuronal cells (Neuro-2A) but not in epithelial cells (HeLa), consistent with RET expression. Furthermore, our data suggest that a single MCS amplicon may function as an enhancer in one cell type, yet repress expression in another. By selecting informative cell types, we demonstrate that both enhancer and suppressor functions may be readily discerned in vitro. Although these assays may miss many of the subtleties of in vivo function, their utility in determining tissue-dependent behavior of examined sequences is clear. However, examination of the biological or disease relevance of any regulatory MCS ultimately necessitates its evaluation in vivo.

MCS +9.7: LacZ transgenic mouse lines generated in this study display regulatory control of reporter expression in the PNS, CNS and excretory systems, and in the limb, consistent with the endogenous Ret gene. Furthermore, we demonstrate that MCS +9.7 is independently capable of regulating many aspects of RET-like expression; specifically, this element drives expression in the enteric nervous system (ENS), consistent with RET expression patterns and with a predicted role for a mutation in this sequence in HSCR susceptibility. However, given the number of regulatory MCS amplicons at this locus and the fine spatial and temporal regulation of RET in discrete cell subpopulations, we predict that many other sequences at RET also play complementary and/or cooperative regulatory roles. Critically, we have now demonstrated the potential disease relevance of MCS +9.7 through human genetic, in vitro and in vivo analyses. We are presently undertaking experiments to fully evaluate the biological impact of the HSCR-associated variant identified therein.

There is a well-recognized need for rapid functional screens of non-coding sequences. Importantly, although several recent reports have demonstrated that sequence comparisons at the vertebrate extremes provide a powerful filter for functional sequences, such ultraconserved elements are not abundant in vertebrate genomes and may be found in physical proximity to <1% of human genes (n=156) (35). This suggests that most vertebrate regulatory sequences may not be detectably conserved across large evolutionary distances (humans to teleosts). Rather, comparison of sequence orthologs from multiple more closely related species (mammals only/teleosts only) may prove to be a more sensitive approach for identifying vertebrate regulatory sequences (22). Consistent with this prediction, our data suggest that sequences identified in this way are also frequently functional. Furthermore, identified MCSs may be evaluated in combination, and critical sequences therein may subsequently be dissected.

Our data demonstrate the utility of examining multiple MCSs in combination in order to decrease the number of analyses required to prioritize sequences for subsequent molecular and in vivo investigation. These data suggest that amplicons encompassing conserved non-coding sequences identified under established criteria are frequently regulatory. Furthermore, tissue-dependent regulatory control may be inferred from in vitro analysis in appropriate cell lines, and candidate TFBSs may be identified by molecular investigation. However, it should be noted that the activity of an MCS amplicon may reflect the activity of a single MCS or the additive or synergistic effects of several MCSs therein. Thus, further functional evaluation of MCS amplicons will be necessary to identify specific sequences or motifs responsible for their regulatory and/or disease potential. In summary, we demonstrate the power of combining in silico, in vitro and molecular analysis for the identification of regulatory sequences at any locus, ultimately to determine the biological and/or disease relevance of selected sequences in vivo. Critically, these data suggest that regulatory non-coding sequences are not restricted to those conserved at the vertebrate extremes.


    MATERIALS AND METHODS
Generation of luciferase reporter constructs
Eighteen MCS regions, encompassing a total of 45 MCSs, were PCR amplified from human genomic DNA using PCR primers incorporating Gateway® attB sequences (sequences specified in Supplementary Material, Table S1). Each MCS amplicon was cloned into pDONR221™, a Gateway entry vector, per manufacturer's protocol. Amplicons were then subcloned into a SmaI site in a Gateway-modified pGL3 (Promega, Madison WI, USA) firefly luciferase vector containing an SV40 promoter and complete firefly luciferase open-reading frame (pDSma). Plasmids containing only the SV40 promoter driving luciferase (pDSma_promoter), the SV40 promoter and enhancer driving luciferase (pDSma_control) and only the luciferase ORF (pDSma_basic) served as experimental control vectors.

Transfection of reporter constructs
Transient transfections were performed using neuroblastoma (Neuro-2A, ATCC no. CCL-131) and HeLa cell lines (ATCC no. CRL-13011). Cell lines were cultured according to ATCC protocols (http://www.atcc.org). Approximately 10^5 cells were co-transfected (Lipofectamine Plus™, Invitrogen) with 0.4 µg of the appropriate pDSma firefly luciferase plasmid (pDSma_promoter, pDSma_control or pDSma_RET_MCS*) and 0.01 µg phRL-SV40 control renilla luciferase plasmid. Dual Luciferase® assays (Promega) were performed in accordance with manufacturer's instructions. Luciferase activity was assayed 24 h after transfection (Victor3™ plate reader, Perkin Elmer; Monolight® 2010, Analytical Luminescence Laboratories, CA, USA). All assays were conducted in triplicate and were consistent upon repetition. Relative luciferase units (RLU) were calculated for each transfection and fold change from pDSma_promoter RLU was estimated. Fold change values of each construct are reported with corresponding standard errors (Fig. 1A).
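The RLU and fold-change calculation can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the authors' analysis script: it assumes RLU is the firefly signal divided by the co-transfected renilla signal, that fold change is taken relative to the mean promoter-only RLU, and that the standard error is computed across the triplicate fold values only. The function names and the readings are made up for the example.

```python
from math import sqrt
from statistics import mean, stdev


def relative_luciferase(firefly, renilla):
    """Normalize each firefly reading to its renilla control (assumed definition of RLU)."""
    return [f / r for f, r in zip(firefly, renilla)]


def fold_change(construct_rlu, promoter_rlu):
    """Mean fold change over the promoter-only construct and its standard error."""
    baseline = mean(promoter_rlu)
    folds = [x / baseline for x in construct_rlu]
    standard_error = stdev(folds) / sqrt(len(folds))
    return mean(folds), standard_error


# Made-up triplicate readings, for illustration only
promoter = relative_luciferase([100.0, 100.0, 100.0], [100.0, 100.0, 100.0])
construct = relative_luciferase([400.0, 440.0, 360.0], [100.0, 100.0, 100.0])
mean_fold, se_fold = fold_change(construct, promoter)
```

With these invented readings the construct shows a mean 4-fold increase, which would sit on the ≥4-fold threshold used to prioritize amplicons for EMSA.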

TFBS identification
MCS sequences were queried for TFBSs using TESS (Transcription Element Search System, URL: http://www.cbil.upenn.edu/tess). TRANSFAC 4.0 strings were searched, and all results with a maximum allowable string mismatch (tmm) of 10%, a minimum log-likelihood ratio score (ts-a) >14.0 and a minimum string length (tw) >6 were considered. Filters were applied to restrict predictions to mammalian species.
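The thresholds above can be expressed as a simple filter. The sketch below is illustrative only: the record layout, field names and example hits are hypothetical and do not reproduce TESS output; only the three cut-offs and the mammalian restriction come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TfbsHit:
    factor: str          # placeholder factor label (hypothetical)
    species_group: str   # taxonomic group of the matched string (hypothetical field)
    mismatch_pct: float  # percent mismatch to the searched string (tmm)
    llr_score: float     # log-likelihood ratio score (ts-a)
    length: int          # matched string length (tw)


MAX_MISMATCH_PCT = 10.0
MIN_LLR_SCORE = 14.0
MIN_LENGTH = 6


def passes_filters(hit):
    """Apply the mismatch, score, length and mammalian-species criteria."""
    return (
        hit.mismatch_pct <= MAX_MISMATCH_PCT
        and hit.llr_score > MIN_LLR_SCORE
        and hit.length > MIN_LENGTH
        and hit.species_group == "mammal"
    )


# Made-up hits, one per failure mode, for illustration only.
hits = [
    TfbsHit("TF_A", "mammal", 0.0, 16.2, 8),    # passes all criteria
    TfbsHit("TF_B", "mammal", 12.5, 18.0, 10),  # too many mismatches
    TfbsHit("TF_C", "mammal", 0.0, 12.0, 7),    # score too low
    TfbsHit("TF_D", "insect", 0.0, 20.0, 9),    # non-mammalian
    TfbsHit("TF_E", "mammal", 5.0, 15.0, 6),    # length must exceed 6
]

kept = [h.factor for h in hits if passes_filters(h)]
print(kept)  # ['TF_A']
```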

Electrophoretic mobility shift assay
Nuclear proteins were extracted from Neuro-2A and HeLa cells using the NE-PER® Nuclear and Cytoplasmic Extraction Kit (Pierce Biotechnology, Rockford, IL, USA). Oligos (30–50mer) within MCSs were designed based on the level of conservation and the relevant TFBSs present. Oligos were end labeled with biotin-ddUTP (Pierce Biotechnology) and annealed per the manufacturer's protocol. About 4 fmol of labeled oligo was incubated for 20 min at room temperature with 1× binding buffer, 50 ng/µl poly dI–dC and 4 µl Neuro-2A nuclear extract (5 µl for HeLa nuclear extract). About 20 pmol (5000-fold excess) of unlabeled oligo was added to competition reactions. Gel shifts were detected using the LightShift® Chemiluminescent EMSA kit (Pierce Biotechnology) after transfer from a 15% acrylamide gel to a nylon membrane.
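As a quick check of the competitor amount quoted above, the arithmetic can be written out. The helper name is hypothetical; the only inputs are the 4 fmol of labeled oligo and the 5000-fold excess stated in the text.

```python
FMOL_PER_PMOL = 1000


def competitor_pmol(labeled_fmol, fold_excess):
    """Amount of unlabeled competitor (pmol) for a given molar excess over the labeled probe."""
    return labeled_fmol * fold_excess / FMOL_PER_PMOL


amount = competitor_pmol(4, 5000)
print(amount)  # 20.0
```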

Mouse transgenic reporter assay
All animal studies were performed under protocols approved by the Johns Hopkins University Animal Care and Use Committee. MCS amplicons were subcloned into the Gateway-ready vector phsp68/LacZ (a kind gift of Dr E.M. Rubin, LBNL). The constructs were purified and injected into mouse pronuclei by the Johns Hopkins University Transgenic Core. G0 mice were genotyped by PCR with LacZ-specific primers and MCS-specific primers. Positive G0 and F1 males were mated to CD1 females; 12:00 p.m. of the day on which vaginal plugs were observed was defined as 0.5 dpc. Embryos were harvested at specified time points in cold PBS. Transgenic embryos were identified by PCR of yolk sac DNA using LacZ primers (forward primer: TTT CCA TGT TGC CAC TCG C, reverse primer: AAC GGC TTG CCG TTC AGC A). Transgenic embryos were assayed for β-gal using 5-bromo-4-chloro-3-indolyl-β-D-galactoside (Ultrapure™ X-gal; Sigma) as described (36).
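The LacZ genotyping primers quoted above can be sanity-checked with a few lines of code. This is only a sketch: the length and GC content follow directly from the sequences, while the melting temperature uses the Wallace rule (2 °C per A/T, 4 °C per G/C), a rough rule of thumb that is an assumption here and not a value reported by the authors.

```python
def clean(primer):
    """Remove the triplet spacing used in the text and upper-case the sequence."""
    return primer.replace(" ", "").upper()


def gc_fraction(seq):
    """Fraction of G and C bases in the sequence."""
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq):
    """Rough melting temperature estimate in degrees C (Wallace rule)."""
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


forward = clean("TTT CCA TGT TGC CAC TCG C")
reverse = clean("AAC GGC TTG CCG TTC AGC A")

for name, seq in (("forward", forward), ("reverse", reverse)):
    print(name, len(seq), f"{gc_fraction(seq):.1%}", wallace_tm(seq))
```

Both primers are 19 nt long, and their rule-of-thumb melting temperatures differ by only 2 °C.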

In situ hybridization
Wild-type mouse embryos were harvested from timed matings established with CD1 mice. Non-radioactive whole-mount in situ hybridization was performed as described (37). Digoxigenin-labeled antisense probes were made from a 2.5 kb Ret template (pmcRet7; NotI; T7 RNA polymerase).


    SUPPLEMENTARY MATERIAL
 
Supplementary Material is available at HMG Online.


    ACKNOWLEDGEMENT
 
This study was supported by a grant from the National Institute of Child Health and Human Development and by a Basil O'Connor Starter Scholar Award from the March of Dimes.

Conflict of Interest statement. None declared.


    FOOTNOTES
 
† hg17, chr10:42754810–43104810 (human); Mm3, chr6:118646816–119036816 (mouse); AC125509 and AC125512 (baboon); AC124166 (cat); AC138567 (chicken); RP43-171H18 (chimpanzee); AC124163 and AC124164 (cow); AC123973 (dog); AC124911 and AC125500 (fugu); AC122156 and AC124165 (pig); AC114881 (rat); AC135546 (tetraodon) and AC124155 (zebrafish).


    REFERENCES
 

  1. Kimura, M. and Ota, T. (1971) On the rate of molecular evolution. J. Mol. Evol., 1, 1–17.

  2. Waterston, R.H., Lindblad-Toh, K., Birney, E., Rogers, J., Abril, J.F., Agarwal, P., Agarwala, R., Ainscough, R., Alexandersson, M., An, P. et al. (2002) Initial sequencing and comparative analysis of the mouse genome. Nature, 420, 520–562.

  3. Pennacchio, L.A. and Rubin, E.M. (2001) Genomic strategies to identify mammalian regulatory sequences. Nat. Rev. Genet., 2, 100–109.

  4. Pastinen, T. and Hudson, T.J. (2004) Cis-acting regulatory variation in the human genome. Science, 306, 647–650.

  5. Oeltjen, J.C., Malley, T.M., Muzny, D.M., Miller, W., Gibbs, R.A. and Belmont, J.W. (1997) Large-scale comparative sequence analysis of the human and murine Bruton's tyrosine kinase loci reveals conserved regulatory domains. Genome Res., 7, 315–329.

  6. Loots, G.G., Locksley, R.M., Blankespoor, C.M., Wang, Z.E., Miller, W., Rubin, E.M. and Frazer, K.A. (2000) Identification of a coordinate regulator of interleukins 4, 13, and 5 by cross-species sequence comparisons. Science, 288, 136–140.

  7. Pennacchio, L.A., Olivier, M., Hubacek, J.A., Cohen, J.C., Cox, D.R., Fruchart, J.C., Krauss, R.M. and Rubin, E.M. (2001) An apolipoprotein influencing triglycerides in humans and mice revealed by comparative sequencing. Science, 294, 169–173.

  8. Thomas, J.W., Touchman, J.W., Blakesley, R.W., Bouffard, G.G., Beckstrom-Sternberg, S.M., Margulies, E.H., Blanchette, M., Siepel, A.C., Thomas, P.J., McDowell, J.C. et al. (2003) Comparative analyses of multi-species sequences from targeted genomic regions. Nature, 424, 788–793.

  9. Kellis, M., Patterson, N., Endrizzi, M., Birren, B. and Lander, E.S. (2003) Sequencing and comparison of yeast species to identify genes and regulatory elements. Nature, 423, 241–254.

  10. Frazer, K.A., Tao, H., Osoegawa, K., de Jong, P.J., Chen, X., Doherty, M.F. and Cox, D.R. (2004) Noncoding sequences conserved in a limited number of mammals in the SIM2 interval are frequently functional. Genome Res., 14, 367–372.

  11. Pribnow, D. (1975) Nucleotide sequence of an RNA polymerase binding site at an early T7 promoter. Proc. Natl Acad. Sci. USA, 72, 784–788.

  12. Emorine, L., Kuehl, M., Weir, L., Leder, P. and Max, E.E. (1983) A conserved sequence in the immunoglobulin J kappa–C kappa intron: possible enhancer element. Nature, 304, 447–449.

  13. Woolfe, A., Goodson, M., Goode, D.K., Snell, P., McEwen, G.K., Vavouri, T., Smith, S.F., North, P., Callaway, H., Kelly, K. et al. (2005) Highly conserved non-coding sequences are associated with vertebrate development. PLoS Biol., 3, e7.

  14. McCallion, A.S., Emison, E.S., Kashuk, C.S., Bush, R.T., Kenton, M., Carrasquillo, M.M., Jones, K.W., Kennedy, G.C., Portnoy, M.E., Green, E.D. et al. (2003) Genomic variation in multigenic traits: Hirschsprung disease. Cold Spring Harb. Symp. Quant. Biol., 68, 373–381.

  15. Emison, E.S., McCallion, A.S., Kashuk, C.S., Bush, R.T., Grice, E., Lin, S., Portnoy, M.E., Cutler, D.J., Green, E.D. and Chakravarti, A. (2005) A common sex-dependent mutation in a RET enhancer underlies Hirschsprung disease risk. Nature, 434, 857–863.

  16. Stenson, P.D., Ball, E.V., Mort, M., Phillips, A.D., Shiel, J.A., Thomas, N.S., Abeysinghe, S., Krawczak, M. and Cooper, D.N. (2003) Human Gene Mutation Database (HGMD): 2003 update. Hum. Mutat., 21, 577–581.

  17. McCallion, A.S. and Chakravarti, A. (2004) In Epstein, C., Erickson, R. and Wynshaw-Boris, A. (eds), Inborn Errors of Development. Oxford University Press, San Francisco, Vol. 23, pp. 335–338.

  18. Carrasquillo, M.M., McCallion, A.S., Puffenberger, E.G., Kashuk, C.S., Nouri, N. and Chakravarti, A. (2002) Genome-wide association study and mouse model identify interaction between RET and EDNRB pathways in Hirschsprung disease. Nat. Genet., 32, 237–244.

  19. Bray, N., Dubchak, I. and Pachter, L. (2003) AVID: a global alignment program. Genome Res., 13, 97–102.

  20. Frazer, K.A., Pachter, L., Poliakov, A., Rubin, E.M. and Dubchak, I. (2004) VISTA: computational tools for comparative genomics. Nucleic Acids Res., 32, W273–W279.

  21. Mayor, C., Brudno, M., Schwartz, J.R., Poliakov, A., Rubin, E.M., Frazer, K.A., Pachter, L.S. and Dubchak, I. (2000) VISTA: visualizing global DNA sequence alignments of arbitrary length. Bioinformatics, 16, 1046–1047.

  22. Margulies, E.H., Blanchette, M., Haussler, D. and Green, E.D. (2003) Identification and characterization of multi-species conserved sequences. Genome Res., 13, 2507–2518.

  23. Cooper, G.M., Brudno, M., Green, E.D., Batzoglou, S. and Sidow, A. (2003) Quantitative estimates of sequence divergence for comparative analyses of mammalian genomes. Genome Res., 13, 813–820.

  24. Siepel, A., Bejerano, G., Pedersen, J.S., Hinrichs, A.S., Hou, M., Rosenbloom, K., Clawson, H., Spieth, J., Hillier, L.W., Richards, S. et al. (2005) Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res., 15, 1034–1050.

  25. Dermitzakis, E.T., Reymond, A. and Antonarakis, S.E. (2005) Conserved non-genic sequences—an unexpected feature of mammalian genomes. Nat. Rev. Genet., 6, 151–157.

  26. Griseri, P., Bachetti, T., Puppo, F., Lantieri, F., Ravazzolo, R., Devoto, M. and Ceccherini, I. (2005) A common haplotype at the 5' end of the RET proto-oncogene, overrepresented in Hirschsprung patients, is associated with reduced gene expression. Hum. Mutat., 25, 189–195.

  27. Pelet, A., de Pontual, L., Clement-Ziza, M., Salomon, R., Mugnier, C., Matsuda, F., Lathrop, M., Munnich, A., Feingold, J., Lyonnet, S. et al. (2005) Homozygosity for a frequent and weakly penetrant predisposing allele at the RET locus in sporadic Hirschsprung disease. J. Med. Genet., 42, e18.

  28. Burzynski, G.M., Nolte, I.M., Bronda, A., Bos, K.K., Osinga, J., Plaza Menacho, I., Twigt, B., Maas, S., Brooks, A.S., Verheij, J.B. et al. (2005) Identifying candidate Hirschsprung disease-associated RET variants. Am. J. Hum. Genet., 76, 850–858.

  29. Pachnis, V., Mankoo, B. and Costantini, F. (1993) Expression of the c-ret proto-oncogene during mouse embryogenesis. Development, 119, 1005–1017.

  30. Durbec, P.L., Larsson-Blomberg, L.B., Schuchardt, A., Costantini, F. and Pachnis, V. (1996) Common origin and developmental dependence on c-ret of subsets of enteric and sympathetic neuroblasts. Development, 122, 349–358.

  31. Golden, J.P., DeMaro, J.A., Osborne, P.A., Milbrandt, J. and Johnson, E.M., Jr (1999) Expression of neurturin, GDNF, and GDNF family-receptor mRNA in the developing and mature mouse. Exp. Neurol., 158, 504–528.

  32. Le Douarin, N. and Kalcheim, C. (1999) The Neural Crest. Cambridge University Press, Cambridge, UK.

  33. Nobrega, M.A., Ovcharenko, I., Afzal, V. and Rubin, E.M. (2003) Scanning human gene deserts for long-range enhancers. Science, 302, 413.

  34. Martin, N., Patel, S. and Segre, J.A. (2004) Long-range comparison of human and mouse Sprr loci to identify conserved noncoding sequences involved in coordinate regulation. Genome Res., 14, 2430–2438.

  35. Bejerano, G., Pheasant, M., Makunin, I., Stephen, S., Kent, W.J., Mattick, J.S. and Haussler, D. (2004) Ultraconserved elements in the human genome. Science, 304, 1321–1325.

  36. Jackson, I.J. and Abbott, C.M. (eds) (2000) Mouse Genetics and Transgenics: a Practical Approach. Oxford University Press, Oxford, New York.

  37. Correia, K.M. and Conlon, R.A. (2001) Whole-mount in situ hybridization to mouse embryos. Methods, 23, 335–338.





