Human Molecular Genetics Advance Access originally published online on July 21, 2005
Human Molecular Genetics 2005 14(17):2533-2546; doi:10.1093/hmg/ddi257

© The Author 2005. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oupjournals.org

Analysis of intronic conserved elements indicates that functional complexity might represent a major source of negative selection on non-coding sequences

Manuela Sironi1, Giorgia Menozzi1, Giacomo P. Comi2, Rachele Cagliani1, Nereo Bresolin1,2 and Uberto Pozzoli1,*

1Scientific Institute IRCCS E. Medea, Via Don Luigi Monza 20, 23842 Bosisio Parini (LC), Italy and 2Centro Dino Ferrari, Dipartimento di Scienze Neurologiche, Università di Milano, IRCCS Fondazione Ospedale Maggiore Policlinico, Mangiagalli e Regina Elena, Via Francesco Sforza 35, 20102 Milan, Italy

* To whom correspondence should be addressed. Tel: +39 31877915; Fax: +39 31877499; Email: upozzoli{at}bp.lnf.it

Received May 20, 2005; Accepted July 14, 2005


    ABSTRACT
The non-coding portion of the human genome is punctuated by a large number of multispecies conserved sequence (MCS) elements with largely unknown function. We demonstrate that MCSs are unevenly distributed in human introns, with the majority of relatively short introns (<9 kb long) displaying no or few MCSs and MCS density reaching up to 10% of total size in longer introns. After correction for intron length, MCSs were found to be enriched within genes involved in development and transcription and depleted in immune response loci. Moreover, many central nervous system tissues show a preferential expression of MCS-rich genes, and MCS enrichment significantly correlates with gene functional complexity in terms of distinct protein domains. Analysis of human–mouse orthologous pairs indicated a significant association between intronic MCS density and conservation of protein sequence, promoter regions and untranslated sequences. Moreover, MCS density correlates with the predicted occurrence of human–mouse conserved alternative splicing events. These observations suggest that evolution acts on human genes as integrated units of coding and regulatory capacity and that functional complexity might represent a major source of negative selection on non-coding sequences. To substantiate our results, we also searched for previously experimentally identified intronic regulatory elements and show that about half of these sequences map to an MCS. In particular, we provide support for the notion that mutations in MCSs can result in human genetic diseases: three previously identified intronic pathological variations were found to occur within MCSs, and human disease and cancer genes were found to be significantly enriched in MCSs.


    INTRODUCTION
The non-coding portion of the human genome amounts to ~99% (1), and a number of recent studies (reviewed in 2,3) have shown that this vast nucleotide puzzle is punctuated by multispecies conserved sequence (MCS) elements. These sequences are frequently shared among diverse mammalian species and many of them can be traced back to more distantly related organisms such as birds and fishes. Conversely, extremely low sequence conservation outside coding regions has been identified when vertebrate and invertebrate genomes have been aligned (4,5). This latter observation lends itself to the interpretation that MCSs might represent a vertebrate-specific feature underlying genome or organism complexity.

Despite extensive efforts, the functional significance of these sequence elements is largely unknown. Recent studies indicated that at least a portion of MCSs might act as transcriptional regulators of nearby genes (5–10), with some of them also being able to drive reporter gene expression (6,7,10). In addition, distinct gene function categories have been associated (9) with gene deserts displaying high MCS densities, suggesting a role for non-random MCS distribution. In other instances, MCSs have been shown to function in alternative splicing regulation (11). Conversely, other authors (12) have suggested that only a few MCSs might act as cis-acting gene regulators, because analysis of chromosome 21 indicated that MCS divergence and substitution pattern are independent of intergenic size or distance from nearby genes. MCSs located within intronic regions have attracted less attention than intergenic MCSs. Still, introns cover about one-quarter of the human genome (1), and the concept of these sequences as mere junk has been challenged by increasing evidence suggesting their functional role in gene regulation and genome architecture.

The recent availability of multiple genomic sequences and the development of comparative algorithms allow a genome-wide identification of MCSs; at the same time, a great wealth of gene annotation databases can be exploited for mining significant associations.

By integrating these resources, we demonstrate that MCSs are unevenly distributed in human introns, depending on intron size, gene function, expression pattern and presence of alternative splicing events. Moreover, we analyze previously identified intronic regulatory elements and show that about half of these sequences map to an MCS; in particular, we provide support for the notion that mutations in MCSs can result in human genetic diseases.


    RESULTS
MCS identification and distribution analysis
An intron database was created as described in Materials and Methods: a total of 81 549 human introns were analyzed (55 553 mouse introns), amounting to 456 genomic megabases. MCSs were identified using phastCons predictions (13,14), which derive from human/chimpanzee/mouse/rat/dog/chicken/pufferfish/zebrafish multiple alignments. A total of 238 005 human MCSs were retrieved (average frequency=0.522 per kb) and 39.6% of human introns were found to contain at least one MCS. These sequences cover, on average, 3.99% of total intron length and their mean length amounts to 76 bp.

To analyze MCS distribution, introns were divided into length classes and MCS densities (conserved sequence length/intron length) were calculated. Data are reported in Figure 1 and indicate that although variability in MCS density is observed for all intron length classes, a gradual shift toward higher MCS densities is observed with intron size increase. Extremely similar results were obtained when the same calculations were performed for mouse introns (Supplementary Material, Fig. S1).



Figure 1. MCS distribution in human intron length classes. MCS density was calculated after partition of introns into 10 length classes (1–10 on x-axis). In particular (Materials and Methods), introns were ranked according to their size and subsequently clustered to analyze, for each size class, the same absolute nucleotide number. Intron length intervals were as follows: 1, 6–2476; 2, 2477–5062; 3, 5063–9147; 4, 9148–15 666; 5, 15 667–26 179; 6, 26 180–41 774; 7, 41 775–66 914; 8, 66 915–110 044; 9, 110 045–190 058; 10, 190 059–1 043 911.
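The partition described in the legend (introns ranked by size, then clustered so that each class covers the same absolute number of nucleotides) can be sketched as follows. This is only an illustration under stated assumptions, not the pipeline actually used: the function name, the input layout as (intron length, conserved bp) pairs, and the choice to average per-intron densities within a class are all hypothetical.

```python
from typing import Dict, List, Tuple

def length_classes(introns: List[Tuple[int, int]], n_classes: int = 10) -> List[Dict]:
    """Partition introns into length classes of roughly equal total nucleotides.

    introns: (intron_length, conserved_bp) pairs (hypothetical input layout).
    Returns one summary dict per class, ordered from shortest to longest introns.
    """
    ranked = sorted(introns)                      # rank introns by length
    total_bp = sum(length for length, _ in ranked)
    target = total_bp / n_classes                 # nucleotides per class
    classes, current, cum_bp = [], [], 0
    for length, conserved in ranked:
        current.append((length, conserved))
        cum_bp += length
        # Close the class once the running total reaches the next boundary;
        # the last class simply collects whatever remains.
        if cum_bp >= target * (len(classes) + 1) and len(classes) < n_classes - 1:
            classes.append(current)
            current = []
    if current:
        classes.append(current)

    summary = []
    for members in classes:
        # Assumption: class density is the mean of per-intron densities
        # (conserved sequence length / intron length).
        densities = [conserved / length for length, conserved in members]
        summary.append({
            "min_len": members[0][0],
            "max_len": members[-1][0],
            "n_introns": len(members),
            "mean_density": sum(densities) / len(densities),
        })
    return summary
```

Because long introns carry far more nucleotides than short ones, equal-nucleotide classes contain many short introns and few long ones, which matches the widening length intervals listed in the legend.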

 
Mapping MCSs onto experimentally verified regulatory regions
To gain insight into the biological significance of retrieved MCSs, we applied a direct, although not comprehensive, approach. We investigated whether previously experimentally identified intronic regulatory sequences matched with MCS locations. We, therefore, inspected the literature in search of experimentally verified human intronic sequences of established physiological significance and verified whether they partially or totally overlapped any phastCons element. A summary of this search is reported in Table 1 and indicates that out of 19 and 25 intronic splicing or transcription regulatory elements, 8 and 12, respectively, mapped to a phastCons sequence. One element constituted a conserved sequence with a role in adenosine deaminase-dependent RNA editing and mapped to an MCS. In seven instances, at least one mutation has been described, which affects one regulatory element and causes (or predisposes to) a human disease; three such mutations occur within phastCons functional elements.
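The mapping criterion used here (an element counts as mapped if it partially or totally overlaps any phastCons element) amounts to a simple interval-overlap check. A minimal sketch, assuming half-open coordinates on a single chromosome; the function names and coordinate convention are illustrative and not taken from the study.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on one chromosome (assumed convention)

def overlaps_any(element: Interval, mcs: List[Interval]) -> bool:
    """True if the element shares at least one base with any MCS interval."""
    start, end = element
    return any(s < end and start < e for s, e in mcs)

def count_mapped(elements: List[Interval], mcs: List[Interval]) -> int:
    """Number of regulatory elements that partially or totally overlap an MCS."""
    return sum(overlaps_any(el, mcs) for el in elements)
```

A linear scan is adequate for a few dozen literature-derived elements; a genome-wide comparison would more sensibly use sorted intervals or an interval tree.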


Table 1. Mapping of regulatory elements onto phastCons elements
 
MCSs are differentially represented depending on gene ontology
We next wished to identify those genes that are extremely rich or depleted in MCSs; because, as reported earlier, MCS density increases with intron size, we used MCS density calculations in intron length classes to sort out genes with higher or lower than expected MCS densities. For each gene, the expected MCS density was calculated on the basis of its individual intron lengths and then compared to the observed MCS density; in particular, for each intron, the expected MCS density (MCSexp) is considered to be equal to the average density of the intron length class it belongs to. The normalized difference (MCSndev) between observed and expected densities for each gene is then defined as follows:

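\[
\mathrm{MCSndev} = \frac{d\mathrm{MCS}_{obs} - d\mathrm{MCS}_{exp}}{d\mathrm{MCS}_{obs} + d\mathrm{MCS}_{exp}}
\]

where dMCSobs and dMCSexp are the observed and expected gene MCS densities (see Materials and Methods for further details).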
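As a worked illustration of this normalization, the sketch below assumes that MCSndev takes the form (observed − expected)/(observed + expected), which is consistent with a threefold excess or deficit corresponding to ±0.5. The class boundaries and class densities passed in are placeholders, and weighting each intron's expected density by its length to obtain the gene-level value is an assumption rather than a documented detail of the method.

```python
from bisect import bisect_left
from typing import List, Sequence, Tuple

def class_index(length: int, upper_bounds: Sequence[int]) -> int:
    """Index of the first length class whose inclusive upper bound covers `length`.

    Lengths beyond the last bound are clamped into the last class.
    """
    return min(bisect_left(upper_bounds, length), len(upper_bounds) - 1)

def mcs_ndev(introns: List[Tuple[int, int]],
             upper_bounds: Sequence[int],
             class_density: Sequence[float]) -> float:
    """Normalized deviation of a gene's observed MCS density from expectation.

    introns: (intron_length, conserved_bp) pairs for one gene.
    upper_bounds / class_density: per-class upper length bound and mean density
    (illustrative inputs; real values would come from the length-class analysis).
    """
    total_len = sum(length for length, _ in introns)
    observed = sum(conserved for _, conserved in introns) / total_len
    # Assumption: the gene-level expectation is the length-weighted average of
    # the class densities of its individual introns.
    expected = sum(
        length * class_density[class_index(length, upper_bounds)]
        for length, _ in introns
    ) / total_len
    if observed + expected == 0:
        return 0.0
    return (observed - expected) / (observed + expected)
```

Under this form the score is bounded between –1 (no MCS at all) and +1, and a gene with exactly the expected density scores 0.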

To investigate whether any preferential association existed between functional categories and MCS representation, human genes were classified into two groups depending on MCS density: genes displaying three times more or less MCSs than expected (MCSndev>0.5 or MCSndev<–0.5) were classified as MCS rich (656 genes) or poor (2634 genes), respectively. We next used GeneMerge (60) to retrieve significant associations; database annotations for the three categories designated by the Gene Ontology (GO) Consortium (molecular function, biological process and cellular component) were employed. Correction for multiple tests was applied to all statistical analyses. Significant associations with one or more ontology terms were identified for the three categories (Table 2; Supplementary Material, Table S1). As the whole hierarchy of ontology terms was used to identify significant associations, nested ontology categories were retrieved, which are often accounted for by the same gene sets (Supplementary Material, Table S1).
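The kind of over-representation analysis described above can be illustrated with a hypergeometric tail probability followed by a Bonferroni correction. This sketch is not GeneMerge itself: it assumes a hypergeometric model and a correction factor equal to the number of terms that annotate at least one study gene, and all names and the data layout are hypothetical.

```python
from math import comb
from typing import Dict, Iterable, List, Set, Tuple

def hypergeom_tail(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k) when n genes are drawn from N, of which K carry the term."""
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)

def enriched_terms(study: Iterable[str],
                   population: Iterable[str],
                   annotations: Dict[str, Set[str]],
                   alpha: float = 0.05) -> List[Tuple[str, float]]:
    """Terms over-represented in `study` relative to `population`.

    annotations maps each term to the set of genes it annotates.
    Returns (term, Bonferroni-corrected P) pairs below alpha, smallest P first.
    """
    population = set(population)
    study = set(study) & population
    # Only terms annotating at least one study gene are tested (assumption).
    tested = {t: g & population for t, g in annotations.items() if g & study}
    n_tests = len(tested)
    hits = []
    for term, genes in tested.items():
        p = hypergeom_tail(len(genes & study), len(study), len(genes), len(population))
        p_corr = min(1.0, p * n_tests)          # Bonferroni correction
        if p_corr < alpha:
            hits.append((term, p_corr))
    return sorted(hits, key=lambda item: item[1])
```

Because the full ontology hierarchy is tested, parent and child terms backed by the same genes will tend to pass or fail together, which is why nested categories appear in the results.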


Table 2. Association of GO terms with MCS-rich and -poor genes
 
In the MCS-rich group, genes involved in development/morphogenesis are significantly over-represented (biological process category); the same holds true for genes having a role in transcription and transcription regulation (both biological process and molecular function categories), as well as in nucleic acid metabolism. Coherently, the cellular component categorization identified gene products that localize to the nucleus as over-represented in this gene set. In addition, genes coding for ephrin receptors were significantly represented, upon classification, in molecular function categories. With respect to MCS-poor genes, over-represented biological process GO terms relate to a broad category that can be roughly described as response to stimulus/defense response/immunity. In particular, the majority of identified molecular function categories are related to protease inhibitors. Moreover, genes coding for structural ribosome components and molecules involved in oxidoreductase activity were enriched in this gene set. Closer examination of single genes indicated that the majority of them encode mitochondrial ribosome constituents or electron transporters, a finding which is consistent with retrieved terms (ribosome and mitochondrion) in the cellular component categorization.

These same analyses gave similar results when mouse genes were analyzed (Supplementary Material, Table S2); the main difference being accounted for by the over-representation of genes involved in protein catabolism and proteolysis among MCS-poor genes.

MCS density correlates with structural complexity
We wished to verify whether higher gene product complexity in terms of different protein domains correlated with MCS density. We, therefore, analyzed the number of distinct non-redundant InterPro domains associated with each entry in our data set that displayed at least one InterPro description. A significant trend was observed for gene products associated with a higher number of unique InterPro domains to display higher MCSndev (Kruskal–Wallis P<10^-5; Supplementary Material, Fig. S2A). This result is not biased by longer gene size or higher exon number in MCS-rich genes, because the latter are, on average, shorter and display a lower exon number compared with MCS-poor genes and with all other genes in the data set (Supplementary Material, Fig. S2B and C).

To verify whether any specific InterPro domain was enriched among genes displaying high or low MCSndev, we used GeneMerge to retrieve significant associations with InterPro domains. As summarized in Table 3, for both gene groups, significantly over-represented InterPro entries were identified (again Bonferroni correction was applied). In agreement with results from GO associations, genes displaying homeobox domains and helix–turn–helix motifs (involved in development and transcription, respectively) are over represented among MCS-dense genes. The same holds true for genes coding for ephrin family members.


Table 3. Association of InterPro domains with MCS-rich and -poor genes
 
As far as MCS-poor genes are concerned, the association pattern is more complicated, although it parallels, to some extent, GO term association results; in addition to already described associations (those related to defense response and oxidoreductase activity), genes coding for tumor antigens such as MAGE and GAGE show a significant association. Moreover, an association was found with two InterPro domains characteristic of transcription factors, namely, Zn-finger and KRAB (Kruppel-associated box). Closer examination revealed that these two associations are accounted for by largely overlapping gene sets (Supplementary Material, Table S1), which map to paralogous gene clusters on chromosomes 19, 12 and X. These findings are in line with previous reports indicating that KRAB–ZNF proteins comprise hundreds of family members organized in clusters and poorly conserved among vertebrates (61); indeed, no clear KRAB–ZNF gene has been identified in Fugu (62) and analysis of human–mouse KRAB–ZNF genes has shown that duplication and loss of ancestral cluster members have occurred independently in the two species (61).

MCSs are differentially represented depending on gene expression pattern
To verify whether any correlation existed between intronic MCS density and expression level or pattern, we compared mean expression level (averaged over all tissues analyzed) and expression breadth (number of tissues in which a given gene is expressed) between MCS-rich and -poor genes: no significant difference was found. The same results were obtained when mouse genes were analyzed.

We then investigated whether variations in MCS density existed, depending on gene expression pattern; for any tissue, we calculated the differences between median expression levels of MCS-rich versus MCS-poor genes. Results for human genes are represented in Figure 2A and indicate that in most tissues, within the nervous system, MCS-rich genes are expressed at significantly higher levels (rank-sum test P<0.01 or <0.05, black and gray bars, respectively) when compared with MCS-poor genes, hypothalamus being the only exception among brain regions. The same holds true for genes expressed in skin, prostate, heart, uterus and lung. Conversely, all tested bone marrow cells and most peripheral blood cells behave in the opposite manner, with MCS-rich genes displaying significantly lower expression levels.
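The per-tissue comparison described above (difference between median expression levels, assessed with a rank-sum test) can be sketched in plain Python. The implementation below uses the normal approximation with average ranks for ties and no tie correction of the variance; this is an illustrative choice, as the exact variant of the test used in the analysis is not stated.

```python
from math import erfc, sqrt
from statistics import median
from typing import Sequence, Tuple

def rank_sum_test(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Two-sided Wilcoxon rank-sum test (normal approximation).

    Returns (z, p); z > 0 means values in x tend to rank higher than in y.
    """
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    n = len(pooled)
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1            # mean of 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    n1, n2 = len(x), len(y)
    w = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    mean_w = n1 * (n + 1) / 2
    sd_w = sqrt(n1 * n2 * (n + 1) / 12)       # no tie correction (simplification)
    z = (w - mean_w) / sd_w
    p = erfc(abs(z) / sqrt(2))                # two-sided P from the normal tail
    return z, p

def delta_expression(rich: Sequence[float], poor: Sequence[float]) -> float:
    """Median expression of MCS-rich genes minus that of MCS-poor genes."""
    return median(rich) - median(poor)
```

Applied tissue by tissue, the sign of `delta_expression` gives the direction of a bar in such a plot and the P-value from `rank_sum_test` its shading.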



Figure 2. MCS differential representation in human tissue-specific genes. (A) Differences between median expression levels of MCS-rich versus MCS-poor genes (ΔExpression). Statistical evaluation was performed using the rank-sum test (P<0.01 and <0.05, black and gray bars, respectively). (B) Differences between median MCSndev for each tissue and median MCSndev for all tissue-specific genes (ΔMCSndev), using rank-sum test (P<0.01 and <0.05, black and gray bars, respectively). For this analysis, only genes expressed in a given tissue and in less than one-quarter of the total number of tissues were considered, as previously suggested (63). Tissue numbers are as follows: 01, fetal brain; 02, whole brain; 03, temporal lobe; 04, parietal lobe; 05, occipital lobe; 06, prefrontal cortex; 07, cingulate cortex; 08, cerebellum; 09, cerebellum peduncles; 10, amygdala; 11, hypothalamus; 12, thalamus; 13, subthalamic nucleus; 14, caudate nucleus; 15, globus pallidus; 16, olfactory bulb; 17, pons; 18, medulla oblongata; 19, spinal cord; 20, ciliary ganglion; 21, trigeminal ganglion; 22, superior cervical ganglion; 23, dorsal root ganglion; 24, thymus; 25, tonsil; 26, lymph node; 27, bone marrow; 28, BM-CD71+ early erythroid; 29, BM-CD33+ myeloid; 30, BM-CD105+ endothelial; 31, BM-CD34+; 32, whole blood; 33, PB-BDCA4+ dendritic cells; 34, PB-CD14+ monocytes; 35, PB-CD56+ NK cells; 36, PB-CD4+ T-cells; 37, PB-CD8+ T-cells; 38, PB-CD19+ B-cells; 39, leukemia lymphoblastic (molt4); 40, 721 B lymphoblasts; 41, lymphoma Burkitts Raji; 42, leukemia promyelocytic (hl60); 43, lymphoma Burkitts Daudi; 44, leukemia chronic myelogenous (k562); 45, colorectal adenocarcinoma; 46, appendix; 47, skin; 48, adipocyte; 49, fetal thyroid; 50, thyroid; 51, pituitary gland; 52, adrenal gland; 53, adrenal cortex; 54, prostate; 55, salivary gland; 56, pancreas; 57, pancreatic islets; 58, atrioventricular node; 59, heart; 60, cardiac myocytes; 61, skeletal muscle; 62, tongue; 63, smooth muscle; 64, uterus; 65, uterus corpus; 66, trachea; 67, bronchial epithelial cells; 68, fetal lung; 69, lung; 70, kidney; 71, fetal liver; 72, liver; 73, placenta; 74, testis; 75, testis Leydig cell; 76, testis germ cell; 77, testis interstitial; 78, testis seminiferous tubule; and 79, ovary.

 
We next wished to verify whether differences existed in MCS density among tissue-specific genes. To this aim, we considered those genes expressed in a given tissue and in less than one-quarter of the total number of tissues, as previously suggested (63). This procedure, although arbitrary, yields gene sets large enough for each tissue to allow statistical evaluation of differences.
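The tissue-specificity criterion (expressed in a given tissue and in fewer than one-quarter of all tissues) can be expressed as a simple filter. In this sketch the expression threshold used to call a gene "expressed" is a placeholder, since the cutoff is not given here, and the nested-dictionary input layout is hypothetical.

```python
from typing import Dict, Set

def tissue_specific_genes(expression: Dict[str, Dict[str, float]],
                          threshold: float,
                          max_fraction: float = 0.25) -> Dict[str, Set[str]]:
    """Group tissue-specific genes by the tissues in which they are expressed.

    expression: gene -> {tissue: expression level}; every gene is assumed to
    have a value for every tissue.
    threshold: level at or above which a gene counts as expressed (assumed).
    """
    by_tissue: Dict[str, Set[str]] = {}
    for gene, levels in expression.items():
        expressed = [t for t, value in levels.items() if value >= threshold]
        # Keep genes expressed somewhere, but in fewer than max_fraction of tissues.
        if expressed and len(expressed) < max_fraction * len(levels):
            for tissue in expressed:
                by_tissue.setdefault(tissue, set()).add(gene)
    return by_tissue
```

The median MCSndev of each tissue's gene set can then be compared with the median over all tissue-specific genes.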

We, therefore, calculated the median MCSndev for genes expressed in each tissue and the median MCSndev for all tissue-specific genes analyzed. Differences are plotted as histograms in Figure 2B and statistical significance was assessed by applying the rank-sum test. Again, a significant preferential expression of MCS-rich genes was noticed in many nervous system tissues, whereas the opposite situation was observed in fetal liver as well as in most bone marrow and peripheral blood cells in addition to lymph node and tonsil, possibly in line with the participation of the latter tissues in immune functions.

These same analyses were performed for mouse genes (Supplementary Material, Fig. S3) and similar results were obtained. Yet, in the case of mouse, data concerning embryonal gene expression were available and indicated that although no significant deviation in MCSndev was observed for genes expressed in the fertilized egg, blastocyst and for very early developmental stages (6.5–8.5 d.p.c.), genes expressed at stages 9.5 and 10.5 d.p.c. displayed significantly higher MCSndev.

MCS density correlates with conservation of protein sequence, upstream gene regions and alternative splicing in human–mouse orthologous pairs
To evaluate whether MCS density correlated with conservation in coding regions, for any gene in our database, we searched for a murine ortholog that could be unequivocally identified (Materials and Methods): a total of 3582 human–mouse orthologous gene pairs were retrieved. MCSndev was then correlated with either the rate of non-synonymous substitution (dN) or the ratio of non-synonymous/synonymous substitutions (dN/dS); in both cases, a significant negative correlation was detected (r=–0.2715, P<10^-60 and r=–0.20, P<10^-34, respectively).

We next wished to verify whether any relationship existed between MCS density and gene upstream sequence conservation. To this aim, we exploited data deriving from a previously reported analysis of human–mouse orthologous gene conservation in 8 kb genomic regions upstream of coding start sites (64). Out of 3055 previously studied genes, 1875 were also present in our study set. Comparison of MCSndev with the number of conserved sequence blocks in upstream gene sequences resulted in a significant correlation (r=0.33, P<10^-6).

Finally, we wished to investigate whether regulation of alternative splicing events might have a role in MCS fixation. We, therefore, took advantage of a previously reported (65) set of orthologous exons which are predicted to undergo human–mouse conserved alternative splicing events.

Initially, only genes that were also present in our human data set were selected (660 out of 1580 previously reported). Comparison of median MCSndev (Fig. 3A) indicated that genes that display at least one conserved alternative splicing event (median MCSndev=0.0051) have significantly higher MCSndev when compared with all genes in our database (median MCSndev=–0.247; rank-sum test, P<10^-6); remarkably, MCSndev progressively increases (Fig. 3B) when genes that display one, two or more than two alternative events are considered (Kruskal–Wallis P<10^-6). To better analyze the possible relationship between MCSs and alternative splicing, we selected all introns flanking predicted alternatively spliced exons (whether or not their genes were present in our data set) and all other introns from the same genes. We next searched for MCSs and then compared MCS density in introns located 5' or 3' of a predicted conserved alternatively spliced exon with that of all other introns extracted from the same genes. Data are reported in Figure 3C and indicate that introns flanking a conserved alternative exon are significantly enriched in MCSs (median MCS densities=0.015, 0.0398 and 0.029 for all introns, introns located in 5', introns located in 3', respectively; rank-sum test P<10^-3).



Figure 3. MCS distribution in relation to conserved alternative splicing events. Prediction of alternative splicing events conserved between human and mouse has been previously reported (65). (A) Genes (n=660) that are predicted to display at least one conserved alternative splicing event (>0) show significantly higher MCSndev when compared with all genes (n=7614). (B) Median MCSndev progressively increases when genes that are predicted to display none (0; n=6954), one (1; n=529), two (2; n=96) or more than two (>2; n=35) conserved alternative events were considered. (C) Analysis of MCS density (x-axis) for introns (n=17 070) deriving from genes that are predicted to undergo at least one conserved alternative splicing event. Diamonds and asterisks indicate introns that are located in 5' and 3' (n=1467) of an alternatively spliced exon, respectively; circles indicate all introns (n=17 070).

 
MCS density correlates with untranslated region conservation and length
We next retrieved, for each human entry in our database, information concerning 5' and 3' untranslated regions (UTRs). In particular, length and MCS density were calculated (Materials and Methods) for 5' and 3' UTRs; a significant correlation was observed between MCSndev and both UTR length (r=0.14 for both 5' and 3' UTRs; P<10^-6) and MCS density (r=0.38 and 0.42 for 5' and 3' UTRs, respectively; P<10^-6).

MCSs are over represented in disease and cancer-related genes
To evaluate whether human genes involved in pathological processes displayed any difference in MCS density, we derived disease and cancer genes from the OMIM morbidmap and the Tumor Gene Database, respectively, and matched those that were also represented in our database: 933 disease and 152 tumor genes were obtained. For both disease and cancer genes, median MCSndev was significantly higher (rank-sum test) when compared with the median of the whole gene set (MCSndev disease=–0.216, MCSndev cancer=0.109, MCSndev all=–0.247; P<0.01 and P<10^-10, respectively). To evaluate whether over-representation of genes involved in transcription or development might be responsible for higher MCS density in the cancer and disease gene sets, we purged genes associated with these GO terms. In particular, 92 (60.5%) and 710 (74.8%) cancer and disease genes, respectively, were not associated with either development (GO: 0007275) or transcription (GO: 0006350). When the purged sets were analyzed, the median MCSndev of cancer genes (0.070) was still significantly higher (rank-sum P<10^-4) than that of the whole set of genes, whereas no difference was noted for disease genes.


    DISCUSSION
In recent years, increasing evidence has suggested that intervening sequences have contributed to eukaryote evolution (reviewed in 66). Intron presence allows massive proteome expansion through alternative splicing events (67) and has an impact on RNA metabolic processes (68–70); as, in these cases, introns exert their functional role by being removed (although in a regulated way), the problem of within-genome intron evolution dynamics remains open to debate. Some authors (71) have proposed a model of ‘selection for economy’ whereby highly expressed genes are subjected to stronger pressure for intron shortening when compared with genes expressed at low levels or in few tissues. Conversely, the ‘genomic design’ model (72), which our data strongly support, indicates that longer introns preferentially occur in tissue-specific genes due to increased regulatory complexity. Indeed, we have previously demonstrated that MCS fixation is responsible for intron size growth in humans (73); here, we show that intronic MCSs are specifically fixed in human genes depending on gene function and tissue of expression, suggesting that the majority of these conserved sequences function as cis-acting gene regulators and, in turn, that intron elongation reflects cis-regulatory needs. In line with these considerations, the search for experimentally validated regulatory elements indicated that they map, in many instances, to MCSs (Table 1). Our data indicate (Fig. 1) that the great majority of relatively short human introns display no or few MCSs, whereas in the longer intron classes, median MCS density reaches up to 5% of total intron size; overall, MCS-containing introns represent <40% in human (and <35% in mouse).

On the one hand, it should be noted that introns lacking MCSs might still carry regulatory sequences which are not conserved (at least to the extent of being classified as an MCS), as demonstrated by the lack of overlap between many known functional elements and MCS positions (Table 1). On the other hand, it is tempting to speculate that introns might provide two tiers of gene regulation, by harboring cis-regulatory elements and by allowing adequate expression levels and mRNA processing. In the first instance, MCSs might play a pivotal role, whereas in the latter case, no MCS might be required, and intron removal might suffice to trigger downstream mRNA processing. Similar conclusions had been drawn from analysis of Fugu genomic organization (62,66,74,75): in this organism, the few giant introns were suggested to play a role in gene regulation.

In line with previous suggestions, intronic MCSs can be expected to represent diverse functional categories, namely, chromatin structural elements (76), inter-chromosomal interactors (12), transcriptional regulators (5–10) or splicing modulatory elements (11). In particular, this latter possibility has scarcely been considered, except for a previous report (11) that only analyzed conserved sequences immediately flanking splice sites. Table 1 indicates that experimentally identified intronic splicing regulators map, in many instances, to MCSs, suggesting that the need to control splicing processes contributes to the fixation of intronic conserved sequence elements. This is in line with the observation (73) that MCS distribution is not uniform across human intron sequences but shows an increase in regions flanking the splice sites. Still, alternative splicing events were reported to be poorly conserved between human and mouse (77), an observation that casts doubts on the need to preserve MCSs to regulate largely divergent processes. Remarkably, it was recently demonstrated (65) that a significantly higher human-rodent conservation of alternative splicing events is observed for genes involved in transcription regulation and development, as well as in central nervous system-specific genes. Indeed, our data indicate that genes displaying at least one predicted conserved alternative splicing event have significantly higher MCSndev than those where alternative splicing events conserved in human and mouse have not been reported; moreover, MCSndev progressively increases when genes that display two or more than two alternative events are considered. Consistently, introns flanking conserved alternatively spliced exons display, on average, significantly higher MCS densities than the average of other introns from the same genes.

In addition to splicing and transcription regulators, MCSs probably represent a heterogeneous class of functional elements; it is interesting to notice, in this respect, that the GO terms that are associated with MCS-rich genes closely reflect those that were associated with gene deserts displaying a high density of conserved sequence elements (9), suggesting that intragenic and intergenic constraints might act in the same direction to preserve fine-tuned regulation of genes involved in pivotal processes such as development and transcription regulation. The same conclusion had also been put forward upon distribution analysis of sequence elements conserved between humans and fishes (10). Nonetheless, it should be noticed that as evidenced from data concerning human and mouse embryonic/fetal gene expression, specific stages might exist during development when preferential expression of MCS-rich genes occurs. In fact, early mouse developmental stages (from fertilized egg to 8.5 d.p.c.) display no preferential expression of MCS-rich genes, which is instead observed for later stages. Mouse stages 9.5–10.5 correspond to the first 5 weeks of human gestation; high-throughput gene expression data are only available for later human developmental stages (beyond 15 weeks for liver and 20 weeks after conception for brain and lung) and show no evidence of increased expression of MCS-rich genes compared with the adult tissue counterparts. Further gene expression studies are required to allow speculation on the role of highly conserved genes in human developmental stages, as well as on the possibility to provide a molecular definition for a phylotypic stage (fitting the hourglass model) for vertebrate development as previously attempted (78).

Functional analysis of MCS-poor genes indicated that the great majority of them are involved in defense response. An accelerated divergence of coding regions had previously been shown for this functional category (79) and interpreted in terms of genetic conflict between host and pathogen. Although in the case of coding sequence divergence, positive selection and coevolution of protein–protein interactions have been invoked, the poor conservation of non-coding elements is probably more easily explained by relaxation of purifying selection pressure.

More generally, our data indicate that conservation of coding and non-coding sequences is highly correlated, suggesting that although different selective pressures might act on either, in many instances the same selection source (or absence of selection, i.e. neutral evolution) might be effective on both. Moreover, the density of conserved sequences in mammalian introns correlates with UTR length and conservation, as well as with conservation in upstream gene regions. This observation suggests that for a given gene, the evolution rates of its non-coding portions are closely coupled and, in turn, they parallel protein sequence evolution. A similar observation has been drawn upon analysis of Caenorhabditis elegans and C. briggsae orthologous genes (80). In analogy to worms, evolution might therefore act on vertebrate genes as integrated units of coding and regulatory capacity and purifying selection might play a relevant role in preserving vital functions throughout. However, although the overall trend indicates a positive relationship between coding and non-coding sequence conservation, gene expression level and breadth do not show any association with intronic MCS density in either human or mouse. Previous studies (81–83) had indicated that highly and broadly expressed genes displayed significantly lower coding sequence divergence and this was interpreted in terms of negative selection being more effective on housekeeping genes, especially in species with small population sizes such as humans. Instead, our data suggest that intronic MCS enrichment might correlate with gene functional complexity in terms of distinct protein domains and conserved alternative splicing events, strongly supporting the role of MCSs as cis-acting regulators of complex genes. In addition, MCS-rich genes are overrepresented among central nervous system-specific genes, suggesting that MCSs might operate in assuring complex and fine-tuned regulatory events. In analogy, recent reports have indicated that tissue biology also plays a role in protein evolution and brain-specific genes have been shown to display relatively lower protein divergence (82,84). It is therefore tempting to speculate that although expression level and breadth might render purifying selection more effective at the coding sequence level, functional complexity might represent a source of negative selection on non-coding sequences and, to some extent, on proteins.

Recent reports have also indicated that broadly expressed genes are poorly represented among human disease genes, probably reflecting high frequency of embryonic lethality for mutations in housekeeping genes. Our data indicate that disease and cancer genes are, on average, enriched in intronic MCSs. This observation is in agreement with previous findings indicating that disease proteins exhibit a wider phylogenetic extent and are generally more conserved when compared with all human proteins (85). Nonetheless, our data indicate that the higher MCS density in disease loci is mainly accounted for by genes that are involved in transcription or development, which possibly also account for higher protein conservation. Conversely, when the cancer gene set was purged of genes involved in these same processes, a significant enrichment in MCSs was still observed, suggesting that, whatever the process they are involved in, cancer genes need tight regulation and therefore are probably subjected to strong purifying selection for the maintenance of cis-regulators.

Since their discovery, the role of MCSs in intraspecific phenotypic variability, complex trait expression and human genetic disease has been debated. This issue has been addressed in a recent review (2) and the authors indicated that mutations in only one MCS (in intron 5 of the LMBR1 gene) have so far been identified as responsible for a human genetic disease (preaxial polydactyly). The data we report in Table 1 expand these limited statistics and indicate that at least two other pathological gene mutations map to MCSs, thereby substantiating the potential role of these sequences as mutation targets. The reasons why mutations in non-coding sequences are underrepresented as a cause of genetic disease need not be discussed here. We consider that our data might provide further indications concerning the potential pathogenetic effects of MCS mutations. Moreover, given the notion that <1% of the sequence difference between individuals occurs in protein-coding regions (1), the impact of regulatory elements and possibly MCSs on phenotypic variability and complex trait predisposition deserves extensive study.


    MATERIALS AND METHODS
 
Human gene/intron database
For creation of the intron database, human genes that had been annotated in the NCBI Reference Sequence (RefSeq) collection were selected (reviewed or validated entries only); for mouse genes, ‘Provisional’ entries were also included. Genomic sequences and intron/exon boundaries were derived from the UCSC genome annotation database (http://genome.ucsc.edu/cgi-bin/hgGateway; hg17 and mm5 assemblies, May 2004, for human and mouse, respectively). Intronless genes were discarded and, for each gene, the transcript corresponding to the longest genomic sequence and containing the highest number of exons was selected. The data sets comprised 7614 human and 5550 mouse genes. For the identification of human–mouse orthologous pairs, the EnsMart database (http://www.ensembl.org/Multi/martview) was used and only entries representing unique best reciprocal hits were selected.
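The transcript selection rule described above can be illustrated with a minimal Python sketch. The record layout (`Transcript` and its field names) is hypothetical, and the way the two criteria are combined (genomic span first, exon count as tie-breaker) is an assumption, since the text does not state how conflicts between them were resolved.

```python
from typing import NamedTuple

class Transcript(NamedTuple):
    # Hypothetical record; field names are illustrative, not UCSC column names.
    gene: str
    name: str
    tx_start: int
    tx_end: int
    exon_count: int

def pick_representative(transcripts):
    """Return {gene: transcript}, keeping one transcript per gene.

    Intronless transcripts (a single exon) are skipped. Among the rest,
    the longest genomic span wins; exon count breaks ties (assumed order).
    """
    best = {}
    for t in transcripts:
        if t.exon_count < 2:
            continue  # no intron to analyze
        key = (t.tx_end - t.tx_start, t.exon_count)
        current = best.get(t.gene)
        if current is None or key > (current.tx_end - current.tx_start,
                                     current.exon_count):
            best[t.gene] = t
    return best
```

Genes whose transcripts are all intronless simply do not appear in the returned mapping, which mirrors the removal of intronless genes from the data set.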

MCS retrieval and distribution analysis
MCSs were obtained using phastCons predictions (13,14), which are based on a phylogenetic hidden Markov model and are available through the UCSC database (phastConsElements table). Only purely intronic phastCons elements were selected (i.e. MCSs partially overlapping exons were discarded). To calculate MCS density as a function of intron length, intronic sequences were partitioned into 10 length classes; in particular, introns were ranked according to their size and subsequently clustered so that each size class contained the same absolute number of nucleotides. The following length intervals (in bp) were obtained: 6–2476, 2477–5062, 5063–9147, 9148–15 666, 15 667–26 179, 26 180–41 774, 41 775–66 914, 66 915–110 044, 110 045–190 058, 190 059–1 043 911.
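One way to obtain size classes that each hold roughly the same number of nucleotides is sketched below in Python. This is an illustration rather than the procedure actually used: the function name and the rule for assigning an intron that straddles a class boundary (it goes to the class in which its first base falls) are assumptions.

```python
def partition_by_total_length(intron_lengths, n_classes=10):
    """Rank introns by size and split them into classes of ~equal total bp.

    Each intron is assigned according to the cumulative number of bases
    that precede it in the size ranking (assumed boundary rule).
    """
    ordered = sorted(intron_lengths)
    total = sum(ordered)
    classes = [[] for _ in range(n_classes)]
    if total == 0:
        return classes
    running = 0  # bases already assigned to smaller introns
    for length in ordered:
        k = min(running * n_classes // total, n_classes - 1)
        classes[k].append(length)
        running += length
    return classes
```

Because long introns contribute many bases each, the upper classes contain far fewer introns than the lower ones, which is consistent with the widening intervals listed above.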

MCS density for intron length class k (dMCSk) has been calculated as total class MCS length over total class intron length:

Expected MCS length for intron i belonging to length class k has been calculated as:

For each gene, we computed expected MCS densities (dMCSexp) as:

These expected densities have been used in the evaluation of MCSndev as reported in Results.
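The three formulas introduced above are not reproduced in this text. A plausible reconstruction from the verbal definitions is given here, with the following notation assumed for illustration: $l_i$ is the length of intron $i$, $m_i$ is the total MCS length within intron $i$, $k(i)$ is the length class of intron $i$, and $g$ is the set of introns of a gene. Only the first expression is fully determined by the text; the second and third are the natural reading of "expected MCS length" and "expected MCS density" and should be treated as such.

```latex
% Density of length class k: total class MCS length over total class intron length
d_{\mathrm{MCS}_k} = \frac{\sum_{i \in k} m_i}{\sum_{i \in k} l_i}

% Expected MCS length for intron i belonging to class k (assumed form)
E_i = d_{\mathrm{MCS}_{k(i)}} \, l_i

% Expected MCS density for gene g (assumed form)
d_{\mathrm{MCS}}^{\mathrm{exp}}(g) = \frac{\sum_{i \in g} E_i}{\sum_{i \in g} l_i}
```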

Functional element retrieval
Intronic functional elements were retrieved by searching the literature for elements that met the following criteria: experimental evidence for their function, purely intronic location and direct evidence for function in humans (sequences that were experimentally tested in mice and inferred to also work in humans because of sequence conservation were not included).

Gene classification
Gene associations with GO terms and their descriptions were obtained by cross-referencing the UCSC hg17 kgXref table with the GO database; InterPro information was retrieved from the UCSC protein database (interProXref table). InterPro associations were purged of redundancy using the ‘entry2entry’ table from the InterPro database, which reports existing parent/child relationships between domain entries. Association and description files were then created and significant associations between gene groups and GO terms or InterPro domains were identified using GeneMerge (60).
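For readers unfamiliar with this kind of over-representation analysis, the underlying calculation can be sketched as follows. The sketch assumes a hypergeometric model with a Bonferroni correction, which is a common choice for such tests; it is not GeneMerge's code, and the function names are illustrative.

```python
from math import comb

def hypergeom_enrichment_p(population, annotated, sample, hits):
    """P(X >= hits) for a hypergeometric draw.

    population: number of genes in the background set
    annotated:  background genes carrying the term
    sample:     number of genes in the study group
    hits:       study-group genes carrying the term
    """
    upper = min(annotated, sample)
    tail = sum(
        comb(annotated, x) * comb(population - annotated, sample - x)
        for x in range(hits, upper + 1)
    )
    return tail / comb(population, sample)

def bonferroni(p_value, n_tests):
    """Bonferroni-adjusted P-value, capped at 1."""
    return min(1.0, p_value * n_tests)
```

As an example, with a background of 10 genes of which 4 carry a term, observing at least 2 annotated genes in a study group of 3 has probability 40/120, i.e. one-third.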

The Tumor Gene Database (http://condor.bcm.tmc.edu/ermb/tgdb/tgdb.html) was used to identify human genes involved in cancer processes. Disease genes were retrieved from OMIM (ftp://ftp.ncbi.nih.gov/repository/OMIM/morbidmap).

Expression, alternative splicing and protein divergence data
Data on expression levels in human and mouse tissues were derived from previous studies (86,87): they are publicly accessible through the UCSC database (tables: gnfHumanAtlas2median and gnfHumanAtlas2medianExps; gnfMouseAtlas2median and gnfMouseAtlas2medianExps) and they are based on high-density oligonucleotide arrays (GNF Gene Expression Atlas 2). We only considered probes corresponding to genes that had been included in our database; signals from duplicated probes on the same chip were averaged, as were replicates from the same tissue. A gene was considered to be expressed in a given tissue if its signal level was greater than or equal to 200 arbitrary units (87).
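The averaging and thresholding steps can be sketched in Python as follows. The input layout (one `(gene, tissue, signal)` tuple per probe or replicate) is hypothetical, duplicated probes and replicates are pooled into a single mean (the text does not say whether averaging was done in one or two stages), and expression breadth is taken here to be the number of tissues at or above the threshold, which is also an assumption.

```python
from collections import defaultdict
from statistics import mean

EXPRESSION_THRESHOLD = 200  # arbitrary units, as stated in the text

def gene_tissue_signal(records):
    """Average all signals sharing a (gene, tissue) pair.

    records: iterable of (gene, tissue, signal), one per probe or replicate.
    """
    grouped = defaultdict(list)
    for gene, tissue, signal in records:
        grouped[(gene, tissue)].append(signal)
    return {key: mean(values) for key, values in grouped.items()}

def expression_breadth(signals):
    """Count, per gene, the tissues whose mean signal reaches the threshold."""
    breadth = defaultdict(int)
    for (gene, _tissue), value in signals.items():
        breadth[gene] += int(value >= EXPRESSION_THRESHOLD)
    return dict(breadth)
```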

For analysis of conserved alternative splicing events, a previously reported (65) list of predicted conserved alternatively spliced exons was used; in particular, to obtain genes that were also present in our initial database, Ensembl gene entries provided by the authors were cross-mapped to RefSeq entries and, if represented in our database, allocated an MCSndev value. For the analysis of single introns involved in alternative splicing events, all reported genes were used irrespective of their presence in our data set; all Ensembl transcripts corresponding to one described alternatively spliced gene were extracted from the UCSC database (ensGene and ensGtp tables) and the presence of an exon corresponding in sequence, length and position to the alternatively spliced one was checked. All gene intron sequences were then retrieved and introns located either 5' or 3' of an alternatively spliced exon were recorded. To purge MCSs that might derive from alternative splicing events such as intron retention or inclusion of overlapping longer exons, all MCSs that mapped to mRNA or EST entries in the UCSC database (tables all_mrna and all_est) were discarded from all introns analyzed.
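Discarding conserved elements that map to annotated transcripts amounts to an interval overlap filter, sketched below in Python. The sketch assumes that all intervals lie on the same chromosome and use 0-based, half-open coordinates; the function names and the tuple layout are illustrative.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if half-open intervals [a_start, a_end) and [b_start, b_end) share a base."""
    return a_start < b_end and b_start < a_end

def discard_overlapping(elements, annotated):
    """Keep only elements that overlap none of the annotated intervals.

    elements, annotated: lists of (start, end) tuples on one chromosome.
    """
    return [
        (start, end)
        for start, end in elements
        if not any(overlaps(start, end, s, e) for s, e in annotated)
    ]
```

The nested scan is quadratic and is meant only to make the criterion explicit; for genome-scale tables a sorted sweep or an interval tree would be the more practical choice.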

Information concerning protein divergence (dN and dS) was obtained from the EnsMart database (http://www.ensembl.org/Multi/martview).

UTR and upstream sequence information retrieval
For each transcript entry in our database, data concerning transcript start and end as well as coding sequence (CDS) boundaries were retrieved from the UCSC annotation tables; the difference between these positions was used to obtain UTR length after removal of introns. For MCS density calculations, we sought to eliminate all those MCSs that might correspond to spliced coding exons in an alternative transcript. To this aim, all transcripts totally or partially overlapping with those constituting our database were extracted from the following annotation tables: ensGene, refGene and knownGene (the latter combines all known protein-coding genes on the basis of protein data from SWISS-PROT, TrEMBL and TrEMBL-NEW and their corresponding mRNAs from GenBank). Any MCS that mapped to at least one coding exon in one transcript was eliminated.
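The UTR length calculation can be made concrete with a short Python sketch. It assumes exon and CDS coordinates in a 0-based, half-open genomic system and sums only the exonic bases lying outside the CDS, which is one way to express "after removal of introns"; the function name and signature are hypothetical.

```python
def utr_lengths(exons, cds_start, cds_end, strand="+"):
    """Return (five_prime_utr, three_prime_utr) lengths in bases.

    exons: list of (start, end) half-open genomic intervals of one transcript.
    cds_start, cds_end: genomic boundaries of the coding sequence.
    Only exonic bases outside the CDS are counted, so introns are excluded.
    """
    # Exonic bases to the left of the CDS on the genomic axis
    left = sum(max(0, min(end, cds_start) - start) for start, end in exons)
    # Exonic bases to the right of the CDS on the genomic axis
    right = sum(max(0, end - max(start, cds_end)) for start, end in exons)
    # On the minus strand the genomic left side is the 3' end of the transcript
    return (left, right) if strand == "+" else (right, left)
```

For a transcript with exons at 100–200, 300–400 and 500–600 and a CDS spanning 150–380, this gives 50 bases on the genomic left and 120 on the right, assigned to the 5' and 3' UTR according to strand.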

Data concerning upstream sequence conservation have been previously reported (64) and refer to 8 kb genomic regions upstream of CDS start sites.


    SUPPLEMENTARY MATERIAL
 
Supplementary Material is available at HMG Online.


    ACKNOWLEDGEMENTS
 
We thank Dr Roberto Giorda for useful discussion about the manuscript.

Conflict of Interest statement. None declared.


    REFERENCES
 

  1. Venter, J.C., Adams, M.D., Myers, E.W., Li, P.W., Mural, R.J., Sutton, G.G., Smith, H.O., Yandell, M., Evans, C.A., Holt, R.A. et al. (2001) The sequence of the human genome. Science, 291, 1304–1351.

  2. Dermitzakis, E.T., Reymond, A. and Antonarakis, S.E. (2005) Conserved non-genic sequences—an unexpected feature of mammalian genomes. Nat. Rev. Genet., 6, 51–57.

  3. Boffelli, D., Nobrega, M.A. and Rubin, E.M. (2004) Comparative genomics at the vertebrate extremes. Nat. Rev. Genet., 5, 456–465.

  4. Bejerano, G., Pheasant, M., Makunin, I., Stephen, S., Kent, W.J., Mattick, J.S. and Haussler, D. (2004) Ultraconserved elements in the human genome. Science, 304, 1321–1325.

  5. Boffelli, D., McAuliffe, J., Ovcharenko, D., Lewis, K.D., Ovcharenko, I., Pachter, L. and Rubin, E.M. (2003) Phylogenetic shadowing of primate sequences to find functional regions of the human genome. Science, 299, 1391–1394.

  6. Nobrega, M.A., Ovcharenko, I., Afzal, V. and Rubin, E.M. (2003) Scanning human gene deserts for long-range enhancers. Science, 302, 413.

  7. Frazer, K.A., Tao, H., Osoegawa, K., de Jong, P.J., Chen, X., Doherty, M.F. and Cox, D.R. (2004) Noncoding sequences conserved in a limited number of mammals in the SIM2 interval are frequently functional. Genome Res., 14, 367–372.

  8. Loots, G.G. and Ovcharenko, I. (2004) rVISTA 2.0: evolutionary analysis of transcription factor binding sites. Nucleic Acids Res., 32, W217–W221.

  9. Ovcharenko, I., Loots, G.G., Nobrega, M.A., Hardison, R.C., Miller, W. and Stubbs, L. (2005) Evolution and functional classification of vertebrate gene deserts. Genome Res., 15, 137–145.

  10. Woolfe, A., Goodson, M., Goode, D.K., Snell, P., McEwen, G.K., Vavouri, T., Smith, S.F., North, P., Callaway, H., Kelly, K. et al. (2005) Highly conserved non-coding sequences are associated with vertebrate development. PLoS Biol., 3, 116–130.

  11. Sorek, R. and Ast, G. (2003) Intronic sequences flanking alternatively spliced exons are conserved between human and mouse. Genome Res., 13, 1631–1637.

  12. Dermitzakis, E.T., Kirkness, E., Schwarz, S., Birney, E., Reymond, A. and Antonarakis, S.E. (2004) Comparison of human chromosome 21 conserved nongenic sequences (CNGs) with the mouse and dog genomes shows that their selective constraint is independent of their genic environment. Genome Res., 14, 852–859.

  13. Siepel, A. and Haussler, D. (2004) Combining phylogenetic and hidden Markov models in biosequence analysis. J. Comput. Biol., 11, 413–428.

  14. Siepel, A. and Haussler, D. (2004) Phylogenetic estimation of context-dependent substitution rates by maximum likelihood. Mol. Biol. Evol., 21, 468–488.

  15. Deguillien, M., Huang, S.C., Moriniére, M., Dreumont, N., Benz, E. Jr and Baklouti, F. (2001) Multiple cis elements regulate an alternative splicing event at 4.1R pre-mRNA during erythroid differentiation. Blood, 98, 3809–3816.

  16. Helledie, T., Grontved, L., Jensen, S.S., Kiilerich, P., Rietveld, L., Albrektsen, T., Boysen, M.S., Nohr, J., Larsen, L.K., Fleckner, J. et al. (2002) The gene encoding the Acyl-CoA-binding protein is activated by peroxisome proliferator-activated receptor gamma through an intronic response element functionally conserved between humans and rodents. J. Biol. Chem., 277, 26821–26830.

  17. Surinya, K.H., Cox, T.C. and May, B.K. (1998) Identification and characterization of a conserved erythroid-specific enhancer located in intron 8 of the human 5-aminolevulinate synthase 2 gene. J. Biol. Chem., 273, 16798–16809.

  18. Genetta, T., Morisaki, H., Morisaki, T. and Holmes, E. (2001) A novel bipartite intronic splicing enhancer promotes the inclusion of a mini-exon in the AMP deaminase 1 gene. J. Biol. Chem., 276, 25589–25597.

  19. Ge, B., Li, O., Wilder, P., Rizzino, A. and McKeithan, T.W. (2003) NF-kappa B regulates BCL3 transcription in T lymphocytes through an intronic enhancer. J. Immunol., 171, 4210–4218.

  20. Jo, E.K., Kanegane, H., Nonoyama, S., Tsukada, S., Lee, J.H., Lim, K., Shong, M., Song, C.H., Kim, H.J., Park, J.K. et al. (2001) Characterization of mutations, including a novel regulatory defect in the first intron, in Bruton's tyrosine kinase gene from seven Korean X-linked agammaglobulinemia families. J. Immunol., 167, 4038–4045.

  21. Rohrer, J. and Conley, M.E. (1998) Transcriptional regulatory elements within the first intron of Bruton's tyrosine kinase. Blood, 91, 214–221.

  22. Lou, H., Yang, Y., Cote, G., Berget, S. and Gagel, R. (1995) An intron enhancer containing a 5' splice site sequence in the human calcitonin/calcitonin gene-related peptide gene. Mol. Cell. Biol., 15, 7135–7142.

  23. Horikawa, Y., Oda, N., Cox, N.J., Li, X., Orho-Melander, M., Hara, M., Hinokio, Y., Lindner, T.H., Mashima, H., Schwarz, P.E. et al. (2000) Genetic variation in the gene encoding calpain-10 is associated with type 2 diabetes mellitus. Nat. Genet., 26, 163–175.

  24. Scotet, E., Schroeder, S. and Lanzavecchia, A. (2001) Molecular regulation of CC-chemokine receptor 3 expression in human T helper 2 cells. Blood, 98, 2568–2570.

  25. Smith, A.N., Barth, M.L., McDowell, T.L., Moulin, D.S., Nuthall, H.N., Hollingsworth, M.A. and Harris, A. (1996) A regulatory element in intron 1 of the cystic fibrosis transmembrane conductance regulator gene. J. Biol. Chem., 271, 9947–9954.

  26. Zuccato, E., Buratti, E., Stuani, C., Baralle, F.E. and Pagani, F. (2004) An intronic polypyrimidine-rich element downstream of the donor site modulates cystic fibrosis transmembrane conductance regulator exon 9 alternative splicing. J. Biol. Chem., 279, 16980–16988.

  27. Pagani, F., Buratti, E., Stuani, C., Romano, M., Zuccato, E., Niksic, M., Giglio, L., Faraguna, D. and Baralle, F.E. (2000) Splicing factors induce cystic fibrosis transmembrane regulator exon 9 skipping through a nonevolutionary conserved intronic element. J. Biol. Chem., 275, 21041–21047.

  28. Charlet, B.N., Savkur, R., Singh, G., Philips, A., Grice, E. and Cooper, T. (2002) Loss of the muscle-specific chloride channel in type I myotonic dystrophy due to misregulated alternative splicing. Mol. Cell, 10, 45–53.

  29. Ghayor, C., Herrouin, J.F., Chadjichristos, C., Ala-Kokko, L., Takigawa, M., Pujol, J.P. and Galera, P. (2000) Regulation of human COL2A1 gene expression in chondrocytes. Identification of C-Krox-responsive elements and modulation by phenotype alteration. J. Biol. Chem., 275, 27421–27438.

  30. Makar, K.W., Ulgiati, D., Hagman, J. and Holers, V.M. (2001) A site in the complement receptor 2 (CR2/CD21) silencer is necessary for lineage specific transcriptional regulation. Int. Immunol., 13, 657–664.

  31. Himes, S.R., Tagoh, H., Goonetilleke, N., Sasmono, T., Oceandy, D., Clark, R., Bonifer, C. and Hume, D.A. (2001) A highly conserved c-fms gene intronic element controls macrophage-specific and regulated expression. J. Leukoc. Biol., 70, 812–820.

  32. Follows, G.A., Tagoh, H., Lefevre, P., Morgan, G.J. and Bonifer, C. (2003) Differential transcription factor occupancy but evolutionarily conserved chromatin features at the human and mouse M-CSF (CSF-1) receptor loci. Nucleic Acids Res., 31, 5805–5816.

  33. Yoon, H., Liyanarachchi, S., Wright, F.A., Davuluri, R., Lockman, J.C., de la Chapelle, A. and Pellegata, N.S. (2002) Gene expression profiling of isogenic cells with different TP53 gene dosage reveals numerous genes that are affected by TP53 dosage and identifies CSPG2 as a direct target of p53. Proc. Natl Acad. Sci. USA, 99, 15632–15637.

  34. Klamut, H.J., Bosnoyan-Collins, L.O., Worton, R.G. and Ray, P.N. (1997) A muscle-specific enhancer within intron 1 of the human dystrophin gene is functionally dependent on single MEF-1/E box and MEF-2/AT-rich sequence motifs. Nucleic Acids Res., 25, 1618–1625.

  35. Jin, W., Huang, E.S.-C., Bi, W. and Cote, G. (1999) Redundant intronic repressors function to inhibit fibroblast growth factor receptor-1 α-exon recognition in glioblastoma cells. J. Biol. Chem., 274, 28035–28041.

  36. Jin, W., Bi, W., Huang, E.S.-C. and Cote, G. (1999) Glioblastoma cell-specific expression of fibroblast growth factor receptor-1β requires an intronic repressor of RNA splicing. Cancer Res., 59, 316–319.

  37. del Gatto, F. and Breathnach, R. (1995) Exon and intron sequences, respectively, repress and activate splicing of a fibroblast growth factor receptor 2 alternative exon. Mol. Cell. Biol., 15, 4825–4834.

  38. del Gatto, F., Plet, A., Gesnel, M.-C., Fort, C. and Breathnach, R. (1997) Multiple interdependent sequence elements control splicing of a fibroblast growth factor receptor 2 alternative exon. Mol. Cell. Biol., 17, 5106–5116.

  39. Cogan, J.D., Prince, M.A., Lekhakula, S., Bundey, S., Futrakul, A., McCarthy, E.M. and Phillips, J.A. (1997) A novel mechanism of aberrant pre-mRNA splicing in humans. Hum. Mol. Genet., 6, 909–912.

  40. Yang, J.-H., Sklar, P., Axel, R. and Maniatis, T. (1995) Editing of glutamate receptor subunit B pre-mRNA in vitro by site-specific deamination of adenosine. Nature, 374, 77–81.

  41. Guil, S., Gattoni, R., Carrascal, M., Abian, J., Stevenin, J. and Bach-Elias, M. (2003) Roles of hnRNP A1, SR proteins, and p68 helicase in c-H-ras alternative splicing regulation. Mol. Cell. Biol., 23, 2927–2941.

  42. Draper, N., Walker, E.A., Bujalska, I.J., Tomlinson, J.W., Chalder, S.M., Arlt, W., Lavery, G.G., Bedendo, O., Ray, D.W., Laing, I. et al. (2003) Mutations in the genes encoding 11beta-hydroxysteroid dehydrogenase type 1 and hexose-6-phosphate dehydrogenase interact to cause cortisone reductase deficiency. Nat. Genet., 34, 434–439.

  43. Savkur, R.S., Philips, A.V. and Cooper, T.A. (2001) Aberrant regulation of insulin receptor alternative splicing is associated with insulin resistance in myotonic dystrophy. Nat. Genet., 29, 40–47.

  44. Lettice, L.A., Heaney, S.J., Purdie, L.A., Li, L., de Beer, P., Oostra, B.A., Goode, D., Elgar, G., Hill, R.E. and de Graaff, E. (2003) A long-range Shh enhancer regulates expression in the developing limb and fin and is associated with preaxial polydactyly. Hum. Mol. Genet., 12, 1725–1735.

  45. D'Souza, I. and Schellenberg, G.D. (2002) tau exon 10 expression involves a bipartite intron 10 regulatory sequence and weak 5' and 3' splice sites. J. Biol. Chem., 277, 26587–26599.

  46. Takahashi, K., Nishiyama, C., Hasegawa, M., Akizawa, Y. and Ra, C. (2003) Regulation of the human high affinity IgE receptor beta-chain gene expression via an intronic element. J. Immunol., 171, 2478–2484.

  47. Beohar, N. and Kawamoto, S. (1998) Transcriptional regulation of the human nonmuscle myosin II heavy chain-A gene. Identification of three clustered cis-elements in intron-1 which modulate transcription in a cell type- and differentiation state-dependent manner. J. Biol. Chem., 273, 9168–9178.

  48. Chung, M.C. and Kawamoto, S. (2004) IRF-2 is involved in up-regulation of nonmuscle myosin heavy chain II-A gene expression during phorbol ester-induced promyelocytic HL-60 differentiation. J. Biol. Chem., 279, 56042–56052.

  49. Kawamoto, S. (1996) Neuron-specific alternative splicing of nonmuscle myosin II heavy chain-B pre-mRNA requires a cis-acting intron sequence. J. Biol. Chem., 271, 17613–17616.

  50. Prokunina, L., Castillejo-Lopez, C., Oberg, F., Gunnarsson, I., Berg, L., Magnusson, V., Brookes, A.J., Tentler, D., Kristjansdottir, H., Grondal, G., Bolstad, A.I., Svenungsson, E. et al. (2002) A regulatory polymorphism in PDCD1 is associated with susceptibility to systemic lupus erythematosus in humans. Nat. Genet., 32, 666–669.

  51. Schjerven, H., Brandtzaeg, P. and Johansen, F.E. (2003) Hepatocyte NF-1 and STAT6 cooperate with additional DNA-binding factors to activate transcription of the human polymeric Ig receptor gene in response to IL-4. J. Immunol., 170, 6048–6056.

  52. Hobson, G.M., Huang, Z., Sperle, K., Stabley, D.L., Marks, H.G. and Cambi, F. (2002) A PLP splicing abnormality is associated with an unusual presentation of PMD. Ann. Neurol., 52, 477–488.

  53. Shamsher, M.K., Chuzhanova, N.A., Friedman, B., Scopes, D.A., Alhaq, A., Millar, D.S., Cooper, D.N. and Berg, L.P. (2000) Identification of an intronic regulatory element in the human protein C (PROC) gene. Hum. Genet., 107, 458–465.

  54. Palii, S.S., Chen, H. and Kilberg, M.S. (2004) Transcriptional control of the human sodium-coupled neutral amino acid transporter system A gene by amino acid availability is mediated by an intronic element. J. Biol. Chem., 279, 3463–3471.

  55. Miyajima, H., Miyaso, H., Okumura, M., Kurisu, J. and Imaizumi, K. (2002) Identification of a cis-acting element for the regulation of SMN exon 7 splicing. J. Biol. Chem., 277, 23271–23277.

  56. Wong, L.H., Sim, H., Chatterjee-Kishore, M., Hatzinisiriou, I., Devenish, R.J., Stark, G. and Ralph, S.J. (2002) Isolation and characterization of a human STAT1 gene regulatory element. Inducibility by interferon (IFN) types I and II and role of IFN regulatory factor-1. J. Biol. Chem., 277, 19408–19417.

  57. Lietz, M., Hohl, M. and Thiel, G. (2003) RE-1 silencing transcription factor (REST) regulates human synaptophysin gene transcription through an intronic sequence-specific DNA-binding site. Eur. J. Biochem., 270, 2–9.

  58. Polakowska, R.R., Graf, B.A., Falciano, V. and LaCelle, P. (1999) Transcription regulatory elements of the first intron control human transglutaminase type I gene expression in epidermal keratinocytes. J. Cell Biochem., 73, 355–369.

  59. Galvagni, F. and Oliviero, S. (2000) Utrophin transcription is activated by an intronic enhancer. J. Biol. Chem., 275, 3168–3172.

  60. Castillo-Davis, C.I. and Hartl, D.L. (2003) GeneMerge-post-genomic analysis, data mining, and hypothesis testing. Bioinformatics, 19, 891–892.

  61. Shannon, M., Hamilton, A.T., Gordon, L., Branscomb, E. and Stubbs, L. (2003) Differential expansion of zinc-finger transcription factor loci in homologous human and mouse gene clusters. Genome Res., 13, 1097–1110.

  62. Aparicio, S., Chapman, J., Stupka, E., Putnam, N., Chia, J.M., Dehal, P., Christoffels, A., Rash, S., Hoon, S., Smit, A., Gelpke, M.D. et al. (2002) Whole-genome shotgun assembly and analysis of the genome of Fugu rubripes. Science, 297, 1301–1310.

  63. Vinogradov, A.E. (2003) Isochores and tissue-specificity. Nucleic Acids Res., 31, 5212–5220.

  64. Iwama, H. and Gojobori, T. (2004) Highly conserved upstream sequences for transcription factor genes and implications for the regulatory network. Proc. Natl Acad. Sci. USA, 101, 17156–17161.

  65. Yeo, G.W., van Nostrand, E., Holste, D., Poggio, T. and Burge, C.B. (2005) Identification and analysis of alternative splicing events conserved in human and mouse. Proc. Natl Acad. Sci. USA, 102, 2850–2855.

  66. Mattick, J.S. (2001) Non-coding RNAs: the architects of eukaryotic complexity. EMBO Rep., 2, 986–991.

  67. Maniatis, T. and Tasic, B. (2002) Alternative pre-mRNA splicing and proteome expansion in metazoans. Nature, 418, 236–243.

  68. Luo, M.J. and Reed, R. (1999) Splicing is required for rapid and efficient mRNA export in metazoans. Proc. Natl Acad. Sci. USA, 96, 14937–14942.

  69. Maquat, L.E. (1995) When cells stop making sense: effects of nonsense codons on RNA metabolism in vertebrate cells. RNA, 1, 453–465.

  70. Le Hir, H., Izaurralde, E., Maquat, L.E. and Moore, M.J. (2000) The spliceosome deposits multiple proteins 20–24 nucleotides upstream of mRNA exon–exon junctions. EMBO J., 19, 6860–6869.

  71. Castillo-Davis, C.I., Mekhedov, S.L., Hartl, D.L., Koonin, E.V. and Kondrashov, F.A. (2002) Selection for short introns in highly expressed genes. Nat. Genet., 31, 415–418.

  72. Vinogradov, A.E. (2004) Compactness of human housekeeping genes: selection for economy or genomic design? Trends Genet., 20, 248–253.

  73. Sironi, M., Menozzi, G., Comi, G.P., Bresolin, N., Cagliani, R. and Pozzoli, U. (2005) Fixation of conserved sequences shapes human intron size and influences transposon insertion dynamics. Trends Genet., (2005) J.15 [Epub ahead of print], doi: 10.1016/j.tig.2005.06.009.

  74. Cecconi, F., Crosio, C., Mariottini, P., Cesareni, G., Giorgi, M., Brenner, S. and Amaldi, F. (1996) A functional role for some Fugu introns larger than the typical short ones: the example of the gene coding for ribosomal protein S7 and snoRNA U17. Nucleic Acids Res., 24, 3167–3172.[Abstract/Free Full Text]

  75. Pozzoli, U., Elgar, G., Cagliani, R., Riva, L., Comi, G.P., Bresolin, N., Bardoni, A. and Sironi, M. (2003) Comparative analysis of vertebrate dystrophin loci indicate intron gigantism as a common feature. Genome Res., 13, 764–772.[Abstract/Free Full Text]

  76. Glazko, G.V., Koonin, E.V., Rogozin, I.B. and Shabalina, S.A. (2003) A significant fraction of conserved noncoding DNA in human and mouse consists of predicted matrix attachment regions. Trends Genet., 19, 119–124.[CrossRef][Web of Science][Medline]

  77. Nurtdinov, R.N., Artamonova, I.I., Mironov, A.A. and Gelfand, M.S. (2003) Low conservation of alternative splicing patterns in the human and mouse genomes. Hum. Mol. Genet., 12, 1313–1320.[Abstract/Free Full Text]

  78. Hazkani-Covo, E., Wool, D. and Graur, D. (2005) In search of the vertebrate phylotypic stage: a molecular examination of the developmental hourglass model and von Baer's third law. J. Exp. Zoolog. B. Mol. Dev. Evol., 304, 150–158.[Medline]

  79. Castillo-Davis, C.I., Kondrashov, F.A. and Hartl, D.L., Kulathinal, R.J. (2004) The functional genomic distribution of protein divergence in two animal phyla: coevolution, genomic conflict, and constraint. Genome Res., 14, 802–811.[Abstract/Free Full Text]

  80. Castillo-Davis, C.I., Hartl, D.L. and Achaz, G. (2004) cis-Regulatory and protein evolution in orthologous and duplicate genes. Genome Res., 14, 1530–1536.

  81. Zhang, L. and Li, W.H. (2004) Mammalian housekeeping genes evolve more slowly than tissue-specific genes. Mol. Biol. Evol., 21, 236–239.

  82. Duret, L. and Mouchiroud, D. (2000) Determinants of substitution rates in mammalian genes: expression pattern affects selection intensity but not mutation rate. Mol. Biol. Evol., 17, 68–74.

  83. Subramanian, S. and Kumar, S. (2004) Gene expression intensity shapes evolutionary rates of the proteins encoded by the vertebrate genome. Genetics, 168, 373–381.

  84. Winter, E.E., Goodstadt, L. and Ponting, C.P. (2004) Elevated rates of protein secretion, evolution, and disease among tissue-specific genes. Genome Res., 14, 54–61.

  85. Lopez-Bigas, N. and Ouzounis, C.A. (2004) Genome-wide identification of genes likely to be involved in human genetic disease. Nucleic Acids Res., 32, 3108–3114.

  86. Su, A.I., Wiltshire, T., Batalov, S., Lapp, H., Ching, K.A., Block, D., Zhang, J., Soden, R., Hayakawa, M., Kreiman, G. et al. (2004) A gene atlas of the mouse and human protein-encoding transcriptomes. Proc. Natl Acad. Sci. USA, 101, 6062–6067.

  87. Su, A.I., Cooke, M.P., Ching, K.A., Hakak, Y., Walker, J.R., Wiltshire, T., Orth, A.P., Vega, R.G., Sapinoso, L.M., Moqrich, A. et al. (2002) Large-scale analysis of the human and mouse transcriptomes. Proc. Natl Acad. Sci. USA, 99, 4465–4470.

