Human Molecular Genetics Advance Access originally published online on July 21, 2005
Human Molecular Genetics 2005 14(17):2533-2546; doi:10.1093/hmg/ddi257
Analysis of intronic conserved elements indicates that functional complexity might represent a major source of negative selection on non-coding sequences
1Scientific Institute IRCCS E. Medea, Via Don Luigi Monza 20, 23842 Bosisio Parini (LC), Italy and 2Centro Dino Ferrari, Dipartimento di Scienze Neurologiche, Università di Milano, IRCCS Fondazione Ospedale Maggiore Policlinico, Mangiagalli e Regina Elena, Via Francesco Sforza 35, 20102 Milan, Italy
* To whom correspondence should be addressed. Tel: +39 31877915; Fax: +39 31877499; Email: upozzoli{at}bp.lnf.it
Received May 20, 2005; Accepted July 14, 2005
ABSTRACT
The non-coding portion of the human genome is punctuated by a large number of multispecies conserved sequence (MCS) elements with largely unknown function. We demonstrate that MCSs are unevenly distributed in human introns, with the majority of relatively short introns (<9 kb long) displaying no or only a few MCSs and with MCS density reaching up to 10% of total size in longer introns. After correction for intron length, MCSs were found to be enriched within genes involved in development and transcription and depleted in immune response loci. Moreover, many central nervous system tissues show a preferential expression of MCS-rich genes, and MCS enrichment significantly correlates with gene functional complexity in terms of distinct protein domains. Analysis of human–mouse orthologous pairs indicated a significant association between intronic MCS density and conservation of protein sequence, promoter regions and untranslated sequences. Moreover, MCS density correlates with the predicted occurrence of human–mouse conserved alternative splicing events. These observations suggest that evolution acts on human genes as integrated units of coding and regulatory capacity and that functional complexity might represent a major source of negative selection on non-coding sequences. To substantiate our results, we also searched for previously experimentally identified intronic regulatory elements and show that about half of these sequences map to an MCS. In particular, our data support the notion that mutations in MCSs can result in human genetic diseases: three previously identified intronic pathological variations were found to occur within MCSs, and human disease and cancer genes were found to be significantly enriched in MCSs.
INTRODUCTION
The non-coding portion of human genome amounts to
99% (1
Despite extensive efforts, the functional significance of these sequence elements is largely unknown. Recent studies indicated that at least a portion of MCSs might act as transcriptional regulators of nearby genes (5–10), with some of them also being able to elicit reporter gene expression (6,7,10). In addition, distinct gene function categories have been associated (9) with gene deserts displaying high MCS densities, suggesting a role for non-random MCS distribution. In other instances, MCSs have been shown to function in alternative splicing regulation (11). Conversely, other authors (12) have suggested that only a few MCSs might act as cis-acting gene regulators, because analysis of chromosome 21 indicated that MCS divergence and substitution pattern are independent of intergenic size or distance from nearby genes. MCSs located within intronic regions have attracted less attention than intergenic MCSs. Still, introns cover about one-quarter of the human genome (1), and the concept of these sequences as mere junk has been challenged by increasing evidence suggesting their functional role in gene regulation and genome architecture.
The recent availability of multiple genomic sequences and the development of comparative algorithms allow a genome-wide identification of MCSs; at the same time, a great wealth of gene annotation databases can be exploited for mining significant associations.
By integrating these resources, we demonstrate that MCSs are unevenly distributed in human introns, depending on intron size, gene function, expression pattern and presence of alternative splicing events. Moreover, we analyze previously identified intronic regulatory elements and show that about half of these sequences map to an MCS; in particular, our data support the notion that mutations in MCSs can result in human genetic diseases.
RESULTS
MCS identification and distribution analysis
An intron database was created as described in Materials and Methods: a total of 81 549 human introns were analyzed (55 553 mouse introns) amounting to 456 genomic megabases. MCSs were identified using phastCons predictions (13
To analyze MCS distribution, introns were divided into length classes and MCS densities (conserved sequence length/intron length) were calculated. Data are reported in Figure 1 and indicate that although variability in MCS density is observed for all intron length classes, a gradual shift toward higher MCS densities is observed with intron size increase. Extremely similar results were obtained when the same calculations were performed for mouse introns (Supplementary Material, Fig. S1).
Mapping MCSs onto experimentally verified regulatory regions
To gain insight into the biological significance of retrieved MCSs, we applied a direct, although not comprehensive, approach. We investigated whether previously experimentally identified intronic regulatory sequences matched with MCS locations. We, therefore, inspected the literature in search of experimentally verified human intronic sequences of established physiological significance and verified whether they partially or totally overlapped any phastCons element. A summary of this search is reported in Table 1 and indicates that out of 19 and 25 intronic splicing or transcription regulatory elements, 8 and 12, respectively, mapped to a phastCons sequence. One element constituted a conserved sequence with a role in adenosine deaminase-dependent RNA editing and mapped to an MCS. In seven instances, at least one mutation has been described, which affects one regulatory element and causes (or predisposes to) a human disease; three such mutations occur within phastCons functional elements.
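The overlap check described above (an element counts as mapping to an MCS when it partially or totally overlaps a phastCons element) can be sketched as follows. This is a minimal illustration and not the authors' code: the function names and the coordinates are hypothetical, and intervals are assumed to be 0-based and half-open, as in UCSC annotation tables.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on one chromosome


def overlaps(a: Interval, b: Interval) -> bool:
    """True when two half-open intervals share at least one base."""
    return a[0] < b[1] and b[0] < a[1]


def elements_hitting_mcs(elements: List[Interval], mcs: List[Interval]) -> List[bool]:
    """For each regulatory element, report whether it partially or
    totally overlaps any conserved element."""
    return [any(overlaps(e, m) for m in mcs) for e in elements]


# Hypothetical coordinates, for illustration only.
regulatory = [(100, 150), (400, 420), (900, 950)]
phastcons = [(140, 200), (410, 415)]
hits = elements_hitting_mcs(regulatory, phastcons)
print(hits)  # [True, True, False]
```

A linear scan is adequate for a few dozen literature-derived elements; a genome-wide comparison would normally sort the intervals first or use an interval tree.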
MCSs are differentially represented depending on gene ontology
We next wished to identify those genes that are extremely rich or depleted in MCSs; because, as reported earlier, MCS density increases with intron size, we used MCS density calculations in intron length classes to sort out genes with higher or lower than expected MCS densities. For each gene, the expected MCS density was calculated on the basis of its individual intron lengths and then compared to the observed MCS density; in particular, for each intron, the expected MCS density (MCSexp) is considered to be equal to the average density of the intron length class it belongs to. The normalized difference (MCSndev) between observed and expected densities for each gene is then defined as follows:
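$$
\mathrm{MCS_{ndev}} = \frac{\mathrm{MCS_{obs}} - \mathrm{MCS_{exp}}}{\mathrm{MCS_{obs}} + \mathrm{MCS_{exp}}}
$$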
To investigate whether any preferential association existed between functional categories and MCS representation, human genes were classified into two groups depending on MCS density: genes displaying three times more or less MCS than expected (MCSndev>0.5 or MCSndev<−0.5) were classified as MCS rich (656 genes) or poor (2634 genes), respectively. We next used GeneMerge (60) to retrieve significant associations; database annotations for the three categories designated by the Gene Ontology (GO) Consortium (molecular function, biological process and cellular component) were employed. Correction for multiple tests was applied to all statistical analyses. Significant associations with one or more ontology terms were identified for the three categories (Table 2; Supplementary Material, Table S1). As the whole hierarchy of ontology terms was used to identify significant associations, nested ontology categories were retrieved, which are often accounted for by the same gene sets (Supplementary Material, Table S1).
In the MCS-rich group, genes involved in development/morphogenesis are significantly over-represented (biological process category); the same holds true for genes having a role in transcription and transcription regulation (both biological process and molecular function categories), as well as in nucleic acid metabolism. Coherently, the cellular component categorization identified gene products that localize to the nucleus as over-represented in this gene set. In addition, genes coding for ephrin receptors were significantly represented, upon classification, in molecular function categories. With respect to MCS-poor genes, over-represented biological process GO terms relate to a broad category that can be roughly described as response to stimulus/defense response/immunity. In particular, the majority of identified molecular function categories are related to protease inhibitors. Moreover, genes coding for structural ribosome components and molecules involved in oxidoreductase activity were enriched in this gene set. Closer examination of single genes indicated that the majority of them encode mitochondrial ribosome constituents or electron transporters, a finding which is consistent with retrieved terms (ribosome and mitochondrion) in the cellular component categorization.
These same analyses gave similar results when mouse genes were analyzed (Supplementary Material, Table S2); the main difference being accounted for by the over-representation of genes involved in protein catabolism and proteolysis among MCS-poor genes.
MCS density correlates with structural complexity
We wished to verify whether higher gene product complexity in terms of different protein domains correlated with MCS density. We, therefore, analyzed the number of distinct non-redundant InterPro domains associated with each entry in our data set that displayed at least one InterPro description. A significant trend was observed for gene products associated with a higher number of unique InterPro domains to display higher MCSndev (Kruskal–Wallis P<10⁻⁵; Supplementary Material, Fig. S2A). This result is not biased by longer gene size or higher exon number in MCS-rich genes, because the latter are, on average, shorter and display a lower exon number compared with MCS-poor genes and with all other genes in the data set (Supplementary Material, Fig. S2B and C).
To verify whether any specific InterPro domain was enriched among genes displaying high or low MCSndev, we used GeneMerge to retrieve significant associations with InterPro domains. As summarized in Table 3, for both gene groups, significantly over-represented InterPro entries were identified (again Bonferroni correction was applied). In agreement with results from GO associations, genes displaying homeobox domains and helix–turn–helix motifs (involved in development and transcription, respectively) are over-represented among MCS-dense genes. The same holds true for genes coding for ephrin family members.
As far as MCS-poor genes are concerned, the association pattern is more complicated, although it parallels, to some extent, GO term association results; in addition to already described associations (those related to defense response and oxidoreductase activity), genes coding for tumor antigens such as MAGE and GAGE show a significant association. Moreover, an association was found with two InterPro domains characteristic of transcription factors, namely, Zn-finger and KRAB (Kruppel-associated box). Closer examination revealed that these two associations are accounted for by largely overlapping gene sets (Supplementary Material, Table S1), which map to paralogous gene clusters on chromosomes 19, 12 and X. These findings are in line with previous reports indicating that KRAB–ZNF proteins are accounted for by hundreds of family members organized in clusters and poorly conserved among vertebrates (61
MCSs are differentially represented depending on gene expression pattern
To verify whether any correlation existed between intronic MCS density and expression level or pattern, we compared mean expression level (averaged over all tissues analyzed) and expression breadth (number of tissues in which a given gene is expressed) between MCS-rich and -poor genes: no significant difference was found. The same results were obtained when mouse genes were analyzed.
We then investigated whether variations in MCS density existed, depending on gene expression pattern; for any tissue, we calculated the differences between median expression levels of MCS-rich versus MCS-poor genes. Results for human genes are represented in Figure 2A and indicate that in most tissues, within the nervous system, MCS-rich genes are expressed at significantly higher levels (rank-sum test P<0.01 or <0.05, black and gray bars, respectively) when compared with MCS-poor genes, hypothalamus being the only exception among brain regions. The same holds true for genes expressed in skin, prostate, heart, uterus and lung. Conversely, all tested bone marrow cells and most peripheral blood cells behave in the opposite manner, with MCS-rich genes displaying significantly lower expression levels.
We next wished to verify whether differences existed in MCS density among tissue-specific genes. To this aim, we considered those genes expressed in a given tissue and in less than one-quarter of the total number of tissues, as previously suggested (63
We, therefore, calculated the median MCSndev for genes expressed in each tissue and the median MCSndev for all tissue-specific genes analyzed. Differences are plotted as histograms in Figure 2B and statistical significance was assessed by applying the rank-sum test. Again, a significant preferential expression of MCS-rich genes was noticed in many nervous system tissues, whereas the opposite situation was observed in fetal liver as well as in most bone marrow and peripheral blood cells in addition to lymph node and tonsil, possibly in line with the participation of the latter tissues in immune functions.
These same analyses were performed for mouse genes (Supplementary Material, Fig. S3) and similar results were obtained. Yet, in the case of mouse, data concerning embryonal gene expression were available and indicated that although no significant deviation in MCSndev was observed for genes expressed in the fertilized egg, blastocyst and for very early developmental stages (6.5–8.5 d.p.c.), genes expressed at stages 9.5 and 10.5 d.p.c. displayed significantly higher MCSndev.
MCS density correlates with conservation of protein sequence, upstream gene regions and alternative splicing in human–mouse orthologous pairs
To evaluate whether MCS density correlated with conservation in coding regions, for any gene in our database, we searched for a murine ortholog that could be unequivocally identified (Materials and Methods): a total of 3582 human–mouse orthologous gene pairs were retrieved. MCSndev was then correlated with either the rate of non-synonymous substitution (dN) or the ratio of non-synonymous/synonymous substitutions (dN/dS); in both cases, a significant negative correlation was detected (r=−0.2715, P<10⁻⁶⁰ and r=−0.20, P<10⁻³⁴, respectively).
We next wished to verify whether any relationship existed between MCS density and gene upstream sequence conservation. To this aim, we exploited data deriving from a previously reported analysis of human–mouse orthologous gene conservation in 8 kb genomic regions upstream of coding start sites (64). Out of 3055 previously studied genes, 1875 were also present in our study set. Comparison of MCSndev with the number of conserved sequence blocks in upstream gene sequences resulted in a significant correlation (r=0.33, P<10⁻⁶).
Finally, we wished to investigate whether regulation of alternative splicing events might have a role in MCS fixation. We, therefore, took advantage of a previously reported (65) set of orthologous exons which are predicted to undergo human–mouse conserved alternative splicing events.
Initially, only genes that were also present in our human data set were selected (660 out of 1580 previously reported). Comparison of median MCSndev (Fig. 3A) indicated that genes that display at least one conserved alternative splicing event (median MCSndev=0.0051) have significantly higher MCSndev when compared with all genes in our database (median MCSndev=−0.247; rank-sum test, P<10⁻⁶); remarkably, MCSndev progressively increases (Fig. 3B) when genes that display one, two or more than two alternative events are considered (Kruskal–Wallis P<10⁻⁶). To better analyze the possible relationship between MCSs and alternative splicing, we selected all introns flanking predicted alternatively spliced exons (whether or not their genes were present in our data set) and all other introns from the same gene. We next searched for MCSs and then compared MCS density in introns located 5' or 3' of a predicted conserved alternatively spliced exon with that of all other introns extracted from the same genes. Data are reported in Figure 3C and indicate that introns flanking a conserved alternative exon are significantly enriched in MCSs (median MCS densities=0.015, 0.0398 and 0.029 for all introns, introns located in 5' and introns located in 3', respectively; rank-sum test P<10⁻³).
MCS density correlates with untranslated region conservation and length
We next retrieved, for each human entry in our database, information concerning 5' and 3' untranslated regions (UTRs). In particular, length and MCS density were calculated (Materials and Methods) for 5' and 3' UTRs; a significant correlation was observed between MCSndev and both UTR length (r=0.14 for both 5' and 3' UTRs; P<10⁻⁶) and MCS density (r=0.38 and 0.42 for 5' and 3' UTRs, respectively; P<10⁻⁶).
MCSs are over-represented in disease and cancer-related genes
To evaluate whether human genes involved in pathological processes displayed any difference in MCS density, we derived disease and cancer genes from the OMIM morbidmap and the Tumor Gene Database, respectively, and matched those that were also represented in our database: 933 disease and 152 tumor genes were obtained. For both disease and cancer genes, median MCSndev was significantly higher (rank-sum test) when compared with the median of the whole gene set (MCSndev disease=0.216, MCSndev cancer=0.109, MCSndev all=−0.247, P<0.01 and <10⁻¹⁰, respectively). To evaluate whether over-representation of genes involved in transcription or development might be responsible for higher MCS density in the cancer and disease gene sets, we purged genes associated with these GO terms. In particular, 92 (60.5%) and 710 (74.8%) cancer and disease genes, respectively, were not associated with either development (GO: 0007275) or transcription (GO: 0006350). When the purged sets were analyzed, the median MCSndev of cancer genes (0.070) was still significantly higher (rank-sum P<10⁻⁴) than that of the whole set of genes, whereas no difference was noted for disease genes.
DISCUSSION
In recent years, increasing evidence has suggested that intervening sequences have contributed to eukaryote evolution (reviewed in 66). Intron presence allows massive proteome expansion through alternative splicing events (67
In line with previous suggestions, intronic MCSs can be expected to represent diverse functional categories, namely, chromatin structural elements (76), inter-chromosomal interactors (12), transcriptional regulators (5–10) or splicing modulatory elements (11). In particular, this latter possibility has scarcely been considered, except for a previous report (11) that only analyzed conserved sequences immediately flanking splice sites. Table 1 indicates that experimentally identified intronic splicing regulators map, in many instances, to MCSs, suggesting that the need to control splicing processes contributes to the fixation of intronic conserved sequence elements. This is in line with the observation (73) that MCS distribution is not uniform across human intron sequences but shows an increase in regions flanking the splice sites. Still, alternative splicing events were reported to be poorly conserved between human and mouse (77), an observation that casts doubt on the need to preserve MCSs to regulate largely divergent processes. Remarkably, it was recently demonstrated (65) that a significantly higher human–rodent conservation of alternative splicing events is observed for genes involved in transcription regulation and development, as well as in central nervous system-specific genes. Indeed, our data indicate that genes displaying at least one predicted conserved alternative splicing event have significantly higher MCSndev than those where alternative splicing events conserved in human and mouse have not been reported; moreover, MCSndev progressively increases when genes that display two or more than two alternative events are considered. Consistently, introns flanking conserved alternatively spliced exons display, on average, significantly higher MCS densities than the average of other introns from the same genes.
In addition to splicing and transcription regulators, MCSs probably represent a heterogeneous class of functional elements; it is interesting to notice, in this respect, that the GO terms that are associated with MCS-rich genes closely reflect those that were associated with gene deserts displaying a high density of conserved sequence elements (9), suggesting that intragenic and intergenic constraints might act in the same direction to preserve fine-tuned regulation of genes involved in pivotal processes such as development and transcription regulation. The same conclusion had also been put forward upon distribution analysis of sequence elements conserved between humans and fishes (10). Nonetheless, it should be noticed that, as evidenced by data concerning human and mouse embryonic/fetal gene expression, specific stages might exist during development when preferential expression of MCS-rich genes occurs. In fact, early mouse developmental stages (from fertilized egg to 8.5 d.p.c.) display no preferential expression of MCS-rich genes, which is instead observed for later stages. Mouse stages 9.5–10.5 correspond to the first 5 weeks of human gestation; high-throughput gene expression data are only available for later human developmental stages (beyond 15 weeks for liver and 20 weeks after conception for brain and lung) and show no evidence of increased expression of MCS-rich genes compared with the adult tissue counterparts. Further gene expression studies are required to allow speculation on the role of highly conserved genes in human developmental stages, as well as on the possibility to provide a molecular definition for a phylotypic stage (fitting the hourglass model) for vertebrate development as previously attempted (78).
Functional analysis of MCS-poor genes indicated that the great majority of them are involved in defense response. An accelerated divergence of coding regions had previously been shown for this functional category (79) and interpreted in terms of genetic conflict between host and pathogen. Although in the case of coding sequence divergence, positive selection and coevolution of protein–protein interactions have been invoked, the poor conservation of non-coding elements is probably more easily explained by relaxation of purifying selection pressure.
More generally, our data indicate that conservation of coding and non-coding sequences is highly correlated, suggesting that although different selective pressures might act on either, in many instances the same selection source (or absence of selection, i.e. neutral evolution) might be effective on both. Moreover, the density of conserved sequences in mammalian introns correlates with UTR length and conservation, as well as with conservation in upstream gene regions. This observation suggests that for a given gene, the evolution rates of its non-coding portions are closely coupled and, in turn, they parallel protein sequence evolution. A similar observation has been drawn upon analysis of Caenorhabditis elegans and C. briggsae orthologous genes (80). In analogy to worms, evolution might therefore act on vertebrate genes as integrated units of coding and regulatory capacity and purifying selection might play a relevant role in preserving vital functions throughout. However, although the overall trend indicates a positive relationship between coding and non-coding sequence conservation, gene expression level and breadth do not show any association with intronic MCS density in either human or mouse. Previous studies (81–83) had indicated that highly and broadly expressed genes displayed significantly lower coding sequence divergence, and this was interpreted in terms of negative selection being more effective on housekeeping genes, especially in species with small population sizes such as humans. Instead, our data suggest that intronic MCS enrichment might correlate with gene functional complexity in terms of distinct protein domains and conserved alternative splicing events, strongly supporting the role of MCSs as cis-acting regulators of complex genes. In addition, MCS-rich genes are over-represented among central nervous system-specific genes, suggesting that MCSs might operate in assuring complex and fine-tuned regulatory events. In analogy, recent reports have indicated that tissue biology also plays a role in protein evolution and brain-specific genes have been shown to display relatively lower protein divergence (82,84). It is therefore tempting to speculate that although expression level and breadth might render purifying selection more effective at the coding sequence level, functional complexity might represent a source of negative selection on non-coding sequences and, to some extent, on proteins.
Recent reports have also indicated that broadly expressed genes are poorly represented among human disease genes, probably reflecting a high frequency of embryonic lethality for mutations in housekeeping genes. Our data indicate that disease and cancer genes are, on average, enriched in intronic MCSs. This observation is in agreement with previous findings indicating that disease proteins exhibit a wider phylogenetic extent and are generally more conserved when compared with all human proteins (85). Nonetheless, our data indicate that the higher MCS density in disease loci is mainly accounted for by genes that are involved in transcription or development, which possibly also account for higher protein conservation. Conversely, when the cancer gene set was purged of genes involved in these same processes, a significant enrichment in MCSs was still observed, suggesting that, whatever the process they are involved in, cancer genes need tight regulation and are therefore probably subjected to strong purifying selection for the maintenance of cis-regulators.
Since their discovery, the role of MCSs in intraspecific phenotypic variability, complex trait expression and human genetic disease has been debated. This issue has been addressed in a recent review (2), whose authors indicated that mutations in only one MCS (in intron 5 of the LMBR1 gene) have so far been identified as responsible for a human genetic disease (preaxial polydactyly). The data we report in Table 1 expand this narrow sample and indicate that at least two other pathological gene mutations map to MCSs, therefore substantiating the potential role of these sequences as mutation targets. The reasons why mutations in non-coding sequences are under-represented as a cause of genetic disease do not need to be reported here. We consider that our data might provide further indications concerning the potential pathogenetic effects of MCS mutations. Moreover, given that <1% of the sequence differences between individuals occur in protein coding regions (1), the impact of regulatory elements and possibly MCSs on phenotypic variability and complex trait predisposition deserves extensive study.
MATERIALS AND METHODS
Human gene/intron database
For creation of the intron database, human genes that had been annotated in the NCBI Reference Sequence (RefSeq) collection were selected (reviewed or validated entries only); for mouse genes, provisional entries were also included. Genomic sequences and intron/exon boundaries were derived from the UCSC genome annotation database (http://genome.ucsc.edu/cgi-bin/hgGateway; hg17 assembly and mm5, May 2004, for human and mouse, respectively). Intronless genes were discarded, and for each gene, the transcript corresponding to the longest genomic sequence and containing the highest number of exons was selected. The data sets comprised 7614 human and 5550 mouse genes. For the identification of human–mouse orthologous pairs, the EnsMart database (http://www.ensembl.org/Multi/martview) was used and only entries representing unique best reciprocal hits were selected.
MCS retrieval and distribution analysis
MCSs were obtained using phastCons predictions (13,14), which are based on a phylogenetic hidden Markov model and are available through the UCSC database (phastConsElements table). Only purely intronic phastCons elements were selected (i.e. MCSs partially overlapping with exons were discarded). To calculate MCS density as a function of intron length, intronic sequences were partitioned into 10 length classes; in particular, introns were ranked according to their size and subsequently clustered so that each size class contained the same absolute number of nucleotides. The following length intervals (in bp) were obtained: 62–2476, 2477–5062, 5063–9147, 9148–15 666, 15 667–26 179, 26 180–41 774, 41 775–66 914, 66 915–110 044, 110 045–190 058, 190 059–1 043 911.
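The partition into length classes holding the same number of nucleotides can be illustrated with a short sketch. It is one plausible reading of the procedure rather than the authors' implementation: the function name `equal_nucleotide_classes` and the toy lengths are hypothetical, and an intron is assigned to a class according to the cumulative length accumulated before it in the size ranking.

```python
def equal_nucleotide_classes(lengths, n_classes=10):
    """Rank introns by size and cut the ranking so that each class
    holds (approximately) the same total number of nucleotides."""
    ranked = sorted(lengths)
    total = sum(ranked)
    classes = [[] for _ in range(n_classes)]
    running = 0  # nucleotides assigned so far
    for length in ranked:
        # Class index from the cumulative position of this intron's first base.
        k = min(n_classes - 1, running * n_classes // total)
        classes[k].append(length)
        running += length
    return classes


# Toy intron lengths (hypothetical), split into 4 classes instead of 10.
toy = [1, 1, 1, 1, 2, 2, 4, 4, 8, 8]
toy_classes = equal_nucleotide_classes(toy, n_classes=4)
class_sums = [sum(c) for c in toy_classes]
class_bounds = [(min(c), max(c)) for c in toy_classes if c]
print(class_sums, class_bounds)
```

Because short introns are far more numerous than long ones, classes built this way contain very different numbers of introns but comparable amounts of sequence, which keeps the per-class density estimates similarly well supported.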
MCS density for intron length class k (dMCSk) has been calculated as total class MCS length over total class intron length:
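$$
d\mathrm{MCS}_k = \frac{\sum_{i \in k} \mathrm{MCS}_i}{\sum_{i \in k} l_i}
$$

where the sums run over all introns i in length class k, MCS_i is the total MCS length in intron i and l_i is the length of intron i.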
![]() |
For each gene, we computed expected MCS densities (dMCSexp) as:
![]() |
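The density calculations of this subsection can be sketched in a few lines. Two points are assumptions made for illustration and are not confirmed by the text: the expected density of a gene is taken as the length-weighted mean of the class densities of its introns, and the normalized deviation is taken as (observed − expected)/(observed + expected), a form chosen because it maps a threefold excess or deficit to +0.5 or −0.5, the thresholds quoted in Results. All names and numbers are hypothetical.

```python
from bisect import bisect_left

# Inclusive upper bounds of the intron length classes
# (hypothetical toy values; the paper uses 10 classes).
UPPER_BOUNDS = [100, 1000]


def class_index(length, upper_bounds=UPPER_BOUNDS):
    """Index of the first class whose inclusive upper bound is >= length."""
    return min(bisect_left(upper_bounds, length), len(upper_bounds) - 1)


def class_densities(introns, upper_bounds=UPPER_BOUNDS):
    """introns: iterable of (intron_length, mcs_length) pairs.
    Returns, per class, total MCS length over total intron length."""
    mcs_total = [0] * len(upper_bounds)
    len_total = [0] * len(upper_bounds)
    for length, mcs in introns:
        k = class_index(length, upper_bounds)
        mcs_total[k] += mcs
        len_total[k] += length
    return [m / l if l else 0.0 for m, l in zip(mcs_total, len_total)]


def gene_ndev(gene_introns, densities, upper_bounds=UPPER_BOUNDS):
    """Normalized deviation between observed and expected MCS density."""
    total_len = sum(length for length, _ in gene_introns)
    observed = sum(mcs for _, mcs in gene_introns) / total_len
    # ASSUMPTION: expected density is the length-weighted mean of class densities.
    expected = sum(
        length * densities[class_index(length, upper_bounds)]
        for length, _ in gene_introns
    ) / total_len
    if observed + expected == 0:
        return 0.0
    # ASSUMPTION: normalization inferred from the 3-fold / 0.5 thresholds.
    return (observed - expected) / (observed + expected)


all_introns = [(100, 0), (100, 10), (1000, 100), (1000, 300)]
densities = class_densities(all_introns)
rich = gene_ndev([(1000, 600)], densities)  # observed is 3x the class density
poor = gene_ndev([(100, 0)], densities)     # intron without any MCS
print(densities, round(rich, 3), poor)
```

Under these assumptions a gene with no intronic MCS always scores −1, and the score is bounded between −1 and +1 whatever the intron lengths.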
Functional element retrieval
Intronic functional element retrieval was performed by inspecting the literature for elements that met the following criteria: experimental evidence for their function, purely intronic location and direct evidence for function in humans (sequences that were experimentally tested in mice and inferred to also work in human because of sequence conservation were not included).
Gene classification
Gene associations with GO terms and their descriptions were obtained by cross-referencing the UCSC hg17 kgXref table with the GO database; InterPro information was retrieved from the UCSC protein database (interProXref table). InterPro associations were purged from redundancy using the entry2entry table from the InterPro database, which reports existing parent/child relationships between domain entries. Association and description files were then created and significant associations between gene groups and GO terms or InterPro domains were identified using GeneMerge (60).
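The kind of over-representation test performed here can be sketched with a hypergeometric tail probability and a Bonferroni correction. This is an illustrative stand-in, not GeneMerge itself: the function names, the gene identifiers and the choice to correct by the number of terms present in the study set are assumptions.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) when n genes are drawn without replacement from N genes,
    K of which carry the annotation term."""
    upper = min(K, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


def enrichment(study, population, annotations):
    """annotations: term -> set of annotated genes.
    Returns {term: (raw_p, bonferroni_p)} for terms seen in the study set."""
    population = set(population)
    study = set(study) & population
    tested = {t: g & population for t, g in annotations.items() if g & study}
    results = {}
    for term, genes in tested.items():
        p = hypergeom_sf(len(genes & study), len(population), len(genes), len(study))
        # ASSUMPTION: Bonferroni factor = number of terms tested.
        results[term] = (p, min(1.0, p * len(tested)))
    return results


# Hypothetical toy data: 20 genes, 4 of them in the study (e.g. MCS-rich) set.
population = [f"g{i}" for i in range(20)]
annotations = {
    "development": {"g0", "g1", "g2", "g3"},
    "immune response": {"g10", "g11", "g12", "g13", "g14", "g15"},
    "other": {"g4", "g5", "g6", "g7", "g8", "g9", "g16", "g17", "g18", "g19"},
}
res = enrichment(["g0", "g1", "g2", "g5"], population, annotations)
print(res)
```

Because the whole GO hierarchy is tested, nested terms share genes and the tests are not independent; Bonferroni correction is conservative in that setting.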
The Tumor Gene Database (http://condor.bcm.tmc.edu/ermb/tgdb/tgdb.html) was used to identify human genes involved in cancer processes. Disease genes were retrieved from OMIM (ftp://ftp.ncbi.nih.gov/repository/OMIM/morbidmap).
Expression, alternative splicing and protein divergence data
Data on expression levels in human and mouse tissues were derived from previous studies (86,87); they are publicly accessible through the UCSC database (tables: gnfHumanAtlas2median and gnfHumanAtlas2medianExps; gnfMouseAtlas2median and gnfMouseAtlas2medianExps) and are based on high-density oligonucleotide arrays (GNF Gene Expression Atlas 2). We only considered probes corresponding to genes that had been included in our database; signals from duplicated probes on the same chip were averaged, as were replicates from the same tissue. A gene was considered to be expressed in a given tissue if its signal level was greater than or equal to 200 arbitrary units (87).
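The averaging and thresholding described above can be sketched as follows. The record layout `(gene, tissue, signal)` and the function name are hypothetical, and pooling duplicated probes and tissue replicates into a single mean is an assumption; the text does not state whether the two averaging steps were nested.

```python
from collections import defaultdict
from statistics import mean

# Threshold taken from the text: 200 arbitrary units.
EXPRESSION_THRESHOLD = 200.0


def expressed_tissues(records, threshold=EXPRESSION_THRESHOLD):
    """Map each gene to the set of tissues in which it is called expressed.

    `records` is an iterable of (gene, tissue, signal) tuples, one per
    probe or replicate measurement (hypothetical layout).
    """
    signals = defaultdict(list)
    for gene, tissue, signal in records:
        signals[(gene, tissue)].append(signal)

    expressed = defaultdict(set)
    for (gene, tissue), values in signals.items():
        # Average all measurements for this gene/tissue pair, then threshold.
        if mean(values) >= threshold:
            expressed[gene].add(tissue)
    return dict(expressed)


# Hypothetical measurements for two genes.
example = [
    ("geneA", "brain", 150.0),
    ("geneA", "brain", 250.0),
    ("geneA", "liver", 100.0),
    ("geneB", "liver", 199.9),
]
calls = expressed_tissues(example)
```

Genes with no tissue at or above the threshold are simply absent from the returned dictionary.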
For the analysis of conserved alternative splicing events, a previously reported (65) list of predicted conserved alternatively spliced exons was used; in particular, to obtain genes that were also present in our initial database, Ensembl gene entries provided by the authors were cross-mapped to RefSeq entries and, if represented in our database, assigned an MCSndev value. For the analysis of single introns involved in alternative splicing events, all reported genes were used irrespective of their presence in our data set; all Ensembl transcripts corresponding to a described alternatively spliced gene were extracted from the UCSC database (ensGene and ensGtp tables) and checked for the presence of an exon corresponding in sequence, length and position to the alternatively spliced one. All intron sequences of each gene were then retrieved, and introns located immediately 5' or 3' of an alternatively spliced exon were recorded. To purge MCSs that might derive from alternative splicing events such as intron retention or inclusion of overlapping longer exons, all MCSs that mapped to mRNA or EST entries in the UCSC database (tables all_mrna and all_est) were discarded from all introns analyzed.
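The final purging step amounts to an interval-overlap filter. The sketch below assumes that all intervals lie on the same chromosome, that coordinates are 0-based and half-open (as in UCSC tables), and that any overlap with an aligned mRNA or EST block is enough to discard an MCS; the text does not specify the overlap criterion, and the function names are hypothetical.

```python
def overlaps(a, b):
    """True if half-open intervals a=(start, end) and b=(start, end) share a base."""
    return a[0] < b[1] and b[0] < a[1]


def purge_transcribed_mcs(mcs_intervals, transcript_blocks):
    """Keep only MCSs that overlap none of the aligned mRNA/EST blocks."""
    return [
        mcs
        for mcs in mcs_intervals
        if not any(overlaps(mcs, block) for block in transcript_blocks)
    ]


# Hypothetical coordinates within one intron.
mcs = [(100, 150), (200, 260), (300, 320)]
est_blocks = [(140, 160), (260, 280)]
kept = purge_transcribed_mcs(mcs, est_blocks)
```

The pairwise comparison is quadratic; for genome-scale tables a sorted sweep or an interval tree would be the usual choice, but the filtering logic stays the same.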
Information concerning protein divergence (dN and dS) was obtained from the EnsMart database (http://www.ensembl.org/Multi/martview).
UTR and upstream sequence information retrieval
For each transcript entry in our database, data concerning transcript start and end as well as coding sequence (CDS) boundaries were retrieved from the UCSC annotation tables; the difference between these positions was used to obtain UTR length after removal of introns. For MCS density calculations, we sought to eliminate all MCSs that might correspond to spliced coding exons in an alternative transcript. To this end, all transcripts totally or partially overlapping with those constituting our database were extracted from the following annotation tables: ensGene, refGene and knownGene (the latter combines all known protein-coding genes on the basis of protein data from SWISS-PROT, TrEMBL and TrEMBL-NEW and their corresponding mRNAs from GenBank). Any MCS that mapped to at least one coding exon in at least one transcript was eliminated.
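As an illustration of the UTR length calculation, the sketch below sums the exonic bases lying outside the CDS on either side. It assumes the exon start/end lists and CDS boundaries follow the 0-based, half-open layout of UCSC gene-prediction tables; the function name and return convention are hypothetical, and non-coding transcripts are not handled.

```python
def utr_lengths(exon_starts, exon_ends, cds_start, cds_end, strand):
    """Return (five_prime_utr_length, three_prime_utr_length) in bases.

    Only exonic bases are counted, so introns are excluded by construction.
    Coordinates are genomic, 0-based and half-open.
    """
    left = 0   # exonic bases upstream of cds_start in genomic orientation
    right = 0  # exonic bases downstream of cds_end in genomic orientation
    for start, end in zip(exon_starts, exon_ends):
        left += max(0, min(end, cds_start) - start)
        right += max(0, end - max(start, cds_end))
    # On the minus strand the genomic left side is the 3' end of the transcript.
    return (left, right) if strand == "+" else (right, left)


# Hypothetical three-exon transcript with the CDS starting in exon 2.
starts = [0, 200, 400]
ends = [100, 300, 500]
plus = utr_lengths(starts, ends, cds_start=250, cds_end=450, strand="+")
minus = utr_lengths(starts, ends, cds_start=250, cds_end=450, strand="-")
```

In this example the first exon is entirely untranslated and the second contributes its first 50 bases, so the intron between them does not inflate the UTR length.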
Data concerning upstream sequence conservation have been previously reported (64) and refer to 8 kb genomic regions upstream of CDS start sites.
| SUPPLEMENTARY MATERIAL |
|---|
Supplementary Material is available at HMG Online.
| ACKNOWLEDGEMENTS |
|---|
We thank Dr Roberto Giorda for useful discussion about the manuscript.
Conflict of Interest statement. None declared.
| REFERENCES |
|---|
1. Venter, J.C., Adams, M.D., Myers, E.W., Li, P.W., Mural, R.J., Sutton, G.G., Smith, H.O., Yandell, M., Evans, C.A., Holt, R.A. et al. (2001) The sequence of the human genome. Science, 291, 1304–1351.
2. Dermitzakis, E.T., Reymond, A. and Antonarakis, S.E. (2005) Conserved non-genic sequences – an unexpected feature of mammalian genomes. Nat. Rev. Genet., 6, 51–57.
3. Boffelli, D., Nobrega, M.A. and Rubin, E.M. (2004) Comparative genomics at the vertebrate extremes. Nat. Rev. Genet., 5, 456–465.
4. Bejerano, G., Pheasant, M., Makunin, I., Stephen, S., Kent, W.J., Mattick, J.S. and Haussler, D. (2004) Ultraconserved elements in the human genome. Science, 304, 1321–1325.
5. Boffelli, D., McAuliffe, J., Ovcharenko, D., Lewis, K.D., Ovcharenko, I., Pachter, L. and Rubin, E.M. (2003) Phylogenetic shadowing of primate sequences to find functional regions of the human genome. Science, 299, 1391–1394.
6. Nobrega, M.A., Ovcharenko, I., Afzal, V. and Rubin, E.M. (2003) Scanning human gene deserts for long-range enhancers. Science, 302, 413.
7. Frazer, K.A., Tao, H., Osoegawa, K., de Jong, P.J., Chen, X., Doherty, M.F. and Cox, D.R. (2004) Noncoding sequences conserved in a limited number of mammals in the SIM2 interval are frequently functional. Genome Res., 14, 367–372.
8. Loots, G.G. and Ovcharenko, I. (2004) rVISTA 2.0: evolutionary analysis of transcription factor binding sites. Nucleic Acids Res., 32, W217–W221.
9. Ovcharenko, I., Loots, G.G., Nobrega, M.A., Hardison, R.C., Miller, W. and Stubbs, L. (2005) Evolution and functional classification of vertebrate gene deserts. Genome Res., 15, 137–145.
10. Woolfe, A., Goodson, M., Goode, D.K., Snell, P., McEwen, G.K., Vavouri, T., Smith, S.F., North, P., Callaway, H., Kelly, K. et al. (2005) Highly conserved non-coding sequences are associated with vertebrate development. PLoS Biol., 3, 116–130.
11. Sorek, R. and Ast, G. (2003) Intronic sequences flanking alternatively spliced exons are conserved between human and mouse. Genome Res., 13, 1631–1637.
12. Dermitzakis, E.T., Kirkness, E., Schwarz, S., Birney, E., Reymond, A. and Antonarakis, S.E. (2004) Comparison of human chromosome 21 conserved nongenic sequences (CNGs) with the mouse and dog genomes shows that their selective constraint is independent of their genic environment. Genome Res., 14, 852–859.
13. Siepel, A. and Haussler, D. (2004) Combining phylogenetic and hidden Markov models in biosequence analysis. J. Comput. Biol., 11, 413–428.
14. Siepel, A. and Haussler, D. (2004) Phylogenetic estimation of context-dependent substitution rates by maximum likelihood. Mol. Biol. Evol., 21, 468–488.
15. Deguillien, M., Huang, S.C., Moriniére, M., Dreumont, N., Benz, E. Jr and Baklouti, F. (2001) Multiple cis elements regulate an alternative splicing event at 4.1R pre-mRNA during erythroid differentiation. Blood, 98, 3809–3816.
16. Helledie, T., Grontved, L., Jensen, S.S., Kiilerich, P., Rietveld, L., Albrektsen, T., Boysen, M.S., Nohr, J., Larsen, L.K., Fleckner, J. et al. (2002) The gene encoding the Acyl-CoA-binding protein is activated by peroxisome proliferator-activated receptor gamma through an intronic response element functionally conserved between humans and rodents. J. Biol. Chem., 277, 26821–26830.
17. Surinya, K.H., Cox, T.C. and May, B.K. (1998) Identification and characterization of a conserved erythroid-specific enhancer located in intron 8 of the human 5-aminolevulinate synthase 2 gene. J. Biol. Chem., 273, 16798–16809.
18. Genetta, T., Morisaki, H., Morisaki, T. and Holmes, E. (2001) A novel bipartite intronic splicing enhancer promotes the inclusion of a mini-exon in the AMP deaminase 1 gene. J. Biol. Chem., 276, 25589–25597.
19. Ge, B., Li, O., Wilder, P., Rizzino, A. and McKeithan, T.W. (2003) NF-kappa B regulates BCL3 transcription in T lymphocytes through an intronic enhancer. J. Immunol., 171, 4210–4218.
20. Jo, E.K., Kanegane, H., Nonoyama, S., Tsukada, S., Lee, J.H., Lim, K., Shong, M., Song, C.H., Kim, H.J., Park, J.K. et al. (2001) Characterization of mutations, including a novel regulatory defect in the first intron, in Bruton's tyrosine kinase gene from seven Korean X-linked agammaglobulinemia families. J. Immunol., 167, 4038–4045.
21. Rohrer, J. and Conley, M.E. (1998) Transcriptional regulatory elements within the first intron of Bruton's tyrosine kinase. Blood, 91, 214–221.
22. Lou, H., Yang, Y., Cote, G., Berget, S. and Gagel, R. (1995) An intron enhancer containing a 5′ splice site sequence in the human calcitonin/calcitonin gene-related peptide gene. Mol. Cell. Biol., 15, 7135–7142.
23. Horikawa, Y., Oda, N., Cox, N.J., Li, X., Orho-Melander, M., Hara, M., Hinokio, Y., Lindner, T.H., Mashima, H., Schwarz, P.E. et al. (2000) Genetic variation in the gene encoding calpain-10 is associated with type 2 diabetes mellitus. Nat. Genet., 26, 163–175.
24. Scotet, E., Schroeder, S. and Lanzavecchia, A. (2001) Molecular regulation of CC-chemokine receptor 3 expression in human T helper 2 cells. Blood, 98, 2568–2570.
25. Smith, A.N., Barth, M.L., McDowell, T.L., Moulin, D.S., Nuthall, H.N., Hollingsworth, M.A. and Harris, A. (1996) A regulatory element in intron 1 of the cystic fibrosis transmembrane conductance regulator gene. J. Biol. Chem., 271, 9947–9954.
26. Zuccato, E., Buratti, E., Stuani, C., Baralle, F.E. and Pagani, F. (2004) An intronic polypyrimidine-rich element downstream of the donor site modulates cystic fibrosis transmembrane conductance regulator exon 9 alternative splicing. J. Biol. Chem., 279, 16980–16988.
27. Pagani, F., Buratti, E., Stuani, C., Romano, M., Zuccato, E., Niksic, M., Giglio, L., Faraguna, D. and Baralle, F.E. (2000) Splicing factors induce cystic fibrosis transmembrane regulator exon 9 skipping through a nonevolutionary conserved intronic element. J. Biol. Chem., 275, 21041–21047.
28. Charlet, B.N., Savkur, R., Singh, G., Philips, A., Grice, E. and Cooper, T. (2002) Loss of the muscle-specific chloride channel in type I myotonic dystrophy lead to misregulated alternative splicing. Mol. Cell, 10, 45–53.
29. Ghayor, C., Herrouin, J.F., Chadjichristos, C., Ala-Kokko, L., Takigawa, M., Pujol, J.P. and Galera, P. (2000) Regulation of human COL2A1 gene expression in chondrocytes. Identification of C-Krox-responsive elements and modulation by phenotype alteration. J. Biol. Chem., 275, 27421–27438.
30. Makar, K.W., Ulgiati, D., Hagman, J. and Holers, V.M. (2001) A site in the complement receptor 2 (CR2/CD21) silencer is necessary for lineage specific transcriptional regulation. Int. Immunol., 13, 657–664.
31. Himes, S.R., Tagoh, H., Goonetilleke, N., Sasmono, T., Oceandy, D., Clark, R., Bonifer, C. and Hume, D.A. (2001) A highly conserved c-fms gene intronic element controls macrophage-specific and regulated expression. J. Leukoc. Biol., 70, 812–820.
32. Follows, G.A., Tagoh, H., Lefevre, P., Morgan, G.J. and Bonifer, C. (2003) Differential transcription factor occupancy but evolutionarily conserved chromatin features at the human and mouse M-CSF (CSF-1) receptor loci. Nucleic Acids Res., 31, 5805–5816.
33. Yoon, H., Liyanarachchi, S., Wright, F.A., Davuluri, R., Lockman, J.C., de la Chapelle, A. and Pellegata, N.S. (2002) Gene expression profiling of isogenic cells with different TP53 gene dosage reveals numerous genes that are affected by TP53 dosage and identifies CSPG2 as a direct target of p53. Proc. Natl Acad. Sci. USA, 99, 15632–15637.
34. Klamut, H.J., Bosnoyan-Collins, L.O., Worton, R.G. and Ray, P.N. (1997) A muscle-specific enhancer within intron 1 of the human dystrophin gene is functionally dependent on single MEF-1/E box and MEF-2/AT-rich sequence motifs. Nucleic Acids Res., 25, 1618–1625.
35. Jin, W., Huang, E.S.-C., Bi, W. and Cote, G. (1999) Redundant intronic repressors function to inhibit fibroblast growth factor receptor-1 α-exon recognition in glioblastoma cells. J. Biol. Chem., 274, 28035–28041.
36. Jin, W., Bi, W., Huang, E.S.-C. and Cote, G. (1999) Glioblastoma cell-specific expression of fibroblast growth factor receptor-1β requires an intronic repressor of RNA splicing. Cancer Res., 59, 316–319.
37. del Gatto, F. and Breathnach, R. (1995) Exon and intron sequences, respectively, repress and activate splicing of a fibroblast growth factor receptor 2 alternative exon. Mol. Cell. Biol., 15, 4825–4834.
38. del Gatto, F., Plet, A., Gesnel, M.-C., Fort, C. and Breathnach, R. (1997) Multiple interdependent sequence elements control splicing of a fibroblast growth factor receptor 2 alternative exon. Mol. Cell. Biol., 17, 5106–5116.
39. Cogan, J.D., Prince, M.A., Lekhakula, S., Bundey, S., Futrakul, A., McCarthy, E.M. and Phillips, J.A. (1997) A novel mechanism of aberrant pre-mRNA splicing in humans. Hum. Mol. Genet., 6, 909–912.
40. Yang, J.-H., Sklar, P., Axel, R. and Maniatis, T. (1995) Editing of glutamate receptor subunit B pre-mRNA in vitro by site-specific deamination of adenosine. Nature, 374, 77–81.
41. Guil, S., Gattoni, R., Carrascal, M., Abian, J., Stevenin, J. and Bach-Elias, M. (2003) Roles of hnRNP A1, SR proteins, and p68 helicase in c-H-ras alternative splicing regulation. Mol. Cell. Biol., 23, 2927–2941.
42. Draper, N., Walker, E.A., Bujalska, I.J., Tomlinson, J.W., Chalder, S.M., Arlt, W., Lavery, G.G., Bedendo, O., Ray, D.W., Laing, I. et al. (2003) Mutations in the genes encoding 11beta-hydroxysteroid dehydrogenase type 1 and hexose-6-phosphate dehydrogenase interact to cause cortisone reductase deficiency. Nat. Genet., 34, 434–439.
43. Savkur, R.S., Philips, A.V. and Cooper, T.A. (2001) Aberrant regulation of insulin receptor alternative splicing is associated with insulin resistance in myotonic dystrophy. Nat. Genet., 29, 40–47.
44. Lettice, L.A., Heaney, S.J., Purdie, L.A., Li, L., de Beer, P., Oostra, B.A., Goode, D., Elgar, G., Hill, R.E. and de Graaff, E. (2003) A long-range Shh enhancer regulates expression in the developing limb and fin and is associated with preaxial polydactyly. Hum. Mol. Genet., 12, 1725–1735.
45. D'Souza, I. and Schellenberg, G.D. (2002) tau exon 10 expression involves a bipartite intron 10 regulatory sequence and weak 5' and 3' splice sites. J. Biol. Chem., 277, 26587–26599.
46. Takahashi, K., Nishiyama, C., Hasegawa, M., Akizawa, Y. and Ra, C. (2003) Regulation of the human high affinity IgE receptor beta-chain gene expression via an intronic element. J. Immunol., 171, 2478–2484.
47. Beohar, N. and Kawamoto, S. (1998) Transcriptional regulation of the human nonmuscle myosin II heavy chain-A gene. Identification of three clustered cis-elements in intron-1 which modulate transcription in a cell type- and differentiation state-dependent manner. J. Biol. Chem., 273, 9168–9178.
48. Chung, M.C. and Kawamoto, S. (2004) IRF-2 is involved in up-regulation of nonmuscle myosin heavy chain II-A gene expression during phorbol ester-induced promyelocytic HL-60 differentiation. J. Biol. Chem., 279, 56042–56052.
49. Kawamoto, S. (1996) Neuron-specific alternative splicing of nonmuscle myosin II heavy chain-B pre-mRNA requires a cis-acting intron sequence. J. Biol. Chem., 271, 17613–17616.
50. Prokunina, L., Castillejo-Lopez, C., Oberg, F., Gunnarsson, I., Berg, L., Magnusson, V., Brookes, A.J., Tentler, D., Kristjansdottir, H., Grondal, G., Bolstad, A.I., Svenungsson, E. et al. (2002) A regulatory polymorphism in PDCD1 is associated with susceptibility to systemic lupus erythematosus in humans. Nat. Genet., 32, 666–669.
51. Schjerven, H., Brandtzaeg, P. and Johansen, F.E. (2003) Hepatocyte NF-1 and STAT6 cooperate with additional DNA-binding factors to activate transcription of the human polymeric Ig receptor gene in response to IL-4. J. Immunol., 170, 6048–6056.
52. Hobson, G.M., Huang, Z., Sperle, K., Stabley, D.L., Marks, H.G. and Cambi, F. (2002) A PLP splicing abnormality is associated with an unusual presentation of PMD. Ann. Neurol., 52, 477–488.
53. Shamsher, M.K., Chuzhanova, N.A., Friedman, B., Scopes, D.A., Alhaq, A., Millar, D.S., Cooper, D.N. and Berg, L.P. (2000) Identification of an intronic regulatory element in the human protein C (PROC) gene. Hum. Genet., 107, 458–465.
54. Palii, S.S., Chen, H. and Kilberg, M.S. (2004) Transcriptional control of the human sodium-coupled neutral amino acid transporter system A gene by amino acid availability is mediated by an intronic element. J. Biol. Chem., 279, 3463–3471.
55. Miyajima, H., Miyaso, H., Okumura, M., Kurisu, J. and Imaizumi, K. (2002) Identification of a cis-acting element for the regulation of SMN exon 7 splicing. J. Biol. Chem., 277, 23271–23277.
56. Wong, L.H., Sim, H., Chatterjee-Kishore, M., Hatzinisiriou, I., Devenish, R.J., Stark, G. and Ralph, S.J. (2002) Isolation and characterization of a human STAT1 gene regulatory element. Inducibility by interferon (IFN) types I and II and role of IFN regulatory factor-1. J. Biol. Chem., 277, 19408–19417.
57. Lietz, M., Hohl, M. and Thiel, G. (2003) RE-1 silencing transcription factor (REST) regulates human synaptophysin gene transcription through an intronic sequence-specific DNA-binding site. Eur. J. Biochem., 270, 2–9.
58. Polakowska, R.R., Graf, B.A., Falciano, V. and LaCelle, P. (1999) Transcription regulatory elements of the first intron control human transglutaminase type I gene expression in epidermal keratinocytes. J. Cell Biochem., 73, 355–369.
59. Galvagni, F. and Oliviero, S. (2000) Utrophin transcription is activated by an intronic enhancer. J. Biol. Chem., 275, 3168–3172.
60. Castillo-Davis, C.I. and Hartl, D.L. (2003) GeneMerge – post-genomic analysis, data mining, and hypothesis testing. Bioinformatics, 19, 891–892.
61. Shannon, M., Hamilton, A.T., Gordon, L., Branscomb, E. and Stubbs, L. (2003) Differential expansion of zinc-finger transcription factor loci in homologous human and mouse gene clusters. Genome Res., 13, 1097–1110.
62. Aparicio, S., Chapman, J., Stupka, E., Putnam, N., Chia, J.M., Dehal, P., Christoffels, A., Rash, S., Hoon, S., Smit, A., Gelpke, M.D. et al. (2002) Whole-genome shotgun assembly and analysis of the genome of Fugu rubripes. Science, 297, 1301–1310.
63. Vinogradov, A.E. (2003) Isochores and tissue-specificity. Nucleic Acids Res., 31, 5212–5220.
64. Iwama, H. and Gojobori, T. (2004) Highly conserved upstream sequences for transcription factor genes and implications for the regulatory network. Proc. Natl Acad. Sci. USA, 101, 17156–17161.
65. Yeo, G.W., van Nostrand, E., Holste, D., Poggio, T. and Burge, C.B. (2005) Identification and analysis of alternative splicing events conserved in human and mouse. Proc. Natl Acad. Sci. USA, 102, 2850–2855.
66. Mattick, J.S. (2001) Non-coding RNAs: the architects of eukaryotic complexity. EMBO Rep., 2, 986–991.
67. Maniatis, T. and Tasic, B. (2002) Alternative pre-mRNA splicing and proteome expansion in metazoans. Nature, 418, 236–243.
68. Luo, M.J. and Reed, R. (1999) Splicing is required for rapid and efficient mRNA export in metazoans. Proc. Natl Acad. Sci. USA, 96, 14937–14942.
69. Maquat, L.E. (1995) When cells stop making sense: effects of nonsense codons on RNA metabolism in vertebrate cells. RNA, 1, 453–465.
70. Le Hir, H., Izaurralde, E., Maquat, L.E. and Moore, M.J. (2000) The spliceosome deposits multiple proteins 20–24 nucleotides upstream of mRNA exon–exon junctions. EMBO J., 19, 6860–6869.
71. Castillo-Davis, C.I., Mekhedov, S.L., Hartl, D.L., Koonin, E.V. and Kondrashov, F.A. (2002) Selection for short introns in highly expressed genes. Nat. Genet., 31, 415–418.
72. Vinogradov, A.E. (2004) Compactness of human housekeeping genes: selection for economy or genomic design? Trends Genet., 20, 248–253.
73. Sironi, M., Menozzi, G., Comi, G.P., Bresolin, N., Cagliani, R. and Pozzoli, U. (2005) Fixation of conserved sequences shapes human intron size and influences transposon insertion dynamics. Trends Genet., (2005) J.15 [Epub ahead of print], doi: 10.1016/j.tig.2005.06.009.
74. Cecconi, F., Crosio, C., Mariottini, P., Cesareni, G., Giorgi, M., Brenner, S. and Amaldi, F. (1996) A functional role for some Fugu introns larger than the typical short ones: the example of the gene coding for ribosomal protein S7 and snoRNA U17. Nucleic Acids Res., 24, 3167–3172.
75. Pozzoli, U., Elgar, G., Cagliani, R., Riva, L., Comi, G.P., Bresolin, N., Bardoni, A. and Sironi, M. (2003) Comparative analysis of vertebrate dystrophin loci indicate intron gigantism as a common feature. Genome Res., 13, 764–772.
76. Glazko, G.V., Koonin, E.V., Rogozin, I.B. and Shabalina, S.A. (2003) A significant fraction of conserved noncoding DNA in human and mouse consists of predicted matrix attachment regions. Trends Genet., 19, 119–124.
77. Nurtdinov, R.N., Artamonova, I.I., Mironov, A.A. and Gelfand, M.S. (2003) Low conservation of alternative splicing patterns in the human and mouse genomes. Hum. Mol. Genet., 12, 1313–1320.
78. Hazkani-Covo, E., Wool, D. and Graur, D. (2005) In search of the vertebrate phylotypic stage: a molecular examination of the developmental hourglass model and von Baer's third law. J. Exp. Zoolog. B. Mol. Dev. Evol., 304, 150–158.
79. Castillo-Davis, C.I., Kondrashov, F.A., Hartl, D.L. and Kulathinal, R.J. (2004) The functional genomic distribution of protein divergence in two animal phyla: coevolution, genomic conflict, and constraint. Genome Res., 14, 802–811.
80. Castillo-Davis, C.I., Hartl, D.L. and Achaz, G. (2004) cis-Regulatory and protein evolution in orthologous and duplicate genes. Genome Res., 14, 1530–1536.
81. Zhang, L. and Li, W.H. (2004) Mammalian housekeeping genes evolve more slowly than tissue-specific genes. Mol. Biol. Evol., 21, 236–239.
82. Duret, L. and Mouchiroud, D. (2000) Determinants of substitution rates in mammalian genes: expression pattern affects selection intensity but not mutation rate. Mol. Biol. Evol., 17, 68–74.
83. Subramanian, S. and Kumar, S. (2004) Gene expression intensity shapes evolutionary rates of the proteins encoded by the vertebrate genome. Genetics, 168, 373–381.
84. Winter, E.E., Goodstadt, L. and Ponting, C.P. (2004) Elevated rates of protein secretion, evolution, and disease among tissue-specific genes. Genome Res., 14, 54–61.
85. Lopez-Bigas, N. and Ouzounis, C.A. (2004) Genome-wide identification of genes likely to be involved in human genetic disease. Nucleic Acids Res., 32, 3108–3114.
86. Su, A.I., Wiltshire, T., Batalov, S., Lapp, H., Ching, K.A., Block, D., Zhang, J., Soden, R., Hayakawa, M., Kreiman, G. et al. (2004) A gene atlas of the mouse and human protein-encoding transcriptomes. Proc. Natl Acad. Sci. USA, 101, 6062–6067.
87. Su, A.I., Cooke, M.P., Ching, K.A., Hakak, Y., Walker, J.R., Wiltshire, T., Orth, A.P., Vega, R.G., Sapinoso, L.M., Moqrich, A. et al. (2002) Large-scale analysis of the human and mouse transcriptomes. Proc. Natl Acad. Sci. USA, 99, 4465–4470.



Expression). Statistical evaluation was performed using the rank-sum test (P<0.01 and <0.05, black and gray bars, respectively). (B) Differences between median MCSndev for each tissue and median MCSndev for all tissue-specific genes (








