Human Molecular Genetics Advance Access originally published online on December 8, 2004
Human Molecular Genetics 2005 14(3):421-427; doi:10.1093/hmg/ddi038
Human Molecular Genetics, Vol. 14, No. 3 © Oxford University Press 2005; all rights reserved
Relationship between gene expression and GC-content in mammals: statistical significance and biological relevance
Laboratoire de Biométrie et Biologie Evolutive, UMR CNRS 5558 Université Claude Bernard Lyon 1, 16 rue Raphaël Dubois, 69622 Villeurbanne Cedex, France
* To whom correspondence should be addressed. Tel: +33 4724480000; Fax: +33 472431388; Email: semon{at}biomserv.univ-lyon1.fr
Received July 2, 2004; Accepted December 3, 2004
ABSTRACT
Mammalian chromosomes are characterized by large-scale variations of DNA base composition (the so-called isochores). Contrary to previous studies, Lercher et al. (Hum. Mol. Genet., 12, 2411, 2003) recently reported a strong correlation between gene expression breadth and GC-content, suggesting that there might be a selective pressure favoring the concentration of housekeeping genes in GC-rich isochores. We reassessed this issue by examining, in human and mouse, the correlation between gene expression and GC-content, using different measures of gene expression (EST, SAGE and microarray) and different measures of GC-content. We show that correlations between GC-content and expression are very weak and may vary according to the method used to measure expression. Such weak correlations have a very low predictive value. The strong correlations reported by Lercher et al. (2003) result from the fact that they averaged variables over windows of neighboring genes. We show here that using gene windows artificially enhances the correlation. The assertion that the expression of a given gene depends on the GC-content of the region where it is located is therefore not supported by the data.
INTRODUCTION
The analysis of mammalian chromosome sequences has revealed complex genomic landscapes: some regions of the genome are very gene-rich, whereas other large regions are devoid of genes (1
The question of the functional significance of these peculiar chromosomal landscapes is, however, still highly debated: do they reflect an adaptation, or are they simply a by-product of neutral evolutionary processes (8–11)? In other words, it is not yet known whether this isochore organization has any significant impact on the phenotype.
To address this important issue, several recent studies in human and mouse have analyzed the relationship between the GC-content of isochores and the expression patterns of the genes they contain. Surprisingly, these studies gave conflicting results (Table 1). Several papers reported very weak correlations between GC-content and gene expression, either negative (12–14) or positive (15–17). In contrast, Lercher et al. (19) found strong positive correlations, suggesting that there might be a selective advantage in concentrating housekeeping genes in transcriptionally competent, GC-rich chromosomal domains.
The discrepancy among the studies conducted on the relation between GC-content and expression might be due to the methods used to measure expression (EST, SAGE or DNA microarray), the expression parameter considered (expression level or tissue breadth of expression), differences in the measure of GC-content (in introns, third codon positions or intergenic regions) or differences in the tissues and gene data sets analyzed. Moreover, these studies differ in the way correlations were computed: in Lercher et al. (19
To understand the discrepancy between the different studies, we compared, on the same gene data set, the correlations between GC-content and gene expression obtained with different experimental methods, different estimators of GC-content and different scales of measurement (gene by gene or by genomic region). These analyses were performed in both human and mouse.
We show that, in both species, whatever the method used to measure expression or base composition, the correlations between gene expression and GC-content are very weak. We also show that analyses performed on sets of neighboring genes are not appropriate, as they overestimate the real relationship between gene expression and GC-content. Given the weakness of the correlations and the noisiness of current gene expression data, one should be extremely cautious when interpreting the biological significance of the relationship between gene expression and GC-content.
RESULTS
We analyzed 6242 human genes for which patterns of expression in 11 different tissues could be estimated using three independent experimental methods (EST, SAGE and DNA microarray). We considered different expression parameters: expression breadth (the number of tissues where expression is detected), mean expression level (the average level of expression for expressed genes in the 11 tissues) and peak expression (the maximum level of expression in the 11 tissues). We measured the GC-content in introns (GCi) and at the third position of codons (GC3).
To test for possible biases in the sampling of genes or tissues for which we had expression data from all three methods (EST, SAGE and DNA microarray), we also measured correlations on sets of genes for which we had (i) SAGE data but no microarray data (6523 genes) and (ii) EST data only (19 988 genes), and on sets of tissues for which we had (i) microarray data but no SAGE data (14 tissues) and (ii) EST data only (18 tissues). These analyses did not reveal any significant difference from the common data set (data not shown). Hence, we mainly present results obtained with the set of 6242 human genes and 11 tissues for which expression data were available from all three methods.
We also assessed the correlation between GC-content and gene expression in the mouse genome, using the three measures of expression (EST, SAGE and DNA microarray). As very few tissues were available for SAGE data, it was not possible to build a common gene data set for the three methods. We therefore studied three data sets corresponding to genes for which we had EST data (26 749 genes, 45 tissues), SAGE data (6906 genes, 11 tissues) and DNA microarray data (5297 genes, 45 tissues).
Correlations measured on individual genes
Table 2 gives the correlations, computed on individual genes, between GC-content and different measures of expression. All these correlations are in agreement with previous results (Table 1). For each method (EST, SAGE and microarray), the different expression parameters (breadth, peak or mean) gave similar results: when correlations are significant, they are always in the same direction. Correlations are generally stronger with expression breadth than with peak or mean expression level. In human, EST data indicate a weak negative correlation between expression breadth and GC3 (R²=0.3%) but no significant correlation with GCi; by contrast, SAGE and microarray data reveal a weak positive correlation between expression breadth and GC-content (R²=1.6–4.1%), and the correlations are stronger with GCi than with GC3. In mouse, the three measures of expression are positively correlated with GC-content, but again the correlations are very weak (R²=0.9–1.3% for expression breadth versus GCi). Thus, with the exception of human ESTs, all the measures of expression indicate a weak positive correlation between expression breadth and gene GC-content.
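For readers who want to reproduce this kind of gene-by-gene measure, the sketch below computes R² as the squared Pearson correlation between a GC-content vector and an expression vector, one entry per gene. It is a minimal, dependency-free illustration and not the authors' R code; the function names and the five-gene example values are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def r_squared(x, y):
    """Fraction of the variance of y explained by a linear fit on x."""
    return pearson_r(x, y) ** 2


# Hypothetical per-gene values: GC-content (%) and expression breadth.
gc = [40, 45, 50, 55, 60]
breadth = [5, 4, 7, 6, 8]
print(r_squared(gc, breadth))
```

With these hypothetical values the function returns 0.64, up to floating-point rounding; each gene contributes one point, which is the scale at which Table 2 is computed.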
How can the contradictory results obtained in human with ESTs be explained? Are they simply due to an artifact in EST data? Gene expression data are clearly noisy: the measures of expression breadth obtained by the three methods are only weakly correlated (SAGE/microarray R²=25%, SAGE/EST R²=27%, EST/microarray R²=16% on the common data set of 6246 human genes in 11 tissues), and it is not possible to determine which of the three measures is the most reliable. In principle, quantitative estimates of expression obtained with SAGE or DNA microarrays should be more reliable than those obtained from EST data. Indeed, the primary goal of EST projects was to identify new genes (not to measure expression), and hence EST data often derive from cDNA libraries that have been normalized to decrease the number of cDNA clones deriving from abundant transcripts. EST data are therefore expected to underestimate the expression level of highly expressed genes. Conversely, this normalization process allows the detection of rare transcripts and hence should improve the measure of tissue distribution breadth (i.e. the number of tissues where genes are expressed). Thus, although ESTs are clearly not appropriate for measuring expression level, there is a priori no reason why this method should be less reliable than SAGE or microarrays for measuring the breadth of expression.
To assess the sensitivity of the three methods, we selected from RefSeq (18) 1493 human genes that are supported by experimental evidence (i.e. for which a manually curated mRNA was available) and that are complete at their 3' end (i.e. with a polyA tail and a canonical polyadenylation signal <50 bp from the 3' end). The proportion of these RefSeq mRNAs that are not detected as transcribed in any of the 11 studied tissues is higher for microarray than for SAGE and EST (30, 7 and 7%, respectively), which suggests that microarrays are less sensitive than the other two methods.
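The 3'-completeness criterion can be expressed as a simple sequence check. The sketch below is only one possible reading of it: it treats a trailing run of at least 10 A's as the poly(A) tail (an assumed cut-off), looks only for the canonical AATAAA hexamer, and requires that hexamer to start within 50 bp upstream of the tail. The function name and its parameters are hypothetical.

```python
def has_3prime_polya_signal(mrna, signal="AATAAA", window=50, min_tail=10):
    """Hypothetical 3'-completeness check for an mRNA sequence.

    Requires a trailing run of at least `min_tail` A's (taken as the poly(A)
    tail) and the hexamer `signal` starting within `window` bp upstream of
    that run. Variant signals (e.g. ATTAAA) are deliberately ignored."""
    seq = mrna.upper()
    tail_len = len(seq) - len(seq.rstrip("A"))
    if tail_len < min_tail:
        return False  # no poly(A) tail at the 3' end
    tail_start = len(seq) - tail_len
    # Extend the search a few bases into the tail: the last A's of AATAAA
    # can be absorbed into the trailing run of A's.
    region = seq[max(0, tail_start - window): tail_start + len(signal) - 1]
    return signal in region
```

For example, a sequence ending in `...AATAAA` followed by 15 C's and 20 A's passes, while one whose only AATAAA lies 60 bp upstream of the tail does not.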
To assess the consistency of the different methods, we compared, between human and mouse orthologous genes, the measures of expression breadth obtained by EST and microarray (NB: this analysis could not be performed for SAGE because too few tissues currently have data available in both human and mouse). EST-based estimates of expression breadth are highly correlated between orthologs (R²=50% on a data set of 10 950 orthologous genes and 17 tissues common to human and mouse). Surprisingly, microarray estimates are less correlated (R²=11% on 2485 orthologous genes and 18 tissues). Restricting the data sets to the 2485 genes and the 11 tissues common to microarray and EST gives similar results. This suggests that, for the measure of expression breadth, microarray data might be noisier than ESTs. It is therefore not clear whether the negative correlation between GC3 and expression breadth that we observed with human ESTs is due to an artifact of the EST approach or to the fact that, for some genes, expression breadth might be better estimated by ESTs than by the other methods.
Whatever the answer to this question, it is important to stress that the discrepancy between the measures (EST versus SAGE or microarray) is in reality not strong, as all methods agree that the correlations are very small (R²=0.02–4.1%). Thus, the only safe conclusion that can be drawn from these analyses is that the GC-content of genes is a very poor predictor of their expression breadth.
Correlation measured on sets of genes grouped according to their GC-content
To analyze the relationship between GC-content and expression, Lercher et al. (19) classified genes according to their GC-content into eight categories of 5% width. For each category, they computed the average GC-content and the average expression breadth (SAGE). With these averages, they observed a strikingly strong linear correlation between GC-content and expression breadth (R²=89%). As shown in Figure 1, microarray data give similar results: after genes are grouped into GC-content categories, one observes a strong positive correlation (R²=85%) between average GC-content and average expression breadth.
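The binning step can be sketched as follows, assuming GC-content is expressed in percent and that categories are consecutive 5-percentage-point intervals such as [40, 45) and [45, 50). The function name is hypothetical, and empty categories are simply skipped.

```python
from collections import defaultdict


def gc_class_means(gc, breadth, width=5.0):
    """Group genes into GC classes of `width` percentage points and return
    (mean GC, mean expression breadth) for each non-empty class, in order."""
    classes = defaultdict(list)
    for g, b in zip(gc, breadth):
        classes[int(g // width)].append((g, b))  # class k covers [k*width, (k+1)*width)
    out = []
    for k in sorted(classes):
        members = classes[k]
        out.append((sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members)))
    return out
```

Correlating the two columns of the returned list gives a category-level R², which, as discussed next, is not comparable to an R² computed on individual genes.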
Grouping genes into GC-content categories is a useful way to visualize the trend of the relationship between GC-content and expression. However, we would like to stress that this approach cannot be used to quantify the relationship: when computed on individual genes, the squared correlation coefficient (R²) represents the fraction of the total variance that is explained by the variable, whereas after grouping it represents only the fraction of the inter-category variance that is explained. Because averaging within categories removes the intra-category variance, which dominates when the relationship is weak, R² computed on category means can be much higher than R² computed on individual genes.
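The distinction can be written with the usual sum-of-squares decomposition. In this sketch, genes i belong to categories k (n_k genes each) with GC-content x_{ki}, expression y_{ki}, category means \bar x_k, \bar y_k, overall means \bar x, \bar y, and b is the slope of the least-squares regression of y on x. The approximation in the last line is an assumption: categories are narrow in x (so the within-category spread of GC-content is negligible), the slope fitted on category means is close to b, and category means are weighted by n_k (or categories have equal size).

```latex
\begin{aligned}
\underbrace{\sum_{k}\sum_{i=1}^{n_k}(y_{ki}-\bar y)^2}_{\mathrm{SS}_{\mathrm{tot}}}
  &= \underbrace{\sum_{k}\sum_{i=1}^{n_k}(y_{ki}-\bar y_k)^2}_{\mathrm{SS}_{\mathrm{within}}}
   + \underbrace{\sum_{k} n_k(\bar y_k-\bar y)^2}_{\mathrm{SS}_{\mathrm{between}}} \\
R^2_{\mathrm{genes}} &= \frac{\mathrm{SS}_{\mathrm{expl}}}{\mathrm{SS}_{\mathrm{tot}}},
  \qquad \mathrm{SS}_{\mathrm{expl}} = b^2\sum_{k}\sum_{i=1}^{n_k}(x_{ki}-\bar x)^2 \\
R^2_{\mathrm{categories}} &\approx \frac{\mathrm{SS}_{\mathrm{expl}}}{\mathrm{SS}_{\mathrm{between}}}
  = R^2_{\mathrm{genes}}\,\frac{\mathrm{SS}_{\mathrm{tot}}}{\mathrm{SS}_{\mathrm{between}}}
\end{aligned}
```

When expression varies a lot among genes of similar GC-content, SS_between is a small part of SS_tot, so the ratio SS_tot/SS_between is large and the category-level R² can approach 1 even though the gene-level R² is only a few percent.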
To illustrate this effect, we performed a simulation: we considered two linearly correlated variables, X and Y, with R²=5% (i.e. X explains 5% of the variability of Y). We randomly generated a sample of 5000 points according to this linear model. We then grouped the points into categories according to the value of X, computed the average of X and Y for each category and computed the correlation between these averages. As can be seen in Table 3, the correlation coefficient increases steadily as group size increases. Thus, grouping points is misleading because it suggests a strong relationship, whereas in reality it is impossible to predict the value of Y of a given point from its value of X.
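A simulation of this kind can be sketched in a few lines of Python. The sketch assumes a standard normal X and the linear model Y = √0.05·X + √0.95·E with independent standard normal noise E (so that X explains 5% of the variance of Y), groups of equal numbers of consecutive points after sorting by X, and an arbitrary random seed. The group sizes in the usage loop are illustrative and are not necessarily those used for Table 3.

```python
import random
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def simulate(n=5000, r2=0.05, seed=1):
    """Draw n points from Y = sqrt(r2)*X + sqrt(1-r2)*E with X, E ~ N(0, 1),
    so that X explains a fraction r2 of the variance of Y."""
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ys = [sqrt(r2) * x + sqrt(1.0 - r2) * rng.gauss(0.0, 1.0) for x in xs]
    return xs, ys


def grouped_r2(xs, ys, group_size):
    """Sort points by X, cut them into consecutive groups of `group_size`
    points (an incomplete last group is dropped) and return R^2 between
    the group means of X and the group means of Y."""
    points = sorted(zip(xs, ys))
    mean_x, mean_y = [], []
    for start in range(0, len(points) - group_size + 1, group_size):
        chunk = points[start:start + group_size]
        mean_x.append(sum(p[0] for p in chunk) / group_size)
        mean_y.append(sum(p[1] for p in chunk) / group_size)
    return pearson_r(mean_x, mean_y) ** 2


if __name__ == "__main__":
    xs, ys = simulate()
    for size in (1, 10, 50, 100, 500):  # group size 1 gives the per-point R^2
        print(f"group size {size:4d}: R^2 = {grouped_r2(xs, ys, size):.3f}")
```

The per-point R² stays close to 0.05, while the R² of group means climbs towards 1 as groups get larger, although the underlying model never changes.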
Correlation measured on groups of neighbor genes
Lercher et al. (19
How can one explain that the correlation between expression breadth and GC-content is much stronger when measured on sets of neighboring genes than on individual genes? Two hypotheses can be proposed. The first possible explanation comes from the fact that genes are not randomly distributed along mammalian genomes: it has recently been shown that tissue-specific and broadly expressed genes tend to cluster in different regions (17,20,21). These regional variations of gene expression are correlated with GC-content and gene density (17) (i.e. with isochores). However, these regional variations of gene expression are partly independent of the isochore structure: the clustering of housekeeping genes is significantly stronger in the human genome than in randomized genomes of identical isochore structure (21). Thus, it is possible that, by analyzing groups of neighboring genes, Lercher et al. (19) better captured some effects due to regional variations of gene expression. A second possible explanation comes from the fact that, in mammals, neighboring genes tend to have similar GC-contents (because of the isochore structure of mammalian genomes). Thus, measuring average GC-content and expression breadth in sets of neighboring genes may have the same consequence as grouping genes with similar GC-content: averaging over such groups reduces the variance in expression and hence inflates the existing correlation (as discussed above).
To distinguish between these hypotheses, we first assessed the correlations between GC-content and expression breadth after averaging both variables over neighboring genes (19). As shown in Table 4, the correlations increase steadily with window size (i.e. the number of genes per window), up to R²=52% for SAGE data and R²=72% for microarray data for a window of 100 genes (which represents, on average, a genomic fragment of 50 Mb). We then reassessed the correlations after permuting genes in the genome while keeping the isochore structure unchanged. More precisely, we classified the genes according to their intronic GC-content into 20 categories of equal size and permuted genes within each of these categories. Correlations between mean expression breadth per window and mean GC-content per window were then computed. As shown in Table 3, after permutation we still observed strong correlations between GC-content and expression breadth measured over neighboring genes: up to R²=29% for SAGE data and R²=58% for microarray data for a window of 100 genes. These results indicate that the strong correlations reported by Lercher et al. (19) between average regional expression breadth and GC-content are mainly a consequence of the fact that neighboring genes have similar GC-contents.
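The window analysis and the permutation control can be sketched as below. Several details are assumptions rather than details taken from the original analysis: windows are consecutive and non-overlapping, the 20 GC classes are built from the ranks of GC-content, and a permutation moves each gene (with both its GC-content and its expression breadth) to a random position previously held by a gene of the same class. All function names are hypothetical.

```python
import random
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def window_means(values, size):
    """Means over consecutive, non-overlapping windows of `size` genes
    (genes in chromosomal order; an incomplete last window is dropped)."""
    return [sum(values[i:i + size]) / size
            for i in range(0, len(values) - size + 1, size)]


def window_r2(gc, breadth, size):
    """R^2 between per-window mean GC-content and per-window mean breadth."""
    return pearson_r(window_means(gc, size), window_means(breadth, size)) ** 2


def permute_within_gc_classes(gc, breadth, n_classes=20, seed=0):
    """Shuffle genes only among positions held by genes of the same GC class.

    Classes are equal-size bins of the GC ranking, so the succession of GC
    classes along the genome (the isochore structure) is kept, while any
    gene order beyond that GC structure is destroyed."""
    rng = random.Random(seed)
    n = len(gc)
    gc_class = [0] * n
    for rank, pos in enumerate(sorted(range(n), key=lambda i: gc[i])):
        gc_class[pos] = rank * n_classes // n
    positions_by_class = [[] for _ in range(n_classes)]
    for pos, c in enumerate(gc_class):
        positions_by_class[c].append(pos)
    new_gc, new_breadth = list(gc), list(breadth)
    for positions in positions_by_class:
        sources = positions[:]
        rng.shuffle(sources)
        for dst, src in zip(positions, sources):
            new_gc[dst], new_breadth[dst] = gc[src], breadth[src]  # move the whole gene
    return new_gc, new_breadth
```

Comparing `window_r2(gc, breadth, size)` on the real gene order with the same statistic on the output of `permute_within_gc_classes`, for increasing window sizes, mirrors the comparison described above; repeating the permutation with several seeds would show how stable the permuted values are.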
DISCUSSION
In agreement with previous reports (Table 1), we observed that, in both mouse and human, the different measures of gene expression generally show a positive correlation between the GC-content of genes and their breadth of expression. However, these correlations are very weak: in the entire human data set, the percentages of the variance of gene expression breadth explained by GC3 or GCi (R² values) are, respectively, 0.09 and 1.21% for SAGE data (12 205 genes, 18 tissues) and 1.72 and 3.33% for microarray data (6197 genes, 25 tissues). The relationship between GCi and expression breadth is hardly visible (Table 2). In mouse, the correlations are even weaker (Table 2).
In contradiction with these results, Lercher et al. (19) reported strong correlations between expression breadth and GC-content, which led them to predict that when genes are inserted, together with their promoter regions, into a non-native chromosomal environment, their expression pattern should depend on the local GC-content, and to conclude that there is probably a selective pressure favoring the concentration of housekeeping genes in GC-rich regions. The discrepancy with our results arises because Lercher et al. (19) computed their correlations not on individual genes but on groups of genes. We would like to stress that this grouping of genes is strongly misleading because it suggests a strong relationship between the expression breadth of genes and their GC-content, whereas in reality the relationship is very weak. Indeed, the correct interpretation of the strong correlations obtained with groups of genes is that if the average GC-content of a large set of genes is known, then it is possible to predict the average expression breadth of that set. However, contrary to the conclusion of Lercher et al. (19), it is impossible to predict the expression of any particular gene in this set.
This work illustrates the problem of over-interpretation of statistical tests, which is becoming recurrent in genomics. Thanks to the very large amount of data now available, it is possible to detect extremely weak correlations that are significantly different from zero. However, what is the real usefulness of correlations with such low predictive value? Correlation is not causation, and such weak correlations may reflect indirect relationships with unknown variables. Moreover, as illustrated by the conflicting results obtained with human ESTs, they are very sensitive to possible methodological artifacts. In conclusion, although these correlations are statistically significant, their real biological significance is difficult to assess.
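The gap between statistical and biological significance can be made concrete with the usual t-test for a correlation coefficient, t = r·√(n − 2)/√(1 − r²). The sketch below takes R² and n and approximates the two-sided p-value with the normal distribution, an assumption that is reasonable only for large n; the function name is hypothetical.

```python
from math import erfc, sqrt


def correlation_significance(r2, n):
    """Test statistic and approximate two-sided p-value for H0: rho = 0,
    given an observed R^2 and a sample size n."""
    r = sqrt(r2)
    t = r * sqrt(n - 2) / sqrt(1.0 - r2)  # Student t with n-2 d.f. under H0
    p = erfc(t / sqrt(2.0))               # normal approximation of the two-sided tail
    return t, p


for n in (100, 6242):
    t, p = correlation_significance(0.003, n)
    print(f"R^2 = 0.3%, n = {n}: t = {t:.2f}, p = {p:.2g}")
```

With the human EST value R²=0.3% and n=6242 genes, p falls well below 0.001, whereas the same R² with a hypothetical n=100 is far from significant. In both cases GC-content explains only 0.3% of the variance in expression breadth.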
MATERIALS AND METHODS
Gene selection
We selected all human and mouse manually curated mRNAs from the RefSeq database (18
SAGE data
We associated RefSeq mRNAs with SAGE data by determining the tag corresponding to each mRNA. About 1% of the mRNA sequences lack an NlaIII site (CATG) (190 out of 19 025 human mRNAs) and were removed from the data set. The tag (the 10 bp immediately 3' of the most 3' NlaIII site) was extracted from the other sequences. In some cases, one tag may match more than one RefSeq mRNA. We examined the genomic location of these mRNAs to determine whether they correspond to alternative transcripts of the same gene or to different genes; in the latter case, the genes were removed from the data set. We finally retained 13 435 human and 8951 mouse RefSeq mRNAs that are non-redundant and unambiguously located on the genome.
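Tag extraction and the detection of tags shared by several mRNAs can be sketched as follows. The sketch assumes the usual SAGE convention that the tag is read on the 3' side of the CATG site, takes plain nucleotide strings as input, and does not implement the genomic-location check used to decide whether mRNAs sharing a tag are alternative transcripts of one gene. The function names are hypothetical.

```python
from collections import defaultdict


def sage_tag(mrna, anchor="CATG", tag_len=10):
    """Virtual SAGE tag: the `tag_len` bases immediately 3' of the most 3'
    anchoring-enzyme site (NlaIII recognizes CATG). Returns None when the
    mRNA has no site, or when the site is too close to the 3' end."""
    seq = mrna.upper()
    pos = seq.rfind(anchor)
    if pos == -1:
        return None
    start = pos + len(anchor)
    tag = seq[start:start + tag_len]
    return tag if len(tag) == tag_len else None


def ambiguous_tags(mrnas):
    """Map each tag shared by several mRNAs to the list of their identifiers.

    mrnas -- dict identifier -> mRNA sequence"""
    by_tag = defaultdict(list)
    for name, seq in mrnas.items():
        tag = sage_tag(seq)
        if tag is not None:
            by_tag[tag].append(name)
    return {tag: names for tag, names in by_tag.items() if len(names) > 1}
```

mRNAs returned by `ambiguous_tags` are the candidates that would then need the genomic-location check described above.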
SAGE experiment results, called libraries, were obtained from the SAGE Genie website [ftp://cgap.ncbi.nih.gov/SAGE/Download (24)] for human data and from the Gene Expression Omnibus site [http://www.ncbi.nlm.nih.gov/pub/geo/ (25)] for mouse data. Each library contains a list of tags that corresponds to a sample of the transcriptome in a given tissue at a given developmental time. We retained 141 libraries for the human data set (41 for mouse) that contained more than 20 000 tags and did not correspond to tumor tissues. The libraries were then grouped into 19 tissue types (11 for mouse). After adding all counts for libraries representing the same tissue type, we converted absolute tag counts to relative tag counts (c.p.m., counts per million).
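Library filtering, pooling by tissue type and conversion to c.p.m. can be sketched as below, assuming each library is available as a `collections.Counter` of tag counts and that a separate mapping gives its tissue type. The function name and data layout are hypothetical; the same conversion would apply to EST counts.

```python
from collections import Counter, defaultdict


def pooled_cpm(libraries, tissue_of, min_tags=20000):
    """Pool SAGE libraries by tissue type and convert counts to c.p.m.

    libraries -- dict library id -> Counter(tag -> absolute count)
    tissue_of -- dict library id -> tissue type
    min_tags  -- a library is kept only if it contains more than this many tags
    Returns dict tissue -> dict tag -> counts per million."""
    pooled = defaultdict(Counter)
    for lib, counts in libraries.items():
        if sum(counts.values()) > min_tags:
            pooled[tissue_of[lib]].update(counts)  # add counts of same-tissue libraries
    cpm = {}
    for tissue, counts in pooled.items():
        total = sum(counts.values())
        cpm[tissue] = {tag: n * 1e6 / total for tag, n in counts.items()}
    return cpm
```

Pooling before normalizing means that larger libraries weigh more in the tissue-level estimate; normalizing each library first and then averaging would be an alternative design.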
EST
We selected from GenBank (release 133, December 2002) 4 906 743 ESTs from human tissues and 3 660 463 ESTs from mouse tissues. cDNA libraries from cell cultures, tumors, pooled organs or unidentified tissues were excluded. To limit stochastic variations in expression measures, we only retained cDNA libraries that had been sampled with at least 10 000 ESTs. We retained 44 non-tumor tissues for the human and mouse data sets. CDSs were then compared with the EST data set using MEGABLAST (26). MEGABLAST alignments showing at least 95% identity over 100 nucleotides or more were counted as a sequence match. This criterion was chosen to be permissive enough to detect most ESTs despite sequencing errors, but stringent enough to distinguish, in most cases, different members of highly conserved gene families. Absolute EST counts were normalized as for SAGE data.
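The match criterion can be applied to parsed alignment records as sketched below. The record layout (EST identifier, gene identifier, percent identity, alignment length) and the choice to count each EST at most once per gene and tissue are assumptions for illustration; this is not a description of MEGABLAST's own output format.

```python
from collections import defaultdict

MIN_IDENTITY = 95.0  # percent identity required
MIN_LENGTH = 100     # minimum alignment length, in nucleotides


def est_counts(hits, tissue_of_est):
    """Count matching ESTs per (gene, tissue).

    hits          -- iterable of (est_id, gene_id, percent_identity, aln_length)
    tissue_of_est -- dict est_id -> tissue of the cDNA library it came from
    An EST aligned several times to the same gene is counted once."""
    matched = defaultdict(set)
    for est, gene, identity, length in hits:
        if identity >= MIN_IDENTITY and length >= MIN_LENGTH:
            matched[(gene, tissue_of_est[est])].add(est)
    return {key: len(ests) for key, ests in matched.items()}
```

The resulting counts per gene and tissue can then be converted to c.p.m. exactly as for SAGE tags.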
Microarray
Oligonucleotide microarray data were extracted from the Gene Expression Atlas [http://expression.gnf.org (27)], which contains 25 human and 45 mouse non-tumor tissues. Sample replicates corresponding to the same tissue were averaged, as were the signals of probes corresponding to the same gene. In total, 7735 human mRNAs and 5297 mouse mRNAs are represented in the resulting data set. As recommended by the authors (27), genes whose expression level exceeded 200 arbitrary units were scored as expressed.
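The three microarray steps (averaging replicates, averaging probes of the same gene, then calling a gene expressed above 200 units) can be sketched as follows. The nested-dictionary input layout and the function name are hypothetical, and "exceeded" is read as strictly greater than 200.

```python
from collections import defaultdict
from statistics import mean


def microarray_calls(signals, probe_gene, threshold=200.0):
    """Average replicates, then probes, and call expression per gene and tissue.

    signals    -- dict probe id -> dict tissue -> list of replicate signals
    probe_gene -- dict probe id -> gene identifier
    Returns dict gene -> dict tissue -> (signal, expressed?)."""
    per_gene = defaultdict(lambda: defaultdict(list))
    for probe, by_tissue in signals.items():
        for tissue, replicates in by_tissue.items():
            # 1) average the replicate samples of this tissue for this probe
            per_gene[probe_gene[probe]][tissue].append(mean(replicates))
    calls = {}
    for gene, by_tissue in per_gene.items():
        calls[gene] = {}
        for tissue, probe_means in by_tissue.items():
            signal = mean(probe_means)                          # 2) average probes of the gene
            calls[gene][tissue] = (signal, signal > threshold)  # 3) expressed if > 200 units
    return calls
```

Averaging replicates before probes gives each probe equal weight regardless of how many replicate samples it has.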
Final data sets
For the human data sets, 11 tissues are common to the three methods (blood, brain, heart, kidney, liver, lung, ovary, pancreas, placenta, prostate and uterus), and expression could be evaluated for 6246 RefSeq mRNAs. For each of these genes, we calculated, with each of the three methods, the expression breadth (number of tissues with positive expression), the expression mean (average level of expression in the tissues where the gene is expressed) and the peak rate (maximum level of expression).
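The three per-gene parameters can be computed from a vector of tissue-level values as sketched below. The detection threshold is a hypothetical parameter: 0 would correspond to "positive expression" for SAGE or EST counts, while 200 units would mirror the microarray call. The function name is also hypothetical.

```python
def expression_summary(levels, threshold=0.0):
    """Summarize one gene across tissues.

    levels    -- one expression value per tissue (e.g. c.p.m. for SAGE/EST,
                 averaged signal for microarrays)
    threshold -- hypothetical detection cut-off: a tissue counts as
                 expressing the gene when its level is strictly above it

    Returns (breadth, mean_level, peak):
      breadth    -- number of tissues where expression is detected
      mean_level -- average level over the expressing tissues only
      peak       -- maximum level over all tissues
    """
    expressed = [x for x in levels if x > threshold]
    breadth = len(expressed)
    mean_level = sum(expressed) / breadth if breadth else 0.0
    peak = max(levels) if levels else 0.0
    return breadth, mean_level, peak
```

For example, `expression_summary([0, 12.5, 3.0, 0, 40.0])` returns `(3, 18.5, 40.0)`: three expressing tissues, a mean of 18.5 over those tissues, and a peak of 40.0.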
For the mouse data sets, very few tissues common to the three methods were available, so we kept a separate data set for each method. Expression could be evaluated for 26 749 mRNAs and 45 tissues with EST data, 6906 mRNAs and 11 tissues with SAGE data, and 5297 mRNAs and 45 tissues with microarray data. Statistical analyses were done using R (28).
ACKNOWLEDGEMENTS
We thank Vincent Navratil for providing the ortholog data sets, and Laurent Gueguen, Anne Beatrice Dufour and Eric Tannier for their help with statistics.
REFERENCES
- Lander, E.S., Linton, L.M., Birren, B., Nusbaum, C., Zody, M.C., Baldwin, J., Devon, K., Dewar, K., Doyle, M., Fitzhugh, W. et al. (2001) Initial sequencing and analysis of the human genome. Nature, 409, 860–921.
- Mouchiroud, D., D'Onofrio, G., Aissani, B., Macaya, G., Gautier, C. and Bernardi, G. (1991) The distribution of genes in the human genome. Gene, 100, 181–187.
- Duret, L., Mouchiroud, D. and Gautier, C. (1995) Statistical analysis of vertebrate sequences reveals that long genes are scarce in GC-rich isochores. J. Mol. Evol., 40, 308–317.
- Watanabe, Y., Fujiyama, A., Ichiba, Y., Hattori, M., Yada, T., Sakaki, Y. and Ikemura, T. (2002) Chromosome-wide assessment of replication timing for human chromosomes 11q and 21q: disease-related genes in timing-switch regions. Hum. Mol. Genet., 11, 13–21.
- Kong, A., Gudbjartsson, D.F., Sainz, J., Jonsdottir, G.M., Gudjonsson, S.A., Richardsson, B., Sigurdardottir, S., Barnard, J., Hallbeck, B., Masson, G. et al. (2002) A high-resolution recombination map of the human genome. Nat. Genet., 31, 241–247.
- Jabbari, K., Rayko, E. and Bernardi, G. (2003) The major shifts of human duplicated genes. Gene, 317, 203–208.
- Smit, A.F. (1999) Interspersed repeats and other mementos of transposable elements in mammalian genomes. Curr. Opin. Genet. Dev., 9, 657–663.
- Bernardi, G., Olofsson, B., Filipski, J., Zerial, M., Salinas, J., Cuny, G., Meunier-Rotival, M. and Rodier, F. (1985) The mosaic genome of warm-blooded vertebrates. Science, 228, 953–958.
- Bernardi, G. (2000) Isochores and the evolutionary genomics of vertebrates. Gene, 241, 3–17.
- Galtier, N., Piganeau, G., Mouchiroud, D. and Duret, L. (2001) GC-content evolution in mammalian genomes: the biased gene conversion hypothesis. Genetics, 159, 907–911.
- Eyre-Walker, A. and Hurst, L.D. (2001) The evolution of isochores. Nat. Rev. Genet., 2, 549–555.
- Goncalves, I., Duret, L. and Mouchiroud, D. (2000) Nature and structure of human genes that generate retropseudogenes. Genome Res., 10, 672–678.
- Duret, L. (2002) Evolution of synonymous codon usage in metazoans. Curr. Opin. Genet. Dev., 12, 640–649.
- Ponger, L., Duret, L. and Mouchiroud, D. (2001) Determinants of CpG islands: expression in early embryo and isochore structure. Genome Res., 11, 1854–1860.
- Vinogradov, A.E. (2003) Isochores and tissue-specificity. Nucleic Acids Res., 31, 5212–5220.
- Urrutia, A.O. and Hurst, L.D. (2003) The signature of selection mediated by expression on human genes. Genome Res., 13, 2260–2264.
- Versteeg, R., van Schaik, B.D., van Batenburg, M.F., Roos, M., Monajemi, R., Caron, H., Bussemaker, H.J. and van Kampen, A.H. (2003) The human transcriptome map reveals extremes in gene density, intron length, GC content, and repeat pattern for domains of highly and weakly expressed genes. Genome Res., 13, 1998–2004.
- Pruitt, K.D., Tatusova, T. and Maglott, D.R. (2003) NCBI reference sequence project: update and current status. Nucleic Acids Res., 31, 34–37.
- Lercher, M.J., Urrutia, A.O., Pavlicek, A. and Hurst, L.D. (2003) A unification of mosaic structures in the human genome. Hum. Mol. Genet., 12, 2411–2415.
- Caron, H., van Schaik, B., van der Mee, M., Baas, F., Riggins, G., van Sluis, P., Hermus, M.C., van Asperen, R., Boon, K., Voute, P.A. et al. (2001) The human transcriptome map: clustering of highly expressed genes in chromosomal domains. Science, 291, 1289–1292.
- Lercher, M.J., Urrutia, A.O. and Hurst, L.D. (2002) Clustering of housekeeping genes provides a unified model of gene order in the human genome. Nat. Genet., 31, 180–183.
- Birney, E., Andrews, T.D., Bevan, P., Caccamo, M., Chen, Y., Clarke, L., Coates, G., Cuff, J., Curwen, V., Cutts, T. et al. (2004) An overview of Ensembl. Genome Res., 14, 925–928.
- Duret, L., Mouchiroud, D. and Gouy, M. (1994) HOVERGEN: a database of homologous vertebrate genes. Nucleic Acids Res., 22, 2360–2365.
- Liang, P. (2002) SAGE Genie: a suite with panoramic view of gene expression. Proc. Natl Acad. Sci. USA, 99, 11547–11548.
- Edgar, R., Domrachev, M. and Lash, A.E. (2002) Gene expression omnibus: NCBI gene expression and hybridization array data repository. Nucleic Acids Res., 30, 207–210.
- Zhang, Z., Schwartz, S., Wagner, L. and Miller, W. (2000) A greedy algorithm for aligning DNA sequences. J. Comput. Biol., 7, 203–214.
- Su, A.I., Cooke, M.P., Ching, K.A., Hakak, Y., Walker, J.R., Wiltshire, T., Orth, A.P., Vega, R.G., Sapinoso, L.M., Moqrich, A. et al. (2002) Large-scale analysis of the human and mouse transcriptomes. Proc. Natl Acad. Sci. USA, 99, 4465–4470.
- Ihaka, R. (1996) R: a language for data analysis and graphics. J. Comp. Graph. Genet., 16, 418–420.