Human Molecular Genetics Advance Access originally published online on November 13, 2007
Human Molecular Genetics 2008 17(4):577-586; doi:10.1093/hmg/ddm332
Recombination rates of genes expressed in human tissues
1 SNP Research Center, RIKEN, 1-7-22 Suehiro, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan 2 Institute for Clinical Research, Osaka National Hospital, National Hospital Organization, 2-1-14 Hoenzaka, Chuo-ku, Osaka City, Osaka 540-0006, Japan 3 Human Genome Center, Institute of Medical Science, University of Tokyo, 4-6-1 Shirokanedai, Minato-ku, Tokyo 108-8639, Japan
* To whom correspondence should be addressed at: Laboratory for Medical Informatics, SNP Research Center, RIKEN, 1-7-22 Suehiro, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan. Tel: +81 455039556; Fax: +81 455039555; Email: tsunoda{at}src.riken.jp
Received June 22, 2007; Revised October 15, 2007; Accepted November 9, 2007
| ABSTRACT |
|---|
High-resolution recombination rates have recently been revealed in the human genome, and considerable variation in patterns of recombination rates has been found along the chromosomes. Although the associations between this variation and genomic sequence features, such as genic regions, provide information on haplotype diversity and natural selection in these regions, the associations are not well understood. Here, we performed microarray experiments to identify genes specifically expressed in human tissues and investigated tendencies of recombination rates within tissue-specific genes. We found that some types of tissue-specific genes (in the frontal lobe, fetal brain, testis, thymus and thyroid) tended to have extremely low recombination rates, whereas other types (in the brain cerebellum, brain whole, stomach, lung and bone marrow) tended to have relatively high recombination rates. Surprisingly, genes specifically expressed in the frontal lobe, which is a brain region involved in human cognitive abilities, had low recombination rates, whereas genes specifically expressed in the cerebellum, which is a brain region with primitive functions shared by all vertebrate species, had high rates. These findings suggest that natural selection shapes the recombination rate tendencies according to the physiological functions exerted in the tissues. For example, the low recombination rates in frontal lobe-specific genes may indicate that a few haplotypes have spread rapidly across the population because higher cognitive abilities are advantageous. Frontal lobe-specific genes with extremely low recombination rates may be candidates for genes related to cognitive abilities that the human species has recently acquired.
| INTRODUCTION |
|---|
Meiotic recombination is the intergenerational mixing of DNA between homologous chromosomes and, along with mutation, is a major biological mechanism that generates variability in the eukaryotic genome (1). Recombination rates are traditionally calculated from family-based genotyping data (2); however, this pedigree approach is limited because such data include few informative meioses and thus result in genetic maps that simply do not have the resolution to reveal how recombination rates vary at the level of single genes (3). For high resolution, practical approaches are indirect statistical methods to infer recombination rates from genotypes of individuals randomly sampled from a population (3–5), and these methods can take advantage of large-scale genotyping data that have recently appeared (6). Such methods are realized in, for example, LDhat (5). This tool fits a statistical model on the basis of the coalescent to patterns of linkage disequilibrium (LD), which is the association between alleles at different loci (6,7), and then estimates recombination rates within this model in a Bayesian framework (8). Differently from LD between two different loci (pairwise LD) (7), the high-resolution recombination rates facilitate comparisons between regions (7) and have a good property of robustness to SNP ascertainment bias (5). As expected from the manner in which recombination gradually erodes LD (7,9), estimated high-resolution recombination rates inversely correlate with the degrees of pairwise LD (6).
Recently, large-scale genotyping studies (6,8,10–12) have investigated genomic patterns of high-resolution recombination rates and LD in the human genome and have found considerable variation in these patterns along the chromosomes. Some studies have tried to see how this variation is associated with genomic sequence features such as genes. The association of genes with degrees of recombination rates or LD is important for several reasons. It provides information on the haplotype diversity in regions of functional importance (9). It also provides the historical process of recombination in the regions and can imply natural selection that has recently worked on particular chromosomal segments (7,13,14). The detection of natural selection through the polymorphic patterns is also useful in searches for disease-causing alleles, as the thrifty gene hypothesis indicates that alleles that adapted in ancient environments to more efficiently convert food into energy increase susceptibility to obesity and type 2 diabetes in a modern environment (15). In addition, haplotypes exposed to natural selection should be surely tagged by SNPs in genome-wide association studies to capture association signals with complex diseases and traits (16).
So far, some tendencies of the patterns in genic regions have been revealed on a genomic scale. For example, it was recently found that the high-resolution recombination rates are on average lower within a gene and increase with distance from the gene symmetrically in either direction for about 30 kb before decreasing again (8). Since pairwise LD is easier to calculate than high-resolution recombination rates, patterns of LD are more frequently examined than those of the recombination rates, though recombination patterns can be generally inferred from LD patterns because of their inverse correlation. It is found that regions of extended LD are significantly overpopulated with SNPs that are located in genic or coding regions (10). LD is stronger between exonic variants within a gene compared with intronic or intergenic SNPs (12). A high degree of sequence conservation is associated with strong LD in exonic regions (17). These results illustrate the hypothesis that natural selection (either positive or negative) can cause an inflation of LD because natural selection rapidly spreads advantageous haplotypes across a population or sweeps deleterious haplotypes from a population (7). Recently, a companion study (13) of the HapMap project also revealed that regions of not only high LD but also low LD have higher densities of genes and coding bases than the rest of the genome. Those authors then classified genes by functional terms in the biological process ontology, one of three gene ontology (GO) systems (18). On the basis of this classification, Smith et al. (13) also found that, although some types of genes (including genes involved in immune response and sensory perception) are typically located in regions of low LD, other genes (including those involved in DNA and RNA metabolism) are preferentially located in regions of high LD. 
They suggested that immune response genes in low-LD regions might represent genes for which great allelic diversity is advantageous to the species, whereas DNA and RNA metabolism genes in high-LD regions represent conserved biological processes in which recombination and mutation might result in deleterious variants to be removed by natural selection.
Here, we investigated patterns of high-resolution recombination rates within single genes expressed in human tissues. We performed microarray experiments in 25 human normal tissues and identified genes specifically expressed in a tissue in order to examine recombination rates within these tissue-specific genes. We also investigated recombination rate patterns, combining the tissue-specific classification with GO functional categories in all three ontology systems (biological process, molecular function and cellular component). The tissue-specific classification (i.e. classification by a tissue in which genes are expressed), the cellular component ontology (i.e. classification by a cellular part in which genes are expressed) and the other two kinds of ontologies are all important attributes of genes for describing gene functions. Moreover, these four kinds of attributes are complementary to each other for indicating gene functions, especially when some of these attributes are not available but others are. For example, even when informative GO terms are not assigned to a gene, if a tissue in which the gene is specifically expressed is identified among multiple tissues, then the tissue-specific expression can provide important clues to the gene's physiological and pathological functions (19,20). Thus, the combination of the four functional attributes enabled us to analyze gene recombination patterns in finer detail.
We found characteristic tendencies of recombination rates in groups of genes expressed in human tissues and in groups of genes classified by the four attributes: the tissue specificity determined from our gene expression experiments using microarrays, and the three GO systems (biological process, molecular function and cellular component). Among the genes expressed in tissues, we found that some types of genes (those specifically expressed in the frontal lobe, fetal brain, testis, thymus and thyroid, and those universally expressed in many tissues) tended to have low recombination rates, whereas other types (those specifically in brain cerebellum, brain whole, stomach, lung and bone marrow) tended to have high rates. Recombination patterns among genes expressed in brain tissues tended to differ according to regional (frontal lobe and brain cerebellum) and temporal (fetal and other brain) differences in gene expression. Classification according to the four attributes revealed the existence of subgroups whose recombination rate patterns differed from those of the whole groups. It also revealed that low recombination rates were often associated with genes involved in metabolism/transcription, whereas high rates were associated with genes involved in information transmission mediated by channels and transporters. These results suggest the possibility that natural selection forms different patterns of observed recombination rates in various gene groups.
| RESULTS |
|---|
Recombination rates of genes classified by tissue specificity
To investigate recombination rates of tissue-specific genes in human tissues, we first employed DNA microarrays to measure expression levels of genes in 25 tissue samples and determined genes that were significantly highly expressed in each tissue using a threshold (0.001) of the P-values corrected by false discovery rate (FDR). We used genes that were significantly highly expressed in only one tissue as tissue-specific genes and also used genes significantly highly expressed in many (six or more) tissues. Next, we used high-resolution recombination rates derived using LDhat (5) in the HapMap project (6) to calculate the recombination rate within a gene. We calculated this rate by $\left(\sum_{i} \rho_i\right)/l$, where $l$ was the total length of a genic region and $\rho_i$ was the recombination rate at a base position $i$ within the genic region. Next, we used the HapMap recombination rate data to randomly permute the recombination rates for chromosomal segments along the full lengths of the chromosomes and generated recombination rates whose positions were randomly permuted. Using these randomized data, we calculated the recombination rate within a genic region by the same procedure described earlier. We refer to this rate as a randomized recombination rate within a genic region. Then, for each group of tissue-specific genes, we performed the two-sample Kolmogorov–Smirnov (KS) test to see whether the recombination rates within those genic regions were statistically higher or lower than the randomized recombination rates within the same genic regions on the basis of an upper or lower one-sided test, respectively. The P-values were corrected by FDR. See Materials and Methods for details on these procedures. As a result, every tissue showed that recombination rates within the genic regions of the tissue-specific genes were significantly lower than the randomized recombination rates (Plower,random < 0.01, Table 1). Genes expressed in many tissues also showed significantly lower recombination rates than the randomized rates (Plower,random < 0.01, Table 1). Moreover, several tissues (skeletal muscle, salivary gland and bone marrow) showed that recombination rates of tissue-specific genes were significantly higher (Pupper,random < 0.01) than the randomized rates, although this tendency was weaker than the tendency toward lower recombination rates (Pupper,random values were generally greater than Plower,random values). 
The tendency toward both lower and higher recombination rates in these tissues indicates mixed distributions of genes with lower and higher recombination rates in the tissues, as a previous study (13) demonstrated that both strong and weak LD regions have a high density of genic regions for all genes as well as several types of genes categorized by GO terms. To confirm the mixed distribution, we classified genes in a group into those with low, mid-low, middle, mid-high or high recombination rates by q-score (see Materials and Methods). We found that distributions of genes specifically expressed in skeletal muscle, salivary gland and bone marrow tissues were bi-directionally biased toward genes with both low and high recombination rates (both Q1 and Q5 > 20% for skeletal muscle and bone marrow, Table 1; both Q1 and Q5 were the highest percentages among all Qs for salivary gland, Supplementary Material, Table S1). All the tissues showed distributions biased toward genes with low recombination rates (Q1 > 20%), and some tissues (frontal lobe, thymus, testis, fetal brain and so on) showed distributions skewed only toward genes with low recombination rates (only Q1 > 20%).
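The procedure described above (mean rate within a genic region, position-randomized rates, one-sided two-sample KS tests and FDR correction) can be sketched roughly as follows. This is an illustrative sketch rather than the pipeline actually used: the function names are hypothetical, the rate map is assumed to be piecewise constant over non-overlapping segments, the permutation simply shuffles segment rates while keeping segment coordinates, the KS P-value uses the large-sample approximation, and Benjamini–Hochberg is assumed as the FDR procedure, which the text does not specify.

```python
import math
import random
from bisect import bisect_right

def gene_rate(segments, start, end):
    """Mean rate over [start, end): (sum of per-base rates) / region length.
    segments: non-overlapping (seg_start, seg_end, rate) tuples (assumed format)."""
    total = 0.0
    for s, e, r in segments:
        overlap = min(e, end) - max(s, start)
        if overlap > 0:
            total += r * overlap  # each base in the overlap contributes rate r
    return total / (end - start)

def permute_rates(segments, rng):
    """Shuffle rates among segments, keeping coordinates (assumed scheme)."""
    rates = [r for _, _, r in segments]
    rng.shuffle(rates)
    return [(s, e, r) for (s, e, _), r in zip(segments, rates)]

def ks_one_sided(x, y):
    """One-sided two-sample KS test.
    D = max_t [F_x(t) - F_y(t)]; a large D means x tends to be lower than y.
    P-value from the asymptotic formula exp(-2 D^2 nm/(n+m))."""
    xs, ys = sorted(x), sorted(y)
    n, m = len(xs), len(ys)
    d = 0.0
    for t in xs + ys:  # the supremum is attained at an observed value
        d = max(d, bisect_right(xs, t) / n - bisect_right(ys, t) / m)
    return d, math.exp(-2.0 * d * d * n * m / (n + m))

def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted P-values (assumed FDR method)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest P-value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

In this sketch, a "lower" test for a gene group would call `ks_one_sided(observed_rates, randomized_rates)`, and an "upper" test would swap the two arguments; the resulting P-values across groups would then be passed to `bh_adjust`.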
Although the comparison with the randomized rates revealed a stronger tendency toward lower recombination rates in every tissue, some tissues tended to have extremely low recombination rates among all tissues, whereas others had relatively high recombination rates. For example, frontal lobe (median: 0.16 cM/Mb, Table 1), thymus (0.22), testis (0.22), fetal brain (0.24), many tissues (0.28) and thyroid (0.29) tended to have extremely low recombination rates (and some tissues showed significantly lower recombination rates even when compared with recombination rates of all genes, Plower,all genes < 0.01, Supplementary Material, Table S1); however, brain cerebellum (median: 0.97 cM/Mb), stomach (0.86), lung (0.77), bone marrow (0.69) and brain whole (0.67) tended to have relatively high recombination rates.
Interestingly, these findings show that genes expressed in brain tissues had quite different recombination rate tendencies according to both regional (e.g. frontal lobe and brain cerebellum) and temporal (e.g. fetal brain and brain whole or cerebellum) differences of gene expression in the brain. Genes specifically expressed in the frontal lobe or fetal brain tended toward extremely low recombination rates, whereas genes in brain whole and brain cerebellum tended to have higher rates.
Note that a large percentage of tissue-specific genes were not characterized by any of the three GO systems (Table 1). For example, >20% of genes specifically expressed in the frontal lobe, heart, lung, stomach and testis were uncharacterized. These genes may be related to distinctive functions exerted in these tissues, though without information on tissue specificity, we could not get any information on them.
Recombination rates of genes classified by a single GO category
We also obtained results on the recombination rates of genes classified by GO terms under each of the three GO systems—biological process, molecular function and cellular component (Supplementary Material, Table S1), though LD results by biological process have already been demonstrated (13). According to our results for the biological process ontology, representative groups with significantly lower recombination rates than the randomized rates (n> 100, Plower,random< 0.01, sorted by median, Supplementary Material, Table S1) included genes classified by response to DNA damage stimulus (median: 0.19 cM/Mb) and cell cycle (0.23). Representative groups related to significantly higher recombination rates (n> 100, Pupper, random< 0.01, sorted by median) included genes classified by cell–cell signaling (0.72) and immune response (0.71). These results are consistent with the previous LD results (13). We newly found in the molecular function GO system that genes with unfolded protein binding (0.16), RNA binding (0.16), transcription coactivator activity (0.21) and ligase activity, forming carbon–nitrogen bonds (0.23) tended to have significantly lower recombination rates, whereas genes with alpha-type channel activity (1.18), sugar binding (1.03) and electrochemical potential-driven transporter activity (0.92) tended to have higher rates. In the cellular component GO system, genes with chromosome (0.15), nucleoplasm (0.19) and mitochondrial membrane (0.20) tended to have significantly lower recombination rates; in contrast, genes with intrinsic to plasma membrane (0.85) and cell junction (0.90, but less significant: Pupper,random = 0.04) tended to have higher rates.
Recombination rates of genes classified by the four attributes
We next examined the recombination rates of genes classified by combinations of the four attributes: tissue specificity and the three GO systems. This combined categorization of genes is illustrated with an example of the gene SULT4A1, which was included in a group of genes that were specifically expressed in the frontal lobe and that were assigned to primary metabolism (but not assigned to GO terms related to nerve functions) under the biological process GO system. As inferred from this group, this gene encodes a brain-specific sulfotransferase that is believed to be involved in the metabolism of neurotransmitters (21). All the data are given in Supplementary Material, Table S2. The combined categorization provided detailed information on recombination rate tendencies that cannot be obtained from grouping by only a single attribute.
As an example of this kind of detailed information, we noticed that groups classified by one attribute sometimes included subgroups that greatly differed from the whole group in their recombination rate tendencies. We have shown that immune response genes in biological process tended to have high recombination rates (Supplementary Material, Table S1; median: 0.71 cM/Mb; Pupper,random< 0.01), and this was consistent with the previous result (13) of weak LD; however, these genes included a subgroup of immune response genes that were assigned to both metal ion binding in molecular function and nucleus in cellular component, and this subgroup had a relatively lower recombination rate tendency (0.43 cM/Mb; P< 0.05 in a lower-sided KS test to evaluate whether or not the distribution of recombination rates of the subgroup was significantly lower than that of the whole group). Another example is that primary metabolism genes in biological process had a low recombination rate tendency as a whole (Supplementary Material, Table S1; median: 0.31 cM/Mb; Plower,all genes< 0.01); however, the tendencies varied broadly according to the tissues in which the genes were expressed (Table 2). For example, genes expressed in the testis had a lower rate tendency (0.14 cM/Mb) than the whole group, but genes in the fetal liver had a higher rate tendency (0.79 cM/Mb). Subgroups with different recombination rate tendencies were also found in groups classified by tissue specificity. 
We have shown that the distribution of genes specifically expressed in skeletal muscle was bi-directionally biased toward both low and high recombination rates (median: 0.57 cM/Mb, Table 1), and here we found that (Table 3), for example, a subgroup of skeletal muscle-specific genes that are involved in primary metabolism and that localize in the nucleus showed lower recombination rates (0.31 cM/Mb) than the whole group, whereas those genes that were integral to membrane showed higher recombination rates (1.35 cM/Mb).
Among all the groups, we found groups tending toward rates that were lower or higher than the randomized rates (Plower,random< 0.01 and Pupper,random< 0.01, respectively, and sorted by median). For convenience, we describe tissue-specific groups separately from the others. Table 4 shows groups that had lower or higher rate tendencies and that were not tissue specific but were assigned to all three GO systems. For example, groups with the lower rate tendency included a group of genes that are involved in the cell cycle (in biological process), that have the function of DNA binding (in molecular function) and that localize in chromosomes (in cellular component). Also, those groups included a group of genes with the GO term combination of transport, cation transporter activity and mitochondrial inner membrane, and that of genes with primary metabolism, general RNA polymerase II transcription factor activity and nucleoplasm. On the other hand, groups with a higher recombination rate tendency included a group of genes with transport, alpha-type channel activity and intrinsic to plasma membrane. Note that these genes are involved in the same biological process (transport) as the earlier-mentioned genes with the lower tendency, but the genes with the lower tendency localize in mitochondrial inner membrane, where energy is produced, whereas the genes with the higher tendency have channel activity in plasma membrane, where cellular information is transmitted as in the nervous system or as in the signal transduction system. The groups with a higher rate tendency also included a group of genes with cell–cell signaling, cation transporter activity and integral to membrane, and that of genes with neurophysiological process, alpha-type channel activity and intrinsic to plasma membrane. 
These results suggest that groups with a lower recombination rate tendency often included genes for intra-cellular vital activities, including metabolism and transcription, whereas groups with a higher tendency often included genes involved in inter-cellular transmission of information such as that mediated by channels, transporters and receptors in plasma membrane.
Regarding groups assigned to both tissue specificity and the GO systems (Table 5), groups with a lower tendency included such genes that were expressed in many tissues or specifically expressed in the testis and thymus and that are involved in macromolecule/primary metabolism, have the function of DNA/ion binding and localize in nucleus.
Relationship between recombination rates and the number of expression tissues
We have shown that genes expressed in many (six or more) tissues tended to have extremely low recombination rates. To investigate this tendency more, we used the number of tissues in which genes were expressed and investigated the relationship between the number of expression tissues (one for tissue-specific genes and more than one for those expressed in multiple tissues) and the degree of recombination rates of genes. We found that as the number of expression tissues increased, the median of recombination rates in a group decreased (Table 6).
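The grouping behind this comparison can be illustrated with a small sketch. The function name and input format are hypothetical, and pooling every gene expressed in six or more tissues into a single bin is an assumption borrowed from the "many tissues" definition above; the binning actually used for Table 6 may differ.

```python
from collections import defaultdict
from statistics import median

def median_rate_by_tissue_count(genes, cap=6):
    """Median within-gene recombination rate per number of expression tissues.
    genes: iterable of (n_tissues, rate_cM_per_Mb) pairs (assumed format).
    Counts >= cap are pooled into one bin (assumption, not from the paper)."""
    groups = defaultdict(list)
    for n_tissues, rate in genes:
        groups[min(n_tissues, cap)].append(rate)
    return {k: median(v) for k, v in sorted(groups.items())}
```

A decreasing sequence of medians from the one-tissue bin to the pooled bin would correspond to the trend reported here.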
| DISCUSSION |
|---|
Recently, the HapMap project (6) and a companion study (8) extensively revealed for the first time a pattern of high-resolution recombination rates along the human genome and found that the rates are on average lower within a gene and increase with distance from the gene symmetrically before decreasing again (8). Another companion study (13) examined LD degrees, which are correlated inversely with recombination rates (6), and revealed that genic regions tend to be located both in regions with strong LD and those with weak LD. That study (13) further found patterns of LD degrees characteristic to groups of genes classified by one of the GO systems (biological process). In the present research, we performed microarray experiments for 25 human tissues to investigate recombination-rate patterns of genes that were classified by expression specificity to the tissues and integrated the tissue-specific classification with the three GO categories (biological process, molecular function and cellular component) as refinable attributes of genes to examine recombination rate patterns.
This investigation led to new discoveries about recombination rate tendencies in respective gene groups. Regarding groups of genes expressed in tissues, genes specifically expressed in the frontal lobe, thymus, testis, fetal brain, many tissues and thyroid tended to have extremely low recombination rates, whereas genes in the brain cerebellum, stomach, lung, bone marrow and brain whole had higher rates. In addition, rates within genes tended to decrease according to the number of tissues in which the genes were expressed. The integrated classification revealed that GO groups with low rate tendencies often involved genes necessary for intracellular vital activities, including metabolism and transcription, whereas groups with a high rate tendency involved genes that play a role in intercellular transmission of information. The classification also revealed the existence of subgroups whose rate tendencies differed greatly from groups classified by only one attribute. For example, even when genes belonged to the same GO group primary metabolism, the tendencies varied greatly depending on the tissues in which the genes were expressed.
One interesting question here is why these rate tendencies differed according to these gene groups. Particularly, of remarkable interest is why frontal lobe-specific and fetal brain-specific genes had low rate tendencies but brain cerebellum-specific and brain whole-specific genes had the opposite, even though they are all brain tissues. Although many factors (e.g. mutation, genetic drift, natural selection and demographic events) can influence the extent of LD (7,15) and can also influence observed recombination rates (via their inverse correlation, generally), the LD study (13) based on GO groups pointed out the effect of natural selection on the formation of different LD tendencies among the groups. Those authors suggested that the reason for the low LD observed in immune genes is that greater haplotype diversity reduces the chance that a single pathogen might sweep through the population (13). They also suggested that the reason for the high LD observed in DNA damage and metabolism genes is that such genes are involved in fundamental biological processes that evolve slowly, and that allelic diversity in the genes could disrupt finely tuned processes (13).
Thus, we think that our observations related to GO groups can be explained by the effect of natural selection. The low recombination rates (corresponding to high LD, generally) in genes for the metabolism of intracellular activities, which we observed here, may be accounted for by the intolerance of disrupting finely tuned processes. However, attention should be paid to another observation that genes for the transcription of intracellular activities had low recombination rates. This is because a comparison of protein-coding regions between human and chimpanzee at least shows a sign of positive selection in genes related to transcription (22), and because it is possible that high LD or observed low recombination rates are caused by positively selected haplotypes with advantageous alleles (7,9,23). Hence, the low rates of transcription genes may be explained by recent positive selection in these genes, though we do not deny the possibility of the intolerance of deleterious change. Regarding the high recombination rates in genes for the intercellular information transmission, the tendency may be explained by greater haplotype diversity for coping with a variety of environments and circumstances that a population encounters.
Similarly, we suggest that the formation of different recombination rate tendencies found in genes expressed in tissues also involves natural selection in some cases. Genes specifically expressed in some types of tissues (frontal lobe, fetal brain, testis, thymus and thyroid) may not allow disruption of finely tuned processes and thus may keep low haplotype diversity resulting in the observed low recombination rates. Another explanation could be that haplotypes in such genes may have recently spread across the population through positive selection, which can also lead to low recombination rates. For genes in some types of tissues (brain cerebellum, brain whole, stomach, lung and bone marrow), it may be advantageous to be located in regions with high recombination rates or to maintain a high diversity of haplotypes. The reason for the tendency toward decreasing rates according to the number of tissues in which the genes were expressed may be that genes that must be expressed in many tissues are selectively constrained to higher degrees (24), and the stronger selective force does not allow alleles that change finely tuned functions. The different recombination-rate tendencies among genes expressed in brain tissues according to regional (e.g. frontal lobe and brain cerebellum) and temporal (e.g. fetal brain and other brain tissues) differences in gene expression may reflect that natural selection works differently according to the phenotypic traits specific to these brain tissues. We think this is important when considering different phenotypic traits between the frontal lobe and brain cerebellum: the frontal lobe plays a role in one of the most characteristic human traits, higher cognitive abilities (language, goal-directed behaviors, personality, self-awareness, inference of the mental states of others) (25), whereas the brain cerebellum has evolutionarily primitive functions shared by all vertebrate species.
In terms of natural selection in genes expressed in human tissues, a recent study compared coding sequences of the genes between chimpanzee and human and found signatures indicating that genes with maximal expression in the testis and thyroid (and to a lesser extent, the thymus) are as a whole exposed to positive selection (26). Interestingly, in our results based on human polymorphic data, testis-, thyroid- and thymus-specific genes all tended to have low recombination rates. There might be a relationship between the signs of positive selection and the low recombination rate tendencies in genes expressed in these tissues; that is, the low recombination rates may be caused by positive selection, as mentioned earlier. Recent positive selection in genes related to spermatogenesis is inferred from the HapMap data (16), which may support this relationship in testis-specific genes. Our results showed that, in addition to genes expressed in these tissues, frontal lobe- and fetal brain-specific genes tended to have low recombination rates, which also might indicate a relationship with recent positive selection in genes expressed in these tissues. Since the frontal lobe is a brain tissue that plays a role in higher cognitive abilities, frontal lobe-specific genes with extremely low recombination rates may be candidates for genes related to cognitive abilities that the human species has recently acquired. Thus, our findings will be an important step for understanding the haplotypes and genes that are exposed to natural selection and therefore are important for the human species.
| MATERIALS AND METHODS |
|---|
Microarray
Human total RNA derived from 25 normal tissues (adrenal gland, bone marrow, cerebellum, colon, fetal brain, fetal liver, frontal lobe, heart, kidney, liver, lung, placenta, prostate, salivary gland, skeletal muscle, small intestine, spinal cord, spleen, stomach, testis, thymus, thyroid, trachea, uterus, whole brain) was purchased from BD Biosciences. From 2 µg of the total RNA, cDNA was synthesized and aRNA was amplified using the Amino Allyl MessageAmp aRNA Amplification Kit (Ambion). A reference sample was prepared by mixing equal amounts of aRNA from each tissue. For each query sample and the reference sample, 5 µg of aRNA was labeled with Cy3 and Cy5, and the two samples were hybridized on the AceGene Human Oligo 30K Chip with 30 336 probes (Hitachi Software Engineering and DNA Chip Research) according to the manufacturer's protocol (http://www.dna-chip.co.jp/thesis/AceGeneProtocol.pdf). Signal intensities were measured with a GenePix 4100A scanner and GenePixPro 4.1 software (Axon Instruments). Normalization was performed with the LOWESS (locally weighted linear regression) subgrid normalization method (27,28), and low-signal spots (S/N ratio < 2.0) were excluded. Two independent replicates were obtained using dye-swap analysis (29) so that outliers could be discarded on the basis of a two-standard-deviation cut in the replicates (30). Signal intensities were averaged over the replicates (30). We have made the microarray data available from the NCBI Gene Expression Omnibus repository (www.ncbi.nih.gov/geo) under accession no. GSE8124.
To check whether we had successfully measured expression levels in the 25 tissues, we performed a hierarchical clustering analysis with GeneSpring 7.2 software (Agilent Technologies) using Pearson's correlation coefficient. The analysis was restricted to genes whose signal intensities were obtained in at least half of all the tissues and that showed a >2-fold change of the log expression ratios in at least one tissue. We obtained almost the same tissue classification (Supplementary Material, Fig. S1) as in previous reports (19,31,32).
Tissue-specific genes
To identify differentially expressed genes, we employed a method of intensity-dependent Z-scores (30) with an additional microarray experiment. In this experiment, we measured expression ratios (reference/reference intensities) from the same reference sample labeled with Cy3 and Cy5 and calculated the means and standard deviations of the logarithm (base 2) of the expression ratios. Using these statistics, we calculated intensity-dependent Z-scores of the log expression ratios (query/reference intensities) for each tissue. From these Z-scores, we calculated P-values and then corrected them for multiple testing by FDR (33). We set the threshold for the corrected P-values at 0.001 and, among genes whose signal intensities were obtained in at least half the tissues, selected those that were significantly highly expressed in only one tissue. We used these highly expressed genes as tissue-specific genes, which are listed in Supplementary Material, Table S3. We did not treat genes whose expression levels were significantly reduced in only one tissue as tissue-specific genes, since it was unclear whether such genes were involved in functions specific to that tissue or in functions commonly performed in the other tissues in which they were expressed to some degree.
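The selection procedure above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the equal-count intensity bins, the normal approximation for the upper-tail P-value, and all function names (`bin_stats`, `z_score`, `upper_tail_p`, `benjamini_hochberg`, `specific_tissue`) are assumptions made for illustration. The FDR step uses the Benjamini-Hochberg procedure of reference (33).

```python
import math
from statistics import mean, stdev

def bin_stats(ref_points, n_bins=2):
    """Mean and SD of log2 ratios per intensity bin, estimated from the
    reference/reference hybridization. ref_points is a list of
    (log2_intensity, log2_ratio); each bin needs at least two spots."""
    ordered = sorted(ref_points)
    size = math.ceil(len(ordered) / n_bins)
    stats = []
    for start in range(0, len(ordered), size):
        chunk = ordered[start:start + size]
        ratios = [ratio for _, ratio in chunk]
        # (upper intensity edge of the bin, local mean, local SD)
        stats.append((chunk[-1][0], mean(ratios), stdev(ratios)))
    return stats

def z_score(log_intensity, log_ratio, stats):
    """Standardize a query/reference log ratio with the statistics of its bin."""
    for upper, mu, sd in stats:
        if log_intensity <= upper:
            return (log_ratio - mu) / sd
    _, mu, sd = stats[-1]  # brighter than every reference spot: use the last bin
    return (log_ratio - mu) / sd

def upper_tail_p(z):
    """One-sided P-value for high expression under a standard normal."""
    return 0.5 * math.erfc(z / math.sqrt(2))

def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted P-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def specific_tissue(adj_p_by_tissue, threshold=0.001):
    """Return the tissue if the gene is significant in exactly one, else None."""
    hits = [t for t, p in adj_p_by_tissue.items() if p < threshold]
    return hits[0] if len(hits) == 1 else None
```

In this sketch a gene significant in two or more tissues is not called tissue-specific, which mirrors the "only one tissue" criterion in the text.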
It should be noted that the term tissue-specific genes as defined here does not refer to genes that were completely unexpressed in all other tissues but to genes that were much more abundantly expressed in one tissue than in any other. In the microarray experiments, because of the two-color fluorescence system, we compared the signal intensities of genes in a tissue sample with those in the mixture sample and then determined at a certain threshold whether or not genes were expressed. Therefore, a tissue-specific gene might have been expressed to some degree in other tissues. For example, a gene specifically expressed in the frontal lobe but not in the whole brain may have had a low concentration of transcription products in the whole brain but a much higher concentration in the frontal lobe.
Recombination rates
We downloaded the datasets of high-resolution recombination rates (phase II) from the HapMap Project (34) website. These datasets contained the recombination rates assigned to chromosomal segments along the full lengths of the chromosomes. For a recombination rate within a gene, we calculated the average of recombination rates across a genic region: $(\sum_i r_i)/l$, where $l$ was the total length of the region and $r_i$ was the recombination rate at a base position $i$ within the region. We used the NCBI annotation file (Build 35) to extract positions of genic regions. To evaluate the degrees of the recombination rates, we randomly shuffled the recombination rates for all of the chromosomal segments in the phase II datasets and randomly assigned the recombination rates to the chromosomal segments. That is, we obtained recombination rates that were randomly remapped along the full lengths of the chromosomes. Using these randomized data, we calculated the average rate within a genic region using the same procedure described earlier. We referred to this rate as a randomized recombination rate within a genic region. The shuffling was performed 100 times (shuffling more than 100 times required unreasonable computation time). We stored the randomized recombination rates within a genic region for all the shufflings.
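The per-gene average and the shuffling procedure described above can be sketched as follows. This is a minimal illustration, not the authors' code: the segment representation (half-open `(start, end, rate)` tuples), the function names and the fixed random seed are assumptions. Bases of a gene not covered by any segment contribute a rate of zero in this sketch.

```python
import random

def genic_rate(segments, gene_start, gene_end):
    """Length-weighted average rate over [gene_start, gene_end).
    segments: list of (start, end, rate) with half-open coordinates."""
    total = 0.0
    for start, end, rate in segments:
        overlap = min(end, gene_end) - max(start, gene_start)
        if overlap > 0:
            total += rate * overlap  # sum of per-base rates inside the gene
    return total / (gene_end - gene_start)

def shuffled_segments(segments, rng):
    """Keep segment coordinates but reassign the rates at random."""
    rates = [rate for _, _, rate in segments]
    rng.shuffle(rates)
    return [(start, end, rate) for (start, end, _), rate in zip(segments, rates)]

def randomized_genic_rates(segments, gene_start, gene_end, n_shuffles=100, seed=0):
    """One randomized genic rate per shuffling (100 shufflings as in the text)."""
    rng = random.Random(seed)
    return [
        genic_rate(shuffled_segments(segments, rng), gene_start, gene_end)
        for _ in range(n_shuffles)
    ]
```

Because the same gene coordinates are reused for every shuffling, the real and randomized rates are always averaged over regions of identical length.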
For each group of genes classified by a criterion such as tissue-specific expression, we compared the real recombination rates within those genic regions with the randomized recombination rates within the same genic regions. In this comparison, the randomized recombination rates were rates that had been averaged over the 100 shufflings. The comparison was based on the same genic regions in order to match the region lengths between the real and randomized recombination rates and to eliminate the possible bias caused by different region lengths. We did not compare the recombination rates for gene groups with fewer than 20 genes. To statistically compare the real and randomized recombination rates for each gene group, we performed the two-sample KS test between the real and randomized recombination rates. We used an upper and a lower one-sided test to see whether the real rates were significantly higher and lower, respectively. We then employed FDR (33) to correct both P-values together for multiple testing across the gene groups. The corrected P-values of the former and latter tests are denoted by $P_{\text{upper,random}}$ and $P_{\text{lower,random}}$, respectively. For each gene group, we also compared the real recombination rates within those genic regions with the real recombination rates within the regions of all genes. We performed the two-sample KS test between these two sets of real recombination rates. We applied the same FDR correction, and the corrected P-values are denoted by $P_{\text{upper,all genes}}$ and $P_{\text{lower,all genes}}$.
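As an illustration of the one-sided two-sample KS comparison, the sketch below computes the two directional KS statistics from empirical distribution functions. The text does not state how the P-values were obtained; the asymptotic approximation $\exp(-2nmD^2/(n+m))$ used here is an assumption, as are the function names. The subsequent FDR correction is omitted.

```python
import math
from bisect import bisect_right

def ks_directional(real, randomized):
    """Return (d_higher, d_lower).
    d_lower  = max over x of F_real(x) - F_rand(x): large when real rates are lower.
    d_higher = max over x of F_rand(x) - F_real(x): large when real rates are higher."""
    real_sorted, rand_sorted = sorted(real), sorted(randomized)
    n, m = len(real_sorted), len(rand_sorted)
    d_higher = d_lower = 0.0
    for x in sorted(set(real_sorted) | set(rand_sorted)):
        f_real = bisect_right(real_sorted, x) / n  # empirical CDF of real rates
        f_rand = bisect_right(rand_sorted, x) / m  # empirical CDF of randomized rates
        d_lower = max(d_lower, f_real - f_rand)
        d_higher = max(d_higher, f_rand - f_real)
    return d_higher, d_lower

def asymptotic_p(d, n, m):
    """Large-sample one-sided P-value approximation for a directional statistic d."""
    return math.exp(-2.0 * n * m / (n + m) * d * d)
```

For small gene groups the asymptotic approximation is rough, so an exact or permutation-based P-value would be preferable in practice.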
To rank the recombination rate of an individual gene considering the randomized set, we calculated $N(r_{\text{gene}} \ge r_{\text{random all}})/N(r_{\text{random all}})$, where $r_{\text{gene}}$ denotes the recombination rate within the region of a gene, $r_{\text{random all}}$ denotes the randomized recombination rates within the regions of all genes for each shuffling and $N$ is the number of randomized recombination rates. Then we averaged this value over the 100 shufflings. We called this score the q-score, which indicates the standardized rank of the rate with reference to the randomized recombination rates of all genes. On the basis of the q-score, we classified genes in a group into Q1 ($0.0 \le q < 0.2$), Q2 ($0.2 \le q < 0.4$), Q3 ($0.4 \le q < 0.6$), Q4 ($0.6 \le q < 0.8$) and Q5 ($0.8 \le q \le 1.0$) bins, which indicate subgroups related to low, mid-low, middle, mid-high and high recombination rates. If the recombination rates of genes in a group are randomly distributed, the genes should be equally distributed in each Q bin, and each bin should have 20% of the genes in the group.
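The q-score and the Q bins can be sketched in a few lines of Python. This is a hedged illustration: it assumes that the score counts the randomized rates that are no greater than the gene's rate (so that a low q corresponds to a low recombination rate), and the function names are hypothetical.

```python
def q_score(r_gene, randomized_by_shuffle):
    """Average, over shufflings, of the fraction of randomized rates
    (all genes in that shuffling) that do not exceed r_gene."""
    fractions = [
        sum(1 for r in rates if r_gene >= r) / len(rates)
        for rates in randomized_by_shuffle
    ]
    return sum(fractions) / len(fractions)

def q_bin(q):
    """Map a q-score in [0, 1] to Q1 (low) ... Q5 (high)."""
    if not 0.0 <= q <= 1.0:
        raise ValueError("q-score must lie in [0, 1]")
    for label, upper in (("Q1", 0.2), ("Q2", 0.4), ("Q3", 0.6), ("Q4", 0.8)):
        if q < upper:
            return label
    return "Q5"  # 0.8 <= q <= 1.0
```

For example, a gene whose rate exceeds or equals half of the randomized rates in every shuffling receives q = 0.5 and falls into Q3.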
Gene ontology
We downloaded the gene2go data file from NCBI and the gene_ontology.obo and gene_association.goa_human files from the GO website (18) to examine GO annotations of genes. For biological process and molecular function, we used the terms at the third (shortest) depth from the respective root terms as informative GO terms. Since terms in the cellular component still seemed unclear at this depth, we used terms at the fourth depth from the root (cellular component) for the cellular component. On the basis of GO, we also defined uncharacterized genes as genes assigned to GO terms with no more than one depth; that is, all genes except those assigned to GO terms at the second depth from the root in each of the three GO systems.
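The notion of the shortest depth of a GO term from its root can be made concrete with a breadth-first search over the ontology graph. The sketch below is illustrative only: it assumes the ontology has already been parsed into a hypothetical `parents` mapping (term to list of parent terms), it counts the root as depth 0, which may differ from the convention used in the text, and the toy identifiers in the usage example are not real GO terms.

```python
from collections import deque

def shortest_depths(parents, root):
    """Shortest number of edges from root to every reachable term.
    parents: dict mapping each term to the list of its parent terms."""
    children = {}
    for term, term_parents in parents.items():
        for parent in term_parents:
            children.setdefault(parent, []).append(term)
    depth = {root: 0}
    queue = deque([root])
    while queue:
        term = queue.popleft()
        for child in children.get(term, []):
            if child not in depth:  # first visit in BFS gives the shortest depth
                depth[child] = depth[term] + 1
                queue.append(child)
    return depth

def terms_at_depth(parents, root, target):
    """Terms whose shortest depth from root equals target."""
    return sorted(t for t, d in shortest_depths(parents, root).items() if d == target)

# Toy ontology: D has two parents, so its shortest depth is 2 (via B), not 3.
toy = {"A": ["root"], "B": ["root"], "C": ["A"], "D": ["C", "B"]}
```

A term with several parents, such as `D` in the toy graph, is assigned the depth of its shortest path, matching the "(shortest) depth" wording above.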
| SUPPLEMENTARY MATERIAL |
|---|
Supplementary Material is available at HMG Online.
| FUNDING |
|---|
This study was supported in part by Research on Toxicogenomics, Health, and Labour Sciences Research Grants from the Ministry of Health, Labour and Welfare, Japan. Funding to pay open access publication charges for this article was provided by RIKEN.
| ACKNOWLEDGEMENTS |
|---|
We thank Takashi Morizono, Hiroto Kawakami, Emi Okutsu and Takahisa Kawaguchi for some of the computer programs used in the analyses.
Conflict of Interest statement. None declared.
| REFERENCES |
|---|
- Kong A., Gudbjartsson D.F., Sainz J., Jonsdottir G.M., Gudjonsson S.A., Richardsson B., Sigurdardottir S., Barnard J., Hallbeck B., Masson G., et al. A high-resolution recombination map of the human genome. Nat. Genet. (2002) 31:241–247.
- Kauppi L., Jeffreys A.J., Keeney S. Where the crossovers are: recombination distributions in mammals. Nat. Rev. Genet. (2004) 5:413–424.
- Stumpf M.P., McVean G.A. Estimating recombination rates from population-genetic data. Nat. Rev. Genet. (2003) 4:959–968.
- Crawford D.C., Bhangale T., Li N., Hellenthal G., Rieder M.J., Nickerson D.A., Stephens M. Evidence for substantial fine-scale variation in recombination rates across the human genome. Nat. Genet. (2004) 36:700–706.
- McVean G.A., Myers S.R., Hunt S., Deloukas P., Bentley D.R., Donnelly P. The fine-scale structure of recombination rate variation in the human genome. Science (2004) 304:581–584.
- The International HapMap Consortium. A haplotype map of the human genome. Nature (2005) 437:1299–1320.
- Ardlie K.G., Kruglyak L., Seielstad M. Patterns of linkage disequilibrium in the human genome. Nat. Rev. Genet. (2002) 3:299–309.
- Myers S., Bottolo L., Freeman C., McVean G., Donnelly P. A fine-scale map of recombination rates and hotspots across the human genome. Science (2005) 310:321–324.
- Abecasis G.R., Ghosh D., Nichols T.E. Linkage disequilibrium: ancient history drives the new genetics. Hum. Hered. (2005) 59:118–124.
- Hinds D.A., Stuve L.L., Nilsen G.B., Halperin E., Eskin E., Ballinger D.G., Frazer K.A., Cox D.R. Whole-genome patterns of common DNA variation in three human populations. Science (2005) 307:1072–1079.
- Dawson E., Abecasis G.R., Bumpstead S., Chen Y., Hunt S., Beare D.M., Pabial J., Dibling T., Tinsley E., Kirby S., et al. A first-generation linkage disequilibrium map of human chromosome 22. Nature (2002) 418:544–548.
- Tsunoda T., Lathrop G.M., Sekine A., Yamada R., Takahashi A., Ohnishi Y., Tanaka T., Nakamura Y. Variation of gene-based SNPs and linkage disequilibrium patterns in the human genome. Hum. Mol. Genet. (2004) 13:1623–1632.
- Smith A.V., Thomas D.J., Munro H.M., Abecasis G.R. Sequence features in regions of weak and strong linkage disequilibrium. Genome Res. (2005) 15:1519–1534.
- Sabeti P.C., Reich D.E., Higgins J.M., Levine H.Z., Richter D.J., Schaffner S.F., Gabriel S.B., Platko J.V., Patterson N.J., McDonald G.J., et al. Detecting recent positive selection in the human genome from haplotype structure. Nature (2002) 419:832–837.
- Tishkoff S.A., Verrelli B.C. Patterns of human genetic diversity: implications for human evolutionary history and disease. Annu. Rev. Genomics Hum. Genet. (2003) 4:293–340.
- Voight B.F., Kudaravalli S., Wen X., Pritchard J.K. A map of recent positive selection in the human genome. PLoS Biol. (2006) 4:e72.446–e72.458.
- Kato M., Sekine A., Ohnishi Y., Johnson T.A., Tanaka T., Nakamura Y., Tsunoda T. Linkage disequilibrium of evolutionarily conserved regions in the human genome. BMC Genomics (2006) 7:326.1–326.8.
- Ashburner M., Ball C.A., Blake J.A., Botstein D., Butler H., Cherry J.M., Davis A.P., Dolinski K., Dwight S.S., Eppig J.T., et al. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat. Genet. (2000) 25:25–29.
- Su A.I., Wiltshire T., Batalov S., Lapp H., Ching K.A., Block D., Zhang J., Soden R., Hayakawa M., Kreiman G., et al. A gene atlas of the mouse and human protein-encoding transcriptomes. Proc. Natl Acad. Sci. USA (2004) 101:6062–6067.
- Walker J.R., Su A.I., Self D.W., Hogenesch J.B., Lapp H., Maier R., Hoyer D., Bilbe G. Applications of a rat multiple tissue gene expression data set. Genome Res. (2004) 14:742–749.
- Maglott D., Ostell J., Pruitt K.D., Tatusova T. Entrez Gene: gene-centered information at NCBI. Nucleic Acids Res. (2007) 35:D26–D31.
- Bustamante C.D., Fledel-Alon A., Williamson S., Nielsen R., Hubisz M.T., Glanowski S., Tanenbaum D.M., White T.J., Sninsky J.J., Hernandez R.D., et al. Natural selection on protein-coding genes in the human genome. Nature (2005) 437:1153–1157.
- Sabeti P.C., Schaffner S.F., Fry B., Lohmueller J., Varilly P., Shamovsky O., Palma A., Mikkelsen T.S., Altshuler D., Lander E.S. Positive natural selection in the human lineage. Science (2006) 312:1614–1620.
- Zhang L., Li W.H. Mammalian housekeeping genes evolve more slowly than tissue-specific genes. Mol. Biol. Evol. (2004) 21:236–239.
- Chayer C., Freedman M. Frontal lobe functions. Curr. Neurol. Neurosci. Rep. (2001) 1:547–552.
- Nielsen R., Bustamante C., Clark A.G., Glanowski S., Sackton T.B., Hubisz M.J., Fledel-Alon A., Tanenbaum D.M., Civello D., White T.J., et al. A scan for positively selected genes in the genomes of humans and chimpanzees. PLoS Biol. (2005) 3:e170.976–e170.985.
- Cleveland W.S. Robust locally weighted regression and smoothing scatterplots. J. Am. Stat. Assoc. (1979) 74:829–836.
- Workman C., Jensen L.J., Jarmer H., Berka R., Gautier L., Nielser H.B., Saxild H.H., Nielsen C., Brunak S., Knudsen S. A new non-linear normalization method for reducing variability in DNA microarray experiments. Genome Biol. (2002) 3:0048.1–0048.16.
- Kerr M.K., Churchill G.A. Statistical design and the analysis of gene expression microarray data. Genet. Res. (2001) 77:123–128.
- Quackenbush J. Microarray data normalization and transformation. Nat. Genet. (2002) 32(suppl):496–501.
- Saito-Hisaminato A., Katagiri T., Kakiuchi S., Nakamura T., Tsunoda T., Nakamura Y. Genome-wide profiling of gene expression in 29 normal human tissues with a cDNA microarray. DNA Res. (2002) 9:35–45.
- Shyamsundar R., Kim Y.H., Higgins J.P., Montgomery K., Jorden M., Sethuraman A., van de Rijn M., Botstein D., Brown P.O., Pollack J.R. A DNA microarray survey of gene expression in normal human tissues. Genome Biol. (2005) 6:R22.1–R22.9.
- Benjamini Y., Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. B (1995) 57:289–300.
- The International HapMap Consortium. The International HapMap Project. Nature (2003) 426:789–796.