Skip Navigation


Human Molecular Genetics Advance Access originally published online on December 21, 2005
Human Molecular Genetics 2006 15(3):481-492; doi:10.1093/hmg/ddi462
This Article
Right arrow Abstract Freely available
Right arrow FREE Full Text (PDF) Freely available
Right arrow Supplementary Material
Right arrow All Versions of this Article:
15/3/481    most recent
ddi462v1
Right arrow Alert me when this article is cited
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in ISI Web of Science
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Right arrow Search for citing articles in:
ISI Web of Science (13)
Right arrowRequest Permissions
Google Scholar
Right arrow Articles by Li, H.
Right arrow Articles by Cui, Y.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Li, H.
Right arrow Articles by Cui, Y.
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?

© The Author 2005. Published by Oxford University Press. All rights reserved.
For Permissions, please email: journals.permissions@oxfordjournals.org

Integrative genetic analysis of transcription modules: towards filling the gap between genetic loci and inherited traits

Hongqiang Li1,2,{dagger}, Hao Chen3,{dagger}, Lei Bao1,2, Kenneth F. Manly2,4,5, Elissa J. Chesler4, Lu Lu2,4, Jintao Wang2, Mi Zhou1,2, Robert W. Williams2,4,6 and Yan Cui1,2,*

1Department of Molecular Sciences, 2Center of Genomics and Bioinformatics, 3Department of Pharmacology, 4Department of Anatomy and Neurobiology, 5Department of Pathology and Laboratory Medicine and 6Department of Pediatrics, University of Tennessee Health Science Center, 858 Madison Avenue, Memphis, TN 38163, USA

* To whom correspondence should be addressed. Tel: +1 9014483240; Fax: +1 9014487360; Email: ycui2{at}utmem.edu

Received June 9, 2005; Revised September 1, 2005; Accepted November 2, 2005


    ABSTRACT
 TOP
 ABSTRACT
 INTRODUCTION
 RESULTS
 DISCUSSION
 REFERENCES
 
Genetic loci that regulate inherited traits are routinely identified using quantitative trait locus (QTL) mapping methods. However, the genotype–phenotype associations do not provide information on the gene expression program through which the genetic loci regulate the traits. Transcription modules are ‘self-consistent regulatory units’ and are closely related to the modular components of gene regulatory network [Ihmels, J., Friedlander, G., Bergmann, S., Sarig, O., Ziv, Y. and Barkai, N. (2002) Revealing modular organization in the yeast transcriptional network. Nat. Genet., 31, 370–377; Segal, E., Shapira, M., Regev, A., Pe'er, D., Botstein, D., Koller, D. and Friedman, N. (2003) Module networks: identifying regulatory modules and their condition-specific regulators from gene expression data. Nat. Genet., 34, 166–176]. We used genome-wide genotype and gene expression data of a genetic reference population that consists of mice of 32 recombinant inbred strains to identify the transcription modules and the genetic loci regulating them. Twenty-nine transcription modules defined by genetic variations were identified. Statistically significant associations between the transcription modules and 18 classical physiological and behavioral traits were found. Genome-wide interval mapping showed that major QTLs regulating the transcription modules are often co-localized with the QTLs regulating the associated classical traits. The association and the possible co-regulation of the classical trait and transcription module indicate that the transcription module may be involved in the gene pathways connecting the QTL and the classical trait. Our results show that a transcription module may associate with multiple seemingly unrelated classical traits and a classical trait may associate with different modules. Literature mining results provided strong independent evidences for the relations among genes of the transcription modules, genes in the regions of the QTLs regulating the transcription modules and the keywords representing the classical traits.


    INTRODUCTION
 TOP
 ABSTRACT
 INTRODUCTION
 RESULTS
 DISCUSSION
 REFERENCES
 
Transcriptome analysis in many organisms has revealed the modular architecture of gene expression programs (1Go–4Go). Transcription modules are ‘self-consistent regulatory units’ that consist of a set of genes co-regulated to respond to different conditions and a corresponding set of conditions that alter expression of all of the genes in the module (1Go,2Go). In this work, we extend this definition by using, instead of a set of environmental conditions, a set of recombinant inbred strains providing different combinations of genetic alleles. Transcription modules have been discovered using microarray data representing a variety of human tumors and a large number of conditions in yeast (1Go,2Go,5Go). The modular structure discovered in the genome-wide gene expression data may be due to the co-regulation of multiple genes by the same modulators that are activated or suppressed under different conditions. Transcription modules are conceptually distinct from clusters of co-expressed gene, because transcription modules represent self-consistent subsets of both genes and conditions such that all conditions of a module affect all genes in the module, and different modules may overlap, sharing some genes and/or conditions (1Go,6Go,7Go).

The recently developed genetical genomics (8Go) approach exploits genotype and microarray data of a segregating population such as F2 intercross or a genetic reference population to identify genetic loci that modulate gene expression (9Go–31Go). The gene expression levels are assayed on a genomic scale using expression microarrays and are treated as quantitative traits. Quantitative trait locus (QTL) mapping methods are then used to identify chromosomal intervals in which DNA variations are responsible for the variations in gene expression levels across the population. Several data sets containing tens of thousands of gene expression traits have been published (9Go,11Go,12Go,14Go–16Go). Many expression traits are highly correlated, reflecting complex patterns of the co-regulation.

In this work, we used the genotype and microarray data of a genetic reference population that consists of mice of 32 BXD recombinant inbred strains to study the gene expression traits. Because this is a reference population, a database consisting of several hundred physiological and behavioral traits studied in these strains was developed (32Go) and could be readily integrated with the expression data. The BXD strains were originally derived from two progenitor strains, C57BL/6J and DBA/2J. The recombinant progeny of this cross was inbred; therefore, mapping studies could be performed by simply analyzing the relationship of strain phenotypes with a database of genotypes. We identified 29 transcription modules; many of them are associated with the classical physiological and/or behavioral traits assayed using BXD mice. A signature was defined for each module and was mapped as quantitative trait to identify QTLs regulating the module. We found many significant QTLs regulating transcription modules are co-localized with QTLs regulating classical traits. Evidences extracted from the PubMed literature database using text-mining techniques confirmed many of the associations we discovered among the module genes, the classical traits and the genes in the QTL intervals. The genetic analysis of transcription modules provides insights into the gene expression programs involved in a wide range of physiological and behavioral traits, including individual differences in response to cocaine (33Go) and ethanol (34Go,35Go), acoustic startle response (36Go), high-pressure seizure susceptibility (37Go,38Go), immune responses (39Go,40Go), age-related hearing loss (41Go) and vulnerability to drug abuse (42Go).


    RESULTS
 TOP
 ABSTRACT
 INTRODUCTION
 RESULTS
 DISCUSSION
 REFERENCES
 
We identified 29 transcription modules defined by genetic variations in the genetic reference population using the iterative signature algorithm (ISA) (1Go). The transcription modules with prominent QTLs [log of odds (LOD) ≥3.0] and statistically significant associations with classical traits (false discovery rate ≤0.2) are listed in Table 1. Detailed information, including the genes and the strains of each module, the overlaps between the modules with respect to the genes, the associated classical traits and the genome-wide interval mapping results, can be found in Supplementary Material (Tables S1–S3). All of the 29 modules were mapped to at least one QTL with LOD≥2.1, and 21 modules have at least one QTL with LOD≥3.0. Altogether, we identified 33 QTLs with LOD≥3.0 (Supplementary Material, Table S2).


View this table:
[in this window]
[in a new window]
 
Table 1. Transcription modules, the QTLs modulating the modules and the associated classical traits
 
We found many statistically significant associations between the transcription modules and the classical traits. Eighteen classical traits are associated with 16 modules with Pearson correlation coefficients larger than 0.65 and q-values (estimates of false discovery rate) (43Go) less than 0.2 (Supplementary Material, Table S3). The associated classical traits are related to a broad range of disorders from cocaine and alcohol abuse (33Go–35Go) to hearing loss (36Go,41Go). The associations between the transcription modules and the classical traits may provide insights into the molecular basis of the classical physiological and behavioral traits. Here, we use two examples to illustrate our results (Figs 1 and 2).


Figure 4621
Figure 4621
Figure 4621
Figure 4621
Figure 4621
Figure 4621
Figure 4621
View larger version (197K):
[in this window]
[in a new window]
 
Figure 1. Genome-wide interval mapping result for: (A) classical trait—‘cocaine activity—exploratory behavior as measured by nosepokes through a holeboard, 30 mg/kg cocaine ip (difference from saline)—males’, (B) classical trait—‘cocaine activity—exploratory behavior as measured by nosepokes through a holeboard, 45 mg/kg cocaine ip (difference from saline)—females’, (C) average of the above two traits, (D) module 2 and (E) module 19. The solid blue line represents LRS (likelihood ratio statistic), the red line shows the additive regression coefficient (positive indicates that DBA/2J alleles increase trait value and negative indicates that C57BL/6J alleles increase trait values), the blue (green) horizontal line represents genome-wide 5% (63%) significant level based on 2000 permutation re-samples of the original data. The yellow bars show the frequency of the peak LRS at a given location among 2000 bootstrap re-samples. Literature mining result for (F) module 2 and (G) module 19. Relations among module genes, QTLs and classical traits associated with the module as retrieved by literature mining. Red boxes: genes of the module. Green boxes: genes located in the QTL intervals of the module. Cyan boxes: genes that belong to the module and are also located in the QTL intervals of the module. Gray boxes: keywords for the classical traits associated with the module. Gray circles: strong relations (e.g. interaction) were documented between the nodes. Gray diamonds: weak relations (e.g. co-occurrence in the same abstract or the same sentence, but no interaction was detected) were documented between the nodes. A clickable version of the graph is included in Supplementary Material. The documents on which a link is based can be displayed by clicking the link.

 

Figure 4622
Figure 4622
Figure 4622
Figure 4622
Figure 4622
Figure 4622
View larger version (169K):
[in this window]
[in a new window]
 
Figure 2. Genome-wide interval mapping result for: (A) transcription module 10, (B) classical trait—‘T cell receptor expression, V-gamma-7 positive, V-gamma-4 negative, percentage of total gamma-delta intestinal intraepithelial lymphocytes’, (C) classical trait—‘seizure susceptibility to high atmospheric pressure at 1000 atmospheres (measured at a.t.m. which seizure occurred)’, (D) classical trait—‘anergy measured by footpad thickness resulting from BCG injection (mm)’, (E) classical trait—‘seizure threshold pressure at compression rate of 1000 a.t.m. hr-1 (1000iPc) (a.t.m.)’. The solid blue line represents LRS (likelihood ratio statistic), the red line shows the additive regression coefficient (positive indicates that DBA/2J alleles increase trait value and negative indicates that C57BL/6J alleles increase trait values), the blue (green) horizontal line represents genome-wide 5% (63%) significant level based on 2000 permutation re-samples of the original data. The yellow bars show the frequency of the peak LRS at a given location among 2000 bootstrap re-samples. (F) Literature mining result for module 10. Relations among module genes, QTLs and classical traits associated with the module as retrieved by literature mining. Red boxes: genes of the module. Green boxes: genes located within the QTL intervals of the module. Gray boxes: keywords for the classical traits associated with the module. Gray circles: strong relations (e.g. interaction) were documented between the nodes. Gray diamonds: weak relations (e.g. co-occurrence in the same abstract or the same sentence, but no interaction was detected) were documented between the nodes. IGH, immunoglobulin heavy chain gene; TCR, T cell receptor; DTH, delayed type hypersensitivity. H2-D1: histocompatibility 2, D region locus 1. A clickable version of the graph is included in Supplementary Material. The documents on which a link is based can be displayed by clicking the link.

 
Physiological and behavioral traits are often complex, polygenic and are regulated by more than one QTL, indicating the involvement of multiple pathways. We found that module 2 (containing 13 genes) and module 19 (containing 14 genes), although not overlapping in terms of genes, are both associated with cocaine-inhibited exploratory behavior (as measured by nosepokes through a holeboard, different gender and dosage of cocaine are involved) (33Go) (Table 1). Genome-wide interval mapping shows four QTLs for this trait, two on chromosome 2 (at 123.1 and 150 Mb) and two on chromosome 7 (at 74.4 and 106.9 Mb) (Fig. 1A–C). Strikingly, module 2 has a QTL on chromosome 2 (at 155.1 Mb) and module 19 has a QTL on chromosome 7 (at 77.5 Mb), each co-localizes with one of the four QTLs for the cocaine trait (Fig. 1D and E). This suggests that the two modules may be involved in two different pathways underlying the response to cocaine, each regulated by a QTL on chromosomes 2 (at 155.1 Mb) and 7 (at 77.5 Mb), respectively. We searched the PubMed database for the known relations among the module genes, the associated trait and the genes located within the QTL regions of the modules. Automated literature mining was performed using Chilibot (44Go), and the results were then manually curated to eliminate false positives. The literature mining result for module 2 and module 19 is shown in Figure 1F and G, respectively. The literature mining result for module 2 revealed many documented relations among the module genes, the genes in the 1-LOD interval of the QTL regulating the module and the keywords representing the associated traits. It is well known that cocaine blocks the re-uptake of dopamine (45Go), a critical neurotransmitter for the reward system, into presynaptic neurons. An increased dopamine concentration due to the reduced re-uptake is responsible for cocaine's pleasurable effects and is one of the key mechanisms in the development of drug dependence. Parvalbumin (PVALB), a gene in module 2, is a high affinity calcium-binding protein and is expressed by a subset of neurons that also express dopamine receptors (46Go,47Go). Many of these neurons also express gamma-amino butyric acid (GABA) (47Go), an inhibitory neurotransmitter regulating the balance of excitatory and inhibitory interactions in brain regions involved in addiction. An increased number of parvalbumin neurons was detected in the brain of cocaine-sensitized animals (48Go). Thus, it has already been established that parvalbumin is involved in the action of cocaine. VIAAT is a transporter for GABA and is expressed by parvalbumin expression neurons (49Go). SRC is a protein kinase and is associated with the dopamine receptor (50Go). NFIA is a transcription factor required for the expression of a GABA receptor subunit (51Go). SKI is a transcription factor known to be important for neurons. In addition, the literature mining shows that E2F1, a transcription factor located in the QTL region of module 2, is known to regulate the expression of SRC (52Go,53Go) and is critical for the action of dopamine (54Go). Many other indications of relations are also included in Figure 1F. A clickable version of this graph is included in Supplementary Material. The documents on which a link is based can be displayed by clicking the link. Thus, the literature mining results provide support to the hypothesis that module 2 and its QTL in chromosome 2 may be involved in the gene expression program underlying the response to cocaine. Literature mining also shows that inhibition of FOLH1 (NAALADase), a gene of module 19, which is also located in the 1-LOD interval of the QTL (in chromosome 7) of module 19, attenuates the development of sensitization to the locomotor-activating effects of cocaine (55Go) (Fig. 1G). This result together with the statistically significant association between the module 19 and the cocaine activity traits suggests that module 19 is also involved in the response to cocaine.

A module may be associated with multiple traits, some of which are seemingly unrelated. Four traits are associated with module 10; two are related to seizure susceptibility to high atmospheric pressure and the other two are related to immune responses (Table 1). All of the four traits and module 10 are mapped to a QTL located in chromosome 17 at 31.5–41.9 Mb (Fig. 2), indicating that module 10 may be involved in the gene expression program underlying the four traits. Literature mining identified many known relations among the genes of module 10, the genes in the QTL interval of module 10 and the keywords of the four traits. One of the traits associated with module 10 is anergy, a state of immunological unresponsiveness to a specific antigen (56Go). One of the experimental indications of anergy in mice is the lack of response when they were exposed to sheep erythrocytes (a specific antigen) for a second time, whereas a normal immune system will respond with delayed-type hypersensitivity (DTH). CD28, the product of a gene of module 10, is well recognized for its role as a co-stimulatory signal for T cells. Typically, activation of T cell receptor in the absence of CD28 co-stimulation will result in anergy (57Go,58Go). Alternatively, activation of an inhibitory ligand on T cells, CTLA-4, can also maintain T cells in a non-reactive state (59Go). Interestingly, T cells can be released from this anergic state by activating STAT4 (60Go), another member of module 10. CD28 also has complex relations with other genes associated with module 10: CD28 is required to render STAT4 signal transduction and nuclear translocation in response to IL-12 stimulation (61Go); GATA1, a transcription factor, is regulated by activating CD28 (62Go) and TNFRSF21, a member of the tumor necrosis factor receptor family, located in the 1-LOD interval of the QTL of module 10, suppresses the expression of CD28 in activated T cells (63Go). Another phenotype associated with module 10 is susceptibility to seizure. Among the genes located in the 1-LOD interval of the QTL of module 10, platelet-activating factor acetylhydrolase (PLA2G7, phospholipase A2, group VII) encodes a secreted enzyme that catalyzes the degradation of platelet-activating factor. PLA2G7 activity was markedly reduced as early as 30 min, following initiation of seizures in rats (64Go). In humans, lissencephaly patients suffer from recurrent seizures. Coincidentally, ~60% of the lissencephaly patients have been found to have deletions or mutations of the LIS1 gene, which encodes a subunit of PLA2G7 (65Go). In addition, OPSIN 5 (OPN5, neuropsin) is a trypsin-like serine protease expressed in the hippocampus and associated limbic structures. OPN5 limits neuronal hyperexcitability. Seizure activity on kainic acid administration was markedly increased in mutant mice that lack OPN5 (66Go). The QTL mapping results for other modules and their associated classical traits can be found in the Supplementary Material. Many associations among the module genes, the QTL genes and the classical traits are confirmed by the literature mining.


    DISCUSSION
 TOP
 ABSTRACT
 INTRODUCTION
 RESULTS
 DISCUSSION
 REFERENCES
 
Transcriptome analysis of genetic reference populations provides a great opportunity to gain insights into the gene expression programs underlying classical traits. In this work, we extended the definition of transcription module and showed that transcription modules defined by genetic variations can be identified using genome-wide gene expression data of a genetic reference population. This method is readily applied to other natural or experimental populations. As we discussed in Materials and Methods, the ISA showed better performance when the size of the genetic reference population increased. Therefore, it will be especially useful for large genetic reference populations, such as the collaborative mouse cross proposed by the complex trait consortium (67Go) and large RI panels of other organisms, and for segregating crosses such as F2 intercross and backcross which often contain a large number of individuals.

In many cases, the QTL may act in a time and space distal to the phenotype. For example, QTLs affecting early developmental gene transcription may have influenced the neuron number. Whereas the difference in neuron number is observable in adulthood, the early expression modules that were activated are not. Besides, the small size of the RI set and the incomplete coverage of the genome by the Affymetrix U74Av2 microarrays limit the power to detect transcription modules. Obviously, the 29 transcription modules identified are not all the transcription modules operating in mouse brain sometime during the whole developmental process. It is therefore not surprising that many classical traits do not associate with one of the identified modules. We expect that more transcription modules and associations can be found if larger RI sets are used, and the expression traits are measured at multiple time points during brain development.

QTL mapping is a powerful method for discovering genotype–phenotype relations. It also provides an entry point to address two challenging problems. One is to identify the causative variants in the QTL region and the other is to identify the ‘flow of causality’, i.e. the gene pathway from the QTL to the trait. Previous studies showed that transcription modules are closely related to the modular components of the gene circuits (1Go,2Go). In this work, the discovery of transcription modules from gene expression traits of a genetic reference population makes it possible to identify the genetic loci that may regulate both the transcription modules and the classical traits. The co-regulation, together with evidences from literature mining, indicates that some transcription modules may be involved in the pathways regulating the classical phenotypes. In many cases (e.g. module 2 and module 19), literature mining also found that the genes in the QTL regions regulating the transcription module have known biological roles in the regulation of the classical traits. The integrative genetic, genomic and computational approach is useful for identifying candidates for the genes involved in the pathways underlying the inherited traits. The result provides an anchor point for further studies of the structure and the properties of these gene pathways.

MATERIALS AND METHODS
Genotype, microarray and classical trait data of the genetic reference population
The mRNA abundances in forebrains of the mice of the 32 BXD strains were measured using Affymetrix MG-U74Av2 gene chips. The genotypes of the 32 BXD strains have been characterized using 779 markers. Microarray and genotype data used in this study are available from WebQTL (68Go) at the GeneNetwork website (http://www.genenetwork.org). In order to study the relations between transcription modules and classical physiological and behavioral traits, we selected a set of 578 published classical traits with quantitative values of at least 15 BXD strains from a database of published BXD traits (32Go).

Modular regulation of the gene expression traits
In light of the modular structures of gene expression programs, we investigated whether modular transcription units exist in the gene expression traits of the genetic reference population. It should be noted that samples from different individuals were obtained under the same experimental condition to minimize environmental variances. Here, it is the naturally occurring genetic variations, not the experimental conditions, that influence the gene expression. We defined the transcription module for expression traits as a set of genes co-regulated by allelic variations in upstream modifier loci or QTLs.

Experimental populations for genetic research are often generated from two inbred progenitors. In the case of recombinant inbred strains, each QTL has two possible genotypes, g1 and g2. However, experimental populations can also be constructed from more than two progenitors. For example, the 1000 eight-way RI mouse strains proposed by the complex trait consortium have eight common progenitors (67Go). The linear model for the QTL effects (69Go) on the transcript expression traits can be written as

Formula 11
where yij is the value of trait i (i.e. the expression level of gene i) in individual j, µi is the population mean of the trait i, m is the number of QTLs influencing gene expression, n is the number of possible alleles for each locus, bipk represents the effect of QTL k (with genotype gp) on trait i, eij represents the non-genetic effects modeled by a random variable and G an indicator function,

Formula 22
We let µi=0 because the traits were normalized to zero mean. Equation (1) can be written as

Formula 33
where (Mk)ij={sum}p=1nbipkGpjk. Thus, the trait matrix can be decomposed into m potential modules. Each potential module is associated with the additive genetic effect of a QTL and is represented by a matrix Mk. Suppose the genotype of the QTL k in individual j is gw, according to Eq. (3), (Mk)ij=biwk. This means that if QTL k does not affect trait i, then (Mk)ij=biwk=0. Suppose there are two possible genotypes for QTL k, the module includes all the traits on which the two alleles have significantly different effects, for example, bi1k>>bi2k. Then, the genes of the module express at a baseline level under the influence of one allele (g2) and express at an elevated level under the influence of another allele (g1). The transcription module consists of a group of genes and the individuals (or RI strains) in which the genes are co-regulated. A schematic module matrix (Mk) is shown in Figure 3. As shown in Eq. (3), the trait matrix is the sum of these module matrices.


Figure 4623
View larger version (51K):
[in this window]
[in a new window]
 
Figure 3. Schematic demonstration of a transcription module. The shadowed genes and strains belong to the transcription module.

 
Identifying transcription modules
We used the ISA (1Go,6Go) to identify transcription modules. We first tested the performance of ISA when applied to gene expression traits using simulated data and selected parameters for optimal performance. ISA starts with a set of input genes and iteratively optimizes a gene score vector and a condition score vector calculated using two normalized expression matrices, respectively, to identify a submatrix of co-expression where the ISA converges. We applied the ISA to the gene expression trait data, which is also a gene expression matrix where the rows represent genes and the columns represent samples from different RI strains. However, two differences in the nature of the data make the application complicated. The first is about the dimensions of the expression matrix. It has been shown that ISA was able to discover almost all the modules in the simulated data, which is a 1050 (genes) by 1000 (conditions) expression matrix (1Go). For our gene expression trait matrix, the dimensions are 1446 (traits) by 32 (RI strains). Therefore, we need to test whether ISA works well for the matrix of very asymmetric dimensions. Secondly, a regulator-induced transcription module consists of the conditions under which the regulator is activated. Such conditions usually represent a small portion of all the conditions of the expression matrix, whereas a QTL-induced transcription module includes the RI strains in which one allele of the QTL elevates the expression levels of the genes in the module. There are only two possible alleles for a QTL in the BXD RI strains, so each genotype often occurs in about half of the strains. This will lead to many more overlaps between modules along this dimension (strain) of the expression trait matrix. We tested ISA's ability of discovering the modules using a simulated expression trait matrix. The simulated matrix was generated using the method of Ihmels et al. (1Go),

Formula 44
where T is a 1050x32 matrix and n is the number of QTLs, here we let n=25,

Formula 5
gx represents the genotype (of the QTL) that elevates the expression levels, and eij is the random noise with a normal distribution of mean µ=0 and standard deviation {sigma}=0.3.

The performance of ISA is mainly controlled by two parameters, tG and tC, which are the thresholds for selecting the traits and strains of a module iteratively (1Go). Larger tG and tC lead to smaller modules with more stringent co-expression, whereas smaller tG and tC lead to larger modules with less stringent co-expression (6Go). We applied ISA to the simulated gene expression trait data and tested 410 pairs of tG and tC, within the intervals tGisin[1.0, 5.0] and tCisin[0.1, 1.0] to identify the best combination of the parameters. Two criteria, sensitivity and false discovery rate, were used for evaluating the performance of ISA and for selecting parameters for the best performance (Supplementary Material, Table S4). The two criteria are defined as: sensitivity=TP/(TP+FN) and false discovery rate=FP/(TP+FP), where TP is the number of true positives, FN the number of false negatives and FP the number of false positives. For our simulated data, the performance of ISA was less sensitive to tG than to tC. We selected four pairs of tG and tC in the optimal region (with high sensitivity and low discovery rate) of the parameter space, which are tG=2.0, 2.5, 3.0, 3.5 and tC=0.4. The ISA starts with an initial input set of genes. We randomly generated 2000 initial input sets. The size of the initial input set ranges from 30 to 75. The tentative modules discovered by using the four sets of parameters were fused to form final modules (1Go). First, we let each gene be controlled by one QTL. This can be set up by placing only one ‘1’ in each row of the matrix A. ISA correctly identified all the 25 modules (true positive=25) and only the 25 modules (false positive=0) in nine out of 10 independent runs. In the other run, 24 modules were correctly identified and there was no false positive. Then, we let each gene be controlled by two QTLs. Thus, there are two ‘1’s in each row of the matrix A. On average, ISA correctly identified 17.1 modules in 10 independent runs. The decrease in accuracy is due to the overlaps between the modules in terms of genes. This trend was observed in the application of ISA to a simulated gene expression matrix, where the total number of genes and conditions is comparable, but the decrease in accuracy was slower. This indicates that using larger number of RI strains (i.e. larger genetic reference population) will increase the power of ISA for detecting modules in the gene expression traits. However, even for a small number of strains as we used to generate the simulated data, the average false positive is only 0.8. Thus, the false discovery rate is low ({approx}0.047) for the selected parameters.

The algorithm should not discover transcription modules when there is actually no module in the expression matrix. We generated 100 random matrices by permuting the simulated data matrix that we used to test the parameters. ISA did not identify any module from these random matrices.

In the application of ISA to the real gene expression trait data, we filtered the data set to exclude the genes whose expression levels are not significantly influenced by any QTL. We selected 1466 genes that mapped to at least one QTL with an LOD larger than 3. The filtered expression matrix E contains the expression values of the selected genes in the forebrain samples from the 32 RI strains. We then generated two trait matrices consisting of log-transformed relative gene expression levels,

Formula 7
where Eij is the expression level of gene i in the sample from RI strainj and Eip1 and Eip2 are the expression levels of gene i in the sample from the two progenitor strains P1 and P2 (C57BL/6J and DBA/2J). Thus, the modules of y1 and y2 consist of genes whose expression levels are elevated by alleles from progenitors P1 and P2, respectively.

Mapping QTLs for transcriptional modules
We defined a module signature to represent each transcription module and mapped it as a quantitative trait. Each module has 32 signature scores, corresponding to the 32 strains. The signature score is defined as the average of the normalized trait values (each trait was normalized to zero mean and the length of the vector of the trait values was normalized to one) of the module genes in a strain and is equivalent to the condition score used by ISA (1Go). Interval mapping (70Go,71Go) was used to map QTLs for the transcription modules.

Searching for associations between transcription modules and classical BXD traits
The strength of the association between a module and a classical trait is measured by the Pearson correlation between the trait value and the module signature score. The associations should be evaluated for statistical significance as well as strength. Non-zero correlation between two variables can be tested against the null hypothesis that there is no linear correlation between the two variables by using t-statistic (72Go). The t-statistics was defined as t=r{surd}((N–2)/(1–r2)), where r is the correlation between the values of the classical trait and the module signature and N is the number of strains. The P-values associated with t were calculated using permutation. In the permutation, the trait values were permuted among the RI strains. This procedure is intended to destroy any association trait values and module signature values and allows us to calculate an empirical null distribution for the t-statistic. We then calculated the q-values and false discovery rates using the software Q-VALUE (43Go).

Mining literature for known associations among the module genes, the QTL genes and the classical traits
The relations documented in the PubMed literature database between genes within each module and the one-LOD intervals of QTLs regulating the module and the keywords representing the classical traits were analyzed using a text-mining software, Chilibot (44Go). Keywords were selected from the text descriptions of the classical BXD traits. Gross anatomical entities were excluded to increase specificity. Chilibot automatically expands the supplied gene symbols to include its synonyms. It then queries PubMed and retrieves relevant records. The texts from titles and abstracts are analyzed using natural language processing techniques, including part-of-speech tagging and shallow parsing (44Go). A set of predefined rules is then applied to identify term–term relations. The relations are presented using graphs that contain links to the text (sentences or abstracts) providing the information. The results were manually curated to eliminate the occasional false positive findings.

SUPPLEMENTARY MATERIAL
Supplementary Material is available at HMG Online.

ACKNOWLEDGEMENTS
We thank Byron C. Jones for the contribution and interpretation of BXD phenotypes for cocaine effects on exploratory behavior. This work was supported by a PhRMA Foundation grant and NIH grants U01AA014425, U01AA13499, R01HD052472 and MH-62009.

Conflict of Interest statement. None declared.


    FOOTNOTES
 
{dagger} The authors wish it to be known that, in their opinion, the first two authors should be regarded as joint First Authors. Back


    REFERENCES
 TOP
 ABSTRACT
 INTRODUCTION
 RESULTS
 DISCUSSION
 REFERENCES
 

  1. Ihmels, J., Friedlander, G., Bergmann, S., Sarig, O., Ziv, Y. and Barkai, N. (2002) Revealing modular organization in the yeast transcriptional network. Nat. Genet., 31, 370–377.[CrossRef][ISI][Medline]

  2. Segal, E., Shapira, M., Regev, A., Pe'er, D., Botstein, D., Koller, D. and Friedman, N. (2003) Module networks: identifying regulatory modules and their condition-specific regulators from gene expression data. Nat. Genet., 34, 166–176.[CrossRef][ISI][Medline]

  3. Bergmann, S., Ihmels, J. and Barkai, N. (2004) Similarities and differences in genome-wide expression data of six organisms. PLoS Biol., 2, E9.[CrossRef][Medline]

  4. Tanay, A., Sharan, R., Kupiec, M. and Shamir, R. (2004) Revealing modularity and organization in the yeast molecular network by integrated analysis of highly heterogeneous genomewide data. Proc. Natl Acad. Sci. USA, 101, 2981–2986.[Abstract/Free Full Text]

  5. Segal, E., Friedman, N., Koller, D. and Regev, A. (2004) A module map showing conditional activity of expression modules in cancer. Nat. Genet., 36, 1090–1098.[ISI][Medline]

  6. Bergmann, S., Ihmels, J. and Barkai, N. (2003) Iterative signature algorithm for the analysis of large-scale gene expression data. Phys. Rev. E, 67, 031902.[CrossRef]

  7. Kloster, M., Tang, C. and Wingreen, N.S. (2004) Finding regulatory modules through large-scale gene-expression data analysis. Bioinformatics.

  8. Jansen, R.C. and Nap, J.P. (2001) Genetical genomics: the added value from segregation. Trends Genet., 17, 388–391.[CrossRef][ISI][Medline]

  9. Brem, R.B., Yvert, G., Clinton, R. and Kruglyak, L. (2002) Genetic dissection of transcriptional regulation in budding yeast. Science, 296, 752–755.[Abstract/Free Full Text]

  10. Wayne, M.L. and McIntyre, L.M. (2002) Combining mapping and arraying: an approach to candidate gene identification. Proc. Natl Acad. Sci. USA, 99, 14903–14906.[Abstract/Free Full Text]

  11. Schadt, E.E., Monks, S.A., Drake, T.A., Lusis, A.J., Che, N., Colinayo, V., Ruff, T.G., Milligan, S.B., Lamb, J.R., Cavet, G. et al. (2003) Genetics of gene expression surveyed in maize, mouse and man. Nature, 422, 297–302.[CrossRef][Medline]

  12. Morley, M., Molony, C.M., Weber, T.M., Devlin, J.L., Ewens, K.G., Spielman, R.S. and Cheung, V.G. (2004) Genetic analysis of genome-wide variation in human gene expression. Nature, 430, 743–747.[CrossRef][Medline]

  13. Monks, S.A., Leonardson, A., Zhu, H., Cundiff, P., Pietrusiak, P., Edwards, S., Phillips, J.W., Sachs, A. and Schadt, E.E. (2004) Genetic inheritance of gene expression in human cell lines. Am. J. Hum. Genet., 75, 1094–1105.[CrossRef][ISI][Medline]

  14. Bystrykh, L., Weersing, E., Dontje, B., Sutton, S., Pletcher, M.T., Wiltshire, T., Su, A.I., Vellenga, E., Wang, J., Manly, K.F. et al. (2005) Uncovering regulatory pathways that affect hematopoietic stem cell function using ‘genetical genomics’. Nat. Genet., 37, 225–232.[CrossRef][ISI][Medline]

  15. Chesler, E.J., Lu, L., Shou, S., Qu, Y., Gu, J., Wang, J., Hsu, H.C., Mountz, J.D., Baldwin, N.E., Langston, M.A. et al. (2005) Complex trait analysis of gene expression uncovers polygenic and pleiotropic networks that modulate nervous system function. Nat. Genet., 37, 233–242.[CrossRef][ISI][Medline]

  16. Hubner, N., Wallace, C.A., Zimdahl, H., Petretto, E., Schulz, H., Maciver, F., Mueller, M., Hummel, O., Monti, J., Zidek, V. et al. (2005) Integrated transcriptional profiling and linkage analysis for identification of genes underlying disease. Nat. Genet., 37, 243–253.[CrossRef][ISI][Medline]

  17. Yvert, G., Brem, R.B., Whittle, J., Akey, J.M., Foss, E., Smith, E.N., Mackelprang, R. and Kruglyak, L. (2003) Trans-acting regulatory variation in Saccharomyces cerevisiae and the role of transcription factors. Nat. Genet., 35, 57–64.[ISI][Medline]

  18. Li, H., Lu, L., Manly, K.F., Chesler, E.J., Bao, L., Wang, J., Zhou, M., Williams, R.W. and Cui, Y. (2005) Inferring gene transcriptional modulatory relations: a genetical genomics approach. Hum. Mol. Genet., 14, 1119–1125.[Abstract/Free Full Text]

  19. Hoeschele, I. and Bing, N. (2005) Genetical genomics analysis of a yeast segregant population for transcription network inference. Genetics, 170, 533–542.[Abstract/Free Full Text]

  20. Zhu, J., Lum, P.Y., Lamb, J., GuhaThakurta, D., Edwards, S.W., Thieringer, R., Berger, J.P., Wu, M.S., Thompson, J., Sachs, A.B. et al. (2004) An integrative genomics approach to the reconstruction of gene networks in segregating populations. Cytogenet. Genome Res., 105, 363–374.[CrossRef][ISI][Medline]

  21. Brem, R.B. and Kruglyak, L. (2005) The landscape of genetic complexity across 5,700 gene expression traits in yeast. Proc. Natl Acad. Sci. USA, 102, 1572–1577.[Abstract/Free Full Text]

  22. Doss, S., Schadt, E.E., Drake, T.A. and Lusis, A.J. (2005) Cis-acting expression quantitative trait loci in mice. Genome Res., 15, 681–691.[Abstract/Free Full Text]

  23. Schadt, E.E., Lamb, J., Yang, X., Zhu, J., Edwards, S., Guhathakurta, D., Sieberts, S.K., Monks, S., Reitman, M., Zhang, C. et al. (2005) An integrative genomics approach to infer causal associations between gene expression and disease. Nat. Genet., 37, 710–717.[CrossRef][ISI][Medline]

  24. de Koning, D.J. and Haley, C.S. (2005) Genetical genomics in humans and model organisms. Trends Genet., 21, 377–381.[CrossRef][ISI][Medline]

  25. de Koning, D.-J., Carlborg, O. and Haley, C.S. (2005) The genetic dissection of immune response using gene-expression studies and genome mapping. Vet. Immunol. Immunopathol., 105, 343.[Medline]

  26. Li, J. and Burmeister, M. (2005) Genetical genomics: combining genetics with gene expression analysis. Hum. Mol. Genet., 14, R163–R169.[Abstract/Free Full Text]

  27. Storey, J.D., Akey, J.M. and Kruglyak, L. (2005) Multiple locus linkage analysis of genomewide expression in yeast. PLoS Biol., 3, e267.[CrossRef][Medline]

  28. Ronald, J., Brem, R.B., Whittle, J. and Kruglyak, L. (2005) Local regulatory variation in Saccharomyces cerevisiae. PLoS Genet., 1, e25.

  29. Brem, R.B., Storey, J.D., Whittle, J. and Kruglyak, L. (2005) Genetic interactions between polymorphisms that affect gene expression in yeast. Nature, 436, 701–703.[CrossRef][Medline]

  30. Kirst, M., Myburg, A.A., De Leon, J.P., Kirst, M.E., Scott, J. and Sederoff, R. (2004) Coordinated genetic regulation of growth and lignin revealed by quantitative trait locus analysis of cDNA microarray data in an interspecific backcross of eucalyptus. Plant Physiol., 135, 2368–2378.[Abstract/Free Full Text]

  31. Mehrabian, M., Allayee, H., Stockton, J., Lum, P.Y., Drake, T.A., Castellani, L.W., Suh, M., Armour, C., Edwards, S., Lamb, J. et al. (2005) Integrating genotypic and expression data in a segregating mouse population to identify 5-lipoxygenase as a susceptibility gene for obesity and bone traits. Nat. Genet., 37, 1224–1233.[CrossRef][ISI][Medline]

  32. Chesler, E.J., Wang, J., Lu, L., Qu, Y., Manly, K.F. and Williams, R.W. (2003) Genetic correlates of gene expression in recombinant inbred strains: a relational model system to explore neurobehavioral phenotypes. Neuroinformatics, 1, 343–357.[CrossRef][ISI][Medline]

  33. Jones, B.C., Tarantino, L.M., Rodriguez, L.A., Reed, C.L., McClearn, G.E., Plomin, R. and Erwin, V.G. (1999) Quantitative-trait loci analysis of cocaine-related behaviours and neurochemistry. Pharmacogenetics, 9, 607–617.[ISI][Medline]

  34. Crabbe, J.C. (1998) Provisional mapping of quantitative trait loci for chronic ethanol withdrawal severity in BXD recombinant inbred mice. J. Pharmacol. Exp. Ther., 286, 263–271.[Abstract/Free Full Text]

  35. Risinger, F.O. (2003) Genetic analyses of ethanol-induced hyperglycemia. Alcohol Clin. Exp. Res., 27, 756–764.[Medline]

  36. McCaughran, J., Jr, Bell, J. and Hitzemann, R. (1999) On the relationships of high-frequency hearing loss and cochlear pathology to the acoustic startle response (ASR) and prepulse inhibition of the ASR in the BXD recombinant inbred series. Behav. Genet., 29, 21–30.[CrossRef][ISI][Medline]

  37. McCall, R.D. and Frierson, D., Jr. (1981) Evidence that two loci predominantly determine the difference in susceptibility to the high pressure neurologic syndrome type I seizure in mice. Genetics, 99, 285–307.[Abstract/Free Full Text]

  38. Plomin, R., McClearn, G.E., Gora-Maslak, G. and Neiderhiser, J.M. (1991) Use of recombinant inbred strains to detect quantitative trait loci associated with behavior. Behav. Genet., 21, 99–116.[CrossRef][ISI][Medline]

  39. Schrier, D.J., Sternick, J.L., Allen, E.M. and Moore, V.L. (1982) Immunogenetics of BCG-induced anergy in mice: control by genes linked to the Igh complex. J. Immunol., 128, 1466–1469.[Abstract]

  40. Pereira, P., Lafaille, J.J., Gerber, D. and Tonegawa, S. (1997) The T cell receptor repertoire of intestinal intraepithelial gammadelta T lymphocytes is influenced by genes linked to the major histocompatibility complex and to the T cell receptor loci. Proc. Natl Acad. Sci. USA, 94, 5761–5766.[Abstract/Free Full Text]

  41. Willott, J.F. and Erway, L.C. (1998) Genetics of age-related hearing loss in mice. IV. Cochlear pathology and hearing loss in 25 BXD recombinant inbred mouse strains. Hear Res., 119, 27–36.[CrossRef][ISI][Medline]

  42. Phillips, T.J., Belknap, J.K. and Crabbe, J.C. (1991) Use of recombinant inbred strains to assess vulnerability to drug abuse at the genetic level. J. Addict. Dis., 10, 73–87.[Medline]

  43. Storey, J.D. and Tibshirani, R. (2003) Statistical significance for genomewide studies. Proc. Natl Acad. Sci. USA, 100, 9440–9445.[Abstract/Free Full Text]

  44. Chen, H. and Sharp, B.M. (2004) Content-rich biological network constructed by mining PubMed abstracts. BMC Bioinformatics, 5, 147.[CrossRef][Medline]

  45. Wise, R.A. (1984) Neural mechanisms of the reinforcing action of cocaine. NIDA Res. Monogr., 50, 15–33.[Medline]

  46. Hoover, B.R. and Marshall, J.F. (2002) Further characterization of preproenkephalin mRNA-containing cells in the rodent globus pallidus. Neuroscience, 111, 111–125.[Medline]

  47. Palomo, T., Archer, T., Kostrzewa, R.M. and Beninger, R.J. (2004) Gene-environment interplay in schizopsychotic disorders. Neurotox. Res., 6, 1–9.[Medline]

  48. Todtenkopf, M.S., Stellar, J.R., Williams, E.A. and Zahm, D.S. (2004) Differential distribution of parvalbumin immunoreactive neurons in the striatum of cocaine sensitized rats. Neuroscience, 127, 35–42.[Medline]

  49. Jellali, A., Stussi-Garaud, C., Gasnier, B., Rendon, A., Sahel, J.A., Dreyfus, H. and Picaud, S. (2002) Cellular localization of the vesicular inhibitory amino acid transporter in the mouse and human retina. J. Comp. Neurol., 449, 76–87.[CrossRef][Medline]

  50. Nair, V.D. and Sealfon, S.C. (2003) Agonist-specific transactivation of phosphoinositide 3-kinase signaling pathway mediated by the dopamine D2 receptor. J. Biol. Chem., 278, 47053–47061.[Abstract/Free Full Text]

  51. Wang, W., Stock, R.E., Gronostajski, R.M., Wong, Y.W., Schachner, M. and Kilpatrick, D.L. (2004) A role for nuclear factor I in the intrinsic control of cerebellar granule neuron gene expression. J. Biol. Chem., 279, 53491–53497.[Abstract/Free Full Text]

  52. Darville, M.I., Antoine, I.V., Mertens-Strijthagen, J.R., Dupriez, V.J. and Rousseau, G.G. (1995) An E2F-dependent late-serum-response promoter in a gene that controls glycolysis. Oncogene, 11, 1509–1517.[Medline]

  53. Pasteau, S., Loiseau, L. and Brun, G. (1997) Proliferation of chicken neuroretina cells induced by v-src, in vitro, depends on activation of the E2F transcription factor. Oncogene, 15, 17–28.[CrossRef][Medline]

  54. Hou, S.T., Cowan, E., Walker, T., Ohan, N., Dove, M., Rasqinha, I. and MacManus, J.P. (2001) The transcription factor E2F1 promotes dopamine-evoked neuronal apoptosis by a mechanism independent of transcriptional activation. J. Neurochem., 78, 287–297.[CrossRef][Medline]

  55. Shippenberg, T.S., Rea, W. and Slusher, B.S. (2000) Modulation of behavioral sensitization to cocaine by NAALADase inhibition. Synapse, 38, 161–166.[Medline]

  56. Callis, A.H., Schrier, D.J., David, C.S. and Moore, V.L. (1983) Immunogenetics of BCG-induced anergy in mice. Control by Igh- and H-2-linked genes. Immunology, 49, 609–616.[Medline]

  57. Harding, F.A., McArthur, J.G., Gross, J.A., Raulet, D.H. and Allison, J.P. (1992) CD28-mediated signalling co-stimulates murine T cells and prevents induction of anergy in T-cell clones. Nature, 356, 607–609.[CrossRef][Medline]

  58. Schwartz, R.H. (1997) T cell clonal anergy. Curr. Opin. Immunol., 9, 351–357.[CrossRef][ISI][Medline]

  59. Greenwald, R.J., Boussiotis, V.A., Lorsbach, R.B., Abbas, A.K. and Sharpe, A.H. (2001) CTLA-4 regulates induction of anergy in vivo. Immunity, 14, 145–155.

  60. Mohrs, M., Lacy, D.A. and Locksley, R.M. (2003) Stat signals release activated naive Th cells from an anergic checkpoint. J. Immunol., 170, 1870–1876.[Abstract/Free Full Text]

  61. Elloso, M.M. and Scott, P. (2001) Differential requirement of CD28 for IL-12 receptor expression and function in CD4(+) and CD8(+) T cells. Eur. J. Immunol., 31, 384–395.[CrossRef][Medline]

  62. Moriuchi, H., Moriuchi, M. and Fauci, A.S. (1997) Cloning and analysis of the promoter region of CCR5, a coreceptor for HIV-1 entry. J. Immunol., 159, 5441–5449.[Abstract]

  63. Liu, J., Na, S., Glasebrook, A., Fox, N., Solenberg, P.J., Zhang, Q., Song, H.Y. and Yang, D.D. (2001) Enhanced CD4+ T cell proliferation and Th2 cytokine production in DR6-deficient mice. Immunity, 15, 23–34.[CrossRef][ISI][Medline]

  64. Shmueli, O., Cahana, A. and Reiner, O. (1999) Platelet-activating factor (PAF) acetylhydrolase activity, LIS1 expression, and seizures. J. Neurosci. Res., 57, 176–184.[CrossRef][ISI][Medline]

  65. Cardoso, C., Leventer, R.J., Dowling, J.J., Ward, H.L., Chung, J., Petras, K.S., Roseberry, J.A., Weiss, A.M., Das, S., Martin, C.L. et al. (2002) Clinical and molecular basis of classical lissencephaly: mutations in the LIS1 gene (PAFAH1B1). Hum. Mutat., 19, 4–15.[CrossRef][ISI][Medline]

  66. Davies, B., Kearns, I.R., Ure, J., Davies, C.H. and Lathe, R. (2001) Loss of hippocampal serine protease BSP1/neuropsin predisposes to global seizure activity. J. Neurosci., 21, 6993–7000.[Abstract/Free Full Text]

  67. Churchill, G.A., Airey, D.C., Allayee, H., Angel, J.M., Attie, A.D., Beatty, J., Beavis, W.D., Belknap, J.K., Bennett, B., Berrettini, W. et al. (2004) The collaborative cross, a community resource for the genetic analysis of complex traits. Nat. Genet., 36, 1133–1137.[CrossRef][Medline]

  68. Chesler, E.J., Lu, L., Wang, J., Williams, R.W. and Manly, K.F. (2004) WebQTL: rapid exploratory analysis of gene expression and genetic networks for brain and behavior. Nat. Neurosci., 7, 485–486.[CrossRef][ISI][Medline]

  69. Lynch, M. and Walsh, B. (1998) Genetics and Analysis of Quantitative Traits. Sinauer Associates, Inc., Sunderland, MA.

  70. Lander, E.S. and Botstein, D. (1989) Mapping mendelian factors underlying quantitative traits using RFLP linkage maps. Genetics, 121, 185–199.[Abstract/Free Full Text]

  71. Haley, C.S. and Knott, S.A. (1992) A simple regression method for mapping quantitative trait loci in line crosses using flanking markers. Heredity, 69, 315–324.[ISI][Medline]

  72. Kanji, G.K. (1999) 100 Statistical Tests. Sage, London.


Add to CiteULike CiteULike   Add to Connotea Connotea   Add to Del.icio.us Del.icio.us    What's this?


This article has been cited by other articles:


Home page
GeneticsHome page
E. Chaibub Neto, C. T. Ferrara, A. D. Attie, and B. S. Yandell
Inferring Causal Phenotype Networks From Segregating Populations
Genetics, June 1, 2008; 179(2): 1089 - 1100.
[Abstract] [Full Text] [PDF]