Human Molecular Genetics Advance Access originally published online on November 21, 2005
Human Molecular Genetics 2005 14(24):3963-3971; doi:10.1093/hmg/ddi420
© The Author 2005. Published by Oxford University Press. All rights reserved.
The online version of this article has been published under an open access model. Users are entitled to use, reproduce, disseminate, or display the open access version of this article for non-commercial purposes provided that: the original authorship is properly and fully attributed; the Journal and Oxford University Press are attributed as the original place of publication with the correct citation details given; if an article is subsequently reproduced or disseminated not in its entirety but only in part or as a derivative work this must be clearly indicated. For commercial re-use, please contact: journals.permissions@oxfordjournals.org

Mapping common regulatory variants to human haplotypes

Tomi Pastinen, Bing Ge, Scott Gurd, Tiffany Gaudin, Carole Dore, Mathieu Lemire, Pierre Lepage, Eef Harmsen and Thomas J. Hudson*

McGill University and Genome Quebec Innovation Center, Room 7105, 740 Dr Penfield Avenue, Montreal, Quebec, Canada H3A 1A4

* To whom correspondence should be addressed. Tel: +1 514 3983311 ext. 00385; Fax: +1 514 3982622; Email: tom.hudson@mcgill.ca

Received September 1, 2005; Accepted November 4, 2005


    ABSTRACT
Inter-individual variation in gene expression has proven to be in part governed by genetic determinants, which may be trans- or cis-acting. The underlying cause of cis-acting regulatory variation has been identified in only a handful of the hundreds of genes shown to display differential allelic expression. In this report, we describe a systematic effort to map common cis-acting variants in 64 genes, using association methods in HapMap samples. We identified 16 loci (25%), each of which harbors common haplotypes that affect total expression of a gene, and a further 17 loci (27%) with evidence of haplotypes affecting relative allelic expression in heterozygote samples. Our survey suggests that detailed mapping of allele-specific in vivo expression will provide a rich source of regulatory SNPs or haplotypes that should be given high priority in association studies of human phenotypes.


    INTRODUCTION
Heritable variation of gene expression has been demonstrated to be common by mapping studies in model organisms (1), mouse (2) and man (3). In all these studies, a subset of linked loci show evidence of strong cis-acting effects (‘self-linkages’). An alternate experimental design for detecting cis-acting variation is based on the comparison of allelic transcript levels in which the allelic copies of each transcript serve as internal controls for each other (reviewed in 4). We hypothesize that cis-acting regulatory variants are common, based on the evidence that 20–50% of human genes show allelic imbalance (AI) in more than 5% of heterozygous cell lines or tissues in allelic expression studies (5–8).

A logical step in characterizing regulatory variants is to map their position by haplotype-mapping methods, based on associations between such variants and the ancestral haplotypes on which they arose. As opposed to ‘traditional approaches’ of regulatory variant identification, which are most commonly based on transfection assays of candidate variants in cell-culture systems, we rely on a hypothesis-free method that investigates relative over- or under-expression of two transcripts in vivo, followed by association mapping to cis-variants. We build upon prior studies that validated a small number of genes using similar approaches (7,9,10), and the recent development of the HapMap (11) to describe a systematic search for cis-acting variants and/or haplotypes that affect or correlate with AI.


    RESULTS
We selected 73 genes (listed in Supplementary Material, Tables S1A and B) with prior evidence of AI in humans. Approximately one-half (n=36) of the target genes were selected based on previously reported as well as more recent studies of AI (Supplementary Material, Table S2) using quantitative genotyping capable of detecting ~1.5-fold ratios between allelic transcripts (7). The original screen for AIs in this 36-gene set was biased towards genes known or suspected to have a role in immune responses. Their selection was also based on: (i) previous evidence of AI observed in at least two informative heterozygotes; and (ii) no prior evidence of epigenetic mechanisms accounting for the observed AIs (i.e. imprinted genes such as PEG10, or genes such as IL1A thought to be regulated by random monoallelic expression, were excluded). The other 37 genes were selected in a hypothesis-free manner from candidate AI-targets we identified by comparing allele frequencies in the International Haplotype Map Database (in the CEPH sample) and allele counts of expressed sequence tags in dbEST (12).

The experimental design included several steps. We first performed a validation test in each gene to confirm AIs, using a normalized sequencing method (12). We then mapped the location of the cis-acting variants causing the AIs by association with alleles specific to phased chromosomes in a panel of HapMap lymphoblastoid cell lines (LCLs). The last step involved a correlation between SNPs associated with AIs and total expression measurements. AI measured by quantitative genotyping varies in its sensitivity and specificity across genes, both of which depend largely on the expression level of the gene in question (4). These technical issues do not allow strict fold-expression limits to be applied; instead, the approach hinges on two independent factors: (i) consistency between RNA measurements from the same sample; and (ii) consistency between independent SNPs in the same gene using the same RNA. The determination of SNPs correlating significantly with both AI and total expression levels provides the ultimate validation.

Allelic and total expression datasets were generated from the HapMap panel of LCLs from CEPH (Utah residents with ancestry from northern and western Europe). Genotypes already exist for hundreds of thousands of SNPs in this sample panel (11). We attempted to design two or more genotyping assays per gene for AI analysis, in which we compared RT–PCR of intra-exonic or heterogeneous nuclear RNA (hnRNA) (7) with heterozygous genomic DNA samples, the latter to obtain the reference 50:50 ratio of alleles. AI was called if independent replicate assays for heterozygous SNPs in the RNA samples showed allele ratios outside the 95% confidence interval (CI) for equal expression (defined at each locus by using the genomic DNA heterozygotes). The actual AI analysis is carried out using normalized sequence data and is facilitated by the PeakPicker software, which, on average, provides detection of ≥1.2-fold differences between allelic transcripts (12). Figure 1 illustrates sequence-based AI analysis in three genes, with examples of AI-positive samples informative at two independent sites of the same gene. For each gene, two samples informative at both analyzed SNPs are shown. In the case of GUCY1A3 (top two rows), all samples that are informative at both sites show concordant results with regard to which chromosome correlates with the over-expressed transcript. The first cDNA sample illustrated for GUCY1A3 (middle column) shows over-expression of allele G at rs2306557 and allele A at rs4691842; the second cDNA sample (rightmost column) shows slightly less pronounced over-expression of the same alleles. In phasing the illustrated samples, the relatively over-expressed alleles occurred on the same chromosome. Similarly, in the case of IRF4 (bottom two rows), the independent SNPs showed concordance of AIs in informative samples.
The analysis of EPHX2 AIs revealed a subset of samples (Table 1) that showed discordances in AI calls, an example of which is displayed in the middle column of rows 3 and 4: the relatively over-expressed alleles G at rs1042064 and A at rs729609 in this cDNA sample are phased to opposite chromosomes. The other EPHX2 example (rows 3 and 4, right column) shows concordant over-expression of G-alleles at the corresponding sites. All three genes showed AI association (Table 1), but only the EPHX2 and GUCY1A3 associations were validated in total expression studies (also see Figs 2 and 3).



Figure 1. Sequence-based AI analysis in GUCY1A3, EPHX2 and IRF4 (each gene assayed by two independent SNPs). Each row contains two examples of allelic expression analyzed in RNA derived from heterozygous individuals (middle and right columns denoted cDNA1 and cDNA2); the reference 50:50 allele ratio measured in gDNA sample is shown (left column).

 

Table 1. AI and association results
 


Figure 2. Examples of total expression–genotype correlations are shown for four genes. The graphs display the expression results from independent cultures separately. The height of the bars represents the mean expression levels and standard deviation (in arbitrary units). White and light grey bars are for homozygous genotypes; dark gray bars are for heterozygous genotypes. The sample size for each genotype is indicated in parentheses below the X-axis.

 


Figure 3. LD maps of expression phenotypes for four genes showing significant allelic and total expression associations: GUCY1A3 (A), CD151 (B), EPHX2 (C) and KL (D). The integrated LD maps include: (1) gene name and direction of transcription; (2) SNP positions; (3) reference SNP identification numbers. SNPs tagged with an asterisk (*) are marker SNPs used in the AI assays. SNPs surrounded by blue boxes are those used for the expression histogram. SNPs in red demonstrate association with total expression (P<0.05); (4) SNP alleles. SNPs reaching Pcorrected<0.05 in AI association tests are highlighted in red; (5 and 6) allele counts observed in over-expressed chromosomes. Line 5 denotes the number of alleles (corresponding to the first SNP bases in row 4) present on the over-expressed chromosomes, whereas line 6 denotes the numbers for the alternative bases; (7 and 8) allele counts observed in under-expressed chromosomes. Line 7 denotes the number of alleles (corresponding to the first SNP bases in row 4) present on the under-expressed chromosomes, whereas line 8 denotes the numbers for the alternative bases; (9) LD map, generated by Haploview (21): the numbers in the squares are pairwise D' values (in %), with D'=100 if no number is written. The red and pink colors indicate that the estimate is associated with LOD ≥2, whereas blue and white indicate LOD <2. The histograms of genotype versus total expression level for the boxed SNPs in the haplotype LD maps (row 3) can be found in Supplementary Material, Figure S2B.

 
Of the 73 genes selected for this study, we confirmed common AI in 64 genes (Table 1); nine were dropped from further analysis because of unreliable genotyping assays in DNA (n=1) or RT–PCR (n=3), or because fewer than six samples carried AI (n=5). Some of the single point AI data had been generated earlier (12). For the 64 genes, we found 1626 heterozygote cases (with technically successful and reproducible AI-assays) that could be phased (see below). In 742 cases, we were able to detect AI, of which 209 (involving 38 genes) had two informative SNPs that both demonstrated AI. Of these, 182 cases (88%) showed concordant AI (i.e. phased alleles indicated that the same allelic transcript was over- or under-expressed), but in 27 cases, the AI calls were discordant (independent SNPs in the same genes suggested that a different transcript was over- or under-expressed). These discrepancies mostly involved the same genes (notably OAS1, EPHX2 and FEZ1), which could reflect technical errors in one of the assays, or biological causes such as allele-specific splicing of different isoforms. A recent report of allele-specific splicing of the OAS1 gene (13) provides a plausible explanation for discordant AIs in this gene: one of our two assays was specific for a single isoform (p46), whereas the other assay detected expression from two isoforms (p46 and p52). Additional experiments in our laboratory indicated that the fourth known OAS1 isoform (p42) is also regulated in an allele-specific fashion (data not shown).

To determine which chromosome is associated with higher or lower expression, we phased the chromosomes from samples showing AI, using the software Merlin (14) and the CEPH parent–offspring genotype datasets from HapMap release no. 13 (exclusive of the cases where the marker SNPs used in AI assays were heterozygous in all three samples of the trio). The alleles of each haplotype in each AI sample were assigned a ‘+’ or ‘–’ sign to designate the relative over- or under-expression of the transcript correlating to the phased chromosome. The chromosomal alleles neighboring a test gene that had an assignment of ‘+’ or ‘–’ were used in an association test; allele counts for each unequivocally phased site were tabulated in a two-by-two table. If the distribution of alleles deviated significantly (using a two-tailed Fisher's exact test) between the ‘+’ and ‘–’ chromosomes, a putative association with AI was determined. The AI association procedure is summarized in Supplementary Material, Figure S1 for the GUCY1A3 gene. We kept all informative cases from the 64 loci for further analysis, but excluded the 27 instances with conflicting AI calls from independent SNPs (with the known caveat that undetected errors in the dataset would most likely result in a loss of power in the association test). We arbitrarily limited the search window to 100 kb regions on each side of the gene, in order to avoid dilution of the power of the analysis by performing too many statistical tests. Nominal significance for allelic association with SNP(s) mapping to the gene region was achieved for 48/64 genes (75%). As many tests were performed, we carried out gene-wide P-value corrections using permutation tests. The gene-based corrected P-value threshold (Pcorrected<0.05) was reached for 33/64 (52%). These 33 genes proceeded to the validation phase.
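The two-by-two tabulation described above can be illustrated with a minimal Python sketch. It is not the authors' code: the function name and the example counts are hypothetical, and the test is written out from the hypergeometric distribution using only the standard library. Rows are the ‘+’ and ‘–’ chromosomes and columns are the two alleles of one phased SNP.

```python
from math import comb

def fisher_exact_two_tailed(a, b, c, d):
    """Two-tailed Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    a, b: counts of allele 1 and allele 2 on '+' (over-expressed) chromosomes
    c, d: counts of allele 1 and allele 2 on '-' (under-expressed) chromosomes
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x copies of allele 1 on '+' chromosomes
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    p_obs = prob(a)
    # Sum the probabilities of all tables no more likely than the observed one
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, p)

# Hypothetical counts: allele 1 seen on 9 of 10 '+' chromosomes, 2 of 10 '-' ones
p_value = fisher_exact_two_tailed(9, 1, 2, 8)
```

A small P-value here would flag the SNP as a putative AI-associated site, subject to the gene-wide correction described in the text.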

For variants correlating with allelic expression differences one could expect to see that variation in the total expression levels of a gene would also correlate with genotypic categories, where +/+, +/– and –/– alleles associated with AI would be predictive of high, medium and low expression levels, respectively (provided that the cis-acting influences are sufficiently strong to overcome trans-acting, feedback and environmental effects). To explore this, we cultured the unrelated LCLs (n=60) of the CEU HapMap panel in three different conditions (see Materials and Methods) to minimize effects because of environmental context. In addition to the three independent RNA samples for each LCL, multiple replicate measurements of total expression were performed by quantitative RT–PCR. We sought to limit the number of association tests between expression levels and genotypes, and only tested SNPs that had been associated with AI (Pcorrected<0.05), and used an a priori hypothesis regarding which genotype should associate with high or low total expression. The statistical analysis combined the data from the three independent cultures (Table 2). A linear model was used to test for association between the candidate sites and the repeated measurements of total expression. Figure 2 shows examples of total expression association data (for the SNP showing the strongest association with total expression) in three independent cultures for four genes. The cis-acting SNPs identified in allelic expression analysis for GUCY1A3 (top panel), CD151 (second panel from top), EPHX2 (third panel from top) and KL (bottom panel) could be validated by total expression analysis, with the predicted alleles significantly over-expressed (Table 2). Furthermore, the total expression analysis in the independent cultures yielded convergent data in all genes. Summary graphs for the strongest associations for each gene are shown in Supplementary Material, Figure S2B. 
The genotype association data for total expression is summarized in Table 2: 16/33 tested genes (48%) or 16/64 total genes with common AI (25%) showed nominally significant expression differences between genotypes, as predicted by the AI association tests. Empirical P-values based on permutation tests are also shown (Table 2). Two genes (PAX8 and PISD) showed the ‘unexpected’ allele as over-expressed at the P<0.05 level, which may indicate false positives in either the AI or total expression association studies, though isoform-specific expression cannot be ruled out.


Table 2. Total expression association results
 
By inspecting the location of the SNPs associated with expression datasets generated by both methods, we observe that there usually are several neighboring SNPs showing similar expression association, and that they almost always co-locate to the same haplotype block (Fig. 3) (Supplementary Material, Fig. S2A). Linkage disequilibrium (LD) maps summarizing the AI data for all 64 genes, including the genes that did not show significant associations, can be accessed at http://www.genomequebec.mcgill.ca/HMG/.
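For readers unfamiliar with the pairwise D' statistic displayed in the LD maps, the following Python sketch computes |D'| for two biallelic SNPs. It assumes fully phased haplotype counts, which is a simplification; Haploview estimates haplotype frequencies from genotype data, and that step is not reproduced here. The function name and argument order are illustrative only.

```python
def d_prime(n_AB, n_Ab, n_aB, n_ab):
    """Absolute D' between two biallelic SNPs from phased haplotype counts.

    n_AB is the number of chromosomes carrying allele A at SNP 1 and
    allele B at SNP 2, and so on for the other three haplotypes.
    """
    n = n_AB + n_Ab + n_aB + n_ab
    p_A = (n_AB + n_Ab) / n
    p_B = (n_AB + n_aB) / n
    d = n_AB / n - p_A * p_B  # raw disequilibrium coefficient D
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    if d_max == 0:
        return 0.0  # at least one SNP is monomorphic; D' is undefined
    return abs(d) / d_max
```

With only three of the four possible haplotypes observed, for example `d_prime(8, 2, 0, 10)`, the statistic equals 1, which corresponds to the unlabeled squares (D'=100) in Figure 3.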


    DISCUSSION
This study represents the first systematic approach to map cis-acting variants using AI methods. Large-scale studies surveying AI (reviewed in 4) are a relatively new phenomenon, and no general guidelines exist regarding the interpretation of the results. In this work, sensitive AI assays validated both by independent AI assays and by total expression measurements were combined with high-density genotyping data provided by the HapMap project (11). In genes showing common occurrence of AI, we found common variants associated with AI in 52% of cases, with 16 genes showing an effect that is strong enough to be detected in association studies with total expression levels. This may indicate that some common cis-acting polymorphisms remain unidentified in more commonly employed expression mapping approaches (3), which could be because of trans-acting, feedback and environmental effects that dilute cis-acting influences in total expression studies. Such effects are theoretically avoided in AI studies, in which each expressed allele serves as reference for the other. The fact that for 31 genes with common AIs no statistically significant association with neighboring SNP alleles is observed may be explained by: (i) rare variants accounting for the allelic expression observed in different samples; (ii) epigenetic effects causing allelic expression; and (iii) technical limitations of the allelic expression method in assigning relative over-/under-expression. Similar studies in other tissues and populations in a genome-wide manner may be facilitated by recent advances in genotyping technologies (15).

One can hypothesize that the large size of the ‘regulatory haplotypes’ that we observe is partly dependent on the density of the HapMap dataset (which approached one SNP per 5 kb when this dataset was analyzed). As the HapMap density increases, small haplotype block associations that went undetected in the current analyses may later appear. Similar ‘regulatory haplotypes’ showing large associated regions for cis-acting effects in Caucasian populations have been reported for individual genes in earlier studies (1,9,10).

Mapping of a regulatory phenotype to a common haplotype may be sufficient for testing the role of regulatory variation in association studies (provided that the samples used for regulatory haplotype determination and for the clinical association study are derived from the same population). In our study we confirmed regulatory haplotypes in genes such as KL, EPHX2, VDR and CAT, which have earlier been associated with risk of heart disease (16), stroke (17), asthma (18) or hypertension (19), respectively. The regulatory haplotypes identified here are obvious targets of further studies in these complex disease phenotypes. Ultimately, functional assessment of each variant on the block is required before one can be certain which variant(s) affect(s) expression levels (4,20). Extending the analysis to genetically more diverse populations with recent African ancestry could allow more precise mapping, as the haplotype blocks are on average >50% smaller (15).


    MATERIALS AND METHODS
Samples and RNA preparation for allelic expression assays
The LCLs used in this study are described in Supplementary Material, Table S3. The RNA from cultured cells (see culture conditions below) used in all allelic expression assays was derived from 60 unrelated LCLs of Caucasian origin, corresponding to the parents of the sample trios used in the International HapMap project (11,14). Four children from these 30 trios were also included in the analysis of AI association in cases where the parents were uninformative or failed. Three independent cultures of LCLs from HapMap parents (n=60) were generated for the total expression analyses. The first culture was carried out in RPMI 1640 medium (Invitrogen, Carlsbad, CA, USA) supplemented with penicillin/streptomycin, 2 mM L-Glutamine and 15% heat-inactivated fetal bovine serum (Sigma–Aldrich, St. Louis, MO, USA). The cells were grown at 37°C and 5% CO2. The cell growth was monitored using a hemocytometer and cultures were harvested at a density of 0.8–1.1 × 10^6 cells/ml. The cells were lysed by resuspension in Trizol reagent (Invitrogen). The second and third cultures were carried out 4 months after the first culture. LCLs were cultured as described earlier and were treated with PBS containing 0.1% BSA with or without human interleukin-4 (Sigma–Aldrich) at a final concentration of 30 ng/ml for 48 h.

cDNA synthesis
RNA was isolated from the LCLs using Trizol reagent according to the manufacturer's instructions (Invitrogen). The RNA quality was verified using the Agilent 2100 Bioanalyzer (Agilent, Palo Alto, CA, USA). In cDNA synthesis 50 µg aliquots of total RNA were treated with 8 U of DNase I for 40 min at 37°C (Ambion, Austin, TX, USA), extracted with phenol/chloroform (Invitrogen) and reprecipitated. The resulting RNA was annealed to 1000 ng random hexamers (Invitrogen) and first strand cDNA synthesis (RT) was performed using Superscript II reverse transcriptase according to the manufacturer's instructions (Invitrogen). In allelic expression studies, the RT-reactions were carried out in two independent replicates from each RNA sample.

Allelic expression analysis by normalized sequencing
Informative SNPs for each gene were primarily selected from Caucasian HapMap data (http://www.hapmap.org/), with a preference for high-frequency exonic SNPs, and secondarily for intronic SNPs for hnRNA analysis. Some genes selected based on earlier experimental AI screening had no informative SNPs available in the HapMap data, and additional SNPs were genotyped in CEPH HapMap samples by FP-SBE, as previously described (7); the additional genotyping data are available at http://genomequebec.mcgill.ca/regulatory/fp/. PCR and sequencing primer designs avoided known SNPs, and the same designs were applied to RNA and DNA samples unless an exonic SNP was located close to an exon–intron boundary. In intronic (hnRNA) assays an additional nested primer was designed for sequencing to reduce the background caused by the lower copy number of hnRNA. All primer designs are available upon request. Two or more independent assays were designed and attempted for all genes except FLJ12788, PDCD1, IL19, PXN and GCS1. All RNA samples were amplified in duplicate from independent cDNA preparations, and a subset (4–10/SNP) of informative gDNA heterozygotes were amplified in identical conditions to establish expected 50:50 heterozygote ratios. Sequencing was carried out using 0.5 µl of BigDye® Terminator (BDT) v3.1 Cycle Sequencing Kit (Applied Biosystems, Foster City, CA, USA), 2 µl of a PCR product, 1.75 µl of 5x sequencing buffer (0.4 M Tris–HCl of pH 9.0, 10 mM MgCl2) and 10 pmol of the sequencing primer in a 10 µl reaction in conditions suggested by the manufacturer. Reaction products were analyzed on an Applied Biosystems 3730XL DNA analyzer. The sequence traces were analyzed using PeakPicker, custom-built software that normalizes heterozygote ratios based on peak intensities of surrounding bases as described previously (12). The normalized heterozygote ratios of genomic DNA samples were used to establish a 95% CI for each SNP.
If both heterozygote ratios in independent RNA samples showed concordant deviation beyond the 95% CI derived from genomic DNA data, the sample was called as having AI. If one of the two RNA replicates was within the 95% CI or if the replicates deviated in opposite directions, the sample was defined as ‘unknown’. The allelic expression for BTN3A2 SNPs was analyzed by the FP-SBE method described earlier (7).
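The calling rule above can be expressed as a short Python sketch. Several details are assumptions made for illustration and are not stated in the text: the heterozygote ratio is taken to be an allele fraction around 0.5, the 95% CI is approximated as mean ± 1.96 SD of the gDNA ratios, and the label ‘no AI’ for two replicates that both fall inside the interval is our own naming.

```python
from statistics import mean, stdev

def reference_interval(gdna_ratios, z=1.96):
    """Approximate 95% interval for equal expression from gDNA heterozygotes.

    Assumption: mean +/- z * SD; the exact CI formula is not given in the text.
    """
    m, s = mean(gdna_ratios), stdev(gdna_ratios)
    return m - z * s, m + z * s

def call_ai(rep1, rep2, interval):
    """Classify one RNA sample from its two replicate heterozygote ratios."""
    lo, hi = interval

    def direction(ratio):
        # +1 above the interval, -1 below it, 0 inside it
        if ratio > hi:
            return 1
        if ratio < lo:
            return -1
        return 0

    d1, d2 = direction(rep1), direction(rep2)
    if d1 != 0 and d1 == d2:
        return "AI"       # concordant deviation beyond the interval
    if d1 == 0 and d2 == 0:
        return "no AI"    # assumed label; both replicates inside the interval
    return "unknown"      # one replicate inside, or opposite directions

# Hypothetical gDNA heterozygote ratios for one SNP
interval = reference_interval([0.48, 0.50, 0.52, 0.50, 0.49, 0.51])
```

For example, `call_ai(0.62, 0.60, interval)` would return `"AI"` with the hypothetical reference ratios shown.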

Real-time PCR analysis for total expression
The consistency of RNA quality for total expression analysis was further ensured by analyzing the 18S rRNA by real-time TaqMan analysis (4319413E, Applied Biosystems). Only the parental HapMap RNA samples which passed RNA quality control in all three cultures (n=51) were used in the expression analysis. The primer designs for the expression analysis excluded regions with known polymorphisms and typically targeted different regions of the gene from those used in the allelic expression assays (primers available on request). Total expression measurements were carried out using real-time PCR and SYBR-green (Molecular Probes, Eugene, OR, USA) labeling on an ABI 7900HT (Applied Biosystems) instrument in 10 µl final volume applying the following conditions: 10–15 ng of total RNA, 2 mM MgCl2, 0.1 mM dNTPs, 0.04 µM ROX-dye (Molecular Probes), 0.4x SYBR-green, 1x BSA (NEB), 2.5% DMSO, 0.025 U/µl HotStart Taq DNA Polymerase (Qiagen), 0.32 µM of gene-specific primers; Cycling: 95°C (15 min) and 95°C (20 s), 58°C (30 s), 72°C (45 s) for 40 cycles. RNA samples were analyzed using six (first culture) or three replicates (second and third culture). A standard curve was established on all plates using dilution series of RNA samples with known total RNA concentration. The Ct values for each replicate were transformed to relative concentrations using the estimated standard curve function (SDS 2.1, Applied Biosystems) and normalized based on 18S real-time data from the same samples to account for well-to-well variability. For six-replicate experiments, the outlier values were excluded and the mean of the remaining four measurements was used as the sample value, whereas in three-replicate experiments the median value was used.
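The data reduction described in this paragraph can be sketched in Python as follows. This is an illustration under stated assumptions rather than the SDS 2.1 procedure: the standard curve is assumed to be the conventional log-linear form Ct = slope × log10(concentration) + intercept, normalization is assumed to be a simple ratio to the 18S value, and the ‘outlier values’ in six-replicate experiments are assumed to be the lowest and highest measurements. All function names are hypothetical.

```python
from statistics import mean, median

def ct_to_relative_conc(ct, slope, intercept):
    """Invert an assumed log-linear standard curve: Ct = slope*log10(c) + intercept."""
    return 10 ** ((ct - intercept) / slope)

def normalize_to_18s(gene_conc, rrna_conc):
    """Assumed normalization: ratio of gene signal to 18S rRNA signal."""
    return gene_conc / rrna_conc

def summarize_replicates(values):
    """Collapse replicate measurements into one sample value.

    Six replicates: drop the minimum and maximum (assumed to be the
    'outliers') and average the remaining four. Three replicates: median.
    """
    if len(values) == 6:
        return mean(sorted(values)[1:-1])
    if len(values) == 3:
        return median(values)
    raise ValueError("expected three or six replicates")
```

A slope near -3.3 corresponds to roughly 100% amplification efficiency, so a difference of about 3.3 cycles maps to a tenfold difference in relative concentration under this assumed curve.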

AI association tests
The Caucasian genotyping data spanning the loci (gene ±100 kb of flanking genomic DNA) from HapMap build no. 13 were used as the input data for establishing phase in Merlin (14). If two or more marker SNPs had available AI data for the same sample, the phase information allowed assessment of concordance of allelic expression calls: if all informative sites showed concordant over-expression of the transcript derived from one chromosome, the alleles on this chromosome were assigned a ‘+’ sign and the alternate alleles were assigned a ‘–’ sign. If discordant AI calls were made on the same sample, the data were discarded from the subsequent association tests. The chromosomal alleles neighboring a test gene that had an assignment of ‘+’ or ‘–’ were used in the association tests; allele counts for each unequivocally phased site were tabulated in a two-by-two table. If the distribution of alleles deviated significantly (using a two-tailed Fisher's exact test) between the ‘+’ and ‘–’ chromosomes, a putative association for allelic expression was determined. The AI association procedure is summarized in Supplementary Material, Figure S1 for the GUCY1A3 gene. We carried out gene-wide P-value corrections using permutation tests. For each individual, we maintained the original haplotypes and just randomized the ‘+’ and ‘–’ calls during the simulation. Two thousand permutations were carried out at each locus to establish a random P-value distribution. The corrected P-value equals the proportion of permutations that gave a P-value smaller than or equal to the observed P-value for each SNP. Sites that showed gene-based Pcorrected<0.05 were candidates for validation.
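A minimal Python sketch of this permutation correction follows. It is one possible reading of the procedure, not the authors' implementation: we assume that ‘gene-wide’ means each permutation contributes its smallest P-value across all SNPs at the locus, that every site is unequivocally phased in every sample, and that alleles are coded 0/1. The data layout and function names are hypothetical.

```python
import random
from math import comb

def fisher_p(a, b, c, d):
    """Two-tailed Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)
    prob = lambda x: comb(row1, x) * comb(row2, col1 - x) / denom
    p_obs = prob(a)
    xs = range(max(0, col1 - row2), min(row1, col1) + 1)
    return min(1.0, sum(prob(x) for x in xs if prob(x) <= p_obs * (1 + 1e-9)))

def snp_pvalues(pairs):
    """pairs: one (plus_haplotype, minus_haplotype) per AI sample.

    Each haplotype is a tuple of 0/1 alleles, one entry per SNP in the window.
    """
    n = len(pairs)
    pvals = []
    for j in range(len(pairs[0][0])):
        a = sum(plus[j] == 0 for plus, _ in pairs)    # allele 0 on '+'
        c = sum(minus[j] == 0 for _, minus in pairs)  # allele 0 on '-'
        pvals.append(fisher_p(a, n - a, c, n - c))
    return pvals

def corrected_pvalues(pairs, n_perm=2000, seed=0):
    """Gene-wide corrected P-value per SNP (assumed min-P permutation scheme)."""
    rng = random.Random(seed)
    observed = snp_pvalues(pairs)
    min_null = []
    for _ in range(n_perm):
        # Keep each individual's haplotypes intact; randomize only the +/- calls
        shuffled = [(m, p) if rng.random() < 0.5 else (p, m) for p, m in pairs]
        min_null.append(min(snp_pvalues(shuffled)))
    return [sum(m <= p * (1 + 1e-9) for m in min_null) / n_perm for p in observed]
```

Taking the minimum P-value per permutation accounts for the number of SNPs tested at the locus without assuming that they are independent, which matters here because neighboring SNPs are often in strong LD.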

Total expression validation
The allelic association of candidate sites was tested for total expression association by combining the expression data from three independent cultures. A linear model was used to test for the association between the candidate sites and the repeated measurements of total expression. For each SNP, a one-sided hypothesis was formulated from the results of the AI association (Fisher's exact test), which indicates the allele that should be over-expressed; significance was assessed with a one-tailed test statistic. Significance of the results was confirmed with a permutation procedure (5000 permutations).
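The following Python sketch illustrates a one-sided permutation test of this kind in a deliberately simplified form. It is not the authors' linear model: we assume one summary expression value per sample (for example, an average over cultures), code each genotype as the number of predicted over-expressed (‘+’) alleles (0, 1 or 2), use the least-squares slope as the test statistic, and permute genotype labels across samples. The (hits + 1)/(n_perm + 1) estimator is a common convention and is our choice, not taken from the text.

```python
import random
from statistics import mean

def slope(x, y):
    """Least-squares slope of y on x (x must not be constant)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx

def one_sided_permutation_p(plus_allele_counts, expression, n_perm=5000, seed=0):
    """P-value for H1: expression increases with the number of '+' alleles."""
    rng = random.Random(seed)
    observed = slope(plus_allele_counts, expression)
    shuffled = list(plus_allele_counts)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break the genotype-expression pairing
        if slope(shuffled, expression) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

Because the alternative is one-sided, a SNP whose ‘unexpected’ allele is over-expressed yields a P-value close to 1 rather than a small one, so such cases have to be examined separately.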


    SUPPLEMENTARY MATERIAL
Supplementary Material is available at HMG Online.


    ACKNOWLEDGEMENTS
 
T.J.H. is the recipient of a Clinician-Scientist Award in Translational Research by the Burroughs Wellcome Fund and an Investigator Award from CIHR. This work is supported by Genome Canada and Genome Quebec. Funding to pay the Open Access Publication charges for this article was provided by Genome Canada and Genome Quebec.

Conflict of Interest statement. The authors have no conflicts of interest to declare.


    REFERENCES

  1. Brem, R.B., Yvert, G., Clinton, R. and Kruglyak, L. (2002) Genetic dissection of transcriptional regulation in budding yeast. Science, 296, 752–755.

  2. Schadt, E.E., Monks, S.A., Drake, T.A., Lusis, A.J., Che, N., Colinayo, V., Ruff, T.G., Milligan, S.B., Lamb, J.R., Cavet, G. et al. (2003) Genetics of gene expression surveyed in maize, mouse and man. Nature, 422, 297–302.

  3. Morley, M., Molony, C.M., Weber, T.M., Devlin, J.L., Ewens, K.G., Spielman, R.S. and Cheung, V.G. (2004) Genetic analysis of genome-wide variation in human gene expression. Nature, 430, 743–747.

  4. Pastinen, T. and Hudson, T.J. (2004) Cis-acting regulatory variation in the human genome. Science, 306, 647–650.

  5. Bray, N.J., Buckland, P.R., Owen, M.J. and O'Donovan, M.C. (2003) Cis-acting variation in the expression of a high proportion of genes in human brain. Hum. Genet., 113, 149–153.

  6. Lo, H.S., Wang, Z., Hu, Y., Yang, H.H., Gere, S., Buetow, K.H. and Lee, M.P. (2003) Allelic variation in gene expression is common in the human genome. Genome Res., 13, 1855–1862.

  7. Pastinen, T., Sladek, R., Gurd, S., Sammak, A., Ge, B., Lepage, P., Lavergne, K., Villeneuve, A., Gaudin, T., Brandstrom, H. et al. (2004) A survey of genetic and epigenetic variation affecting human gene expression. Physiol. Genomics, 16, 184–193.

  8. Yan, H., Yuan, W., Velculescu, V.E., Vogelstein, B. and Kinzler, K.W. (2002) Allelic variation in human gene expression. Science, 297, 1143.

  9. Bray, N.J., Buckland, P.R., Williams, N.M., Williams, H.J., Norton, N., Owen, M.J. and O'Donovan, M.C. (2003) A haplotype implicated in schizophrenia susceptibility is associated with reduced COMT expression in human brain. Am. J. Hum. Genet., 73, 152–161.

  10. Knight, J.C., Keating, B.J., Rockett, K.A. and Kwiatkowski, D.P. (2003) In vivo characterization of regulatory polymorphisms by allele-specific quantification of RNA polymerase loading. Nat. Genet., 33, 469–475.

  11. Altshuler, D., Brooks, L.D., Chakravarti, A., Collins, F.S., Daly, M.J., Donnelly, P.; International HapMap Consortium (2005) A haplotype map of the human genome. Nature, 437, 1299–1320.

  12. Ge, B., Gurd, S., Gaudin, T., Dore, C., Lepage, P., Harmsen, E., Hudson, T.J. and Pastinen, T. (2005) Genome Res., 15, 1584–1591.

  13. Bonnevie-Nielsen, V., Leigh, F.L., Lu, S., Zheng, D.J., Li, M., Martensen, P.M., Nielsen, T.B., Beck-Nielsen, H., Lau, Y.L. and Pociot, F. (2005) Variation in antiviral 2',5'-oligoadenylate synthetase (2'5'AS) enzyme activity is controlled by a single-nucleotide polymorphism at a splice-acceptor site in the OAS1 gene. Am. J. Hum. Genet., 76, 623–633.

  14. Abecasis, G.R., Cherny, S.S., Cookson, W.O. and Cardon, L.R. (2002) Merlin—rapid analysis of dense genetic maps using sparse gene flow trees. Nat. Genet., 30, 97–101.

  15. Hinds, D.A., Stuve, L.L., Nilsen, G.B., Halperin, E., Eskin, E., Ballinger, D.G., Frazer, K.A. and Cox, D.R. (2005) Whole-genome patterns of common DNA variation in three human populations. Science, 307, 1072–1079.

  16. Arking, D.E., Becker, D.M., Yanek, L.R., Fallin, D., Judge, D.P., Moy, T.F., Becker, L.C. and Dietz, H.C. (2003) KLOTHO allele status and the risk of early-onset occult coronary artery disease. Am. J. Hum. Genet., 72, 1154–1161.

  17. Fornage, M., Lee, C.R., Doris, P.A., Bray, M.S., Heiss, G., Zeldin, D.C. and Boerwinkle, E. (2005) The soluble epoxide hydrolase gene harbors sequence variation associated with susceptibility to and protection from incident ischemic stroke. Hum. Mol. Genet., 14, 2829–2837.

  18. Poon, A.H., Laprise, C., Lemire, M., Montpetit, A., Sinnett, D., Schurr, E. and Hudson, T.J. (2004) Association of vitamin D receptor genetic variants with susceptibility to asthma and atopy. Am. J. Respir. Crit Care Med., 170, 967–973.

  19. Zhou, X.F., Cui, J., DeStefano, A.L., Chazaro, I., Farrer, L.A., Manolis, A.J., Gavras, H. and Baldwin, C.T. (2005) Polymorphisms in the promoter region of catalase gene and essential hypertension. Dis. Markers, 21, 3–7.

  20. Knight, J.C., Keating, B.J. and Kwiatkowski, D.P. (2004) Allele-specific repression of lymphotoxin-alpha by activated B cell factor-1. Nat. Genet., 36, 394–399.

  21. Barrett, J.C., Fry, B., Maller, J. and Daly, M.J. (2005) Haploview: analysis and visualization of LD and haplotype maps. Bioinformatics, 21, 263–265.





