Human Molecular Genetics Advance Access originally published online on April 27, 2006
Human Molecular Genetics 2006 15(12):1931-1937; doi:10.1093/hmg/ddl115
© 2006 The Author(s)
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

Direct detection of null alleles in SNP genotyping data

Christopher S. Carlson, Joshua D. Smith, Ian B. Stanaway, Mark J. Rieder and Deborah A. Nickerson*

Department of Genome Sciences, University of Washington, 1705 NE Pacific Street, Seattle, WA 98195-7730, USA

* To whom correspondence should be addressed. Tel: +1 2066857334; Email: debnick{at}u.washington.edu

Received January 23, 2006; Accepted April 25, 2006


    ABSTRACT
 
Pinpointing genetic associations in the human genome relies heavily on the accuracy of the underlying genotype data. Null alleles can generate significant inaccuracies in genotype data and can negatively affect the statistical power of a study. Existing quality control (QC) tests, including tests of Hardy–Weinberg equilibrium, are not sensitive enough to detect the presence of even moderately frequent null alleles in the data. We show that direct analysis of raw data from a quantitative genotyping platform can detect up to 75% of null alleles, even at frequencies below the sensitivity of more traditional methods. Detecting unexpected null alleles not only has benefits in QC of genotype data but may also be valuable in detecting rare, functional null alleles that would otherwise be missed.


    INTRODUCTION
 
Unexpected alleles may exist at any polymorphism, and these unknown or ‘null’ alleles can interfere with accurate genotyping of the expected alleles, with potentially negative impacts on the power of association studies (1,2). Null alleles can be deletions spanning a polymorphic site (3–5), secondary polymorphisms interfering with genotyping at the primary polymorphic target and even unexpected alleles at the primary polymorphism (such as triallelic sites) (6). All of these are important potential sources of reproducible, but inaccurate, genotypes. Similar challenges are posed by polymorphisms within segmental duplications, where the number of copies of the surrounding sequence itself can be variable (7–9). Although null alleles could be a potentially important source of genotyping error, these alleles are difficult to detect because null allele heterozygotes are indistinguishable from the expected homozygotes on most genotyping platforms. Large pedigree-based study designs can detect null alleles as non-Mendelian segregation errors, but studies based on trios detect only a fraction of null alleles (10). Null alleles attributable to large deletions can be detected as clusters of markers that violate Hardy–Weinberg, as has been noted in the HapMap data (11), but most studies do not have adequate data density for this analysis.

Null alleles can substantially impact the power of association studies. Kang et al. (12) investigated the effects of a variety of genotyping errors on association study power and found that misclassification of heterozygotes as rare homozygotes carries a higher cost than misclassification as common homozygotes. This is an important consideration, as secondary single nucleotide polymorphisms (SNPs) (null alleles) are more likely to be associated with the common allele, and therefore, heterozygote misclassification will tend to be biased toward rare homozygote calls. Power estimates under a model where misclassification of heterozygotes is random suggest that for every 1% increase in error rate, an increase of roughly 3% in sample size is required to maintain constant power (13), and the power loss is probably greater for misclassification biased toward rare homozygotes. Conversely, null alleles can inflate the rate of false positives in an association study, by reducing the effective number of alleles surveyed and therefore inflating the variance of allele frequency estimates in both cases and controls. Thus, although null alleles can have substantial impact on study outcome, rare null alleles can easily be missed using standard tests of Hardy–Weinberg equilibrium (HWE).
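As a rough illustration of the sample size cost quoted above, the following sketch scales a planned sample size by about 3% for each 1% of genotyping error. Extending the rule of thumb linearly across error rates is an assumption made here for illustration, and the function name `inflated_sample_size` and its parameters are hypothetical; the cited model (13) should be consulted for actual power calculations.

```python
import math


def inflated_sample_size(n_planned, error_rate_percent, cost_per_percent=0.03):
    """Scale a planned sample size to offset random heterozygote misclassification.

    Assumes (for illustration only) that the ~3% sample size increase per 1%
    of genotyping error applies linearly over the range of interest.
    """
    inflation = 1.0 + cost_per_percent * error_rate_percent
    # Round before taking the ceiling so floating-point noise does not add a sample.
    return math.ceil(round(n_planned * inflation, 6))


# Example: a study planned for 1000 samples with a 2% error rate.
required = inflated_sample_size(1000, 2)
```

Because misclassification biased toward rare homozygotes is expected to cost more power than random misclassification, a figure obtained this way is probably best read as a lower bound.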

Most genotyping platforms for SNPs are built upon the principle of measuring the relative signal strength of two expected alleles (14–18). This signal is usually plotted in Cartesian coordinates with the X-axis representing the signal for allele A and the Y-axis representing the signal for allele B. In a clean assay of a common polymorphism, four clusters are expected for a two-allele system with alleles A and B: a cluster corresponding to the AA homozygote with strong signal from allele A and weak signal from allele B, a second cluster corresponding to the AB heterozygote with intermediate strength signal from both alleles, a third cluster corresponding to the BB homozygote with weak signal from allele A and strong signal from allele B and a fourth cluster located near the origin corresponding to failed reactions with weak signal from both alleles.

We report an extension of this Cartesian clustering analysis to identify null alleles de novo in genotype data and demonstrate that this analytic technique can identify null alleles arising from secondary polymorphism(s) near an SNP, deletions of a region containing an SNP and an unexpected third allele at an SNP (triallelic SNPs).


    RESULTS
 
In a survey of allele frequencies at 1536 SNPs in a multiethnic population using the Illumina BeadArray genotyping technology (19,20), we observed some unusual patterns beyond the four typical clusters in the data, where five or more clusters were observed in Cartesian plots of the data for some markers. These unexpected patterns appear to be marker specific and reproducible. One of the unexpected patterns observed at a number of markers revealed two additional clusters in the data with weaker signal strength than the expected AA and BB clusters (Fig. 1). This cluster pattern corresponds to an X-linked marker, with hemizygous males carrying only one allele (A- or B-). Because a substantial proportion of the samples genotyped were male (40.3% or 193 of 480), all of these markers showed a significant deviation from HWE. A blinded screen of 1536 SNPs looking for this clustering pattern, without regard to HWE, correctly identified a large majority (31 out of 44 or >70%) of X-linked SNPs. Although the remaining 13 X-linked SNPs do show decreased signal in males, the male and female clusters overlapped significantly and were not identified in a blinded screen.


Figure 1. Normalized Illumina data in Cartesian coordinates for TLR7-7212 A/G (rs1634322), an X-linked marker. Signal from the A allele is shown along the X-axis, and signal from the G allele is shown along the Y-axis. Male samples are shown as green squares, and female samples are shown as yellow circles. Hemizygous males show reduced signal relative to homozygous females of either genotype.

 
Because hemizygous males could be identified in this manner, we investigated whether null alleles at autosomal markers could be detected in a similar manner. Four sites previously identified as high-frequency triallelic SNPs by the SeattleSNPs (http://pga.gs.washington.edu) or Innate Immunity (http://innateimmunity.net) programs for genomic applications showed a similar pattern to X-linked markers: TRAF2-36728 (rs17250567), IFNAR2-25015 (rs17860225), IL5RA-10534 (rs17879701) and TLR1-5661 (rs4540055, Fig. 2). The triallelic SNPs showed up to six clusters, five analogous to the clusters seen in X-linked markers, with the sixth cluster showing low signal for both expected alleles, corresponding to homozygotes for the unexpected third allele. Thus, unexpected alleles at autosomal markers can also be identified.


Figure 2. Normalized Illumina data in Cartesian coordinates for TLR1-5661 A/G/T (rs4540055), a polymorphism with three known alleles: A, G and T. All six genotypes at this triallelic SNP were observed: forty-eight samples with known genotype from resequencing data are colored, whereas the remaining samples are shown in gray. The Illumina assay was designed to the A and G alleles, so the Y-axis represents signal from the G allele and the X-axis represents signal from the A allele. Thus, TT homozygotes (blue) show very little signal from either expected allele and cluster near the origin, AT heterozygotes (purple) show reduced A allele signal relative to AA homozygotes (red) and GT heterozygotes (green) show reduced G allele signal relative to GG homozygotes (yellow). All three alleles are frequent in African-American individuals (27% A, 40% G and 33% T), whereas only the A and T alleles were observed in European individuals (17% A and 83% T).

 
In addition to the four triallelic SNPs, we also identified 14 other autosomal SNPs that showed clustering patterns consistent with a null allele. This included a series of four adjacent SNPs in the IL8RA locus where the same set of samples was heterozygous for the null allele across all four loci, as well as three samples putatively homozygous for the null allele at all loci: IL8RA-6747 (rs1008562), 6831 (rs1008563, Fig. 3), 6935 (rs3092967) and 7628 (rs1467142), SNP numbering relative to GenBank record AY651785. Flanking markers at 5686 (rs16858794) and 12335 (rs4672875) did not show the null allele pattern, so we designed a series of nested primers to screen for a small deletion spanning at least 6747 through 7628. Using a pair of primers to amplify the segment from 6244 to 8663, heterozygotes for the null allele were observed with two bands and the three homozygotes showed only the low molecular weight band (Fig. 4). Resequencing of the smaller PCR product confirmed that the null allele pattern for the four IL8RA SNPs is attributable to a deletion of 1367 base pairs, from position 6501 to 7868, spanning all four SNPs. Importantly, this null allele was not detected by traditional quality control (QC) methods because it did not cause a significant departure from HWE in the 83 African-American samples genotyped, even though it was modestly frequent (11% frequency).
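The insensitivity of HWE testing to a null allele of this frequency can be illustrated with a small calculation. The sketch below assumes random mating, three alleles (A, B and a null allele) and a platform that calls null allele heterozygotes as homozygotes for the detected allele and fails null homozygotes; it then applies a one-degree-of-freedom χ² test to the expected apparent genotype counts. The function names and the choice of equal A and B frequencies are assumptions for illustration only, not values reported for the IL8RA SNPs.

```python
import math


def apparent_genotype_counts(p_a, p_b, p_null, n_samples):
    """Expected apparent genotype counts when a null allele is present."""
    if abs(p_a + p_b + p_null - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    aa = (p_a ** 2 + 2 * p_a * p_null) * n_samples  # true AA plus A/null
    ab = (2 * p_a * p_b) * n_samples                # true heterozygotes
    bb = (p_b ** 2 + 2 * p_b * p_null) * n_samples  # true BB plus B/null
    failed = (p_null ** 2) * n_samples              # null homozygotes: no call
    return aa, ab, bb, failed


def hwe_chi_square(aa, ab, bb):
    """Chi-square test (1 df) of HWE on called genotypes; returns (statistic, P)."""
    n = aa + ab + bb
    p = (2 * aa + ab) / (2 * n)  # apparent frequency of allele A
    q = 1.0 - p
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    stat = sum((o - e) ** 2 / e for o, e in zip((aa, ab, bb), expected))
    # For one degree of freedom, the chi-square tail probability is erfc(sqrt(x/2)).
    return stat, math.erfc(math.sqrt(stat / 2.0))


# Illustrative only: 83 samples, 11% null allele, A and B assumed equally frequent.
aa, ab, bb, failed = apparent_genotype_counts(0.445, 0.445, 0.11, 83)
stat, p_value = hwe_chi_square(aa, ab, bb)
```

Under these assumed frequencies the expected statistic is about 3.2 (P of roughly 0.07), which falls short of nominal significance even before any correction for multiple tests.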


Figure 3. Normalized Illumina data for IL8RA-6831 C/T (rs1008563). Two alleles were typed: C and T. Patterns consistent with a null allele were observed at several adjacent SNPs (rs1008562, rs3092967 and rs1467142), suggesting a small deletion. Genotypes for the deletion were scored by PCR amplification of the region and agarose gel electrophoresis, confirming that insertion homozygotes show the strongest signal (yellow), heterozygotes show reduced signal (green) and three of four deletion homozygotes by gel electrophoresis (blue) show very little signal for either allele.

 

Figure 4. Gel results for a PCR product that spans the IL8RA deletion. PCR products were compared with a 123 bp molecular weight ladder (L) on a 1% agarose gel. The high molecular weight PCR product (4737 bp) corresponds to the inserted allele (+), whereas the low molecular weight PCR product (3337 bp) corresponds to the deletion allele (–). The deletion homozygote (– –) has only the low molecular weight band, whereas the insertion homozygotes (+ +) show only the high molecular weight band and indel heterozygotes (+ –) show both bands. The weaker signal from the large PCR product probably reflects less efficient amplification because of competition from the smaller PCR product. Inferred genotypes from the Illumina data are shown above each well and correlate perfectly with the PCR genotype.

 
To explore the origin of the patterns consistent with null alleles at the remaining 10 primary SNPs, a 500 bp region flanking each SNP was PCR amplified and resequenced in a panel of 96 individuals, including at least one null allele heterozygote (Table 1; Supplementary Material, Fig. S1). At all 10 positions, a second polymorphism was identified, which correlated perfectly with the identified null pattern. At six of 10 locations, the secondary polymorphism was not previously described in the database (dbSNP). As expected, the novel null alleles were generally of lower frequency in the sequencing data than those that have previously been reported. In one case (ITGA2 59160, Supplementary Material, Fig. S2j), there were two underlying secondary polymorphisms interfering with the signal: one (59158) producing the standard null allele pattern and the other (59185) shifting four double heterozygotes subtly but significantly out of the heterozygote cluster.


Table 1. Loci with putative null alleles
 
Although the secondary polymorphism at IL8RA is a large deletion, the detected secondary polymorphisms at the other 10 loci were a mix of five secondary base substitutions, two small indels, a 17 bp insertion and two relatively large insertions (>100 bp in length). All of these secondary polymorphisms were within 25 bp of the primary polymorphism. The Illumina Golden Gate assay requires that an allele-specific oligonucleotide first extend across a gap of 1–25 bp and then ligate to a common oligonucleotide (21), so we assessed where the secondary polymorphisms fell: under the allele-specific oligo, in the gap or under the common oligo. Nine were located in the common oligo and one was located immediately adjacent to the polymorphic site on the allelic oligo, so it is clear that secondary polymorphisms can disrupt annealing of either of the oligos required for the assay.
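The positional assessment described above can be sketched as a simple lookup on the offset of the secondary polymorphism from the primary SNP. The geometry below is an assumption for illustration: the allele-specific oligo is taken to end at the primary SNP, followed by the gap and then the common oligo, and the default oligo and gap lengths are hypothetical placeholders rather than values from any actual assay design.

```python
def assay_region(offset, aso_len=20, gap_len=10, common_len=20):
    """Classify a secondary polymorphism by its position within the assay.

    offset: secondary position minus primary SNP position, measured in the
    assay orientation (negative values lie under the allele-specific oligo).
    All three length defaults are hypothetical.
    """
    if offset == 0:
        return "primary SNP"
    if -aso_len < offset < 0:
        return "allele-specific oligo"
    if 0 < offset <= gap_len:
        return "gap"
    if gap_len < offset <= gap_len + common_len:
        return "common oligo"
    return "outside assay"


# A polymorphism immediately adjacent to the SNP on the allelic oligo:
region = assay_region(-1)
```

With real assay designs the per-SNP oligo coordinates would replace the defaults, since the gap length varies between assays.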

We also assessed whether the null alleles could be detected using a χ² test for HWE, as well as the null allele test (NAT) described by Jorgenson et al. (10). Because both of these approaches assume unstratified populations, we subdivided the data by ethnicity prior to applying the tests. Only one of the 10 loci showed a significant departure from HWE expectations (HMGCR-25599 in Asians, P = 5.5 × 10⁻⁵), uncorrected for multiple tests. The NAT also detected this locus in Asians (p.null = 0.22, the highest observed value). Although several other loci nominally showed p.null for the NAT between 0.05 and 0.12, the second highest p.null was a false positive (0.16 at IL20-3374 in Europeans), so the NAT did not perform appreciably better than a simple χ² test for HWE.


    DISCUSSION
 
As high-throughput genotyping platforms gain popularity, the probability that some genotypes will be inaccurate due to secondary polymorphism will increase substantially. We demonstrate the detection of a variety of null alleles on one current genotyping platform, with five insertion/deletion null polymorphisms and six secondary SNPs near the primary SNPs. Importantly, only four of the polymorphisms responsible for null alleles had previously been reported in dbSNP. Thus, null alleles cannot be avoided simply by using the existing database to predict the locations of secondary polymorphisms. Furthermore, only two had been identified by direct resequencing of these regions in panels of less than 50 chromosomes, so even sample sequencing does not efficiently identify the rare polymorphisms responsible for null alleles. Therefore, approaches capable of directly detecting null alleles offer significant advantages over other methods, because null alleles can be detected as part of data QC.

The mechanism for allelic quantitation on this platform could be related to the competitive amplification of all SNPs using a single pair of universal primers. Deletion null alleles would reduce the available template for an SNP by 50% in null allele heterozygotes, and triallelic null alleles would do the same by failing to gap ligate the unexpected allele. Disruption of oligonucleotide annealing by secondary SNPs beneath the oligos would generate similar patterns of reduced signal. It seems likely that duplication alleles will also be detectable using this data, although this has yet to be confirmed.
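The copy-number reasoning in this paragraph can be made concrete with an idealized signal model in which each copy of a detectable allele contributes half of the full homozygote signal and any other allele (a deletion, a third allele or an allele that fails to anneal) contributes nothing. This is a simplification assumed here for illustration; real clusters are noisy and heterozygote signal need not be exactly half. The function name `expected_signal` is hypothetical.

```python
def expected_signal(genotype, detected=("A", "B")):
    """Idealized (X, Y) signal for a two-allele genotype string such as "AT".

    detected: the two alleles the assay was designed to measure; copies of
    any other allele are treated as null and contribute no signal.
    """
    x = genotype.count(detected[0]) / 2
    y = genotype.count(detected[1]) / 2
    return x, y


# Six expected cluster centres for a triallelic A/G/T site assayed for A and G.
clusters = {g: expected_signal(g, ("A", "G"))
            for g in ("AA", "AG", "GG", "AT", "GT", "TT")}
```

Under this model AT and GT heterozygotes fall on the axes at half the homozygote signal and TT homozygotes fall at the origin, which matches the qualitative pattern described for Figure 2.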

Although we explored this analytic approach using data from the Illumina genotyping platform, this method should be extensible to any quantitative genotyping technique adequate for pooled allele frequency estimation, such as quantitative PCR (22,23), primer extension (24,25) or hybridization array (26,27). However, all such techniques will require extremely accurate quantitation of the input DNA (28). Thus, highly quantitative and multiplexed assays may have an advantage, because signal strength at a single SNP can be normalized against signal strength in a large number of unlinked assays. Therefore, we would expect this approach to work effectively on other high multiplex genotyping platforms, such as Parallele (29) or Affymetrix chips (30).
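One way such a normalization against unlinked assays might be carried out is sketched below. It is not the normalization used by the Illumina software or in this study; it simply divides each total signal by a per-SNP median (to remove assay-specific brightness) and a per-sample median across SNPs (to remove differences in input DNA). The function name and data layout are assumptions for illustration.

```python
from statistics import median


def normalize_signal(total_signal):
    """Normalize total (X + Y) signal per sample against many unlinked SNPs.

    total_signal: dict mapping sample -> dict mapping SNP -> total intensity.
    Returns the same structure with values near 1.0 for a typical two-copy
    genotype and near 0.5 where one allele copy gives no signal.
    """
    samples = list(total_signal)
    snps = list(next(iter(total_signal.values())))
    # Typical brightness of each assay across all samples.
    snp_median = {snp: median(total_signal[s][snp] for s in samples)
                  for snp in snps}
    # Overall brightness of each sample across all assays (input DNA effect).
    sample_factor = {
        s: median(total_signal[s][snp] / snp_median[snp] for snp in snps)
        for s in samples
    }
    return {
        s: {snp: total_signal[s][snp] / (snp_median[snp] * sample_factor[s])
            for snp in snps}
        for s in samples
    }
```

Medians are used so that a minority of low-signal samples at one SNP, or one unusual SNP within a sample, does not shift the reference level.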

In conclusion, the identification of null alleles a priori in genotyping data clearly not only facilitates the QC of genotype data in genetic association studies but also has the added benefit of allowing investigators to score hemizygous genotypes correctly. Provided that the haplotype phase between the secondary and the primary polymorphisms can be established, it is also possible that the ‘correct’ genotype at the primary polymorphism can be imputed from the null allele pattern. However, we would suggest a better solution is to simply score the site as a triallelic polymorphism, with two expected alleles and a null allele. This approach will allow investigators to assess disease associations with the null allele as well as the expected alleles, which will be of particular interest in studies of exonic polymorphism, where rare null alleles could translate into phenotypic changes. For example, rare phenotypic null alleles can be quite important in studies with samples from the extreme tails of a phenotypic distribution (31). Thus, detecting rare null alleles indirectly might provide a valuable new tool in direct studies of sequence variation.


    MATERIALS AND METHODS
 
Samples
DNA samples were obtained from the Coriell Cell Repositories (Camden, NJ, USA) representing a diverse panel of ethnic groups. Samples came from the following diversity panels.

AA100 African American: NA17101, NA17102, NA17103, NA17104, NA17105, NA17106, NA17107, NA17108, NA17109, NA17110, NA17111, NA17112, NA17113, NA17114, NA17115, NA17116, NA17133, NA17134, NA17135, NA17136, NA17137, NA17138, NA17139, NA17140, NA17117, NA17117, NA17118, NA17118, NA17119, NA17119, NA17120, NA17120, NA17121, NA17121, NA17122, NA17122, NA17123, NA17123, NA17124, NA17124, NA17125, NA17126, NA17127, NA17128, NA17129, NA17130, NA17131, NA17132, NA17141, NA17142, NA17143, NA17144, NA17145, NA17146, NA17147, NA17148, NA17149, NA17150, NA17151, NA17152, NA17153, NA17154, NA17155, NA17156, NA17157, NA17158, NA17159, NA17160, NA17161, NA17162, NA17163, NA17164, NA17165, NA17166, NA17167, NA17168, NA17169, NA17170, NA17171, NA17172, NA17173, NA17174, NA17175, NA17176, NA17177, NA17178, NA17179, NA17180, NA17181, NA17182, NA17183, NA17184, NA17185 and NA17186.

HD17 South American Andes: NA17301, NA17302, NA17303, NA17304, NA17305, NA17306, NA17307, NA17308, NA17309 and NA17310.

European CEPH families: NA06990, NA07019, NA07348, NA07349, NA10830, NA10831, NA10842, NA10843, NA10844, NA10845, NA10848, NA10850, NA10851, NA10852, NA10853, NA10854, NA10857, NA10858, NA10860, NA10861, NA12547, NA12548, NA12560 and NA17201.

HapMap Chinese: NA18524, NA18526, NA18526, NA18529, NA18532, NA18537, NA18540, NA18542, NA18545, NA18545, NA18547, NA18550, NA18552, NA18555, NA18558, NA18561, NA18562, NA18562, NA18563, NA18564, NA18566, NA18570, NA18571, NA18572, NA18573, NA18576, NA18577, NA18579, NA18582, NA18592, NA18593, NA18594, NA18603, NA18605, NA18608, NA18609, NA18609, NA18611, NA18612, NA18620, NA18621, NA18622, NA18623, NA18624, NA18632, NA18633, NA18635, NA18636 and NA18637.

HapMap Japanese: NA18940, NA18942, NA18943, NA18944, NA18945, NA18947, NA18948, NA18949, NA18951, NA18952, NA18953, NA18956, NA18959, NA18960, NA18961, NA18964, NA18965, NA18966, NA18967, NA18968, NA18969, NA18970, NA18971, NA18972, NA18973, NA18974, NA18975, NA18976, NA18978, NA18980, NA18981, NA18987, NA18990, NA18991, NA18992, NA18994, NA18995, NA18996, NA18997, NA18998, NA18999, NA19000, NA19003, NA19005 and NA19007.

HapMap European: NA06985, NA06993, NA06994, NA07000, NA07022, NA07034, NA07055, NA07056, NA07345, NA07357, NA11829, NA11830, NA11831, NA11832, NA11839, NA11840, NA11881, NA11882, NA11882, NA11992, NA11993, NA11994, NA11994, NA11995, NA11995, NA12003, NA12004, NA12005, NA12006, NA12043, NA12044, NA12056, NA12057, NA12144, NA12145, NA12146, NA12154, NA12155, NA12156, NA12234, NA12236, NA12239, NA12248, NA12249, NA12264, NA12716, NA12717, NA12750, NA12751, NA12760, NA12761, NA12762, NA12763, NA12812, NA12813, NA12814, NA12815, NA12872, NA12873, NA12874, NA12875, NA12891, NA12892 and NA12892.

HapMap Yoruban: NA18501, NA18502, NA18502, NA18504, NA18505, NA18507, NA18508, NA18516, NA18517, NA18522, NA18523, NA18852, NA18853, NA18855, NA18856, NA18858, NA18859, NA18861, NA18862, NA18870, NA18871, NA18912, NA18913, NA19092, NA19093, NA19098, NA19099, NA19101, NA19102, NA19116, NA19119, NA19127, NA19128, NA19130, NA19131, NA19137, NA19138, NA19140, NA19141, NA19143, NA19144, NA19152, NA19153, NA19159, NA19160, NA19171, NA19172, NA19192, NA19193, NA19200, NA19201, NA19201, NA19203, NA19204, NA19206, NA19207, NA19209, NA19210, NA19222, NA19223, NA19223, NA19238 and NA19239.

Mayan: NA10975, NA10976, NA10978 and NA10979.

HD08 Mexican: NA17061, NA17062, NA17063, NA17064, NA17065, NA17066, NA17067, NA17068, NA17069 and NA17070.

MA100 Mexican American: NA17438, NA17439, NA17440, NA17441, NA17442, NA17443, NA17444, NA17445, NA17446, NA17448, NA17449, NA17450, NA17451, NA17452, NA17453, NA17454, NA17456, NA17457, NA17458, NA17459, NA17460, NA17461, NA17462, NA17463, NA17465, NA17466, NA17467, NA17614, NA17615, NA17616, NA17617, NA17618, NA17619, NA17622, NA17624, NA17626, NA17629, NA17630, NA17631 and NA17632.

HD28 Mexican Indian: NA17392, NA17393, NA17394, NA17395 and NA17396.

HD11 North Saharan African: NA17378, NA17379, NA17380, NA17381, NA17382, NA17383 and NA17384.

HD18 South American Indian: NA17311, NA17312, NA17313, NA17314, NA17315, NA17316, NA17317, NA17318, NA17319 and NA17320.

HD100A Chinese American: NA17733, NA17734, NA17735, NA17736, NA17737, NA17738, NA17739, NA17740, NA17741, NA17742, NA17743, NA17744, NA17745, NA17746, NA17747, NA17749, NA17752, NA17753, NA17754, NA17755, NA17756, NA17757, NA17759 and NA17761.

HD09 Puerto Rican: NA17071, NA17072, NA17073, NA17074, NA17075, NA17076, NA17077, NA17078, NA17079 and NA17080.

HD13 South East Asian: NA17081, NA17082, NA17083, NA17084, NA17085, NA17086, NA17087, NA17088, NA17089 and NA17090.

HD12 Sub-Saharan African: NA17341, NA17342, NA17343, NA17344, NA17345, NA17346, NA17347, NA17348 and NA17349.

Genotyping
DNA was quantitated according to Illumina specifications using PicoGreen (Molecular Probes) and a SpectraMax 96 channel fluorometer (Molecular Devices). Genotyping reactions were assembled according to standard Illumina Golden Gate assay protocols (21): in brief, using 250 ng of biotinylated DNA per sample as template, SNP-specific oligonucleotides containing both detection specific sequences and universal primer sequences were hybridized, extended and ligated to a common oligonucleotide containing a universal primer sequence. Ligated products were then amplified with a universal primer set. Genotypes were determined by hybridizing the amplified products to a bead array complementary to the sequence specific tags, and fluorescence across the bead array was determined using a BeadStation 500GX array reader (Illumina). Data were collected with BeadScan v2.3.0.10 software and analyzed using GenCall v1.2.2 (Illumina). After automatic exclusion of samples with weak or ambiguous signal, fluorescence data from the remaining samples were exported and analyzed further in Excel (Microsoft).

Null allele clustering.
The following criteria were applied in a manually implemented but systematic screen for null alleles. For a normal marker, the X and Y signal of each sample can be described as a vector from the origin with a specified angle and length: homozygotes tend to cluster at angles of ~0° and ~90°, and heterozygotes cluster at ~45°. The expected positions of the null allele clusters are therefore at either 0° or 90°, but with roughly 50% of the signal strength (vector length) of normal homozygotes. We used this as a first stage heuristic to manually screen for SNPs where a subset of samples appeared to fit this criterion. In a second stage, we then examined the low-signal strength samples at five or more other SNPs, to determine whether the unusual signal was systematic for that sample or restricted to a single SNP. Markers that satisfied both of these criteria were judged to be strong candidates for loci that have a null allele.
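The screen described here was carried out manually, but the two stages can be sketched in code to make the criteria explicit. The sketch below is an illustration under stated assumptions rather than the procedure actually used: the angular tolerance, the 0.35 to 0.65 window around half signal, the use of the median near-axis vector length as the homozygote reference and both function names are hypothetical choices.

```python
import math
from statistics import median


def to_polar(x, y):
    """Return (angle in degrees from the X-axis, vector length) for one sample."""
    return math.degrees(math.atan2(y, x)), math.hypot(x, y)


def candidate_null_heterozygotes(signals, angle_tol=15.0, low=0.35, high=0.65):
    """Stage 1: flag near-axis samples with roughly half the homozygote signal.

    signals: dict mapping sample id -> (x, y) normalized signal at one SNP.
    Returns a dict mapping flagged sample id -> detected allele ("A" or "B").
    """
    polar = {s: to_polar(x, y) for s, (x, y) in signals.items()}
    flagged = {}
    for allele, axis_angle in (("A", 0.0), ("B", 90.0)):
        near_axis = {s: r for s, (theta, r) in polar.items()
                     if abs(theta - axis_angle) <= angle_tol}
        if len(near_axis) < 3:
            continue  # too few samples to define a homozygote reference
        # Normal homozygotes are assumed to be the majority near each axis.
        reference = median(near_axis.values())
        for s, r in near_axis.items():
            if low * reference <= r <= high * reference:
                flagged[s] = allele
    return flagged


def low_signal_is_marker_specific(relative_lengths_elsewhere, min_snps=5, high=0.65):
    """Stage 2: True if a flagged sample has normal signal at other SNPs.

    relative_lengths_elsewhere: the sample's vector length divided by the
    homozygote reference length at each of several other SNPs.
    """
    if len(relative_lengths_elsewhere) < min_snps:
        raise ValueError("need signal at five or more other SNPs")
    return median(relative_lengths_elsewhere) > high
```

A sample flagged in stage 1 whose signal is also low at most other SNPs is more likely to reflect poor DNA quantity or quality than a null allele, which is why the second check is applied per sample rather than per marker.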

Resequencing
For each suspected null allele, PCR amplicons were designed using the program PCRoverlap (32). Templates were amplified using the Elongase kit (Invitrogen) on Tetrad thermal cyclers (MJR). Sequence data were collected using Big Dye Terminator chemistry (Applied Biosystems) on ABI 3730 instruments (Applied Biosystems). Sequence data were processed using Phred (33,34) and Phrap, and polymorphic sites were identified using Polyphred v 5.03 (35) and viewed in Consed (36) for confirmation. At insertion–deletion polymorphisms, genotype was manually scored on both strands for confirmation. Detailed protocols for PCR and sequencing are available (http://pga.gs.washington.edu/protocols.html).


    SUPPLEMENTARY MATERIAL
 
Supplementary Material is available at HMG Online.


    ACKNOWLEDGEMENTS
 
Supported by a Program for Genomic Applications from the National Heart, Lung and Blood Institute (HL66682 and HL66642 to D.A.N. and M.J.R.). The authors would like to thank Cindy Shephard, Michelle Wong and Suzanne daPonte for their efforts in generating the data for this analysis.

Conflict of Interest statement. None declared.


    REFERENCES
 

  1. Sawcer, S.J., Maranian, M., Singlehurst, S., Yeo, T., Compston, A., Daly, M.J., De Jager, P.L., Gabriel, S., Hafler, D.A., Ivinson, A.J. et al. (2004) Enhancing linkage analysis of complex disorders: an evaluation of high-density genotyping. Hum. Mol. Genet., 13, 1943–1949.[Abstract/Free Full Text]

  2. Rice, K.M. and Holmans, P. (2003) Allowing for genotyping error in analysis of unmatched case–control studies. Ann. Hum. Genet., 67, 165–174.[CrossRef][Web of Science][Medline]

  3. Conrad, D.F., Andrews, T.D., Carter, N.P., Hurles, M.E. and Pritchard, J.K. (2006) A high-resolution survey of deletion polymorphism in the human genome. Nat. Genet., 38, 75.[CrossRef][Web of Science][Medline]

  4. Hinds, D.A., Kloek, A.P., Jen, M., Chen, X. and Frazer, K.A. (2006) Common deletions and SNPs are in linkage disequilibrium in the human genome. Nat. Genet., 38, 82.[Web of Science][Medline]

  5. McCarroll, S.A., Hadnott, T.N., Perry, G.H., Sabeti, P.C., Zody, M.C., Barrett, J.C., Dallaire, S., Gabriel, S.B., Lee, C., Daly, M.J. et al. (2006) Common deletion polymorphisms in the human genome. Nat. Genet., 38, 86.[Web of Science][Medline]

  6. Carlson, C.S., Eberle, M.A., Rieder, M.J., Yi, Q., Kruglyak, L. and Nickerson, D.A. (2004) Selecting a maximally informative set of single-nucleotide polymorphisms for association analyses using linkage disequilibrium. Am. J. Hum. Genet., 74, 106–120.[CrossRef][Web of Science][Medline]

  7. Sharp, A.J., Locke, D.P., McGrath, S.D., Cheng, Z., Bailey, J.A., Vallente, R.U., Pertz, L.M., Clark, R.A., Schwartz, S., Segraves, R. et al. (2005) Segmental duplications and copy-number variation in the human genome. Am. J. Hum. Genet., 77, 78–88.[CrossRef][Web of Science][Medline]

  8. Fredman, D., White, S.J., Potter, S., Eichler, E.E., Den Dunnen, J.T. and Brookes, A.J. (2004) Complex SNP-related sequence variation in segmental genome duplications. Nat. Genet., 36, 861–866.[CrossRef][Web of Science][Medline]

  9. Newman, T.L., Rieder, M.J., Morrison, V.A., Sharp, A.J., Smith, J.D., Sprague, L.J., Kaul, R., Carlson, C.S., Olson, M.V., Nickerson, D.A. et al. (2006) High-throughput genotyping of intermediate-size structural variation. Hum. Mol. Genet., 15, 1159–1167.[Abstract/Free Full Text]

  10. Jorgenson, E., Tang, H., Gadde, M., Province, M., Leppert, M., Kardia, S., Schork, N., Cooper, R., Rao, D.C., Boerwinkle, E. et al. (2005) Ethnicity and human genetic linkage maps. Am. J. Hum. Genet., 76, 276–290.[CrossRef][Web of Science][Medline]

  11. Conrad, D.F., Andrews, T.D., Carter, N.P., Hurles, M.E. and Pritchard, J.K. (2006) A high-resolution survey of deletion polymorphism in the human genome. Nat. Genet., 38, 75–81.[CrossRef][Web of Science][Medline]

  12. Kang, S.J., Gordon, D. and Finch, S.J. (2004) What SNP genotyping errors are most costly for genetic association studies? Genet. Epidemiol., 26, 132–141.[CrossRef][Web of Science][Medline]

  13. Gordon, D., Finch, S.J., Nothnagel, M. and Ott, J. (2002) Power and sample size calculations for case–control genetic association tests when errors are present: application to single nucleotide polymorphisms. Hum. Hered., 54, 22–33.[CrossRef][Web of Science][Medline]

  14. Nikiforov, T.T., Rendle, R.B., Goelet, P., Rogers, Y.H., Kotewicz, M.L., Anderson, S., Trainor, G.L. and Knapp, M.R. (1994) Genetic bit analysis: a solid phase method for typing single nucleotide polymorphisms. Nucleic Acids Res., 22, 4167–4175.[Abstract/Free Full Text]

  15. Livak, K.J. (1999) Allelic discrimination using fluorogenic probes and the 5' nuclease assay. Genet. Anal., 14, 143–149.[Medline]

  16. Chen, X. and Kwok, P.Y. (1999) Homogeneous genotyping assays for single nucleotide polymorphisms with fluorescence resonance energy transfer detection. Genet. Anal., 14, 157–163.[Medline]

  17. Tobe, V.O., Taylor, S.L. and Nickerson, D.A. (1996) Single-well genotyping of diallelic sequence variations by a two-color ELISA-based oligonucleotide ligation assay. Nucleic Acids Res., 24, 3728–3732.

  18. Sapolsky, R.J., Hsie, L., Berno, A., Ghandour, G., Mittmann, M. and Fan, J.B. (1999) High-throughput polymorphism screening and genotyping with high-density oligonucleotide arrays. Genet. Anal., 14, 187–192.

  19. Fan, J.B., Oliphant, A., Shen, R., Kermani, B.G., Garcia, F., Gunderson, K.L., Hansen, M., Steemers, F., Butler, S.L., Deloukas, P. et al. (2003) Highly parallel SNP genotyping. Cold Spring Harb. Symp. Quant. Biol., 68, 69–78.

  20. Oliphant, A., Barker, D.L., Stuelpnagel, J.R. and Chee, M.S. (2002) BeadArray technology: enabling an accurate, cost-effective approach to high-throughput genotyping. Biotechniques, (suppl.), 56–58, 60–61.

  21. Shen, R., Fan, J.B., Campbell, D., Chang, W., Chen, J., Doucet, D., Yeakley, J., Bibikova, M., Wickham Garcia, E., McBride, C. et al. (2005) High-throughput SNP genotyping on universal bead arrays. Mutat. Res., 573, 70–82.

  22. Germer, S., Holland, M.J. and Higuchi, R. (2000) High-throughput SNP allele-frequency determination in pooled DNA samples by kinetic PCR. Genome Res., 10, 258–266.

  23. Xiao, M. and Kwok, P.Y. (2005) Kinetic fluorescence-quenching detection assay for allele frequency estimation. Methods Mol. Biol., 311, 115–123.

  24. Norton, N., Williams, N.M., Williams, H.J., Spurlock, G., Kirov, G., Morris, D.W., Hoogendoorn, B., Owen, M.J. and O'Donovan, M.C. (2002) Universal, robust, highly quantitative SNP allele frequency measurement in DNA pools. Hum. Genet., 110, 471–478.

  25. Gruber, J.D., Colligan, P.B. and Wolford, J.K. (2002) Estimation of single nucleotide polymorphism allele frequency in DNA pools by using Pyrosequencing. Hum. Genet., 110, 395–401.

  26. Fan, J.B., Chen, X., Halushka, M.K., Berno, A., Huang, X., Ryder, T., Lipshutz, R.J., Lockhart, D.J. and Chakravarti, A. (2000) Parallel genotyping of human SNPs using generic high-density oligonucleotide tag arrays. Genome Res., 10, 853–860.

  27. Hinds, D.A., Seymour, A.B., Durham, L.K., Banerjee, P., Ballinger, D.G., Milos, P.M., Cox, D.R., Thompson, J.F. and Frazer, K.A. (2004) Application of pooled genotyping to scan candidate regions for association with HDL cholesterol levels. Hum. Genomics, 1, 421–434.

  28. Higuchi, R., Fockler, C., Dollinger, G. and Watson, R. (1993) Kinetic PCR analysis: real-time monitoring of DNA amplification reactions. Biotechnology (N Y), 11, 1026–1030.

  29. Wang, Y., Moorhead, M., Karlin-Neumann, G., Falkowski, M., Chen, C., Siddiqui, F., Davis, R.W., Willis, T.D. and Faham, M. (2005) Allele quantification using molecular inversion probes (MIP). Nucleic Acids Res., 33, e183.

  30. Matsuzaki, H., Dong, S., Loi, H., Di, X., Liu, G., Hubbell, E., Law, J., Berntsen, T., Chadha, M., Hui, H. et al. (2004) Genotyping over 100 000 SNPs on a pair of oligonucleotide arrays. Nat. Methods, 1, 109–111.

  31. Cohen, J., Pertsemlidis, A., Kotowski, I.K., Graham, R., Garcia, C.K. and Hobbs, H.H. (2005) Low LDL cholesterol in individuals of African descent resulting from frequent nonsense mutations in PCSK9. Nat. Genet., 37, 161–165.

  32. Rieder, M.J., Taylor, S.L., Clark, A.G. and Nickerson, D.A. (1999) Sequence variation in the human angiotensin converting enzyme. Nat. Genet., 22, 59–62.

  33. Ewing, B., Hillier, L., Wendl, M.C. and Green, P. (1998) Base-calling of automated sequencer traces using phred. I. Accuracy assessment. Genome Res., 8, 175–185.

  34. Ewing, B. and Green, P. (1998) Base-calling of automated sequencer traces using phred. II. Error probabilities. Genome Res., 8, 186–194.

  35. Nickerson, D.A., Tobe, V.O. and Taylor, S.L. (1997) PolyPhred: automating the detection and genotyping of single nucleotide substitutions using fluorescence-based resequencing. Nucleic Acids Res., 25, 2745–2751.

  36. Gordon, D., Abajian, C. and Green, P. (1998) Consed: a graphical tool for sequence finishing. Genome Res., 8, 195–202.




This article has been cited by other articles:


A. Ramirez-Soriano and R. Nielsen
Correcting Estimators of θ and Tajima's D for Ascertainment Biases Caused by the Single-Nucleotide Polymorphism Discovery Process
Genetics, February 1, 2009; 181(2): 701 - 710.


C. Huebner, I. Petermann, B. L. Browning, A. N. Shelling, and L. R. Ferguson
Triallelic Single Nucleotide Polymorphisms and Genotyping Error in Genetic Epidemiology Studies: MDR1 (ABCB1) G2677/T/A as an Example
Cancer Epidemiol. Biomarkers Prev., June 1, 2007; 16(6): 1185 - 1192.

