

Human Molecular Genetics Advance Access originally published online on November 17, 2004
Human Molecular Genetics 2005 14(1):145-153; doi:10.1093/hmg/ddi019

Human Molecular Genetics, Vol. 14, No. 1 © Oxford University Press 2005; all rights reserved

The optimal measure of linkage disequilibrium reduces error in association mapping of affection status

N. Maniatis1,*,†, N.E. Morton1,†, J. Gibson1, C.-F. Xu2, L.K. Hosking2 and A. Collins1

1Human Genetics Division, University of Southampton, Southampton General Hospital, Southampton SO16 6YD, UK and 2Discovery Genetics, GlaxoSmithKline, Stevenage SG1 2NY, UK

* To whom correspondence should be addressed at: Human Genetics Division, Southampton General Hospital, University of Southampton, School of Medicine, Duthie Building (MP808), Southampton SO16 6YD, UK. Tel: +44 2380796538; Fax: +44 238080794264; Email: n.maniatis@soton.ac.uk

Received July 7, 2004; Revised September 17, 2004; Accepted November 5, 2004


    ABSTRACT
We have developed a simple yet powerful approach for disease gene association mapping by linkage disequilibrium (LD). The method is unique in applying a model grounded in evolutionary theory that incorporates a parameter for the location of the causal polymorphism. It exploits LD maps, which assign each marker a location in LD units (LDU). The approach is based on single-marker tests within a composite likelihood framework, which avoids the heavy Bonferroni correction for multiple testing. As a proof of principle, we tested an 890 kb region flanking the CYP2D6 gene, which is associated with poor drug-metabolizing activity, in order to refine the localization of a causal mutation. Previous LD mapping studies using single markers and haplotypes identified a significant region of 390 kb on chromosome 22 associated with the poor drug-metabolizing phenotype. None of the 27 single nucleotide polymorphisms (SNPs) was within the gene. Using a metric LDU map, the commonest functional polymorphism within the gene was located 14.9 kb from its true location, within a 95% confidence interval of 172 kb. The kb map had a relative efficiency of 33% compared with the LDU map. Our findings indicate that the support interval and location error are smaller than any published results. Despite the low resolution and the strong LD in the region, our results provide evidence of the substantial utility of LDU maps for disease gene association mapping. These tests are robust to large numbers of markers and are applicable to haplotypes, diplotypes, whole-genome association or candidate region studies.


    INTRODUCTION
Linkage disequilibrium (LD) analysis offers the prospect of fine-scale localization of genetic polymorphisms of medical importance, particularly when single nucleotide polymorphisms (SNPs) are densely typed in a candidate region. The principal role of LD is to identify and then narrow a candidate region. Because the observed patterns of LD are complex and a heavy Bonferroni correction is to be avoided, the relationship between markers and disease phenotypes must be modelled carefully. Maniatis et al. (1) developed a metric LD map with additive distances in LD units (LDU). The properties of these maps were first examined by Zhang et al. (2), who found remarkable agreement between LDU steps and sites of meiotic recombination in the data of Jeffreys et al. (3), which are highly informative for crossing over. LDU maps are analogous to linkage maps: their distances increase monotonically with physical distance, but they are superior in representing the pattern of LD rather than just recombination (4–6).

The application of LDU maps to association mapping, or positional cloning, was subsequently examined by Maniatis et al. (7). They presented an LD method based on multiple pairwise tests of single SNPs, using composite likelihood with its empirical variance, and compared its performance on kb and LDU maps. The authors carried out a simulation study on two real data sets on which current ideas of blocks and steps are based (3,8). Using regression (b) and correlation (r), they examined false-negative indications of a disease locus (type II error) by treating each SNP in turn as causal and predicting its location from the remaining markers. Greater power was achieved by mapping on an LDU map than on a map in kb, especially in a densely typed region characterized by intense recombination hotspots (3): when the kb map was used instead of the LDU map, the relative efficiency was only 62%. Furthermore, in the investigation of false-positive indications of a disease locus (type I error), the χ² distribution from 1000 simulations of an unlinked causal SNP gave an acceptable goodness of fit (7).

Hosking et al. (9) have recently examined SNPs in the CYP2D6 region (but not within that locus) as a ‘proof of principle’ that LD mapping and genome-wide association scans can detect and refine candidate regions harbouring genetic variation that alters drug response. The CYP2D6 locus on chromosome 22q13.1 encodes an enzyme that metabolizes ~20% of commonly prescribed drugs (10). The polymorphism was first recognized in response to debrisoquine treatment for hypertension (11). Quantitative bioassay revealed two peaks, the minor one corresponding to ‘poor metabolizers’, which were attributed to a recessive gene. Subsequently at least 30 other drugs were shown to be metabolized in the same way (12). In the same study, the locus was positionally cloned, and the genotypes of poor metabolizers were shown to be complex, with several rare polymorphisms that mimic the common one in homozygotes and compound heterozygotes (12). During the past 2 years, CYP2D6 has provided a tournament for positional cloning methods (9,13,14). Using affection status as the phenotype (slow metabolizers treated as affected individuals), Hosking et al. (9) identified a significant region of 390 kb around CYP2D6 by LD mapping. All efforts to date have been based on the physical map, which can represent neither linkage nor LD. Having investigated by simulation (7) the properties of association mapping by LD on maps in both kb and LDU, we now evaluate the utility of our approach for refining the localization of the poor-metabolizer gene.

Human genetics is unique in the large proportion of phenotypes that are of interest primarily because they are related to disease, and many of these phenotypes are represented by affection status (normal or affected). Association mapping, or localization of genes predisposing to affection, is most commonly based on diallelic markers, usually SNPs. A 2×2 table of affection status by allele is therefore the unit of analysis, with association across multiple markers modelled by composite likelihood. Whether in diplotypes (phase-unknown genotypes) or haplotypes, there are many ways to parameterize a 2×2 table, and they differ in their efficiency to localize a causal SNP. Some of the most popular metrics have been compared for their efficiency to fit a physical map for marker-by-marker association (15). All were shown to have low efficiency compared with the association probability ρ, which is unique in being derived from evolutionary theory (15,16). Similar results were obtained after LDU maps were developed and shown to represent the pattern of LD better than a physical map in kilobases or a linkage map in centimorgans (1,2). However, general acceptance of ρ has been constrained by the perception that ρ cannot be obtained for association between a complex trait and a marker, solely because the frequency of the putative disease allele is unknown. Its utility has therefore been limited to major genes of high penetrance, where the observation of recessive homozygotes or dominant heterozygotes in affected relatives makes it easy to assign affection status and thereby to use ρ to minimize error in positional cloning (17–19). As a result, alternatives such as b and r have been used for association mapping of oligogenes in diplotypes, despite evidence of their comparative inefficiency in LD mapping (7). We shall now demonstrate that ρ can be adapted for complex traits and that, so adapted, ρ outperforms other metrics.
The objective of this article is therefore twofold: first, to use the CYP2D6 data of Hosking et al. (9) to demonstrate the power of association mapping in LDU maps and to compare the results with other published LD methods on the same data; second, to examine the power and efficiency of the ρ metric for association mapping of affection status.


    RESULTS
Of 32 SNPs in an 879 kb region centred on CYP2D6, Hosking et al. (9) retained 27, rejecting five with minor allele frequencies <0.05. None of these SNPs was within the CYP2D6 gene. Four functional CYP2D6 polymorphisms (Table 1), which predict 99% of slow metabolizers (12), were typed in 1018 Caucasians, and 41 predicted slow metabolizers were identified. This is a random sample, not a case–control study; the 41 slow metabolizers are therefore called affected and the remaining 977 individuals normal. Map locations of the functional polymorphisms (Table 1) and the 27 predictive SNPs represent distances from SNP1 relative to the finished human genome sequence assembly (NCBI build 34, UCSC July 2003). For comparability with previous studies of the same data, we also used the map locations provided by Hosking et al. (9). In this study, these two maps (genome assembly and Hosking et al.) are called GP and GSK, respectively. A full description of the data, the database IDs and the primers for the marker SNPs is given in Hosking et al. (9).


Table 1. Frequencies and locations of four functional mutations and the whole-gene deletion CYP2D6D
 
LDU maps
The LD maps developed by Maniatis et al. (1) assign locations to markers in LDU that describe the underlying structure of LD in the form of a metric map. Every SNP in the data was therefore assigned two locations, one in kb and the other in LDU; the LDU locations are based on pairwise marker association, ignoring the phenotype. Association between any pair of SNPs was described by the metric $\rho = |D|/[Q(1-R)]$, where D is the covariance in a 2×2 haplotype table and Q and R are the minor allele frequencies of the pair of SNPs. The theoretical framework for constructing LDU maps is based on the Malecot model, which describes the exponential decline of LD with distance and is used to predict the value of ρ. For random samples, ρ equals the maximum value of D′. Unlike D′, however, ρ derives its optimality and its basis in evolutionary theory from its uniqueness as a probability conditional on Q and R, with Q the frequency of the putatively youngest of the four alleles in the SNP pair (15) (see Materials and Methods).
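The pairwise metric can be illustrated with a minimal Python sketch that computes $\rho = |D|/[Q(1-R)]$ from a 2×2 table of haplotype counts. It assumes, as a simplification, that the table is oriented so that Q is the smallest of the four allele frequencies and R is the frequency of the allele at the other SNP that is positively associated with the Q allele. The function name `association_rho` is hypothetical and is not part of LDMAP.

```python
def association_rho(table):
    """Association probability rho = |D| / [Q(1 - R)] for two diallelic SNPs.

    table[i][j] is the count of haplotypes carrying allele i at SNP 1 and
    allele j at SNP 2 (i, j in {0, 1}). Hypothetical helper for illustration.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    if n == 0:
        raise ValueError("empty table")
    p1 = (a + b) / n  # frequency of allele 0 at SNP 1
    p2 = (a + c) / n  # frequency of allele 0 at SNP 2
    # Transpose if the rarest of the four alleles lies at SNP 2,
    # so that Q is always an allele frequency at SNP 1.
    if min(p2, 1 - p2) < min(p1, 1 - p1):
        (a, b), (c, d) = (a, c), (b, d)
        p1, p2 = p2, p1
    # Swap rows if needed so that row 0 carries the rarer SNP-1 allele (frequency Q).
    if p1 > 0.5:
        (a, b), (c, d) = (c, d), (a, b)
        p1 = 1 - p1
    Q = p1
    D = a / n - Q * p2  # covariance between the Q allele and SNP-2 allele 0
    # R is the frequency of the SNP-2 allele positively associated with the Q allele.
    if D >= 0:
        R = p2
    else:
        R, D = 1 - p2, -D
    denom = Q * (1 - R)
    if denom == 0:
        raise ValueError("a SNP is monomorphic; rho is undefined")
    return D / denom


# Hypothetical counts: complete association gives rho close to 1,
# independence gives rho close to 0.
print(association_rho([[10, 0], [0, 90]]))
print(association_rho([[1, 9], [9, 81]]))
```

Because Q is the smallest allele frequency and D is taken as positive, $Q(1-R)$ is the largest covariance the table can reach, which is why ρ coincides with D′ in this orientation.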

The block–step structure of the CYP2D6 region is shown in Figure 1, which plots the LDU locations of Table 2 against the kb map. The common polymorphism (G1846A) is located in the main block at 525.3 kb. There is a large step between the last two SNPs, but only small steps flank the 158 kb block that includes the CYP2D6 locus. LDU maps based on 27 or 32 SNPs were found to be essentially the same, so we followed Hosking et al. (9) and omitted the five rare SNPs throughout the study. Because an LD map is constructed from a kb map, locations in LDU can be converted into kb, and vice versa, by a simple linear interpolation procedure (7). Conversion of LDU to kb is important because a candidate region is always specified on the kb map. In the significant region of CYP2D6 there was one main long block, spanning 158 kb and including five markers (SNPs 16–20). All markers in such a block share the same LDU value but have unique locations in kb. To avoid this problem, especially for blocks that contain more than two SNPs, linear interpolation was used so that every SNP has a unique location in LDU and hence a corresponding location in kb. These five SNPs had an LDU location of 1.822 and were interpolated as shown in Table 2. Before association mapping, the superiority of the LDU map was examined simply by fitting the Malecot model to both the kb and LDU maps and estimating their residual error variances (see Materials and Methods). Fitting the model to the kb map reveals strong LD in the region, extending over about 270 kb (1/ε = 1/0.0037), where ε = 0.0037 per kb is the exponential decline of association with distance across the 891 kb of the CYP2D6 region. The LDU map fits the data substantially better than the kb map, yielding a smaller error variance. The efficiency of the kb map relative to the LDU map, calculated as the ratio of the residual error variances, is only 40%.
Fitting the LDU map with no interpolation of the main block (i.e. SNPs 16–20 in Table 2) yields the same results (see Materials and Methods), indicating that the interpolation procedure captures all the information in the original LDU map.
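The kb-to-LDU conversion described above can be sketched as piecewise-linear interpolation between flanking markers. This is a minimal illustration, assuming marker locations are held in two parallel lists; the helper `interpolate` and the example coordinates are hypothetical, not values from Table 2. It also shows why tied LDU values inside a block must be resolved before converting LDU back to kb.

```python
from bisect import bisect_right


def interpolate(x, xs, ys):
    """Piecewise-linear interpolation of y at x from knots (xs, ys).

    xs must be strictly increasing: tied values (e.g. several SNPs sharing
    one LDU location inside a block) would make the inverse map ambiguous.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two matching knots")
    if any(x1 <= x0 for x0, x1 in zip(xs, xs[1:])):
        raise ValueError("knots must be strictly increasing")
    if not xs[0] <= x <= xs[-1]:
        raise ValueError("x lies outside the mapped region")
    i = min(bisect_right(xs, x) - 1, len(xs) - 2)  # index of the left flanking knot
    t = (x - xs[i]) / (xs[i + 1] - xs[i])
    return ys[i] + t * (ys[i + 1] - ys[i])


# Hypothetical marker map: kb and LDU locations of four SNPs.
kb = [0.0, 100.0, 200.0, 300.0]
ldu = [0.0, 1.0, 1.5, 3.0]

print(interpolate(150.0, kb, ldu))  # kb -> LDU: 1.25
print(interpolate(1.25, ldu, kb))   # LDU -> kb: 150.0
```

Rejecting ties, rather than silently picking one of several kb locations, mirrors the reason given in the text for interpolating the SNPs of a block to distinct LDU values.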


Table 2. The GSK kb and LDU maps of the CYP2D6 region, with Pearson's χ²₁ values from the 2×2 table (CYP2D6 PM phenotype × SNP marker alleles).
 


Figure 1. The graph of the LDU map for the CYP2D6 region. Vertical line indicates the location of the locus at 525.3 kb.

 
Association mapping
Having used ρ to create an LDU map for the CYP2D6 region, we adapted this metric to measure association between the phenotype and a marker SNP as $z = |D|/[f(1-R)]$, where D is the covariance between affection status and the marker alleles. The frequency of affected individuals in this sample is f = Q² = 0.04 ≈ 41/1018, although this may vary somewhat owing to incomplete typing at a given marker. The number of tests performed equals the number of SNPs, but composite likelihood avoids the heavy Bonferroni correction required for the maximal χ² (20). The Malecot model was adapted to incorporate the parameter S, which provides the estimated causal location. Following Maniatis et al. (7), we used different subhypotheses of the Malecot model to test for the existence of a causal polymorphism. The χ² for the A–B contrast tests for association with disease in the region (Table 3), whereas the χ² for the A–C contrast tests for a disease determinant at location S, in this case the location of the commonest functional polymorphism within the CYP2D6 gene (Table 3). Analysis with the additional parameter S reveals substantial power to localize this locus on the LDU map (Table 3). The A–C contrast shows a large increase in χ² when the data are fitted to the LDU map (χ² = 563) compared with the map in kb (χ² = 165). This marked difference in power between the kb and LDU maps is accompanied by differences in error variances: for the A–C contrast, the efficiency of the kb map relative to the LDU map is only 33% (1.05/3.20). This relative efficiency of the kb map is much smaller than that observed in our previous simulation study (7) using the regression and correlation metrics, reflecting the superiority of the z metric. Fitting the LDU map, the location S was estimated at 510.4 kb, very close to the true location (525.3 kb).
This 14.9 kb location error does not change when the GP LDU map is fitted, and the results from the GSK and GP maps are generally consistent. When the kb map is fitted, the location error increases to 54 and 57 kb for the GSK and GP maps, respectively. The A–B contrast does not estimate a point location and therefore does not depend on whether SNP locations are in kb or LDU, or on whether these two maps are reliable. The significant χ²₁ for the A–B contrast does, however, verify the utility of hierarchical modelling of LD to identify candidate regions.
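A minimal sketch of the phenotype–marker metric $z = |D|/[f(1-R)]$ and its information $K_z = \chi^2_1/\hat z^2$, computed from a 2×2 table of allele counts by affection status. The cell layout (a = affected with allele G, b = affected with the other allele, c = normal with G, d = normal with the other allele) and the function name are assumptions for illustration. With this layout, $K_z$ reduces to $n(a+b)(b+d)/[(a+c)(c+d)]$, the expression given in Materials and Methods.

```python
def z_metric(a, b, c, d):
    """Return (z_hat, chi2, K_z) for a 2x2 table of allele counts.

    Assumed layout (hypothetical):
        a: affected, allele G    b: affected, other allele
        c: normal,   allele G    d: normal,   other allele
    """
    if (a + b) * (c + d) * (a + c) * (b + d) == 0:
        raise ValueError("a margin of the table is empty")
    if a * d - b * c < 0:
        # Relabel alleles so that G is the allele positively associated with affection.
        a, b, c, d = b, a, d, c
    n = a + b + c + d
    f = (a + b) / n      # frequency of affection
    R = (a + c) / n      # frequency of allele G
    D = a / n - f * R    # covariance between affection and allele G (>= 0)
    z_hat = D / (f * (1 - R))
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    K_z = n * (a + b) * (b + d) / ((a + c) * (c + d))  # equals chi2 / z_hat**2
    return z_hat, chi2, K_z


# Hypothetical counts (82 affected and 1954 normal alleles).
z_hat, chi2, K_z = z_metric(30, 52, 400, 1554)
print(z_hat, chi2, K_z, chi2 / z_hat ** 2)
```

Since $D \le f(1-R)$ whenever G is positively associated with affection, $\hat z$ stays between 0 and 1 and equals 1 only when every affected allele is G.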


Table 3. Localization of CYP2D6 under the z metric
 
The 95% confidence interval is 171 and 172 kb for the GP and GSK LDU maps, respectively. Interpolating the two long blocks farther from the locus (i.e. SNPs 1–4 and 24–26 in Table 2) alters neither the width of the confidence interval nor the location error, and yields a very similar error variance (results not shown). The intervals obtained when map distances are expressed in kb are considerably worse: they do not include the true location. The marked differences between LDU and kb can also be shown graphically by plotting the LOD values against the LDU and kb locations (Fig. 2). On the LDU map, the CYP2D6 locus lies very close to the maximum likelihood estimate (within the peak), with a 95% LOD support interval of 186 kb, which is larger than the confidence interval (172 kb, Fig. 2). This is because the confidence interval is computed by a normal-theory approximation, whereas the LOD support interval tends to be more conservative.



Figure 2. The LOD curves, confidence (CI) and LOD support (SI) intervals (grey bars) of the CYP2D6 locus (vertical black line) for the LDU (A) and GSK kb (B) maps.

 
The statistical properties of the association mapping method presented in this study, for both kb and LDU maps, were previously examined in a simulation study (7) using regression (b) and correlation (r). The examination of false-positive indications of a disease locus (type I error) gave an acceptable goodness of fit for both metrics. Following Maniatis et al. (7), we compared the association metric z with b and r, since the analyses are based on the same modelling procedures. For a fair comparison, estimates of b and r were obtained from the 2×2 (affection status × marker alleles) table, so all three metrics had identical Pearson's χ²₁ values between the poor-metabolizer phenotype and each marker SNP (see Table 2). Greater power and smaller error variance were obtained with the z metric (Table 4). When fitting the LDU map, the mean power of either of the other two metrics was only 21% of that achieved with the z metric. Nevertheless, both b and r yielded greater power on the LDU map than on the kb map and had acceptable type I error (7). The z metric requires an affection status, so the regression metric may have appealing properties for quantitative phenotypes.
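The statement that b, r and z share the same Pearson χ²₁ can be checked numerically. The sketch below assumes a hypothetical 2×2 layout of allele counts (a = affected with allele G, b = affected with the other allele, c = normal with G, d = normal with the other allele) and takes b as the regression of the affection indicator on the allele-G indicator; Maniatis et al. (7) may define b as the opposite regression, which changes its scale but not χ²₁. The metrics differ in scale, so their information $K_m = \chi^2_1/m^2$ differs, while $K_m m^2$ recovers the common χ²₁.

```python
import math


def two_by_two_metrics(a, b, c, d):
    """Regression (beta), correlation (r) and z from one 2x2 table, plus Pearson chi2.

    Hypothetical layout: a, b = affected alleles (G, other); c, d = normal alleles (G, other).
    """
    if a * d - b * c < 0:
        a, b, c, d = b, a, d, c  # make allele G positively associated with affection
    n = a + b + c + d
    f = (a + b) / n          # frequency of affection
    R = (a + c) / n          # frequency of allele G
    D = a / n - f * R        # covariance between affection and allele G
    metrics = {
        "beta": D / (R * (1 - R)),                      # affection regressed on allele G (assumed)
        "r": D / math.sqrt(f * (1 - f) * R * (1 - R)),  # product-moment correlation
        "z": D / (f * (1 - R)),                         # association metric z
    }
    chi2 = n * metrics["r"] ** 2  # Pearson chi-square with 1 df equals n * r^2
    info = {name: chi2 / m ** 2 for name, m in metrics.items()}  # K_m = chi2 / m^2
    return metrics, chi2, info


# Hypothetical counts, not data from this study.
metrics, chi2, info = two_by_two_metrics(30, 52, 400, 1554)
for name, m in metrics.items():
    print(f"{name}: estimate={m:.4f}  K={info[name]:.1f}  K*m^2={info[name] * m ** 2:.3f}")
```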


Table 4. Estimates of χ² and error variance (V) under the A–C contrast based on regression (b), correlation (r) and metric z
 

    DISCUSSION
There have been three other studies of these data, and a summary of their results is presented in Table 5. Hosking et al. (9) reduced the evidence on each marker to a 2×3 table of affection status × marker genotypes. Significant association (P<0.01) was observed with 14 SNPs (SNPs 8–21 inclusive), but all other markers were nonsignificant. The region of significance around CYP2D6 was reported to be 390 kb. Hosking et al. (9) and Meng et al. (13) considered haplotypes in sliding windows of five SNPs. Although the levels of significance were considerably higher than those detected with single SNPs, their analysis did not further refine the support interval. Plotting the P-values on the kb map gave a pronounced hole in significance at the CYP2D6 locus, with P = 10⁻⁵⁴ at the highest peak and 10⁻³⁷ at the lowest point. The commonest functional variant resides at 525.3 kb, near SNP 18. Two SNPs on either side of SNP 18, one at 389 kb (SNP 12) and the other at 669 kb (SNP 20), are highly significant, making the surface bimodal, as shown in Table 2. Using a Bayesian coalescent-based approach, Morris et al. (14) further refined the significant region on the same GSK map presented here. In their study, a 95% posterior interval of 185 kb was obtained with a location error of >25 kb (estimated from their graph). This error was a median rather than a single estimate (21). They identified CYP2D6 as the most likely transcript and distinguished, among affected individuals, the 72 haplotypes bearing the most common mutation from the 10 bearing minor mutations. The numbers and estimates of parameter values in the Bayesian ‘prior’, however, were not specified. The computational cost was reported to preclude direct application ‘to large numbers of cases or too many SNPs.’ Our analysis differs from these earlier ones in several respects.
We used composite likelihood with its empirical variance for the association metric z, with a parameter S in the model that provides a point estimate of the location of the causal polymorphism. We did not use haplotype analysis, for which we are still testing optimal frequency estimation, scoring, windowing and other determinants of its operating characteristics. Despite the complex (bimodal) surface of the CYP2D6 region, the 95% confidence interval was narrowed to 172 kb, which excluded the most significant SNP (SNP 20) at 669 kb, as shown in Table 2. The point location S was estimated at 15 kb from the true location. Our analysis provided further refinement compared with the median estimates given by Morris et al. (14) (Table 5).


Table 5. Localization of CYP2D6 based on the GSK map
 
Morris et al. (14) also investigated SNP selection by examining the economy of spectral decomposition and diversity selection procedures. The full set of 32 SNPs (including five markers with minor allele frequency less than 0.05 that were excluded from other analyses) had a 95% location interval of 185 kb. The best alternative subset (20 of the 32 SNPs) had an interval of 216 kb. To achieve the same power, this reduced set requires a sample (216/185)² = 1.36 times as large. The product of the SNP ratio (20/32) and this sample ratio corresponds to 85% as many typings, in a sample 36% larger. They also investigated a strategy of removing one SNP from every pair with r² > 0.8, which led to the worst alternative subset (22 SNPs), with a 95% confidence interval of 283 kb. To achieve the same power, this reduced set requires a sample (283/185)² = 2.34 times as large, and therefore 1.61 times as many typings. If the 27 SNPs used in other studies give the same location intervals as the 32 SNPs used by Morris et al. (14), the power calculations are unaffected, but the typing requirement increases from 85 to 101% for the best alternative and from 161 to 191% for the second alternative. SNP selection therefore leads to a trivial reduction in the number of SNPs, a loss of power, or a considerable increase in cost to recover the initial power. In a simulation study, Zhang et al. (22) compared two proposed measures of haplotype diversity and one regression-based method for association mapping. They found no scenario in which the assumption that haplotype-tagging SNPs (htSNPs) retain power is plausible. They showed that loss of power with selection of htSNPs is general, and that better results are obtained by multistage sampling, in which the search interval is narrowed but the density of SNPs within that interval is increased. The study of Zhang et al. (22) compared only three of the many methods that have been proposed, but these findings nevertheless put great pressure on haplotype tagging. In an efficient analysis the density would be increased, especially within CYP2D6, which is the obvious candidate locus in this region but was systematically avoided in the ‘proof of principle’ analyses. The five rare SNPs included in an earlier study by Morris et al. (14) are far enough from the causal locus to contribute no information for association mapping. By contrast, the five rare markers (Table 1) that account for nearly all slow metabolizers would give great precision to association mapping. The information provided by predictive SNPs depends less on their frequency than on the location and frequency of causal SNPs.
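The sample-size and typing ratios quoted for the SNP subsets of Morris et al. (14) follow from the assumption, implicit in squaring the interval ratio, that the width of a location interval scales as 1/√(sample size). A short Python check of that arithmetic (the helper names are ours and purely illustrative):

```python
def sample_ratio(reduced_interval_kb, full_interval_kb):
    """Relative sample size a reduced SNP set needs to match the full set's interval,
    assuming interval width scales as 1/sqrt(sample size)."""
    return (reduced_interval_kb / full_interval_kb) ** 2


def typing_ratio(reduced_snps, full_snps, sample_factor):
    """Relative number of genotypings: fewer SNPs per person, but more people."""
    return reduced_snps / full_snps * sample_factor


best = sample_ratio(216, 185)   # best alternative subset, 20 SNPs
worst = sample_ratio(283, 185)  # r^2 > 0.8 pruning, 22 SNPs

for label, snps, factor in [("best (20 SNPs)", 20, best), ("worst (22 SNPs)", 22, worst)]:
    print(f"{label}: sample x{factor:.2f}, "
          f"typings vs 32 SNPs {typing_ratio(snps, 32, factor):.0%}, "
          f"vs 27 SNPs {typing_ratio(snps, 27, factor):.0%}")
# best (20 SNPs): sample x1.36, typings vs 32 SNPs 85%, vs 27 SNPs 101%
# worst (22 SNPs): sample x2.34, typings vs 32 SNPs 161%, vs 27 SNPs 191%
```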

The results of the present study provide preliminary evidence of the utility of SNP-based LDU maps for association mapping and of a linear interpolation procedure for obtaining single-point locations in LDU. Our approach to disease gene association mapping by LD is based on a model grounded in evolutionary theory that incorporates a parameter for the location of the causal polymorphism. Fitting the LDU map gave considerably greater power to refine the location in the significant CYP2D6 region than fitting the map in kb. On the basis of power and error variance, the z metric outperformed the other metrics used in this study. The support interval and location error are smaller than any published results, and these tests are robust to large numbers of markers and applicable to haplotypes, diplotypes, whole-genome association or candidate region studies.


    MATERIALS AND METHODS
LD maps
The LD map was created using the LDMAP program (1), which assigned an LDU location to each marker SNP. The association probability $\rho = |D|/[Q(1-R)]$ was used, where D is the covariance in a 2×2 haplotype table with minor allele frequencies Q ≤ R and Q ≤ 1 − Q; that is, the 2×2 table is arranged so that Q < 0.5, although R can exceed 0.5. This can always be achieved by interchanging columns and rows, making Q the frequency of the putatively youngest of the four alleles in the SNP pair and thus giving the frequency of the rarest haplotype (15). The LDU map can be constructed from either haplotypes or diplotypes (phase-unknown genotypes). When diplotypes are used, as for the present data, the 3×3 genotype table is converted to a 2×2 haplotype table using the algorithm of Hill (23). The efficiency of haplotypes and diplotypes has been examined previously, and pairwise LD was shown to be mapped efficiently from diplotype data with little loss of information (1,24). Within a composite likelihood framework, the value of ρ predicted for each observed $\hat\rho$ is given by the generalized Malecot equation

$$\rho = (1-L)Me^{-\varepsilon d} + L,$$

which models the exponential decay of LD with the distance d between a pair of SNPs (15,17). This novel application of Malecot's isolation-by-distance model (25) predicts a background level of LD resulting from evolutionary history (15). The parameter ε is the exponential decline of disequilibrium ρ with distance, and the intercept M is the maximum association at zero distance. M is the parameter with an evolutionary interpretation, as it reflects the association at the last major bottleneck. A value of M not significantly less than 1 suggests monophyletic inheritance, whereas M ≪ 1 suggests polyphyletic origin of two-locus haplotypes. The asymptote L > 0 is the association at large distance, so the model corrects for spurious association, which often results from small sample sizes.
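For diplotype data, only the double heterozygote is phase-ambiguous when the 3×3 genotype table is converted to haplotype frequencies. The sketch below assumes Hardy–Weinberg equilibrium and uses a standard EM iteration for two-locus haplotype frequencies. EM targets the same maximum-likelihood estimate as Hill's method (23), but we have not checked this code against the published algorithm. The function name and the genotype coding are hypothetical.

```python
def haplotype_frequencies(n, max_iter=1000, tol=1e-12):
    """EM estimate of two-SNP haplotype frequencies [AB, Ab, aB, ab].

    n[i][j] is the number of individuals carrying i copies of allele A at
    SNP 1 and j copies of allele B at SNP 2 (i, j in 0, 1, 2).
    """
    N = sum(sum(row) for row in n)
    if N == 0:
        raise ValueError("no genotypes")
    # Haplotypes counted directly from every genotype except the double heterozygote.
    known = [
        2 * n[2][2] + n[2][1] + n[1][2],  # AB
        2 * n[2][0] + n[2][1] + n[1][0],  # Ab
        2 * n[0][2] + n[1][2] + n[0][1],  # aB
        2 * n[0][0] + n[1][0] + n[0][1],  # ab
    ]
    double_het = n[1][1]
    p = [0.25, 0.25, 0.25, 0.25]
    for _ in range(max_iter):
        # E step: probability that a double heterozygote is AB/ab rather than Ab/aB.
        cis, trans = p[0] * p[3], p[1] * p[2]
        theta = cis / (cis + trans) if cis + trans > 0 else 0.5
        # M step: add the expected phased haplotypes and divide by the 2N haplotypes.
        new = [
            (known[0] + double_het * theta) / (2 * N),
            (known[1] + double_het * (1 - theta)) / (2 * N),
            (known[2] + double_het * (1 - theta)) / (2 * N),
            (known[3] + double_het * theta) / (2 * N),
        ]
        if max(abs(x - y) for x, y in zip(new, p)) < tol:
            return new
        p = new
    return p


# Hypothetical genotype counts under complete LD: AB/AB, AB/ab and ab/ab only.
print(haplotype_frequencies([[25, 0, 0], [0, 50, 0], [0, 0, 25]]))
```

Multiplying the frequencies by 2N gives the estimated 2×2 haplotype table from which ρ can be computed.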
The LDU map method (1) estimates ε in each map interval and uses it to construct an LD scale. The map distance in LDU of the ith interval is $\varepsilon_i d_i$, so that a region spans $\sum_i \varepsilon_i d_i$ LDU; blocks of high LD are defined by an uninterrupted sequence of intervals with $\varepsilon_i = 0$, whereas $\varepsilon_i > 0$ defines a step with reduced LD, which corresponds to recombination events and whose magnitude reflects recombination intensity. The composite likelihood is

$$-2\ln lk = \sum K_\rho(\hat\rho - \rho)^2,$$

with residual variance $V = -2\ln lk/(m - k)$, where $K_\rho = \chi^2/\hat\rho^2$ is the information about ρ, m is the number of marker pairs and k is the number of parameters estimated. Plotting LDU against kb represents the block–step structure of the CYP2D6 region graphically. Fitting the Malecot model to both the kb and LDU maps reveals the superiority of the LDU map over the map in kb (Table 6). The mean value of ε for kb is 0.0037, so the swept radius 1/ε, which reflects the extent of LD in the region, is 270 kb. When the LDU map is fitted, the distances d in the $e^{-\varepsilon d}$ term of the model are expressed in LDU, and hence the value of ε is ~1. The LDU map yielded a smaller error variance V, and the efficiency of the kb map relative to the LDU map is only 40% (7.9/19.6). Fitting the LDU map with no interpolation of the main block (LDUA) yielded essentially the same parameter estimates and error variance (Table 6). Having used ρ to create an LDU map for the CYP2D6 region, we now show how we adapted this metric for positional cloning of oligogenes, where the frequency Q of the putative disease allele is unknown.
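The quantities in this paragraph can be sketched directly: the Malecot prediction $(1-L)Me^{-\varepsilon d} + L$, cumulative LDU locations $\sum_i \varepsilon_i d_i$, and the residual variance $V = -2\ln lk/(m-k)$ used to compare maps. This minimal illustration assumes the per-interval $\varepsilon_i$ values are already available (LDMAP estimates them; this code does not). The function names and the per-interval values in the example are hypothetical; 0.0037, 7.9 and 19.6 are taken from the text.

```python
import math


def malecot(d, M, eps, L):
    """Predicted association at distance d: (1 - L) * M * exp(-eps * d) + L."""
    return (1 - L) * M * math.exp(-eps * d) + L


def ldu_locations(eps_per_interval, kb_per_interval):
    """Cumulative LDU location of each marker; interval i contributes eps_i * d_i LDU."""
    locations = [0.0]
    for eps_i, d_i in zip(eps_per_interval, kb_per_interval):
        locations.append(locations[-1] + eps_i * d_i)
    return locations


def residual_variance(neg2_ln_lk, m, k):
    """V = -2 ln lk / (m - k) for m observations and k estimated parameters."""
    return neg2_ln_lk / (m - k)


# Hypothetical intervals: two block intervals (eps = 0) followed by one step.
print(ldu_locations([0.0, 0.0, 0.25], [40.0, 60.0, 8.0]))  # [0.0, 0.0, 0.0, 2.0]
print(round(1 / 0.0037))     # swept radius of the kb map in kb: 270
print(round(7.9 / 19.6, 2))  # efficiency of the kb map relative to the LDU map: 0.4
```

A block shows up as a run of identical LDU locations, which is exactly the situation that the interpolation step in Results resolves.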


Table 6. Parameters of kb and LD maps of the CYP2D6 region (GSK map)
 
Theoretical framework
Let there be a number of SNPs covering a candidate region, within which there is a single causal SNP with allele frequency Q that may or may not be monophyletic. This model is not as restrictive as might at first appear, because several SNPs in the same exon or locus are almost indistinguishable from a single polyphyletic SNP in association mapping. Let the frequency of affected diplotypes in a random sample be f and the contribution of the causal SNP to f be $Q^2x + 2Q(1-Q)y$, where x and y are the penetrances in homozygotes and heterozygotes, respectively. The attributable risk in diplotypes is

$$\gamma = \frac{Q^2x + 2Q(1-Q)y}{f}.$$

This is also the attributable risk in haplotypes from random affected diplotypes, which make up a proportion f of all haplotypes when the causal SNP is assigned penetrance x in homozygotes and y/2 in heterozygotes. Therefore, a SNP with additive risk y = x/2 has attributable risk γ = Qx/f. For CYP2D6, x = 1, y = 0 and f = Q² under the close approximation of a single recessive allele inferred from slow debrisoquine inactivation (11), so that γ = 1. The parameters Q, x and y are unknown unless the causal SNP has been typed or reliably inferred from segregation analysis, but it is possible to test whether one of several marker SNPs is causal, and also to use the concept of γ to formulate a model in terms of the association ρ that is more powerful than correlation, regression and other metrics that have no rationale in population genetics (15). Incomplete ascertainment is easily accommodated, but dominance is more difficult because recombination simulates additivity, which we assume here.
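The attributable-risk expression is easily checked numerically. The small sketch below has a hypothetical function name, and the penetrances and frequencies in the first example are illustrative, not estimates from these data. It confirms that additive risk y = x/2 gives γ = Qx/f, and that the recessive CYP2D6 approximation x = 1, y = 0, f = Q² gives γ = 1 when f is taken as the observed 41/1018.

```python
def attributable_risk(Q, x, y, f):
    """gamma = [Q^2 x + 2 Q (1 - Q) y] / f for a causal SNP with allele frequency Q."""
    if f <= 0:
        raise ValueError("frequency of affection must be positive")
    return (Q * Q * x + 2 * Q * (1 - Q) * y) / f


# Illustrative additive case: both values should be approximately 0.4.
Q, x, f = 0.1, 0.2, 0.05
print(attributable_risk(Q, x, x / 2, f), Q * x / f)

# Recessive approximation for CYP2D6: f = Q^2, so Q is the square root of f.
f_cyp = 41 / 1018
Q_cyp = f_cyp ** 0.5
print(Q_cyp, attributable_risk(Q_cyp, 1.0, 0.0, f_cyp))  # Q near 0.2, gamma near 1
```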

Consider a random sample of n diplotypes in which the frequency of affection is f and the frequency of an allele or haplotype G associated with affection is R. Then, with probability z, the expected frequencies are those in founders and, with complementary probability $1 - z$, those at equilibrium. The frequency of affected individuals in this sample is $f = Q^2 \approx 0.04 = 41/1018$, although this may vary somewhat owing to incomplete typing. On the basis of the 2×2 table in Table 7, it follows that $\rho = \gamma$ (17), and thus the association metric is estimated as:

$$\hat\rho = \frac{ad - bc}{(a+b)(b+d)}$$

which is equal to $|D|/[f(1-R)]$, where D is the covariance between affection status and the marker alleles. The number of tests performed is equal to the number of SNPs, and therefore the composite likelihood based on the Malecot model combines information over all loci as $-2\ln lk = \sum_i K_{zi}(\hat\rho_{zi} - \rho_{zi})^2$, where $\hat\rho_{zi}$ and $\rho_{zi}$ are the observed and expected association values, respectively, at the ith marker SNP. An observed estimate $\hat\rho_z$ has an amount of information $K_z$, estimated as $K_z = \chi_1^2/\hat\rho_z^2 = n(a+b)(b+d)/[(a+c)(c+d)]$, where $\chi_1^2$ is Pearson's $\chi^2$ from the 2×2 table (affection status by SNP alleles) as shown in Table 2. The expected value $\rho_z$ is obtained from the equation $\rho_z = (1-L)Me^{-\varepsilon d} + L$, where $\varepsilon$ is the exponential decline with distance d in kb or LDU. Following Maniatis et al. (7), the distance $d_i$ is replaced by $\Delta(S_i - S)$, and hence the model becomes $\rho_z = (1-L)Me^{-\varepsilon\Delta(S_i - S)} + L$, where $S_i$ is the location of the ith SNP in either kb or LDU and S is the unknown parameter that provides the estimated causal location. The Kronecker $\Delta$ is used solely for map direction and takes the value 1 if $S_i > S$ and $-1$ otherwise. The asymptote L can be estimated or predicted ($L_p$) from the information about $\hat\rho$, which is proportional to sample size (1). The residual error variance is $V = -2\ln lk/(n - k)$, where n is the number of SNPs and k is the number of parameters estimated in the model.
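The estimation steps above can be sketched in Python as follows. The cell layout of the 2×2 table is an assumption inferred from $f = a + b$ and $R = a + c$, n is taken as the total count $a + b + c + d$, and the counts in the usage line are made up. This is a hypothetical illustration of the formulas, not the authors' software; fitting M, L, $\varepsilon$ and S would additionally require a numerical minimiser of $-2\ln lk$.

```python
import math


def rho_hat(a, b, c, d):
    """Observed association (ad - bc) / [(a + b)(b + d)] = D / [f(1 - R)].

    Assumed cell layout: a = affected with G, b = affected without G,
    c = unaffected with G, d = unaffected without G.
    """
    return (a * d - b * c) / ((a + b) * (b + d))


def pearson_chi2(a, b, c, d):
    """Pearson chi-square (1 df) for the 2x2 table of counts."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def information(a, b, c, d):
    """K_z = chi2 / rho_hat^2, which simplifies to n(a + b)(b + d) / [(a + c)(c + d)]."""
    n = a + b + c + d
    return n * (a + b) * (b + d) / ((a + c) * (c + d))


def malecot(S_i, S, M, L, eps):
    """Expected association (1 - L) M exp(-eps * Delta(S_i - S)) + L.

    Delta is +1 if S_i > S and -1 otherwise, so Delta(S_i - S) = |S_i - S|.
    """
    delta = 1.0 if S_i > S else -1.0
    return (1 - L) * M * math.exp(-eps * delta * (S_i - S)) + L


def neg2_ln_lk(observed, expected, info):
    """Composite likelihood -2 ln lk = sum_i K_zi (rho_hat_zi - rho_zi)^2."""
    return sum(k * (o - e) ** 2 for o, e, k in zip(observed, expected, info))


def residual_variance(neg2, n_snps, k_params):
    """V = -2 ln lk / (n - k), with n SNPs and k estimated parameters."""
    return neg2 / (n_snps - k_params)


table = (30, 10, 20, 40)  # made-up counts for illustration only
r = rho_hat(*table)
print(r, pearson_chi2(*table) / r ** 2, information(*table))
```

The last line shows that $\chi_1^2/\hat\rho_z^2$ and the closed form for $K_z$ agree for any table with $\hat\rho_z \neq 0$.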


Table 7. Frequencies in a random sample of n haplotypes
 
The present study is based on a random sample. The metric, however, can also be applied to cases and controls when the phenotype is affection status. A case–control study increases a and b (Table 7) by an enrichment factor $\omega$ (17).

Therefore, the haplotype table derived from Table 7 becomes Table 8. The estimate of f, independent of z and R, is $f = (a+b)/[2n + (\omega-1)(c+d)]$. The other two parameters must be estimated by maximum likelihood, beginning Newton–Raphson iteration with trial values, which we take as $R = (a + \omega c)/[2n + (\omega-1)(c+d)]$, and the association metric is estimated as $\hat\rho = \omega(ad - bc)/[(a+b)(b + \omega d)]$.
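For the case–control adjustment, a sketch of the closed-form quantities follows, assuming the cells a, b, c, d of Table 8 and the sample size n are supplied as counts. The Newton–Raphson maximum-likelihood step for the remaining parameters is not shown, and the function name is hypothetical. When the cells sum to 2n and $\omega = 1$, the expressions reduce to the random-sample ones.

```python
def case_control_estimates(a, b, c, d, n, omega):
    """Closed-form f, trial R and rho_hat for a case-control sample.

    a, b, c, d: haplotype-table cells; n: sample size as in the text;
    omega: enrichment factor. R_trial only seeds Newton-Raphson; the
    maximum-likelihood iteration itself is not implemented here.
    """
    denom = 2 * n + (omega - 1) * (c + d)
    f = (a + b) / denom
    R_trial = (a + omega * c) / denom
    rho = omega * (a * d - b * c) / ((a + b) * (b + omega * d))
    return f, R_trial, rho


# omega = 1 recovers the random-sample estimates (made-up counts summing to 2n).
print(case_control_estimates(30, 10, 20, 40, n=50, omega=1))
print(case_control_estimates(30, 10, 20, 40, n=50, omega=2))
```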


Table 8. Frequencies in a case–control sample with enrichment {omega}
 
Association mapping passes through three stages. In the first stage, a candidate region is defined by linkage, LD or function. Then the significance of the region is tested. For this, we use two subhypotheses of the Malecot model, A and B (7). The baseline is model A, in which none of the parameters is estimated (making the kb and LDU maps equivalent); it is taken as the null hypothesis $H_0$ of no association between the phenotype (affection status in this case) and the marker SNPs. Therefore, the parameter $M = 0$ and the asymptote L is fixed at its predicted value $L_p$, so that the Malecot expectation is $\rho_z = L_p$. Model B also takes $M = 0$ but estimates L, so that $\rho_z = L$. Composite likelihood then gives $\chi_1^2 = [(-2\ln lk)_A - (-2\ln lk)_B]/V_B$, where $V_B$ is the residual error variance of model B. It follows that any increase in L above the predicted asymptote $L_p$ (a significant $\chi_1^2$) provides evidence of a causal polymorphism within the region in question, but without precise localization.
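The A–B contrast can be sketched under the simplifying assumption that model B's only free parameter is L with $M = 0$, so that the L minimising $\sum_i K_{zi}(\hat\rho_{zi} - L)^2$ is the information-weighted mean of the observed associations. The function name and inputs are hypothetical.

```python
def model_a_b_contrast(rho_obs, info, L_p):
    """Contrast model A (rho_z = L_p fixed) with model B (rho_z = L estimated).

    Both models set M = 0. Returns (L_hat, chi2_1).
    """
    n = len(rho_obs)
    neg2_A = sum(k * (r - L_p) ** 2 for r, k in zip(rho_obs, info))
    # Weighted least-squares estimate of L under model B.
    L_hat = sum(k * r for r, k in zip(rho_obs, info)) / sum(info)
    neg2_B = sum(k * (r - L_hat) ** 2 for r, k in zip(rho_obs, info))
    V_B = neg2_B / (n - 1)  # model B estimates one parameter (L)
    return L_hat, (neg2_A - neg2_B) / V_B


print(model_a_b_contrast([0.1, 0.3], [1.0, 1.0], L_p=0.0))
```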

Having established that the region of interest is significant by contrasting models A and B, the next stage is to estimate a causal location. This is accomplished by models C and D, in which both M and S are estimated, thereby distinguishing between the kb and LDU maps. The only difference between these models is that model C takes $L = L_p$, whereas model D estimates L. Therefore, the contrasts A–C and A–D test for a disease determinant at location S, in the present study the location of the CYP2D6 locus. Replacing $(-2\ln lk)_B$ by $(-2\ln lk)_C$ in the formula above gives $\chi_2^2$, and replacing it by $(-2\ln lk)_D$ gives $\chi_3^2$. As model A is the baseline, the three contrasts A–B, A–C and A–D, with $\chi^2$ of 1, 2 and 3 degrees of freedom, respectively, allow hypothesis testing. The $\chi_2^2$ and $\chi_3^2$ may, however, be converted into $\chi_1^2$ with the same level of significance (17, 'Numerical Analysis' appendix). The corresponding lod is $Z = \chi_1^2/(2\ln 10)$, which is useful for comparing models with different degrees of freedom. For graphical representation and support intervals, however, it is convenient to take $Z_{df} = \chi_{df}^2/(2\ln 10)$. A significance level $P = 0.05$ corresponds to $\chi_2^2 = -2\ln 0.05 = 5.991$ and $\chi_1^2 = 3.841$. Therefore, for any of these $\chi_{df}^2$, a 95% support interval is defined by $(\chi_{df}^2 - 3.841)/(2\ln 10) = Z_{df} - 0.834$, which may be converted from LDU to kb by interpolation into the kb map. The standard error of S is $se = \sigma_S\sqrt{V}$, where $\sigma_S$ is a nominal error based on composite likelihood. The 95% confidence interval is $S \pm 1.96\,se$, which may be interpolated from LDU to kb.
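A sketch of the conversions in this paragraph, using only the Python standard library: $\chi^2$ on 2 or 3 df is converted to a 1-df $\chi^2$ with the same P value (via closed-form tail probabilities), lods use $Z = \chi^2/(2\ln 10)$, and a 95% support interval is read off a grid of $Z_{df}$ values as the locations within 0.834 lod of the maximum. Evaluating the profile on a grid is an assumption about how the interval is located, and all function names are hypothetical.

```python
import math
from statistics import NormalDist

LOD_SCALE = 2 * math.log(10)       # Z = chi2 / (2 ln 10)
SUPPORT_DROP = 3.841 / LOD_SCALE   # about 0.834 lod units


def chi2_sf(x, df):
    """Upper-tail chi-square probability for df = 1, 2 or 3 (closed forms)."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2))
    if df == 2:
        return math.exp(-x / 2)
    if df == 3:
        return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    raise ValueError("only df = 1, 2 or 3 is handled")


def to_chi2_1df(x, df):
    """1-df chi-square with the same P value as x on df degrees of freedom."""
    z = -NormalDist().inv_cdf(chi2_sf(x, df) / 2)  # two-sided normal deviate
    return z * z


def lod(chi2):
    return chi2 / LOD_SCALE


def support_interval(locations, lods):
    """Grid locations whose Z_df lies within SUPPORT_DROP of the maximum."""
    z_max = max(lods)
    inside = [s for s, z in zip(locations, lods) if z >= z_max - SUPPORT_DROP]
    return min(inside), max(inside)


def confidence_interval(S, sigma_S, V):
    """S +/- 1.96 se, with se = sigma_S * sqrt(V)."""
    se = sigma_S * math.sqrt(V)
    return S - 1.96 * se, S + 1.96 * se


print(to_chi2_1df(5.991, 2), SUPPORT_DROP)
```

For extremely large $\chi^2$ the tail probability can underflow to zero, in which case `inv_cdf` raises; a production version would work on the log scale.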

In these data, there is no obvious choice between models C and D (model D being model C with the parameter L estimated), and the results for the A–D contrast were very similar (data not shown). In general, model C is more parsimonious and may therefore be more powerful (26), whereas model D can give unreliable results in smaller candidate regions. This is because the asymptote L reflects the degree of association at maximum distance and thus cannot be reliably estimated in small regions.

Linear interpolation was used to convert locations in LDU to kb (7). Let $S_{Ki}$ be the locations on the kb map, where i = 1 to n SNPs, let $S_{Li}$ be the corresponding locations on the LDU map, and let $S_L$ be a location estimated by the model. To interpolate a location on the LDU map into the kb map, three cases must be considered, because markers within a block have invariant LDU but unique locations in kb. First, if $S_L$ does not lie in a block but is instead flanked by markers with locations a, c in LDU and $\alpha$, $\gamma$ in kb, then the estimated location in LDU ($S_L$) can be converted to a location in kb ($S_{Lk}$) as:

$$S_{Lk} = \alpha + (\gamma - \alpha)\frac{S_L - a}{c - a}$$

Second, if $S_L$ coincides with the unique LDU location of a single marker, then $S_{Lk}$ is the kb location of that marker. The third case is when $S_L$ lies within a block. In this case, all markers in that block have the same LDU location but unique kb locations $S_{Ki}$. For blocks that contain multiple SNPs, the LDU block is interpolated so that every SNP has a unique location in LDU and hence a corresponding location in kb. The block is flanked by markers with kb locations $\alpha$ and $\gamma$, which have corresponding LDU values a and c at the beginning and end of the block, respectively. Using these distances in LDU and kb, the interpolation procedure above can be used to assign each ith SNP in the block a unique LDU location $S_{Li}$ as:

$$S_{Li} = a + (c - a)\frac{S_{Ki} - \alpha}{\gamma - \alpha}$$

The five-marker block of the CYP2D6 region (i.e. SNPs 16–20) was interpolated as previously shown in Table 2.
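The three interpolation cases can be sketched as follows, assuming marker locations are supplied as parallel lists sorted by kb, with LDU values non-decreasing. The example map is made up, and the functions are hypothetical helpers rather than the authors' code.

```python
from bisect import bisect_left


def ldu_to_kb(S_L, a, c, alpha, gamma):
    """Case 1: S_L lies between flanking markers at a, c (LDU) and alpha, gamma (kb)."""
    return alpha + (gamma - alpha) * (S_L - a) / (c - a)


def block_snp_ldu(S_K, a, c, alpha, gamma):
    """Case 3: give a SNP at kb location S_K inside a block a unique LDU location."""
    return a + (c - a) * (S_K - alpha) / (gamma - alpha)


def convert_location(S_L, ldu, kb):
    """Convert a model estimate S_L (LDU) to kb using marker maps sorted by kb."""
    hits = [i for i, s in enumerate(ldu) if s == S_L]
    if len(hits) == 1:   # case 2: exactly one marker at S_L
        return kb[hits[0]]
    if len(hits) > 1:    # case 3: S_L falls on a block
        raise ValueError("interpolate the block's SNPs to unique LDU first")
    j = bisect_left(ldu, S_L)  # first marker with LDU above S_L
    if j == 0 or j == len(ldu):
        raise ValueError("S_L lies outside the mapped markers")
    return ldu_to_kb(S_L, ldu[j - 1], ldu[j], kb[j - 1], kb[j])


ldu = [0.0, 1.0, 1.0, 1.0, 3.0]  # made-up map with a three-SNP block
kb = [0.0, 10.0, 15.0, 20.0, 40.0]
print(convert_location(2.0, ldu, kb))
```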


    ACKNOWLEDGEMENT
 
This work was supported by Applied Biosystems.


    FOOTNOTES
 
† The authors wish it to be known that, in their opinion, the first two authors should be regarded as joint First Authors.


    REFERENCES

  1. Maniatis, N., Collins, A., Xu, C-F, McCarthy, L.C., Hewett, D.R., Tapper, W., Ennis, S., Ke, X. and Morton, N.E. (2002) The first linkage disequilibrium (LD) maps: delineation of hot and cold blocks by diplotype analysis. Proc. Natl Acad. Sci. USA, 99, 2228–2233.

  2. Zhang, W., Collins, A., Maniatis, N., Tapper, W. and Morton, N.E. (2002) Properties of linkage disequilibrium (LD) maps. Proc. Natl Acad. Sci. USA, 99, 17004–17007.

  3. Jeffreys, A.J., Kauppi, L. and Neumann, R. (2001) Intensely punctate meiotic recombination in the class II region of the major histocompatibility complex. Nat. Genet., 29, 217–222.

  4. Lonjou, C., Zhang, W., Collins, A., Tapper, W.J., Elahi, E., Maniatis, N. and Morton, N.E. (2003) Linkage disequilibrium in human populations. Proc. Natl Acad. Sci. USA, 100, 6069–6074.

  5. Tapper, W.J., Maniatis, N., Morton, N.E. and Collins, A. (2003) A metric linkage disequilibrium map of the human chromosome. Ann. Hum. Genet., 67, 487–494.

  6. Ke, X., Hunt, S., Tapper, W.J., Lawrence, R., Stavrides, G., Whittaker, P., Collins, A., Morris, A.P., Bentley, D., Cardon, L.R. and Deloukas, P. (2004) Fine scale patterns of linkage disequilibrium across a 10Mb region at 20q12–13.2. Hum. Mol. Genet., 13, 577–588.

  7. Maniatis, N., Collins, A., Gibson, J., Zhang, W., Tapper, W. and Morton, N.E. (2004) Positional cloning by linkage disequilibrium. Am. J. Hum. Genet., 75, 846–855.

  8. Daly, M., Rioux, J.V., Schaffner, S.F., Hudson, T.J. and Lander, E.S. (2001) High-resolution haplotype structure in the human genome. Nat. Genet., 29, 229–232.

  9. Hosking, L.K., Boyd, R.P., Xu, C-F, Nissum, M., Cantone, K., Purvis, I.J., Khakhar, R., Barnes, M.R., Liberwirth, U., Hagen-Mann, K., Ehm, M.G. and Riley, J.H. (2002) Linkage disequilibrium mapping identifies a 390 kb region associated with CYP2D6 poor drug metabolising activity. Pharmacogenomics J., 2, 165–175.

  10. Evans, W. and Relling, M. (1999) Pharmacogenomics: translating functional genomics into rational therapeutics. Science, 286, 487–491.

  11. Mahgoub, A., Idle, J.R., Dring, L.G., Lancaster, R. and Smith, R.L. (1977) Polymorphic hydroxylation of debrisoquine in man. Lancet, 2, 584–586.

  12. Sachse, C., Brockmoller, J., Bauer, S. and Roots, I. (1997) Cytochrome P450 2D6 variants in a Caucasian population: allele frequencies and phenotype consequences. Am. J. Hum. Genet., 60, 284–295.

  13. Meng, Z., Zaykin, D., Xu, C.-F., Wagner, M. and Ehm, M.G. (2003) Selection of genetic markers for association analyses, using linkage disequilibrium and haplotypes. Am. J. Hum. Genet., 73, 115–130.

  14. Morris, A.P., Whittaker, J.C., Xu, C.-F., Hosking, L.K. and Balding, D.J. (2003) Multipoint linkage-disequilibrium mapping narrows location interval and identifies mutation heterogeneity. Proc. Natl Acad. Sci. USA, 100, 13442–13446.

  15. Morton, N.E., Zhang, W., Taillon-Miller, P., Ennis, S., Kwok, P.-Y. and Collins, A. (2001) The optimal measure of allelic association. Proc. Natl Acad. Sci. USA, 98, 5217–5221.

  16. Shete, S. (2003) A note on the optimal measure of allelic association. Ann. Hum. Genet., 67, 189–191.

  17. Collins, A. and Morton, N.E. (1998) Mapping a disease by allelic association. Proc. Natl Acad. Sci. USA, 95, 1741–1745.

  18. Lonjou, C., Collins, A., Ajioka, R.S., Jorde, L.B., Kushner, J.P. and Morton, N.E. (1998) Allelic association under map error and recombinational heterogeneity: a tale of two sites. Proc. Natl Acad. Sci. USA, 95, 11366–11370.

  19. Lonjou, C., Collins, A., Beckmann, J., Allamand, V. and Morton, N. (1998) Limb girdle muscular dystrophy type 2A (CAPN3): mapping using allelic association. Hum. Hered., 48, 333–337.

  20. Risch, N. and Merikangas, K. (1996) The future of genetic studies of complex human diseases. Science, 273, 1516–1517.

  21. Morris, A.P., Whittaker, J.C. and Balding, D.J. (2004) Little loss of information due to unknown phase for fine-scale linkage-disequilibrium mapping with single-nucleotide-polymorphism genotype data. Am. J. Hum. Genet., 74, 945–953.

  22. Zhang, W., Collins, A. and Morton, N.E. (2004) Does haplotype diversity predict power for association mapping of disease susceptibility? Hum. Genet., 115, 157–164.

  23. Hill, W.G. (1974) Estimation of linkage disequilibrium in randomly mating populations. Heredity, 33, 229–239.

  24. Ennis, S., Maniatis, N. and Collins, A. (2001) Allelic association and disease mapping. Brief. Bioinform., 2, 375–387.

  25. Malecot, G. (1948) Les Mathématiques de l'Hérédité. Masson et Cie, Paris.

  26. Agresti, A. (1990) Categorical Data Analysis. John Wiley and Sons, New York.

