Human Molecular Genetics Advance Access originally published online on December 17, 2003
Human Molecular Genetics, 2004, Vol. 13, No. 3 335-342
DOI: 10.1093/hmg/ddh035
Defining haplotype blocks and tag single-nucleotide polymorphisms in the human genome
1Division of Genetic Epidemiology in Psychiatry, Central Institute of Mental Health (ZI), 68159 Mannheim, Germany, 2Genetics Unit, Mood and Anxiety Disorders Program, National Institute of Mental Health, National Institutes of Health, US Dept of Health and Human Services, Bethesda, MD 20892, USA, 3Department of Psychiatry, The University of Chicago, Chicago, IL 60637, USA, 4Molecular and Computational Biology Program, Department of Biological Sciences, University of Southern California, Los Angeles, CA 90089, USA and 5Section on Statistical Genetics, Department of Biostatistics, University of Alabama, Birmingham, AL 35294, USA
Received September 16, 2003; Accepted December 3, 2003
| ABSTRACT |
|---|
Recent studies suggest that the genome is organized into blocks of haplotypes, and efforts to create a genome-wide haplotype map of single-nucleotide polymorphisms (SNPs) are already underway. Haplotype blocks are defined algorithmically, and to date several algorithms have been proposed. However, little is known about their relative performance in real data or about the impact of allele frequencies and parameter choices on the detection of haplotype blocks and the markers that tag them. Here we present a formal comparison of two major algorithms, a linkage disequilibrium (LD)-based method and a dynamic programming algorithm (DPA), in three chromosomal regions differing in gene content and recombination rate. The two methods produced strikingly different results. DPA identified fewer and larger haplotype blocks as well as a smaller set of tag SNPs than the LD method. For both methods, the results were strongly dependent on the allele frequency. Lowering the minor allele frequency threshold led to an up to 3.7-fold increase in the number of haplotype blocks and tag SNPs. Definition of haplotype blocks and tag SNPs was also sensitive to parameter changes, but the results could not be reconciled simply by parameter adjustment. These results show that two major methods for detecting haplotype blocks and tag SNPs can produce different results in the same data and that these results are sensitive to marker allele frequencies and parameter choices. More information is needed to guide the choice of method, marker allele frequencies, and parameters in the development of a haplotype map.
| INTRODUCTION |
|---|
Whole-genome linkage disequilibrium (LD) mapping has been proposed as a powerful tool to detect susceptibility genes for complex traits (1). Recent studies suggest that the human genome is organized into blocks of haplotypes (2,3). It is hoped that this genomic architecture may facilitate genome-wide LD mapping by limiting the numbers of single-nucleotide polymorphisms (SNPs) to be typed to those SNPs that explain or tag the haplotype pattern sufficiently (4).
A variety of different algorithms has been proposed to identify haplotype blocks and tag SNPs (2,5–11). However, little is known about the relative performance of these various methods in real data. What are the differences between the several algorithms? Do all methods come to the same conclusions, that is, do they identify the same or at least similar haplotype blocks and tag SNPs? How does one resolve disagreement between methods? Recently, Schwartz et al. (12) assessed the overlap of block boundaries assigned by different algorithms. They found generally poor agreement between block boundaries derived from different algorithms, which was more pronounced in small samples. Another study showed that marker spacing affects the predicted length of haplotype blocks in an evolutionary modeling analysis (13). It has also been shown that the size of haplotype blocks is algorithm-dependent (14). Still, few studies have formally compared the performance of different algorithms in the same sets of data, and no study has assessed the impact of parameter settings or allele frequencies.
Here we present a formal comparison of two major methods for defining haplotype blocks, the LD-based method proposed by Gabriel et al. (5) and the dynamic programming algorithm (DPA) developed by Zhang et al. (10). Unlike previous studies, we compared the number of haplotype blocks and tag SNPs identified by each method, since these variables are critical in association mapping. We do not limit our analysis to one arbitrarily chosen minor allele frequency (q) threshold, but perform the analysis for various values of q in order to elucidate the impact of allele frequency on the results. Finally, we examine the impact of parameter changes on the block partitioning. In order to increase the generalizability of our study, we chose three fully sequenced chromosomal regions that differed in their average recombination rates and gene content: 18q21.32–33 (180 kb), genotyped with 33 SNPs in 50 individuals (CEPH founders); 22q13.31–32 (811 kb), genotyped with 55 SNPs in 91 individuals (data obtained from Wellcome Trust Sanger Institute); and 22q13.33 (993 kb), genotyped with 54 SNPs in the same 91 individuals. We found that the two methods produced different results. The DPA method consistently identified fewer, larger haplotype blocks, as well as fewer tag SNPs than the LD method. For both methods, the identification of haplotype blocks and tag SNPs was very sensitive to marker allele frequency. Both methods were sensitive to parameter choices, but the LD method was less sensitive in this regard than the DPA method. Parameter adjustment alone did not substantially improve agreement between the methods. These results show that two major methods for detecting haplotype blocks and tag SNPs can produce different results in the same data and that these results are sensitive to marker allele frequencies and parameter choices.
| RESULTS |
|---|
Distinct patterns of LD for each of the three regions (Supplementary Material, Fig. 1)
Patterns of pairwise LD varied between the three regions studied. On 22q13.33, higher and more extended levels of LD (mean D'=0.35) can be seen than in the neighboring region on 22q13.31–32 (mean D'=0.27), which accords with the known differences in recombination. This difference in LD also holds when comparing the subsets with higher q thresholds. The region on 18q22 showed extended and strong levels of LD (mean D'=0.58). Supplementary Material, Figure 1 shows the overall LD distribution for the three regions studied at the varying q thresholds.
DPA identifies fewer haplotype blocks and tag SNPs than LD method (Fig. 1)
The results of haplotype block partitioning and identification of tag SNPs by both algorithms are illustrated in Figure 1. Detailed results, including the exact block partitioning and the physical length of the haplotype blocks, are presented in the Supplementary Material, 2–7. From the detailed block partitioning results, one can see that we do not limit the term 'block' to a genomic stretch comprising at least two SNPs: a block can also be represented by a single SNP.
For all regions and all levels of q, DPA consistently identified fewer haplotype blocks and tag SNPs than the LD method. For instance, on chromosome 18q21.32–33, at q ≥ 0.01, DPA detected six haplotype blocks, tagged by 11 SNPs, while the LD method identified 19 blocks and 15 tag SNPs. Accordingly, haplotype blocks called by DPA are larger than those called by the LD method. For example, on chromosome 18q21.32–33, at q ≥ 0.01, DPA identified blocks between 4.9 and 77.6 kb long, compared with a range of 0.8–26.4 kb for the LD method.
Block partitioning depends critically on marker allele frequencies (Fig. 1)
For both methods, the block partitioning and the identification of tag SNPs depended on the q threshold applied. Increasing q by gradually omitting rarer SNPs from the original data sets led to a decrease in the number of identified haplotype blocks and tag SNPs. The number of haplotype blocks decreased steadily over the range of q ≥ 0.01 to q ≥ 0.41. The number of tag SNPs remained stable over the range q ≥ 0.01 to q ≥ 0.19, decreasing steeply after q=0.2 (Fig. 1 and Supplementary Material, 2–7).
Identification of haplotype blocks and tag SNPs is sensitive to key parameters (Supplementary Material 1–7, Fig. 2)
Variation of key parameters affected the results of both methods. For all three chromosomal regions, the number of tag SNPs identified by DPA increased with increasing levels of α (=β) (Fig. 2A–C and Supplementary Material, 2–4). The number of haplotype blocks identified also depended on the levels of α; however, no monotonic relationship could be discerned (Supplementary Material, 2–4). The LD method proved to be quite insensitive to changes in thresholds for the confidence bounds. For the region on 18q, the same results were obtained both for the lowered and raised thresholds. For the two regions on chromosome 22, the number of haplotype blocks and tag SNPs identified varied little between the default, the raised, and the lowered thresholds (Supplementary Material, 5–7). No configuration of parameters we tested could reconcile the differences in results between the two methods.
| DISCUSSION |
|---|
The algorithmic detection of haplotype blocks is a tool to streamline genotyping efforts in a systematic and efficient way (15). Given the importance that haplotype block partitioning algorithms are believed to have for genome-wide association mapping, we wanted to address some practical but crucial questions that so far have not been addressed sufficiently. How do different algorithms compare in terms of the identification of haplotype blocks and tag SNPs? What impact does the choice of marker allele frequency have on the block partitioning? How sensitive are the results to changes in the parameter settings? We compared two major methods for haplotype block partitioning in three regions of the human genome that differed in their patterns of LD and gene content. For all three regions, we observed that the DPA consistently identified fewer haplotype blocks and tag SNPs than the LD method. Moreover, the identification of blocks and tag SNPs depended critically on the minor allele frequency. Neither method was completely insensitive to parameter choices, but the results could not be reconciled simply by parameter adjustment.
It has been proposed that genome-wide association studies could be performed in a systematic way by utilizing a reduced set of markers that tag the major haplotypes (16). This proposal follows from the observation that the genome is apparently organized into blocks of haplotypes (2,3). Numerous methods to identify these blocks algorithmically have been proposed (2,5–11). Still, many uncertainties persist (17).
The two major block partitioning algorithms we studied behaved very differently in the identification of haplotype blocks and tag SNPs. This is not merely a problem of calibration. Our results show that these substantial differences could not be rectified by adjusting parameters. Moreover, the differences were not only confined to one chromosomal region but were evident in each of three regions that differed in their overall recombination rates and gene content.
The absolute differences in the number of tag SNPs for our small study regions may not seem large. However, at a genome-wide level, they may amount to more substantial differences in the number of tag SNPs to be genotyped.
The identification of tag SNPs is meant to give an idea about the genotyping effort needed to cover a region or the whole genome sufficiently, while the identification of haplotype blocks can give us an idea how much of the genome has been sampled. Both are equally important when aiming at whole genome association mapping or focused fine-mapping of a region of interest. Thus, one would ideally want block partitioning algorithms to agree in these crucial features. From our comparison, one can see that different methods and marker allele frequencies give very different results.
How can these differences across methods be explained? If one thinks of haplotype blocks as nicely delineated genomic regions with low diversity interrupted by recombinational hot spots, one would be led to believe that different blocking algorithms should detect similar numbers of blocks and tag SNPs. However, the real situation appears to be more complex. Localized differences in recombination have been hypothesized to be the primary force behind the haplotype block structure of the genome (2,5,18). This hypothesis was supported by high-resolution LD studies, followed by estimation of recombination frequencies in sperm (19,20): areas of LD breakdown within stretches of strong LD corresponded perfectly with recombinational hot spots. However, the notion that such hot spots are required to explain the block structure has recently been challenged. The study by Phillips et al. (13) suggests that haplotype blocks can arise by factors other than recombination, such as natural selection, population bottlenecks, population admixture, choices of marker spacing and allele frequencies. In a simulation study, Zhang et al. (21) showed that haplotype blocks were observed even in the absence of recombination hot spots or recent population bottlenecks. Furthermore, genetic drift was also shown to generate block-like patterns. Thus, the authors cautioned against any global applicability of the haplotype map until studies had been done in multiple ethnic groups. Stumpf and Goldstein (22) reached a similar conclusion. In light of these uncertainties as to the underlying evolutionary processes, Schwartz et al. (12) conclude that differences between methods may be considered a direct consequence of the imperfect nature of the block concept.
A major focus of the present study was to assess the impact of the minor allele frequencies (q) on the outcome of the partitioning algorithms. From the literature, we can see that there are great discrepancies in the estimates of numbers of tag SNPs required for a genome-wide haplotype map (2,5,9). Apart from the fact that all these estimates are derived through different algorithms, one issue is very much overlooked: the range of q represented in the respective samples. Daly et al. (2) used SNPs with q>0.05, Patil et al. (9) only included SNPs with q>0.1, and Gabriel et al. (5) applied an even higher threshold of q>0.2.
To our knowledge, the impact of q on block partitioning algorithms has not been addressed formally. From our data, it can be seen that the number of blocks and tag SNPs identified strongly depends on the thresholds for q: the lower the threshold, the higher the numbers of tag SNPs. It has been argued that the generation of a haplotype map can ignore SNPs or haplotypes with minor allele frequencies of 10 or 20% or less, since rare causative polymorphisms will tend to be found on one or a few common haplotype backbones (23). Based on our data, we believe that such an approach might be risky. Common haplotype blocks may not automatically encompass less common variants. For example, the LD within a haplotype block may not be complete, since even within a block LD may decay with physical distance (24). Even under the scenario of complete LD, high-frequency tag SNPs might not necessarily capture rare variants, in particular in smaller sample sizes (25,26).
We would like to point out that the identification of tag SNPs for the DPA is tightly linked to delineation of haplotype blocks. This is a necessary consequence of defining tag SNPs based on the haplotype block to which they belong. For the LD method, this is not the case. In fact, Gabriel et al. (5) did not provide an algorithm for the definition of tag SNPs but focused on haplotype blocks exclusively. To enable a comparison between the methods, we used the tag SNP definition from the DPA to determine the SNPs tagging the haplotypes in the blocks identified by the LD method. However, the identification of tag SNPs is not necessarily contingent upon a prior identification of haplotype blocks. In the case of a well-defined and physically small unit of observation, e.g. a gene, tag SNPs may well be identified without a prior determination of haplotype blocks, as demonstrated by Johnson et al. (6). However, such an algorithm cannot be directly applied to very long regions, as all the haplotypes will be unique. Recently, Meng et al. (27) introduced an approach to define tag SNPs independently from haplotype blocks by using a sliding-window-based algorithm. Further studies are needed to compare the usefulness of tag SNPs identified using different algorithms for association studies.
We believe that it is problematic to limit the generation of haplotype maps to high-frequency SNPs. Such an approach may lead to fewer and larger blocks and lower genotyping efforts. However, this may come at the cost of artificially sparse maps that do not characterize the genomic structure adequately (13,17).
Our study aimed at evaluating the impact of methods, parameters and allele frequencies on the outcome of the block partitioning. Given that we chose only two methods for our comparison, our conclusions may be limited and not transferable to other algorithms. However, most of the extant methods are related to each other. Thus, we decided to consider two methods that differ in key principles and tend to span the range of common approaches. Furthermore, we chose three chromosomal regions to test the performance of the methods on different genomic backgrounds (i.e. recombination rate, gene-content).
Our data show that the computational identification of haplotype blocks remains algorithm-dependent and sensitive to allele frequency. At present, no one algorithm can be considered definitive. These algorithms were developed on the basis of different objectives. The main objective of DPA is to minimize the genotyping effort using tag SNPs for association studies; haplotype blocks were used as a tool to achieve this objective. On the other hand, the objective of Gabriel et al. (5) was to identify high LD regions using haplotype blocks; no tag SNPs were used for block partition. Depending on the purpose of a study, different block partition algorithms should be applied. In this respect, the creation of a general haplotype and tag SNP map may need to use several algorithms in parallel in order to keep up with its ambitions of universality. At the current stage it is not possible to say that any algorithm will deliver all-purpose haplotype blocks or tag SNPs. The interpretation of such features has to be within the limits of the specific algorithm employed and the purpose of a given study. An all-purpose haplotype block map and tag SNP set may not exist. Given that our understanding of the patterns of recombination and disequilibrium in the genome is still limited and that the notion of discrete blocks will probably prove too rigid to account for the complexity of linkage disequilibrium (28), future studies will have to address the question whether we should focus on approaches that describe the global genomic organization of LD (i.e. maps identifying blocks of haplotypes) or rather develop algorithms that identify tag SNPs independently of the haplotype block concept.
| MATERIALS AND METHODS |
|---|
Chromosomal regions and samples studied
We performed our analyses with data derived from three chromosomal regions on chromosomes 18q21.32–33, 22q13.31–32 and 22q13.33, each fully sequenced and characterized with SNPs.
The region on chromosome 18 consists of a 180 kb contig of finished sequence that we characterized with 33 SNPs at a median density of 5 kb (for a list of the SNPs used, see Supplementary Material, 1). Genotyping was performed in 50 unrelated founder individuals from the Utah and French CEPH pedigree collection (www.cephb.fr; for genotyping procedure, see below). According to the Nov 2002 build of the Golden Path UCSC Genome browser (http://genome.ucsc.edu/cgi-bin/hgGateway), the average recombination rate in this region is estimated at 1.3–1.9 cM/Mb and the gene content is low (5.6 genes per Mb, according to the known and RefSeq track of the genome browser).
For the two regions on chromosome 22, we obtained publicly available data from the Wellcome Trust Sanger Institute (www.sanger.ac.uk/HGP/Chr22/). The data comprise 91 individuals from the UK; sample characteristics and genotyping procedures are described elsewhere (29). The region on chromosome 22q13.31–32 consists of 811 kb characterized with 55 SNPs (rs1009783–rs132231) at a 10 kb median density. The average recombination rate lies between 2.5 and 2.8 cM/Mb, and gene content is the lowest of the three regions studied (1.2 genes per Mb, according to the known and RefSeq track of the genome browser). The region 22q13.33 consists of 993 kb characterized with 54 SNPs (rs139777–TSC0100622) at a 9 kb median density. This region shows the highest gene content of the three regions analyzed (34.2 genes per Mb, according to the known and RefSeq track of the genome browser). The average recombination rate is very low.
Genotyping (18q21.32–33)
Genotyping was performed using template-directed dye-terminator incorporation with fluorescence-polarization detection (FP-TDI) (30). A detailed protocol is presented elsewhere (31).
Minor allele frequency (q) thresholds used
One of our main interests was to assess the performance of the block partitioning algorithms for various thresholds of q. Thus, we created seven subsets of SNPs from the respective original data sets by progressively excluding SNPs with q values less than the threshold, using the following q thresholds: q ≥ 0.01 (i.e. the original samples including all SNPs), q ≥ 0.04, q ≥ 0.1, q ≥ 0.19, q ≥ 0.25, q ≥ 0.34 and q ≥ 0.41.
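The subsetting step can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the data layout (a mapping from SNP name to a string of observed alleles) and the example SNPs are all hypothetical, and SNPs are assumed to be biallelic. A SNP is retained when its minor allele frequency is at least the threshold, which follows the description of excluding SNPs with q values below the threshold.

```python
from collections import Counter


def minor_allele_frequency(alleles):
    """Frequency of the rarer allele at one SNP (assumes a biallelic marker)."""
    counts = Counter(alleles)
    if len(counts) < 2:
        return 0.0  # monomorphic in this sample
    return min(counts.values()) / sum(counts.values())


def subsets_by_threshold(snps, thresholds):
    """Map each q threshold to the SNP names whose q is at least that threshold."""
    freqs = {name: minor_allele_frequency(a) for name, a in snps.items()}
    return {q: [name for name, f in freqs.items() if f >= q] for q in thresholds}


# Hypothetical toy data: one string of observed alleles per SNP.
example_snps = {
    "snpA": "AAAAAAAAAG",  # q = 0.1
    "snpB": "CCCCCTTTTT",  # q = 0.5
    "snpC": "GGGGGGGGGG",  # monomorphic, q = 0
}
example_subsets = subsets_by_threshold(example_snps, (0.01, 0.10, 0.25, 0.41))
```

Because each subset is derived from the same full data set, raising the threshold can only remove SNPs, so the subsets are nested.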
Calculation of inter-SNP LD
Pairwise inter-SNP LD for the three regions, as expressed by the standardized LD coefficient D' (32), was calculated using the ldmax option in GOLD (www.sph.umich.edu/csg/abecasis/GOLD/) (33). This uses haplotype frequencies estimated by an expectation-maximization (EM) algorithm, and has been shown to perform well in unphased data (34,35).
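As an illustration of the quantity being computed, the following Python sketch evaluates the standardized coefficient D' for one pair of biallelic SNPs. It is not the GOLD implementation: it assumes the two-locus haplotype frequency is already known, whereas the analysis described above first estimates haplotype frequencies from unphased genotypes with an EM algorithm. The function name and arguments are hypothetical, and the absolute value |D'| is returned.

```python
def d_prime(p_ab, p_a, p_b):
    """|D'| for two biallelic SNPs.

    p_ab: frequency of the haplotype carrying allele A at SNP 1 and allele B at SNP 2
    p_a, p_b: frequencies of allele A and allele B
    """
    d = p_ab - p_a * p_b  # raw disequilibrium coefficient D
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    if d_max == 0:
        return 0.0  # at least one SNP is monomorphic; D' is undefined, reported as 0 here
    return abs(d) / d_max
```

With p_a = p_b = 0.5, a haplotype frequency of 0.5 gives |D'| = 1 (complete LD), and a haplotype frequency of 0.25 gives |D'| = 0 (independence).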
Block partitioning
We compared the performance of two major methods to define haplotype blocks, the dynamic programming algorithm (DPA) (10) and a method based on D' (5), henceforth referred to as the LD method. The methods are described in detail in the original papers, and briefly summarized below.
DPA
Haplotypes are inferred through a partition-ligation EM algorithm (36). Subsequently, the DPA is applied to partition the obtained haplotypes into blocks. Common haplotypes are defined as those haplotypes that are represented more than once in a block. In the final block partition, a subset of consecutive SNPs is a block only if the common haplotypes account for at least α percent (coverage) of all estimated haplotypes within that block. The DPA aims to minimize the number of SNPs (i.e. tag SNPs) that distinguish at least β percent of the haplotypes in a block. For our primary comparison with the LD method, in keeping with the original study of the DPA (10), we set α=β=0.80. To further assess the influence of parameter settings, we also performed the DPA analysis for other α and β values (0.7, 0.75, 0.85, 0.9 and 0.95).
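The block criterion and the minimization described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the published DPA implementation: haplotypes are given as already-phased strings, the coverage threshold is called `alpha` and the tag threshold `beta`, a SNP subset is taken to "distinguish" a haplotype when that haplotype's alleles at those SNPs match no other distinct haplotype in the block, tag SNPs are found by brute force, and every block is assumed to need at least one tag SNP. All function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def is_block(haps, alpha=0.80):
    """True if haplotypes seen more than once cover at least `alpha` of all copies."""
    counts = Counter(haps)
    common = sum(c for c in counts.values() if c > 1)
    return common / len(haps) >= alpha


def min_tag_snps(haps, beta=0.80):
    """Smallest SNP subset that uniquely identifies haplotypes making up
    at least `beta` of all copies (brute-force search, assumed definition)."""
    counts = Counter(haps)
    n_snps = len(haps[0])
    for k in range(1, n_snps + 1):
        for subset in combinations(range(n_snps), k):
            def pattern(h):
                return tuple(h[i] for i in subset)
            shared = Counter(pattern(h) for h in counts)  # distinct haplotypes per pattern
            resolved = sum(c for h, c in counts.items() if shared[pattern(h)] == 1)
            if resolved / len(haps) >= beta:
                return list(subset)
    return list(range(n_snps))


def partition(haps, alpha=0.80, beta=0.80):
    """Dynamic program: split consecutive SNPs into valid blocks so that the
    total number of tag SNPs is minimal. Returns (total_tags, blocks)."""
    n_snps = len(haps[0])
    inf = float("inf")
    best = [0] + [inf] * n_snps   # best[j]: fewest tag SNPs covering the first j SNPs
    start = [0] * (n_snps + 1)    # start[j]: where the last block of that solution begins
    for end in range(1, n_snps + 1):
        for begin in range(end):
            segment = [h[begin:end] for h in haps]
            if best[begin] == inf or not is_block(segment, alpha):
                continue
            cost = best[begin] + len(min_tag_snps(segment, beta))
            if cost < best[end]:
                best[end], start[end] = cost, begin
    if best[n_snps] == inf:
        raise ValueError("no valid partition under these thresholds")
    blocks, end = [], n_snps
    while end > 0:
        blocks.append((start[end], end))  # half-open SNP index range
        end = start[end]
    return best[n_snps], blocks[::-1]
```

The brute-force tag search is exponential in block size and is only practical for toy inputs. It is used here to keep the objective visible: the number of blocks is a by-product of minimizing tag SNPs, which matches the objective stated for the DPA.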
LD method
D' values for all pairs of SNPs were calculated and the variance was estimated (37). We used a modified version of the previously described LD method (5) that replaced the bootstrap-based variance estimates with a normal approximation. Simulations showed that this modified method gave similar confidence intervals for D' as the bootstrap method with much less computational time (38). Pairs of SNPs were considered to be in strong LD if the one-sided upper 95% confidence bound on D' was larger than 0.98 and the lower bound was larger than 0.7. Low LD was assumed for pairs with an upper bound less than 0.9. A haplotype block was then defined as a region over which less than 5% of SNP pairs showed low levels of LD.
Since the LD method does not include an algorithm to define tag SNPs, we used the same criteria as in the DPA (see above).
To assess the influence of parameter settings, we also performed the analysis with more stringent criteria (strong LD defined as upper bound on D' >0.99 and lower bound >0.75) and with less stringent criteria (upper bound >0.96 and lower bound >0.65). For detailed block definition criteria, see Supplementary Material, 1.
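The pairwise classification and the block rule stated above can be sketched as follows. This is an illustrative Python sketch, not the software used in the study: it assumes each SNP pair is summarized by a D' estimate and a variance estimate, builds one-sided 95% bounds from a normal approximation (z of about 1.645) clipped to the interval from 0 to 1, and applies the block rule exactly as worded in the text (fewer than 5% of pairs showing low LD). Function and parameter names are hypothetical; the default thresholds are the ones given in the text.

```python
import math

Z_ONE_SIDED_95 = 1.6449  # standard normal quantile for a one-sided 95% bound


def confidence_bounds(d_prime, variance):
    """Lower and upper one-sided 95% bounds on D', clipped to [0, 1]."""
    half_width = Z_ONE_SIDED_95 * math.sqrt(variance)
    return max(0.0, d_prime - half_width), min(1.0, d_prime + half_width)


def classify_pair(d_prime, variance, upper_strong=0.98, lower_strong=0.70, upper_low=0.90):
    """Label one SNP pair as 'strong', 'low' or 'uninformative'."""
    lower, upper = confidence_bounds(d_prime, variance)
    if upper > upper_strong and lower > lower_strong:
        return "strong"
    if upper < upper_low:
        return "low"
    return "uninformative"


def is_ld_block(pairs, max_low_fraction=0.05):
    """True if fewer than `max_low_fraction` of the (D', variance) pairs show low LD."""
    labels = [classify_pair(d, v) for d, v in pairs]
    return labels.count("low") / len(labels) < max_low_fraction
```

The more and less stringent settings described above correspond to calling `classify_pair` with `upper_strong=0.99, lower_strong=0.75` or `upper_strong=0.96, lower_strong=0.65`.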
| SUPPLEMENTARY MATERIAL |
|---|
Supplementary Material is available at HMG Online.
| ACKNOWLEDGEMENTS |
|---|
Supported by grants from the National Institute of Mental Health, the Edward F. Mallinckrodt Jr Foundation, the Chicago Brain Research Institute, and the National Alliance for Research on Schizophrenia and Depression (Young Investigators Awards to T.G.S. and Y.S.C.). K.Z. and F.S. were supported by a grant from the National Institutes of Health (NIH P50 HG 002790). We gratefully acknowledge help from Gonçalo Abecasis in obtaining the chromosome 22 genotypes from The Wellcome Trust Sanger Institute.
| FOOTNOTES |
|---|
* To whom correspondence should be addressed. Tel: +49 6211703724; Fax: +49 6211703741; Email: schulze{at}zi-mannheim.de
| REFERENCES |
|---|
- Risch, N. (2000) Searching for genes in complex diseases: lessons from systemic lupus erythematosus. J. Clin. Invest., 105, 1503–1506.
- Daly, M.J., Rioux, J.D., Schaffner, S.F., Hudson, T.J. and Lander, E.S. (2001) High-resolution haplotype structure in the human genome. Nat. Genet., 29, 229–232.
- Taillon-Miller, P., Bauer-Sardina, I., Saccone, N.L., Putzel, J., Laitinen, T., Cao, A., Kere, J., Pilia, G., Rice, J.P. and Kwok, P.Y. (2000) Juxtaposed regions of extensive and minimal linkage disequilibrium in human Xq25 and Xq28. Nat. Genet., 25, 324–328.
- Zhang, K., Calabrese, P., Nordborg, M. and Sun, F. (2002) Haplotype block structure and its applications to association studies: power and study designs. Am. J. Hum. Genet., 71, 1386–1394.
- Gabriel, S.B., Schaffner, S.F., Nguyen, H., Moore, J.M., Roy, J., Blumenstiel, B., Higgins, J., DeFelice, M., Lochner, A., Faggart, M. et al. (2002) The structure of haplotype blocks in the human genome. Science, 296, 2225–2229.
- Johnson, G.C., Esposito, L., Barratt, B.J., Smith, A.N., Heward, J., Di Genova, G., Ueda, H., Cordell, H.J., Eaves, I.A., Dudbridge, F. et al. (2001) Haplotype tagging for the identification of common disease genes. Nat. Genet., 29, 233–237.
- Koivisto, M., Perola, M., Varilo, T., Hennah, W., Ekelund, J., Lukk, M., Peltonen, L., Ukkonen, E. and Mannila, H. (2003) An MDL method for finding haplotype blocks and for estimating the strength of haplotype block boundaries. Pacific Symposium on Biocomputing, pp. 502–513.
- Mannila, H., Koivisto, M., Perola, M., Varilo, T., Hennah, W., Ekelund, J., Lukk, M., Peltonen, L. and Ukkonen, E. (2003) Minimum description length block finder, a method to identify haplotype blocks and to compare the strength of block boundaries. Am. J. Hum. Genet., 73, 86–94.
- Patil, N., Berno, A.J., Hinds, D.A., Barrett, W.A., Doshi, J.M., Hacker, C.R., Kautzer, C.R., Lee, D.H., Marjoribanks, C., McDonough, D.P. et al. (2001) Blocks of limited haplotype diversity revealed by high-resolution scanning of human chromosome 21. Science, 294, 1719–1723.
- Zhang, K., Deng, M., Chen, T., Waterman, M.S. and Sun, F. (2002) A dynamic programming algorithm for haplotype block partitioning. Proc. Natl Acad. Sci. USA, 99, 7335–7339.
- Zhang, K. and Jin, L. (2003) HaploBlockFinder: haplotype block analyses. Bioinformatics, 19, 1300–1301.
- Schwartz, R., Halldorsson, B.V., Bafna, V., Clark, A.G. and Istrail, S. (2003) Robustness of inference of haplotype block structure. J. Comput. Biol., 10, 13–19.
- Phillips, M.S., Lawrence, R., Sachidanandam, R., Morris, A.P., Balding, D.J., Donaldson, M.A., Studebaker, J.F., Ankener, W.M., Alfisi, S.V., Kuo, F.S. et al. (2003) Chromosome-wide distribution of haplotype blocks and the role of recombination hot spots. Nat. Genet., 33, 382–387.
- Zhang, W., Collins, A., Maniatis, N., Tapper, W. and Morton, N.E. (2002) Properties of linkage disequilibrium (LD) maps. Proc. Natl Acad. Sci. USA, 99, 17004–17007.
- Zhang, K., Sun, F., Waterman, M.S. and Chen, T. (2003) Haplotype block partition with limited resources and applications to human chromosome 21 haplotype data. Am. J. Hum. Genet., 73, 63–73.
- Collins, F.S. and Green, E.D. (2003) A vision for the future of genomics research. Nature, 422, 835–847.
- Carlson, C.S., Eberle, M.A., Rieder, M.J., Smith, J.D., Kruglyak, L. and Nickerson, D.A. (2003) Additional SNPs and linkage-disequilibrium analyses are necessary for whole-genome association studies in humans. Nat. Genet., 33, 518–521.
- Goldstein, D.B. (2001) Islands of linkage disequilibrium. Nat. Genet., 29, 109–111.
- Jeffreys, A.J., Kauppi, L. and Neumann, R. (2001) Intensely punctuate meiotic recombination in the class II region of the major histocompatibility complex. Nat. Genet., 29, 217–222.
- Jeffreys, A.J., Ritchie, A. and Neumann, R. (2000) High resolution analysis of haplotype diversity and meiotic crossover in the human TAP2 recombination hotspot. Hum. Mol. Genet., 9, 725–733.
- Zhang, K., Akey, J.M., Wang, N., Xiong, M., Chakraborty, R. and Jin, L. (2003) Randomly distributed crossovers may generate block-like patterns of linkage disequilibrium: an act of genetic drift. Hum. Genet., 113, 51–59.
- Stumpf, M.P. and Goldstein, D.B. (2003) Demography, recombination hotspot intensity, and the block structure of linkage disequilibrium. Curr. Biol., 13, 1–8.
- Judson, R., Salisbury, B., Schneider, J., Windemuth, A. and Stephens, J.C. (2002) How many SNPs does a genome-wide haplotype map require? Pharmacogenomics, 3, 379–391.
- Shifman, S., Kuypers, J., Kokoris, M., Yakir, B. and Darvasi, A. (2003) Linkage disequilibrium patterns of the human genome across populations. Hum. Mol. Genet., 12, 771–776.
- Risch, N. and Merikangas, K. (1996) The future of genetic studies of complex human diseases. Science, 273, 1516–1517.
- McGinnis, R., Shifman, S. and Darvasi, A. (2002) Power and efficiency of the TDT and case-control design for association scans. Behav. Genet., 32, 135–144.
- Meng, Z., Zaykin, D.V., Xu, C.F., Wagner, M. and Ehm, M.G. (2003) Selection of genetic markers for association analyses, using linkage disequilibrium and haplotypes. Am. J. Hum. Genet., 73, 115–130.
- Cardon, L.R. and Abecasis, G.R. (2003) Using haplotype blocks to map human complex trait loci. Trends Genet., 19, 135–140.
- Dawson, E., Abecasis, G.R., Bumpstead, S., Chen, Y., Hunt, S., Beare, D.M., Pabial, J., Dibling, T., Tinsley, E., Kirby, S. et al. (2002) A first-generation linkage disequilibrium map of human chromosome 22. Nature, 418, 544–548.
- Chen, X., Levine, L. and Kwok, P.Y. (1999) Fluorescence polarization in homogeneous nucleic acid analysis. Genome Res., 9, 492–498.
- Akula, N., Chen, Y.S., Hennessy, K., Schulze, T.G., Singh, G. and McMahon, F.J. (2002) Utility and accuracy of template-directed dye-terminator incorporation with fluorescence-polarization detection for genotyping single nucleotide polymorphisms. Biotechniques, 32, 1072–1076.
- Lewontin, R.C. (1964) The interaction of selection and linkage. I. General considerations; heterotic models. Genetics, 49, 49–67.
- Abecasis, G.R. and Cookson, W.O. (2000) GOLD–graphical overview of linkage disequilibrium. Bioinformatics, 16, 182–183.
- Dempster, A.P., Laird, N.M. and Rubin, D.B. (1977) Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc., Ser. B, 39, 1–38.
- Excoffier, L. and Slatkin, M. (1995) Maximum-likelihood estimation of molecular haplotype frequencies in a diploid population. Mol. Biol. Evol., 12, 921–927.
- Qin, Z.S., Niu, T. and Liu, J.S. (2002) Partition–ligation expectation-maximization algorithm for haplotype inference with single-nucleotide polymorphisms. Am. J. Hum. Genet., 71, 1242–1247.
- Zapata, C., Alvarez, G. and Carollo, C. (1997) Approximate variance of the standardized measure of gametic disequilibrium D'. Am. J. Hum. Genet., 61, 771–774.
- Kim, S.K., Zhang, K. and Sun, F. (2004) A comparison of different strategies for computing confidence intervals of the linkage disequilibrium measure D'. Pacific Symposium on Biocomputing (in press).