
Human Molecular Genetics, 2002, Vol. 11, No. 12 1363-1372
© 2002 Oxford University Press

Can long-range microsatellite data be used to predict short-range linkage disequilibrium?

Thomas G. Schulze1,2,*, Yu-Sheng Chen1, Nirmala Akula1, Kathleen Hennessy1, Judith A. Badner1, Melvin G. McInnis3, J. Raymond DePaulo3, Johannes Schumacher4, Sven Cichon4,5, Peter Propping4, Wolfgang Maier2, Marcella Rietschel2,6, Markus M. Nöthen4,5 and Francis J. McMahon1

1Department of Psychiatry, The University of Chicago, Chicago, IL 60637, USA, 2Department of Psychiatry, University of Bonn, 53105 Bonn, Germany, 3Department of Psychiatry, Johns Hopkins University, Baltimore, MD 21287, USA, 4Institute of Human Genetics, University of Bonn, 53111 Bonn, Germany, 5Department of Medical Genetics, University of Antwerp, 2610 Antwerp, Belgium and 6Central Institute of Mental Health, 68072 Mannheim, Germany

Received December 19, 2001; Revised March 12, 2002; Accepted March 28, 2002


    ABSTRACT
The distribution of linkage disequilibrium (LD) across the genome is highly complex. Little is known about the relationship between long-range and short-range LD in a genomic region. We assessed whether a dense set of microsatellite data could be used to predict short-range LD in family samples. We analyzed intermarker LD in data derived from chromosomal regions 18q22 and 10q25–26, densely genotyped with microsatellite markers. The pattern of LD was highly heterogeneous within and between both chromosomal regions. On 10q25–26, very little LD was detected. On 18q22, where marker density was higher, many marker pairs were in LD. We modeled the decay of LD over distance in this region. A classical model accounted for most of the relationship between LD and distance (R² = 0.63). We used this model to predict the proportion of markers expected to show useful levels of LD at short distances. This prediction agreed with estimates based on single-nucleotide polymorphism (SNP) marker genotypes in the region. Both microsatellite and SNP data predict that about 80% of marker pairs would display levels of LD that are useful for association studies at distances of up to 15 kb in this region. These projections also agree with levels of LD directly measured in a 10 kb set of SNP genotypes generated in a nearby region of finished sequence. Our results suggest that existing sets of microsatellite data, if sufficiently dense, may be used to develop good initial estimates of the density of additional markers needed to screen a region for disease alleles by association analysis.


    INTRODUCTION
Success in finding genetic association in fine-mapping analyses hinges on the ability to discern linkage disequilibrium (LD) between marker and disease alleles (1). Hence, in recent years, genetic linkage and association studies have been closely accompanied by studies that attempt to describe the extent of intermarker LD for various chromosomal regions, particularly the relationship between LD and intermarker distance. The overall goal of all these studies is to develop good estimates of the density of markers needed to carry out successful association mapping experiments.

Considerable controversy surrounds this issue, however. It has been shown that the physical extent of LD may vary significantly between populations (2,3). Patterns of LD are also irregular across different chromosomal regions (4–7). For instance, a more rapid falloff of LD with physical distance can be found in telomeric regions (8). Moreover, estimates of the actual physical extent of LD vary widely. On the basis of simulation studies using a biallelic marker model, Kruglyak (9) concluded that marker pairs more than 3 kb apart would not demonstrate ‘useful’ LD in the general population (>5 kb for isolated populations) and that approximately 500 000 equally spaced single-nucleotide polymorphisms (SNPs) would thus be needed for whole-genome LD scans. This rather disheartening estimate is not supported by other studies that observed substantial LD over physical distances greater than 100 kb in actual population samples (10–15). Studies in isolated founder populations report LD between markers even several megabases apart (16–19). There is further controversy over whether LD and physical distance correlate at all in small genomic regions (10,20). Very recently, it has been suggested that the human genome is organized in blocks of haplotypes (21,22) and that gene conversion predominates over recombination at small distances (23,24), adding further complexity to the relationship between LD and distance.

The overall goal of this study was to assess whether microsatellite data can be used to predict the proportion of marker pairs at a given physical distance that can be expected to display useful levels of LD. Such predictions would be helpful in planning the density of markers to genotype for fine-mapping studies. Microsatellite marker data at a 1–2 cM density are typically generated in the second stage of genetic linkage studies. It would be efficient to use such existing data as a starting point.

We analyzed the distribution of LD in two well-known linkage regions for bipolar affective disorder (BPAD). For this purpose we used data from two major pedigree collections in the field: the Johns Hopkins/Dana Foundation Pedigrees, with a linkage region on 18q22 (25–28), hereinafter referred to as sample A, and the University of Bonn sample, with a linkage region on 10q25–26 (29,30), hereinafter referred to as sample B. We focused on sets of microsatellite markers with dense genetic intermarker distances (1–2 cM), typical of the kind of marker data generated at the second stage of a genome-wide linkage study. Markers were physically ordered according to the recently released draft sequence of the human genome (31). Within the heart of the linkage region on 18q22, we additionally typed 27 SNPs in 93 unrelated individuals.

For haplotyping in the pedigree data, we used the SIMWALK2 package (32). Haplotype frequencies in the SNP data were inferred using an expectation-maximization (EM) algorithm, as implemented in the GOLD program (33). We calculated the LD parameters D' and Δ² with GOLD. We adapted a classical equation for the decay of LD [Equation 1: $D_t = D_0\,(1-\theta)^t$] to model the decay of LD over distance. For this purpose, we expressed LD as the proportion of marker pairs showing useful levels of LD. We considered D'-values greater than 0.3 and Δ²-values greater than 0.1 as thresholds of useful levels of LD. The rationale and the mathematical derivations for this modeling approach are detailed in the Material and Methods section.

Our data are consistent with the existing findings indicating that, while LD is highly heterogeneous across different genomic regions, useful LD might be observed over physical distances larger than 3–5 kb in European populations. We extend the existing findings by demonstrating that, for one of the two chromosomal regions that we studied (18q22), a classical model of the decay of LD over distance predicted the proportion of marker pairs showing useful levels of LD at distances well below those tested in our microsatellite data. Existing sets of microsatellite data, if sufficiently dense, may be used to develop good initial estimates of the density of additional markers needed to screen a region for disease alleles by association analysis.


    RESULTS
LD across each chromosomal region
Figures 1 and 2 show the relationship between pairwise LD and physical intermarker distance for all 231 (sample A) and 120 (sample B) marker pairs. The pattern of LD was highly heterogeneous across both chromosomal regions. In sample A (Figs 1A and 2A), pairwise D'-values for adjacent markers varied between 0.1 and 0.442. The maximal extent of LD at the D'>0.3 level for any marker pair was 3.7 Mb. The closest pair of markers with D'<0.3 was spaced 33 kb apart (D'=0.152). In sample B, very little LD was detected. Pairwise D'-values for adjacent markers varied between 0.066 and 0.227 (Figs 1B and 2B). No marker pair was in LD at the D'>0.3 level. The closest pair of markers was spaced 137 kb apart (D'=0.174).



Figure 1. LD and physical distance. The relationship between LD (D') and physical (intermarker) distance is shown. All marker pairs are plotted. (A) Johns Hopkins/Dana Foundation Pedigrees Series. (B) University of Bonn Pedigrees Series.

 


Figure 2. LD across the chromosomal regions 18q22 (sample A) and 10q25–26 (sample B). The overall distribution of LD across the respective chromosomal regions, plotted using the graphical display of the GOLD program, for sample A (A) and B (B). For each marker pair, GOLD plots the color-coded D'-values at the Cartesian coordinates that correspond to the actual physical positions of the markers. Marker pairs made up by neighboring markers are plotted along the diagonal. Pairs of markers that are further apart are plotted at increasing distance from the diagonal.

 
Decay of LD for the microsatellite and SNP data
In sample A, intermarker LD, expressed as the proportion of marker pairs with D'>0.3, decreased over genetic distance in a manner consistent with a classical decay model (Equation 1). For the observed data range (60 kb–10 Mb), Equation 1 accounted for most of the relationship between LD and genetic distance (R² = 0.63, t = 617) (Fig. 3). This analysis was performed only for sample A, since in sample B no marker pair yielded a D' above 0.3.



Figure 3. Decay of LD over distance. The main graph shows the decay of LD in the microsatellite data with both genetic distance (expressed as the recombination fraction θ) and physical distance. It depicts both the observed data points (the lower limit of the observed range of intermarker distances is 60 kb) and the projected values. The inset shows the decay of LD in the SNP data with both genetic and physical distance, again depicting both the observed data points (the lower limit of the observed range of intermarker distances is 10 kb) and the projected values. LD is expressed as the proportion of marker pairs with D'>0.3 (microsatellites) and Δ²>0.1 (SNPs), respectively.

 
For the SNP data, we calculated the LD measures D' and Δ² for all 351 possible marker pairs. As with the microsatellite data, LD was expressed as the proportion of marker pairs with D'>0.3. We also examined the proportion of marker pairs with Δ²>0.1. When D' was used, Equation 1 did not account for much of the relationship between LD and genetic distance (R² = 0.30, t = 620). The LD parameter Δ² performed better: Equation 1 accounted for most of the relationship between LD and genetic distance (R² = 0.80, t = 832) over the observed data range of 10 kb–5.4 Mb. These observations extend below the lower limit (60 kb) of intermarker distances in the microsatellite data, but continue to vary in a manner consistent with Equation 1 (Fig. 3, inset).

Predicting short-range LD
Given the good fit between model predictions and actual data (Fig. 3), we used Equation 1 with the microsatellite-based estimate (t = 617) to project the proportion of marker pairs with D'>0.3. These projections fit the observed proportions from the SNP data very well (Pearson correlation coefficient r = 0.95, P<0.0006). We also used the SNP data to project the proportion of marker pairs with Δ²>0.1 (t = 832). Table 1 shows these projections for a range between 0 and 200 kb. A comparison of the projected values revealed that the microsatellite-based and the SNP-based models agreed closely (Pearson correlation coefficient r = 0.86, P<0.0001). Both models predict that about 80% of marker pairs would be expected to display useful levels of LD at distances of up to 15 kb in this region (Table 1).


Table 1. Projected levels of useful LD
 

    DISCUSSION
The main goal of this study was to assess whether the kind of microsatellite data typically generated in linkage studies can be used to predict the proportion of markers that will display useful levels of LD at short physical distances, i.e. over 1–20 kb. We based our marker order on the draft sequence of the human genome (31), and analyzed LD between dense sets of microsatellite markers in two different genomic regions genotyped in pedigrees of European origin. We found that a classical model for the decay of LD with distance fitted the data on chromosome 18q22 well, but could not be applied to the chromosome 10q25–26 data since useful levels of LD were not observed in that dataset. The classical model we applied produced projections of LD that agreed closely with those observed from a 200 kb set of SNP genotypes in the region, even at intermarker distances well below those represented in our microsatellite data. Our results suggest that existing sets of microsatellite marker genotypes, if sufficiently dense, may be used to develop good initial estimates of the density of additional markers that would be needed to screen a region for disease alleles by association analysis.

Our results are consistent with the growing body of studies that indicate a complex pattern of LD in various regions of the human genome (2–20). Some studies suggest that the distribution of LD is largely ruled by stochastic factors, selection processes, demographic factors and gene conversion (23,24,34), and that simple relationships between LD and distance are thus not to be expected. Despite this complexity, we observed a relationship between intermarker LD and distance in our microsatellite data that accounted for most of the variance over the 60 kb–10 Mb range on 18q22. This same relationship was observed at distances between 10 kb and 5.4 Mb in the SNP data that we studied in the region: projected proportions agreed closely with predictions based on the microsatellite data. That is, the SNP data confirmed the predictions derived from the microsatellite data, demonstrating that the microsatellite data were sufficient to reveal the essentials of the distribution of LD in the region and that the 2-fold increase in marker density provided by the SNP data was not necessary. Further support for the microsatellite predictions comes from a 10 kb set of SNP genotypes generated in a nearby region of finished sequence (35). Observed levels of LD again agreed closely with the microsatellite-based predictions. Daly et al. (21) argue that the ‘traditional’ assessment of LD patterns based on single-marker analyses ‘often yields an erratic, non-monotonic picture’. Their study, as well as a comprehensive study of chromosome 21 (22), suggests that the human genome is organized into blocks of haplotypes, and that LD analyses should be based on a map of haplotypes rather than individual markers. Our results do not contradict this view, but suggest that estimates of LD based on individual marker data may be very useful for planning the initial stages of mapping a large genomic region by association. Haplotype-based approaches might then be most efficiently applied at subsequent stages in the fine-mapping process.

Abecasis et al. (14), Reich et al. (15) and Dunning et al. (20) have already addressed the question of what proportion of marker pairs at a given distance displays a given level of LD, but did not attempt to model this relationship formally. To the best of our knowledge, this is the first study to do so in order to derive predictions about fine-scale LD. Abecasis et al. (14) applied the classical decay model only to individual values of D'. Their model, however, explained only roughly 45% of the observed variance, which may be too low for planning subsequent LD mapping studies. Our proportion-based model accounted for most of the relationship between LD and distance, although the implications of using proportions in this context require further study.

The modeling approach described here is based on data derived from only one chromosomal region (18q22), so we cannot comment on its generalizability to other regions. We need studies that compare fine-scale patterns of LD between genomic regions that differ at a coarse scale, but few of the necessary data are currently available for humans. We did not detect useful levels of LD on 10q25–26, but this should not be taken as evidence against the generalizability of the 18q22 results. The average distance between the markers on 10q25–26 was more than twice as great as that on 18q22, and the marker density on 10q25–26 might simply not have been great enough to detect significant association between markers in this region. Meaningful conclusions about the pattern of LD in this region might only be drawn if additional markers were typed. The generality of our findings will become clearer after analysis of additional regions.

Although we used the recently released draft sequence of the human genome to order markers across the regions studied, there is still potential for error in marker order and distances, underlined by unresolved differences between the two published versions of the sequence (36). Error in marker order would lead to false inferences about haplotypes, tending to decrease the observed LD between marker pairs, and thus could not account for our findings. Error in intermarker distances would be unlikely to affect our overall estimates, which are based on many different marker pairs.

One of the major strengths of our study is that we calculated LD in the microsatellite data based on pedigrees consisting of multiple siblings and one or two parents. Family-based haplotypes are robust to multiple heterozygotes, and are less sensitive to sampling error than haplotypes inferred from unrelated individuals (37–39). SNP haplotypes from unrelated individuals performed well in this study in the sense that the results agreed with those of the microsatellites. While experimental determination of haplotypes in the laboratory may give more accurate results, the methods are labor-intensive and the results would be unlikely to change our main conclusions.

From our study, we cannot provide a clear guideline on what constitutes a sufficiently dense microsatellite map. We found that a mean spacing of 1 cM enabled LD modeling in one region whereas a 2 cM map in another region did not. The determination of the optimal microsatellite density would be a fruitful object of future investigations.

The overall conclusion that can be drawn from this study is that microsatellite data, if sufficiently dense, can be used to make initial predictions about the level of short-range LD present in susceptibility regions identified by linkage studies. These predictions can be useful for planning the density of markers needed for fine-mapping experiments.

The theoretical implications of our modeling approach require further study, but our conclusions may be of practical importance, both for following up linkage results that implicate large genomic regions and for estimating the number of markers needed for genome-wide association studies.


    MATERIAL AND METHODS
Samples studied
The analysis comprised two pedigree samples:

  1. The Johns Hopkins/Dana Foundation Bipolar Pedigrees (sample A), with a total of 58 families of essentially European–American origin (n=506) (25–28,40). Most families consist of several sibships and both parents, allowing for pedigree-based haplotype estimation (see below). In these families, a susceptibility region for BPAD has been described on chromosome 18q22 (25–28). Written informed consent was obtained from all participants.
  2. The University of Bonn Bipolar Pedigree series (sample B), with a total of 75 families (n=444), comprising 66 families from Germany, 8 families from Israel, and 1 family from Italy. For all families, genotype information on both parents was available. In these families, a susceptibility region for BPAD has been reported on chromosome 10q25–26 (29,30). For the present analyses, we excluded the 8 Israeli families (n=52), who were of mainly non-Ashkenazi origin, in order to minimize the level of heterogeneity. Written informed consent was obtained from all participants.

Markers analyzed
We analyzed data from microsatellite markers that were closely spaced across each candidate region according to the Marshfield Sex-Averaged Genetic Map (http://research.marshfieldclinic.org/genetics/). For sample A, the set comprises the marker interval from D18S465 to D18S462, spanning 10 Mb (mean intermarker distance 0.95 cM). For sample B, we chose the set of markers that spans the 15 Mb interval from D10S190 to D10S212 (mean intermarker distance 2.03 cM). These two marker sets encompass the respective susceptibility regions.

We ordered each set of markers according to the recently released draft sequence of the human genome (31). Thus, only those markers that could be placed unambiguously on the sequence were considered for the analysis. Table 2 shows the final list of markers that were used. The table illustrates the higher precision in marker order that can be achieved by using sequence information rather than published genetic maps, where the same genetic map position is often reported for many nearby markers. Genetic distances were obtained from the Marshfield Sex-Averaged Genetic Map. Physical distances were based on the April 2001 version of the NCBI human genome sequence (http://www.ncbi.nlm.nih.gov). Both regions show similar ratios of genetic to physical distance (0.0020 cM/kb for sample A versus 0.0022 cM/kb for sample B), suggesting similar local recombination rates (see Table 2).


Table 2. List of markers used in the analysis
 
Genotype data were ‘cleaned’ of genotyping errors with the program sib_clean of the ASPEX set of programs for multipoint linkage analysis (41; ftp://lahmed.stanford.edu/pub/aspex/index.html), using the recommended threshold of 1% error frequency. Another ASPEX program (sib_map) was used to generate sample-based sex-specific maps.

Haplotyping and calculation of pairwise LD
Haplotype sets were inferred from the pedigree data by the SIMWALK2 package (32; http://watson.hgen.pitt.edu/docs/simwalk2.html). SIMWALK2 estimates the most likely set of fully-typed maternal and paternal haplotypes for each individual in a pedigree. SIMWALK2 input files were generated by MEGA2 (42; http://watson.hgen.pitt.edu/docs/mega2_html/mega2.html). Multiple SIMWALK2 runs on the same data set can give different haplotype estimates, which prompted Moffatt et al. (6) and Abecasis et al. (14) to assess the reliability of SIMWALK2-generated sets of haplotypes: they found several runs with different random seeds to give similar results. As recommended by Abecasis et al. (14), we performed five SIMWALK2 runs with different random seeds for each sample. Since D'-values calculated from these haplotype sets (see below) differed little between runs (maximum variation of 22%), we used the first haplotype set for subsequent analysis.

Pairwise LD was calculated with the program GOLD (33; http://www.well.ox.ac.uk/asthma/GOLD/). GOLD uses founder haplotypes identified by SIMWALK2 to calculate the absolute value of the multiallelic version (43) of the standardized disequilibrium coefficient D' (44,45) and other measures of LD. The range of the absolute value of D' (i.e. |D'|) lies between 0 (no disequilibrium) and 1 (maximum disequilibrium, i.e. at least one theoretically possible haplotype constellation is not observed in the data). Compared with other measures of LD, D' offers the advantage that its range is independent of allele frequencies (45). Since the sign is not important in the analysis that we performed, for reasons of simplicity, we will hereinafter only use the term D' for |D'|.
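The D' calculation described above can be illustrated with a minimal sketch. It is not GOLD's code: it assumes phased founder haplotypes are already available as (allele at marker 1, allele at marker 2) tuples, and it summarizes multiallelic markers as the allele-frequency-weighted average of the per-allele-pair |D'ij|, a common multiallelic form; whether it matches GOLD's implementation in every detail is an assumption. The function names are hypothetical. For two biallelic markers it also computes Δ².

```python
from collections import Counter
from itertools import product


def allele_freqs(alleles):
    """Relative frequency of each allele (or haplotype) in a list."""
    n = len(alleles)
    return {a: c / n for a, c in Counter(alleles).items()}


def multiallelic_dprime(haplotypes):
    """Frequency-weighted average of |D'_ij| over all allele pairs (i, j).

    haplotypes: list of (allele_at_marker1, allele_at_marker2) tuples,
    one per founder chromosome (hypothetical input format).
    """
    hap = allele_freqs(haplotypes)
    p = allele_freqs([h[0] for h in haplotypes])
    q = allele_freqs([h[1] for h in haplotypes])
    total = 0.0
    for a, b in product(p, q):
        d = hap.get((a, b), 0.0) - p[a] * q[b]  # D_ij
        if d >= 0:
            d_max = min(p[a] * (1 - q[b]), (1 - p[a]) * q[b])
        else:
            d_max = min(p[a] * q[b], (1 - p[a]) * (1 - q[b]))
        if d_max > 0:  # skip monomorphic markers, where D' is undefined
            total += p[a] * q[b] * abs(d) / d_max
    return total


def delta_squared(haplotypes):
    """Delta^2 (squared correlation) for two biallelic markers."""
    p = allele_freqs([h[0] for h in haplotypes])
    q = allele_freqs([h[1] for h in haplotypes])
    if len(p) != 2 or len(q) != 2:
        raise ValueError("delta_squared expects two biallelic markers")
    a, b = sorted(p)[0], sorted(q)[0]
    d = allele_freqs(haplotypes).get((a, b), 0.0) - p[a] * q[b]
    return d * d / (p[a] * (1 - p[a]) * q[b] * (1 - q[b]))


perfect = [("A", "B")] * 5 + [("a", "b")] * 5
print(multiallelic_dprime(perfect), delta_squared(perfect))  # 1.0 1.0
```

Taking the absolute value inside the sum mirrors the use of |D'| in the text; it is also the source of the positive bias discussed under the modeling section.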

The calculation of LD by GOLD also incorporates a standard contingency-table χ² test for intermarker association. For microsatellite markers with rare alleles, the validity of the χ² statistic is diminished owing to table sparseness, so GOLD pools rare alleles at a 7% frequency threshold by default. Monte Carlo approximations are an alternative statistical tool to estimate reliable P-values (12) in this situation. However, neither method addresses the problem of P-value inflation due to multiple testing, for which it seems difficult to formulate an appropriate correction (12). Hence, in our analysis of the microsatellite data, we emphasize the use of D' as a measure of LD. The estimation of precise P-values is an important future challenge. Furthermore, the correlated genealogical histories between different marker pairs may also contribute to uncertainty about statistical independence between marker pairs.
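The pooling step and the contingency-table test can be sketched as follows. This is a hypothetical illustration: it assumes that all alleles below the 7% threshold are merged into a single class (GOLD's exact pooling rule may differ), and it returns only the Pearson χ² statistic and its degrees of freedom, without a P-value or any multiple-testing correction.

```python
from collections import Counter


def pool_rare(alleles, threshold=0.07, rare_label="<rare>"):
    """Replace alleles whose frequency is below `threshold` with one pooled class."""
    n = len(alleles)
    counts = Counter(alleles)
    return [rare_label if counts[a] / n < threshold else a for a in alleles]


def haplotype_chi_square(haplotypes, threshold=0.07):
    """Pearson chi-square for association between two markers' alleles.

    haplotypes: list of (allele_at_marker1, allele_at_marker2) tuples.
    Returns (statistic, degrees_of_freedom) after pooling rare alleles.
    """
    loc1 = pool_rare([h[0] for h in haplotypes], threshold)
    loc2 = pool_rare([h[1] for h in haplotypes], threshold)
    n = len(haplotypes)
    observed = Counter(zip(loc1, loc2))
    rows, cols = Counter(loc1), Counter(loc2)
    stat = 0.0
    for a in rows:
        for b in cols:
            expected = rows[a] * cols[b] / n
            stat += (observed[(a, b)] - expected) ** 2 / expected
    df = (len(rows) - 1) * (len(cols) - 1)
    return stat, df


haps = [("A", "B")] * 5 + [("a", "b")] * 5
print(haplotype_chi_square(haps))  # (10.0, 1)
```

Pooling reduces the number of sparse cells and hence the degrees of freedom; it does not address the dependence between marker pairs noted above.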

Inter-SNP LD for the respective susceptibility locus region on 18q22
Intermarker LD on 18q22 was also determined in a set of 27 SNPs spanning 5.4 Mb, genotyped in 93 unrelated founder individuals from the CEPH (Centre d'Etudes des Polymorphismes Humains; http://www.cephb.fr) pedigree collections (90 from the Utah, 2 from the French and 1 from the Venezuelan pedigrees). Minor allele frequencies for the SNP data ranged from 8% to 47% in this sample. SNP positions were determined by aligning flanking sequences on the 12 December 2000 build of the Golden Path (http://genome.cse.ucsc.edu/goldenPath/12dec200/database/) (Table 3). LD was calculated using the ldmax option in GOLD (33), which uses haplotype frequencies estimated by an expectation-maximization (EM) algorithm (46,47). For this biallelic marker data, GOLD calculates the LD measure Δ² (48) in addition to D'.
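For unrelated individuals, haplotype phase is ambiguous only for people heterozygous at both SNPs. The following is a minimal two-locus EM sketch in the spirit of the approach cited above (46,47); it is not GOLD's ldmax code. Genotypes are assumed to be coded as the number of copies (0, 1 or 2) of an arbitrary "allele 1" at each SNP, missing data are not handled, and both SNPs are assumed to be polymorphic.

```python
def em_two_snp(genotypes, max_iter=500, tol=1e-10):
    """Estimate haplotype frequencies [p11, p10, p01, p00] for two SNPs.

    genotypes: list of (g1, g2) pairs, each g = number of copies (0/1/2)
    of allele 1 at that SNP (hypothetical coding).
    """
    index = {(1, 1): 0, (1, 0): 1, (0, 1): 2, (0, 0): 3}
    known = [0.0] * 4   # counts of haplotypes whose phase is unambiguous
    n_double_het = 0    # individuals heterozygous at both SNPs
    for g1, g2 in genotypes:
        if g1 == 1 and g2 == 1:
            n_double_het += 1
            continue
        # With at most one heterozygous SNP, the two haplotypes are determined.
        alleles1 = [1 if k < g1 else 0 for k in range(2)]
        alleles2 = [1 if k < g2 else 0 for k in range(2)]
        for a, b in zip(alleles1, alleles2):
            known[index[(a, b)]] += 1
    n_chrom = 2 * len(genotypes)
    freqs = [0.25] * 4  # uniform starting values
    for _ in range(max_iter):
        p11, p10, p01, p00 = freqs
        denom = p11 * p00 + p10 * p01
        # E-step: expected share of double heterozygotes phased as 11/00
        share = p11 * p00 / denom if denom > 0 else 0.5
        counts = known[:]
        counts[0] += n_double_het * share
        counts[3] += n_double_het * share
        counts[1] += n_double_het * (1 - share)
        counts[2] += n_double_het * (1 - share)
        # M-step: new frequencies are expected counts / number of chromosomes
        new = [c / n_chrom for c in counts]
        done = max(abs(x - y) for x, y in zip(new, freqs)) < tol
        freqs = new
        if done:
            break
    return freqs


def ld_from_haplotype_freqs(freqs):
    """Return (|D'|, Delta^2) from two-SNP haplotype frequencies."""
    p11, p10, p01, p00 = freqs
    p1 = p11 + p10   # frequency of allele 1 at SNP 1
    q1 = p11 + p01   # frequency of allele 1 at SNP 2
    d = p11 - p1 * q1
    if d >= 0:
        d_max = min(p1 * (1 - q1), (1 - p1) * q1)
    else:
        d_max = min(p1 * q1, (1 - p1) * (1 - q1))
    return abs(d) / d_max, d * d / (p1 * (1 - p1) * q1 * (1 - q1))


genos = [(2, 2)] * 5 + [(0, 0)] * 5 + [(1, 1)] * 2
print(ld_from_haplotype_freqs(em_two_snp(genos)))  # close to (1.0, 1.0)
```

In this toy example, the unambiguous individuals carry only 11 and 00 haplotypes, so EM phases the two double heterozygotes as 11/00 and both LD measures approach 1.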


Table 3. SNPs typed in CEPH founder individuals on chromosome 18q22
 
Modeling the decay of LD for the microsatellite and SNP data
The decay of LD can be described as a function of time t (in generations) and the recombination fraction (i.e. genetic distance) {theta}, as in the classical equation


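$$D_t = D_0\,(1-\theta)^t \qquad (1)$$

where $D_0$ is the initial disequilibrium and $D_t$ the disequilibrium remaining after t generations.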

Equation 1 was originally applied to the decay of LD expressed as the LD parameter D between two loci (for a mathematical derivation, see 49). Equation 1 has been analogously applied to model the decay of LD based on the standardized disequilibrium coefficient D' (14). Abecasis et al. (14), Reich et al. (15) and Dunning et al. (20) have already addressed the question of what proportion of marker pairs at a given distance displays a given level of LD, but have not attempted to model this relationship formally. To the best of our knowledge, this is the first study to do so in order to derive predictions about fine-scale LD. Abecasis et al. (14) apply the classical decay model only to model the decay of individual values of D'. Our approach is not based on individual values of LD parameters. Instead, we dichotomize the individual values into groups above and below a threshold value. This allows us to directly model the proportions of marker pairs at a given distance that will display useful levels of LD. Moreover, the proportional approach mitigates the positive bias inherent to the use of the absolute values of D' (|D'|) (14) by counting all intervals with values less than 0.3 as zero, irrespective of the sign of D'. A formal explication of using proportions rather than individual values has not been attempted to our knowledge, but needs further study.

We used Equation 1 to model the decay in LD (expressed as the proportion of marker pairs showing useful levels of LD) over distance. There is no universal agreement as to what constitutes ‘useful’ levels of LD; however, a d² value (i.e. the squared difference between the observed and expected haplotype frequency) greater than 0.1 has been proposed as the minimum useful amount of LD for association mapping (9). For the case of a rare disease and randomly sampled haplotypes, Devlin and Risch (48) proposed a formula for transforming d² into a D'-value. On this basis, Moffatt et al. (6) and Abecasis et al. (14) suggested a corresponding D'-value as the minimum useful amount of LD in association studies; here we used D'>0.3. As Kruglyak (9) points out, different thresholds may be chosen because the underlying assumptions may differ (50), but this would not change our results, because they depend not on the absolute proportions but on relative proportions across distances.

The recombination fraction θ was calculated by multiplying the physical distance in kilobases by the respective local recombination rate in cM/kb (see Tables 2 and 3). The resulting genetic distance (in cM) was divided by 100 to obtain θ, using the approximation $x \approx \theta$ (x being the map distance in Morgans), which holds for small genetic distances (51).
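As a worked illustration of this conversion and of Equation 1 (with D0 = 1), the sketch below uses the regional average of 0.0020 cM/kb reported for 18q22 and the fitted t = 617 from the Results. These are simplifying assumptions: the analysis itself used local rates from Table 2, and the function names are hypothetical. With these inputs, a 15 kb spacing gives θ = 0.0003 and a projected proportion of about 0.83, in line with the figure of roughly 80% reported for this region.

```python
def theta_from_kb(distance_kb, cm_per_kb):
    """Physical distance -> recombination fraction, using x ≈ θ for small x."""
    genetic_cm = distance_kb * cm_per_kb   # map distance in cM
    return genetic_cm / 100.0              # cM -> Morgans, taken as θ


def projected_proportion(theta, t, d0=1.0):
    """Equation 1 applied to the proportion of marker pairs with useful LD."""
    return d0 * (1.0 - theta) ** t


# Sample A: assumed uniform 0.0020 cM/kb, t = 617 generations
for kb in (5, 10, 15, 50, 100, 200):
    theta = theta_from_kb(kb, 0.0020)
    print(f"{kb:>4} kb  theta={theta:.5f}  "
          f"projected={projected_proportion(theta, 617):.2f}")
```

Because the approximation x ≈ θ is only valid for small distances, the same projection should not be extended to multi-megabase spacings without a proper map function.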

To estimate the proportion of marker pairs showing useful levels of LD at various distances, we applied a sliding-window selection strategy similar to that of Abecasis et al. (14) and Daly et al. (21), among others. First, we sorted all marker pairs by the distance (kb) between the respective markers, from smallest to largest. The proportion of marker pairs with D'>0.3 was then calculated for overlapping groups of marker pairs. For the proportion to be meaningful, we had to set a minimum window size for the denominator. We chose 5 as the minimum window size and expanded the window up to a maximum of 10 as we moved down the ordered list of marker pairs. This allowed us to sample each marker pair a nearly equal number of times. The decay of LD for the SNP data was modeled in an analogous fashion (see also Table 3). Since D' may show more stochastic variation over short distances than Δ² (34), for the denser SNP map we also modeled the relationship between the proportion of marker pairs with Δ² greater than 0.1 and the recombination fraction θ. The rationale for considering 0.1 the minimum usable amount of LD when the parameter Δ² is applied is as follows. According to Kruglyak (9), $d^2 = \Delta^2\,\frac{m(1-m)}{f(1-f)}$, where f is the variant frequency and m is the marker allele frequency. For the present study, m and f are considered allele frequencies for a pair of markers, and m ≈ f; thus, d² ≈ Δ².
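The window procedure leaves some details open, so the sketch below implements one plausible reading and should not be taken as the authors' code. Under this reading, window k starts at the k-th shortest pair and contains 5 + k pairs, capped at 10, and windows with fewer than 5 remaining pairs are dropped. Summarizing each window by its mean distance is also an assumption. The function name is hypothetical. For the SNP analysis based on Δ², the same function would be called with threshold=0.1.

```python
def sliding_window_proportions(pairs, threshold=0.3, min_size=5, max_size=10):
    """Proportion of marker pairs with LD above `threshold` in overlapping windows.

    pairs: list of (distance_kb, ld_value) tuples, one per marker pair.
    Returns a list of (mean_distance_kb, proportion) tuples.
    Window k starts at the k-th shortest pair and holds min_size + k pairs,
    capped at max_size (one reading of the procedure, not the authors' code).
    """
    ordered = sorted(pairs, key=lambda pair: pair[0])
    results = []
    start = 0
    while True:
        size = min(min_size + start, max_size)
        window = ordered[start:start + size]
        if len(window) < min_size:   # too few pairs left for a stable denominator
            break
        mean_distance = sum(d for d, _ in window) / len(window)
        proportion = sum(1 for _, v in window if v > threshold) / len(window)
        results.append((mean_distance, proportion))
        start += 1
    return results


# Toy data: 12 pairs, the 6 closest in useful LD (D' = 0.5), the rest not (0.1)
toy = [(float(kb), 0.5 if kb <= 6 else 0.1) for kb in range(1, 13)]
for mean_kb, prop in sliding_window_proportions(toy):
    print(f"{mean_kb:6.1f} kb  {prop:.2f}")
```

The resulting (distance, proportion) points, with distances converted to θ, are what Equation 1 is fitted to.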

In each case, Equation 1 was fitted to the observed data using the ‘Genfit’ option in XLStat 4.3 (http://www.XLstat.com/), which produced an estimate of t that minimized the sum of squared residuals, and a value for the coefficient of determination, R², a measure of how well the model accounts for the observations. We assumed $D_0 = 1$.
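XLStat's Genfit is a commercial tool. As an open stand-in, the sketch below fits t in Equation 1 (with D0 = 1) by minimizing the sum of squared residuals with a golden-section search, then computes R² = 1 − SSE/SST. The golden-section search is our substitution, not the original procedure. It assumes the SSE is unimodal in t over the search interval, and the interval bounds are arbitrary choices. The example data are synthetic and noise-free, generated from t = 617, so the fit should recover that value. They are not the study's observations.

```python
import math


def model(theta, t, d0=1.0):
    """Equation 1: expected proportion of pairs with useful LD."""
    return d0 * (1.0 - theta) ** t


def sse(t, thetas, props, d0=1.0):
    """Sum of squared residuals of Equation 1 for a given t."""
    return sum((y - model(th, t, d0)) ** 2 for th, y in zip(thetas, props))


def fit_t(thetas, props, lo=0.0, hi=10000.0, tol=1e-6, d0=1.0):
    """Golden-section search for the t that minimizes the SSE."""
    g = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    while b - a > tol:
        c = b - g * (b - a)
        d = a + g * (b - a)
        if sse(c, thetas, props, d0) < sse(d, thetas, props, d0):
            b = d   # minimum lies in [a, d]
        else:
            a = c   # minimum lies in [c, b]
    return (a + b) / 2


def r_squared(thetas, props, t, d0=1.0):
    """Coefficient of determination, R^2 = 1 - SSE/SST."""
    mean = sum(props) / len(props)
    sst = sum((y - mean) ** 2 for y in props)
    return 1.0 - sse(t, thetas, props, d0) / sst


thetas = [0.0012, 0.002, 0.005, 0.01, 0.02, 0.05]
props = [model(th, 617) for th in thetas]  # synthetic, noise-free
t_hat = fit_t(thetas, props)
print(round(t_hat, 2), round(r_squared(thetas, props, t_hat), 4))
```

With real window proportions, the SSE is no longer zero at the optimum and R² falls below 1. The reported values (e.g. R² = 0.63 for the microsatellite data) measure exactly this shortfall.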


    ACKNOWLEDGEMENTS
 
The authors thank the family volunteers for their time and energy. We gratefully acknowledge critical input from Nancy J. Cox.

For the University of Chicago and Johns Hopkins University, we thank Sylvia G. Simpson and Dean A. MacKinnon for contributing family material, Kemba Kelly and Gobind Singh for technical assistance, and Drs Lon Cardon, Sarah Shaw, and Robin Sherrington, who contributed some of the genotypes. This work was supported by grants from the National Institutes of Health, The Edward F. Mallinckrodt, Jr Foundation and The Chicago Brain Research Foundation. Family collection was also underwritten by The Charles A. Dana Foundation, with additional support from The Ted and Veda Stanley Foundation.

For the University of Bonn, we thank Margot Albus, Margitta Borrmann-Hassenbach, Ernst Franzek, Jürgen Fritze, Roland Kreiner, Mario Lanczik, Dirk Lichtermann, Jürgen Minges, Ulrike Reuner and Bettina Weigelt for contributing family material. We thank Susanne Hemmer, Martina Hürter, Daniel J. Müller and Gabriele Schmidt-Wolf for contributing genotype data. This work was supported by grants from the Deutsche Forschungsgemeinschaft.


    FOOTNOTES
 
* To whom correspondence should be addressed at: The University of Chicago, Department of Psychiatry, Jules F. Knapp Research Center, 924 East 57th Street, Room R004, Chicago, IL 60637, USA. Tel: +1 773 834 8920; Fax: +1 773 834 2970; Email: schulze@uchicago.edu


    REFERENCES
1 Long, A.D. and Langley, C.H. (1999) The power of association studies to detect the contribution of candidate genetic loci to variation in complex traits. Genome Res., 9, 720–731.

2 Goddard, K.A., Hopkins, P.J., Hall, J.M. and Witte, J.S. (2000) Linkage disequilibrium and allele-frequency distributions for 114 single-nucleotide polymorphisms in five populations. Am. J. Hum. Genet., 66, 216–234.

3 Kidd, J.R., Pakstis, A.J., Zhao, H., Lu, R.B., Okonofua, F.E., Odunsi, A., Grigorenko, E., Tamir, B.B., Friedlaender, J., Schulz, L.O. et al. (2000) Haplotypes and linkage disequilibrium at the phenylalanine hydroxylase locus, PAH, in a global representation of populations. Am. J. Hum. Genet., 66, 1882–1899.

4 Clark, A.G., Weiss, K.M., Nickerson, D.A., Taylor, S.L., Buchanan, A., Stengard, J., Salomaa, V., Vartiainen, E., Perola, M., Boerwinkle, E. and Sing, C.F. (1998) Haplotype structure and population genetic inferences from nucleotide-sequence variation in human lipoprotein lipase. Am. J. Hum. Genet., 63, 595–612.

5 Rieder, M.J., Taylor, S.L., Clark, A.G. and Nickerson, D.A. (1999) Sequence variation in the human angiotensin converting enzyme. Nat. Genet., 22, 59–62.

6 Moffatt, M.F., Traherne, J.A., Abecasis, G.R. and Cookson, W.O. (2000) Single nucleotide polymorphism and linkage disequilibrium within the TCR alpha/delta locus. Hum. Mol. Genet., 9, 1011–1019.

7 Templeton, A.R., Clark, A.G., Weiss, K.M., Nickerson, D.A., Boerwinkle, E. and Sing, C.F. (2000) Recombinational and mutational hotspots within the human lipoprotein lipase gene. Am. J. Hum. Genet., 66, 69–83.

8 Watkins, W.S., Zenger, R., O'Brien, E., Nyman, D., Eriksson, A.W., Renlund, M. and Jorde, L.B. (1994) Linkage disequilibrium patterns vary with chromosomal location: a case study from the von Willebrand factor region. Am. J. Hum. Genet., 55, 348–355.

9 Kruglyak, L. (1999) Prospects for whole-genome linkage disequilibrium mapping of common disease genes. Nat. Genet., 22, 139–144.

10 Jorde, L.B., Watkins, W.S., Carlson, M., Groden, J., Albertsen, H., Thliveris, A. and Leppert, M. (1994) Linkage disequilibrium predicts physical distance in the adenomatous polyposis coli region. Am. J. Hum. Genet., 54, 884–898.

11 Collins, A., Lonjou, C. and Morton, N.E. (1999) Genetic epidemiology of single-nucleotide polymorphisms. Proc. Natl Acad. Sci. USA, 96, 15173–15177.

12 Huttley, G.A., Smith, M.W., Carrington, M. and O'Brien, S.J. (1999) A scan for linkage disequilibrium across the human genome. Genetics, 152, 1711–1722.

13 Taillon-Miller, P., Bauer-Sardina, I., Saccone, N.L., Putzel, J., Laitinen, T., Cao, A., Kere, J., Pilia, G., Rice, J.P. and Kwok, P.Y. (2000) Juxtaposed regions of extensive and minimal linkage disequilibrium in human Xq25 and Xq28. Nat. Genet., 25, 324–328.

14 Abecasis, G.R., Noguchi, E., Heinzmann, A., Traherne, J.A., Bhattacharyya, S., Leaves, N.I., Anderson, G.G., Zhang, Y., Lench, N.J., Carey, A. et al. (2001) Extent and distribution of linkage disequilibrium in three genomic regions. Am. J. Hum. Genet., 68, 191–197.

15 Reich, D.E., Cargill, M., Bolk, S., Ireland, J., Sabeti, P.C., Richter, D.J., Lavery, T., Kouyoumjian, R., Farhadian, S.F., Ward, R. and Lander, E.S. (2001) Linkage disequilibrium in the human genome. Nature, 411, 199–204.

16 Peterson, A.C., Di Rienzo, A., Lehesjoki, A.E., de la Chapelle, A., Slatkin, M. and Freimer, N.B. (1995) The distribution of linkage disequilibrium over anonymous genome regions. Hum. Mol. Genet., 4, 887–894.

17 Laan, M. and Pääbo, S. (1997) Demographic history and linkage disequilibrium in human populations. Nat. Genet., 17, 435–438.

18 Gordon, D., Simonic, I. and Ott, J. (2000) Significant evidence for linkage disequilibrium over a 5-cM region among Afrikaners. Genomics, 66, 87–92.

19 Service, S.K., Ophoff, R.A. and Freimer, N.B. (2001) The genome-wide distribution of background linkage disequilibrium in a population isolate. Hum. Mol. Genet., 10, 545–551.

20 Dunning, A.M., Durocher, F., Healey, C.S., Teare, M.D., McBride, S.E., Carlomagno, F., Xu, C.F., Dawson, E., Rhodes, S., Ueda, S. et al. (2000) The extent of linkage disequilibrium in four populations with distinct demographic histories. Am. J. Hum. Genet., 67, 1544–1554.

21 Daly, M.J., Rioux, J.D., Schaffner, S.F., Hudson, T.J. and Lander, E.S. (2001) High-resolution haplotype structure in the human genome. Nat. Genet., 29, 229–232.

22 Patil, N., Berno, A.J., Hinds, D.A., Barrett, W.A., Doshi, J.M., Hacker, C.R., Kautzer, C.R., Lee, D.H., Marjoribanks, C., McDonough, D.P. et al. (2001) Blocks of limited haplotype diversity revealed by high-resolution scanning of human chromosome 21. Science, 294, 1719–1723.

23 Przeworski, M. and Wall, J.D. (2001) Why is there so little intragenic linkage disequilibrium in humans? Genet. Res., 77, 143–151.

24 Frisse, L., Hudson, R.R., Bartoszewicz, A., Wall, J.D., Donfack, J. and Di Rienzo, A. (2001) Gene conversion and different population histories may explain the contrast between polymorphism and linkage disequilibrium levels. Am. J. Hum. Genet., 69, 831–843.

25 Stine, O.C., Xu, J., Koskela, R., McMahon, F.J., Gschwend, M., Friddle, C., Clark, C.D., McInnis, M.G., Simpson, S.G., Breschel, T.S. et al. (1995) Evidence for linkage of bipolar disorder to chromosome 18 with a parent-of-origin effect. Am. J. Hum. Genet., 57, 1384–1394.

26 McMahon, F.J., Hopkins, P.J., Xu, J., McInnis, M.G., Shaw, S., Cardon, L., Simpson, S.G., MacKinnon, D.F., Stine, O.C., Sherrington, R. et al. (1997) Linkage of bipolar affective disorder to chromosome 18 markers in a new pedigree series. Am. J. Hum. Genet., 61, 1397–1404.

27 Friddle, C., Koskela, R., Ranade, K., Hebert, J., Cargill, M., Clark, C.D., McInnis, M.G., Simpson, S., McMahon, F.J., Stine, O.C. et al. (2000) Full-genome scan for linkage in 50 families segregating the bipolar affective disease phenotype. Am. J. Hum. Genet., 66, 205–215.

28 McMahon, F.J., Simpson, S.G., McInnis, M.G., Badner, J.A., MacKinnon, D.F. and DePaulo, J.R. (2001) Linkage of bipolar disorder to chromosome 18q and the validity of bipolar II disorder. Arch. Gen. Psychiatry, 58, 1025–1031.

29 Cichon, S., Schmidt-Wolf, G., Schumacher, J., Müller, D.J., Hürter, M., Schulze, T.G., Albus, M., Borrmann-Hassenbach, M., Franzek, E., Lanczik, M. et al. (2001) A possible susceptibility locus for bipolar affective disorder in chromosomal region 10q25–q26. Mol. Psychiatry, 6, 342–349.

30 Cichon, S., Schumacher, J., Müller, D.J., Hürter, M., Windemuth, C., Strauch, K., Hemmer, S., Schulze, T.G., Schmidt-Wolf, G., Albus, M. et al. (2001) A genome screen for genes predisposing to bipolar affective disorder detects a new susceptibility locus on 8q and suggests evidence for an involvement of imprinted loci. Hum. Mol. Genet., 10, 2933–2944.

31 Lander, E.S., Linton, L.M., Birren, B., Nusbaum, C., Zody, M.C., Baldwin, J., Devon, K., Dewar, K., Doyle, M., FitzHugh, W. et al. (2001) Initial sequencing and analysis of the human genome. Nature, 409, 860–921.

32 Sobel, E. and Lange, K. (1996) Descent graphs in pedigree analysis: applications to haplotyping, location scores, and marker-sharing statistics. Am. J. Hum. Genet., 58, 1323–1337.

33 Abecasis, G.R. and Cookson, W.O. (2000) GOLD – graphical overview of linkage disequilibrium. Bioinformatics, 16, 182–183.

34 Pritchard, J.K. and Przeworski, M. (2001) Linkage disequilibrium in humans: models and data. Am. J. Hum. Genet., 69, 1–14.

35 Chen, Y.-S., Akula, N., Schulze, T.G., Potluri, S. and McMahon, F.J. (2001) SNP density requirements in mapping genes for complex diseases. Am. J. Hum. Genet., 69, A1988.

36 Hogenesch, J.B., Ching, K.A., Batalov, S., Su, A.I., Walker, J.R., Zhou, Y., Kay, S.A., Schultz, P.G. and Cooke, M.P. (2001) A comparison of the Celera and Ensembl predicted gene sets reveals little overlap in novel genes. Cell, 106, 413–415.

37 Fallin, D., Cohen, A., Essioux, L., Chumakov, I., Blumenfeld, M., Cohen, D. and Schork, N.J. (2001) Genetic analysis of case/control data using estimated haplotype frequencies: application to APOE locus variation and Alzheimer's disease. Genome Res., 11, 143–151.

38 Tishkoff, S.A., Pakstis, A.J., Ruano, G. and Kidd, K.K. (2000) The accuracy of statistical methods for estimation of haplotype frequencies: an example from the CD4 locus. Am. J. Hum. Genet., 67, 518–522.

39 Stephens, M., Smith, N.J. and Donnelly, P. (2001) A new statistical method for haplotype reconstruction from population data. Am. J. Hum. Genet., 68, 978–989.

40 Simpson, S.G., Folstein, S.E., Meyers, D.A., McMahon, F.J., Brusco, D.M. and DePaulo, J.R., Jr. (1993) Bipolar II: the most common bipolar phenotype? Am. J. Psychiatry, 150, 901–903.

41 Hauser, E.R., Boehnke, M., Guo, S.W. and Risch, N. (1996) Affected-sib-pair interval mapping and exclusion for complex genetic traits: sampling considerations. Genet. Epidemiol., 13, 117–137.

42 Mukhopadhyay, N., Almasy, L., Schroeder, M., Mulvihill, W.P. and Weeks, D.E. (1999) Mega2, a data-handling program for facilitating genetic linkage and association analyses. Am. J. Hum. Genet., 65, A43.

43 Hedrick, P.W. (1987) Gametic disequilibrium measures: proceed with caution. Genetics, 117, 331–341.

44 Lewontin, R.C. (1964) The interaction of selection and linkage. I. General considerations; heterotic models. Genetics, 49, 49–67.

45 Lewontin, R.C. (1988) On measures of gametic disequilibrium. Genetics, 120, 849–852.

46 Dempster, A.P., Laird, N.M. and Rubin, D.B. (1977) Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc., Ser. B, 39, 1–22.

47 Excoffier, L. and Slatkin, M. (1995) Maximum-likelihood estimation of molecular haplotype frequencies in a diploid population. Mol. Biol. Evol., 12, 921–927.

48 Devlin, B. and Risch, N. (1995) A comparison of linkage disequilibrium measures for fine-scale mapping. Genomics, 29, 311–322.

49 Hartl, D.L. and Clark, A.G. (1997) Principles of Population Genetics. Sinauer Associates, Sunderland, MA.

50 McCarthy, J.J. and Hilfiker, R. (2000) The use of single-nucleotide polymorphism maps in pharmacogenomics. Nat. Biotechnol., 18, 505–508.

51 Ott, J. (1991) Analysis of Human Genetic Linkage. The Johns Hopkins University Press, Baltimore, MD.



