Human Molecular Genetics Pages 1759-1767

Network analysis of human Y microsatellite haplotypes

Gillian Cooper^+,*, William Amos^+, Dorota Hoffman^1,2 and David C. Rubinsztein^1

University of Cambridge, Department of Genetics, Downing Street, Cambridge CB2 3EH, UK; ^1East Anglian Regional Genetics Service Molecular Genetics Laboratory and Department of Clinical Genetics, Box 158, Addenbrooke's NHS Trust, Hills Road, Cambridge CB2 2QQ, UK; and ^2Institute of Psychiatry and Neurology, Warsaw, Poland

Received June 5, 1996; Revised and Accepted August 8, 1996
DDBJ/EMBL/GenBank accession no. X97312

To investigate the utility of Y chromosome microsatellites for studying human male-lineage evolution, we typed samples from three populations for five tetranucleotide repeats and an Alu insertion polymorphism. We found very high levels of haplotype diversity and evidence that most mutations involve the gain or loss of a single repeat unit, implying that any given microsatellite haplotype may have arisen independently on two or more Y-chromosome lineages. Together, these factors suggest that small samples (<30) will be difficult to interpret. By typing a large sample (n = 174) from one population, East Anglia, we were able to construct a haplotype network, which exhibits a well-connected core of commoner haplotypes. Computer simulations based on this network suggest that the convergence time for African and Caucasian groups may be between 1.4 and 1.8 times as long as that of the East Anglian population. Based on our comparison of large and small samples, we suggest that large sample sizes are needed to interpret Y-microsatellite haplotypes, and that a network analysis of the type we describe may prove informative in future studies.

INTRODUCTION

Human mitochondrial DNA haplotypes have been used in the study of human evolution, both for the construction of phylogenetic trees (1) and for reconstructing human migration patterns (2,3). However, strict maternal transmission means that mitochondrial DNA analysis is informative only for female lineages. Male lineages are likely to show different patterns of evolution because of the potentially greater variance in male reproductive success (4), hitchhiking associated with selective sweeps (5) and, most importantly, sex differences in patterns of migration and population mixing (3,6).

The non-recombining region of the Y chromosome is male-specific and effectively haploid, and thus could provide a view of evolution complementary to that provided by mitochondrial DNA. Unfortunately, the human Y chromosome has been shown to carry extremely low levels of polymorphism (6-11), and analyses of single nucleotide variants have led to conflicting estimates of coalescence time, ranging from 37 000-49 000 years (6) to 188 000 years (12). However, significant variability has been found at certain short tandem repeat loci, including a minisatellite [DYF155S1 (13)] and several tetranucleotide repeat microsatellites (14,15). Haplotypes based on several microsatellite loci could therefore prove highly informative.

To investigate the informativeness of Y microsatellites in relation to human population structure, we constructed haplotypes based on five tetranucleotide repeat microsatellites and a Y chromosome Alu insertion polymorphism for a total of 212 males from three populations (174 East Anglians, 23 Nigerians and 15 Sardinians). Our results demonstrate that haplotypes can and do evolve recurrently. The pattern we find is consistent with a single-step mutation model. Haplotype diversity is only slightly lower within populations than between them. Fitting our data onto a nearest-neighbour network reveals two sections which are linked through haplotypes with relatively short mean allele lengths, consistent with a general tendency towards expansion. We estimate that the African-Caucasian split is less than twice as old as the common ancestor of all East Anglians.

RESULTS

Analysis of Y microsatellites

We investigated four (GATA)n microsatellites: DYS19 (16,17), DYS389 (18), DYS390 (19) and DYS391 (20). As reported elsewhere, the published primers for locus DYS389 amplify two products in each male (21), both of which are polymorphic. We refer to the larger product (~375 bp) as DYS389A and the smaller product (~246 bp) as DYS389B. Allele numbers at all loci were standardised so that the smallest allele is numbered 1, allele 2 is one repeat unit longer, and so on. Four alleles each were detected at DYS390 and DYS391, five each at DYS19 and DYS389B, and seven at DYS389A. Reference allele sizes for each locus are listed in Materials and Methods.

Allele lengths at DYS389A and DYS389B appeared correlated (Fig. 1a). For example, of 22 East Anglians, 14 carried alleles where the rank of A was one less than the rank of B (i.e. A/B allele pairs 1/2, 2/3 or 3/4). To explore this further, PCR products from DYS389 were cloned and sequenced from an East Anglian male. The resulting sequence (EMBL X97312) (Fig. 1b) shows that the priming site of the DYS389 forward primer (CHCL.GATA30F10.P14912.forward) is duplicated beyond the 5' end of the previously published sequence (EMBL G09600). The larger (DYS389A) product thus includes two GATA repeat arrays, whilst the smaller (DYS389B) product includes just one. To reflect this, every DYS389A allele was corrected by subtracting the length of the corresponding DYS389B allele. This adjustment reduced the number of alleles at DYS389A from seven to six.
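The DYS389A correction can be sketched in a few lines. Because allele numbers are offsets from the smallest allele at each locus, subtracting the DYS389B number from the DYS389A number gives the corrected length up to a constant, and renumbering so that the smallest corrected allele is 1 removes that constant. The helper name `correct_dys389a` and the example pairs are hypothetical, chosen only for illustration.

```python
def correct_dys389a(pairs):
    """Hypothetical helper: subtract each DYS389B allele number from the
    paired DYS389A allele number, then renumber so the smallest corrected
    allele is 1 (the paper's allele-numbering convention)."""
    diffs = [a - b for a, b in pairs]
    offset = min(diffs) - 1
    return [d - offset for d in diffs]

# Hypothetical (A, B) allele-number pairs for five men.
pairs = [(1, 2), (2, 3), (3, 4), (4, 4), (2, 2)]
print(correct_dys389a(pairs))  # [1, 1, 1, 2, 2]
```

Note how the correlated pairs 1/2, 2/3 and 3/4 collapse to a single corrected allele, which is the behaviour expected if the A product simply carries the B repeat array plus a second one.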


Figure 1. (a) PCR amplification at locus DYS389 in nine East Anglians, showing an apparent size correlation between the larger products (DYS389A) and the smaller ones (DYS389B). A sequencing reaction was run as a size marker alongside the PCR reactions. (b) Short (257 bp) and long (371 bp) PCR products amplified from DYS389 with primers CHCL.GATA30F10.P14912 forward and reverse. Primer sites are underlined, the GATA repeat is in bold, and an AGAC repeat is italicised. (c) Schematic representation of the DYS389 locus and the PCR products amplified from it.

In a preliminary exploration, DNA from 60 males was typed, yielding a total of 46 different haplotypes (Table 1): 18 different haplotypes among 22 East Anglians, 21 among 23 Nigerians and 14 among 15 Sardinians. One haplotype was common to all three population samples, two others were shared by East Anglians and Nigerians, and three by East Anglians and Sardinians. Such extreme haplotype diversity is interesting, but implies that meaningful interpretation will be difficult because a large number, possibly the majority, of haplotypes remain unsampled.
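As a check on these counts, the total of 46 distinct haplotypes follows by inclusion-exclusion from the per-population totals: the haplotype found in all three samples is counted three times (so two copies are removed), and each haplotype shared by two samples is counted twice (so one copy is removed).

```latex
\underbrace{18 + 21 + 14}_{\text{per-population totals}}
\;-\; \underbrace{2 \times 1}_{\text{shared by all three}}
\;-\; \underbrace{2}_{\text{EA/Nigerian}}
\;-\; \underbrace{3}_{\text{EA/Sardinian}}
\;=\; 46
```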


Table 1. Haplotypes detected. Alleles for microsatellite loci are listed in the order DYS391, DYS19, DYS390, DYS389A, DYS389B. Haplotypes are listed in order of frequency in the East Anglian sample, followed by the Nigerian and Sardinian samples. The number of individuals with each haplotype is shown, with the corresponding frequency in italics in brackets. Dots indicate presence of the haplotype in the initial sample of 22 East Anglians, the number of dots corresponding to the number of times the haplotype was found in this sample.


Larger-scale study of microsatellite haplotypes

To obtain a more complete picture of haplotype diversity, we greatly enlarged our sample from the East Anglian population, typing a further 152 individuals to give a total sample size of 174. Allele frequency distributions for each of the five microsatellite loci in each of the three populations are shown in Figure 2. We compared the East Anglian allele frequencies first with the Sardinian and then with the Nigerian sample, using a Model II Monte-Carlo approximation to a Fisher exact test. Combining probabilities over all five loci using the method of Sokal and Rohlf, p. 795 (22), we find significant differences in both cases (East Anglians versus Sardinians: χ² = 26.1, 10 d.f., p < 0.01; East Anglians versus Nigerians: χ² = 60.8, 10 d.f., p ≪ 0.001). To determine which locus-population comparisons contributed most to this highly significant heterogeneity, all component comparisons were examined after applying the Dunn-Šidák correction for multiple tests (23). Treating each locus independently, significant differences in frequency were found only between the East Anglian and Nigerian populations: at DYS19 (p < 0.001), DYS390 (p = 0.002) and DYS389A (p < 0.001).
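The 10 degrees of freedom reported for five loci are exactly what Fisher's method of combining probabilities gives (statistic -2 Σ ln p on 2k d.f.), so the sketch below assumes that is the procedure meant. For even degrees of freedom the chi-square upper tail has a closed form, which avoids any external library. The Dunn-Šidák threshold is computed for an assumed k = 10 component comparisons (five loci by two population pairs); the paper does not state k explicitly. All function names are hypothetical.

```python
import math

def fisher_combined(p_values):
    """Fisher's method: X2 = -2 * sum(ln p), referred to chi-square on 2k d.f."""
    return -2.0 * sum(math.log(p) for p in p_values), 2 * len(p_values)

def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; this closed form needs even df:
    P = exp(-x/2) * sum_{j=0}^{df/2 - 1} (x/2)^j / j!"""
    if df % 2:
        raise ValueError("closed form requires even degrees of freedom")
    half = x / 2.0
    term, total = 1.0, 1.0
    for j in range(1, df // 2):
        term *= half / j
        total += term
    return math.exp(-half) * total

def sidak_alpha(alpha, k):
    """Dunn-Sidak per-comparison significance level for k comparisons."""
    return 1.0 - (1.0 - alpha) ** (1.0 / k)

print(chi2_sf_even_df(26.1, 10))  # East Anglians vs Sardinians
print(chi2_sf_even_df(60.8, 10))  # East Anglians vs Nigerians
print(sidak_alpha(0.05, 10))      # assumed k = 10 comparisons
```

Applied to the reported statistics, the first tail probability falls below 0.01 and the second far below 0.001, in line with the values given in the text.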


Figure 2. Allele frequencies in the three population samples. Alleles are numbered in order of increasing size, such that allele 1 is the smallest detected at each locus; DYS389A alleles are corrected by subtracting the size of the DYS389B allele. Key: dotted, East Anglians; dark, Nigerians; striped, Sardinians.

Locus DYS19 has been typed in a number of different populations (4,24-29). Allowing for sample size effects, our data for all three population samples agree closely with these studies in both allele size range and modal allele length.

Models of microsatellite mutation

Limited empirical data combined with extensive theoretical studies suggest that autosomal microsatellites conform to a stepwise mutation model, with most mutations involving the gain or loss of a single repeat unit (30-33). In our data set, all 15 allele frequency distributions (five loci in three populations) were unimodal and none contained any 'gaps' (unoccupied allele size classes within the observed size range). Intuitively, such a pattern supports the idea that mutations causing the gain or loss of more than one repeat unit are rare.

To investigate this further, we used stochastic computer simulations to examine the influence of allelic diversity and mutation step size on the probability that an allele frequency distribution contains a gap (for full details, see Materials and Methods). For each set of parameter values tested, we used the results from 1000 simulation replicates to calculate the probability of obtaining our empirical data, in terms of both the number of alleles in each sample and the absence of gaps. From these probabilities we constructed a likelihood surface (Fig. 3). The best fit to our data occurs where all mutations are one-step and Neμ = 1.65, while the 95% confidence limits exclude all cases in which two-step changes exceed 20% of mutations. Although step sizes larger than two were not modelled, it seems intuitively clear that such mutations must be rare. Interestingly, published germ-line autosomal mutations show a two-step frequency of 9% [22 mutations comprising 20 one-step and 2 two-step (33)], a value which falls well within the range we predict for our five Y microsatellites. Estimates of the effective size of human populations range between 2500 and 10 000 [see (32) and citations therein]. Substituting these values into our best-fit estimate of Neμ yields mutation rate estimates of between 1.65 × 10⁻⁴ and 6.6 × 10⁻⁴ per locus per generation.
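The mutation-rate range follows directly from dividing the best-fit value of Neμ by each bound on the effective population size:

```latex
\mu = \frac{N_e\mu}{N_e}:\qquad
\frac{1.65}{10\,000} = 1.65\times10^{-4},\qquad
\frac{1.65}{2500} = 6.6\times10^{-4}
```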


Figure 3. Likelihood surface for the relationship between one-step and two-step mutation frequency and allelic diversity. Allelic diversity was governed by the value of Nμ, where N is effective population size and μ is the average mutation rate per locus per generation. Boxed numbers indicate likelihood contours.

Under a stepwise mutation model (34,35), a value of 1.65 for Neμ corresponds to a mean heterozygosity of 0.64, from the equation $H = 1 - 1/\sqrt{1 + 4N_e\mu}$. The haploid 'heterozygosity' values for our data (Table 2) are consistent with this, supporting both the inference of single-step mutation and our estimate of Neμ.
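The expected heterozygosity can be checked numerically. This is a minimal sketch that simply evaluates the stepwise-mutation-model expression quoted above for a haploid locus; the function name is hypothetical.

```python
import math

def smm_heterozygosity(ne_mu):
    """Expected haploid 'heterozygosity' under the stepwise mutation model,
    H = 1 - 1/sqrt(1 + 4*Ne*mu), as given in the text."""
    return 1.0 - 1.0 / math.sqrt(1.0 + 4.0 * ne_mu)

print(round(smm_heterozygosity(1.65), 2))  # 0.64
```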

It should be noted that our simulation model inevitably uses assumptions, for example constant population size and mutation-drift equilibrium, which may not accurately reflect evolutionary history. Our results therefore cannot be regarded as definitive. Instead, we present them as points of comparison with other estimates derived from models based on different data and with their own assumptions.


Table 2. Exclusion probabilities (theoretical 'heterozygosities') for each locus, calculated as $1 - \sum_i p_i^2$, where $p_i$ is the frequency of allele $i$.
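The Table 2 statistic is straightforward to compute from allele counts at one locus. A minimal sketch; the function name and the example counts are hypothetical and are not taken from Table 2.

```python
def exclusion_probability(allele_counts):
    """1 - sum(p_i^2), with each p_i estimated as count / total at one locus."""
    n = sum(allele_counts)
    return 1.0 - sum((c / n) ** 2 for c in allele_counts)

# Hypothetical counts for a five-allele locus in a sample of 174 men.
print(exclusion_probability([5, 40, 80, 40, 9]))
```

A monomorphic locus gives 0, and k equally frequent alleles give 1 - 1/k, so the value rises with both the number and the evenness of alleles.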


Homoplasy and divergent evolution of Y chromosome haplotypes

In any system which approximates the stepwise mutation model, mutations which convert one pre-existing allelic state into another will be common. Haplotypes based on multiple loci reduce the extent of this homoplasy, but it remains possible that any given haplotype has evolved more than once. To assess the prevalence of recurrent evolution of Y-microsatellite haplotypes, we also typed all samples for a diallelic Alu insertion polymorphism at DYS287 (YAP). This insertion is believed to have occurred once, probably in a sub-Saharan African before the divergence of the major African groups (36). The frequency of the Alu insertion in our samples was consistent with published data (our values given first): 3.4% among East Anglians versus 0% for the U.K. [after careful selection for parental origin (37)]; 6.7% versus 7.3% (28) among Sardinians; and 73.9% versus 100% [sample size 8 (36)] among Nigerians.

All 97 different microsatellite haplotypes detected are shown in Table 1. Adding the YAP data revealed three further haplotypes, giving a grand total of 100. Given that the Alu insertion was a unique event (36), the three cases in which a particular microsatellite haplotype is found both with and without the Alu insertion clearly demonstrate recurrent haplotype evolution. Based on complete haplotypes (microsatellite + YAP), we found few instances of haplotypes shared between population samples: one haplotype was common to all three populations, one was shared between East Anglians and Nigerians, and four were shared between East Anglians and Sardinians.

Initially, we attempted to place the observed haplotypes on a phylogenetic tree by maximum parsimony [PAUP 3.0 (38)]. However, this approach proved uninformative. Each analysis detected a large number of equally parsimonious trees, whose consensus was comb-like and effectively lacked internal structure (data not shown). There was no apparent tendency for individuals from the same population to cluster together.

A network of 'adjacent' haplotypes

Given the failure of parsimony methods to produce a meaningful tree, we examined a different approach, based on the concept of parsimonious networks. A similar approach has been used to display mitochondrial sequence data (39). A network of adjacent microsatellite haplotypes (i.e. those differing by one repeat unit at a single locus) was constructed, with lines joining all possible 'adjacent' pairs. For ease of construction, haplotypes with the same 'total length' (allele numbers summed over all five microsatellite loci) were placed on the same horizontal level. The resulting single-step network is given in Figure 4. Of the 97 microsatellite haplotypes, 77 (79%) could be placed on the network and, of those which could not, all but one could be added if a single two-step change were allowed in each case.
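The placement rule can be sketched as a graph search: two haplotypes are 'adjacent' when their repeat counts differ by exactly one unit at exactly one locus, and starting from the commonest haplotype and repeatedly adding adjacent ones places exactly the connected component containing it. This is a minimal sketch under that reading of the Figure 4 procedure; the visual layout (minimising line crossovers) is not modelled, the Alu state is ignored, and the function names and example haplotypes are hypothetical.

```python
from collections import Counter, deque

def is_adjacent(h1, h2):
    """True if the haplotypes differ by one repeat unit at exactly one locus."""
    return sum(abs(a - b) for a, b in zip(h1, h2)) == 1

def build_network(haplotypes):
    """Map each distinct haplotype to the set of its adjacent haplotypes."""
    distinct = sorted(set(haplotypes))
    adj = {h: set() for h in distinct}
    for i, h1 in enumerate(distinct):
        for h2 in distinct[i + 1:]:
            if is_adjacent(h1, h2):
                adj[h1].add(h2)
                adj[h2].add(h1)
    return adj

def place_from_commonest(haplotypes):
    """Start from the commonest haplotype and add adjacent ones breadth-first.
    Returns (placed, unplaced) sets of distinct haplotypes."""
    adj = build_network(haplotypes)
    start = Counter(haplotypes).most_common(1)[0][0]
    placed, queue = {start}, deque([start])
    while queue:
        for nb in adj[queue.popleft()]:
            if nb not in placed:
                placed.add(nb)
                queue.append(nb)
    return placed, set(adj) - placed

def total_length(h):
    """Allele numbers summed over loci: the horizontal level in the network."""
    return sum(h)

# Hypothetical haplotypes in the order DYS391, DYS19, DYS390, DYS389A, DYS389B.
sample = [(2, 3, 2, 3, 3)] * 4 + [(2, 3, 2, 3, 4)] * 2 + [
    (2, 4, 2, 3, 3), (1, 3, 2, 3, 3), (4, 1, 1, 1, 1)]
placed, unplaced = place_from_commonest(sample)
for h in sorted(placed, key=total_length):
    print(total_length(h), h)
print("unplaced:", unplaced)
```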


Figure 4. Network of 'adjacent' (one-repeat difference) haplotypes. The network was constructed by beginning with the commonest haplotype and sequentially adding adjacent haplotypes. Twenty haplotypes could not be placed in this way. All possible 'adjacent' relationships are indicated by connecting lines. The arrangement of haplotypes within the network followed two simple rules (all haplotypes with the same combined allele length are placed on the same horizontal level, and line crossovers are minimised) but was otherwise subjective. Haplotypes sampled more than once are represented by boxes, with box area proportional to frequency. Population origin is indicated by font colour and superscript: black, East Anglian only; green, Nigerian only; blue, Sardinian only. Superscripts indicate haplotypes detected in multiple populations: a, East Anglians and Sardinians; b, East Anglians and Nigerians; c, all three population samples. Red underlining indicates that a microsatellite haplotype is observed only in association with the Alu insertion. A red asterisk denotes that both Alu+ and Alu- states are seen with the microsatellite haplotype. Links between Alu+ and Alu- haplotypes are included for completeness but are dashed to show that they are unlikely to represent mutational events.

As drawn, the network appears by visual inspection to consist of two sections. The right-hand section is larger and comprises mainly East Anglian haplotypes, including all those sampled more than three times. Haplotypes sampled more than once tend to lie at the centre of the network. This tendency appears to be an emergent property of the network, caused by a strong correlation between haplotype frequency and number of adjacent neighbours (r² = 0.45, n = 73, p ≪ 0.001): because haplotypes were placed so as to minimise line crossovers, those with many adjacent neighbours tend to lie centrally. The left-hand section of the network is smaller and contains a greater proportion of Sardinian and Nigerian haplotypes. However, the East Anglian haplotypes within this section are still well connected, suggesting that they are unlikely to derive entirely from recent immigrant lineages. Interestingly, as drawn, the two sections of the network are connected only in the lower half, the region associated with haplotypes containing mainly short alleles.
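The frequency-versus-neighbours correlation can be computed by counting, for each distinct haplotype, how many other sampled haplotypes lie one repeat step away, and correlating that with its sample frequency. A minimal sketch using a plain Pearson coefficient; the function names and example data are hypothetical, and the paper does not say exactly how its r² was computed.

```python
import math
from collections import Counter

def neighbour_counts(haplotypes):
    """For each distinct haplotype (in order of first appearance), return its
    sample frequency and its number of one-step ('adjacent') neighbours."""
    counts = Counter(haplotypes)
    distinct = list(counts)
    freqs, degrees = [], []
    for h in distinct:
        deg = sum(
            1 for other in distinct
            if sum(abs(a - b) for a, b in zip(h, other)) == 1
        )
        freqs.append(counts[h])
        degrees.append(deg)
    return freqs, degrees

def pearson_r(xs, ys):
    """Pearson correlation coefficient (undefined if either variable is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical two-locus haplotypes, for illustration only.
sample = [(1, 1), (1, 1), (1, 1), (1, 2), (2, 1), (1, 2), (5, 5)]
freqs, degrees = neighbour_counts(sample)
print(freqs, degrees, pearson_r(freqs, degrees) ** 2)
```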

Although it is desirable to obtain some measure of the depth of the haplotype phylogeny, parsimony analysis proved unhelpful and our network approach does not yield a formal root. Mean pairwise distances provide a poor measure of total divergence because they are confounded by historical fluctuations in population size. We therefore based our analysis on the maximum observed pairwise distance (12,40), which has the advantage of being relatively insensitive to such fluctuations. It is also surprisingly effective with small samples: Tajima (40) states that a sample of 20 is little worse than a sample of 200 for this analysis.

To interpret our data, we used computer simulations to determine the relationship between maximum inter-haplotype distance (defined as repeat-unit copy-number differences summed over all five loci) and mean mutation number, μG, where μ is the mutation rate per generation and G is the number of generations since the common ancestor (Fig. 5). Because the simulations are based on a stepwise mutation model operating alongside drift, the resulting calibration curve automatically allows for realistic levels of homoplasy.


Figure 5. Relationship between maximum inter-haplotype distance and log mutation number, as determined by computer simulation of mutation and drift in a finite population. Mutation number is described by μG, where μ is the mutation rate per generation and G is the number of generations.

The maximum distance observed between East Anglian haplotypes was 10 units, corresponding to a μG value of 7.75. However, this value occurred only once and could therefore be due to an immigrant haplotype. A more conservative choice is the next highest value, 9 units, corresponding to a μG of 6.2. This value occurred in comparisons involving 16 different haplotypes, and at least some of these pairs are likely to consist of two genuinely East Anglian haplotypes. The maximum distance in the entire data set was 12, equivalent to a μG of 11.4, and involved one East Anglian and one African haplotype. Assuming a constant mutation rate, these data imply that the African-Caucasian split is 1.4-1.8 times as ancient as the founding events which gave rise to individual European populations.
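The ratio follows from dividing the μG value for the whole data set by each of the two East Anglian values; under the constant-mutation-rate assumption stated above, μG is proportional to time, so the ratio of μG values is the ratio of ages.

```latex
\frac{(\mu G)_{\text{all}}}{(\mu G)_{\text{EA}}}:\qquad
\frac{11.4}{7.75} \approx 1.47 \;\;(\text{10-unit EA maximum}),\qquad
\frac{11.4}{6.2} \approx 1.84 \;\;(\text{9-unit EA maximum})
```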

DISCUSSION

The potential of Y chromosome variation for investigating human population parameters has not yet been exploited, owing to the difficulty of finding point-mutation polymorphisms and the lack of tools for analysing more complex polymorphisms. We have approached the use of Y chromosome microsatellites by considering haplotypes from five loci. That one of these loci arose from a partial duplication of another illustrates the importance of fully characterising such duplications rather than simply treating each product as an independent locus. The initial analysis of haplotypes was difficult to interpret because of the small sample sizes involved, but analysis of a larger sample from one population and the construction of a network proved more informative.

Y microsatellite evolution

The accepted model of autosomal microsatellite mutation is slippage, resulting in length changes of mostly one, but occasionally more, repeat units (31). Several lines of evidence from our data suggest that most mutations affecting Y microsatellites also involve a single repeat unit. First, none of the locus-population allele frequency distributions has any internal, unsampled allele length classes. Second, despite vast haplotype diversity and sample sizes which clearly leave many haplotypes unsampled, a high proportion (79%) of all haplotypes could be placed on the one-step network. All remaining haplotypes could be added by invoking a small number of intermediate states which have gone extinct, have not been sampled, or were originally bridged by mutations of more than one step. Third, the rarer classes of haplotype (Alu+, Nigerian, Sardinian) tend conspicuously to occur in adjacent pairs.

Homoplasy, whereby identical alleles arise independently in different lineages, is an expected consequence of the stepwise mutation model. Recurrent evolution of Y haplotypes is demonstrated by the detection of three different microsatellite haplotypes which are found both with and without an Alu insertion at the YAP locus. Also, inspection of the network reveals that several of the central haplotypes are surrounded by adjacent states to such an extent that virtually any one- or two-step mutation will generate a pre-existing haplotype, making recurrent haplotype evolution almost inevitable. Consequently, attempts to construct phylogenetic trees using parsimony analysis are unlikely to be successful, as we ourselves found.

Several recent reports have suggested that human microsatellites show a general tendency towards expansion (41-43). It is therefore interesting that the two main sections of the network are joined only at the bottom, where total haplotype length is relatively short, suggesting that the ancestral haplotype was composed of relatively short alleles. A second indication comes from the peripheral haplotypes which could not be placed on the network. Assuming that the common haplotypes forming the core of the network represent older states (see below), most peripheral haplotypes are likely to be of recent origin. Considering only Alu- East Anglians, we find that unplaced, presumably recent haplotypes (mean total length 18.9, range 15-22, n = 9) are significantly longer than those in the network (mean 16.3, range 12-20, n = 57) (t = 4.48, 63 d.f., p ≪ 0.001), supporting the idea that microsatellites tend to increase in length over time.

Y microsatellites and population studies

Our results suggest that small samples (say <30) will be difficult to interpret. In our preliminary screen, few haplotypes were sampled more than once. Indeed, the commonest East Anglian haplotype (11%) was absent from this initial sample (see Table 1). Only when the sample size was increased considerably did a consistent pattern emerge, allowing us to construct an informative, well-connected network from which the main East Anglian lineage could be identified with confidence. At the same time, the problems associated with smaller samples are emphasised. For example, both Sardinian and Nigerian haplotypes are widely distributed across the network. It is possible that these populations also have relatively frequent haplotypes which we did not find in the small samples available to us. Even among the East Anglian haplotypes, the occurrence of isolated Alu+ haplotypes and adjacent pairs of them indicates the presence of immigrant lineages, both relatively ancient and modern. Such lineages would be difficult to interpret without the main network as a reference point. Because each locus carries no more than six alleles, the more loci that can be typed, the more informative the network will be.

A larger sample size also provides better information about individual haplotype frequencies, which in turn helps in interpreting the underlying structure of the network. Since recently arisen haplotypes will not have had the opportunity to drift to high frequency, common haplotypes are likely to be relatively old (44,45). It is reassuring that these commoner haplotypes lie at the centre of the larger section of the network, and often connect directly to other relatively common haplotypes. In future studies, it may be useful to compare skeleton networks based only on common haplotypes, thereby largely avoiding problems associated with immigrant haplotypes and the vast recent increase in human mobility. Such skeleton networks may well exhibit population-specific features which would not be apparent in smaller samples.

Estimating mutation rates by maximum divergence

An area of considerable interest in the study of human population history is the timing of population splits, and Y microsatellite haplotypes may prove useful tools for addressing this issue. Our maximum divergence approach yields an estimate of 11.4 for the mean number of single-step mutations that have occurred since the last common African-Caucasian ancestor, and suggests that this ancestor is only around 1.5 times as old as the founders of the East Anglian population. Mutations involving two or more repeats would slightly reduce the estimated depth of the whole tree but should not affect the relative values.

Without knowing the mutation rate of Y chromosome microsatellites, we cannot convert mutation number into time since divergence. It is nonetheless interesting to see how published estimates of G (number of generations) and μ relate to our value of μG. Estimates of global Y chromosome coalescence time vary greatly, from 43 000 [mid-value (6)] to 188 000 (12) years ago. Assuming a generation length of 20 years and μG = 11.4, these correspond to per-locus mutation rates of 1.03 × 10⁻³ and 2.36 × 10⁻⁴, respectively. The higher of these two values is similar to the value of 2.1 × 10⁻³ estimated by Weber and Wong from 12 GATA repeats on chromosome 19 (33), which would support the more recent coalescence proposed by Whitfield et al. (6). However, a study of 700 human autosomal tetranucleotide repeat loci (46), cited in (27), calculates a mean mutation rate of 1.5 × 10⁻⁴, which would support the older coalescence proposed by Hammer (12). This lower mutation rate also agrees better with the range of mutation rates we estimate from our Neμ calculations, namely 1.2 × 10⁻⁴ to 4.8 × 10⁻⁴.

In conclusion, the construction of networks of Y chromosome haplotypes can provide a tool for the analysis of the evolution of the human male lineage. These networks acquire a meaningful structure when large sample sizes from individual populations are analysed. The great diversity of Y chromosome haplotypes we have found indicates either that this chromosome has not been subject to recent selective sweeps, or that the tetranucleotide repeats it carries are highly mutable. Although our data appear to support an ancient rather than recent Y-chromosome coalescence, there is a clear need for empirical evidence about Y microsatellite mutation rates.

MATERIALS AND METHODS

Although we had no specific information regarding ethnicity, East Anglian samples were collected so as to avoid individuals with non-English-sounding surnames. GATA repeat microsatellites were PCR-amplified using primers described in the Genome Database. Reactions were performed in a total volume of 11 μl containing 50-250 ng DNA, 0.35 μl of each primer, 1× PARR buffer (Cambio), 200 μM each of dATP, dGTP and dTTP, 40 μM dCTP, 0.12 μM [α-³²P]dCTP (3000 Ci/mmol) and 0.5 U Taq polymerase. Primer specificity was improved by adding formamide to 1.8% for locus DYS19 and tetramethylammonium chloride to 27 mM for DYS390. Cycling conditions were 94°C for 2.5 min, followed by 32 cycles of 1 min at 94°C, 1 min at 54-60°C and 1 min at 72°C. PCR products were resolved on denaturing 6% polyacrylamide sequencing gels and visualised by autoradiography. The sizes of allele 3 at each locus, as determined from an adjacent M13 sequencing ladder (-40 forward primer), were as follows: DYS391, 289 bp; DYS19, 190 bp; DYS390, 212 bp; DYS389A, 363 bp (for DYS389B allele 3); DYS389B, 253 bp. To sequence the DYS389 alleles, amplification products were ligated into the pGEM-T vector (Promega), and recombinant clones were sequenced using ABI PRISM (Perkin Elmer) dye terminator cycle sequencing. The YAP polymorphism was typed by standard PCR using primers YAP.1 and YAP.2, which amplify products of either 455 bp (Alu+) or 150 bp (Alu-), as described by Hammer and Horai (26).

Computer simulations

To test the extent to which our data are compatible with multi-repeat mutations, we used simple stochastic computer simulations. Each simulation began with a population of N alleles and was allowed to evolve at constant size for 4N generations; this is long enough that, on average, the final population descends from only one founder allele, which reduces possible biases introduced by the starting conditions. At each generation, every allele was allowed to mutate with probability μ. Two-step mutations occur with probability p2, giving mutation probabilities for +2, +1, -1 and -2 repeats of 0.5μp2, 0.5μ(1 - p2), 0.5μ(1 - p2) and 0.5μp2, respectively. At each generation a new population was founded from the previous one by random sampling with replacement. After 4N generations, two samples were drawn, one of 174 alleles and one of 20, representing the East Anglian sample and the smaller African and Sardinian samples, respectively. Each sample was classified by the number of alleles it contained and whether it contained one or more gaps (i.e. unoccupied allele size classes within the observed size range).
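The simulation described above can be sketched as a simple Wright-Fisher-style loop in pure Python. Several details are assumptions not stated in the text and are labelled in the code: all founders start at the same allele size, mutation is applied before resampling within each generation, and the two samples are drawn without replacement. Function names are hypothetical.

```python
import random

def mutate(allele, mu, p2, rng):
    """One generation of mutation for one allele (size in repeat units).
    Mutates with probability mu; a fraction p2 of mutations are two-step and
    gains and losses are equally likely, so +2, +1, -1, -2 occur with
    probabilities 0.5*mu*p2, 0.5*mu*(1 - p2), 0.5*mu*(1 - p2), 0.5*mu*p2."""
    if rng.random() >= mu:
        return allele
    step = 2 if rng.random() < p2 else 1
    return allele + step if rng.random() < 0.5 else allele - step

def simulate(n, mu, p2, rng):
    """Constant-size population of n alleles evolved for 4n generations."""
    pop = [0] * n  # assumption: every founder starts at the same size
    for _ in range(4 * n):
        pop = [mutate(a, mu, p2, rng) for a in pop]  # assumption: mutate, then resample
        pop = rng.choices(pop, k=n)  # next generation: sampling with replacement
    return pop

def alleles_and_gap(sample):
    """Number of distinct allele sizes, and whether any size class inside the
    observed range is unoccupied (a 'gap')."""
    sizes = set(sample)
    return len(sizes), max(sizes) - min(sizes) + 1 > len(sizes)

def draw_samples(pop, rng, sizes=(174, 20)):
    """Classify a large and a small sample (drawn without replacement: assumption)."""
    return [alleles_and_gap(rng.sample(pop, k)) for k in sizes]

rng = random.Random(1)
pop = simulate(n=400, mu=1.65 / 400, p2=0.1, rng=rng)
print(draw_samples(pop, rng))
```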

To determine the set of conditions that generates allele frequency distributions most similar to our empirical data, we examined two parameters. First, allelic diversity was varied by keeping N fixed at 400 and varying µ to give Nµ products ranging from 0.3 to 3. Although a population of 400 is unrealistically small, this should not matter, since it is the product Nµ that determines allelic diversity; test simulations confirmed that results for a given value of Nµ did not change perceptibly when population size was increased to more realistic values (data not shown). Second, p2 was varied between 0 and 0.5. For each combination of parameter values, 1000 replicate simulations were performed, yielding estimates of the probability of observing any given number of alleles in a sample, with or without a gap in the distribution, in either a large or a small sample. From these component probabilities it is straightforward to calculate a single probability of obtaining our empirical data, and hence to construct a log likelihood surface. With the surface scaled so that its maximum equals 0, the point of maximum likelihood defines the best fit to our data, and the 95% confidence limits are taken to include all points with values greater than -2.
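The likelihood step can be sketched as follows, assuming the replicate outcomes at each grid point (Nµ, p2) have already been tabulated as (number of alleles, gap present) pairs, for example from a simulation like the one described above. The function names and data layout are hypothetical; a natural logarithm is assumed, and an observation never seen among the replicates is given log likelihood -inf rather than smoothed.

```python
import math
from collections import Counter
from typing import Dict, Sequence, Tuple

Outcome = Tuple[int, bool]        # (number of alleles, has gap)
GridPoint = Tuple[float, float]   # (N*mu, p2)


def log_likelihood(replicates: Dict[str, Sequence[Outcome]],
                   observed: Sequence[Tuple[str, Outcome]]) -> float:
    """Natural-log likelihood of the observed samples at one grid point.

    replicates maps a sample class ("large" or "small") to the simulated
    outcomes for that class; observed lists (class, outcome) pairs.
    """
    counts = {cls: Counter(outs) for cls, outs in replicates.items()}
    ll = 0.0
    for sample_class, outcome in observed:
        p = counts[sample_class][outcome] / len(replicates[sample_class])
        if p == 0.0:
            return float("-inf")  # outcome never produced by the replicates
        ll += math.log(p)
    return ll


def support_region(surface: Dict[GridPoint, float], cutoff: float = -2.0):
    """Rescale the surface so its maximum is 0 and return
    (best grid point, scaled surface, grid points scoring above cutoff)."""
    best = max(surface.values())
    if best == float("-inf"):
        raise ValueError("observed data impossible at every grid point")
    scaled = {pt: ll - best for pt, ll in surface.items()}
    mle = max(scaled, key=scaled.get)
    region = sorted(pt for pt, s in scaled.items() if s > cutoff)
    return mle, scaled, region
```

The default cutoff of -2 follows the text and corresponds to the commonly used two-unit support limit on a natural-log scale.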

The relationship between maximum pairwise distance and mutation number was also determined by stochastic computer simulation of a finite population undergoing mutation and drift. In each run, a pair of initially identical five-locus haplotypes was allowed to evolve, mutating or not at each step with a fixed probability. All mutations were single-step changes. After 1000 `generations', the genetic distance between the two haplotypes was calculated as the sum, over all five loci, of the differences in repeat unit copy number. For each mutation rate, this process was repeated 100 times and the maximum distance recorded. Confidence limits were set by repeating each 100-trial series 20 times and recording the largest, smallest and mean of the maximum values.
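A minimal Python sketch of this maximum-divergence procedure is given below. It is a hypothetical reconstruction: the text does not say whether the fixed mutation probability applies per locus or per haplotype, so the sketch assumes each locus of each haplotype mutates independently with probability `mu` per generation, and it takes the distance as the sum of absolute repeat-count differences. Function names are illustrative.

```python
import random
import statistics
from typing import Optional, Tuple


def haplotype_distance(mu: float, generations: int = 1000, loci: int = 5,
                       rng: Optional[random.Random] = None) -> int:
    """Evolve two initially identical haplotypes and return their distance.

    Assumption: in every generation each locus of each haplotype mutates
    independently with probability mu, by +1 or -1 repeat (single-step).
    Distance = sum over loci of absolute repeat-count differences.
    """
    if rng is None:
        rng = random.Random()
    hap_a = [0] * loci
    hap_b = [0] * loci
    for _ in range(generations):
        for hap in (hap_a, hap_b):
            for i in range(loci):
                if rng.random() < mu:
                    hap[i] += 1 if rng.random() < 0.5 else -1
    return sum(abs(a - b) for a, b in zip(hap_a, hap_b))


def max_distance_limits(mu: float, trials: int = 100, series: int = 20,
                        seed: Optional[int] = None) -> Tuple[int, float, int]:
    """Run `series` sets of `trials` runs, keep each set's maximum distance,
    and return (smallest, mean, largest) of those maxima."""
    rng = random.Random(seed)
    maxima = [max(haplotype_distance(mu, rng=rng) for _ in range(trials))
              for _ in range(series)]
    return min(maxima), statistics.mean(maxima), max(maxima)
```

With the defaults (100 trials × 20 series × 1000 generations) this pure-Python loop may be slow for each mutation rate; vectorising the per-generation draws would be a natural optimisation if many rates are scanned.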

ACKNOWLEDGEMENTS

We thank J. Old and S. Orru for DNA samples. We are grateful to J. A. Barrett and N. Goldman for advice and two anonymous referees for comments on the manuscript. This work was funded by a grant from the Leverhulme Trust. W.A. was supported by The Royal Society. The automated DNA sequencing facility was funded by the Wellcome Trust.

REFERENCES

1 Cann, R.L., Stoneking, M. and Wilson, A.C. (1987) Nature, 325, 31-36.

2 Hagelberg, E. and Clegg, J.B. (1993) Proc. R. Soc. Lond., Ser. B, 252, 163-170.

3 Torroni, A., Chen, Y., Semino, O., et al. (1994) Am. J. Hum. Genet., 54, 303-318.

4 Pena, S.D.J., Santos, F.R., Bianchi, N.O., et al. (1995) Nature Genet., 11, 15-16.

5 Rice, W.R. (1987) Genetics, 116, 161-167.

6 Whitfield, L.S., Sulston, J.E. and Goodfellow, P.N. (1995) Nature, 378, 379-380.

7 Jakubiczka, S., Arnemann, J., Cooke, H.J., Krawczak, M. and Schmidtke, J. (1989) Hum. Genet., 84, 86-88.

8 Malaspina, P., Persichetti, F., Novelletto, A., et al. (1990) Ann. Hum. Genet., 54, 297-305.

9 Spurdle, A. and Jenkins, T. (1992) Hum. Mol. Genet., 1, 169-170.

10 Ellis, N., Taylor, A., Bengtsson, B.O., Kidd, J., Rogers, J. and Goodfellow, P. (1990) Nature, 344, 663-665.

11 Dorit, R.L., Akashi, H. and Gilbert, W. (1995) Science, 268, 1183-1185.

12 Hammer, M.F. (1995) Nature, 378, 376-378.

13 Jobling, M.A., Fretwell, N., Dover, G.A. and Jeffreys, A.J. (1994) Cytogenet. Cell Genet., 67, 390.

14 Roewer, L., Arnemann, J., Spurr, N.K., Grzeschik, K.-H. and Epplen, J.T. (1992) Hum. Genet., 89, 389-394.

15 Mathias, N., Bayés, M. and Tyler-Smith, C. (1994) Hum. Mol. Genet., 3, 115-123.

16 Arnemann, J., Jakubiczka, S., Schmidtke, J., Schafer, R. and Epplen, J.T. (1986) Hum. Genet., 73, 301-303.

17 Santos, F.R., Pena, S.D.J. and Epplen, J.T. (1993) Hum. Genet., 90, 655-656.

18 Murray, J.C. (1993) Genome DataBase:365241

19 Murray, J.C. (1993) Genome DataBase:365248

20 Murray, J.C. (1993) Genome DataBase:365251

21 Jobling, M.A. and Tyler-Smith, C. (1995) Trends Genet., 11, 449-456.

22 Sokal, R.R. and Rohlf, F.J. (1995) Biometry: the principles and practice of statistics in biological research. W. H. Freeman and Company, New York.

23 Ury, H.K. (1976) Technometrics, 18, 89-97.

24 Gomolka, M., Hundrieser, J., Nürnberg, P., Roewer, L., Epplen, J.T. and Epplen, C. (1994) Hum. Genet., 93, 592-596.

25 Falcone, E., Spadafora, P., De Luca, M., Ruffolo, R., Brancati, C. and De Benedictis, G. (1995) Hum. Biol., 67, 689-701.

26 Hammer, M.F. and Horai, S. (1995) Am. J. Hum. Genet., 56, 951-962.

27 Underhill, P.A., Jin, L., Zemans, R., Oefner, P.J. and Cavalli-Sforza, L.L. (1996) Proc. Natl Acad. Sci. USA, 93, 196-200.

28 Ciminelli, B.M., Pompei, F., Malaspina, P., et al. (1995) J. Mol. Evol., 41, 966-973.

29 Santos, F.R., Gerelsaikhan, T., Munkhtuja, B., Oyunsuren, T., Epplen, J.T. and Pena, S.D.J. (1996) Hum. Genet., 97, 309-313.

30 Di Rienzo, A., Peterson, A.C., Garza, J.C., Valdes, A.M., Slatkin, M. and Freimer, N.B. (1994) Proc. Natl Acad. Sci. USA, 91, 3166-3170.

31 Valdes, A.M., Slatkin, M. and Freimer, N.B. (1993) Genetics, 133, 737-749.

32 Shriver, M.D., Jin, L., Chakraborty, R. and Boerwinkle, E. (1993) Genetics, 134, 983-993.

33 Weber, J.L. and Wong, C. (1993) Hum. Mol. Genet., 2, 1123-1128.

34 Ohta, T. and Kimura, M. (1973) Genet. Res., 22, 201-204.

35 Nei, M. (1987) Molecular Evolutionary Genetics. Columbia University Press, New York.

36 Hammer, M.F. (1994) Mol. Biol. Evol., 11, 749-761.

37 Persichetti, F., Blasi, P., Hammer, M., et al. (1992) Ann. Hum. Genet., 56, 303-310.

38 Swofford, D.L. (1989) PAUP: Phylogenetic Analysis Using Parsimony. Illinois Natural History Survey, Champaign.

39 Bandelt, H.J., Forster, P., Sykes, B.C. and Richards, M.B. (1995) Genetics, 141, 743-753.

40 Tajima, F. (1983) Genetics, 105, 437-460.

41 Rubinsztein, D.C., Amos, B., Leggo, J., et al. (1994) Nature Genet., 7, 525-530.

42 Rubinsztein, D.C., Amos, B., Leggo, J., et al. (1995) Nature Genet., 10, 337-343.

43 Djian, P., Hancock, J.M. and Chana, H.S. (1996) Proc. Natl Acad. Sci. USA, 93, 417-421.

44 Chakraborty, R. (1977) J. Mol. Evol., 9, 313-322.

45 Watterson, G.A. and Guess, H.A. (1977) Theor. Popul. Biol., 11, 141-160.

46 Jin, L., Zhong, Y., Shriver, M.D., Deka, R. and Chakraborty, R. (1994) Am. J. Hum. Genet., 55 Suppl., Abstract 200.


*To whom correspondence should be addressed.
+Present address: University of Cambridge, Department of Zoology, Downing Street, Cambridge CB2 3EJ, UK


Copyright Oxford University Press, 1996




This article has been cited by other articles:

B. M. Henn, C. Gignoux, A. A. Lin, P. J. Oefner, P. Shen, R. Scozzari, F. Cruciani, S. A. Tishkoff, J. L. Mountain, and P. A. Underhill
Y-chromosomal evidence of a pastoralist migration through Tanzania to southern Africa
PNAS, August 5, 2008; 105(31): 10693 - 10698.

U. Holtkemper, B. Rolf, C. Hohoff, P. Forster, and B. Brinkmann
Mutation rates at two human Y-chromosomal microsatellite loci using small pool PCR techniques
Hum. Mol. Genet., March 1, 2001; 10(6): 629 - 633.

M. Nishizawa and K. Nishizawa
Amino acid and nucleotide recurrence in aligned sequences: synonymous substitution patterns in association with global and local base compositions
Nucleic Acids Res., October 1, 2000; 28(19): 3801 - 3810.

C. M. Clark, T. R. Wentworth, and D. M. O'Malley
Genetic discontinuity revealed by chloroplast microsatellites in eastern North American Abies (Pinaceae)
Am. J. Botany, June 1, 2000; 87(6): 774 - 782.

G. Cooper, N. J. Burroughs, D. A. Rand, D. C. Rubinsztein, and W. Amos
Markov Chain Monte Carlo analysis of human Y-chromosome microsatellites provides evidence of biased mutation
PNAS, October 12, 1999; 96(21): 11916 - 11921.

I. J. Wilson and D. J. Balding
Genealogical Inference From Microsatellite Data
Genetics, September 1, 1998; 150(1): 499 - 510.

