The Y-specific locus MSY1 is the only known haploid minisatellite, and displays an extremely high degree of structural diversity which can be assayed by minisatellite variant repeat PCR (MVR-PCR). One group of alleles, in an African-specific class of Y chromosomes (haplogroup 8), behaves unusually in the conventional MVR-PCR assay, and sequencing demonstrates that this is because repeat units in these alleles contain an additional base substitution. We have designed a new MVR-PCR system to detect these novel variants, and show firstly that they are confined to the haplogroup 8 chromosomes, and secondly that the base substitution has spread through these arrays without the elimination of existing repeat variants. The sharing of a particular base substitution between otherwise distinct repeat types in these alleles represents evidence of a remarkable mutation process in their evolutionary history, in which the variant base must have been spread by a biased repair mechanism operating in very small patches within heteroduplexes.
Minisatellites are loci composed of tandem arrays of 10-50 bp repeat units. They have been widely used and investigated because of their high degree of length polymorphism, which results from a high mutation rate to new length alleles. As well as polymorphism in repeat number, there is also polymorphism in the sequence of repeat units. This can be accessed by minisatellite variant repeat PCR (MVR-PCR), in which PCR products are generated between a fixed flanking primer and discriminator primers designed to anneal specifically to one kind of repeat variant, thus mapping the positions of variant repeats along arrays, and allowing the fine structures of alleles to be analysed (1).
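The mapping principle behind MVR-PCR can be shown with a toy model. In this sketch, an allele is a list of repeat-type labels read outward from the flanking primer. A discriminator primer for one variant gives a product ending at each repeat of that type, so each "ladder" is just the list of matching positions. In a real gel, each ladder step is one repeat unit (25 bp at MSY1). Overlaying the ladders recovers the allele's code, and '?' marks positions that no discriminator detects. Band sizes, primer behaviour and the helper names `mvr_ladder` and `decode_ladders` are simplifying assumptions made for illustration.

```python
"""Toy model of MVR-PCR ladder mapping (illustrative only)."""
from typing import Dict, List


def mvr_ladder(allele: List[str], variant: str) -> List[int]:
    """1-based repeat positions, counted from the flanking primer, at which
    a discriminator specific to `variant` would anneal and give a product."""
    return [i for i, rep in enumerate(allele, start=1) if rep == variant]


def decode_ladders(ladders: Dict[str, List[int]], length: int) -> List[str]:
    """Rebuild an MVR code by overlaying per-variant ladders; positions
    with no band in any ladder are reported as '?'."""
    code = ["?"] * length
    for variant, positions in ladders.items():
        for pos in positions:
            code[pos - 1] = variant
    return code


if __name__ == "__main__":
    allele = ["1", "3", "3", "1", "4", "1"]  # hypothetical allele
    ladders = {v: mvr_ladder(allele, v) for v in ("1", "3", "4")}
    print(ladders)                                      # one ladder per primer
    print("".join(decode_ladders(ladders, len(allele))))
```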
As well as in diversity studies, MVR-PCR has been useful in understanding the dynamics of mutation at minisatellites. Comparisons of progenitor and mutant allele structures have shown the importance of inter-allelic, gene conversion-like processes at autosomal loci (2).
We have previously isolated and characterized the only known constitutively haploid minisatellite, the Y-specific locus MSY1 (DYF155S1; ref. 3). This minisatellite is composed of 48-114 copies of a 25 bp repeat unit which is AT-rich and predicted to form stable hairpin structures. Five sequence variant repeat types, differing by base substitutions, were originally identified, and four of these (types 1-4) were mapped in an MVR-PCR system, allowing complete codes to be obtained for all alleles.
We are interested in using MSY1 as a marker for Y chromosome diversity, in combination with other, more slowly mutating polymorphisms. However, it also provides a unique opportunity to study mutation at a haploid minisatellite, exempt from the activities of inter-allelic exchange processes which have been shown to be so important at diploid loci. Here we identify further novel repeat types at MSY1 which provide evidence for a remarkable mutational process in the history of this locus with a strand bias capable of homogenizing particular base substitutions within minisatellite arrays, without eliminating other repeat variants.
In some alleles, three-state MVR-PCR analysis (coding type 1, 3 and 4 repeats) gave reproducible bands of much lower intensity than usual for some repeats, while other repeats coded strongly in one direction (reverse mapping) but not at all in the other (forward mapping). We interpreted this unusual behaviour in MVR analysis as evidence for the existence of further novel repeat types (3). The last (3'-most) one or two repeats in this class of alleles do not show this reduced intensity (Fig. 1a).
To investigate the molecular basis of this intensity difference, we analysed one of these alleles, in the Nigerian male m118, by direct sequencing. Sequence data were obtained for 12 repeats from each end of this 57-repeat allele. At the 3' end, there is a single type 4 repeat, which corresponds to an intense band in the MVR-PCR coding. All other sequenced repeats share a base substitution, a T -> C transition at position 6 (Fig. 1b). In sequencing autoradiographs, this transition could be seen extending into the allele well beyond the fully readable sequence. In all other respects, these repeats are equivalent to the previously described types 1, 3 and 4, with the exception of the first (5'-most) repeat, which contains an additional base substitution (T -> A transversion) at position 21 (Fig. 1b). To reflect their similarity to the original repeat types, the new types are designated 1a, 3a and 4a.
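The distinction between the position-6 transition and the position-21 transversion can be checked mechanically. This is a minimal sketch that compares two equal-length repeats, reports each difference with a 1-based position, and classifies it as a transition (A<->G or C<->T) or a transversion. The example strings are placeholder 10-mers, not the MSY1 repeat sequence, and the function names are hypothetical.

```python
from typing import List, Tuple

# Purine<->purine (A/G) and pyrimidine<->pyrimidine (C/T) changes are transitions.
TRANSITION_PAIRS = {frozenset("AG"), frozenset("CT")}


def substitution_kind(ref: str, alt: str) -> str:
    """Classify a single-base change as a transition or a transversion."""
    if ref == alt:
        raise ValueError("not a substitution")
    if frozenset((ref, alt)) in TRANSITION_PAIRS:
        return "transition"
    return "transversion"


def differences(reference: str, repeat: str) -> List[Tuple[int, str, str, str]]:
    """List (1-based position, ref base, new base, kind) for each difference."""
    if len(reference) != len(repeat):
        raise ValueError("repeats must be the same length")
    return [
        (pos, r, a, substitution_kind(r, a))
        for pos, (r, a) in enumerate(zip(reference, repeat), start=1)
        if r != a
    ]


if __name__ == "__main__":
    # Placeholder sequences with a T -> C change at position 6.
    print(differences("AAAAATAAAA", "AAAAACAAAA"))
    print(substitution_kind("T", "A"))
```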
In order to identify and map these novel repeat types more efficiently and unambiguously, a new MVR-PCR system was designed to detect them. This system discriminates well between repeats which contain the substitution and those which do not (Fig. 1c), including the detection, as a null, of the 5'-most variant in m118, and allows unambiguous MVR codes to be constructed (Fig. 1d).
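One way to picture how the extra information yields unambiguous codes is to pair, for each repeat, the type called by the original three-state system (1, 3 or 4) with a yes/no call for the position-6 substitution from the new system. The sketch then appends "a" wherever the substitution is detected. This is a hypothetical, simplified decoding rule for illustration, not the published scoring procedure. The function `merge_codes` and the example allele are invented.

```python
from typing import List


def merge_codes(base_code: List[str], has_substitution: List[bool]) -> List[str]:
    """Combine a three-state code with per-repeat substitution calls.

    Hypothetical rule: a repeat of type t carrying the position-6 C is
    written as t + 'a' (e.g. '1a'); otherwise its type is unchanged.
    """
    if len(base_code) != len(has_substitution):
        raise ValueError("codes must cover the same repeats")
    return [t + "a" if sub else t for t, sub in zip(base_code, has_substitution)]


if __name__ == "__main__":
    # Hypothetical allele read 5' -> 3': only the 3'-most repeat lacks the C.
    print(merge_codes(["1", "3", "4", "4"], [True, True, True, False]))
```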
We used the new MVR-PCR system to determine the codes of a larger sample of alleles carrying the novel repeat types. The males m66 and m118 belong to a Y-chromosomal haplogroup which we refer to as haplogroup 8 (previously 'Af'; ref. 4). This is a subgroup of the set of chromosomes which bear the YAP Alu element insertion (DYS287; ref. 5) and the PN2 base substitutional polymorphism (6), and is defined by the presence of the sY81 (DYS271; ref. 7) and PN1 (6) base substitutions. The phylogenetic relationships of these different groups of chromosomes are illustrated in Figure 2. It is because of the derived nature of the haplogroup 8 chromosomes that the MSY1 repeat transition is defined as T -> C, rather than vice versa. This conclusion is also supported by the fact that the monomer repeat unit in the MSY1 homologue, DYF155S2, which may represent an unamplified progenitor of MSY1, has a T rather than a C at position 6 (ref. 3).
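The nested marker definitions above can be written as a small decision function. This is a sketch under stated assumptions. Each marker is reduced to a True/False call for its derived state. The labels other than "haplogroup 8" are invented here for readability. Combinations the text does not discuss, such as only one of sY81 and PN1 being derived, are simply reported as not haplogroup 8.

```python
from typing import Mapping


def classify_y(markers: Mapping[str, bool]) -> str:
    """Assign a chromosome using the marker hierarchy described in the text.

    `markers` maps 'YAP', 'PN2', 'sY81' and 'PN1' to True when the derived
    (insertion or substitution) state is present.
    """
    if not (markers["YAP"] and markers["PN2"]):
        return "outside the YAP+/PN2 lineage"
    # Haplogroup 8 is the subgroup defined by both sY81 and PN1.
    if markers["sY81"] and markers["PN1"]:
        return "haplogroup 8"
    return "YAP+/PN2 lineage, not haplogroup 8"


if __name__ == "__main__":
    print(classify_y({"YAP": True, "PN2": True, "sY81": True, "PN1": True}))
```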
An increasing collection of neurodegenerative disorders and fragile sites has been shown to be due to dynamic mutations of trinucleotide repeat loci, and minisatellites are now joining the list of tandemly repeated sequences which can be associated with disease phenotypes. Certain alleles of the H-ras minisatellite can influence transcription of the H-ras gene and lead to increased susceptibility to some cancers (8). A similar phenomenon appears to affect the insulin gene, and thus predisposition to insulin-dependent diabetes (9), and an expanded minisatellite upstream of the cystatin B gene is associated with progressive myoclonus epilepsy type 1 (10). The FRA16B fragile site has been shown to be due to the massive expansion of a minisatellite, which, like MSY1, is AT-rich and has strongly predicted secondary structure (11). As more tandemly repeated sequences become implicated in disorders, it is increasingly important to understand the dynamics of the mutation processes which underlie the variability of these kinds of loci.
As well as this, an understanding of the mutational properties of MSY1 is relevant to the specific issue of human Y chromosome diversity and the use of the Y as a tool to investigate population histories (4). MSY1 is the most variable single marker on the chromosome, and it is the only one where the characteristics of large numbers of mutations can be studied in detail; it thus has potential as a dating tool for haplogroups of Y chromosomes defined by slowly mutating polymorphisms such as base substitutions.
Here we have presented evidence of a novel mutation homogenization process. In the haplogroup 8 chromosomes, a T -> C transition mutation is homogenized throughout MSY1 arrays, with the exception of one or two repeats at the 3' ends of alleles, but without eliminating the pre-existing type 1, 3 and 4 repeats. Chromosomes ancestral to this haplogroup completely lack this transition mutation. We can speculate on the chain of events which led to this situation. It seems reasonable to assume that the initial mutation event was the genesis, by point mutation, of a single repeat with the T -> C transition, although we cannot say where within the array this occurred; nor can we say anything about the polarity of spread of the new variant. Subsequent events must have had two remarkable cardinal features: firstly, as the transition spread throughout the allele, presumably through the haploid mutation processes of slippage or unequal sister chromatid exchange (USCE), mispaired intermediates within heteroduplexes were repaired in a biased manner; the direction of change at position 6 was consistently T -> C, rather than C -> T. Secondly, the repair was somehow restricted to position 6: positions 3 and 13, which define repeat types 1, 3 and 4, were not affected. Wholesale repair of mispaired repeat units, either within a helix or between helices, would homogenize a single repeat type (e.g. type 1a) throughout arrays, rather than a single base within each repeat. The different positions of variation within repeat units thus behave as if they are evolving independently. Figure 4 illustrates biased mismatch repair between helices (USCE), and within a helix (slippage), where the formation of the hairpin secondary structures which are predicted on the basis of sequence may be important. 
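The two features argued for here are a consistent T -> C bias and a repair patch confined to position 6. They can be illustrated with a deliberately crude toy simulation, which rests on several assumptions made purely for illustration:

* every mispairing event aligns a repeat against its immediate neighbour (a one-unit misalignment, standing in for either slippage or USCE);
* a position-6 mismatch is always resolved to C;
* the bases at positions 3 and 13 are never touched;
* changes in array length are ignored.

Under these assumptions, a single C spreads through the whole array while the pre-existing type 1, 3 and 4 labels survive unchanged. The function names are hypothetical.

```python
import random
from typing import List, Tuple

# (repeat type as defined by positions 3 and 13, base at position 6)
Repeat = Tuple[str, str]


def biased_repair_event(array: List[Repeat], rng: random.Random) -> None:
    """One hypothetical one-unit mispairing event with biased repair.

    Repeat i is paired against repeat i + 1. If their position-6 bases
    differ, the mismatch is repaired towards C; the type label lies
    outside the tiny repair patch and is left unchanged.
    """
    i = rng.randrange(len(array) - 1)
    left, right = array[i], array[i + 1]
    if left[1] != right[1]:
        array[i] = (left[0], "C")
        array[i + 1] = (right[0], "C")


def homogenize(array: List[Repeat], seed: int = 1) -> int:
    """Apply events until every repeat carries C at position 6.

    Returns the number of events, including events that pair two
    identical bases and therefore change nothing.
    """
    if len(array) < 2 or not any(base == "C" for _, base in array):
        raise ValueError("need at least two repeats and one C-bearing repeat")
    rng = random.Random(seed)
    events = 0
    while any(base != "C" for _, base in array):
        biased_repair_event(array, rng)
        events += 1
    return events


if __name__ == "__main__":
    types = list("1341313141")                 # hypothetical 10-repeat allele
    allele = [(t, "T") for t in types]
    allele[4] = (allele[4][0], "C")            # single initial T -> C mutation
    n = homogenize(allele)
    print(n, "events")
    print([t for t, _ in allele] == types)     # repeat types are preserved
```

Because only the position-6 base is rewritten, the array ends up homogeneous at that site but not for whole repeat types, which is the pattern contrasted in the text with wholesale repair of mispaired repeat units.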
If these hairpins do form, either in this homogenization process or in the normal mutation process of MSY1 alleles, then the T -> C transition is expected to destabilize the hairpin and this may have an effect on mutation rate. Rates in haplogroup 8 alleles could, in principle, be compared with those in alleles lacking the transition by mutation studies in sperm DNA.
What can be said about the rate of this process? Apart from alleles with one or two unconverted type 4 repeats at their 3' ends, we do not observe any alleles in which the transition is only partly homogenized. These intermediates must have existed at some time. We may find them if we survey more chromosomes, or they may now be extinct. Coalescent analysis indicates an age of 19 000 years (~760 generations) for the PN1 mutation (previously 30 000 years; ref. 6), which, together with sY81, defines haplogroup 8. Length distributions of MSY1 alleles (3) suggest that the predominant mutation process is gain or loss of a single repeat unit. If the homogenization, too, were a single-repeat process, and always biased, then the homogenization of a 60-repeat allele would require 60 mutation events. Thus, 760 generations implies a rate for this process of at least 8% per meiosis (60/760). However, the 95% confidence limits on the age of PN1 are wide (3500-50 000 years, or ~140-2000 generations at the same 25 years per generation), giving a range of rates of 3-43%. Homogenization could require fewer events, and so proceed more rapidly, if mispairing involving misalignment by more than one repeat unit occurs frequently. The finding of intermediate alleles would allow direct mutation analysis, and hence a direct estimate of the rate of homogenization.
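The rate arithmetic in this paragraph can be reproduced directly. The sketch makes the same assumption as the text: homogenizing a 60-repeat allele takes 60 biased single-repeat events. It converts ages to generations at 25 years per generation, the figure implied by 19 000 years ~ 760 generations. The result is a minimum per-meiosis rate, matching the "at least" in the text. The function name is hypothetical.

```python
# 25 years per generation, implied by 19 000 years ~ 760 generations in the text.
YEARS_PER_GENERATION = 25.0


def min_rate_per_meiosis(n_events: int, age_years: float,
                         years_per_generation: float = YEARS_PER_GENERATION) -> float:
    """Minimum fraction of meioses that must carry a homogenizing event."""
    generations = age_years / years_per_generation
    return n_events / generations


if __name__ == "__main__":
    for label, age in [("point estimate", 19_000),
                       ("oldest 95% limit", 50_000),
                       ("youngest 95% limit", 3_500)]:
        print(f"{label}: {min_rate_per_meiosis(60, age):.0%}")
```

Running it prints 8%, 3% and 43% for the three ages, matching the figures quoted above.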
Patch gene conversion processes within tandem repeats are known to occur on a larger scale, for example in the Bombyx mori late chorion locus (12), but our observations here are unprecedented. There may be other instances of this kind of homogenization, a striking example of molecular drive (13), in other classes of MSY1 alleles, and our continuing diversity survey may detect them.
Taq cycle sequencing from flanking primer sites and three-state MVR-PCR using the original system were carried out as described (3). Discriminator primer sequences for the novel MVR-PCR system are the same as those used previously (3), but contain the additional substitution: Y1TAG1a, 5'-(Tag)-TGTGTATAATATACATCATGTATGTTG-3'; Y1TAG3a, 5'-(Tag)-TGTGTATAATATACATGATGTATGTTG-3'; Y1TAG3aR, 5'-(Tag)-CATCATGTATATTATACACAACATACATC-3'; and Y1TAG4aR, 5'-(Tag)-CATCATGTATATTATACATAACATACATC-3'. (Tag) is the 20mer 5' extension described in ref. 1: 5'-TCATGCGTCCATGGTCCGGA-3'. The first-phase annealing temperature is raised from 64 to 66°C for the new primers.
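The primer constructs listed above can be checked programmatically. This sketch adds the 20mer Tag to the 5' end of each discriminator sequence, with line-break hyphens removed and the Tag written in upper case. It then reports the single position at which the two primers of each forward or reverse pair differ. The dictionary layout and function names are assumptions made for illustration.

```python
from typing import Dict, List

TAG = "TCATGCGTCCATGGTCCGGA"  # 20mer 5' extension (upper-cased here)

# Discriminator sequences 5' -> 3', without the Tag.
DISCRIMINATORS: Dict[str, str] = {
    "Y1TAG1a": "TGTGTATAATATACATCATGTATGTTG",
    "Y1TAG3a": "TGTGTATAATATACATGATGTATGTTG",
    "Y1TAG3aR": "CATCATGTATATTATACACAACATACATC",
    "Y1TAG4aR": "CATCATGTATATTATACATAACATACATC",
}


def full_primer(name: str) -> str:
    """Full oligonucleotide: Tag extension followed by the discriminator."""
    return TAG + DISCRIMINATORS[name]


def mismatch_positions(a: str, b: str) -> List[int]:
    """1-based positions at which two equal-length primers differ."""
    if len(a) != len(b):
        raise ValueError("primers must be the same length")
    return [i for i, (x, y) in enumerate(zip(a, b), start=1) if x != y]


if __name__ == "__main__":
    for name in DISCRIMINATORS:
        print(name, len(full_primer(name)), "nt")
    print(mismatch_positions(DISCRIMINATORS["Y1TAG1a"], DISCRIMINATORS["Y1TAG3a"]))
    print(mismatch_positions(DISCRIMINATORS["Y1TAG3aR"], DISCRIMINATORS["Y1TAG4aR"]))
```

Each pair differs at exactly one base (position 17 in the forward pair, position 19 in the reverse pair), which is the base that separates the two primers of the pair.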
Some DNA samples have been described previously (14,15); others were gifts from Alec Jeffreys, Arpita Pandya and Chris Tyler-Smith.
Typing of Y-specific polymorphisms by PCR was as described: YAP (16), sY81 (7), PN1 and PN2 (6).
We thank many people who gave us DNA samples, Robert Griffiths for coalescent analysis of the PN1 mutation, and John Armour, Gabby Dover, Matt Hurles, Alec Jeffreys and Chris Tyler-Smith for comments on the manuscript. This work was supported by the EC as part of the Network Project: 'The Biological History of European Populations' (EC Contract 92-0032). P.G.T. and M.A.J. are supported by the Wellcome Trust; M.A.J. is a Wellcome Career Development Fellow (grant no. 044910).
Human Molecular Genetics
This page is run by Oxford University Press, Great Clarendon Street, Oxford OX2 6DP, as part of the OUP Journals
Last modification: 14 Mar 1998
Copyright© Oxford University Press, 1998.