Human Molecular Genetics
©1999 Oxford University Press
Mapping recombination hotspots in human phosphoglucomutase (PGM1)
Received March 29, 1999; Revised and Accepted June 16, 1999
DDBJ/EMBL/GenBank accession no. AJ243265
Human phosphoglucomutase (PGM1) is a highly polymorphic protein. Three mutations and four intragenic recombination events between the three mutation sites generate eight protein variants including the four universally common alleles, 1+, 1-, 2+ and 2-, and four others that are polymorphic in some Oriental populations, 3+, 3-, 7+ and 7-. The mutations 3/7, 2/1 and +/- are in exons 1A, 4 and 8, and are 40 and 18 kb apart, respectively. Using 12 polymorphic markers, including 2/1 and +/-, we have now obtained direct evidence for a high rate of intragenic recombination across this 58 kb region. From segregation analysis of PGM1 haplotypes in CEPH families, the recombination frequency was estimated to be 1.7%. We have also used a population genetics approach to map the patterns of linkage disequilibrium across the PGM1 gene in three diverse population samples (Caucasian, Chinese and Vietnamese). This has allowed us to compare indirect estimates of intragenic recombination with the meiotic data from family studies. Comprehensive pairwise allelic association analysis of the markers indicated the presence of two recombination 'hotspots': one between exons 1A and 4 and the other in the region of exon 7. These locations are in keeping with the meiotic data and with the original hypothesis of intragenic recombination based on PGM1 isozyme analysis.
INTRODUCTION
Protein polymorphisms usually arise from single amino acid substitutions as a result of point mutations. Other mechanisms include small deletions and insertions, as well as more extensive rearrangements of coding sequence. There are also instances of inter- and intragenic recombination leading to protein polymorphism, and well-known human examples include the haptoglobin locus, the globin genes and visual pigment gene polymorphisms (1-3). In all of these cases, there is unequal (i.e. non-reciprocal) crossing-over, which inevitably leads to gain or loss of coding sequence and a novel protein. Reciprocal intragenic recombination, however, exchanges genetic information without altering the size of the coding sequence, and produces relatively subtle protein variation that may not be as easy to distinguish from the more frequent point mutation polymorphisms.
Several years ago, it was predicted from protein analysis that the classical human phosphoglucomutase (PGM1) isozyme polymorphism was attributable to a combination of point mutation and intragenic reciprocal recombination (4,5). Furthermore, it was suggested that recombination was relatively frequent (0.5%) within the PGM1 gene (6). The molecular basis of the PGM1 isozyme polymorphism was determined by studies on the four universally common alleles (1+, 1-, 2+, 2-) and four alleles (3+, 3-, 7+, 7-) that are polymorphic in some Oriental populations (7,8). It was concluded that these eight alleles represent haplotypes generated by reciprocal intragenic recombination between three diallelic polymorphic sites separated by 40 kb (between 7/3 in exon 1A and 2/1 in exon 4; site B) and by 18 kb (between 2/1 and +/- in exon 8; site A) (9).
Recently, we generated a panel of new intronic markers designated N1a, N1b, M1, M2a, M2b, M3, M4, M5a, M5b and M6 (10), principally single nucleotide polymorphisms, in the vicinity of the three polymorphic isozyme sites in order to study the diversity and stability of the PGM1 gene in more detail. We have used these markers in family studies to obtain a direct estimate of the recombination frequency and to map the sites of crossing-over within the PGM1 gene. We have also attempted to define the boundaries of the two regions of recombination using pairwise allelic association analyses in three populations.
These studies contribute to current efforts in defining the stability of the human genome, the role of recombination in generating haplotype diversity within functional genes and non-coding sequence and the genetic analysis of complex traits. Detailed knowledge of the regions where there is active recombination in the human genome and the construction of a comprehensive 'hotspot' map may be of significance in the efforts to identify susceptibility loci for complex common diseases. This information may also be of relevance to studies of mutational events in somatic cells, for example in cancer genetics (11), and in the study of the meiotic recombination events associated with the formation of chiasmata in synaptonemal complexes (12).
This work on the PGM1 locus provides a model system to complement other gene-based studies of human recombination such as the pioneering work on β-globin (13) and insulin (14) and current work on the β-globin gene cluster (15) and lipoprotein lipase (16,17). Here we are concerned with the analysis of normal variation within general populations, but of course these approaches are applicable to investigations of clinical disorders at the pathological extremes of the phenotypic distribution.
RESULTS
Family studies
We investigated the patterns of inheritance of intragenic PGM1 markers M1-M6, N1, 2/1 and +/- (Fig. 1) in the CEPH families. To simplify this analysis, polymorphic sites that occurred on the same PCR fragment were amalgamated. Thus N1a and N1b, M2a and M2b, and M5a and M5b became N1, M2 and M5, respectively. Out of a total of 290 informative meioses, five recombinant chromosomes were identified (Figs 2 and 3) and all, except in family 1333, were of paternal origin. In families 1333 and 1413, the breakpoints were in site A. In the case of family 1333, the parental origin of recombination could not be identified but the event occurred in the 7.8 kb region between M2 and +/-, and in family 1413 recombination occurred in the 1.5 kb region flanked by M3 and M4. In families 45 and 66, the breakpoints were in site B between N1 and 2/1. In family 884, it was not possible to determine from the haplotypes whether the exchange occurred in site A or site B since the father was homozygous for all the markers between N1 and M2.
Figure 1. Intron/exon structure of PGM1 showing polymorphic loci and the two regions of active recombination.
Figure 2. Recombination events in five CEPH pedigrees. In family 45, the haplotypes were deduced from the sibship. In family 1333, it was not possible to determine in which parent the crossover had occurred.
Population analysis
Pairwise allelic association analysis (18) of the seven polymorphic PCR fragments with the isozyme markers 2/1 and +/- suggested that the minimal region for recombination in site A is a segment of 2000 bp spanned by the M4 and M2 fragments, and that a second site (site B) lies between the N1 fragment and exon 4 (data not shown). Overall, there was agreement between the results from each of the three population samples.
In order to refine the boundaries of the recombination site between exons 4 and 8 (site A), we carried out comprehensive allelic association analysis for all pairs of polymorphic sites, including the isozyme sites; a total of 45 pairwise comparisons. Figure 4 shows the strength of linkage disequilibrium, D′, plotted for each of the adjacent pairs of polymorphic sites, and Figure 5 summarizes the distribution of P-values for each of the three population samples. These results suggest that site A is bounded by M5b and M6, a 750 bp region. The Caucasian and Vietnamese samples are in particularly good agreement, but the Chinese data are more difficult to interpret due to the broad spread of the P-values in the neighbourhood of M4 (IVS6 -665) and a somewhat low value of D′. Nonetheless, the overall distribution pattern of the statistics is consistent among all three population groups.
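The figure of 45 comparisons is what simple counting gives for ten sites. A minimal sketch follows, assuming the ten sites are the eight site A intron markers named in Materials and Methods plus the 2/1 and +/- isozyme sites; that reading of "all pairs of polymorphic sites" is an assumption, and the order of names in the list is arbitrary rather than physical.

```python
from itertools import combinations

# Assumed site list: eight site A intron markers plus the two isozyme sites.
# The order here is arbitrary and is not meant to reflect map position.
SITES = ["2/1", "M1", "M2a", "M2b", "M3", "M4", "M5a", "M5b", "M6", "+/-"]


def pairwise_comparisons(sites):
    """Return every unordered pair of distinct sites."""
    return list(combinations(sites, 2))


pairs = pairwise_comparisons(SITES)
print(len(pairs))  # 45
```

Each pair corresponds to one association test, so the count grows as n(n - 1)/2 with the number of sites typed.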
We have carried out allelic association analysis in site B using two markers, N1a and N1b, between exons 1A and 4 that contain the 3/7 and 2/1 isozyme sites, respectively (Fig. 4). With one exception, the results are consistent across all three populations in that neither marker shows association with the two downstream isozyme sites 2/1 and +/-, or the intron markers M1-M6. The single exception is the association of N1a and N1b with the +/- site in the Caucasian group. There is strong linkage disequilibrium between N1a and N1b (|D′| = 1.00) whereas N1b and the 2/1 site appear to be in linkage equilibrium, suggesting that there is active recombination in site B.
DISCUSSION
We have used allelic association and haplotype-based approaches in CEPH families and three populations to attempt the localization of the genomic positions of two recombination sites (A and B) within the PGM1 gene. The family studies provided direct evidence of a high frequency of intragenic meiotic recombination. The overall recombination rate for the 58 kb region of PGM1 we studied was estimated at 1.7% (5/290, 95% CI 0.63-4.20%), which equates to 29 cM/Mb, whereas the male recombination rate was 2.1% (four out of 141 male meioses) or 36 cM/Mb. There were no certain female recombinant meioses. Two of the five events (in families 1333 and 1413) led to crossing over in a 12 kb region within site A, giving an overall estimated recombination rate for this region of 0.8%, or 66 cM/Mb. There were also two recombinants assigned unambiguously to site B (in families 45 and 66). The fifth recombinant (in family 884) could not be assigned to a specific site, but if it occurred in the larger target region, site B, the male rate increases from 1.4 to 2.1%, ~50 times greater than expected. A previous estimate of female meiotic recombination of 0.5% between 2/1 and +/- (~30 cM/Mb) derived from isozyme analysis of a large data set (6) contrasts with our data and suggests that there may be a difference in the sex-specific recombination rates for the PGM1 gene.
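The cM/Mb figures quoted above can be checked by dividing each recombination frequency by the physical length of its interval. The sketch below assumes that 1% recombination corresponds to about 1 cM, which is reasonable at these small frequencies; the function name `cm_per_mb` is illustrative and not part of the original analysis.

```python
def cm_per_mb(recombination_percent, region_kb):
    """Convert a recombination frequency (%) over a region (kb) to cM/Mb.

    Assumes 1% recombination is approximately 1 cM, which holds for the
    small frequencies discussed here.
    """
    if region_kb <= 0:
        raise ValueError("region_kb must be positive")
    return recombination_percent / (region_kb / 1000.0)


overall = cm_per_mb(1.7, 58)  # whole 58 kb region, all meioses
male = cm_per_mb(2.1, 58)     # whole 58 kb region, male meioses
site_a = cm_per_mb(0.8, 12)   # 12 kb region within site A
print(round(overall, 1), round(male, 1), round(site_a, 1))
```

The three values come out at roughly 29, 36 and 67 cM/Mb, in line with the quoted 29, 36 and 66 cM/Mb once rounding is allowed for.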
Population genetics studies were used to investigate the possible boundaries of recombination by measuring allelic association. Strong association between pairs of markers suggests lack of recombination, whereas weak association may be evidence for a history of active recombination between them, i.e. possible hotspots. The validity of this interpretation depends on there being no strong selection at the gene, that the populations are not stratified and that the mutations underlying the polymorphisms are approximately the same age. To overcome the difficulties of excluding selection and determining admixture and the ages of the mutations, we used three independent population groups. We reasoned that consistent results between these groups would be likely to reflect patterns of recombination rather than the effects of population admixture, non-uniform mutational history and weak selection.
Our allelic association analysis suggests that the recombination at site A occurs mainly in a 750 bp region flanked by M5b (IVS6 -49) and M6 (IVS7 +584), although the entire region between M4 and M2b is active. Markers either side of the latter pair of polymorphic sites show strong association. There is virtually no association between M5b and M6, suggesting that there is a recombination hotspot embedded in a relatively cold region. These results are consistent with the family data: one recombination event occurred between M3 and M4 and the other was in the vicinity of M6 (Fig. 3). Fewer markers cover the region of recombination in site B and the population statistics are less informative. Nevertheless N1b (IVS1 +239) is in linkage equilibrium with the 2/1 site in all three populations, again supporting the family data derived from this large region of ~40 kb.
Figure 3. PGM1 recombinant haplotypes from five CEPH families. The top two represent hypothetical haplotypes that differ at each locus. Only the informative markers are shown on the recombinants. Four of the five recombination events are paternal (families 45, 66, 884 and 1413); the fifth, in family 1333, could not be assigned. In families 45 and 66, recombination occurred in site B between N1 in IVS1 and 2/1. In families 1413 and 1333, recombination occurred in site A between M3 and M4 in IVS6, and between M2 and +/-, respectively. The location of crossover for family 884 is less well resolved.
Figure 4. Localization of recombination hotspots in the PGM1 locus using the linkage disequilibrium parameter D′. The x-axis shows the positions of exons and all polymorphic sites referred to in the text. The y-axis shows absolute values of D′ for adjacent pairs of polymorphic sites, plotted midway between each pair of markers.
Figure 5. Allelic association analysis of 45 pairwise comparisons of polymorphic sites in three populations. The results are expressed as P-values. Black cells indicate strong allelic association with P ≤ 0.01. Grey cells suggest the absence of association, P > 0.05, and hatched cells represent weak allelic association, 0.05 > P > 0.01.
It is noteworthy that site A includes exon 7 (116 bp), which may be part of the hotspot. PGM1 exhibits a high incidence of null alleles that have an estimated combined gene frequency of 0.017 (19,20). Is it conceivable that some of these are due to errors in recombination involving exon 7? We have some evidence for genomic instability in the neighbourhood of exon 7. For example, three insertion/deletion (InDel) polymorphisms have been identified in a 1 kb region that includes exon 7: a single T InDel at IVS6 -665 → -656 (the M4 marker), a 14 bp InDel at IVS6 -176/-175 (the M5a marker in Caucasians) and the single A InDel at IVS7 +160 → +163 (10). These are the only InDel polymorphisms that we have found in the PGM1 gene. Given the presence of the hotspot, it is plausible that unequal crossovers or gene conversion could underlie these localized events and exon 7 may be prone to frequent mutations by similar processes. The entire 6.5 kb genomic DNA sequence (GenBank accession no. AJ243265) from IVS4 up to and including the first 1500 nucleotides of IVS7 was scrutinized for motifs possibly associated with recombinogenic activity in human DNA (15,21-27). Many candidate motifs that have previously been associated with recombination activity were discovered, including SINEs, a LINE, the human minisatellite GGGCAGGARG, a perfect χ sequence GCTGGTGG, many Ig switch sequences (GAGCT and GGGCT) and two translin-binding sites (ATGCAG and GCCCWSSW). However, in all of these cases, these motifs were also found in two 'control' genomic sequences, DRA1 and β-tubulin, not associated with particularly high levels of recombination. The most interesting observation was the discovery of two sequence motifs (a PUR element, GGNNGAGGGAGAARRRR, and a consensus 21 bp motif, WAWTTDDWWWDHWGWHMAWTT) previously associated with origins of DNA replication (24,26) and recently reported in the vicinity of the δ/β-globin recombination hotspot on chromosome 11.
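Several of the motifs listed above are written with IUPAC ambiguity codes (R, W, S, D, H, M, N). One way such a scan could be carried out is sketched below; this is an illustration, not the procedure used in the study. The helper names `motif_to_pattern` and `find_motif` are hypothetical, the test sequence is invented, and only the strand supplied is searched.

```python
import re

# Standard IUPAC nucleotide ambiguity codes as regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def motif_to_pattern(motif):
    """Translate an IUPAC motif into a regular expression string."""
    return "".join(IUPAC[base] for base in motif.upper())


def find_motif(motif, sequence):
    """Return 0-based start positions of all matches, including overlaps.

    A lookahead is used so that overlapping occurrences are not skipped.
    Only the strand given in `sequence` is searched.
    """
    pattern = re.compile("(?=" + motif_to_pattern(motif) + ")")
    return [m.start() for m in pattern.finditer(sequence.upper())]


# Invented toy sequence containing one copy of the minisatellite motif.
toy = "TTGGGCAGGAAGTT"
print(find_motif("GGGCAGGARG", toy))  # [2]
```

To cover both strands, the same search would be repeated on the reverse complement of the sequence. Short motifs such as GAGCT are expected to occur by chance in most sequences of this length, which is why the comparison with control sequences matters.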
Using two distinct approaches, family studies and allelic association/linkage disequilibrium analysis, we have provided evidence for the location of two regions of active recombination in the PGM1 gene. The family studies provided direct evidence for contemporary intragenic recombination and the population surveys revealed the historical record of past recombinations. To avoid statistical problems arising from multiple testing, we found it was helpful to screen a large number of individuals in diverse populations. The limitation of mapping recombination sites by the allelic association approach arises from its dependence on suitable marker polymorphisms, but this is a feature of all other methods. The resolving power is restricted to the region spanned by the nearest pair of markers. From the population data, our most closely mapped area is the region of 750 bp bounded by M5b and M6 in site A. It remains to be determined whether site A is homologous to site B and whether the mechanism of recombination involves normal reciprocal crossing-over or gene conversion or a mixture of both. The data we have collected from family and population studies are fully consistent with the molecular phylogeny originally based on isozyme analysis (5) and confirmed by nucleotide sequence data (7,8), i.e. there is evidence for two recombination hotspots between the three polymorphic sites which accounts for the eight PGM1 isoforms.
MATERIALS AND METHODS
Populations
Three population samples comprising healthy unrelated individuals were investigated: Caucasian, mainly North European (n = 169), Hong Kong (southern) Chinese (n = 222) and Vietnamese (n = 187). These groups undoubtedly represent differing degrees of population admixture. It is likely that the southern Chinese are the least admixed, whereas the Caucasians might be expected to be the most admixed. However, it is unlikely that any of the three groups have admixed recently with either of the other two, and thus we can regard them as being genetically distinct. The Vietnamese samples were from the recent immigrant population in Hong Kong, which consists mainly of South Vietnamese. Further details of these population samples and the basic population genetic statistics have been reported (10).
Family data
Centre d'Etude du Polymorphisme Humain (CEPH) samples were available locally as purified DNA solutions. Segregation of PGM1 haplotypes was studied in informative families (n = 26); typical pedigrees are shown in Figure 2. PGM1 protein genotype data for CEPH families were obtained from the CEPH public domain database.
Isozymes
PGM1 isozyme phenotypes were determined by isoelectric focusing of haemolysates from EDTA blood as described (28) followed by enzyme activity staining (29). Alternatively, genomic DNA was amplified by duplex PCR followed by single-stranded conformation polymorphism (SSCP) analysis. A 181 bp fragment encompassing the 2/1 polymorphic site in exon 4 was amplified using the primers W4F and W4R (Table 1). Another fragment (299 bp) containing the +/- polymorphic site in exon 8 was amplified from JD8-1F and JD8-4R (Table 1). The PCR mixture (25 µl) contained 1× reaction buffer (50 mM KCl, 10 mM Tris-HCl, pH 9.0, 0.1% Triton X-100), 2.0 mM MgCl2, 0.2 mM of each dNTP, 0.5 µM of the primer pair W4F/W4R, 0.3 µM of the primer pair JD8-1F/JD8-4R, 1 U of Taq DNA polymerase (Promega) and 50-100 ng of genomic DNA. Amplification was carried out for 35 cycles of 94°C/20 s, 63°C/20 s and 72°C/30 s, and followed by a final extension of 10 min at 72°C. Electrophoresis for SSCP analysis in 10%T/2.6%C polyacrylamide gels at 13°C for 4 h at 400 V used procedures described previously (10).
Table 1. Primer sequences
| Polymorphic site | Primer | Sequence (5′→3′) |
| 2/1 | W4F | GCA GGT TTA CAG CAA TAT AGT CAC A |
| | W4R | TGA AGC ATC ATG ATA CAC ACA GAA G |
| +/- | JD8-1F | CCT CCA GGT TCT GAC CAC ATC CG |
| | JD8-4R | CCC ACC TTA CCT TGT ACC CCA GC |
| ARMS | | |
| IVS6 -665/-656 | Y6iASFA1 | AAG GGA AAG GAA TTT TTT TTT AAG TCA |
| | Y6iASFB1 | AAG GGA AAG GAA TTT TTT TTT TAA GTC |
| Common primer | Y7iR3 | TAT CTT TTC ACT AGG CTC AAC ACT G |
Site A genetic markers
A series of eight intron marker polymorphisms located between 2/1 and +/- and designated M1, M2a, M2b, M3, M4, M5a, M5b and M6 were typed by SSCP analysis of six PCR fragments (M1-M6) using procedures described previously (10). Direct analysis of haplotypes across six polymorphic sites was done using an amplification refractory mutation system (ARMS) on the 2.1 kb region between M4 and M2b, followed by individual site-specific amplification and SSCP analysis. The two ARMS primers were used separately with a common primer (Table 1) in a reaction mixture similar to our standard PCR amplification with 0.5 µM of each primer, except for the addition of 0.01% acetylated bovine serum albumin (BSA), 1.5 mM MgCl2 and 0.5 U of Taq polymerase. Hot start was initiated by adding MgCl2 in the initial denaturation phase of 95°C/5 min and the reaction was cycled at 94°C/1 min, 65°C/1 min 10 s and 72°C/3 min for 35 cycles.
Site B genetic markers
Two intron marker polymorphisms located just downstream of the 3/7 site and designated N1a and N1b were typed by SSCP analysis of a PCR-amplified fragment N1 as previously reported (10).
Statistical analysis
Gene frequencies for all markers have been reported (10). Pairwise allelic association analysis of the intron marker polymorphisms and the isozyme markers was carried out using the ASSOCIATE program (18) with 100 iterations. For linkage disequilibrium analysis, the delta value (D) and the expected haplotype frequencies were used to calculate D′, where D′ = D/Dmax, with Dmax = min(p1q2, p2q1) when D is positive and Dmax = min(p1q1, p2q2) when D is negative; p1 and p2 are the allele frequencies for one locus and q1 and q2 are those for another. The absolute value of D′ was used to compare the different groups since the sign of the disequilibrium was irrelevant to the present analysis.
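The normalized disequilibrium coefficient defined above can be written out as a short function. This is a minimal sketch rather than the ASSOCIATE implementation: it assumes D is obtained as the frequency of the haplotype carrying allele 1 at both loci minus the product p1q1, and the function name `d_prime` and its arguments are illustrative.

```python
def d_prime(hap11, p1, q1):
    """Normalized linkage disequilibrium D/Dmax for two diallelic loci.

    hap11: frequency of the haplotype carrying allele 1 at both loci
    p1:    frequency of allele 1 at the first locus
    q1:    frequency of allele 1 at the second locus
    """
    p2, q2 = 1.0 - p1, 1.0 - q1
    d = hap11 - p1 * q1
    # Dmax depends on the sign of D, as in the definition in the text.
    if d >= 0:
        d_max = min(p1 * q2, p2 * q1)
    else:
        d_max = min(p1 * q1, p2 * q2)
    if d_max == 0:
        # Undefined when a locus is monomorphic; 0.0 is a convention here.
        return 0.0
    return d / d_max


# Complete association, independence, and complete negative association.
print(abs(d_prime(0.5, 0.5, 0.5)), abs(d_prime(0.25, 0.5, 0.5)), abs(d_prime(0.0, 0.5, 0.5)))
```

Taking the absolute value, as in the last line, matches the use of |D′| for comparing populations: a value near 1 indicates little historical recombination between the two sites and a value near 0 indicates linkage equilibrium.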
ABBREVIATIONS
InDel, insertion/deletion; IVS, intervening sequence; PGM, phosphoglucomutase; SSCP, single-stranded conformation polymorphism.
ACKNOWLEDGEMENTS
The study was supported by the UK Medical Research Council. S.P.Y. was a recipient of the Commonwealth Academic Staff Scholarship awarded by the Commonwealth Scholarship Commission in the UK and, during the study in London, was on study leave granted by his home university, the Hong Kong Polytechnic University (HKPU). Part of the work was performed in his parent department at HKPU and this fieldwork was partly funded by his parent department. We are also grateful to EUROGEM for access to the CEPH DNA samples.
REFERENCES
+To whom correspondence should be addressed. Tel: +44 020 7504 5038; Fax: +44 020 7387 3496; Email: d.whitehouse@galton.ucl.ac.uk