| Human Molecular Genetics | Pages |
Analysis of germline mutation spectra at the Huntington's disease locus supports a mitotic mutation mechanism
Introduction
Results
Variation in germline mutation spectra within an individual
Variation in germline mutation spectra between individuals
Effects of parental origin on HD allele instability
Interallelic effects on instability
Models of trinucleotide repeat mutation
Using the mutation spectra to estimate the allele-specific HD mutation rate
The Okazaki fragment processing model of trinucleotide repeat mutations
Modeling HD mutations
Fitting the model to the data
Discussion
Dynamic mutations are likely to occur during germline mitotic divisions
HD dynamic mutations may result from a defect in Okazaki fragment processing
The role of paternal age in determining the mutation spectra
Effect of genetic background on instability
Materials And Methods
Single sperm isolation and preparation and PCR
Reliability of the sperm typing data
Statistical analysis
Probabilistic model
A model for expansions and contractions
Estimation of parameters
Acknowledgements
References
INTRODUCTION
At least 11 neurodegenerative diseases result from expansion in the number of trinucleotide repeats in or adjacent to a protein coding gene (1-4). For most of these diseases the unstable sequence consists of CAG/CTG triplets. A remarkable characteristic of trinucleotide repeat disease alleles is that they can undergo dynamic mutations in which repeat number may change when the disease gene is transmitted from an affected parent to the offspring. The molecular basis of this dynamic mutation process is of great fundamental interest and stands in contrast to the stable transmission of other disease mutations.
Trinucleotide repeat instability is influenced by the sex of the transmitting parent, the number of repeats and the purity of the repeat tract (1-4). The extent to which other factors such as age, genetic background or interallelic effects may contribute to dynamic mutation is less certain and more difficult to analyze (5-9). Since only a small number of offspring is conceived by any one affected parent, it is difficult to study the relationship between mutation characteristics and other variables in a single individual. Pooling transmission data from many different individuals may confound the analysis of important variables. For example, the pooled size distribution of mutant alleles found among offspring of affected parents with the same mutant allele size could reflect a random sampling from the mutation size distribution characteristic of that somatic allele size in any individual. On the other hand, the pooled distribution could reflect sampling from mutation size distributions that differed significantly among the parents. Fortunately, the large sample sizes afforded by single genome analyses such as single sperm typing allow accurate estimates of the germline mutation frequency and the size distribution of mutant sperm (the mutation spectrum) in individual males. Each sperm represents a potential paternal transmission.
A detailed description of the human dynamic mutation process in individual males is useful in three regards. First, it can define a precise quantitative standard against which the results from experimental model systems can be judged for their applicability to human dynamic mutations. Second, it can provide clues as to what molecular mechanisms may be contributing to the mutation process. Finally, with the proper experimental design the data can be used to examine the role of genetic and possibly environmental factors in genetic instability.
Huntington's disease (HD) is associated with progressive disordered movements, decline in cognitive function and emotional disturbance. Disease-causing alleles exhibit significant instability, especially when transmitted paternally (9-17). To begin to unravel the contributions of different variables to dynamic mutation at the HD locus, we studied 26 men from the large Venezuelan HD cohort (18). We generated mutation spectra from individuals with somatic allele sizes ranging from 37 to 62 repeats. We used sperm from affected individuals or individuals at risk, including siblings, cousins and a father and his son. Our data were compared with the mutation spectra expected under a simple Okazaki fragment processing model of trinucleotide repeat instability.
RESULTS
A number of studies have shown that somatic variation in HD allele size is extremely limited compared with germline variation (17,19,20). Consequently, trinucleotide repeat mutations can be detected if the HD allele repeat number in each sperm is compared with the allele size in somatic DNA from the same individual (19). Data from the analysis of 27 sperm samples, including three published previously (19), are shown in Table 1. Included is the age of each donor, the somatic HD allele repeat number, the observed expansion and contraction mutation frequencies and the mean and standard deviation of the change in repeat number.
Table 1.
| Donor ID | (CAG)n | Age (years) | Sample size | Mean | Standard deviation | % expansion | % contraction | E/O | P/M | pS × 10^-4 | pD | pL × 10^-3 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| A | 62 | 18 | 168 | 32.51 | 11.15 | 99 | 0 | E | P | 120.0 | 0.51 | 400.0 |
| B | 62 | 17 | 118 | 9.74 | 6.90 | 94 | 4 | O | M | 210.0 | 0.00 | 310.0 |
| C | 53 | 21 | 100 | 12.98 | 9.82 | 89 | 8 | E | M | 200.0 | 0.45 | 140.0 |
| D | 51 | 29 | 95 | 20.35 | 12.25 | 97 | 3 | O | P | 210.0 | 0.15 | 170.0 |
| E | 50 | 29 | 61 | 15.39 | 13.16 | 87 | 10 | O | P | 260.0 | 0.23 | 120.0 |
| F | 49 | 23 | 94 | 10.41 | 8.24 | 95 | 3 | O | M | 8.0 | 0.78 | 41.0 |
| G | 48 | 31 | 113 | 3.36 | 5.12 | 65 | 19 | O | M | 6.4 | 0.78 | 8.0 |
| H | 48 | 26 | 109 | 2.67 | 5.51 | 59 | 29 | O | M | 13.0 | 0.88 | 5.6 |
| I | 47 | 34 | 114 | 4.88 | 7.65 | 79 | 16 | O | P | 13.0 | 0.86 | 6.8 |
| J | 46 | 27 | 113 | 0.86 | 3.22 | 45 | 34 | O | M | 11.0 | 1.00 | 0.5 |
| K | 45 | 52 | 136 | 3.40 | 6.51 | 63 | 26 | O | M | 5.1 | 0.86 | 2.8 |
| L | 45 | 41 | 367 | 4.19 | 6.40 | 72 | 15 | O | M | 10.0 | 0.80 | 5.8 |
| M | 45 | 31 | 63 | 6.41 | 6.07 | 98 | 0 | O | M | 0.0 | 0.66 | 21.0 |
| N | 45 | 29 | 106 | 2.98 | 5.44 | 75 | 13 | O | M | 6.1 | 0.73 | 8.6 |
| O | 44 | 54 | 128 | 1.56 | 4.10 | 45 | 27 | E | P | 2.3 | 0.83 | 1.5 |
| P | 44 | 52 | 107 | 23.37 | 30.31 | 85 | 8 | E | M | 27.0 | 1.00 | 9.6 |
| Q | 44 | 44 | 152 | 0.67 | 3.54 | 42 | 41 | O | M | 5.6 | 0.90 | 0.6 |
| R | 44 | 30 | 158 | 1.20 | 2.18 | 61 | 22 | O | P | 7.4 | 0.00 | 11.0 |
| S | 44 | 21 | 116 | 1.13 | 2.26 | 53 | 22 | O | M | 8.6 | 0.48 | 11.0 |
| T | 43 | 52 | 121 | 2.56 | 3.61 | 77 | 12 | O | M | 4.3 | 0.51 | 5.5 |
| U | 43 | 17 | 121 | 0.42 | 1.26 | 41 | 19 | O | M | 6.4 | 0.27 | 9.8 |
| V | 42 | 35 | 112 | 1.75 | 2.28 | 70 | 12 | O |  | 3.6 | 0.29 | 9.3 |
| W | 41 | 38 | 191 | 0.65 | 1.91 | 49 | 23 | O | M | 3.5 | 0.35 | 2.8 |
| X | 40 | 18 | 183 | -0.01 | 1.42 | 31 | 33 | E | M | 11.0 | 0.55 | 0.8 |
| Y | 39 | 65 | 180 | 2.73 | 6.82 | 69 | 15 | E | P | 2.4 | 0.76 | 2.4 |
| Z | 39 | 63 | 210 | 2.26 | 3.26 | 73 | 18 | E | P | 3.3 | 0.47 | 4.2 |
| AA | 37 | 37 | 136 | 0.04 | 1.10 | 26 | 24 | E | M | 2.0 | 0.16 | 0.3 |
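As a hedged illustration of how the columns of Table 1 relate to one another, the sketch below adds the expansion and contraction percentages to give an overall mutation frequency for the five donors with at least 50 repeats. The function and variable names are hypothetical, and the percentages are the rounded values from the table, so the result is only approximate.

```python
# Rows copied from Table 1 for donors with at least 50 CAG repeats:
# (donor, somatic repeat number, % expansion, % contraction).
LONG_ALLELE_ROWS = [
    ("A", 62, 99, 0),
    ("B", 62, 94, 4),
    ("C", 53, 89, 8),
    ("D", 51, 97, 3),
    ("E", 50, 87, 10),
]


def mutation_frequency(pct_expansion, pct_contraction):
    """Percentage of HD sperm whose repeat number differs from the somatic allele."""
    return pct_expansion + pct_contraction


# Per-donor mutation frequency and its average over the five donors.
frequencies = {
    donor: mutation_frequency(expansion, contraction)
    for donor, _repeats, expansion, contraction in LONG_ALLELE_ROWS
}
mean_frequency = sum(frequencies.values()) / len(frequencies)
print(frequencies, round(mean_frequency, 1))
```

Under these assumptions the average is close to the 98% figure quoted in the text for individuals with at least 50 repeats.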
Variation in germline mutation spectra within an individual
We examined whether differences in mutation spectra existed between sperm samples taken at age 63 (sample Z) and age 65 (sample Y) from the same donor. The mutation spectra show indistinguishable patterns except for two sperm in the older sample with allele sizes considerably larger than those seen in the age 63 sample. There is no statistically significant difference between the two samples in the mutation frequency (P = 0.08) or the mean change in repeat number (P = 0.40).
Variation in germline mutation spectra between individuals
Qualitatively, the mutation spectra of different individuals can sometimes vary dramatically, even among individuals with similar ages and somatic allele lengths. Compare, for example, the spectra of individuals A and B shown in Figure 1.
The mutation frequency for each individual's HD allele was calculated from the mutation spectra by dividing the number of sperm that differed in size from the originally inherited HD allele by the total number of HD sperm. The mutation frequency always exceeds 50% and in most cases is >80%. For individuals with at least 50 repeats the average mutation frequency was 98%. The expansion mutation frequency in an individual almost always exceeds that for contractions. As shown in Figure
Effects of parental origin on HD allele instability
It is possible that the parental origin of the HD allele shows an effect on instability. Therefore we divided the sample of 25 distinct individuals with known origin into two groups depending on whether the HD allele was paternally (seven donors) or maternally (18 donors) inherited (Table 1). Using permutation tests (see Statistical analysis), we concluded that parental origin has no discernable effect on the average somatic repeat number. This being the case we further compared the average of the mean change in repeat number and the average mutation frequency in sperm. No effect of parental origin was detected.
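The permutation comparison described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name, the seed and the use of Python's `random` module are assumptions, while the repeat numbers and the paternal/maternal grouping are taken from Table 1 (donor V, of unknown origin, is omitted and samples Y and Z are counted as one donor).

```python
import random

# Somatic (CAG)n repeat numbers from Table 1 for the 25 donors of known origin.
SOMATIC_REPEATS = {
    "A": 62, "B": 62, "C": 53, "D": 51, "E": 50, "F": 49, "G": 48, "H": 48,
    "I": 47, "J": 46, "K": 45, "L": 45, "M": 45, "N": 45, "O": 44, "P": 44,
    "Q": 44, "R": 44, "S": 44, "T": 43, "U": 43, "W": 41, "X": 40, "Y/Z": 39,
    "AA": 37,
}
# Donors whose HD allele was paternally inherited (P in the P/M column).
PATERNAL = ["A", "D", "E", "I", "O", "R", "Y/Z"]


def permutation_fraction(values, group, n_perm=10_000, seed=0):
    """Return (observed group mean, fraction of random groups with a larger mean)."""
    rng = random.Random(seed)  # fixed seed is an assumption, for repeatability
    pool = list(values.values())
    k = len(group)
    observed_sum = sum(values[d] for d in group)
    exceed = 0
    for _ in range(n_perm):
        # Draw k donors without replacement, as in a random relabelling.
        if sum(rng.sample(pool, k)) > observed_sum:
            exceed += 1
    return observed_sum / k, exceed / n_perm


observed_mean, fraction_larger = permutation_fraction(SOMATIC_REPEATS, PATERNAL)
print(round(observed_mean, 1), fraction_larger)
```

The same function could be applied to the mean change in repeat number or to the mutation frequency by swapping in the corresponding column of Table 1.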
Figure 1. Mutation spectra of sperm samples. The vertical bars are actual data, shown as fractions of the sample size. Data on donors D, E and a portion of Z were previously published (19). The curves trace the distribution expected according to the model described in the text. Note the three distinct x-axis scales that are used.
Interallelic effects on instability
There is a (CCG)n polymorphism almost immediately adjacent to the HD CAG repeat tract (21,22). Our donors could be divided into those that are homozygous for the (CCG)7 allele (19 donors) and those heterozygous for (CCG)7 and (CCG)10 (seven donors) (Table 1). Using the permutation approach described above, we also found that (CCG)n genotype has no significant effect on the average somatic repeat number. Similarly, we found no effect of the (CCG)n polymorphism on the average of the mean change in repeat number and the average mutation frequency in sperm.
Models of trinucleotide repeat mutation
Our focus is on using our data to help us to understand the molecular mechanism of dynamic mutation at the HD locus. In the next sections, we describe a model (23,24) based on Okazaki fragment processing and show how it can be fitted to the observed mutation spectra.
Figure 2. Relationship between the mean change in allele size resulting from mutation and number of repeats in the somatic HD allele of each donor.
Molecular events occurring during recombination or DNA replication have been hypothesized to explain instability at microsatellite repeat loci. Based on model organisms, there is little experimental support for a recombination mechanism. Studies on Escherichia coli and yeast strains carrying mutants that dramatically reduce recombination result in little increase in microsatellite (including trinucleotide repeat) instability (25-29). Triplet repeat expansions may occur during DNA replication as a result of slippage or deficient Okazaki fragment processing.
Slippage is thought to be the basis of microsatellite repeat mutations observed in yeast and in certain human colon cancers (26,30-34). Loops created by replication slippage can lead to either expansion or contraction mutations depending on whether loop formation occurs on the nascent or template strand, respectively. Recent experiments in yeast suggest that triplet repeat expansions occur during DNA replication of the lagging strand (24,28,35-37). DNA flaps generated during lagging strand synthesis are normally processed by the Okazaki fragment flap endonuclease Fen-1 (38,39). Flaps containing certain trinucleotide repeat sequences are hypothesized to resist Fen-1 cleavage (23). This could lead to loop formation on the nascent strand and result primarily in expansion mutations.
Figure 3. Relationship between mutation frequency and somatic HD allele size of each donor.
Using the mutation spectra to estimate the allele-specific HD mutation rate
Assuming that germline mutations in the HD gene are generated during mitotic DNA replication, the mutation rate per cell division might be estimated if the number of cell divisions that preceded sperm formation is known. However, dynamic mutations are unlike classical mutations: not all mutant sperm may have experienced the same number of mutation events. Thus, a sperm that has gained 20 repeats compared with somatic DNA could have undergone the expansion due to a single mutation event at one division, could have experienced multiple, but smaller, expansion events occurring over many divisions or could have had an extensive history of both expansions and contractions. Due to this uncertainty, new methods of mutation rate analysis need to be devised. Any estimate of the dynamic mutation rate per cell division using mutation frequency data must consider both the nature of the mutation event (how many repeats are added or subtracted per event) as well as the total number of mutation events that have occurred over the sperm's DNA replication history.
Our approach to estimating the HD mutation rate is based on comparing the observed mutation spectra with spectra obtained by modeling the mutation process and the history of spermatogonial stem cell divisions. We note that for most of the CAG/CTG diseases, greater instability is observed in male than in female transmissions (1-4); this may be influenced by the continual mitotic divisions experienced by spermatogonia. One candidate model, proposed earlier in the development of replication slippage models for microsatellite mutations, is the stepwise mutation model (40,41). In our setting, this model allows for the addition or deletion of a single repeat when copying a given triplet on either the lagging or leading strand. This mechanism can be asymmetrical, in the sense that a triplet need not be deleted or added with equal probability. When fitted to the data presented here, this model provided an adequate fit to the five donors with somatic allele sizes of at least 50 repeats and to five of those eight with sizes ≤43 repeats. In contrast, the model fitted only one of the 14 donors with somatic allele sizes of 44-49 repeats (further details on the model are provided in P. Marjoram et al., in preparation). This result casts serious doubt on the adequacy of so simple a molecular model for the expansion mechanism. As noted in Leeflang et al. (19), any model has to allow for the addition of much larger numbers of repeats during any one replication event. A mechanism for this is described in the next sections.
The Okazaki fragment processing model of trinucleotide repeat mutations
Recent experiments have shown that yeast carrying a large CAG/CTG repeat tract and an interruption in the RAD27 gene (the Saccharomyces cerevisiae homolog of Fen-1) exhibit a marked increase in the frequency of expansion mutations compared with wild-type yeast cells (24,28). RAD27-deficient strains also exhibit duplication mutations between short direct repeats as well as expansion mutations in dinucleotide repeat tracts (35,37).
In addition to these new data, it was previously noted that significant trinucleotide repeat instability in humans is first manifested when the length of the repeated region approaches the size of an average mammalian Okazaki fragment (42-44). Data on the strand preference of certain mutations in E. coli (45,46) also support the idea that trinucleotide repeat expansion mutations may be preferentially initiated during synthesis of the lagging strand. During DNA replication of the lagging strand, Okazaki fragments are initiated sequentially in the direction opposite to that of the advancing replication fork. As a consequence, an Okazaki fragment may have its 5′-end displaced by polymerase extension of the immediately upstream new Okazaki fragment. Normally, the flap would be expected to be eliminated by Fen-1. Gordenin et al. (23) argued that a displaced flap containing CAG or CTG sequences might form a thermodynamically favorable secondary structure, a well-known characteristic of CAG/CTG sequences in vitro (47-53) and in vivo (54,55). Based on the biochemical properties of Fen-1 (38,39), Gordenin et al. (23) further postulated that formation of any hairpin-like structure at the 5′-end of the flap would be expected to inhibit the ability of Fen-1 to remove the flap through its endonuclease activity. Ligation of the 5′-end of a flap (that had not been removed by Fen-1) to the 3′-end of the upstream Okazaki fragment will result in a nascent DNA strand with an increased number of repeats equal to the length of the flap (23,24,36) and result in an expansion mutation.
Modeling HD mutations
We modeled the trinucleotide repeat mutation process using a simplified Okazaki fragment model of mutation. We posit two different steps in the mutation process. The first is the requirement that the Okazaki fragment is initiated within the CAG/CTG repeat tract to allow for secondary structure formation at its 5′-end if displaced by the upstream Okazaki fragment.
To model this, we use experimental data on the size distribution of mammalian Okazaki fragment length (56). In the second step we ask how many repeats are contained in a flap displaced by the upstream Okazaki fragment. This determines the size of the expansion mutation after integration of the unprocessed flap into the nascent DNA strand. In addition to this mechanism we include another one, based on DNA slippage, which allows for the loss (as well as addition) of one repeat when replicating each repeat on the leading strand. This second process is important since mutation spectra with smaller disease alleles contain a sizable number of short contractions. While we chose to model the slippage mechanism occurring on the leading strand only, the model with slippage on both strands can be analyzed in exactly the same way (see Probabilistic model). Finally, the cell division history of each individual sperm is estimated by considering the age of the donor at the time the sample was collected. Before spermatogenesis begins at puberty (assumed to occur at age 13), the spermatogonial stem cells have undergone an estimated 34 divisions since formation of the zygote. After puberty the stem cells divide ~23 times/year (57). As described in more detail in Materials and Methods, these features can be combined into a probability model that allows us to calculate the chance that a repeat tract in a sperm from a donor of given somatic allele size and age has any particular number of repeats. Previous models of the HD expansion process have not included cell division history (19,58). 
Fitting the model to the data
There are three parameters to be estimated in our model: pS gives the probability that a slippage mutation occurs while replicating a triplet on the leading strand (this mutation results in the addition or deletion of a triplet with equal probability); pD is the parameter of the displacement process during Okazaki fragment synthesis; and pL is the ligation probability, the chance that a flap is incorporated into the nascent strand. Estimates of these parameters were determined for each individual data set (Table 1) by the method of maximum likelihood, using the approach outlined in Estimation of parameters. Most striking is the agreement between observed and expected mutation spectra (Fig. 1).
One feature of the fits that suggests a more complicated mutation mechanism is the variability in the pattern of the parameter estimates themselves. For example, a simple model might have predicted that the pS values be the same for each individual. However, the standard errors of the parameter estimates (found by simulation) show that the slippage parameter pS varies significantly among the donors. In particular, the values for the longer somatic alleles are considerably bigger than those for the shorter somatic alleles. Similarly, the ligation probabilities pL are considerably bigger for the larger somatic alleles. The biological basis of this variation is not yet understood.
DISCUSSION
Dynamic mutations are likely to occur during germline mitotic divisions
Trinucleotide repeat instability could arise in the germline or post-zygotically. The latter was proposed to be the basis for expansions in males inheriting a fragile X pre-mutation (59,60), but recent studies have concluded that expansion to a full mutation is more likely to occur in the germline rather than following fertilization but prior to segregation of the germline (61). If the germline is the site of trinucleotide repeat mutations, then the question of whether instability is a pre-meiotic or meiotic phenomenon remains open.
Consideration of our HD data presents difficulties for a single event meiotic model. In the case of the largest HD alleles, the meiotic mutation rate would have to be close to unity since, in individuals with at least 50 repeats, an average of 98% of the sperm carrying the HD allele have undergone a mutation. If HD mutation events occurred once during replication, at the last premeiotic S phase for example, two events would be required to explain the almost 100% observed mutation frequency. First, a mutation must arise on both the leading and lagging strands of the replicating chromosome carrying the HD allele. This would yield two heteroduplex chromatids each with one DNA strand with the original HD allele size and the other DNA strand with the newly mutated HD allele size. Second, each heteroduplex chromatid must undergo repair so that the mutated loop-containing strand is retained and the unmutated original HD strand is lost. A repair bias of almost 50:1 would be required to account for the data on donors with a 98% average mutation frequency. Trinucleotide repeat sequences form palindrome-like structures and the repair of palindromic loops in mammalian cells has been reported to be biased by only as much as 2:1 in favor of retaining the loop structure (62). Additional experimental data on large palindromic loop repair are clearly needed. It would also be difficult to explain the highest HD mutation frequencies (in individuals with at least 50 repeats) based on a meiotic recombination event. If some form of recombination occurred between the normal allele and the HD allele that produced two new mutant HD alleles we would have observed drastically reduced numbers of sperm with the normal allele, which was not the case. Recombination between the HD-containing sister chromatids is a possibility, but would have to occur in almost every meiosis and result only in increases in repeat number on both sister chromatids to explain our data. 
No mechanism that could accomplish this in a single step has been identified. The above considerations also apply to previously published transmission data on fragile X syndrome and myotonic dystrophy. Mothers with 90-129 repeat premutation alleles passed on the full fragile X mutation (>200 repeats) to their offspring in 98% of the transmissions (6). Similarly, parents carrying DM alleles in the size range 50-300 repeats transmit expansion mutations 87% of the time (63,64).
Additional support exists for a germline mitotic model. First, in E. coli (25,27) and yeast (24,28,29,65-67) long inserted trinucleotide repeat sequences exhibit instability in the absence of meiosis. Second, in many of the trinucleotide repeat diseases, some somatic instability is detected (1-4). In HD and a number of other diseases, somatic variation is often less than in the germline. The basis for this difference among the diseases is unknown but may be due to properties unique to the different loci, perhaps relating to the position of replication origins used in germline versus somatic tissues. Finally, analysis of human somatic cells from patients with large myotonic dystrophy disease alleles demonstrated a gradual increase in allele size as a function of an increasing number of generations in culture and supports the mitotic division model (68).
HD dynamic mutations may result from a defect in Okazaki fragment processing
Although it can be argued that HD mutations arise during mitotic DNA replication, the fact that our data fit a simple model based on a defect in Okazaki fragment processing does not prove that this is the molecular mechanism. In addition to the integration of the unprocessed flap into the nascent strand that is incorporated in our model, double-strand break formation at the site of the flap might also lead to mutation through recombination or end joining (23,35). Mitotic recombination over many generations of germline cell divisions would also be compatible with the high mutation frequency and large changes in repeat number seen in sperm.
However, studies on trinucleotide repeats in E. coli (25,27) and yeast (28,29) as well as on microsatellite repeats in yeast (26,69) do not support an important role for recombination, while the recent data on trinucleotide repeats in yeast strains carrying RAD27 mutants are consistent with some form of the Okazaki fragment processing model. As more is understood about the molecular mechanisms of instability we can refine the Okazaki fragment model to look for better fits to the data and a simpler pattern in the parameter values.
The role of paternal age in determining the mutation spectra
Our analysis suggests that there may be a paternal age effect on the HD mutation spectrum. Using the model which incorporates spermatogonial stem cell division, we can predict the effect of age on the mutation spectra of individual sperm donors. For example, we determined the expected mutation spectrum of donor P (44 repeats) at the age his son (donor A, 62 repeats) was conceived. The frequency of alleles with at least 62 repeats in the father's sperm would be expected to have been >27% (data not shown).
Figure 4. The expected distribution of allele sizes in sperm from donor F at ages 16, 23, 43 and 63 as predicted according to the model.
To prove an effect of age independent of all other genetic variables experimentally, sperm samples donated throughout the lifetime of an individual would be ideal (72). Two samples were taken from the same donor at ages 63 and 65 (Z and Y). No significant effect of age was detected in any of the variables (Table 1 and Fig. 1).
Effect of genetic background on instability
Contributions to individual variation in mutation spectra may come from individual differences in genetic background. At the HD locus this variation has been proposed to result from cis factors (72). We also detect individual variation but can disregard the role of sequences tightly linked to the HD disease allele in our study group. All the HD chromosomes we examined derive from the same founder chromosome about eight generations (~200 years) ago (18).
Recent analysis of instability in MJD patients reported that a single nucleotide polymorphism on the normal MJD allele influences the instability of the disease-causing chromosome (5). As noted earlier we did not detect any influence on HD instability depending on whether the donors were homozygous for the (CCG)7 allele or were (CCG)7/(CCG)10 heterozygotes. We conclude that there is no detectable interallelic effect on HD instability in our data due to this polymorphism.
We note that HD P and his son HD A each have exceptionally high mean changes in allele size when compared with individuals matched for age and repeat number (Table 1). This is suggestive of a familial influence. Familial influences on repeat instability at the FMR1 locus have been reported, but whether the effect is due to a linked or unlinked gene(s) is not known (6,8). Candidates for genes with alleles that might influence repeat instability include those involved in replication or DNA repair, including, but not limited to, classical mismatch repair. Indeed, human polymorphisms have recently been detected in a significant number of DNA repair genes (73).
Factors involved in instability may be identified using genetic approaches adapted for the detection of quantitative trait loci. Ideally, a simple assay that does not depend on single genome assays yet provides quantitative information on instability is desirable. If, in a small number of cases, the discrete data generated by single genome analysis can be compared with the data obtained from the simpler total sperm DNA PCR assay (13), the relationship between the two different read-outs of instability could be determined. This would provide the experimental basis for a large-scale genetic analysis of instability.
MATERIALS AND METHODS
Single sperm isolation and preparation and PCR
Single sperm isolation and preparation, PCR and product analysis were performed as described (19), unless otherwise noted. First round PCR was designed so as to amplify five independent loci simultaneously.
In addition to the HD and D4S127 loci, primers for the DM, SCA-1 and SBMA loci were included in the first round PCR reaction. All primers were at a final concentration of 0.5 µM, with the exception of those for the D4S127 locus, which were at 0.05 µM each. Primer IT2-B (5′-TCACGGTCGGTGCAGCGGCTCCT) was used in place of IT2 at the HD locus. In the cases where external genomic contamination was suspected, second round PCRs were performed at the non-HD linked loci. For second round PCR, cycle number varied at the HD locus from 21 to 30 and 35, depending on the donor. Twenty-three cycles were carried out at the D4S127 locus.
Reliability of the sperm typing data
We have shown previously that the well-known PCR stutter that occurs during the amplification of microsatellite repeats does not significantly affect the estimates of individual allele sizes (19). Mutations in HD allele size are not due to CGG expansions in the CGG repeat region adjacent to the CAG repeat tract (19). An excellent agreement between sperm typing data (on single individuals with HD and SBMA disease alleles) and the available data on paternal transmissions studied in families has been shown previously (19,74,75).
Statistical analysis
The hypothesis that parental origin of the HD allele has no effect on stability can be examined using permutation tests. In the data, seven individuals inherited the HD allele from their father. The average somatic allele size was 48.1 repeats, their average mutation frequency was 90% and the average mutation expansion size in the sperm was 11.23 repeats (Table 1). We compared these averages with those from 10 000 random selections of seven individuals from the 25 individuals we sampled. The random selections result in an empirical distribution of the statistics under the null hypothesis that every selection is equally likely. To examine the effect of parental origin on stability we compared the observed values of the statistics for the allocation actually seen in the data with the values seen in the random selections.
For example, 18% of simulated average somatic allele sizes exceeded the observed value of 48.1 repeats, suggesting that somatic allele size is not influenced by paternal inheritance. We compared the mutation frequency and mean change in repeat number in the same way. No significant departures were detected (although only 5.4% of the simulated selections had an average mean change in repeat number as large as that observed in the data). We conclude that we cannot detect a major effect of parental imprinting on instability.
A similar approach can be applied to assess interallelic effects on stability. Once more, somatic allele size does not appear to be influenced by the CCG polymorphism. A similar conclusion applies to the average of the mean change in repeat length in the sperm (for example, 7.3% of the 10 000 simulated samples had a higher average mean change in repeat length than observed in the data).
Probabilistic model
The sperm that were sampled in our experiments are the end products of a number of spermatogonial mitotic stem cell divisions (together with a small number of divisions during the spermatogenesis cycle). It is estimated that there are 34 mitotic divisions prior to the onset of puberty, during which time the spermatogonial stem cell population has undergone very rapid expansion, resulting in some 10^9 stem cells at puberty. Further, it is estimated that there are 23 stem cell divisions/year after puberty (here assumed to be at age 13) (57). We assume that at each division one daughter cell initiates the spermatogenesis cycle, the other remaining a stem cell. Because of the rapid proliferation of stem cells during the growth phase and the subsequent nature of replication of these cells, a random sample of sperm is likely to have effectively independent mutational histories (because their most recent common ancestor cell is likely to be close to the time of the first differentiated stem cells).
It is therefore reasonable to treat sperm from a given donor as independent of each other. The number n of stem cell divisions through which a sperm from a donor of age a has gone may be calculated from the formula
| $n = 34 + 23(a - 13)$ | 1 |
We assume the same value of n for each sperm from a given donor.
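As a small illustration of equation 1, the division count can be computed directly from the donor's age. This is a minimal sketch: the function name and the decision to reject ages below the assumed age of puberty are illustrative choices, not part of the original analysis.

```python
def stem_cell_divisions(age_years, puberty_age=13,
                        prepubertal_divisions=34, divisions_per_year=23):
    """Number of stem cell divisions n for a donor of a given age (equation 1).

    Defaults follow the values quoted in the text: 34 divisions before
    puberty, 23 divisions per year afterwards, puberty assumed at age 13.
    """
    if age_years < puberty_age:
        # Equation 1 is only stated for post-pubertal donors (assumption).
        raise ValueError("equation 1 applies only from the age of puberty onwards")
    return prepubertal_divisions + divisions_per_year * (age_years - puberty_age)
```

For example, a 30-year-old donor gives n = 34 + 23 × 17 = 425 divisions.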
We assume that expansions and contractions in repeat numbers arise primarily during the n mitotic divisions before meiosis. We postulate two mutation mechanisms at work, which are asymmetric with respect to the two strands of DNA. One mechanism (Type I) adds or deletes a single triplet during copying of the leading strand. The second (Type II) is responsible for larger additions in repeats and occurs only during lagging strand synthesis, as described in the earlier section on Okazaki fragment formation. To model the effects of mutation through the cell line leading to a given sperm, we must first follow a randomly chosen cell from its initiation as a stem cell through to puberty and from there along the stem cell backbone to the mature sperm. We then model the length of the repeat tract on a randomly chosen strand through this cell lineage.
Because we model an asymmetric mutation process we must keep track of which strands being synthesized are leading or lagging strands. We label strands (i,x); i = 0 if they will be the template for lagging strand synthesis and i = 1 if they will be the template for leading strand synthesis. x gives the number of repeats on the strand. The probability that a mutation will result in y repeats when a (0,x) strand is copied is r0(x,y). For a (1,x) strand, the corresponding probability is r1(x,y).
The succession of states (I0,X0), ..., (In,Xn) forms a Markov chain, starting from (I0,X0) = (0,L) with probability 1/2 or (1,L) with probability 1/2. Let Pn[(i,x) → (j,y)] denote the n-step transition probabilities of this chain; these give the probability that a molecule starting with type (i,x) produces a molecule of type (j,y) after n replications. The probability q(v) that the strand at the nth generation is of length v is
| $q(v) = \tfrac{1}{2}\{P_n[(0,L) \to (0,v)] + P_n[(0,L) \to (1,v)]\} + \tfrac{1}{2}\{P_n[(1,L) \to (0,v)] + P_n[(1,L) \to (1,v)]\}$ | 2 |
for v = 1, 2, 3, ....
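Equation 2 can be evaluated without forming the full n-step matrix: start with probability 1/2 on each of (0,L) and (1,L), propagate that vector through n one-step transitions, and then sum over the strand type. The sketch below assumes the one-step matrix P is already available as a list of rows indexed in the same order as a list of (i, x) states; the function and argument names are hypothetical.

```python
def allele_length_distribution(P, states, L, n):
    """Return q(v) as a dict {v: probability}, following equation 2.

    P      -- one-step transition matrix as a list of rows (assumed given)
    states -- list of (i, x) tuples, in the same order as the rows of P
    L      -- somatic allele length (starting repeat number)
    n      -- number of replications
    """
    index = {s: k for k, s in enumerate(states)}
    # Start in (0, L) or (1, L), each with probability 1/2.
    dist = [0.0] * len(states)
    dist[index[(0, L)]] += 0.5
    dist[index[(1, L)]] += 0.5
    # Propagate the distribution through n replications.
    for _ in range(n):
        nxt = [0.0] * len(states)
        for a, pa in enumerate(dist):
            if pa == 0.0:
                continue
            for b, pab in enumerate(P[a]):
                if pab:
                    nxt[b] += pa * pab
        dist = nxt
    # Marginalize over the strand type j to obtain the length distribution.
    q = {}
    for (_, v), p in zip(states, dist):
        q[v] = q.get(v, 0.0) + p
    return q
```

Propagating a vector costs one matrix-vector product per division, which is usually cheaper than raising P to the nth power when only a single starting length L is of interest.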
A model for expansions and contractions
It remains to determine the transition probabilities r0(x,y) and r1(x,y). We begin with the simpler case of leading strand synthesis. We suppose that triplets are copied independently of one another. Replication of a triplet on a leading strand results in a slippage mutation with probability pS. This mutation results in gain or loss of a single triplet (a Type I mutation), each with probability 1/2. The triplet is copied without error with probability 1 - pS. The probability distribution {r0(x,y), y = 0, 1, 2, ...} can be found by convoluting x times the distribution of the number of repeats added or subtracted when copying a single triplet.
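The x-fold convolution for Type I (single-triplet slippage) mutations can be sketched as follows. Each triplet contributes a change of -1, 0 or +1 with probabilities pS/2, 1 - pS and pS/2, and the net change over x independent triplets is obtained by repeated convolution. The function name is hypothetical, and the dictionary returned corresponds to the distribution the text writes as r0(x,y).

```python
def slippage_length_distribution(x, p_s):
    """Distribution of the copied repeat number y for a template of x triplets.

    Each triplet independently gains or loses one repeat with probability
    p_s / 2 each, and is copied faithfully with probability 1 - p_s.
    Returns a dict {y: probability}.
    """
    step = {-1: p_s / 2, 0: 1 - p_s, 1: p_s / 2}
    change = {0: 1.0}  # distribution of the net change after 0 triplets
    for _ in range(x):
        nxt = {}
        for d, pd in change.items():
            for s, ps in step.items():
                nxt[d + s] = nxt.get(d + s, 0.0) + pd * ps
        change = nxt
    # Shift the net change by x to express the result as a repeat number.
    return {x + d: p for d, p in change.items()}
```

With x = 2 and pS = 0.2, for instance, the probability of an unchanged length is 0.8² + 2 × 0.1 × 0.1 = 0.66.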
Next we describe how r1(x,y) may be computed. Okazaki fragments have a length distribution that has been determined empirically (56). Using this distribution and the assumption that the origin of replication is a long way from the triplet repeat region, we first approximate the distribution of the length z of the overlap between an Okazaki fragment and the start of the triplet repeat region. If z > x, the Okazaki fragment does not initiate in the repeat region and the length of the repeat region remains x. On the other hand, if x ≥ z an Okazaki fragment ends in the repeat region and there is the possibility of flap formation by the next fragment. This fragment can displace an amount D, which must be between 1 and z triplets in length, of the Okazaki fragment that ends in the repeat region. We model the distribution of the displacement length D as a geometric random variable with probability pD: the chance that D = k is proportional to pD^k, the constant of proportionality being determined by the fact that D must be between 1 and z. The two limiting cases are pD = 0 (corresponding to a potential displacement of exactly one repeat) and pD = 1 (corresponding to a potential displacement being uniformly distributed between 1 and z repeats).
This choice of model is predicated on the assumption that displacements are more likely to be short than long. Such a displacement is incorporated into the nascent strand with probability pL, resulting in a repeat region of length y = x + k. With probability 1 - pL, the displacement is not incorporated and the repeat region remains of length x. For repeat tracts of the size seen in these sperm, it is unlikely that more than one Okazaki fragment will end in a repeat region, so we assume that at most one will. The probabilities r1(x,y) may now be computed numerically. Finally, from the values of r0(x,y) and r1(x,y) we can construct numerically the one-step transition matrix P and from that the n-step matrix Pn = P^n, the nth power of P.
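The Type II calculation described above can be sketched in two steps: the truncated geometric distribution of the displacement D, and the resulting distribution of y for a template of length x. The overlap distribution is taken here as a given input, since the empirical Okazaki fragment length distribution (56) is not reproduced in this section. All function and argument names are hypothetical.

```python
def displacement_distribution(z, p_d):
    """P(D = k) proportional to p_d ** k for k = 1..z (truncated geometric)."""
    if z < 1:
        raise ValueError("z must be at least 1")
    if p_d == 0:
        # Limiting case: the displacement is exactly one repeat.
        return {1: 1.0}
    weights = [p_d ** k for k in range(1, z + 1)]
    total = sum(weights)
    return {k: w / total for k, w in zip(range(1, z + 1), weights)}


def lagging_strand_row(x, overlap_dist, p_d, p_l):
    """Distribution of y for a template with x repeats, as a dict {y: prob}.

    overlap_dist -- dict {z: probability} for the overlap length z
                    (assumed input derived from the fragment length data)
    p_d          -- displacement parameter pD
    p_l          -- probability pL that a displacement is incorporated
    """
    row = {x: 0.0}
    for z, wz in overlap_dist.items():
        if z > x or z < 1:
            # No fragment ends in the repeat region: length unchanged.
            row[x] += wz
            continue
        # Fragment ends in the repeat region but no flap is incorporated.
        row[x] += wz * (1 - p_l)
        # A displacement of k triplets is incorporated: y = x + k.
        for k, pk in displacement_distribution(z, p_d).items():
            row[x + k] = row.get(x + k, 0.0) + wz * p_l * pk
    return row
```

Because at most one fragment is assumed to end in the repeat region, this mechanism only ever leaves the tract unchanged or lengthens it by between 1 and z repeats.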
Estimation of parameters
For each data set we know the age a of the donor and his somatic allele length L. We can therefore determine the number of stem cell divisions using equation 1. The probability distribution of allele sizes can be determined from equation 2 and the model in the previous section. We assume that allele sizes from a given donor are independent and identically distributed copies of the distribution q(v), v = 0, 1, ... in equation 2. In order to estimate the parameters pS, pD and pL, we use the method of maximum likelihood (76). For a given donor whose sample has ni copies of sperm with i repeats, the log likelihood l(pS, pD, pL) is (up to a constant independent of the parameters)
| $l(p_S, p_D, p_L) = \sum_i n_i \log q(i)$ |
and this expression is maximized numerically.
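The log likelihood is simple to evaluate once q has been computed for a candidate parameter set. The sketch below pairs it with a plain grid search purely for illustration; the text does not say which numerical optimizer was used, and `model` is a hypothetical callable standing in for the full calculation of q from (pS, pD, pL).

```python
import math


def log_likelihood(counts, q):
    """Sum over i of n_i * log q(i), up to a parameter-independent constant.

    counts -- dict {repeat number i: number of sperm n_i observed}
    q      -- dict {repeat number v: model probability q(v)}
    """
    total = 0.0
    for i, n_i in counts.items():
        if n_i == 0:
            continue
        q_i = q.get(i, 0.0)
        if q_i <= 0.0:
            # An observed size with zero model probability rules the model out.
            return float("-inf")
        total += n_i * math.log(q_i)
    return total


def grid_search_mle(counts, model, grid):
    """Return the parameter value in `grid` with the highest log likelihood.

    model -- callable mapping a parameter value to a dict q (assumed supplied)
    grid  -- iterable of candidate parameter values (illustrative optimizer)
    """
    return max(grid, key=lambda params: log_likelihood(counts, model(params)))
```

In practice a gradient-free optimizer over the three parameters would replace the grid, but the objective function stays the same.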
As noted in the text, there is in general good agreement between the data and the fitted model. For somatic alleles of at least 50 repeats, the fitted models show a symmetrical, bell-shaped estimated mutation spectrum, in stark contrast to the smaller allele sizes. The fits are adequate for all but donor P, who produced a very broad range of sperm allele sizes that is not captured well in our fitted model. To assess these fits further, we used conventional goodness-of-fit tests and we also simulated sperm samples for the larger somatic allele sizes using the estimated parameter values and compared the simulated mutation spectra with those obtained in the data. There was no systematic lack-of-fit revealed by the simulations.
There are clearly some sperm that appear to have anomalous size compared with the mass of the data. For example, donor Y has a sperm of length 118. In order to assess the sensitivity of the parameter estimates to such an outlying data point, we refitted the model with that measurement removed. This resulted in new estimates of pS = 2.5 × 10^-4, pD = 0.50 and pL = 3.7 × 10^-3. The parameter pD changed most in absolute terms, from an initial estimate of pD = 0.76. The other parameter estimates were less influenced by the potential outlier.
ACKNOWLEDGEMENTS
This work was supported by National Institutes of Health grants R37 GM37645 (N.A.), NS16367 (J.F.G.), NS22031 (N.W.), NS32765 (M.M.) and NS16367 (M.M.) and by grant BIR 95-04393 (S.T. and P.M.) from the National Science Foundation. E.P.L. was partially supported by a grant from the HDF.
REFERENCES
This page is run by Oxford University Press, Great Clarendon Street, Oxford OX2 6DP, as part of the OUP Journals
Comments and feedback: www-admin{at}oup.co.uk
Last modification: 4 Feb 1999
Copyright©Oxford University Press, 1999.