Survey of maximum CTG/CAG repeat lengths in humans and non-human primates: total genome scan in populations using the Repeat Expansion Detection method
Giorgio Sirugo1, Amos S. Deinard1,2, Judith R. Kidd1 and Kenneth K. Kidd1,*
Yale University, 1School of Medicine, Department of Genetics, and 2Department of Anthropology, New Haven, 06510, Connecticut, USA
Received September 17, 1996; Revised and Accepted December 31, 1996
Repeat Expansion Detection (RED) is an efficient and simple method for detecting repeat expansions in the human genome, including expansion mutations resulting in disease. Here we report the first population survey of CTG/CAG repeat lengths in humans using the RED method; we have determined maximum CTG/CAG repeat length in 244 individuals from six human populations: Danes, Chinese, Japanese, Rondonian Surui, Maya and Mbuti/Biaka Pygmies. We have also sampled a number of non-human primates including eight orang-utans (Pongo pygmaeus), seven gorillas (Gorilla gorilla), seven pygmy chimpanzees (Pan paniscus), 13 common chimpanzees (Pan troglodytes) and three Hylobatidae (one Hylobates lar, one H. klossii and one H. syndactylus). Our results demonstrate the existence of significant variation in the sizes and frequencies of the longest CTG/CAG repeat length seen per individual both within and between human populations. The population differences argue that overall mutation rates at CTG/CAG repeat loci are sufficiently low that mutation does not obliterate the effect of random genetic drift and clearly indicate that population stratification could occur in disease association studies using the RED method. No significant differences were detected among the non-human primates sampled. Our results also show that both common chimpanzees and pygmy chimpanzees (bonobos) are polymorphic for maximum length of any CTG/CAG repeats while no variation was found for gorillas and orang-utans.
Expansions of unstable DNA repeats have been demonstrated to be involved in ten different inherited disorders including fragile X syndrome, myotonic dystrophy and Huntington disease (1,2). Most of these trinucleotide repeat expansion disorders share unusual genetic features: increasing penetrance and disease severity in successive generations (genetic anticipation) along with a parental (maternal or paternal) sex bias in the transmission of the severe form of the disease which correlates with the degree of meiotic instability and allelic expansion. The rather frequent occurrence of triplet repeats in mRNA indicates that more loci containing unstable DNA expansions could be discovered (1). Allowing for the complementary nature of DNA and frame permutations, ten possible triplet repeats can occur at the DNA level; thus far (CAG)n repeats (identical to (CTG)n on the complementary strand) are involved in seven of the disorders already identified, while the remaining three involve (CGG)n (two disorders) and (GAA)n (one disorder) repeats (2,3). At all of these loci polymorphic repeat lengths (means of ~20 repeats) correspond to the normal alleles while abnormal, disease-causing alleles are larger. When the repeats are located in coding regions of genes (six loci, all responsible for neurodegenerative syndromes), the disease alleles are in the 35-100 repeats range (2,3), while pathogenic expansions for the remaining disorders may be as 'small' as 50 repeats (4) and can extend to hundreds or thousands of repeats.
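The count of ten possible triplet repeats can be checked by enumeration. The following is a minimal sketch, not part of the original study: it treats two triplet motifs as the same repeat when one is a frame permutation (rotation) of the other or of its reverse complement, and it excludes the four mononucleotide runs. The function names are illustrative.

```python
from itertools import product

# Watson-Crick complement table for DNA bases
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(motif: str) -> str:
    """Return the motif as read on the opposite strand."""
    return motif.translate(COMPLEMENT)[::-1]


def canonical_class(motif: str) -> str:
    """Alphabetically smallest form among all rotations of the motif
    and all rotations of its reverse complement."""
    variants = []
    for seq in (motif, reverse_complement(motif)):
        variants.extend(seq[i:] + seq[:i] for i in range(len(seq)))
    return min(variants)


def triplet_repeat_classes() -> set:
    """Enumerate distinct triplet repeat classes at the DNA level."""
    classes = set()
    for bases in product("ACGT", repeat=3):
        motif = "".join(bases)
        if len(set(motif)) == 1:
            # AAA, CCC, GGG, TTT are mononucleotide runs, not triplet repeats
            continue
        classes.add(canonical_class(motif))
    return classes


classes = triplet_repeat_classes()
print(len(classes), sorted(classes))
```

Under these equivalence rules CAG and CTG fall into the same class, which is why the text treats (CAG)n and (CTG)n as identical, while CGG and GAA fall into two further, distinct classes.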
Using 5 µg of genomic DNA for RED gives highly reproducible results as illustrated for the three samples represented in Figure 1A and B. The number of ligation products is also clearly a function of the genomic DNA sample used since the reactions for the samples in Figure 1A were run using aliquots of the same reaction cocktail and simultaneously cycled. The separate set of RED reactions illustrated in lanes 4-10 in Figure 1B were also identical, except for the genomic DNA.
Results of our survey are shown in Figures 2 and 3. The distributions of the maximum number of ligated (CTG)17 oligos in each sample for different human populations and for non-human primates are shown as the number of (CTG)17 ligation events on the X axis, and the relative frequency (human populations, Fig. 2) or the absolute frequency (non-human primates, Fig. 3) on the Y axis. We found that the distributions of largest CTG/CAG repeat lengths differed significantly among human populations (χ² = 107, p < 0.001, 15 d.f.), while the differences detected among non-human primate species were not significant (2-tailed Fisher exact test, p = 0.455). Differences of frequency distributions of (CTG)n repeats in human populations were also tested by pairwise, non-parametric Kolmogorov-Smirnov analysis. Differences were highly significant (p < 0.005) for eight of the 15 pairwise comparisons analyzed, but the other seven were not significant, among them Chinese vs. Japanese (p = 0.141), Chinese vs. Maya (p = 0.141), Japanese vs. Surui (p = 0.239), Japanese vs. Maya (p = 0.25), Japanese vs. Danes (p = 0.021) and Surui vs. Maya (p = 0.89). In general, with the exception of the Chinese vs. Danes (not significant) and the Chinese vs. Surui (significant) comparisons, Asian vs. Asian, Asian vs. New World and New World vs. New World comparisons were the only non-significant ones. These reduced 'geographical' differences in the distribution of maximum CTG/CAG repeat lengths reflect very well the known genetic similarities of populations (9-11). Rough estimates of the relative 'haploid' ligation-number frequencies in the populations analyzed were computed using the HAPLO program (12), assuming a linear dominance mode of inheritance (Table 1). Derived diversity (13) values were calculated as: 33% for Danes, 76% for Chinese, 55% for Japanese, 66% for Rondonian Surui, 71% for Maya and 54% for Mbuti/Biaka pygmies.
These diversity values correspond to the probability of two gametes chosen at random having different maximum lengths as measured by number of (CTG)17 oligos ligating. This is analogous to heterozygosity at an autosomal locus.
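The bookkeeping behind the tests reported above can be sketched briefly. Six populations give 15 unordered pairs, which is the number of pairwise Kolmogorov-Smirnov comparisons quoted. The four pooled length classes used below are an assumption for illustration only: a 6 x 4 contingency table is one shape that yields the stated 15 degrees of freedom, but the binning actually used is not given in the text.

```python
from itertools import combinations

POPULATIONS = ["Danes", "Chinese", "Japanese", "Surui", "Maya", "Pygmies"]

# Every unordered pair of populations is one pairwise comparison
pairs = list(combinations(POPULATIONS, 2))


def chi_square_df(n_rows: int, n_cols: int) -> int:
    """Degrees of freedom of a chi-square test on an r x c contingency table."""
    return (n_rows - 1) * (n_cols - 1)


print(len(pairs))
# Hypothetical binning: 4 pooled length classes (not stated in the source)
print(chi_square_df(len(POPULATIONS), 4))
```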
Our total genome scan for maximum CTG/CAG repeat lengths demonstrates that RED with a (CTG)17 oligo is a reliable and highly reproducible method. While additional experiments would be required before any broad generalization can be strongly supported, we do find that even the relative intensities of ligation products within a sample show surprising reproducibility. Comparison of the results of three separate RED analyses done several months apart (Fig. 1) shows this reproducibility. We can only speculate, but intensity may reflect the number of copies per genome of sequences of the various longer lengths and the exact lengths of the segments. For example, a locus with a (CTG)115 should more frequently (in a higher fraction of cycles) allow the final ligations of six (CTG)17 than would a locus with a (CTG)102.
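The (CTG)115 versus (CTG)102 comparison can be made concrete with a deliberately simple counting model, offered here as an illustration and not as the authors' analysis. It assumes that oligos anneal in triplet register and that a run of adjacent oligos must lie entirely within the repeat tract; it then counts the starting positions at which such a run fits. The function name and the model itself are assumptions.

```python
OLIGO_REPEATS = 17  # triplets per (CTG)17 oligonucleotide


def contiguous_placements(tract_repeats: int, n_oligos: int,
                          oligo_repeats: int = OLIGO_REPEATS) -> int:
    """Number of triplet-register start positions at which n_oligos adjacent
    oligos fit entirely within a tract of tract_repeats triplets."""
    span = n_oligos * oligo_repeats  # triplets covered by the ligated run
    return max(0, tract_repeats - span + 1)


for tract in (102, 115):
    print(tract, contiguous_placements(tract, 6))
```

In this model a (CTG)102 tract accommodates six adjacent oligos in exactly one register, whereas a (CTG)115 tract offers 14, which is consistent with the suggestion that the longer tract should yield the six-oligo product in a higher fraction of cycles.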
Table 1. Estimated 'gametic' relative frequencies of maximum ligated (CTG)17 for the populations analysed

Maximum CTG17 size    Danes       Chinese     Japanese    R. Surui    Maya        Pygmies
                      (d = 33%)   (d = 76%)   (d = 55%)   (d = 66%)   (d = 71%)   (d = 54%)
x = 2                 0.81        0.32        0.62        0.35        0.32        0.61
x = 3                 0.05        0.29        0           0.26        0.3         0.27
x = 4                 0.07        0.15        0.25        0.38        0.29        0.1
x = 5                 0.012       0.16        0.1         0           0.074       0
x = 6                 0.012       0.075       0           0           0           0
x = 7                 0.012       0           0           0           0           0
x = 8 or more         0.011       0           0.019       0           0           0

Based on a model of linear dominance these frequencies are estimations of the distributions of the maximum length in the gamete pools of the populations. Diversity values (d) for each population were calculated according to the formula:

d = 1 - \sum_{i=2}^{n} p(i_r)^2

where p(i_r) is the estimated gametic frequency of i ligated oligos and n is the maximum number seen in a population.
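The diversity formula can be applied directly to the gametic frequencies of Table 1. The sketch below is a minimal illustration; the dictionary layout and function name are assumptions, and the frequencies are the rounded values printed in the table, so the computed d values should be expected to agree with the published percentages only to within about one percentage point.

```python
def diversity(freqs):
    """d = 1 - sum of squared gametic frequencies (analogous to heterozygosity)."""
    return 1.0 - sum(p * p for p in freqs)


# Rounded frequencies from Table 1, ordered x = 2, 3, 4, 5, 6, 7, 8 or more
TABLE_1 = {
    "Danes":    [0.81, 0.05, 0.07, 0.012, 0.012, 0.012, 0.011],
    "Chinese":  [0.32, 0.29, 0.15, 0.16, 0.075, 0, 0],
    "Japanese": [0.62, 0, 0.25, 0.1, 0, 0, 0.019],
    "R. Surui": [0.35, 0.26, 0.38, 0, 0, 0, 0],
    "Maya":     [0.32, 0.3, 0.29, 0.074, 0, 0, 0],
    "Pygmies":  [0.61, 0.27, 0.1, 0, 0, 0, 0],
}

for population, freqs in TABLE_1.items():
    print(f"{population}: d = {diversity(freqs):.1%}")
```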
A striking result of this study is the large proportion of individuals with moderately long CTG/CAG repeats and the evidence that profound differences exist in the distributions of longest (CTG)n repeat lengths in different human populations. The previous studies of Europeans had indicated that (CTG)n greater than 84 repeats were rare (5-8) and even lengths between 51 and 84 repeats occurred in only a few individuals. RED results in the range of >67 repeats have often been referred to as 'expansions' (8,14). In contrast we find lengths up to 85-92 CTG/CAG repeats to be common in east Asian populations. While this size range corresponds to unstable disease-causing alleles at all 'CTG' or 'CAG' disease loci described so far (1-4), these lengths at the loci being detected in this study represent normal variation and would appear to be relatively stable. Thus, we cannot refer to these as 'expansions' with any sense of abnormality or dynamic change in length.
The most likely explanation for these observations is random genetic drift of alleles at (CTG)n-containing loci. Additionally, we note that the mutation rate at such loci cannot be very great; a high mutation rate would most likely homogenize allele frequencies within a short evolutionary time, erasing any evidence of random genetic drift causing frequency differences among populations. In this light, gross instability events observed within populations must be considered a rather rare phenomenon, even in populations where repeat lengths of 78 CTG/CAG triplets or greater are frequent (Chinese, Japanese, Maya; Fig. 2). The (CTG)n repeat sizes that we have observed within the Danish population correlate very well with the RED survey of (CAG)17 maximum repeat lengths within northern Europeans reported by others (5-8). The very similar frequency pattern of maximum CTG/CAG repeat lengths observed by independent RED surveys in different northern European population samples minimizes the possibility that the CTG/CAG repeat lengths we detected may have arisen as new alleles from somatic mutations (as either contractions or expansions of CTG/CAG repeats) in the lymphoblastoid cell lines. While mutations at microsatellite loci in EBV-transformed cell lines have been shown to occur, they are seen only after multiple passages in culture (all cell lines used in our RED survey went through only a single passage and are overwhelmingly polyclonal) and, in the great majority of cases, involve only small length changes which would remain largely undetected by RED (15,16). This lack of fine tuning is due to an inherent methodological limit in the RED technique, where differences of maximum repeat lengths between samples can only be detected as multiples of the length of the oligonucleotide used in the reactions.
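The resolution limit described above can be illustrated with a simple binning sketch. It assumes, as a simplification, that the RED readout equals the number of whole 17-triplet oligos that fit in the longest tract; size ranges quoted elsewhere in the text may rest on other conventions. Function names are illustrative.

```python
OLIGO_REPEATS = 17  # triplets per (CTG)17 oligonucleotide


def red_readout(tract_repeats: int, oligo_repeats: int = OLIGO_REPEATS) -> int:
    """Largest number of whole oligos that fit in a tract (simplified model)."""
    return tract_repeats // oligo_repeats


def repeat_range_for_readout(n_oligos: int,
                             oligo_repeats: int = OLIGO_REPEATS) -> tuple:
    """Inclusive range of tract lengths that all give the same readout."""
    lower = n_oligos * oligo_repeats
    upper = (n_oligos + 1) * oligo_repeats - 1
    return lower, upper


# Two tracts differing by 10 triplets are indistinguishable in this model
print(red_readout(70), red_readout(80))
print(repeat_range_for_readout(4))
```

A length change of a few triplets, of the kind reported for cultured cell lines, therefore alters the readout only when it happens to cross a multiple of 17.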
All the human DNAs used in this study were purified from cultured cells using standard proteinase, phenol-chloroform extraction and alcohol precipitation (18). In total 244 individuals were tested. The Japanese (N = 26) and Pygmy (N = 50) samples were collected and described by Cavalli-Sforza et al. (1986) (19). The Japanese sampled are from the San Francisco area or students/postdocs from Japan and the Pygmies are from the Central African Republic (Biaka) and Zaire (Mbuti). The Chinese sample (N = 48) was collected in Taiwan by R.B. Lu (20). The Danish sample (N = 42) was collected by J. Parnas and consists of unrelated individuals from the Copenhagen area. Rondonian Surui (N = 40) were collected by F.L. Black in the Rondonia province, western Amazon basin. The Maya (N = 38) were collected in the Yucatan peninsula by K. Weiss. Both the R. Surui and Maya were described by Kidd et al. (1991) (21).
All DNA samples of non-human primates were extracted from lymphoblastoid cell lines established in our laboratory unless otherwise noted. The same protocol for DNA extraction was used as for the human DNA samples (18). The provenances of non-human primates sampled for this project were as follows:
(i) Orang-utans (Pongo pygmaeus): Sibu and Tupa were obtained from the Yerkes Regional Primate Center, Atlanta, GA; Ben was from the Henry Doorly Zoo, Omaha, NE; cell lines from Jari, Puti and CP81 were obtained from David Lawlor of Stanford University; Sunda was from the Sacramento Zoo, Sacramento, CA;
(ii) Gorillas (Gorilla gorilla): Oko, Cal, and Ozoum were from the Yerkes Regional Primate Center; Abe and Murphy were from the Henry Doorly Zoo; cell lines from Rok and Machi were obtained from David Lawlor;
(iii) Common chimpanzees (Pan troglodytes): Harriet was from the Arizona Primate Foundation, Tempe, AZ; Herman was from the Lowery Zoo, Tampa, FL; Hannibal, Juno, Bullet, and Lottie were from Alfred Prince, New York Blood Center, New York, NY; Bakoumba, Cheetah, Mabolite, and Julie were from the Centre International de Recherches Medicales de Franceville; A333, A336, and A208 were from the New Iberia Research Center, New Iberia, LA;
(iv) Pygmy chimpanzees (Pan paniscus): Kidongo, Matata, Linda and Bosondjo were from the Yerkes Regional Primate Center; Lody and Maringa were from the Milwaukee County Zoo, Milwaukee, WI;
(v) Hylobatidae (Hylobates lar, Hylobates klossii and Hylobates syndactylus): lymphoblastoid cell lines from a single Hylobates lar, a single Hylobates klossii and a single Hylobates syndactylus were obtained from David Ward of Yale University.
In the RED method Ampligase, a thermostable DNA ligase (Epicentre Technologies, Madison, WI), is used in a cycling procedure that generates multimers of the oligonucleotide(s) utilized. Following cycling, the reaction products are electrophoresed, blotted and detected by hybridization with a radiolabeled oligonucleotide probe complementary to the oligonucleotide used in the ligation reactions. We have modified the original RED protocol described by Schalling et al. (5,22). We used ~5 µg of genomic DNA, a (CTG)17 oligo and 5 U Ampligase in all reactions (20 λ). We observed a linear increase of the signal strength when 5 µg of genomic DNA (2.5 attomoles/λ) were used, instead of 1 µg (0.5 attomoles/λ) as originally described (5). By using a (CAG)141 cloned in Bluescript as template, Schalling et al. (5) have previously shown that increasing amounts of template DNA, ranging from 0.5 to 5 attomoles of repeat/λ (1 to 10 pg of plasmid DNA) in the ligation reaction do not have the effect of producing artefactual multimers but improve the strength of the hybridization signal. Faint artefactual bands were only observed when the plasmid DNA template was increased to 50 attomoles/λ per reaction (100 pg) (5). The size of the largest ligation product is highly repeatable (Fig. 1A and B) and varies among individuals. After 400 cycles (95°C, 10 s; 70°C, 30 s), the reaction products were fractionated by electrophoresis on a 6% acrylamide denaturing gel and transferred onto a nylon membrane. Multimers were then detected by hybridization with a radiolabeled oligonucleotide (CAG)10 probe.
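The attomole figures quoted for genomic template can be sanity-checked with a short conversion from DNA mass to molar amount of haploid genome. This is a rough sketch: the mass of about 3.3 pg per haploid human genome is an assumption not stated in the text, and the function name is illustrative.

```python
AVOGADRO = 6.022e23          # molecules per mole
HAPLOID_GENOME_PG = 3.3      # assumed mass of one haploid human genome, in pg


def genome_attomoles(mass_ug: float,
                     haploid_pg: float = HAPLOID_GENOME_PG) -> float:
    """Approximate attomoles of haploid genome in a given mass of genomic DNA."""
    copies = mass_ug * 1e6 / haploid_pg   # 1 µg = 1e6 pg
    moles = copies / AVOGADRO
    return moles * 1e18                   # 1 mole = 1e18 attomoles


for mass in (1, 5):
    print(f"{mass} µg ~ {genome_attomoles(mass):.2f} attomoles")
```

Under that assumption, 1 µg and 5 µg of genomic DNA correspond to roughly 0.5 and 2.5 attomoles of any single-copy locus, in line with the numbers given in the text.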
Gametic frequency estimates were made using the HAPLO program (12). Program and documentation are available via anonymous FTP from paella.med.yale.edu, in the directory pub/haplo.
G.S. dedicates this paper to the memory of Elisabetta Sirugo. We thank Drs Francesc Calafell and Wayne Fenton and Matthew Hawley for helpful discussions and advice. This work was supported by research grants MH-30929, MH-39239, MH-50390 (to K.K.K.), NSF Grant SBR 9408934 (to J.R.K.) and by grants of the Leakey Foundation and Wenner-Gren Foundation for Anthropological Research (to A.S.D.).
4 Harris S., Moncrieff C. and Johnson K. (1996) Hum. Mol. Genet., 5, 1417-1423.
5 Schalling M., Hudson T.J., Buetow K.H. and Housman D.E. (1993) Nature Genet., 4, 135-139.
6 O'Donovan M.C., Guy C., Craddock N., Murphy K.C., Cardno A.G., Jones L.A., Owen M.J. and McGuffin P. (1995) Nature Genet., 10, 380-381.
7 Lindblad K., Nylander P.O., De Bruyn A., Sourey D., Zander C., Engstrom C., Holmgren G., Hudson T., Chotui J., Mendlewicz J., Broeckhoven C. and Schalling M. (1995) Neurobiol. Disease, 2, 55-62.
8 Morris A.G., Gaitonde E., McKenna P.J., Mollon J.D. and Hunt D.M. (1995) Hum. Mol. Genet., 4, 1957-1961.
9 Cavalli-Sforza L.L., Menozzi P. and Piazza A. (1995) The history and geography of human genes. Princeton University Press.
10 Kidd K.K. and Kidd J.R. (1996) In Boyce, A.J. and Mascie-Taylor C.G.N. (Eds), Molecular biology and human diversity, Cambridge University Press.
11 Tishkoff S.A., Dietzsch E., Speed W.C., Pakstis A.J., Cheung K., Kidd J.R., Bonne-Tamir B., Santachiara-Benerecetti A.S., Moral P., Watson E., Krings M., Paabo S., Risch N., Jenkins T. and Kidd K.K. (1996) Science, 271, 1380-1387.
12 Hawley M.E. and Kidd K.K. (1995) J. Hered., 86, 409-411.
14 Bleyl S., Nelson L., Odelberg S. J., Ruttenberg H.D., Otterud B., Leppert M. and Ward K. (1995) Am. J. Hum. Genet., 56, 408-415.
15 Banchs I., Bosch A., Guimera J., Lazaro C., Puig A. and Estivill X. (1994) Hum. Mutat., 3, 365-372.
16 Ashizawa T., Monckton D.G., Vaishnav A., Patel B.J., Voskova A. and Caskey C.T. (1996) Genomics, 36, 47-53.
17 Rubinsztein D.C., Amos W., Leggo J., Goodburn S., Jain S., Li S-H., Margolis R.L., Ross C.A. and Ferguson-Smith M.A. (1995) Nature Genet., 10, 337-343.
18 Sambrook J., Fritsch E.F., Maniatis T. (1989) Molecular Cloning: a laboratory manual, 2nd edn., Cold Spring Harbor Laboratory, Cold Spring Harbor, NY.
19 Cavalli-Sforza L.L., Kidd J.R., Kidd K.K., Bucci C., Bowcock A.M., Hewlett B.S. and Freidlaender J.S. (1986) Cold Spring Harbor Symposia on Quantitative Biology, 51, 411-417.
20 Lu R-B., Ko H-C., Chang F-M., Castiglione C.M., Schoolfield G., Pakstis A.J., Kidd J.R. and Kidd K.K. (1996) Biol. Psychiatry, 39, 419-429.
21 Kidd J.R., Black F.L., Weiss K.M., Balazs I. and Kidd K.K. (1991) Human Biology, 63, 775-794.
22 Sirugo G. and Kidd K.K. (1995) Epicentre Forum, 2, 1-3.
*To whom correspondence should be addressed
Copyright Oxford University Press, 1996