Human Molecular Genetics Advance Access originally published online on November 28, 2007
Human Molecular Genetics 2008 17(6):789-799; doi:10.1093/hmg/ddm350
Haemoglobin S and haemoglobin C: quick but costly versus slow but gratis genetic adaptations to Plasmodium falciparum malaria


1 Dipartimento di Scienze di Sanità Pubblica, University of Rome La Sapienza, Rome, Italy 2 Dipartimento di Biologia, University of Rome Tor Vergata, Rome, Italy 3 Centre National de Transfusion Sanguine, Ouagadougou, Burkina Faso 4 Centre de Recherche Biomoleculaire Pietro Annigoni (CERBA), Ouagadougou, Burkina Faso
* To whom correspondence should be addressed at: Facoltà di Scienze Matematiche, Fisiche e Naturali, Dipartimento di Biologia, Università di Roma Tor Vergata, Via della Ricerca Scientifica, 00133 Rome, Italy. Tel: +39 0672594341; +39 0672594330; Fax: +39 062023500; Email: modiano{at}uniroma2.it
Received September 20, 2007; Accepted November 27, 2007
ABSTRACT
Haemoglobin S (HbS; β6Glu→Val) and HbC (β6Glu→Lys) strongly protect against clinical Plasmodium falciparum malaria. HbS, which is lethal in homozygosity, has a multi-foci origin and a widespread geographic distribution in sub-Saharan Africa and Asia, whereas HbC, which has no obvious CC segregational load, occurs only in a small area of central West Africa. To address this apparent paradox, we adopted two partially independent haplotypic approaches in the Mossi population of Burkina Faso, where both the local S (SBenin) and the C alleles are common (0.05 and 0.13). Here we show that: both C and SBenin are monophyletic; C has accumulated a 4-fold higher recombinational and DNA slippage haplotypic variability than the SBenin allele (P = 0.003), implying higher antiquity; and for a long initial lag period the C alleles apparently remained very few. These results, consistent with epidemiological evidence, imply that the C allele has accumulated mainly through a recessive rather than a semidominant mechanism of selection. This evidence explains the apparent paradox of the uni-epicentric geographic distribution of HbC, representing a slow but gratis genetic adaptation to malaria through a transient polymorphism, compared with the polycentric, quick but costly adaptation through balanced polymorphism of HbS.
INTRODUCTION
Haemoglobin S (HBB E6V) and Haemoglobin C (HBB E6K) provide considerable protection from severe Plasmodium falciparum malaria (1–3 for HbS; 3–4 for HbC) and from mild malaria attacks (1,3,5). The S allele has become polymorphic independently in different locations (6); it is common throughout tropical and equatorial Africa, Arabia and India, and its wide diffusion is explained by the relationships between the fitnesses w of its three genotypes (wSS ≈ 0 << wAA < wAS) under strong P. falciparum selective pressure (7). This makes the A/S polymorphism the best example in human biology of balanced polymorphism, a class of genetic adaptations that are intrinsically poor because at equilibrium the frequency of the advantaged heterozygous genotype can be at most 50%, and the segregational load may be very high (as in the A/S polymorphism, where one of the two homozygous genotypes is even lethal). The C allele, instead, occurs in a single, quite restricted area of central West Africa (unicentricity and epicentricity), and even there its frequency is not dramatically higher than that of the S allele (3). Since CC homozygosity provides full (or very nearly full) protection against P. falciparum malaria, two selective models can be envisaged: strong protection also of the AC heterozygotes (semidominant model) or mild protection of the AC heterozygotes (recessive model). Both models predict that, in the long run, the C allele will become fixed, giving full protection to the whole population, but they differ dramatically in three fundamental respects: (i) the probability that the C allele starts to accumulate instead of disappearing by pure chance; (ii) its exportability to neighbouring populations by demic diffusion and (iii) the time required to reach a common frequency and, eventually, fixation. Under the semidominant model, the probability of accumulation and the exportability are high and the time required to become polymorphic is short; under the recessive model, by contrast, the probability of accumulation and the exportability are small and the time required to become polymorphic is long. Explaining the contrast between the extremely wide diffusion of the costly S allele, on one side, and the very restricted diffusion of the costless C allele, on the other, therefore rests on a correct choice between the two models.
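Point (iii) can be illustrated with the textbook one-locus recursion for selection on a rare allele. The sketch below is a minimal deterministic illustration, not a model of the Mossi: it ignores the S allele, drift, demography and mating structure, and the fitnesses (w_AA = 1, w_AC = 1 + hs, w_CC = 1 + s with s = 0.2) and the dominance values h = 0.5 (semidominant) and h = 0.05 (mostly recessive) are hypothetical choices.

```python
def generations_to_reach(q0: float, q_target: float, s: float, h: float,
                         max_gens: int = 100_000):
    """Generations for the C allele to rise from q0 to q_target.

    Deterministic, infinite-population recursion with hypothetical fitnesses
    w_AA = 1, w_AC = 1 + h*s, w_CC = 1 + s. Returns None if q_target is not
    reached within max_gens generations.
    """
    w_AA, w_AC, w_CC = 1.0, 1.0 + h * s, 1.0 + s
    q, gens = q0, 0
    while q < q_target:
        if gens >= max_gens:
            return None
        p = 1.0 - q
        w_bar = p * p * w_AA + 2 * p * q * w_AC + q * q * w_CC  # mean fitness
        q = (p * q * w_AC + q * q * w_CC) / w_bar               # next-generation C frequency
        gens += 1
    return gens


s = 0.2  # hypothetical advantage of CC over AA
semi = generations_to_reach(0.001, 0.1, s, h=0.5)   # AC gets half the CC advantage
rec = generations_to_reach(0.001, 0.1, s, h=0.05)   # AC gets only 5% of it
print(f"semidominant: {semi} generations; mostly recessive: {rec} generations")
```

With these assumed values the mostly recessive allele takes several times longer to rise from 0.1% to 10%, because while C is rare nearly all of its copies sit in AC heterozygotes, where h sets the size of the advantage.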
Three sources of information can be used to choose between the two models:
- (a) direct epidemiological data (Fig. 1). They strongly suggest that AC protection is much lower than CC protection but, owing to the large confidence intervals, do not rule out a partially semidominant model (3);
- (b) the C and S frequencies in the only African area where the two alleles coexist. The fact that the frequency of the C allele is not dramatically higher than that of the S allele favours the recessive model of selection for the C allele, but is far from proving it, because this finding could be the consequence of a delayed appearance of the C allele relative to the S allele;
- (c) the unicentricity and epicentricity mentioned above, which would be neatly explained by the recessive model but are almost incompatible with the semidominant one.
On these bases, the recessive model has been proposed as the most likely one (3), though further evidence on issues (a) and/or (b) was needed to consider it conclusively proved.
In the present investigation we focused on issue (b) by studying, through a haplotypic approach, the evolutionary histories of the C and S alleles in a single population. The survey was performed in the Mossi of Burkina Faso, central West Africa, where the C and SBenin (the haplotype carrying the S mutation in the Mossi) alleles are both common (0.128 ± 0.004 and 0.051 ± 0.003) (3), thus providing an ideal opportunity to study the evolutionary histories of these two malaria-protective alleles within the same epidemiological context and genetic background. The ages of the two alleles have been estimated through the classical approach based on linkage disequilibrium (LD) decay, namely on the extent of accumulation of initially absent C and SBenin haplotypes (hereafter designated new haplotypes) produced by recombination and/or DNA slippage events. Moreover, we devised a novel semiquantitative approach to gather information on the time course of C allele accumulation.
RESULTS
Because of the existence of a recombination hot spot region (HSR), the data on the slowly evolving markers (sites whose haplotypes with respect to the β6 codon can change only by recombination) must be subdivided into two classes: those lying on the same side of the 6th codon with respect to the HSR (downstream or 3' markers) and those lying on the opposite side (upstream or 5' markers) (Fig. 2).
The haplotype variability for the 3' slowly evolving markers
Table 1 reports the frequencies of the haplotypes made up of the β6 codon and one other marker (3' two-locus haplotypes), and Table 2 the frequencies of the 3' multilocus haplotypes. Only one C and one SBenin haplotype (delAT, T, T, C, T) were found among the 50 C and 25 SBenin chromosomes that were unambiguously characterized. This confirms previous reports on Afro-Americans (11,12) and, combined with the present observation that this haplotype is not common among the A haplotypes (3/23 = 0.13 ± 0.07), proves that the C and SBenin alleles are both monophyletic in the Mossi.
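The ± value attached to 3/23 is consistent with the usual binomial standard error of a proportion, √(p(1 − p)/n). The source does not name its estimator, so the sketch below is an assumed reconstruction rather than the authors' code.

```python
import math


def freq_with_se(k: int, n: int) -> tuple[float, float]:
    """Observed proportion k/n and its binomial standard error sqrt(p(1-p)/n)."""
    p = k / n
    return p, math.sqrt(p * (1 - p) / n)


# Ancestral C/SBenin 3' haplotype among the A chromosomes: 3 out of 23
p, se = freq_with_se(3, 23)
print(f"{p:.2f} ± {se:.2f}")  # 0.13 ± 0.07
```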
The haplotype variability for the 5' slowly evolving markers
The data are presented at the two-locus (Table 3) and multilocus (Table 4) haplotype level. The region of ca. 40 kb studied here shows a very low (substantially nil) recombination rate (13), so that it can formally be considered a multiallelic site whose alleles correspond to the haplotypes. Out of the 128 theoretically possible haplotypes (i.e. 2^7, where 7 is the number of SNPs studied), 10 have been found or inferred by maximum likelihood (ML) with a frequency ≥2 in the sample of 152 A clusters examined (Table 4). These frequency estimates are compatible with those of several other studies based on smaller samples (6,12,14–17). In contrast to the absence of variability observed for the 3' markers (see above), the C and SBenin clusters show some variability for the 5' markers: the original C and SBenin haplotypes are clearly recognizable, but three distinct types of C recombinant (new = not ancestral) haplotypes and one type of SBenin recombinant haplotype have been found in a sample of 58 C and 42 SBenin clusters, and their overall frequency is much higher among the C than among the SBenin alleles (7/58 versus 1/42).
The haplotype variability for the fast evolving (microsatellite) markers
The results concern two different simple tandem repeats (STRs). In both cases, the frequencies have been obtained by direct counting. For the (AT)xTy microsatellite (Table 5), neither C nor SBenin clusters show any variation, despite the high variability among the A chromosomes (HA = 0.723). This confirms the monophyletic origin of the two β6 variants mentioned above and indicates that the two mutations occurred recently on different microsatellite haplotypes [(AT)7T7 for C and (AT)8T4 for SBenin] of the same 3' slowly evolving marker haplotype (delAT, T, T, C, T, T, Hpa I-) (Table 1). The (ATTTT)n STR at ca. –1400 bp is quite variable within the A clusters (HA = 0.621), and some variability is also displayed by the C (HC = 0.186) and SBenin (HS = 0.054) clusters (Table 6), which differ greatly from each other in the frequency of new haplotypes (18/175 versus 3/108, P ≈ 0.015).
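The text does not say which test produced the P value for 18/175 versus 3/108. As one plausible choice, a one-sided Fisher exact test (the lower hypergeometric tail for the SBenin count) can be computed directly with Python's `math.comb`; on these counts it comes out at roughly 0.014, of the same order as the reported value. This is a sketch under that assumption, not the authors' procedure.

```python
from math import comb


def hypergeom_lower_tail(k: int, n: int, K: int, N: int) -> float:
    """P(X <= k) for X ~ Hypergeometric(N items, K 'new' items, n drawn).

    This is the one-sided Fisher exact test P value for the 2x2 table in
    which the n drawn items form one group and the remaining N - n the other.
    """
    total = comb(N, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k + 1)) / total


# (ATTTT)n new haplotypes: 18/175 among C, 3/108 among SBenin
n_c, new_c = 175, 18
n_s, new_s = 108, 3
p_one_sided = hypergeom_lower_tail(new_s, n_s, new_c + new_s, n_c + n_s)
print(round(p_one_sided, 4))
```

Exact integer binomial coefficients avoid any normal approximation; a two-sided version would sum all tables at most as probable as the observed one and would give a larger value.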
The data on the upstream slowly evolving markers (Table 4), combined with those on the (ATTTT)n site (Table 6), subjected to partially independent mechanisms of evolution (recombination versus recombination plus DNA slippage), conclusively show that the journey accomplished by the C allele in the direction of attaining the same haplotype variation of the A allele has been much longer than that of the SBenin allele.
DISCUSSION
The C allele, if fixed, could provide full protection from severe P. falciparum malaria to all individuals, whereas the S allele, even at its best, protects only a minority of the population, and even this partial protection is paid for with quite a high segregational load. Yet the C allele occurs only in a single, very limited geographic area of central West Africa and in Thailand (19), while the S allele is distributed all over Africa, Arabia and India. In other words, on a worldwide scale, the protection afforded by the C allele is orders of magnitude smaller than that of the S allele.
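The statement that the S allele protects only a minority, at a high segregational load, follows from the standard heterozygote-advantage equilibrium. As a sketch, the fitness estimates attributed to (28) later in this Discussion (wAA = 0.89, wAS = 1, wSS = 0.20) are used purely as illustrative inputs; the model considers only the A and S alleles and assumes Hardy–Weinberg proportions.

```python
def balanced_equilibrium(w_AA: float, w_AS: float, w_SS: float):
    """Equilibrium S frequency, AS frequency and segregational load for a
    two-allele locus with heterozygote advantage (w_AS > w_AA and w_AS > w_SS)."""
    s = 1.0 - w_AA / w_AS          # selection against AA, relative to AS
    t = 1.0 - w_SS / w_AS          # selection against SS, relative to AS
    q = s / (s + t)                # equilibrium frequency of S
    p = 1.0 - q
    het = 2.0 * p * q              # Hardy-Weinberg AS frequency (never above 0.5)
    load = s * p * p + t * q * q   # mean-fitness shortfall relative to AS
    return q, het, load


q, het, load = balanced_equilibrium(0.89, 1.0, 0.20)
print(f"q_S = {q:.3f}, AS = {het:.3f}, load = {load:.3f}")
# q_S = 0.121, AS = 0.213, load = 0.097
```

With these inputs only about a fifth of the population is heterozygous (protected), and the load equals st/(s + t), the closed form for this model.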
The goal of the present investigation was to explain this apparent evolutionary contradiction by studying, through a haplotypic approach, a population where both alleles are common. The approach consisted of comparing the C and SBenin LD decays and is based on the assumption that all the C (or SBenin) haplotypes have the same selective value (irrespective of whether they are the ancestral or a new haplotype), so that their LD decays proceeded at a constant rate, equal for C and SBenin, throughout the whole process from their birth up to now. In other words, it is assumed here that these alleles are the only determinants of the selective value of the cluster they belong to. The approach can succeed only if (i) both the C and SBenin alleles are monophyletic and (ii) their relative LD decays are neither both just started nor both almost completed. Monophyly is necessary because only then can it be assumed that every new C and SBenin haplotype has been produced by recombination or DNA slippage, thus allowing one to infer the allele age from the observed LD decay: our findings demonstrate that both the C and SBenin alleles are monophyletic (see Results). The second condition was also fulfilled: the C LD decay turned out to be far from both extremes (0.165; see Table 4), allowing one to compare the LD decays of the two alleles, hence their relative ages.
The comparison between the C and SBenin haplotype variabilities and its implications: C is more ancient than SBenin
This comparison has been carried out through two partially independent sets of data, those on the upstream recombinant haplotypes and those on the STR haplotypes (produced by recombination and/or DNA slippage). In both cases, the result of such comparison can be expressed in terms either of number of distinct types of events which produced the new haplotypes or of the LD decays of the ancestral haplotypes.
With the present sample sizes, the maximum number of distinguishable types of recombination events was three, for both the C and SBenin haplotypes (for C, with the haplotype ID no. 1, with ID no. 3, or with any one of the remaining pooled uncommon A haplotypes; for SBenin, with ID no. 2, with ID no. 3, or with any one of the remaining pooled A haplotypes: see Table 4). For the C allele all three types of recombinants were found, whereas only one was found for the SBenin allele. The 4-fold difference between the C and SBenin relative overall LD decays, though large, is not significant (0.165 versus 0.04, see footnotes of Table 4; P ≈ 0.2, a value calculated taking into account also the C and SBenin sample sizes). As to the (ATTTT)n STR (Table 6), both possible one-step slippage haplotypes [(ATTTT)5→6 and (ATTTT)5→4] were found among the C alleles, but only one [(ATTTT)5→4] among the SBenin alleles. The cumulative frequencies of the new haplotypes were 18/175 (= 0.103) versus 3/108 (= 0.028), a 3.7-fold difference (P ≈ 0.015).
On the whole, the present data show that the haplotype variability of C is about 4-fold greater than that of SBenin, and the combined statistical significance of this difference is high (P ≈ 0.003). Despite recent reports of biallelic HSRs whose alleles differ in their efficiency in promoting recombination (20,21), in the present case higher haplotype variability implies greater antiquity for at least two reasons: (i) a study of the β globin cluster HSR showed no evidence of polymorphism in recombination rate (20) and (ii) the A→C and A→SBenin mutations occurred in the same 3' haplotype (Table 2), making it even more unlikely that they are associated with different HSR alleles (if any).
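The combined values quoted in this Discussion appear to equal the simple product of the separate P values (0.2 × 0.015 = 0.003 here; 0.0005 × 0.007 = 3.5 × 10^-6, i.e. about 4 × 10^-6, for the LD-decay tests in the next subsection); that reading is our inference from the numbers. The sketch computes that product alongside Fisher's method, a standard way of combining independent P values, shown only for comparison.

```python
import math


def product_of_p(p1: float, p2: float) -> float:
    """Simple product of two P values."""
    return p1 * p2


def fisher_combined_p(p1: float, p2: float) -> float:
    """Fisher's method for two independent P values.

    X = -2 (ln p1 + ln p2) follows a chi-square distribution with 4 degrees
    of freedom under the null; for 4 df the survival function is
    exp(-X/2) * (1 + X/2).
    """
    x = -2.0 * (math.log(p1) + math.log(p2))
    return math.exp(-x / 2.0) * (1.0 + x / 2.0)


for p1, p2 in [(0.2, 0.015), (0.0005, 0.007)]:
    print(f"P1={p1}, P2={p2}: product={product_of_p(p1, p2):.2g}, "
          f"Fisher={fisher_combined_p(p1, p2):.3g}")
```

The two quantities answer different questions: the product is a joint probability of both observed outcomes, while Fisher's statistic is calibrated to give a single P value under the joint null hypothesis.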
The absolute age of the C and SBenin alleles
To estimate the absolute ages of these two alleles from the extent of accumulation of new C and SBenin haplotypes, different types of markers have been studied and partially independent estimates obtained. Two sets consisted of upstream or downstream slowly evolving markers; the others were fast evolving markers, but for the (AT)xTy STR new haplotypes could only have been formed through DNA slippage, whereas for the (ATTTT)n STR they could also have been produced by recombination. It is worth pointing out that the ultimate, overall accumulation of new haplotypes should not be affected by the demographic or selective history of the population, nor even by its time course (see Statistical Methods).
Table 7 presents all the age estimates. They range between 38 and 120 generations for the C allele, and between 10 and 28 generations for the SBenin allele, depending on the value of the recombination rate (R) and/or on the type of haplotype (SNPs or STRs) considered. For the C allele, the present age estimate agrees with literature data (75–150 generations, with an upper limit of <275 generations) (24). The uncertainty of the haplotype frequency estimates is relatively small compared with the very large uncertainty concerning R. These sources of uncertainty make the present absolute age estimates no more than tentative; however, reasonable values seem to be 100 generations for the C and 25 generations for the SBenin allele.
The time-course of the C accumulation: a major lag followed by a short phase of rapid frequency increase
The present approach consisted of evaluating the extent of the deviations, if any, from the theoretical expectation that the LD decays of different new haplotypes are all equal. For both types of new haplotypes studied [the 5' SNP and the (ATTTT)n haplotypes], large and highly significant differences among their LD decays were found (P ≈ 0.0005, see Table 8; and P ≈ 0.007, see Table 9), making the combined likelihood vanishingly small (P ≈ 4 × 10^-6). This finding shows that for a long initial period the C alleles remained very few, thus ruling out both a rapid self-sustained (not due to immigration) increase of the population size and a strong selective advantage of the C allele during that period, even though P. falciparum was already present (25,26).
Implications of the above findings. The fact that the selective advantage of the C allele against P. falciparum malaria was very poor while this allele was still rare implies that the protection was very mild when it was conferred only through AC heterozygosity. This rules out the semidominant model and proves, by exclusion, the recessive model. Further evidence is the finding that C arose much earlier than the SBenin allele. In fact, if the AC genotype afforded protection comparable with that of AS, the C allele, having existed for much longer than SBenin and being highly advantageous (instead of lethal) in the homozygous state, would have reached a frequency exceeding that of the SBenin allele to a much larger extent than it actually did (0.13 versus 0.05). Indeed, it should have approached fixation.
The slow increase in the number of C alleles during the lag phase was probably due to the combined effect of a population expansion and a selective advantage of the AC heterozygotes, both of mild degree. As to the rapid post-lag expansion, since so large an increase in so short a time cannot be accounted for by selection alone, it is necessary to postulate that a huge expansion of the Mossi population, accompanied by a strong mating structure (spanning the whole range from inbreeding to village and territory isolation) that favoured the production of homozygotes, played a co-starring rather than a marginal role in the accumulation of the C allele. This state of affairs, specific to this particular system, makes any simulation approach unfeasible (because too arbitrary and speculative) unless it is based on reliable knowledge of the demographic, mating-structure and malarial histories of the Mossi during the last 100–200 generations.
In summary, by ruling out a delayed birth of the C allele as the reason why its frequency is not dramatically higher than that of SBenin, and by showing that the accumulation of the C allele began with a long lag, the present findings conclusively prove the recessive model. This conclusion is supported by recent in vitro studies showing that parasitized CC red blood cells (RBCs) differ greatly from parasitized AA RBCs in three features relevant to disease severity (cytoadherence, rosetting and agglutination by immune sera), whereas parasitized AC RBCs are much less modified (27).
The fact that the protection conferred by AC heterozygosity is much lower than that conferred by CC homozygosity may help in formulating hypotheses on the molecular basis of the C protection. For example, considering that the expected approximate proportion of α2βC2 haemoglobin is ca. 100% in the CC homozygotes and only 25% in the AC heterozygotes, a reasonable (perhaps too naïve and simplistic) hypothesis is that the extent of protection depends on the percentage of α2βC2 Hb.
It has long been known that the SC genotype is severely disadvantaged (sickle cell-haemoglobin C disease), but the role of the S and C alleles in shaping each other's evolutionary fate could not be inferred because the fitness wAC of the AC heterozygotes was not known. The present result that (wAC – wAA) << (wAS – wAA) (because ProtectionAC << ProtectionAS) allows one to assign a clear-cut, important and substantially unidirectional role to the S allele. In fact, the weighted mean fitness of the C heterozygotes (AC plus SC), to be compared with that of the AA homozygotes, depends by definition on the fitnesses of the two types of heterozygotes and on their frequencies, according to the following equation:
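$$\bar{w}_{C} = \frac{f_{AC}\,w_{AC} + f_{SC}\,w_{SC}}{f_{AC} + f_{SC}} = \frac{p_A\,w_{AC} + p_S\,w_{SC}}{p_A + p_S}$$
where p_A, p_S and p_C are the frequencies of the A, S and C alleles and, under Hardy–Weinberg proportions, f_AC = 2p_A p_C and f_SC = 2p_S p_C.
Since
$$w_{SC} < w_{AA} < w_{AC},$$
the C heterozygotes are on average exactly as fit as the AA homozygotes ($\bar{w}_{C} = w_{AA}$) when
$$p_A\,(w_{AC} - w_{AA}) = p_S\,(w_{AA} - w_{SC}),$$
namely when
$$\frac{p_A}{p_S} = \frac{w_{AA} - w_{SC}}{w_{AC} - w_{AA}},$$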
which shows that the overall fitness of the C heterozygotes equals that of the AA homozygotes when the frequency of the A allele exceeds that of the S allele by the same factor by which the large disadvantage of the SC genotype exceeds the small advantage of the AC genotype. Thus, the interplay among the three alleles may create a kind of pseudo-balanced polymorphism among the C alleles carried by heterozygotes, in which the advantage of the C alleles carried by AC heterozygotes is balanced against the disadvantage of those carried by SC heterozygotes. In other words, even a modest S allele frequency may be sufficient to turn the cumulative advantage of the C heterozygotes into a disadvantage. The C allele could exert the same effect on the S allele, but this is very unlikely because, given the large AS advantage, the C allele frequency required to cause such an effect would be very high. In summary, because of the large SC disadvantage coupled with the small AC advantage, the S allele could have made it even less likely for the C allele to increase its diffusion, particularly if the two coexisted while the C allele was still in its lag phase.
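The break-even condition can be checked numerically. Under Hardy–Weinberg proportions the weighted mean fitness of the C heterozygotes is (p_A w_AC + p_S w_SC)/(p_A + p_S), which equals w_AA when p_A/p_S = (w_AA − w_SC)/(w_AC − w_AA). In the sketch below the fitness values (w_AA = 1, w_AC = 1.02, w_SC = 0.70) are hypothetical, chosen only to mimic a small AC advantage and a large SC disadvantage.

```python
def c_het_mean_fitness(p_A: float, p_S: float, w_AC: float, w_SC: float) -> float:
    """Weighted mean fitness of the C heterozygotes (AC plus SC).

    Under Hardy-Weinberg proportions AC and SC occur at 2*p_A*p_C and
    2*p_S*p_C, so p_C cancels and only p_A and p_S matter.
    """
    return (p_A * w_AC + p_S * w_SC) / (p_A + p_S)


def break_even_ratio(w_AA: float, w_AC: float, w_SC: float) -> float:
    """p_A/p_S at which the C heterozygotes are, on average, as fit as AA."""
    return (w_AA - w_SC) / (w_AC - w_AA)


# Hypothetical fitnesses: small AC advantage, large SC disadvantage
w_AA, w_AC, w_SC = 1.00, 1.02, 0.70
ratio = break_even_ratio(w_AA, w_AC, w_SC)   # (1 - 0.70) / 0.02, i.e. about 15
print(ratio)
print(c_het_mean_fitness(0.75, 0.05, w_AC, w_SC))  # p_A/p_S = 15: close to w_AA
print(c_het_mean_fitness(0.75, 0.10, w_AC, w_SC))  # more S: below w_AA
```

With these assumed values an A allele only 15 times more common than S is already enough to cancel the AC advantage, which illustrates why a modest S frequency can matter.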
On the whole, the C allele had to overcome two adverse odds to attain its present polymorphic status: the low rate of production (as for any single nucleotide substitution, SNS) and the high probability of disappearing by pure chance throughout the lag period. The chance of attaining a polymorphic frequency was even lower in populations with a common S allele, because of the strong disadvantage of the SC genotype. Therefore, the fact that all this took place on one occasion is perhaps more surprising than the fact that it occurred only once.
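The risk of disappearance by pure chance during the lag phase can be illustrated with a standard Galton–Watson branching process, a textbook approximation used here for illustration rather than the authors' method. Assumptions: each copy of a rare allele leaves a Poisson number of copies with mean 1 + s, where s is the advantage of the heterozygous carriers (while C is rare almost all of its copies are in AC heterozygotes); population size is constant and the s values are hypothetical.

```python
import math


def survival_probability(s: float, generations: int) -> float:
    """Probability that the lineage of a single new mutant is still present
    after `generations`, with Poisson offspring of mean 1 + s.

    Extinction by generation t satisfies q_t = f(q_{t-1}) with q_0 = 0, where
    f(x) = exp((1 + s) * (x - 1)) is the Poisson probability generating function.
    """
    q = 0.0
    for _ in range(generations):
        q = math.exp((1.0 + s) * (q - 1.0))
    return 1.0 - q


# Hypothetical carrier advantages: sizeable, small, and none
for s in (0.10, 0.01, 0.0):
    print(s, round(survival_probability(s, 1000), 4))
```

With s = 0.10 the long-run survival probability is about 0.18 (the classical 2s approximation gives 0.2), whereas with s = 0 the lineage survives 1000 generations with a probability of only about 0.002, which is the regime a mostly recessive allele is in while it is rare.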
A considerable part of this scenario was already suggested long ago in a pioneering book (28) where, on the basis of the deviations from Hardy–Weinberg (HW) equilibrium observed in a pooled sample of 72 African populations, the following fitnesses were assigned to the six genotypes of the A, S and C alleles: wAA = 0.89 ± 0.03, wSS = 0.20 ± 0.11, wCC = 1.31 ± 0.29, wAS = 1, wAC = 0.89 ± 0.035, wSC = 0.70 ± 0.07.
In conclusion, the genetic response of the Mossi to P. falciparum malaria has been brought about by a quick but costly balanced polymorphism and a slow but gratis transient polymorphism. This, the main conclusion of the present investigation, has two types of general evolutionary implications.
General implications
Alleles which protect from malaria in a mainly (or exclusively) recessive fashion
Three such alleles, α-3.7 thal (29), fy (30) and HbC, are presently known (though for the fy allele its past selective value, if any, is still debated; see, for example, the discussion in 31). For each of them, success in terms of geographic distribution has largely been a function of the rate of production (very high for α-3.7 thal, which is produced by a displaced but homologous crossing over, and extremely low for fy and C, which require a single, specific SNS) and of the time elapsed since the appearance of the selective factor (very long for the fy allele, which protects from the very ancient P. vivax malaria, and relatively short for α-3.7 and C, which protect from the more recent P. falciparum malaria). The C allele, being strongly disfavoured in both these respects, has been much less successful than the α-3.7 thal allele, which is polycentric, and the fy allele, which is unicentric (western sub-Saharan Africa) but fixed over a large area.
General aspects of the pathways of genetic adaptation
The biological impact (pattern and velocity of evolution, and ultimate fate) of a major adaptive allele x towards a stringent, continuous and long-lasting selective factor essentially depends on: (i) the rate at which x is produced (ranging from the very low value of a specific SNS, such as βS and βC, to the much higher value of loss-of-function alleles, such as the thal alleles); (ii) the selection model and the genetic load of the adaptation: the A/S polymorphism is a balanced polymorphism with a high genetic load (quick but costly genetic adaptation), whereas βC is an almost recessive selective polymorphism with, obviously, no CC segregational load (slow but gratis genetic adaptation). Duration, stringency and continuity of the selective pressure appear to be the main extra-genetic factors determining the phase reached by an adaptive process (besides the demographic and mating-structure histories of the exposed population).
One can view a genetic adaptation as a four-phase (i–iv) process, usually (but not necessarily) starting with the appearance, or even the pre-existence, of quick but costly emergency alleles and ending with the fixation of one slow but gratis allele with no emergency alleles left. This simplistic scheme may be a convenient frame for the individual adaptive scenarios known or hypothesized so far. (i) A possible example of an adaptive scenario still in the first phase is the ensemble of silent, lethal alleles of four structural genes for lysosomal enzymes in Ashkenazi Jews (32,33), apparently a set of quick but costly alleles providing an emergency adaptation towards an unidentified, stringent but recent selective factor, and not accompanied by any known slow but gratis adaptive allele. (ii) The second phase consists of the coexistence of emergency alleles with one slow but gratis allele that is theoretically travelling towards fixation but is still far from it. This scenario is possibly represented by the coexistence, in central West Africa, of the βS and βC alleles. (iii) Conclusively demonstrated examples of third-phase scenarios, in which one slow but gratis adaptive allele is fixed (or very nearly so) while relics of quick but costly alleles are still present, are not available. However, it has been hypothesized (34) that the lethal alleles causing cystic fibrosis, which are common (cumulative frequency = 0.02) in Northern European populations where a lactase-persistence allele (slow but gratis) is almost fixed, are relics of emergency alleles. They would have provided a partial, and costly, adaptation by mitigating, in the heterozygotes, the severity of the diarrhoea caused by the dairy-milk diet adopted by these populations when they were still lactose-intolerant (like all mammals).
(iv) The fourth phase, that of a slow but gratis adaptive allele that is fixed and apparently no longer accompanied by relics of quick but costly alleles, may be represented by the fy allele in Central-West Africa. The long time elapsed since the appearance of the supposed selective agent (P. vivax malaria) is likely to be the reason why this adaptive allele probably attained fixation long ago. Further possible examples are the α-3.7 thal alleles, almost fixed among the Tharus of southern Nepal (35) and in Papua New Guinea (36).
MATERIALS AND METHODS
The sample
It consists of 390 unrelated Mossi of Burkina Faso with the following genotypes: 120 AA, 180 AC, 35 AS, 31 CC and 24 SS. All subjects gave informed consent. Not all the specimens have been studied for all the markers. The total number of A, C and S alleles examined for each marker is specified in the Tables.
The DNA region and the markers
The markers studied (the β6 codon; two STRs; 12 SNPs; one dinucleotide insertion/deletion) are distributed along the entire length of the β cluster, which contains a hot spot of recombination (HSR) (Fig. 2). Haemoglobin genotypes have been identified either by RFLP analysis (3) or by direct sequencing. The region of ca. 1350 bp around the β6 codon (from –700 to +641 bp from the cap site of the β gene) has been studied by sequencing allele-specific PCR (ARMS-PCR) fragments. It comprises five SNPs (–551 T/C, –543 C/T, –491 A/C, –340 T/C and +569 G/T), one STR [–541 (AT)xTy] and one dinucleotide insertion/deletion (–570 indelAT). Two distinct ARMS-PCRs were carried out for each heterozygous subject: the first with the forward primer ATxTyFWD plus one of the allele-specific reverse primers (βAREV or βCREV or βSREV) to amplify a 750 bp region upstream (5') of the β6 codon; the second with the reverse primer 3'S-REV plus one of the allele-specific forward primers (β6 FWD or βCFWD or βS FWD) to amplify a 570 bp region downstream (3') of the β6 codon (for details see Tables 10 and 11). The (ATTTT)n STR was studied by 7% acrylamide gel electrophoresis of a 300 bp PCR fragment obtained through an ARMS-PCR (primer pair: AT4FWD2 plus βA or βC or βSREV) followed by a nested PCR (primer pair: AT4FWD plus AT4REV). All primer sequences and PCR conditions are listed in Tables 10 and 11. All the other SNPs have been studied in homozygous subjects only (100 HbAA, 30 HbCC and 24 HbSS) using primers and PCR conditions already published (37), except for the HincII site 3' to the ψβ gene, for which new primers were designed (Table 10).
STATISTICAL METHODS
Maximum likelihood (ML) haplotype frequency estimates were calculated with the Arlequin 3.0 software (10).
Estimates of the linkage disequilibrium (LD) decays
They have been calculated both as overall decays (Table 4) and, in some cases, for single haplotypes (Table 8).
The overall C LD decay is the decrease from 1 [the frequency fC,ancC,0, among the C haplotypes, of the ancestral C haplotype (the one on which the C allele arose) at the beginning of the process (time 0)] down to the present value after n generations (fC,ancC,n):
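$$\text{LD decay}_{C} = \frac{f_{C,ancC,0} - f_{C,ancC,n}}{f_{C,ancC,0} - f_{A,ancC}} = \frac{1 - f_{C,ancC,n}}{1 - f_{A,ancC}}$$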
where fA, ancC is the frequency of the ancestral C haplotype among the A haplotypes (assumed to be equal at time 0 and time n), (when fC, ancC, n = fA, ancC the LD decay is completed, i.e. the equilibrium has been reached).
The same applies for the SBenin allele.
The LD decay concerning a single new haplotype is expressed by the increase of its frequency from 0 (its initial frequency) up to its present frequency (fC, newC, n) having as a term of reference the haplotype frequency among the A haplotypes (fA, newC, n):
$$\text{LD decay}_{newC} = \frac{f_{C,newC,n} - f_{C,newC,0}}{f_{A,newC,n} - f_{C,newC,0}}$$
and, since fC,newC,0 is 0, the expression becomes fC,newC,n / fA,newC,n.
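A minimal sketch of the two LD-decay measures, (1 − fC,ancC,n)/(1 − fA,ancC) for the ancestral haplotype and fC,newC,n/fA,newC,n for a single new haplotype, applied to the upstream-SNP frequencies quoted later in this section (fC,ancC,n = 0.88 with fA,ancC = 1 − 0.734; fS,ancS,n = 0.976 with fA,ancS = 1 − 0.599). With these rounded inputs it gives about 0.163 for C and 0.040 for SBenin, close to the 0.165 and 0.04 reported from Table 4; we assume the small gap reflects rounding of the input frequencies.

```python
def overall_ld_decay(f_anc_now: float, f_anc_in_A: float) -> float:
    """Overall LD decay of an ancestral haplotype: (1 - f_now) / (1 - f_A).

    0 means no decay (the variant still sits only on its ancestral haplotype);
    1 means equilibrium (same haplotype frequency as among A chromosomes).
    """
    return (1.0 - f_anc_now) / (1.0 - f_anc_in_A)


def single_new_haplotype_ld_decay(f_new_now: float, f_new_in_A: float) -> float:
    """LD decay for one new haplotype whose initial frequency was 0."""
    return f_new_now / f_new_in_A


# Upstream (5') SNP haplotypes, rounded values quoted in Statistical Methods
ld_c = overall_ld_decay(0.88, 1 - 0.734)    # 0.12 / 0.734
ld_s = overall_ld_decay(0.976, 1 - 0.599)   # 0.024 / 0.599
print(round(ld_c, 3), round(ld_s, 3))  # 0.163 0.04
```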
It is important to point out that both the C versus A and the SBenin versus A systems are still in so strong a LD that the ancestral C and SBenin haplotypes are clearly identifiable.
Estimates of the C and SBenin allele absolute ages
They have been obtained from: (i) the frequency of new (C or SBenin) upstream slowly evolving haplotypes; (ii) the frequency of new downstream slowly evolving haplotypes; (iii) the frequency of new fast evolving haplotypes.
In each case, for the C allele, we applied the general formula:
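$$f_{C,ancC,n} = f_{C,ancC,0}\left[1 - \left(0.5\, f_A\, f_{A,non\,ancC}\, R + \mu\right)\right]^{n} \qquad [1]$$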
where fA is the average frequency of the A allele during the life span of the C allele. Since fA was 1 at the beginning of the process and is 0.82 now [1 – (fC + fS) = 1 – (0.13 + 0.05)], it has been approximated to 1; fA,non ancC is the frequency, among the A haplotypes, of haplotypes other than the ancestral C one; R is the recombination rate; 0.5 is the proportion of recombinant gametes carrying the C allele among those produced by the C/A(non-ancestral C) heterozygotes; and µ is the mutation rate.
For the SBenin allele an equivalent formula has been used.
Estimates based on the frequency of new (C or SBenin) upstream slowly evolving haplotypes
These new haplotypes can be produced by recombination only (since µ is extremely low).
The C allele age. Considering that fC, ancC, n = 0.88, fA, non ancC = (1–0.266) = 0.734 (see Table 4) and fC, ancC, 0 (the frequency, among the C alleles, of the ancestral C haplotype at the beginning of the process) = 1, the formula [1] becomes:
$$0.88 = 1 \times \left(1 - 1 \times 0.734 \times 0.5\,R\right)^{n} = \left(1 - 0.367\,R\right)^{n}$$
so that n = log 0.88/log(1 − 0.367R), where R is the recombination rate at the HSR site.
The SBenin allele age. Since fS, ancS, n = 0.976 and fA, non ancS = 0.599 (Table 4), the formula [1] becomes:
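$$0.976 = 1 \times \left(1 - 1 \times 0.599 \times 0.5\,R\right)^{n} \approx \left(1 - 0.3\,R\right)^{n}$$
and n = log 0.976/log(1 − 0.3R).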
By far the most relevant source of uncertainty in these estimates is the wide range of published values of R, which span from 2.9 × 10⁻³ to 7.1 × 10⁻³ (13,20,22,23).
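A minimal sketch of formula [1] solved for n, applied to the upstream SNP data above at both ends of the quoted range of R. It assumes fA = 1 and µ = 0 (recombination only), as in the text, and uses 0.5 × 0.599 rather than the rounded 0.3 for SBenin. The function name allele_age is illustrative, and the outputs are back-of-envelope figures from rounded inputs, not the paper's reported estimates or confidence intervals.

```python
import math

R_RANGE = (2.9e-3, 7.1e-3)  # published range of R (refs 13, 20, 22, 23)


def allele_age(f_anc_now, f_a_non_anc, r, mu=0.0, f_a=1.0, f_anc_0=1.0):
    """Solve formula [1] for n (generations):
    f_anc_now = f_anc_0 * (1 - f_a * f_a_non_anc * 0.5 * r - mu) ** n
    """
    per_generation_loss = f_a * f_a_non_anc * 0.5 * r + mu
    return math.log(f_anc_now / f_anc_0) / math.log(1.0 - per_generation_loss)


for label, f_now, f_non in (("C", 0.88, 0.734), ("SBenin", 0.976, 0.599)):
    # A higher R gives a younger age, so iterate from the largest R.
    young, old = (allele_age(f_now, f_non, r) for r in reversed(R_RANGE))
    print(f"{label}: about {young:.0f} to {old:.0f} generations")
```

Whatever value of R is chosen, the C estimate comes out roughly four times the SBenin one, which is why the uncertainty in R affects the absolute ages far more than their ratio.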
Estimates based on the frequency of downstream slowly evolving haplotypes
In this case too, new haplotypes can only be produced by recombination. Since the observed frequencies of recombinants were 0/54 for the C allele and 0/34 for the SBenin allele (last row of Table 1), these findings are poorly informative and only allow an upper threshold to be set for the age of these alleles. This threshold can be inferred from the recombination rate of this region, 4.5 × 10⁻² per Mb per generation (13), the distance between the β6 codon and the farthest SNP site (+7446 HpaI), and the frequency of the donor HpaI(+) allele (1 − 28/170 = 0.835; see Table 1), from which new C and SBenin haplotypes are expected to have accumulated at a rate of 1.4 × 10⁻⁴ per generation.
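The rate quoted above can be reproduced as 0.5 × (donor allele frequency) × (recombination rate per Mb) × (distance in Mb). The text does not say how the upper threshold itself was derived, so the sketch below assumes, purely for illustration, a one-sided 95% exact binomial bound on the 0/54 and 0/34 counts, and takes the β6 to +7446 HpaI distance as about 7.446 kb (read off the site label). The names max_age and per_gen_rate are hypothetical.

```python
import math

R_PER_MB = 4.5e-2        # recombination rate of the region, per Mb per generation (ref. 13)
DIST_MB = 7.446e-3       # assumed beta6-to-(+7446 HpaI) distance, ~7.446 kb
F_DONOR = 1 - 28 / 170   # frequency of the donor HpaI(+) allele (Table 1)

# New downstream C or SBenin haplotypes per generation:
# 0.5 (recombinant gamete keeps the C/S allele) x donor frequency x R x distance.
rate = 0.5 * F_DONOR * R_PER_MB * DIST_MB


def max_age(n_sampled, alpha=0.05, per_gen_rate=rate):
    """Hypothetical upper bound on age when 0 of n_sampled are recombinant.

    Assumes an exact binomial bound: the largest proportion p with
    (1 - p) ** n_sampled >= alpha; then solves 1 - (1 - rate) ** n = p for n.
    """
    p_upper = 1.0 - alpha ** (1.0 / n_sampled)
    return math.log(1.0 - p_upper) / math.log(1.0 - per_gen_rate)


print(f"rate = {rate:.2e} per generation")
print(f"C (0/54): < {max_age(54):.0f} generations; SBenin (0/34): < {max_age(34):.0f}")
```

Because fewer SBenin chromosomes were sampled, the same zero count yields a looser (older) bound for SBenin than for C.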
Estimates based on the frequency of the new fast evolving haplotypes
The (ATTTT)n site. New one-step C or SBenin haplotypes [i.e. those with the (ATTTT)4 or the (ATTTT)6 allele] may have been produced either by recombination or by DNA slippage. Therefore, solving formula [1] with fA, non ancC = fA, non ancS = 0.549 (1 − 0.451; see Table 6), the age estimates are:
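$$n_{C} = \frac{\log 0.897}{\log\left(1 - 0.549 \times 0.5\,R - 0.0009\right)}, \qquad n_{S_{\mathrm{Benin}}} = \frac{\log 0.972}{\log\left(1 - 0.549 \times 0.5\,R - 0.0009\right)}$$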
where 0.897 and 0.972 are the frequencies of the ancestral (ATTTT)5 allele among the present C and SBenin clusters, respectively, and 0.0009 is the mutation rate adopted here for the (ATTTT)n site, obtained as a weighted mean of the data on Y-chromosome (38) and autosomal (39) STRs.
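A minimal sketch of the same calculation for the (ATTTT)n site, where recombination and slippage both remove the ancestral (ATTTT)5 allele. It assumes fA ≈ 1 and, as an assumption not stated explicitly above, reuses the range of R quoted for the HSR (2.9 × 10⁻³ to 7.1 × 10⁻³). The function name age_fast_site is illustrative.

```python
import math

MU_ATTTT = 0.0009           # adopted slippage rate of the (ATTTT)n site
F_A_NON_ANC = 0.549         # 1 - 0.451 (Table 6), same for C and SBenin
R_RANGE = (2.9e-3, 7.1e-3)  # assumed: range of R quoted for the HSR


def age_fast_site(f_anc_now, r, f_a_non_anc=F_A_NON_ANC, mu=MU_ATTTT):
    """Formula [1] solved for n with recombination plus slippage (f_A ~ 1)."""
    loss = f_a_non_anc * 0.5 * r + mu
    return math.log(f_anc_now) / math.log(1.0 - loss)


for label, f_now in (("C", 0.897), ("SBenin", 0.972)):
    ages = [age_fast_site(f_now, r) for r in R_RANGE]
    print(f"{label}: about {min(ages):.0f} to {max(ages):.0f} generations")
```

Because the slippage term (0.0009) is of the same order as the recombination term, these estimates are less sensitive to the choice of R than the SNP-based ones.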
The (AT)xTy site. Since this site is close to the right end of the HSR region, to a first approximation it is regarded as lying on the same side of the HSR as the β6 codon. In other words, it is assumed here that new C or SBenin haplotypes at this site could only have been generated by DNA slippage, at a rate equal to the product 1 × 0.0009, where 1 is the frequency of the ancestral C or SBenin haplotype at time 0 and 0.0009 is the adopted slippage mutation rate.
An approach to ascertain whether the C alleles remained very few for most of the C genealogy life-span (very long lag)
The overall C LD decay is informative only about the C allele age, whereas the LD decays of the individual new C haplotypes can, in favourable circumstances, also provide information on how the number of C alleles changed during the life span of the C genealogy.
The following symbols are used hereafter: NC,final is the ultimate (present) absolute number of C alleles among the five million Mossi; since qC = 0.13 and five million people carry 10⁷ β-globin genes, NC,final ≈ 0.13 × 10⁷ = 1.3 × 10⁶. rnew is the combined rate of production of new (recombinant, or recombinant plus DNA-slipped) C haplotypes; ni is the absolute number of C alleles at the ith generation; and ei is the expansion factor of the ni alleles, equal to NC,final/ni.
The time course of C accumulation is not expected to affect the ultimate absolute number of new C haplotypes (NC,new). In fact, the expected absolute contribution ci of the ith generation to NC,new is independent of ni, being both directly proportional to it (through the number of recombination events) and inversely proportional to it (through ei). This is shown by the formula ci = rnew × ni × ei = rnew × ni × NC,final/ni = rnew × NC,final, which does not contain ni (valid while the LD decay is still small, as in the case under study). For example, a period of the C genealogy life span with extremely few C alleles is expected to contribute as many new C haplotypes as an equally long period with a very high (average) number of C alleles. (It may be worth noting that, if the LD decay were affected by the time course of an allele's number, it could not be used to infer the allele's age, as is usually done.) However, the two contributions differ greatly in their make-up: that derived from very few C alleles consists of very few recombinant megaclones (two features that tend to amplify chance deviations), whereas that derived from many C alleles consists of very many recombinant microclones (two features that tend to counteract chance deviations). In other words, a pattern of very few megaclones is unlikely to comply with the theoretical expectation that all the possible single-haplotype LD decays (one per haplotype) are equal to one another and to the overall LD decay. Conversely, a pattern of very many microclones should result in good compliance with theoretical expectations (namely, in a subdivision of the new haplotypes mirroring that of the A haplotypes).
The degree to which the subdivision of the new C haplotypes complies with the theoretical expectation (that it equals the subdivision within the A haplotypes) should therefore ultimately depend on the ratio between the contribution of new C haplotypes from the period when C alleles were very few and that from the period when they were very many. Thus, if, and only if, the first period (the one that may produce very few megaclones) had been much longer than the second, the discrepancies between actual and expected data accumulated during that initial period are unlikely to have been buffered by the later contribution, so that the final outcome will, as a rule, consist of new C haplotypes whose subdivision differs from that of the A haplotypes and whose single-haplotype LD decays differ from one another, hence also from their mean.
In conclusion, large discrepancies from the expected subdivision of the new C haplotypes would necessarily imply that C accumulation had been a biphasic process, in which a very long period with very few C alleles was followed by a short phase of rapid expansion.
It is worth noting that, if the C alleles are very few, even the pool of new recombinant C haplotypes, and not only its subdivision into the possible recombinant haplotypes, is unlikely to match its expected frequency. Thus, if the lag had been very long and the C alleles very few during that lag, the confidence intervals of the C allele age estimates inferred from the present overall proportion of new C haplotypes among the C alleles would be particularly large.
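The argument above (each generation contributes the same expected number of new C haplotypes, but with a dispersion that grows as ni shrinks) can be made quantitative with a minimal compound-Poisson sketch. It assumes, for illustration only, that in generation i the number of new-haplotype events is Poisson with mean rnew × ni, that each event is expanded deterministically by ei = NC,final/ni (no drift within clones), and that rnew stays constant because the LD decay is still small. Only NC,final ≈ 1.3 × 10⁶ comes from the text; rnew, the number of generations and the two constant trajectories are hypothetical.

```python
import math

N_FINAL = 0.13 * 10**7   # present number of C alleles (from the text)
R_NEW = 1e-3             # hypothetical rate of new C haplotypes per C allele per generation
GENERATIONS = 100        # hypothetical life span of the C genealogy


def new_haplotype_moments(trajectory, r_new=R_NEW, n_final=N_FINAL):
    """Mean and variance of the final number of new C haplotypes (N_C,new).

    Compound-Poisson sketch: generation i yields Poisson(r_new * n_i) events,
    each expanded by e_i = n_final / n_i, so its contribution has mean
    r_new * n_final (independent of n_i) and variance r_new * n_final**2 / n_i.
    """
    mean = var = 0.0
    for n_i in trajectory:
        e_i = n_final / n_i       # expansion factor of generation i
        lam = r_new * n_i         # expected number of new-haplotype events
        mean += lam * e_i         # c_i = r_new * n_i * e_i
        var += lam * e_i ** 2     # variance of a Poisson count of clones of size e_i
    return mean, var


few = [10] * GENERATIONS          # long lag: very few C alleles (megaclones)
many = [100_000] * GENERATIONS    # many C alleles throughout (microclones)

for label, traj in (("few", few), ("many", many)):
    m, v = new_haplotype_moments(traj)
    print(f"{label:>4}: mean = {m:.0f}, coefficient of variation = {math.sqrt(v) / m:.3f}")
```

With these made-up numbers both trajectories give the same expected NC,new, but the coefficient of variation is about 100 times larger when the C alleles stay very few; the same inflation applies to each new haplotype class separately, which is what distorts their subdivision.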
The approach just described can also be applied, mutatis mutandis, to new DNA-slipped C haplotypes, and may have general application.
| FUNDING |
|---|
This work was supported by grants from Italian Ministry of Education (MIUR COFIN 2001, 2003), from the University of Rome La Sapienza and from the EU, Sixth Framework Programme, BioMalPar Network of Excellence, N. LSHP-CT-2004-503578.
| ACKNOWLEDGEMENTS |
|---|
We are grateful to the study participants in Burkina Faso for their understanding and cooperation and to the laboratory staff at the Centre Medical Saint Camille of Ouagadougou, Burkina Faso.
Conflict of Interest statement. None declared.
| FOOTNOTES |
|---|
The authors wish it to be known that, in their opinion, the first two authors should be regarded as joint First Authors.
| REFERENCES |
|---|
- Allison A.C. The distribution of the sickle-cell trait in East Africa and elsewhere, and its apparent relationship to the incidence of subtertian malaria. Trans. R. Soc. Trop. Med. Hyg. (1954) 48:312–318.
- Hill A.V., Allsopp C.E., Kwiatkowski D., Anstey N.M., Twumasi P., Rowe P.A., Bennett S., Brewster D., McMichael A.J., Greenwood B.M. Common west African HLA antigens are associated with protection from severe malaria. Nature (1991) 352:595–600.
- Modiano D., Luoni G., Sirima B.S., Simpore J., Verra F., Konate A., Rastrelli E., Olivieri A., Calissano C., Paganotti G.M., et al. Haemoglobin C protects against clinical Plasmodium falciparum malaria. Nature (2001) 414:305–308.
- Agarwal A., Guindo A., Cissoko Y., Taylor J.G., Coulibaly D., Kone A., Kayentao K., Djimde A., Plowe C.V., Doumbo O., et al. Hemoglobin C associated with protection from severe malaria in the Dogon of Mali, a West African population with a low prevalence of hemoglobin S. Blood (2000) 96:2358–2363.
- Rihet P., Flori L., Tall F., Traore A.S., Fumoux F. Hemoglobin C is associated with reduced Plasmodium falciparum parasitemia and low risk of mild malaria attack. Hum. Mol. Genet. (2004) 13:1–6.
- Pagnier J., Mears J.G., Dunda-Belkhodja O., Schaefer-Rego K.E., Beldjord C., Nagel R.L., Labie D. Evidence for the multicentric origin of the sickle cell hemoglobin gene in Africa. Proc. Natl Acad. Sci. USA (1984) 81:1771–1773.
- Allison A.C. Polymorphism and natural selection in human populations. Cold Spring Harb. Symp. Quant. Biol. (1964) 29:137–149.
- Mockenhaupt F.P., Ehrhardt S., Cramer J.P., Otchwemah R.N., Anemana S.D., Goltz K., Mylius F., Dietz E., Eggelte T.A., Bienzle U. Hemoglobin C and resistance to severe malaria in Ghanaian children. J. Infect. Dis. (2004) 190:1006–1009.
- Labie D., Dunda-Belkhodja O., Rouabhi F., Pagnier J., Ragusa A., Nagel R.L. The -158 site 5' to the G gamma gene and G gamma expression. Blood (1985) 66:1463–1465.
- Excoffier L., Laval G., Schneider S. Arlequin ver. 3.0: an integrated software package for population genetics data analysis. Evol. Bioinform. Online (2005).
- Trabuchet G., Elion J., Dunda O., Lapoumeroulie C., Ducrocq R., Nadifi S., Zohoun I., Chaventre A., Carnevale P., Nagel R.L., et al. Nucleotide sequence evidence of the unicentric origin of the beta C mutation in Africa. Hum. Genet. (1991) 87:597–601.
- Boehm C.D., Dowling C.E., Antonarakis S.E., Honig G.R., Kazazian H.H. Jr. Evidence supporting a single origin of the beta(C)-globin gene in Blacks. Am. J. Hum. Genet. (1985) 37:771–777.
- Chakravarti A., Buetow K.H., Antonarakis S.E., Waber P.G., Boehm C.D., Kazazian H.H. Nonuniform recombination within the human beta-globin gene cluster. Am. J. Hum. Genet. (1984) 36:1239–1258.
- Talacki C.A., Rappaport E., Schwartz E., Surrey S., Ballas S.K. Beta-globin gene cluster haplotypes in Hb C heterozygotes. Hemoglobin (1990) 14:229–240.
- Nagel R.L., Fabry M.E., Pagnier J., Zohoun I., Wajcman H., Baudin V., Labie D. Hematologically and genetically distinct forms of sickle cell anemia in Africa. The Senegal type and the Benin type. N. Engl. J. Med. (1985) 312:880–884.
- Zago M.A., Silva W.A. Jr., Dalle B., Gualandro S., Hutz M.H., Lapoumeroulie C., Tavella M.H., Araujo A.G., Krieger J.E., Elion J., Krishnamoorthy R. Atypical beta(s) haplotypes are generated by diverse genetic mechanisms. Am. J. Hematol. (2000) 63:79–84.
- Zago M.A., Silva W.A. Jr., Gualandro S., Yokomizu I.K., Araujo A.G., Tavela M.H., Gerard N., Krishnamoorthy R., Elion J. Rearrangements of the beta-globin gene cluster in apparently typical beta S haplotypes. Haematologica (2001) 86:142–145.
- Chebloune Y., Pagnier J., Trabuchet G., Faure C., Verdier G., Labie D., Nigon V. Structural analysis of the 5' flanking region of the beta-globin gene in African sickle cell anemia patients: further evidence for three origins of the sickle cell mutation in Africa. Proc. Natl Acad. Sci. USA (1988) 85:4431–4435.
- Sanchaisuriya K., Fucharoen G., Sae-ung N., Siriratmanawong N., Surapot S., Fucharoen S. Molecular characterization of haemoglobin C in Thailand. Am. J. Hematol. (2001) 67:189–193.
- Holloway K., Lawson V.E., Jeffreys A.J. Allelic recombination and de novo deletions in sperm in the human beta-globin gene region. Hum. Mol. Genet. (2006) 15:1099–1111.
- Jeffreys A.J., Neumann R. Reciprocal crossover asymmetry and meiotic drive in a human recombination hot spot. Nat. Genet. (2002) 31:267–271.
- Wall J.D., Frisse L.A., Hudson R.R., Di Rienzo A. Comparative linkage-disequilibrium analysis of the beta-globin hotspot in primates. Am. J. Hum. Genet. (2003) 73:1330–1340.
- Schneider J.A., Peto T.E., Boone R.A., Boyce A.J., Clegg J.B. Direct measurement of the male recombination fraction in the human beta-globin hot spot. Hum. Mol. Genet. (2002) 11:207–215.
- Wood E.T., Stover D.A., Slatkin M., Nachman M.W., Hammer M.F. The beta-globin recombinational hotspot reduces the effects of strong selection around HbC, a recently arisen mutation providing resistance to malaria. Am. J. Hum. Genet. (2005) 77:637–642.
- Rich S.M., Licht M.C., Hudson R.R., Ayala F.J. Malaria's Eve: evidence of a recent population bottleneck throughout the world populations of Plasmodium falciparum. Proc. Natl Acad. Sci. USA (1998) 95:4425–4430.
- Joy D.A., Feng X., Mu J., Furuya T., Chotivanich K., Krettli A.U., Ho M., Wang A., White N.J., Suh E., Beerli P., Su X.Z. Early origin and recent expansion of Plasmodium falciparum. Science (2003) 300:318–321.
- Fairhurst R.M., Baruch D.I., Brittain N.J., Ostera G.R., Wallach J.S., Hoang H.L., Hayton K., Guindo A., Makobongo M.O., Schwartz O.M., et al. Abnormal display of PfEMP-1 on erythrocytes carrying haemoglobin C may protect against malaria. Nature (2005) 435:1117–1121.
- Cavalli-Sforza L.L., Bodmer W.F. The genetics of human populations. (1971) San Francisco: W.H. Freeman.
- Allen S.J., O'Donnell A., Alexander N.D., Alpers M.P., Peto T.E., Clegg J.B., Weatherall D.J. alpha+-Thalassemia protects children against disease caused by other infections as well as malaria. Proc. Natl Acad. Sci. USA (1997) 94:14736–14741.
- Miller L.H., Mason S.J., Clyde D.F., McGinniss M.H. The resistance factor to Plasmodium vivax in blacks. The Duffy-blood-group genotype, FyFy. N. Engl. J. Med. (1976) 295:302–304.
- Hamblin M.T., Thompson E.E., Di Rienzo A. Complex signatures of natural selection at the Duffy blood group locus. Am. J. Hum. Genet. (2002) 70:369–383.
- Goodman R.M., Motulsky A.G. Genetic diseases among Ashkenazi Jews. (1979) New York: Raven Press.
- Motulsky A.G. Jewish diseases and origins. Nat. Genet. (1995) 9:99–101.
- Modiano G., Ciminelli B.M., Pignatti P.F. Cystic fibrosis and lactase persistence: a possible correlation. Eur. J. Hum. Genet. (2007) 15:255–259.
- Modiano G., Morpurgo G., Terrenato L., Novelletto A., Di Rienzo A., Colombo B., Purpura M., Mariani M., Santachiara-Benerecetti S., Brega A., et al. Protection against malaria morbidity: near-fixation of the alpha-thalassemia gene in a Nepalese population. Am. J. Hum. Genet. (1991) 48:390–397.
- Oppenheimer S.J., Higgs D.R., Weatherall D.J., Barker J., Spark R.A. Alpha thalassaemia in Papua New Guinea. Lancet (1984) 1:424–426.
- Sutton M., Bouhassira E.E., Nagel R.L. Polymerase chain reaction amplification applied to the determination of beta-like globin gene cluster haplotypes. Am. J. Hematol. (1989) 32:66–69.
- Gusmao L., Sanchez-Diz P., Calafell F., Martin P., Alonso C.A., Alvarez-Fernandez F., Alves C., Borjas-Fajardo L., Bozzo W.R., Bravo M.L., et al. Mutation rates at Y chromosome specific microsatellites. Hum. Mutat. (2005) 26:520–528.
- Yan J., Liu Y., Tang H., Zhang Q., Huo Z., Hu S., Yu J. Mutations at 17 STR loci in Chinese population. Forensic Sci. Int. (2006) 162:53–44.