Human Molecular Genetics Advance Access originally published online on September 20, 2005
Human Molecular Genetics 2005 14(21):3191-3201; doi:10.1093/hmg/ddi350
Distribution of the strength of selection against amino acid replacements in human proteins
1Department of Biological Sciences, East Tennessee State University, Johnson City, TN 37614-1710, USA, 2Section of Ecology, Behavior and Evolution, University of California, San Diego, 9500 Gilman Dr, La Jolla, CA 92093-0346, USA and 3National Center for Biotechnology Information, National Institutes of Health, Bethesda, MD 20894, USA
* To whom correspondence should be addressed. Tel: +1 3014358944; Fax: +1 3014809241; Email: kondrashov{at}ncbi.nlm.nih.gov
Received May 3, 2005; Revised July 27, 2005; Accepted September 12, 2005
| ABSTRACT |
|---|
The impact of an amino acid replacement on the organism's fitness can vary from lethal to selectively neutral and even, in rare cases, beneficial. Substantial data are available on either pathogenic or acceptable replacements. However, the whole distribution of coefficients of selection against individual replacements is not known for any organism. To ascertain this distribution for human proteins, we combined data on pathogenic missense mutations, on human non-synonymous SNPs and on human-chimpanzee divergence of orthologous proteins. Fractions of amino acid replacements which reduce fitness by $>10^{-2}$, $10^{-2}$ to $10^{-4}$, $10^{-4}$ to $10^{-5}$ and $<10^{-5}$ are 25, 49, 14 and 12%, respectively. On average, the strength of selection against a replacement is substantially higher when chemically dissimilar amino acids are involved, and the Grantham's index of a replacement explains 35% of variance in the average logarithm of selection coefficients associated with different replacements. Still, the impact of a replacement depends on its context within the protein more than on its own nature. Reciprocal replacements are often associated with rather different selection coefficients, in particular, replacements of non-polar amino acids with polar ones are typically much more deleterious than replacements in the opposite direction. However, differences between evolutionary fluxes of reciprocal replacements are only weakly correlated with the differences between the corresponding selection coefficients.
| INTRODUCTION |
|---|
Amino acid replacements can have vastly different impacts on the structure and function of the protein and on fitness of the organism. On one hand, many pathogenic replacements are individually lethal or severely deleterious (1). [...] $\sim 10^{4}$ amino acids which deviate from the populational consensus (2) [...] $3\times10^{5}$ of such differences even between human and chimpanzee) have only very small ($\sim 10^{-4}$ to $10^{-5}$) impacts on fitness (3) [...]
Several studies dealt separately with the two tails of this distribution, i.e. with replacements leading either to very strong (5-7) or to very weak (8-14) selection. To ascertain the whole distribution of s, one needs to combine data of very different kinds and to analyze them within a unified framework. Such an analysis can reveal new patterns, not evident when data of any one kind are considered separately.
We estimate fractions of all possible replacements which are deleterious enough (i) to cause a Mendelian disease by affecting one of 34 human morbid proteins, (ii) to not segregate within human populations as non-synonymous SNPs in 13 533 human genes or (iii) to never reach fixation in the course of human-chimpanzee divergence of the same set of genes. Knowing these fractions makes it possible to approximate the distribution of s by a four-column histogram. Pathogenic missense mutations were compared to nonsense mutations (15), which allowed us to obtain absolute, instead of relative (5), probabilities of pathogenic effects. Data on intrapopulation diversity and interspecies divergence were polarized by outgroup sequences, of chimpanzee and of murine orthologs, respectively, which allowed us to estimate separate effects of reciprocal replacements, instead of their averages (13,14).
| RESULTS |
|---|
Coefficients of selection revealed by data on diseases, intrapopulation diversity and interspecies divergence
Missense mutations which are responsible for Mendelian diseases almost always disrupt the function of the protein completely or, at least, drastically (1) [...] 12% (21) [...]
Figure 1 shows how contributions of a deleterious mutation to intrapopulation diversity (heterozygosity) and to evolution of the lineage depend on s. Rate of evolution, the per site per generation number of substitutions, was calculated using Eq. (10) from Bulmer (22). Diversity, the probability that two sequences randomly drawn from the population are different at a site (2), was calculated by combining this equation with Eq. (10) from McVean and Charlesworth (23). These contributions are maximal as long as a mutation is effectively neutral, $s < 0.25N_e^{-1}$, where $N_e$ is the effective population size. When s increases past this value, both rate of evolution and diversity rapidly drop, becoming negligible with $s > 2N_e^{-1}$ and $s > 10N_e^{-1}$, respectively, because diversity drops a little slower. There is a general agreement that SNPs segregating in modern humans mostly reflect the properties of the population with $N_e = 10^{4}$ (24). It is not clear whether $N_e$ in the course of human-chimpanzee divergence was only slightly higher than $10^{4}$, i.e. $2\times10^{4}$ as in modern chimpanzees (24), or substantially higher, i.e. $5$-$10\times10^{4}$, as suggested by some (25,26), although not all (27), analyses of ancient polymorphisms. We accept $N_e = 5\times10^{4}$ for human-chimpanzee divergence, as otherwise it is hard to explain why missense mutations are much more common among SNPs than among replacements (discussed subsequently).
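The cut-offs quoted in this paragraph can be turned into numbers with a few lines of Python. This is only a minimal sketch: the function name and dictionary keys are hypothetical, and the three multipliers (0.25, 2 and 10) are simply the ones stated in the text, not a re-derivation of the equations of Bulmer or of McVean and Charlesworth.

```python
def selection_thresholds(ne):
    """Return the s cut-offs quoted in the text for a given effective size Ne."""
    return {
        # below this value a mutation behaves as effectively neutral
        "effectively_neutral_below": 0.25 / ne,
        # above this value the rate of evolution becomes negligible
        "evolution_negligible_above": 2.0 / ne,
        # above this value the contribution to diversity becomes negligible
        "diversity_negligible_above": 10.0 / ne,
    }

# Ne values accepted in the text: 1e4 for modern human SNPs,
# 5e4 for human-chimpanzee divergence.
snp_cutoffs = selection_thresholds(1e4)
divergence_cutoffs = selection_thresholds(5e4)

for label, cutoffs in (("SNPs", snp_cutoffs), ("divergence", divergence_cutoffs)):
    for name, value in cutoffs.items():
        print(f"{label:10s} {name:28s} s = {value:.1e}")
```

The resulting cut-offs fall within roughly an order of magnitude of the round values of s used later for the histogram, which is why the mapping is described as a rough approximation.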
Let us define $F_{MD}$, $F_{SNP}$ and $F_{EVOL}$ as the fractions of replacements which cause Mendelian diseases, are absent from non-synonymous SNPs and do not contribute to protein evolution, respectively. Then, the aforementioned analysis suggests that, as a rough approximation, $F_{MD}$, $F_{SNP}$ and $F_{EVOL}$ correspond to fractions of amino acid replacements with $s > 10^{-2}$, $s > 10^{-4}$ and $s > 10^{-5}$, respectively. Thus, we can estimate the fractions of amino acid replacements associated with selection coefficients $s > 10^{-2}$, $10^{-2}$ to $10^{-4}$, $10^{-4}$ to $10^{-5}$ and $< 10^{-5}$ as $F_{MD}$, $F_{SNP} - F_{MD}$, $F_{EVOL} - F_{SNP}$ and $1 - F_{EVOL}$, respectively.
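The subtraction scheme described here can be sketched in Python as follows. The function name and bin labels are hypothetical, and the three input values (0.25, 0.74 and 0.88) are not taken from Table 1 directly: they are back-calculated from the average percentages of 25, 49, 14 and 12% reported in the paper, so treat them as an illustration.

```python
def selection_histogram(f_md, f_snp, f_evol):
    """Split the distribution of s into four bins from the three cumulative fractions."""
    return {
        "s > 1e-2": f_md,                     # pathogenic replacements
        "1e-2 > s > 1e-4": f_snp - f_md,      # absent from SNPs but not pathogenic
        "1e-4 > s > 1e-5": f_evol - f_snp,    # segregate as SNPs but are not fixed
        "s < 1e-5": 1.0 - f_evol,             # can be fixed in evolution
    }

# Illustrative averages, inferred from the reported percentages (an assumption).
hist = selection_histogram(f_md=0.25, f_snp=0.74, f_evol=0.88)

for bin_label, fraction in hist.items():
    print(f"{bin_label:18s} {fraction:.2f}")
```

The bins are deliberately not clipped to the interval from 0 to 1, because individual entries of the underlying table can fall outside it through sampling error.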
Distributions of selection coefficients
Table 1 presents data on FMD, FSNP and FEVOL for each of the 150 amino acid replacements. A case where an entry exceeds 1 occurs because the number of observed non-synonymous nucleotide substitutions exceeded what was expected from data on nonsense substitutions, presumably due to sampling errors. The raw data on which Table 1 is based are presented in Supplementary Material, Tables S1 and S2.
Because s varies over several orders of magnitude, we will mostly use its decimal logarithm as an independent variable. The observed average values of $F_{MD}$, $F_{SNP}$ and $F_{EVOL}$ (Table 1) imply that 25, 49, 14 and 12% of amino acid replacements lead to $\log s > -2$, $-2 > \log s > -4$, $-4 > \log s > -5$ and $\log s < -5$, respectively (Fig. 2). Among the 150 distributions of log s, standard deviations (and coefficients of variation) of their fractions which fall within these four ranges are 0.20 (0.77), 0.23 (0.49), 0.11 (0.79) and 0.12 (1.03), respectively. Table 2 presents M[log s], the mean value of log s, for each of the 150 distributions. The average and the standard deviation of M[log s] among them are $-3.045$ and 0.638, respectively. Supplementary Material, Figure S1 displays 20 pairs of distributions of selection coefficients associated with all replacements in which a particular amino acid is either a source or a destination.
Impact of chemical characteristics of amino acids
Replacements involving chemically dissimilar amino acids (their characteristics were taken from Amino Acid Index, 28) are, on average, associated with stronger selection (Table 3). Among individual characteristics, polarity has the largest impact (Table 3; Fig. 3), but the impacts of Grantham index (29
Figure 4 shows how the strength of selection associated with replacements involving a particular amino acid depends on the average chemical distance between that amino acid and its neighbors in the genetic code, i.e. the amino acids which can be obtained from it by a single-nucleotide substitution. Regression, over this distance, of the average M[log s] for all replacements in which an amino acid participates either as the source or as the destination yields 0.68 and 0.62, respectively.
Differences between reciprocal replacements
Reciprocal amino acid replacements may be associated with different distributions of selection coefficients (Tables 1 and 2). We describe this phenomenon by normalized differences between the properties of such reciprocal distributions of log s. For example, the normalized difference between the reciprocal values of M[log s] for amino acids A and B is $(M_{A>B}[\log s] - M_{B>A}[\log s])/(M_{A>B}[\log s] + M_{B>A}[\log s])$. If no order is introduced on the set of amino acids (and, thus, each reciprocal pair is left unordered), absolute values of these normalized differences must be used. Among the 75 reciprocal pairs of distributions of log s, the average absolute value of the normalized difference (and its coefficient of variation) between their values of M[log s] is 0.09 (1.07) and the corresponding values for the fractions of distributions which fall within the four ranges of log s are 0.27 (1.26), 0.56 (1.11), 0.32 (0.70) and 0.35 (1.56) for $\log s > -2$, $-2 > \log s > -4$, $-4 > \log s > -5$ and $\log s < -5$, respectively.
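As a minimal sketch of this normalized difference, the Python snippet below applies the formula to one reciprocal pair. The function name is hypothetical. The two input values are the M[log s] figures quoted in the Discussion for I>K and K>I, entered here as negative numbers on the assumption that s < 1 and hence log s < 0.

```python
def normalized_difference(m_ab, m_ba):
    """(M_A>B - M_B>A) / (M_A>B + M_B>A) for one reciprocal pair."""
    return (m_ab - m_ba) / (m_ab + m_ba)

# M[log s] for I>K and K>I as quoted in the Discussion (signs assumed negative).
m_i_to_k = -1.63
m_k_to_i = -3.17

signed = normalized_difference(m_i_to_k, m_k_to_i)
unordered = abs(signed)  # used when the pair is left unordered

print(f"signed = {signed:.3f}, absolute = {unordered:.3f}")
```

Swapping the two arguments only flips the sign, which is why the absolute value is the appropriate summary when no order is imposed on the amino acids.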
Differences between selection coefficients associated with reciprocal replacements depend on the chemical properties of the amino acids involved (Table 5; Figs 5 and 6). Replacements which reduce polarity are associated with significantly higher probabilities of $\log s > -2$ and with larger M[log s] than reciprocal replacements (Table 5; Figs 5 and 6).
| DISCUSSION |
|---|
Caveats
Our estimate of the fraction of amino acid replacements which destroy the protein function ($F_{MD}$ = 25%, Table 1) may be too high because: (i) disease-causing loci encoding less robust proteins are more likely to be already discovered; (ii) we ignored a few robust proteins (JAG1, PDK1, DMD) for which many nonsense mutations but fewer than 40 pathogenic missense mutations are known and (iii) some missense mutations are pathogenic because they disrupt splicing (31).
The real fraction of replacements which can be fixed in the course of evolution may be even below our already low estimate ($1 - F_{EVOL}$ = 12%, Table 1), because positive selection leads to very rapid fixations of some of the allowed replacements. However, the fraction of selection-driven replacements is probably not very high, i.e. 25±20% in Drosophila (32) and the data on hominids are even less certain (33). In contrast, ignoring 0.9% of protein sites where the murine amino acid differs from both primate amino acids probably biased our sample of sites towards those which are more conservative and led to underestimation of $1 - F_{EVOL}$.
Finally, the range of selection coefficients which allows segregation within modern human populations but not fixation in the course of human-chimpanzee divergence is not known precisely, due to uncertainty of the history of $N_e$ in hominid evolution. However, neither of these problems is likely to affect the observed patterns strongly.
General features of selection
For a vast majority of amino acid replacements, $F_{MD} < F_{SNP} < F_{EVOL}$ (Table 1), which testifies to the general soundness of our data and analysis. The distribution of selection coefficients associated with nucleotide substitutions (34) or amino acid replacements encompasses several orders of magnitude and, thus, cannot be inferred from any one kind of data. In particular, direct measurements of fitness (34) are not suitable to ascertain the tail of the distribution which corresponds to small values of s. Data presented in Table 1 are in good agreement with the literature (8-14) regarding this tail of the distribution and with (5) regarding its opposite tail, and also reveal several new patterns.
First, the distribution of log s is, very roughly, flat: each order of magnitude is populated by 10-20% of selection coefficients (Fig. 2). Every particular amino acid replacement can lead to the whole range of effects on fitness. Negative correlation, among 150 possible replacements, between the probabilities that a particular replacement leads to $\log s > -2$ versus $\log s < -5$ is not strong ($-0.26$, P<0.002). Thus, any replacement can be associated with any selection coefficient, and the impact of a replacement depends on its context within the protein more than on its own nature. Although a replacement could be selectively neutral, it is unlikely that log s can be routinely below, say, $-7$ (4). As the dependence of the rate of evolution on log s is strongly non-linear (Fig. 1), flat distribution of log s leads to very slow evolution at the majority of protein sites, with only ~10% of sites (where $s < 2N_e^{-1}$) being responsible for most of the fixed replacements (9-12).
Secondly, average selection coefficients associated with different amino acid replacements can be rather different (Table 2). For seven replacements, $M[\log s] > -2$, and for another eight replacements, $M[\log s] < -4$, which corresponds to 100-fold difference between the geometric means of the corresponding selection coefficients.
Thirdly, probabilities of very strong ($\log s > -2$) and of very weak ($\log s < -5$) selection are very heterogeneous among different amino acid replacements. A number of conservative replacements (e.g. A>G, L>M, V>I) lead to $\log s > -2$ with probabilities below 5%, and for some radical replacements (e.g. I>R, G>D, W>C), these probabilities exceed 50%. Thus, identities of amino acids involved in a replacement may per se provide useful information on its possible clinical relevance.
Finally, strength of selection substantially depends on the difference between chemical characteristics of the amino acids involved in a replacement. In particular, replacements of amino acids with similar polarity are usually associated with small M[log s] (Fig. 3). Large differences between polarities or hydrophobicities of the amino acids involved in a replacement mostly shift the distribution of log s to the right, increasing its portion which corresponds to $\log s > -2$ and decreasing its portion which corresponds to $\log s < -5$ (Table 3). Still, simple chemical characteristics of the amino acids can explain only 25-35% of the variance in M[log s] among the 75 pairs of reciprocal replacements (Table 4). In other words, adding a measure of amino acid exchangeability to a linear mutational model does not increase very much its ability to predict frequencies of substitutions among human disease-causing mutations, non-disease SNPs or in human-chimpanzee alignments (30).
Heterogeneity of the strength of selection remains high if we consider together all replacements which either remove or insert a particular amino acid. The average values of M[log s] for all replacements which remove C or S (the two opposite extreme cases) are $-2.14$ and $-3.40$, respectively. The average values of M[log s] for all replacements which insert C or A are $-2.61$ and $-3.48$, respectively (Table 2). The average strength of selection associated with replacements which involve a particular amino acid strongly depends on the chemical distance of the amino acid from its 1-substitution neighbors in the genetic code table (Fig. 4), with the effect of the individual characteristics of the amino acid being much weaker (data not shown).
Differences between reciprocal replacements
The patterns described earlier do not imply any difference between distributions of selection coefficients associated with reciprocal replacements. However, such differences are often substantial (Table 3; Fig. 5). Indeed, within a reciprocal pair, the two values of M[log s] may differ from each other by >1.0 (e.g. $-1.63$ for I>K versus $-3.17$ for K>I or $-1.62$ for Y>S versus $-3.30$ for S>Y). In other words, any matrix which describes selection due to amino acid replacements must be asymmetrical.
In several cases, differences persist if we consider together all replacements which involve a particular amino acid either as a source or as a destination (Table 2; Supplementary Material, Fig. S1). I, C and G are the most disproportionally protected amino acids: selection generated by their removals is, on average, substantially stronger than selection generated by their insertions. The opposite pattern is observed for P, D and S (Table 2; Fig. 4).
The impact of polarity on the differences between selection coefficients associated with reciprocal replacements is mostly confined to the right tail of the distribution of log s (Fig. 6). Replacements which increase polarity lead to drastic selection with higher probabilities than reciprocal replacements (Table 5). Perhaps, such replacements more often occur within the cores of globular proteins. In fact, left tails of the distributions of log s associated with reciprocal replacements may also be rather different (Table 1), but these differences are not associated with differences between chemical characteristics of the replaced amino acids (Table 5).
Fluxes and constraints
Evolution of proteins involves directional fluxes of amino acid replacements, leading to accumulation of C, M, H, S and F and to loss of P, A, E and G (35). This pattern can be due to different strengths of selection associated with reciprocal replacements. Indeed, there is a substantial correlation, 0.51 (P<0.025), between the difference in average values of M[log s] associated with all replacements, which either create or remove an amino acid, and the normalized difference between the numbers of such replacements in the course of human-chimpanzee divergence. The order of the increasing differences in the values of M[log s] which favors the removal of an amino acid is (Table 2): I, C, G, A, H, W, V, N, L, Y, R, F, T, Q, M, K, E, S, D, P. In particular, P, the extreme loser, is the least protected, and C, the extreme gainer, is the second most protected amino acid. However, G and A, both losers, are also highly protected. Moreover, if, instead of M[log s], we consider the fractions of the distributions of selection coefficients with $\log s < -5$, which may be more relevant to evolutionary fluxes, the correlation with the direction of these fluxes disappears (0.15, P<0.52, order of amino acids: Q, I, L, P, T, H, W, S, M, N, Y, C, K, E, G, D, A, V, R, F). Thus, the pattern of evolutionary gain or loss of amino acids appears to be mostly determined not by any asymmetry in the matrix which describes selection, but by amino acid composition of proteins. Genetic code neighbors of gainers are probably overabundant, and genetic code neighbors of losers are under-represented.
| MATERIALS AND METHODS |
|---|
General approach
Because selection coefficients span several orders of magnitude, data of different nature are needed to analyze different portions of their distribution. The approaches behind the analyses of all types of data, however, are essentially identical. We compare frequencies of nucleotide substitutions leading to amino acid replacements with those of substitutions with known phenotypic effects, either nonsense substitutions (having a drastic effect) or substitutions in introns (having no effect).
All the comparisons take into account the rate at which substitutions are introduced by mutation. Only four classes of mutations are considered: non-CpG transversions, non-CpG transitions, CpG transversions and CpG transitions, because CpG context has, so far, the largest impact on the mutation rate in mammals (36).
We analyze separately three kinds of amino acid replacements: (i) those which disrupt protein function severely, (ii) those which do not segregate in human populations and (iii) those which are not fixed in the course of hominid evolution. In each case, we compare the abundance of the observed missense substitutions to a standard with the properties known a priori, either to drastic nonsense substitutions (i) or to selectively neutral substitutions in introns (ii and iii). The results are then combined to crudely ascertain the distribution of selection coefficients associated with each of the 150 amino acid replacements, which can occur due to a single-nucleotide substitution.
Mendelian diseases
What is the fraction of codons, in a disease-causing gene, at which a particular amino acid replacement is non-pathogenic and thus never attracts medical attention? Medically important replacements at a codon may be unknown because of two reasons: either these replacements are indeed non-pathogenic or they are pathogenic, but have not been detected yet, because of the limited sample size. To discriminate between these two situations, we need to estimate the probability of a pathogenic replacement being observed, given the depth of the sample for a particular gene. Observed frequencies of nonsense mutations, properly corrected for target size and mutation rates, can be used to obtain such estimates.
Missense and nonsense mutations were analyzed for 34 loci which cause severe autosomal dominant or X-linked Mendelian diseases, such that even mild clinically relevant alleles do not persist in the population for too long and are mostly confined to one recognized family. Data on loci causing autosomal recessive diseases are too noisy, because pathogenic alleles of such loci may have very long persistence times. We analyzed all the loci for which (i) at least 40 missense germline mutations are known, (ii) loss-of-function alleles cause clear-cut diseases (thus, only haploinsufficient loci were used, ruling out, for example, MYH7 and VMD2; negative dominance on top of haploinsufficiency, as in FBN1, does not make a locus unsuitable) and (iii) loss-of-function alleles are not pre-natally lethal (ruling out G6PD). The following loci satisfied these requirements: AT3, FBN1, GCK, KCNH2, LDLR, MEN1, MLH1, MPZ, MYBPC3, NF1, PAX6, RDS, TP53, TSC2, VHL, VWF (autosomal dominant) and ABCD1, AR, AVPR2, BTK, CYBB, F8, F9, GJB1, GLA, HPRT, IDS, IL2RG, L1CAM, MECP2, OTC, PHEX, RPS6KA3, RS1 (X-linked). Because Human Gene Mutation Database does not contain information on how many times a particular mutation has been detected (1) and combines data from all studies (including those where ascertainment of mutations was obviously biased, for example, due to inclusion of data obtained with PTT, which overlooks missense mutations), we collected mutations from locus-specific databases, reviews and primary literature (Supplement 1).
For each locus, the analysis was performed as follows. First, for each amino acid replacement A>B and each mutational class m (m=1, ..., 4), we determined the target $T_{A>B,m}$, the number of codons encoding A in the wild-type allele where a mutation of class m would cause an A>B replacement. Per site per generation mutation rates were assumed to be $0.39\times10^{-8}$, $0.74\times10^{-8}$, $1.1\times10^{-8}$ and $11\times10^{-8}$ for non-CpG transversions, non-CpG transitions, CpG transversions and CpG transitions, respectively (15).
Then, we estimated selective constraint ${}^{\mathrm{path}}C_{A>B,m}$, the fraction of $T_{A>B,m}$ made by codons where a class m mutation causing an A>B replacement would be pathogenic. The unbiased estimate of ${}^{\mathrm{path}}C_{A>B,m}$ is provided by $P_{A>B,m}/(D_{A>B,m}T_{A>B,m})$, where $P_{A>B,m}$ is the number of codons, where one or more pathogenic class m mutations causing A>B replacement were observed, and $D_{A>B,m}$ is the probability that such an observation would be made for a codon where such mutations are, indeed, pathogenic. As pathogenic mutations present in every sample occurred independently, the number of class m mutations causing A>B replacements at a codon has Poisson distribution and $D_{A>B,m} = 1 - \exp(-E_{A>B,m})$, where $E_{A>B,m}$ is the expected number of such mutations per codon. $E_{A>B,m}$ was estimated using data on nonsense mutations, almost all of which must be loss-of-function and, thus, pathogenic and observable.
The number of class m mutations per codon where such a mutation would produce a nonsense replacement is $N_{\mathrm{nonsense},m}/T_{\mathrm{nonsense},m}$, where $N_{\mathrm{nonsense},m}$ is the observed number of class m nonsense mutations and $T_{\mathrm{nonsense},m}$ the corresponding target. We used the average for all four classes of nonsense mutations weighted by the corresponding targets and, thus, estimated $E_{A>B,m}$ as $\mu_m \sum_{i=1,\ldots,4}(N_{\mathrm{nonsense},i}/\mu_i)/T_{\mathrm{nonsense}}$, where $\mu_m$ is the per nucleotide rate of m class mutations and $T_{\mathrm{nonsense}}$ the overall target for all nonsense mutations. If mutations of more than one class can cause A>B replacements, ${}^{\mathrm{path}}C_{A>B}$ was calculated as the average of all relevant ${}^{\mathrm{path}}C_{A>B,m}$ values weighted only by the corresponding targets. Thus, we estimated the fraction of pathogenic replacements from all possible A>B replacements. Finally, the average value of ${}^{\mathrm{path}}C_{A>B}$ across the 34 loci was calculated for each replacement.
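The estimation steps for one locus, one replacement and one mutational class can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' code: the function names, class labels and all counts in the example are hypothetical, and only the four mutation rates come from the text.

```python
import math

# Per site per generation mutation rates quoted in the text.
MU = {
    "non_cpg_tv": 0.39e-8,
    "non_cpg_ts": 0.74e-8,
    "cpg_tv": 1.1e-8,
    "cpg_ts": 11e-8,
}

def expected_per_codon(m, n_nonsense, t_nonsense_total):
    """E for class m: mu_m * sum_i(N_nonsense,i / mu_i) / T_nonsense."""
    rate_scaled_counts = sum(n_nonsense[i] / MU[i] for i in MU)
    return MU[m] * rate_scaled_counts / t_nonsense_total

def detection_probability(e):
    """D = 1 - exp(-E): Poisson probability of seeing at least one mutation."""
    return 1.0 - math.exp(-e)

def pathogenic_constraint(p_observed, target, e):
    """pathC = P / (D * T) for one replacement and one mutational class."""
    return p_observed / (detection_probability(e) * target)

# Hypothetical nonsense counts per class and overall nonsense target for one locus.
n_nonsense = {"non_cpg_tv": 10, "non_cpg_ts": 8, "cpg_tv": 1, "cpg_ts": 20}
t_nonsense_total = 300

e_cpg_ts = expected_per_codon("cpg_ts", n_nonsense, t_nonsense_total)
d_cpg_ts = detection_probability(e_cpg_ts)
# Hypothetical: pathogenic mutations seen at 6 of 10 target codons.
c_path = pathogenic_constraint(p_observed=6, target=10, e=e_cpg_ts)

print(f"E = {e_cpg_ts:.3f}, D = {d_cpg_ts:.3f}, pathC = {c_path:.3f}")
```

Dividing by D corrects for pathogenic codons that have not yet been sampled. When E is small, this correction is large and the estimate can exceed 1 by chance.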
By taking into account the corresponding mutation rates, we can also estimate the fraction of pathogenic mutations among all de novo A>B mutations, which is disproportionally affected by selective constraint at more mutable codons. However, these two fractions are usually very similar (data not shown).
Polymorphisms
Missense nucleotide substitutions are present among polymorphisms and among mutations fixed in the course of evolution at lower proportions than neutral substitutions occurring at the same mutation rates. This deficit of missense substitutions can be used to estimate the severity of constraint acting against each amino acid replacement. We compared proportions of missense and neutral substitutions having equal mutation rates by considering separately the four classes of mutations.
Human SNPs were obtained from dbSNP (37). We used SNPs with the following submitter handles in dbSNP flat file: CSHL-HAPMAP, BCM_SSAHASNP, SC_JCM, SSAHASNP, WI_SSAHASNP, TSC-CSHL, WUGSC_SSAHASNP, SC_SNP, SC. These SNPs are validated and were obtained in random, genome-wide, non-exon targeted assays (37). All SNPs were analyzed together. All SNPs were polarized by orthologous chimpanzee sequences from UC Santa Cruz human-chimpanzee pairwise alignments (http://www.genome.ucsc.edu/cgi-bin/hggateway). As the total number of missense SNPs which we used was 5251 for 13 533 loci, the depth of the genome-wide coverage is only ~2, and the density of SNPs reflects nucleotide diversity (see Results of 2).
We estimated ${}^{\mathrm{SNP}}C_{A>B,m}$, the fraction of class m mutations causing A>B replacement which are too deleterious to be present in SNPs, as $1 - (N_{A>B,m}/T_{A>B,m})/\pi_m$, where $T_{A>B,m}$ is the number of human codon sites (including monomorphic codons and ancestral alleles at polymorphic codons) which encode A at which an m class mutation would cause an A>B replacement, $N_{A>B,m}$ is the total number of such replacements observed as SNPs at these codons and $\pi_m$ is the nucleotide diversity, due to mutations of class m, at neutral sites. $\pi_m$ was estimated as 0.00053, 0.00064, 0.0014 and 0.0069 for non-CpG transversions, non-CpG transitions, CpG transversions and CpG transitions, respectively, using 74 406 SNPs from the same sample at RepeatMasked intron sites, ignoring the first and the last 50 nucleotides of each intron. Data were averaged with weighting by the frequency of each source codon, but not by frequency of individual mutations.
Fixed replacements
Three-way alignments of the same 13 533 triplets of orthologous loci from human, chimpanzee and mouse were created by finding the orthologous mouse gene for each UCSC human-chimpanzee pair with the two-directional best BLAST hit approach (38). All loci were analyzed together. Codon sites where the murine amino acid was different from both primate amino acids were ignored, as well as codons bordering a human-chimpanzee mismatched nucleotide or rare codons at which human and chimpanzee differ by more than one substitution. At all other codon sites, the primate codon, which encoded the same amino acid as the mouse codon, was assumed to be the ancestral for human and chimpanzee.
The fraction of class m mutations causing A>B replacements which are too deleterious to be fixed in evolving human and primate lineages, ${}^{\mathrm{evol}}C_{A>B,m}$, was estimated as $1 - (N_{A>B,m}/T_{A>B,m})/E_m$, where $T_{A>B,m}$ is the number of ancestral codons which encoded A at which an m class mutation would cause an A>B replacement, $N_{A>B,m}$ the total number of such replacements which occurred (in both lineages) at these codons and $E_m$ the neutral human-chimpanzee divergence. $E_m$ was ascertained, for the same intron sites which were used for estimating $\pi_m$, simply as a fraction of mismatches of the corresponding class among all sites, because the probability of multiple human-chimpanzee substitutions at a site is very low. $E_m$ equals 0.0038, 0.0049, 0.0435 and 0.1380 for non-CpG transversions, non-CpG transitions, CpG transversions and CpG transitions, respectively. There is no suitable outgroup for the introns, and we assumed that when CpG in one hominid was aligned with a non-CpG in the other hominid, the ancestral sequence was CpG in 50% of cases (equilibrium). Again, data were averaged with weighting by the frequency of each source codon.
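The SNP-based and the divergence-based constraints share one form: one minus the ratio of the observed per-target frequency of a replacement to the corresponding neutral level at intron sites. The Python sketch below makes this explicit. The function and dictionary names are hypothetical, the counts in the example are invented for illustration, and only the neutral levels are taken from the text.

```python
# Neutral nucleotide diversity per mutational class (intron SNPs), from the text.
NEUTRAL_DIVERSITY = {
    "non_cpg_tv": 0.00053,
    "non_cpg_ts": 0.00064,
    "cpg_tv": 0.0014,
    "cpg_ts": 0.0069,
}

# Neutral human-chimpanzee divergence per mutational class (introns), from the text.
NEUTRAL_DIVERGENCE = {
    "non_cpg_tv": 0.0038,
    "non_cpg_ts": 0.0049,
    "cpg_tv": 0.0435,
    "cpg_ts": 0.1380,
}

def constraint(n_observed, target, neutral_level):
    """1 - (N / T) / neutral_level for one replacement and one mutational class."""
    return 1.0 - (n_observed / target) / neutral_level

# Hypothetical counts: 12 missense SNPs and 90 fixed replacements
# at 100 000 target codons, both caused by non-CpG transitions.
c_snp = constraint(12, 100_000, NEUTRAL_DIVERSITY["non_cpg_ts"])
c_evol = constraint(90, 100_000, NEUTRAL_DIVERGENCE["non_cpg_ts"])

print(f"SNP constraint = {c_snp:.4f}, evolutionary constraint = {c_evol:.4f}")
```

The estimate is left unclipped, so a replacement observed more often than the neutral expectation would yield a negative constraint.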
| SUPPLEMENTARY MATERIAL |
|---|
Supplementary Material is available at HMG Online.
Conflict of Interest statement. None declared.
| REFERENCES |
|---|
- Stenson, P.D., Ball, E.V., Mort, M., Phillips, A.D., Shiel, J.A., Thomas, N.S., Abeysinghe, S., Krawczak, M. and Cooper, D.N. (2003) Human Gene Mutation Database (HGMD): 2003 update. Hum. Mutat., 21, 577581.[CrossRef][ISI][Medline]
-
Sunyaev, S., Ramensky, V., Koch, I., Lathe, W., III, Kondrashov, A.S. and Bork, P. (2001) Prediction of deleterious human alleles. Hum. Mol. Genet., 10, 591597.
[Abstract/Free Full Text] - Kimura, M. (1983) The Neutral Theory of Molecular Evolution. Cambridge University Press, Cambridge.
-
Ohta, T. (2002) Near-neutrality in evolution of genes and gene regulation. Proc. Natl Acad. Sci. USA, 99, 1613416137.
[Abstract/Free Full Text] - Vitkup, D., Sander, C. and Church, G.M. (2003) The amino-acid mutational spectrum of human genetic disease. Genome Biol., 4, R72.[CrossRef][Medline]
- Terp, B.N., Cooper, D.N., Christensen, I.T., Jorgensen, F.S., Bross, P., Gregersen, N. and Krawczak, M. (2002) Assessing the relative importance of the biophysical properties of amino acid substitutions associated with human genetic disease. Hum. Mutat., 20, 98109.[CrossRef][ISI][Medline]
- Miller, M.P., Parker, J.D., Rissing, S.W. and Kumar, S. (2003) Quantifying the intragenic distribution of human disease mutations. Ann. Hum. Genet., 67, 567579.[CrossRef][ISI][Medline]
-
Henikoff, S. and Henikoff, J.G. (1992) Amino acid substitution matrices from protein blocks. Proc. Natl Acad. Sci. USA, 89, 1091510919.
[Abstract/Free Full Text] - Rand, D.M., Weinreich, D.M. and Cezairliyan, B.O. (2000) Neutrality tests of conservative-radical amino acid changes in nuclear- and mitochondrially-encoded proteins. Gene, 261, 115125.[CrossRef][ISI][Medline]
-
Grishin, N.V., Wolf, Y.I. and Koonin, E.V. (2000) From complete genomes to measures of substitution rate variability within and between proteins. Genome. Res., 10, 9911000.
[Abstract/Free Full Text] -
Nielsen, R. and Yang, Z. (2003) Estimating the distribution of selection coefficients from phylogenetic data with applications to mitochondrial and viral DNA. Mol. Biol. Evol., 20, 12311239.
[Abstract/Free Full Text] -
Piganeau, G. and Eyre-Walker, A. (2003) Estimating the distribution of fitness effects from DNA sequence data: implications for the molecular clock. Proc. Natl Acad. Sci. USA, 100, 1033510340.
[Abstract/Free Full Text] - Majewski, J. and Ott, J. 2003 Amino acid substitutions in the human genome: evolutionary implications of single nucleotide polymorphisms. Gene, 305, 167173.[CrossRef][ISI][Medline]
- Tang, H., Wyckoff, G.J., Lu, J. and Wu, C.I. (2004) A universal evolutionary index for amino acid changes. Mol. Biol. Evol., 21, 1548–1556.
- Kondrashov, A.S. (2003) Direct estimates of human per nucleotide mutation rates at 20 loci causing Mendelian diseases. Hum. Mutat., 21, 12–27.
- Thatcher, J.W., Shaw, J.M. and Dickinson, W.J. (1998) Marginal fitness contributions of nonessential genes in yeast. Proc. Natl Acad. Sci. USA, 95, 253–257.
- Kondrashov, F.A., Ogurtsov, A.Y. and Kondrashov, A.S. (2004) Bioinformatical assay of human gene morbidity. Nucleic Acids Res., 32, 1731–1737.
- Sawyer, S.L., Berglind, L.C. and Brookes, A.J. (2003) Negligible validation rate for public domain stop-codon SNPs. Hum. Mutat., 22, 252–254.
- Lewontin, R.C. (1974) The Genetic Basis of Evolutionary Change. Columbia University Press, New York.
- McCune, A.R., Fuller, R.C., Aquilina, A.A., Dawley, R.M., Fadool, J.M., Houle, D., Travis, J. and Kondrashov, A.S. (2002) A low genomic number of recessive lethals in natural populations of bluefin killifish and zebrafish. Science, 296, 2398–2401.
- Crow, J.F. (1979) Minor viability mutations in Drosophila. Genetics, 92, s165–s172.
- Bulmer, M. (1991) The selection-mutation-drift theory of synonymous codon usage. Genetics, 129, 897–907.
- McVean, G.A.T. and Charlesworth, B. (1999) A population genetic model for the evolution of synonymous codon usage: patterns and predictions. Genet. Res., 74, 145–158.
- Yu, N., Jensen-Seaman, M.I., Chemnick, L., Kidd, J.R., Deinard, A.S., Ryder, O., Kidd, K.K. and Li, W.-H. (2003) Low nucleotide diversity in chimpanzees and bonobos. Genetics, 164, 1511–1518.
- Chen, F.C. and Li, W.-H. (2001) Genomic divergences between humans and other hominoids and the effective population size of the common ancestor of humans and chimpanzees. Am. J. Hum. Genet., 68, 444–456.
- Satta, Y., Hickerson, M., Watanabe, H., O'hUigin, C. and Klein, J. (2004) Ancestral population sizes and species divergence times in the primate lineage on the basis of intron and BAC end sequences. J. Mol. Evol., 59, 478–487.
- Rannala, B. and Yang, Z. (2003) Bayes estimation of species divergence times and ancestral population sizes using DNA sequences from multiple loci. Genetics, 164, 1645–1656.
- Kawashima, S. and Kanehisa, M. (2000) AAindex: amino acid index database. Nucleic Acids Res., 28, 374.
- Grantham, R. (1974) Amino acid difference formula to help explain protein evolution. Science, 185, 862–864.
- Yampolsky, L.Y. and Stoltzfus, A. (2005) The exchangeability of amino acids in proteins. Genetics, 170, 1459–1472.
- Gorlov, I.P., Gorlova, O.Y., Frazier, M.L. and Amos, C.I. (2003) Missense mutations in hMLH1 and hMSH2 are associated with exonic splicing enhancers. Am. J. Hum. Genet., 73, 1157–1161.
- Bierne, N. and Eyre-Walker, A. (2004) The genomic rate of adaptive amino acid substitution in Drosophila. Mol. Biol. Evol., 21, 1350–1360.
- Vallender, E.J. and Lahn, B.T. (2004) Positive selection on the human genome. Hum. Mol. Genet., 13 (Spec. no. 2), R245–R254.
- Sanjuan, R., Moya, A. and Elena, S.F. (2004) The distribution of fitness effects caused by single-nucleotide substitutions in an RNA virus. Proc. Natl Acad. Sci. USA, 101, 8396–8401.
- Jordan, I.K., Kondrashov, F.A., Adzhubei, I.A., Wolf, Y.I., Koonin, E.V., Kondrashov, A.S. and Sunyaev, S. (2005) A universal trend of amino acid gain and loss in protein evolution. Nature, 433, 633–638.
- Hwang, D.G. and Green, P. (2004) Bayesian Markov chain Monte Carlo sequence analysis reveals varying neutral substitution patterns in mammalian evolution. Proc. Natl Acad. Sci. USA, 101, 13994–14001.
- Tatusov, R.L., Koonin, E.V. and Lipman, D.J. (1997) A genomic perspective on protein families. Science, 278, 631–637.
- Sherry, S.T., Ward, M.H., Kholodov, M., Baker, J., Phan, L., Smigielski, E.M. and Sirotkin, K. (2001) dbSNP: the NCBI database of genetic variation. Nucleic Acids Res., 29, 308–311.