Human Molecular Genetics Advance Access originally published online on October 20, 2005
Human Molecular Genetics 2005 14(23):3595-3603; doi:10.1093/hmg/ddi387

© The Author 2005. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org

Polymorphism discovery in 51 chemotherapy pathway genes

Robert R. Freimuth1,2,3,†,‡, Ming Xiao4,†, Sharon Marsh1,2,3, Matthew Minton1,2,3, Nicholas Addleman4, Derek J. Van Booven1,2,3, Howard L. McLeod1,2,3 and Pui-Yan Kwok4,*

1Department of Medicine, 2Department of Molecular Biology and Pharmacology and 3Department of Genetics, Washington University School of Medicine, St Louis, MO, USA and 4Cardiovascular Research Institute and Department of Dermatology, University of California, San Francisco, CA, USA

* To whom correspondence should be addressed at: Department of Dermatology, University of California, 513 Parnassus Avenue, PO Box 0793, HSW-901G, San Francisco, CA 94143-0793, USA. Tel: +1 4155143802; Fax: +1 4154762956; Email: pui.kwok@ucsf.edu

Received August 5, 2005; Revised October 3, 2005; Accepted October 11, 2005


    ABSTRACT
 
Candidate gene pharmacogenetic studies offer a strategy for the rapid assessment of putative predictive markers. As a first step toward studying the pharmacogenetics of cancer chemotherapy, 51 candidate genes from the pathways of antineoplastic agents were resequenced to identify common genetic polymorphisms that might alter therapeutic response or toxicity. Forty DNA samples were screened from each of three population groups: African-Americans, Asian-Americans and European-Americans. Nearly 378 kb of genomic sequence was obtained from each sample. Nine hundred and four variants were identified, including 139 coding single nucleotide polymorphisms (cSNPs). Three hundred and fifty-six (40%) polymorphisms were common to all three populations and 366 (41%) were population specific. Three hundred and forty-six (38%) variants were novel polymorphisms that were not present in the three public databases that were examined. One hundred and eleven (35%) of the 319 non-synonymous cSNPs that were identified by either resequencing or database mining were predicted by PolyPhen to be either possibly or probably damaging. For the non-synonymous cSNPs identified by resequencing, both the number of cSNPs found and the maximum estimated allele frequency decreased with increasing predicted severity. These results provide experimental validation and estimated allele frequencies for polymorphisms in three common ethnic groups and facilitate applied pharmacogenetic studies of anticancer drugs.


    INTRODUCTION
 
Pharmacogenetics is the study of how common genetic variations influence the metabolism, response or toxicity of therapeutic agents (1,2). Therefore, pharmacogenetic studies require both a set of candidate genes and a comprehensive list of genetic variants within those genes. Historically, pharmacogenetic studies have focused on the genetic variation found within a single gene, but the completion of the human genome project, combined with a growing understanding of the proteins involved in the transport, metabolism and mechanism of action of drugs, has enabled investigators to take a wider approach to pharmacogenetic studies that focuses on candidate genes in drug pathways (3).

Public single nucleotide polymorphism (SNP) databases provide investigators with a set of putative variants within genes of interest, and therefore, they are an essential resource for pharmacogenetic studies. However, only about half of the 10 million polymorphisms in dbSNP have been validated (and experimental validation accounts for only a subset of those), which limits the number of SNPs that can be confidently included in pharmacogenetic studies [National Center for Biotechnology Information (NCBI) dbSNP build 124, www.ncbi.nlm.nih.gov/SNP] (4,5). In addition, both public and private resequencing projects are still discovering many novel variants, and population studies are finding examples of SNPs that have striking differences in allele frequency between ethnic groups (6–14). This suggests that the true extent of variation in the human genome is still unknown, and the distribution of those variants among population groups is not completely understood. Therefore, SNP discovery efforts in diverse ethnic populations are still necessary for pharmacogenetic studies, especially for widely used drugs that have a narrow therapeutic index.

In this study, 51 candidate genes from the pathways of nine anticancer agents were resequenced to identify common genetic polymorphisms that might alter therapeutic response or toxicity. DNA samples from each of three different populations, African-Americans (n=40), Asian-Americans (n=40) and European-Americans (n=40), were screened, which allowed comparison of SNP discovery rates and estimated allele frequencies between the three ethnically diverse groups. In addition, the polymorphisms found by the resequencing project were compared with those that were present in three public SNP databases. Finally, the functional significance was predicted for all non-synonymous coding single nucleotide polymorphisms (cSNPs) that were identified within the 51 genes and the predicted severity of the amino acid change was compared with the estimated allele frequency of each SNP.


    RESULTS
 
Resequencing
Fifty-one candidate genes from the metabolic pathways of anticancer agents were selected for resequencing (Table 1) (http://pharmacogenetics.wustl.edu). To identify polymorphisms within the exons of each gene, resequencing reactions were designed that included the exons, intron sequence flanking each exon and proximal portions of the 5'- and 3'-flanking regions. Resequencing was performed using 40 DNA samples from each of three ethnically diverse populations: African-Americans, Asian-Americans and European-Americans. Nearly 378 kb of genomic sequence was obtained from each sample (Table 2). Of that sequence, approximately 107 kb (28%) was from exons, 70 kb (19%) was from the 5'- or 3'-flanking region and 201 kb (53%) was from intronic sequence.


 
Table 1. Genes included in this study
 

 
Table 2. Resequencing results
 
A total of 904 variants were identified by resequencing the 51 genes (Table 2). Of those, 242 (27%) were in the flanking region, 107 (12%) were in the untranslated region (UTR), 139 (15%) were in the open-reading frame (ORF) and 416 (46%) were in introns. On average, 2.4 polymorphisms were found per 1000 bp of sequence. The flanking regions had the highest density of variants (3.4 per kilobase), and the ORF was the least variable region (1.9 per kilobase).
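The overall density quoted above follows from dividing the variant count by the amount of sequence screened. A minimal Python sketch, using the rounded totals given in the text (904 variants, approximately 378 kb); the function name is illustrative and is not part of the study's software:

```python
def variants_per_kb(n_variants, sequence_kb):
    """Return the number of variants per 1000 bp of screened sequence."""
    if sequence_kb <= 0:
        raise ValueError("sequence_kb must be positive")
    return n_variants / sequence_kb


# Totals reported in the text; the sequence length is approximate.
overall_density = variants_per_kb(904, 378)
print(f"overall density: {overall_density:.1f} variants per kb")
```

Because the per-region sequence lengths in the text are rounded, region-level densities recomputed this way may differ slightly from the published values.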

The variants that were identified by resequencing were deposited into the Pharmacogenetics and Pharmacogenomics Knowledge Base (PharmGKB, http://www.pharmgkb.org).

Variant loci identified by database mining
The PolyMAPr program identified 8662 variants from dbSNP for the 51 genes in this study, of which 2037 mapped to regions of the genes that were resequenced. In addition, PolyMAPr identified 744 variants from JSNP (262 were in resequenced regions) and 282 variants from CGAP (257 were in resequenced regions). After eliminating inter-database redundancy, a total of 8851 variants were identified in the three databases, 2196 (25%) of which were in resequenced regions. Of those 2196 variants, 558 (25%) were within the ORF, 322 (15%) were within the flanking region, 296 (14%) were within the UTR and 1020 (46%) were within introns (Table 3).


 
Table 3. Variants in all 51 genes
 
Validation status of variants
To quantify the contribution of the resequencing data relative to what already existed in public databases, and to estimate the confidence one might have in variants that were present in public databases but not found by this resequencing study, the validation status was determined for each variant that was identified by database mining. Variants from dbSNP were considered validated if any of the validation tags (other-pop, by-frequency, by-cluster or by-2hit-2allele) was present in the SNP record. Variants from the JSNP and CGAP databases were classified as validated if an allele frequency was given (JSNP) or if they were present in the ‘validated’ SNPs file (CGAP); see ‘Database mining’ for details.

Five hundred and fifty-eight (62%) of the 904 variants identified by resequencing were already present in at least one of the public databases, and 346 (38%) of the 904 variants were not found in any of the three public databases and therefore represent novel polymorphisms (Table 3). In all, 2542 variants were identified either by resequencing or by database mining. Of the 558 loci that were identified by both methods, 442 (79%) were already listed as validated in the databases, but only 104 (18.6%) also had frequency data available. Of the 1638 variants that were present in the databases but not found by this resequencing study, only 562 (34%) were listed as validated, and only 95 (5.8% of the 1638) also had frequency data. The majority of variants that were found only by database mining (1076, 66%) did not have any validation data (Fig. 1).
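The counts in this paragraph are linked by simple set arithmetic on the two discovery methods. The following Python sketch reproduces them from the three reported totals (904 resequenced variants, 2196 database variants in resequenced regions, 558 found by both); the function and key names are illustrative only:

```python
def overlap_summary(n_resequenced, n_database, n_both):
    """Derive novel, database-only and combined counts from two overlapping sets."""
    if n_both > min(n_resequenced, n_database):
        raise ValueError("overlap cannot exceed either set")
    return {
        "novel": n_resequenced - n_both,          # found only by resequencing
        "database_only": n_database - n_both,     # found only by database mining
        "combined": n_resequenced + n_database - n_both,
    }


summary = overlap_summary(n_resequenced=904, n_database=2196, n_both=558)
print(summary)
```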



 
Figure 1. Validation status of variants in public databases. The number of variants in the resequenced regions of the 51 genes is shown, categorized according to whether they were found only by resequencing, only by database mining or by both methods. The number of variants that had validation and/or frequency data in the databases is also shown.

 
Distribution of variants among ethnic groups
Estimated allele frequencies for all three ethnic groups were available for 900 of the 904 polymorphisms that were identified by resequencing. In all the population groups, the number of variants identified decreased with increasing estimated allele frequency (Fig. 2). In addition, the African-American sample set contained more variants with allele frequencies <30% than either the European-American or the Asian-American sample sets, but the difference between population groups in the number of variants found also decreased with increasing estimated allele frequency. The African-American and Asian-American sample sets contained the highest and the lowest number of variants, respectively (Fig. 2).



 
Figure 2. Frequency histogram for variants found by resequencing. The number of variants identified in each ethnic group, stratified by estimated allele frequency.

 
Approximately 40% (356) of the polymorphisms were present in all three populations, 20% (178) were found in two of the three populations and 40% (366) were found in only one of the three groups (Fig. 3A). The number of variants found only in the African-American samples was 1.5- and 3.5-fold greater than the number found only in the European-American and Asian-American samples, respectively.
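The classification of variants as population specific or shared can be sketched from the estimated allele frequencies. The Python example below is only an illustration: the text does not state the exact criterion, so it assumes that a variant counts as present in a population when both alleles are observed there (estimated frequency strictly between 0 and 1), and the frequencies shown are hypothetical.

```python
POPULATIONS = ("African-American", "Asian-American", "European-American")


def populations_with_variant(freqs):
    """List populations in which the variant is polymorphic (assumed: 0 < f < 1)."""
    return [pop for pop in POPULATIONS if 0.0 < freqs.get(pop, 0.0) < 1.0]


def sharing_class(freqs):
    """Classify a variant by the number of populations in which it was observed."""
    n = len(populations_with_variant(freqs))
    if n == 0:
        return "not observed"
    if n == 1:
        return "population specific"
    if n == len(POPULATIONS):
        return "all populations"
    return "two populations"


# Hypothetical estimated allele frequencies for one variant.
example = {"African-American": 0.15, "Asian-American": 0.0, "European-American": 0.0}
print(sharing_class(example))
```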



 
Figure 3. Number of variants found by resequencing. The number of variants identified by resequencing 40 samples from each of three different ethnic groups (n=120). The estimated allele frequencies were used to determine which variants were specific to a single population and which were found in more than one population. (A) Estimated allele frequencies were available for 900 of the 904 variants identified by resequencing. (B) Estimated allele frequencies were available for 137 of the 139 ORF variants. The number of non-synonymous and synonymous changes is given as non-synonymous/synonymous.

 
Of the 900 variants that were identified by resequencing and that had estimated allele frequency data, 556 (62%) were also present in a public database. On the basis of the estimated allele frequencies, 72% (398) of those 556 variants were found in more than one ethnic group, and 75% (417) had estimated allele frequencies ≥10% (Fig. 4). In contrast, only 40% (136) of the 344 novel variants (i.e. those identified only by resequencing) were present in more than one population and 38% (130) of them had allele frequencies ≥10%.



 
Figure 4. Variants found by resequencing and database mining. The number of variants identified by resequencing that were also found in public databases or found only by this resequencing project. The proportion of variants found that were specific to an ethnic group or that were found in more than one group is shown in the top row. The number of variants that had estimated allele frequencies ≥10% is shown in the bottom row.

 
The average minor allele frequency for variants in each region of the gene was calculated for each population group. The average minor allele frequencies for variants located within non-coding regions (flanking region, UTR or introns) were similar both among regions and ethnic groups (11–18%) (Table 4). Likewise, synonymous and non-synonymous variants in the coding region had similar average frequencies among populations (7–11%). The average frequencies for variants in the coding region were uniformly lower than those for the non-coding regions. Finally, the overall average minor allele frequency for variants in each region was calculated using the greatest minor allele frequency observed in any population for each variant. The difference in average minor allele frequency was even greater between non-coding (22–27%) and coding (17%) variants when this measure was used (Table 4).
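The two averaging schemes described in this paragraph differ in when the maximum is taken. A minimal Python sketch with hypothetical frequencies follows; it assumes that the minor allele frequency is min(f, 1 - f) within each population and that variants absent from a population are included in that population's average with a frequency of zero, neither of which is specified in the text.

```python
def minor_allele_frequency(freq):
    """Minor allele frequency for a biallelic variant with allele frequency freq."""
    return min(freq, 1.0 - freq)


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def average_maf_by_population(variants):
    """Average minor allele frequency computed separately for each population."""
    populations = variants[0].keys()
    return {
        pop: mean(minor_allele_frequency(v[pop]) for v in variants)
        for pop in populations
    }


def average_of_greatest_maf(variants):
    """Average of the greatest minor allele frequency seen in any population."""
    return mean(
        max(minor_allele_frequency(f) for f in v.values()) for v in variants
    )


# Hypothetical estimated allele frequencies for two variants.
variants = [
    {"African-American": 0.30, "Asian-American": 0.05, "European-American": 0.10},
    {"African-American": 0.10, "Asian-American": 0.00, "European-American": 0.00},
]
print(average_maf_by_population(variants))
print(average_of_greatest_maf(variants))
```

Because the second measure takes the largest value for each variant before averaging, it can never be lower than any of the per-population averages.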


 
Table 4. Average minor allele frequencies by gene region
 
cSNPs identified by resequencing
One hundred and thirty-nine of the 904 variants found by resequencing mapped to the ORF. Of those, 64 changed the encoded amino acid, including 62 non-synonymous cSNPs, one triallelic SNP and one insertion–deletion (indel). Seventy-five of the 139 variants were synonymous. About 53% of the non-synonymous polymorphisms and 57% of the synonymous cSNPs were found in only one of the three population groups (Fig. 3B). The number of cSNPs found decreased with increasing estimated allele frequency. The overall ratio of non-synonymous:synonymous cSNPs was 0.8 (Table 5).


 
Table 5. cSNPs identified by resequencing
 
Functional analysis of non-synonymous cSNPs
The maximum estimated allele frequency among all three population groups was determined for each biallelic cSNP that was identified by resequencing (Fig. 5). Frequencies >50% represent cases where the variant allele in one population is the common allele in another population. In those cases, the variant allele was assigned arbitrarily and then used for all populations so that consistent comparisons could be made between population groups. The maximum estimated allele frequency observed among all 75 synonymous cSNPs was 71%. The maximum estimated allele frequencies for the 62 non-synonymous cSNPs decreased with increasing severity as predicted by PolyPhen: 71% for benign SNPs, 30% for possibly damaging changes and 13% for variants predicted to be probably damaging. The number of cSNPs found in each category also decreased with increasing predicted severity (Fig. 5).



 
Figure 5. Predicted functional significance of cSNPs. The number and maximum estimated allele frequency for the cSNPs identified by resequencing, by functional significance (as predicted by PolyPhen).

 
The combined approaches of resequencing and database mining identified 599 cSNPs. More than half (337, 56%) changed the encoded amino acid sequence. Of those, 319 (95%) were biallelic missense SNPs, 10 were nonsense SNPs, five were triallelic SNPs and three were indels. The PolyPhen program was used to predict the functional significance of the 319 missense cSNPs: 197 were benign, 54 were possibly damaging and 57 were probably damaging (predictions could not be made for the remaining 11 cSNPs).
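As a check on the tallies above, the PolyPhen categories can be summed and the proportion predicted to be damaging recomputed. This short Python sketch uses only the counts reported in the text; the variable names are illustrative:

```python
# PolyPhen predictions for the 319 biallelic missense cSNPs, as reported in the text.
polyphen_counts = {
    "benign": 197,
    "possibly damaging": 54,
    "probably damaging": 57,
    "unknown": 11,  # no prediction could be made
}

total_missense = sum(polyphen_counts.values())
n_damaging = polyphen_counts["possibly damaging"] + polyphen_counts["probably damaging"]
percent_damaging = 100.0 * n_damaging / total_missense

print(total_missense, n_damaging, f"{percent_damaging:.0f}%")
```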

As an alternative measure of conservation, BLOSUM62 scores were determined for each of the 319 non-synonymous and 262 synonymous biallelic cSNPs that were identified by either resequencing or database mining. Common amino acid changes have higher scores, whereas changes that are less common (and therefore assumed to be not as well tolerated) have lower scores. Non-synonymous cSNPs that were predicted by PolyPhen to be damaging tend to have lower BLOSUM62 scores than those that were predicted to be benign, but cSNPs from both categories were found throughout the entire range of BLOSUM62 scores for non-synonymous cSNPs (data not shown).


    DISCUSSION
 
Definitive pharmacogenetic studies require a comprehensive set of polymorphisms within candidate genes. As a first step toward studying the pharmacogenetics of cancer chemotherapy, 51 candidate genes from the pathways of antineoplastic agents were resequenced to identify common genetic polymorphisms that might alter therapeutic response or toxicity. Nine hundred and four polymorphisms were identified in 40 samples from each of three ethnic groups: African-Americans, Asian-Americans and European-Americans. Three hundred and forty-six (38%) variants were novel polymorphisms that were not present in the three public databases examined, whereas 558 polymorphisms found by resequencing were already present in at least one of the databases. Of the 558 variants that were already present in the databases, only 79% were listed as validated and <19% also had frequency data available. Therefore, our resequencing effort provided experimental validation for more than 100 polymorphisms that were already in the databases, but that were not yet confirmed. In addition, this study also obtained estimated allele frequency data for 900 variants in three different ethnic groups. Even with substantial contributions by initiatives such as the HapMap project, more than 800 (89%) of those polymorphisms did not previously have any frequency data available in the dbSNP, JSNP or CGAP databases. Similar proportions of novel and validated variants have been observed in other recent resequencing studies (10,11,14–16). The identification of so many novel and common SNPs in this set of 51 genes implies that there are many common SNPs in other genes that are yet to be found. Therefore, additional resequencing efforts are still needed for comprehensive candidate gene pharmacogenetic studies.

Of the novel polymorphisms identified by resequencing, 38% had estimated allele frequencies ≥10% and 40% were found in more than one population group. In contrast, 75% of the variants that were already present in the databases had estimated allele frequencies ≥10%, and 72% were found in more than one ethnic group, similar to the data reported by Nelson et al. (15). Therefore, the novel variants identified by this resequencing study were more likely to be specific to one population group and have allele frequencies <10%, whereas the variants that were also found in a public database were more likely to be common to multiple populations and have allele frequencies ≥10%. In addition, the African-American samples contained the greatest number of variants and had the most population-specific variants, compared with the European-American and Asian-American samples, as has been reported in other polymorphism identification studies that used multiple ethnic groups (12,17–20). Conversely, 25, 31 and 45% of the variants were not found in the African-American, European-American and Asian-American samples, respectively, and 21–34% of the variants had allele frequencies <10% in those populations. Similar distributions were observed in other resequencing studies (8,13,15,17–19,21). About 40% of the variants identified were ‘cosmopolitan’ (both alleles were found in all three populations). Taken together, these results emphasize the importance of determining allele frequencies for candidate variants prior to selecting them for analysis in a specific ethnic group.

The distributions of average minor allele frequencies across different regions of the gene were similar in the three population groups. In addition, the average minor allele frequencies for variants in different non-coding regions (flanking region, UTR or introns) were similar to each other (11–18%), but they were ~50% higher than those observed in the coding regions (7–11%). The average minor allele frequencies for synonymous and non-synonymous variants were nearly identical in all populations. Similar results have been reported previously (8,12,22).

Finally, the functional significance of all 319 non-synonymous cSNPs that were identified by either resequencing or database mining was predicted. One hundred and eleven (35%) were predicted by PolyPhen to be possibly or probably damaging. The BLOSUM62 score was also determined for each cSNP and compared with the prediction by PolyPhen. In general, cSNPs predicted by PolyPhen to be damaging tended to have lower BLOSUM62 scores, but a significant number of cSNPs predicted to be ‘benign’ had low BLOSUM62 scores, and many cSNPs predicted to be damaging had BLOSUM62 scores greater than 0. Therefore, the BLOSUM62 score is often discordant with functional significance, as predicted by PolyPhen. Leabman et al. (9) also found that the BLOSUM62 score did not distinguish between damaging and benign non-synonymous cSNPs. In the set of non-synonymous cSNPs that were identified by resequencing (i.e. cSNPs for which allele frequencies were available for all three population groups), both the number of cSNPs found and the maximum estimated allele frequency decreased with increasing predicted severity. These observations are consistent with other resequencing studies (8,9,12).

It must be emphasized that none of the programmatic methods used to predict the functional effect of non-synonymous cSNPs (e.g. Grantham scores, BLOSUM62 scores, SIFT and PolyPhen) accurately predicts the functional effect of all cSNPs, and therefore, the predictive power of these methods is less than desired (9,23). However, given the relatively large number of non-synonymous cSNPs, it is necessary to make some attempt to identify which non-synonymous cSNPs are the most likely to have a functional effect so that experimental resources can be allocated more effectively. The PolyPhen program was chosen in this study because it incorporates data obtained from multiple sources, perhaps the most important of which are the domain databases and the annotation of critical residues. However, the predictions made by PolyPhen or any other program should only serve to generate hypotheses, which must be confirmed experimentally.

In summary, 51 genes from the pathways of antineoplastic agents were resequenced using 40 samples from each of three ethnically diverse populations. Nine hundred and four polymorphisms were identified, including 346 novel variants and 139 cSNPs. About 27% of the non-synonymous cSNPs identified by resequencing were predicted to be damaging. These results provide experimental validation and estimated allele frequencies for polymorphisms that can be used in future pharmacogenetic studies of anticancer drugs. In addition, the number of novel and population-specific variants that were identified, and the dramatic differences in allele frequency for those variants between ethnic groups, emphasize the need for continued polymorphism discovery efforts in diverse sets of DNA samples.


    MATERIALS AND METHODS
 
Gene and sample selection
Fifty-one candidate genes from the pathways of nine anticancer agents were selected for resequencing (Table 1, http://pharmacogenetics.wustl.edu). The pathways included proteins known to be involved in the transport, metabolism and mechanism of action of the drugs, as well as proteins involved in DNA repair and cell-cycle regulation. The drug pathways were constructed by literature mining. Specifically, all genes in the pathways have experimental evidence that indicates that they play a role in or are associated with the pharmacokinetics or pharmacodynamics of the agent in question (http://pharmacogenetics.wustl.edu, http://www.pharmgkb.org).

Resequencing was performed using 40 DNA samples from each of three ethnically diverse populations: African-Americans, Asian-Americans and European-Americans. All 120 samples were part of the TSC DNA Panel (http://snp.cshl.org/allele_frequency_project/panels.shtml), and they were obtained from the Coriell Institute (http://coriell.umdnj.edu/ccr/ccrsumm.html). The panel name and repository ID for all samples that were resequenced in this study are given in Supplementary Material, Table S1.

Resequencing and polymorphism discovery
Human genomic reference sequences were retrieved from UCSC Golden Path (http://genome.ucsc.edu) (24). Repeat regions were identified and masked with RepeatMasker, and primers were designed using a modified version of the Primer3 program (25,26). PCR products ranged in size from 250 to 10 000 bp and were designed to include the exons, exon–intron splice junctions and at least 100 bp of flanking intron sequence. Approximately 1 kb of the 5'- and 3'-flanking regions was also resequenced.

DNA samples were resequenced in eight pools of five samples per population. In addition, one sample was resequenced individually (i.e. not in a pool) and used as a reference sequence. In general, a SNP with 10% minor allele frequency can be accurately identified using this method (27). A 10% decrease in the peak height of the major allele is readily detectable when compared with the reference sample, and the appearance of a second peak (representing the minor allele) confirms the presence of the polymorphism. Given a sample size of 120 individuals (240 chromosomes) and a SNP with a minor allele frequency of 10%, 24 chromosomes would be expected to carry the minor allele.
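The arithmetic behind the pooling design can be sketched as follows. This Python example is an illustration under simplifying assumptions that are not stated in the text: all samples are diploid and each sample contributes an equal amount of DNA to its pool. The function names are hypothetical.

```python
def chromosomes(n_individuals, ploidy=2):
    """Number of chromosomes sampled from n diploid individuals."""
    return n_individuals * ploidy


def expected_minor_allele_chromosomes(n_individuals, minor_allele_frequency):
    """Expected number of chromosomes that carry the minor allele."""
    return chromosomes(n_individuals) * minor_allele_frequency


def single_copy_fraction(pool_size):
    """Fraction of a pool's signal contributed by one minor-allele chromosome."""
    return 1.0 / chromosomes(pool_size)


# 120 individuals and a 10% minor allele frequency, as in the text.
print(expected_minor_allele_chromosomes(120, 0.10))
# One heterozygous individual in a pool of five samples.
print(single_copy_fraction(5))
```

Under these assumptions, a single heterozygous sample in a five-sample pool contributes one of ten chromosomes, which corresponds to the 10% change in peak height described above.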

Dye terminator sequencing was used and samples were analyzed on an ABI 3700 capillary sequencer (ABI, Foster City, CA, USA). Electrochromatograms were aligned and analyzed using the Sequencher (Gene Codes, Ann Arbor, MI, USA; http://www.genecodes.com) and Mutation Surveyor (SoftGenetics, LLC, State College, PA, USA; http://www.softgenetics.com) software packages. Allele frequencies were estimated for each sample pool using the relative peak heights of each base at each polymorphic location (28). Sequence flanking each polymorphism was extracted and used as input for the PolyMAPr program, which was used to map the variants to the gene reference sequence (29).
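Estimating a pool allele frequency from relative peak heights can be illustrated with a simplified calculation. The Python sketch below is not the published method (28), which may include normalization against the reference sample; it simply takes one allele's peak height as a fraction of the total at that position and then averages equally sized pools. The peak heights and function names are hypothetical.

```python
def pool_allele_frequency(peak_heights, allele):
    """Estimate an allele's frequency in one pool from relative peak heights."""
    total = sum(peak_heights.values())
    if total <= 0:
        raise ValueError("no signal at this position")
    return peak_heights.get(allele, 0.0) / total


def population_allele_frequency(pool_estimates):
    """Average the estimates from equally sized pools of one population."""
    if not pool_estimates:
        raise ValueError("at least one pool estimate is required")
    return sum(pool_estimates) / len(pool_estimates)


# Hypothetical peak heights (arbitrary units) at one polymorphic position.
pool = {"C": 900.0, "T": 100.0}
print(pool_allele_frequency(pool, "T"))
```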

Database mining
PolyMAPr (Polymorphism Mining and Annotation Programs) was used to obtain a list of variants for each gene from three public SNP databases: dbSNP (www.ncbi.nlm.nih.gov/SNP), JSNP (http://www-alis.tokyo.jst.go.jp/HOWDY) and CGAP (http://cgap.nci.nih.gov) (4,5,29–32). PolyMAPr mapped all the variants found by database mining onto the gene reference sequence, tabulated inter- and intra-database redundancy and determined the validation status of each polymorphic locus. Variants from dbSNP were considered validated if any of the validation tags (other-pop, by-frequency, by-cluster or by-2hit-2allele) was present in the SNP record. Variants from the JSNP and CGAP databases were classified as validated if an allele frequency was given (JSNP) or if they were present in the ‘validated’ SNPs file (CGAP). For this study, the overall validation status of a polymorphic locus was determined by all the SNP entries that map to that locus. Therefore, if a SNP was referenced in multiple databases and any one of the three databases listed that SNP as validated, that locus was considered to have been validated.
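The validation rule described above can be expressed compactly. The following Python sketch is not PolyMAPr code; the record layout (the `database`, `tags`, `allele_frequency` and `in_validated_file` keys) is a hypothetical representation chosen for illustration, while the validation criteria themselves are those given in the text.

```python
DBSNP_VALIDATION_TAGS = {"other-pop", "by-frequency", "by-cluster", "by-2hit-2allele"}


def entry_is_validated(entry):
    """Apply the database-specific validation criterion to one SNP entry."""
    database = entry["database"]
    if database == "dbSNP":
        # Validated if any recognized validation tag is present.
        return bool(DBSNP_VALIDATION_TAGS & set(entry.get("tags", ())))
    if database == "JSNP":
        # Validated if an allele frequency was given.
        return entry.get("allele_frequency") is not None
    if database == "CGAP":
        # Validated if listed in the 'validated' SNPs file.
        return bool(entry.get("in_validated_file", False))
    raise ValueError(f"unknown database: {database}")


def locus_is_validated(entries):
    """A locus is validated if any entry that maps to it is validated."""
    return any(entry_is_validated(entry) for entry in entries)


# Hypothetical locus with one unvalidated dbSNP entry and one JSNP entry with a frequency.
locus = [
    {"database": "dbSNP", "tags": []},
    {"database": "JSNP", "allele_frequency": 0.12},
]
print(locus_is_validated(locus))
```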

Functional analysis of non-synonymous cSNPs
The PolyPhen program was used to predict the functional significance of non-synonymous cSNPs (http://www.bork.embl-heidelberg.de/PolyPhen) (33). Three different methods are used by PolyPhen to predict the effect of an amino acid change: a sequence-based analysis using annotations of known domains, a phylogenetic comparison that uses a profile matrix constructed from homologous proteins and a series of calculations based on structural parameters, molecular contacts and known three-dimensional structures. Those three methods are used to predict whether a given amino acid change is benign, possibly damaging or probably damaging (or ‘unknown’, if there are not enough data available to make a prediction). The PolyPhen results were compared with the amino acid substitution scores from the BLOSUM62 matrix, which had been used to classify amino acid changes as conservative or non-conservative (8,9). Common amino acid changes have higher scores and are presumed to be functionally benign, whereas changes that are less common have lower scores and are assumed to be evolutionarily less acceptable.


    SUPPLEMENTARY MATERIAL
 
Supplementary Material is available at HMG Online.


    ACKNOWLEDGEMENT
 
The authors wish to thank Cristi King for her assistance in verifying the gene annotations. This work was supported by NIH grant U01 GM63340.

Conflict of Interest statement. The authors have no conflicts to disclose.


    FOOTNOTES
 
† The authors wish it be known that, in their opinion, the first two authors should be regarded as joint First Authors.

‡ Present address: Department of Pathology and Immunology, Washington University School of Medicine, St Louis, MO, USA.


    REFERENCES
 

  1. Evans, W.E. and McLeod, H.L. (2003) Pharmacogenomics—drug disposition, drug targets, and side effects. N. Engl. J. Med., 348, 538–549.

  2. Weinshilboum, R. (2003) Inheritance and drug response. N. Engl. J. Med., 348, 529–537.

  3. McLeod, H.L. (2004) Drug pathways: moving beyond single gene pharmacogenetics. Pharmacogenomics, 5, 139–141.

  4. Sherry, S.T., Ward, M.H., Kholodov, M., Baker, J., Phan, L., Smigielski, E.M. and Sirotkin, K. (2001) dbSNP: the NCBI database of genetic variation. Nucleic Acids Res., 29, 308–311.

  5. Smigielski, E.M., Sirotkin, K., Ward, M. and Sherry, S.T. (2000) dbSNP: a database of single nucleotide polymorphisms. Nucleic Acids Res., 28, 352–355.

  6. Holden, A.L. (2002) The SNP consortium: summary of a private consortium effort to develop an applied map of the human genome. Biotechniques, (Suppl. 22–24), 26.

  7. Sachidanandam, R., Weissman, D., Schmidt, S.C., Kakol, J.M., Stein, L.D., Marth, G., Sherry, S., Mullikin, J.C., Mortimore, B.J., Willey, D.L. et al. (2001) A map of human genome sequence variation containing 1.42 million single nucleotide polymorphisms. Nature, 409, 928–933.

  8. Cargill, M., Altshuler, D., Ireland, J., Sklar, P., Ardlie, K., Patil, N., Shaw, N., Lane, C.R., Lim, E.P., Kalyanaraman, N. et al. (1999) Characterization of single-nucleotide polymorphisms in coding regions of human genes. Nat. Genet., 22, 231–238.

  9. Leabman, M.K., Huang, C.C., DeYoung, J., Carlson, E.J., Taylor, T.R., de la Cruz, M., Johns, S.J., Stryke, D., Kawamoto, M., Urban, T.J. et al. (2003) Natural variation in human membrane transporter genes reveals evolutionary and functional constraints. Proc. Natl Acad. Sci. USA, 100, 5896–5901.

  10. Livingston, R.J., von Niederhausern, A., Jegga, A.G., Crawford, D.C., Carlson, C.S., Rieder, M.J., Gowrisankar, S., Aronow, B.J., Weiss, R.B. and Nickerson, D.A. (2004) Pattern of sequence variation across 213 environmental response genes. Genome Res., 14, 1821–1831.

  11. Solus, J.F., Arietta, B.J., Harris, J.R., Sexton, D.P., Steward, J.Q., McMunn, C., Ihrie, P., Mehall, J.M., Edwards, T.L. and Dawson, E.P. (2004) Genetic variation in eleven phase I drug metabolism genes in an ethnically diverse population. Pharmacogenomics, 5, 895–931.

  12. Goddard, K.A., Hopkins, P.J., Hall, J.M. and Witte, J.S. (2000) Linkage disequilibrium and allele-frequency distributions for 114 single-nucleotide polymorphisms in five populations. Am. J. Hum. Genet., 66, 216–234.

  13. Gabriel, S.B., Schaffner, S.F., Nguyen, H., Moore, J.M., Roy, J., Blumenstiel, B., Higgins, J., DeFelice, M., Lochner, A., Faggart, M. et al. (2002) The structure of haplotype blocks in the human genome. Science, 296, 2225–2229.

  14. Freudenberg-Hua, Y., Freudenberg, J., Winantea, J., Kluck, N., Cichon, S., Brüss, M., Propping, P. and Nöthen, M.M. (2005) Systematic investigation of genetic variability in 111 human genes-implications for studying variable drug response. Pharmacogenomics J., 5, 183–192.

  15. Nelson, M.R., Marnellos, G., Kammerer, S., Hoyal, C.R., Shi, M.M., Cantor, C.R. and Braun, A. (2004) Large-scale validation of single nucleotide polymorphisms in gene regions. Genome Res., 14, 1664–1668.

  16. Iida, A. and Nakamura, Y. (2005) Identification of 156 novel SNPs in 29 genes encoding G-protein coupled receptors. J. Hum. Genet., 50, 182–191.

  17. Stephens, J.C., Schneider, J.A., Tanguay, D.A., Choi, J., Acharya, T., Stanley, S.E., Jiang, R., Messer, C.J., Chew, A., Han, J.H. et al. (2001) Haplotype variation and linkage disequilibrium in 313 human genes. Science, 293, 489–493.

  18. Hinds, D.A., Stuve, L.L., Nilsen, G.B., Halperin, E., Eskin, E., Ballinger, D.G., Frazer, K.A. and Cox, D.R. (2005) Whole-genome patterns of common DNA variation in three human populations. Science, 307, 1072–1079.

  19. Carlson, C.S., Eberle, M.A., Rieder, M.J., Smith, J.D., Kruglyak, L. and Nickerson, D.A. (2003) Additional SNPs and linkage-disequilibrium analyses are necessary for whole-genome association studies in humans. Nat. Genet., 33, 518–521.

  20. Halushka, M.K., Fan, J.B., Bentley, K., Hsie, L., Shen, N., Weder, A., Cooper, R., Lipshutz, R. and Chakravarti, A. (1999) Patterns of single-nucleotide polymorphisms in candidate genes for blood-pressure homeostasis. Nat. Genet., 22, 239–247.

  21. Kamatani, N., Sekine, A., Kitamoto, T., Iida, A., Saito, S., Kogame, A., Inoue, E., Kawamoto, M., Harigai, M. and Nakamura, Y. (2004) Large-scale single-nucleotide polymorphism (SNP) and haplotype analyses, using dense SNP maps, of 199 drug-related genes in 752 subjects: the analysis of the association between uncommon SNPs within haplotype blocks and the haplotypes constructed with haplotype-tagging SNPs. Am. J. Hum. Genet., 75, 190–203.

  22. Cha, P.C., Yamada, R., Sekine, A., Nakamura, Y. and Koh, C.L. (2004) Inference from the relationships between linkage disequilibrium and allele frequency distributions of 240 candidate SNPs in 109 drug-related genes in four Asian populations. J. Hum. Genet., 49, 558–572.

  23. Létourneau, I.J., Deeley, R.G. and Cole, S.P. (2005) Functional characterization of non-synonymous single nucleotide polymorphisms in the gene encoding human multidrug resistance protein 1 (MRP1/ABCC1). Pharmacogenet. Genomics, 15, 647–657.

  24. Kent, W.J., Sugnet, C.W., Furey, T.S., Roskin, K.M., Pringle, T.H., Zahler, A.M. and Haussler, D. (2002) The human genome browser at UCSC. Genome Res., 12, 996–1006.

  25. Smit, A.F.A., Hubley, R. and Green, P. (1996–2004) RepeatMasker Open-3.0.

  26. Rozen, S. and Skaletsky, H. (2000) Primer3 on the WWW for general users and for biologist programmers. In Krawetz, S. and Misener, S. (eds), Methods in Molecular Biology. Humana Press, Totowa, NJ, pp. 365–386.

  27. Taillon-Miller, P., Piernot, E.E. and Kwok, P.Y. (1999) Efficient approach to unique single-nucleotide polymorphism discovery. Genome Res., 9, 499–505.

  28. Kwok, P.Y., Carlson, C., Yager, T.D., Ankener, W. and Nickerson, D.A. (1994) Comparative analysis of human DNA variations by fluorescence-based sequencing of PCR products. Genomics, 23, 138–144.

  29. Freimuth, R.R., Stormo, G.D. and McLeod, H.L. (2005) PolyMAPr: programs for polymorphism database mining, annotation, and functional analysis. Hum. Mutat., 25, 110–117.

  30. Ohnishi, Y., Tanaka, T., Yamada, R., Suematsu, K., Minami, M., Fujii, K., Hoki, N., Kodama, K., Nagata, S., Hayashi, T. et al. (2000) Identification of 187 single nucleotide polymorphisms (SNPs) among 41 candidate genes for ischemic heart disease in the Japanese population. Hum. Genet., 106, 288–292.

  31. Yamada, R., Tanaka, T., Ohnishi, Y., Suematsu, K., Minami, M., Seki, T., Yukioka, M., Maeda, A., Murata, N., Saiki, O. et al. (2000) Identification of 142 single nucleotide polymorphisms in 41 candidate genes for rheumatoid arthritis in the Japanese population. Hum. Genet., 106, 293–297.

  32. Buetow, K.H., Edmonson, M.N. and Cassidy, A.B. (1999) Reliable identification of large numbers of candidate SNPs from public EST data. Nat. Genet., 21, 323–325.

  33. Ramensky, V., Bork, P. and Sunyaev, S. (2002) Human non-synonymous SNPs: server and survey. Nucleic Acids Res., 30, 3894–3900.





