Human Molecular Genetics Advance Access originally published online on July 21, 2005
Human Molecular Genetics 2005 14(17):2481-2483; doi:10.1093/hmg/ddi251
COMMENTARY
Guidelines for association studies in Human Molecular Genetics
1Department of Psychiatry, 2Department of Human Genetics, UCLA Center for Neurobehavioral Genetics, Semel Institute for Neuroscience and Human Behavior, 695 Charles E. Young Drive South, Room 3506, Los Angeles, CA 90095-1761 and 3Department of Statistics, UCLA, 695 Charles E. Young Drive South, Los Angeles, CA 90095-088, USA
* To whom correspondence should be addressed. Tel: +310 7949571; Fax: +310 7949613; Email: nfreimer{at}mednet.ucla.edu
Introduction
The number of genetic association studies is growing rapidly, and this growth is likely to accelerate in the future. The correct interpretation of such studies has important implications for our understanding of disease causation, variability in drug response and drug side effects and the biology of populations. There is increasing awareness that the literature of association studies requires the application of stricter standards, to prevent the promulgation of false positive results. Although there is no single agreed-upon set of standards for such studies, we believe that journals must, through the manuscripts that they choose to publish, promote an improvement in the quality of association studies. Guidelines can help both authors and reviewers to make such improvements. The following guidelines should be followed for all future submissions to Human Molecular Genetics. The references provide background which may be useful for authors.
As the popularity of association studies increases, two difficulties in interpreting their results emerge. On the one hand, the large number of markers typed and the variety of phenotype definitions considered lead to a multiplicity of statistical tests, whose significance is meaningless unless appropriately corrected for multiple comparisons. On the other hand, interpretation of association studies depends in part on the degree of justification for the scope of the study: is the number of individuals typed sufficient for the identified goal? What led to the choice of the studied candidate genes? Is the evidence strong enough that we can really exclude consideration of the rest of the genome, or are the candidate gene studies more appropriately considered as a random subset of genes?
We have recently published a review article (1) in which we discussed these issues extensively and reviewed the consequences, for the interpretation of study results, of searching too much as well as searching too little. The present guidelines aim to help authors and reviewers ensure that the published paper is self-contained, providing all the details that readers need to interpret the results. In this spirit, we require the authors to justify their study design and to provide a context in which correction for multiple comparisons of their significance results can be carried out.
Acknowledging the need for authors to address these issues does not, however, translate into specifying one particular threshold of significance that studies must meet for publication. Two factors, at least, argue against making such a specification. On the one hand, there is no clear consensus on what statistical approach is most appropriate for the problem, and imposing one would be premature and reductive; on the other hand, we believe that a study may be informative, even if it fails to map the gene for a trait, provided that some standards in its design are met. With regard to the first point, there are different statistical definitions of global errors, and in different contexts one or the other may be appropriate. Traditionally, in linkage studies, the family-wise error rate has been controlled, and this strategy has proved adequate for mapping Mendelian diseases. In the context of complex diseases, and particularly for association studies, it may be appropriate to consider a less stringent definition of error rates, such as the false discovery rate. It goes without saying that different definitions of global errors translate into different cutoffs on the nominal P-values (2,3).
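As an illustration of how the choice of global error definition changes the nominal cutoff, the sketch below compares a Bonferroni cutoff (which controls the family-wise error rate) with the Benjamini-Hochberg step-up rule (one widely used procedure for controlling the false discovery rate). The choice of these two procedures, the function names and the example P-values are assumptions made for illustration; they are not prescribed by these guidelines.

```python
def bonferroni_cutoff(num_tests, alpha=0.05):
    """Nominal P-value cutoff that controls the family-wise error rate at alpha."""
    return alpha / num_tests


def benjamini_hochberg_cutoff(p_values, q=0.05):
    """Largest nominal P-value declared significant by the BH step-up rule.

    Returns 0.0 when no test passes.
    """
    m = len(p_values)
    cutoff = 0.0
    for rank, p in enumerate(sorted(p_values), start=1):
        # Step-up rule: keep the largest p that falls under its rank-based line.
        if p <= q * rank / m:
            cutoff = p
    return cutoff


# Hypothetical nominal P-values from ten association tests (illustration only).
example_p = [0.0004, 0.003, 0.009, 0.012, 0.021, 0.20, 0.35, 0.48, 0.71, 0.93]

fwer_cutoff = bonferroni_cutoff(len(example_p))
fdr_cutoff = benjamini_hochberg_cutoff(example_p)

num_fwer_hits = sum(p <= fwer_cutoff for p in example_p)
num_fdr_hits = sum(p <= fdr_cutoff for p in example_p)
```

On these made-up values the false discovery rate criterion admits more tests than the family-wise criterion, which is the sense in which it is less stringent.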
Bayesian procedures have inspired some of the most traditional significance cutoffs in statistical genetics (e.g. the lod score of 3) and may be particularly useful in the context of candidate gene studies. The identification of significant results in the Bayesian context and the treatment of multiple comparisons do not correspond to a universal P-value cutoff. Association tests involving different markers and different phenotypes may be dependent (due to correlation in the phenotypes or linkage disequilibrium between the markers), and that dependency can translate into an effective reduction of the number of multiple comparisons, with consequent variation of the appropriate significance cutoff. Unfortunately, we do not have an adequate model for dependence among association tests at different markers, for example, and an increase in power due to the consideration of dependence can be achieved only on a case-by-case basis using resampling techniques. This lack of a general model also makes it difficult to define a common significance threshold to be required in all studies.
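One common resampling approach can be sketched as follows: permute the case-control labels, record the maximum test statistic across all markers in each permutation, and compare each observed statistic with that null distribution. Because the genotypes are left intact, the dependence among markers is preserved automatically. This is a minimal sketch under stated assumptions: the function name, the data layout and the simple mean-difference statistic are all hypothetical, and a real analysis would normally use a standardized statistic so that markers with different allele frequencies are comparable.

```python
import random


def max_stat_adjusted_p(genotypes, labels, num_permutations=1000, seed=0):
    """Permutation-adjusted P-values based on the maximum statistic across markers.

    genotypes: one list per individual, holding allele counts (0, 1 or 2) per marker.
    labels: 0/1 control/case label for each individual.
    """
    rng = random.Random(seed)
    num_markers = len(genotypes[0])

    def stats(current_labels):
        # Absolute difference in mean allele count between cases and controls.
        n_cases = sum(current_labels)
        n_controls = len(current_labels) - n_cases
        result = []
        for j in range(num_markers):
            case_sum = sum(g[j] for g, y in zip(genotypes, current_labels) if y == 1)
            control_sum = sum(g[j] for g, y in zip(genotypes, current_labels) if y == 0)
            result.append(abs(case_sum / n_cases - control_sum / n_controls))
        return result

    observed = stats(labels)
    exceed = [0] * num_markers
    shuffled = list(labels)
    for _ in range(num_permutations):
        rng.shuffle(shuffled)  # break any genotype-phenotype link
        max_stat = max(stats(shuffled))
        for j in range(num_markers):
            if max_stat >= observed[j]:
                exceed[j] += 1
    # Add one to numerator and denominator so an estimate is never exactly zero.
    return [(count + 1) / (num_permutations + 1) for count in exceed]


# Hypothetical data: marker 0 differs between groups, marker 1 is uninformative.
toy_genotypes = [[2, 1]] * 10 + [[0, 1]] * 10
toy_labels = [1] * 10 + [0] * 10
toy_adjusted = max_stat_adjusted_p(toy_genotypes, toy_labels, num_permutations=200, seed=1)
```

The adjusted values account for all markers tested jointly, so they can be compared directly with the chosen global error level.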
The following guidelines are written with the view that there is a burden of proof that either the detected association is significant or the study was able to exclude association with the typed polymorphisms, and this burden of proof rests on the authors' shoulders. Although there are a variety of ways of addressing this problem, simply presenting nominal P-values is not one of them. The guidelines outline the expectations of the journal in this regard. We have tried to be as concise as possible, to provide a practical instrument that may be used as a checklist by authors and reviewers. The guidelines reflect the multiple options open to authors and require the usual scholarly contribution of reviewers in assessing the quality of the paper. It goes without saying that more attention should be paid to the spirit of the guidelines than the letter.
Candidate gene and genome-wide association studies face different statistical issues, so we consider the guidelines for these categories separately.
Candidate genes
For these studies, authors must provide explicit justification for the choice of the candidate gene. Use of the literature for such justification must include a rigorous accounting of the level of statistical evidence from prior studies, including both studies supportive of the candidate hypothesis and negative studies.
- (A) Authors must make a clear distinction between candidate genes for which there is previous statistical evidence of linkage or association with the specific disease under study and genes that are proposed solely on the basis of a specific biological hypothesis.
- (B) For genes for which there is prior statistical evidence of linkage or association to the trait in question, authors should report the P-values obtained in previous studies. In addition, they should (a) specify any differences between the phenotypes employed in prior cited studies and those employed in the current manuscript; (b) specify the populations employed in prior cited studies as well as the size of the samples employed and (c) provide details about the extent of the genomic region implicated in the prior cited studies as well as the location and allele frequencies of variants.
- (C) Overall, authors should provide a quantitative estimate (using the cited evidence) of the prior probability that the list of genes considered contains a relevant one. Procedures for such estimation are described in Wacholder et al. (4). This prior probability can be used in a Bayesian procedure to obtain a posterior probability of association (17,18). A Bayesian analysis that controls a measure of global error is, indeed, an option that authors may want to consider.
- (D) Authors should correct the P-values in their results for multiple comparisons due to the multiple genes studied, obtaining an adjusted P-value. They should provide a justification for the specific statistical method used for this correction. Furthermore, we encourage authors to present, in addition to P-values, an indication of the magnitude of the genetic effect and its associated standard error; this facilitates the combination of results across different studies.
- (E) In a conservative, least favorable scenario, it has been suggested that a significant P-value should be lower than 10⁻⁷ (1). Authors should use their arguments made in (B) and (C) to argue why a variant with a nominal association P-value higher than this level should be considered noteworthy.
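To show how a prior probability of the kind requested in the list above feeds into the interpretation of a significant result, the following sketch computes the false positive report probability in the form described by Wacholder et al. (4): the share of significant findings expected to be false, given the prior, the significance threshold and the power. The function name and the numerical inputs are illustrative assumptions, not values recommended by the journal.

```python
def false_positive_report_probability(prior, alpha, power):
    """Probability that a 'significant' finding is a false positive.

    prior: prior probability that the association is real.
    alpha: significance threshold (type I error rate) applied to the test.
    power: probability of reaching significance when the association is real.
    """
    false_pos = alpha * (1.0 - prior)  # null is true and the test rejects
    true_pos = power * prior           # association is real and the test detects it
    return false_pos / (false_pos + true_pos)


# Hypothetical scenarios: a weakly and a strongly supported candidate gene.
fprp_weak_prior = false_positive_report_probability(prior=0.01, alpha=0.05, power=0.8)
fprp_strong_prior = false_positive_report_probability(prior=0.25, alpha=0.05, power=0.8)

# One minus this quantity is the posterior probability that the association is real.
posterior_weak_prior = 1.0 - fprp_weak_prior
```

With the same nominal threshold and power, the weakly supported candidate is far more likely to be a false report, which is why the quantitative accounting of prior evidence matters.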
Genome screens
For genome-wide association studies, authors must indicate why they believe that the study design is appropriate. They must do the following.
- Justify the sample size employed, including estimation of the power to detect significant association results for given effect sizes; the basis for considering such effect sizes to be realistic must also be provided.
- Justify the marker density employed for the particular study population.
- Implement a procedure to correct for multiple comparisons. Either the family-wise error rate (FWER) or the false discovery rate (FDR) is an acceptable criterion, as are global error thresholds less stringent than 0.05. In addition, Bayesian procedures can be adopted. What is crucial is that authors provide an evaluation of the global error rate achieved when they have deemed an association to be significant. It may also be useful for authors to discuss the rationale for their choice of either Bayesian or frequentist interpretations of their results.
- Attempt to quantify the uncertainty associated with the estimated location of trait-associated variant(s). Although the methods for estimating such uncertainty are not as straightforward as the ones used to determine confidence intervals for lod scores in linkage analysis, several papers have suggested reasonable assumptions that authors can make in association studies (5–12).
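The sample size justification requested in the list above can be approached, in its simplest form, with a normal approximation to the power of an allelic case-control test. The sketch below assumes a two-sided comparison of allele frequencies with two alleles counted per individual; the function name, the approximation and the example frequencies are assumptions for illustration, and other designs or tests would need a different calculation.

```python
from statistics import NormalDist


def allelic_test_power(freq_cases, freq_controls, n_cases, n_controls, alpha):
    """Approximate power of a two-sided test comparing allele frequencies.

    Each individual contributes two alleles, so the counts are 2 * n per group.
    """
    std_normal = NormalDist()
    standard_error = (
        freq_cases * (1 - freq_cases) / (2 * n_cases)
        + freq_controls * (1 - freq_controls) / (2 * n_controls)
    ) ** 0.5
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = abs(freq_cases - freq_controls) / standard_error
    # Probability that the test statistic falls in either rejection tail.
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


# Hypothetical effect: allele frequency 0.35 in cases versus 0.30 in controls.
power_nominal = allelic_test_power(0.35, 0.30, 1000, 1000, alpha=0.05)
power_stringent = allelic_test_power(0.35, 0.30, 1000, 1000, alpha=1e-7)
```

Evaluating the same effect size at the nominal level and at the genome-wide level actually used makes explicit how much power is lost to the multiple comparisons correction.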
Conflict of Interest statement. None declared.
ACKNOWLEDGEMENTS
We thank Susan Service for helpful comments. The authors are supported by grants from NIH and NSF.
REFERENCES
1. Freimer, N. and Sabatti, C. (2004) The use of pedigree, sib-pair and association studies of common diseases for genetic mapping and epidemiology (review). Nat. Genet., 36(10), 1045–1051.
2. Sabatti, C., Service, S. and Freimer, N. (2003) False discovery rate in linkage and association genome screens for complex disorders. Genetics, 164, 829–833.
3. Storey, J.D. and Tibshirani, R. (2003) Statistical significance for genomewide studies. Proc. Natl Acad. Sci. USA, 100, 9440–9445.
4. Wacholder, S., Chanock, S., Garcia-Closas, M., El Ghormli, L. and Rothman, N. (2004) Assessing the probability that a positive report is false: an approach for molecular epidemiology studies. J. Natl Cancer Inst., 96, 434–442.
5. McPeek, M.S. and Strahs, A. (1999) Assessment of linkage disequilibrium by the decay of haplotype sharing, with application to fine-scale genetic mapping. Am. J. Hum. Genet., 65(3), 858–875.
6. Lazzeroni, L.C. (1998) Linkage disequilibrium and gene mapping: an empirical least-squares approach. Am. J. Hum. Genet., 62(1), 159–170.
7. Devlin, B., Risch, N. and Roeder, K. (1996) Disequilibrium mapping: composite likelihood for pairwise disequilibrium. Genomics, 36(1), 1–16.
8. Xiong, M. and Sun-Wei, G. (1997) Fine-scale genetic mapping based on linkage disequilibrium: theory and application. Am. J. Hum. Genet., 60, 1513–1531.
9. Kaplan, N.L., Hill, W.G. and Weir, B.S. (1995) Likelihood methods for locating disease genes in nonequilibrium populations. Am. J. Hum. Genet., 56(1), 18–32.
10. Morris, A.P., Whittaker, J.C. and Balding, D.J. (2000) Bayesian fine-scale mapping of disease loci, by hidden Markov models. Am. J. Hum. Genet., 67, 155–169.
11. Rannala, B. and Reeve, J.P. (2001) High-resolution multipoint linkage-disequilibrium mapping in the context of a human genome sequence. Am. J. Hum. Genet., 69, 159–178.
12. Liu, J., Sabatti, C., Teng, J., Keats, B. and Risch, N. (2001) Bayesian analysis of haplotypes for linkage disequilibrium mapping. Genome Res., 11, 1716–1724.
13. Colhoun, H.M., McKeigue, P.M. and Smith, G.D. (2003) Problems of reporting genetic associations with complex outcomes. Lancet, 361, 865–872.
14. Freedman, M.L. et al. (2004) Assessing the impact of population stratification on genetic association studies. Nat. Genet., 36, 388–393.
15. Helgason, A., Yngvadottir, B., Hrafnkelsson, B., Gulcher, J. and Stefansson, K. (2004) An Icelandic example of the impact of population structure on association studies. Nat. Genet., 37, 90–95.
16. Marchini, J., Cardon, L.R., Phillips, M.S. and Donnelly, P. (2004) The effects of human population structure on large genetic association studies. Nat. Genet., 36, 512–517.
17. Thomas, D.C. and Clayton, D.G. (2004) Betting odds and genetic associations. J. Natl Cancer Inst., 96, 421–423.
18. Vieland, V. (1998) Bayesian linkage analysis, or: how I learned to stop worrying and love the posterior probability of linkage. Am. J. Hum. Genet., 63, 947–954.