Human Molecular Genetics 2005 14(Review Issue 2):R157-R162; doi:10.1093/hmg/ddi273

© The Author 2005. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org

Gearing up for genome-wide gene-association studies

Martin Farrall¹,* and Andrew P. Morris²

¹Department of Cardiovascular Medicine, Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK and ²Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK

* To whom correspondence should be addressed at: Department of Cardiovascular Medicine, Wellcome Trust Centre for Human Genetics, Roosevelt Drive, Oxford, OX3 7BN, UK. Email: martin.farrall{at}well.ox.ac.uk

Received July 6, 2005; Accepted July 22, 2005


    ABSTRACT
 
One of the grand challenges of human genetics, the systematic mapping of susceptibility genes for complex diseases by gene association, is underway. High-throughput genotyping platforms have been developed; a comprehensive map of human genetic variation (HapMap) to guide efficient marker selection is imminent; and many researchers have assembled suitable cohorts of patients. Expectations are understandably high and it is timely to review the promise and pitfalls of this strategy.


    LESSONS FROM GENOME-WIDE LINKAGE SCREENS
 
Successes in positional cloning of Mendelian diseases, such as cystic fibrosis, have encouraged human geneticists to search for susceptibility genes that underlie common, multifactorial diseases. The development of ‘high-throughput’ microsatellite genotyping platforms made genome-wide linkage scans feasible in large cohorts of families. This experimental design, which had played a critical role in studies of monogenic diseases, was adopted by investigators, who assembled collections of densely affected families (the most popular specific design being affected sib pairs). Increasingly ambitious projects involving hundreds and, recently, thousands of families were launched, based on ‘power calculations’ that (with hindsight) naively assumed a limited number of medium-sized genetic effects. To maximize the chances of successful mapping, researchers were painstaking when applying clinical ascertainment criteria, as they wished to minimize etiological heterogeneity and, they hoped, to enrich genetic signals. Progress was fitful, with some notable successes (e.g. inflammatory bowel disease and NOD2, reviewed in 1), and it did not take long for researchers to realize that most complex disease genes did not dangle quite as close to the ground as they had hoped. Some despondency then arose as these studies were widely perceived to be failing to deliver reliable map locations for susceptibility genes for common diseases.

The most worrying phenomenon was lack of replication. Now, the concept of replication for human genetic studies is not as strict as might be attempted in other biological experiments; it means evaluating the evidence for linkage in a similarly ascertained cohort of families (in terms of phenotype and genetic ancestry), using a comparable marker map. Few if any cohorts of families are expressly ascertained and assembled to test previous linkage findings; therefore, differences in phenotype and ancestry between studies, which are conceivably important sources of heterogeneity, are inevitable. For instance, different genetic studies of hypertension might be based on different blood pressure thresholds, or different genetic studies of coronary heart disease might include slightly different clinical presentations (myocardial infarction versus acute coronary syndrome). However, of all the explanations that have been proposed for the general lack of replication, the most persuasive is that insufficient brute force had been applied to the problem: individual complex genetic effects are small and very substantial numbers of families are needed to detect them reliably. At the heart of this argument lie models of the genetic architecture of common diseases: how many genes, with what range of effect sizes, jointly determine susceptibility to the disease. Most researchers had at least implicitly rejected Fisher's infinitesimal model, as acceptance would make mapping attempts hopeless. Nowadays, the acceptance of an oligogenic model of complex disease is pervasive, with the expectation that a few genes of moderate effect and progressively more genes of small effect will be found (2).
It seems likely that many true positive findings identified in linkage screens were flukes: the power to detect them was low and the test statistics bumbled around the 5% significance threshold, with the consequence that the magnitude of the genetic effect was grossly overestimated (3). As replication is typically attempted in cohorts of comparable (or smaller) size to the original hypothesis-generating study, the chances of detecting these true susceptibility genes again are lower. Therefore, in practice, many researchers appear to have given up hope of formally replicating linkage results and keep their fingers crossed while they follow up tentatively linked regions with fine-mapping and linkage disequilibrium (LD) mapping studies.

Statistical geneticists became absorbed in a major problem concerning linkage data: establishing appropriate thresholds for declaring genome-wide significance. Significance was evaluated using conventional (frequentist) techniques by calculating P-values (the probability, if the null hypothesis is true, of obtaining a test statistic at least as extreme as that observed), allowing appropriately for multiple testing (dozens of linked markers on each chromosome). The distribution of lod scores in genome-wide screens has been examined analytically as well as by computer simulation techniques, and the reasonable correspondence between the two means that critical thresholds for claiming linkage are accepted without too much argument. However, it was recognized that stringent conventional statistical testing might not be optimal when multiple linked loci were present (4).


    DAWN OF LARGE-SCALE GENE-ASSOCIATION STUDIES
 
All this is tediously familiar and provides the prologue for a commentary by Risch and Merikangas (5), which changed the whole focus of the gene mapping community from linkage to gene association. This note was timely as it closely followed the association of non-synonymous variation in apolipoprotein E with Alzheimer's disease, which enthused the whole field (6). Gene-association studies have a long and productive history. Pioneering studies detected associations between blood groups and human diseases (e.g. ABO and gastric cancer and peptic ulceration) (7,8) and led to a number of well-known major histocompatibility complex associations (e.g. HLA-B27 and ankylosing spondylitis) (9). The development of various technologies to detect DNA polymorphisms stimulated the candidate gene-association approach, in which genes were selected for study on the basis of information regarding the biochemical pathways implicated in a disease; cloned genes were surveyed for polymorphisms, which were then genotyped in cohorts of patients and healthy controls. This approach led to some major insights (e.g. type 1 diabetes and the insulin gene) (10).

The original vision of Risch and Merikangas was of direct gene association with a huge number of markers (10⁶). Single nucleotide polymorphisms (SNPs) were known to occur with sufficient frequency in the human genome (one every few hundred nucleotides) to allow testing on this genomic scale by directly measuring the disease-causing variants themselves. At that time, genotyping costs made these genome-wide association studies impractical with the sample sizes required to detect the moderate effects (genotypic relative risk, GRR, in the range 1.2<GRR<1.5) we expect for complex diseases. However, it was suggested that improvements in the efficiency of the design could be made by careful SNP selection, taking advantage of the background patterns of LD. The results of a large number of empirical studies demonstrate that the extent of LD is extremely variable throughout the genome and, in some regions, extends over distances of hundreds of kilobases (reviewed in 11). Thus, considerable savings can be made by focussing genotyping efforts on a subset of so-called ‘tag SNPs’, which act as proxies for ‘nearby’ correlated variants without substantial loss in power.
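
As a rough illustration of the tag-SNP idea, the following sketch greedily selects tags from a matrix of pairwise r² values (a standard measure of LD between two SNPs). It is a minimal sketch, not the procedure used by the HapMap project or by any particular software: the function name select_tag_snps, the r² threshold of 0.8 and the toy matrix are all assumptions made for illustration.

```python
def select_tag_snps(r2, threshold=0.8):
    """Greedy tag-SNP selection from a symmetric matrix of pairwise r^2 values.

    At each step, pick the SNP that acts as proxy (r^2 >= threshold) for the
    largest number of SNPs not yet tagged, until every SNP is tagged.
    The threshold of 0.8 is an illustrative assumption.
    """
    n = len(r2)
    untagged = set(range(n))
    tags = []
    while untagged:
        def covered(i):
            # Untagged SNPs that SNP i would stand in for (including itself).
            return {j for j in untagged if i == j or r2[i][j] >= threshold}

        # Ties are broken in favour of the lowest SNP index.
        best = max(range(n), key=lambda i: (len(covered(i)), -i))
        tags.append(best)
        untagged -= covered(best)
    return tags


# Hypothetical r^2 values: SNPs 0-2 form one LD block, SNPs 3-4 another.
toy_r2 = [
    [1.00, 0.90, 0.85, 0.10, 0.00],
    [0.90, 1.00, 0.90, 0.10, 0.10],
    [0.85, 0.90, 1.00, 0.20, 0.10],
    [0.10, 0.10, 0.20, 1.00, 0.95],
    [0.00, 0.10, 0.10, 0.95, 1.00],
]
tags = select_tag_snps(toy_r2)
```

In this toy example, two tags suffice for five SNPs, which is the kind of saving in genotyping effort that the strategy aims for.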

Therefore, the International Haplotype Map (HapMap) project was conceived: a comprehensive view of the structure of LD throughout the genome in multiple populations, with a view to informing tag-SNP selection (12). The initial phase of the project genotyped at least one ‘common SNP’ every ~5 kb throughout the genome in four samples: 30 trios from the Yoruba people of Ibadan, Nigeria; 45 unrelated individuals from Tokyo, Japan; 45 unrelated individuals from Beijing, China; and 30 US trios with northern and western European ancestry, collected by the Centre d'Etude du Polymorphisme Humain (CEPH). At this density, much of the genome is covered by ‘blocks’ of correlated SNPs, although there are still gaps of low LD. To counteract this problem, phase II of the project, due for completion in October 2005, will increase the density of the map to one SNP every kilobase in each of the samples.

The completed HapMap will provide a unique data resource for tag-SNP selection and also for the interpretation of the results of gene-association studies. With detailed knowledge of the extent of LD around a disease-associated SNP, we can hope to refine the location of the disease-causing variant to a region containing correlated polymorphisms. This, coupled with recent improvements in the efficiency of high-throughput genotyping platforms (13), suggests that the future prospects for undertaking projects on such a scale are promising. Indeed, results from the first tranche of genome-scale gene-association studies (14,15) have already been reported.


    BUT HANG ON, GENE-ASSOCIATION STUDIES HAVE THEIR PROBLEMS TOO
 
Candidate gene-association studies have been very popular over the last 20 years, and concerns regarding the reliability of their findings have recently been expressed (16). In particular, positive results that stand the test of time by being consistently detected by other researchers are proving to be difficult to achieve (17). A typical pattern of results might start with an initial association detected in a modestly sized cohort. This is followed up by a number of modest studies, most showing positive association. Finally, much larger cohorts examine the association and a meta-analysis is done; the final estimate of genetic effect size is small and is either marginally significant or non-significant (the study of ACE and coronary heart disease risk provides a pertinent case study) (18). Of particular concern is the phenomenon of publication bias: studies that fail to detect an association tend to remain unpublished. We must hope that large, well-executed studies, which should have a major impact whatever their results, will be published, as such studies will have the greatest leverage in meta-analyses. As was the case in linkage scans, different cohorts are likely to vary in terms of phenotype, covariates and ancestry. Furthermore, the polymorphisms selected for genotyping may overlap in varying degrees across the studies, further complicating the combination of results. Finally, meta-analyses are usually carried out assuming a fixed effects model, but it seems that random effects models might be more appropriate given the potential sources of heterogeneity between cohorts.

Two basic experimental designs have been used in gene-association studies. Case–control (cross-sectional) designs predominate as they are particularly easy to implement. Epidemiologists usually prefer prospective (longitudinal) studies as they may be concerned about selective recall of ‘explanatory’ factors (life-events), and the ‘direction’ of causality, but this is obviously not an issue in the case of DNA markers. For case–control studies, samples of unrelated affected and unaffected individuals are ascertained from the study population. However, it has been proposed that we may gain extra power by sampling affected cases from the same family, provided that the data are appropriately analysed by taking account of the correlation between related individuals (19). Related cases are more likely to share a disease-causing variant in common than randomly selected cases and may have already been collected as part of a previous linkage study, reducing ascertainment burden and possibly genotyping costs.

Population stratification (admixture) is a potential nuisance for case–control genetic studies as it is well known to inflate the false-positive rate if not accounted for in the analysis. Researchers are aware of this, so they often make considerable efforts to ensure that both cases and controls are drawn from a population for which there is no evidence of recent (last few generations) influx of genes with differing ancestries (the difference usually being inter-continental). In practice, this is achieved by sampling from one ‘ethnic’ group that resides in a particular geographical region and enquiring as to the ancestry of the participant's grandparents (e.g. ‘white British’). As researchers become more sensitive to possible confounding from relatively small differences in ancestry, it seems likely that case–control collections will be stratified and analysed by matching on the basis of regional ancestry (e.g. ‘Yorkshire born and bred’). Hopefully, such efforts to reduce the effects of admixture will prove useful, but taking this to the extreme (cases and controls drawn from a sub-population with few founders) is likely to increase the problem of cryptic relatedness (20). Ignoring this non-independence will underestimate the variance of the gene-association test statistics, thereby overestimating the significance of any association. Both admixture and cryptic relatedness can be controlled for using information from a series of genomic control markers (21), although concerns remain as to the efficiency of this strategy (22).
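
The variance inflation described above can be made concrete with a minimal sketch of one common form of genomic control: an inflation factor is estimated from the median of 1 d.f. chi-square statistics at markers presumed to be unassociated, and candidate statistics are divided by it. The function names and the five null-marker statistics are hypothetical, and real applications would use many more markers.

```python
import statistics

# Median of the chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549


def genomic_control_lambda(null_chi2):
    """Estimate the inflation factor from statistics at presumed-null markers."""
    lam = statistics.median(null_chi2) / CHI2_1DF_MEDIAN
    # By convention the statistics are never inflated, so lambda is floored at 1.
    return max(lam, 1.0)


def corrected_chi2(chi2, lam):
    """Deflate an association statistic by the estimated inflation factor."""
    return chi2 / lam


# Hypothetical 1 d.f. chi-square statistics at five genomic control markers.
null_chi2 = [0.1, 0.3, 0.6, 0.9, 2.0]
lam = genomic_control_lambda(null_chi2)
candidate = corrected_chi2(10.0, lam)  # a candidate-gene statistic of 10.0, deflated
```

Using the median rather than the mean keeps the estimate robust to a few genuinely associated markers among the controls.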

Family-based designs, and particularly parent–child trios, provide an alternative to case–control studies. They circumvent the problem of stratification by focussing on transmissions from parents to affected children, providing a matched control chromosome for each putative ‘disease’ chromosome. However, the costs of this approach are two-fold: first, three individuals are genotyped to track two case and two control chromosomes, and secondly, complete families will inevitably be more difficult to identify and recruit. Indeed, for late-onset disorders, the strategy would seem hopeless, although it is possible to screen large populations to identify useful numbers of such families (23). Alternative designs focus on sibships, comparing differences in allele frequency between affected and unaffected sibs.
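
The text above does not name a specific test for trio data. One widely used statistic is the transmission/disequilibrium test (TDT), a McNemar-type comparison of how often heterozygous parents transmit, versus fail to transmit, a given allele to an affected child. The sketch below assumes that choice; the function name and the counts are hypothetical.

```python
import math


def tdt(transmitted, untransmitted):
    """Transmission/disequilibrium test from heterozygous-parent counts.

    transmitted:   times the allele was passed to an affected child
    untransmitted: times the allele was not passed on
    Returns the 1 d.f. chi-square statistic and its P-value.
    """
    b, c = transmitted, untransmitted
    chi2 = (b - c) ** 2 / (b + c)
    # Upper tail of chi-square with 1 d.f.: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts: 60 transmissions and 40 non-transmissions.
chi2, p_value = tdt(60, 40)
```

Because each transmitted allele is compared with the untransmitted allele of the same parent, the comparison is matched for ancestry by construction.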

Finally, care is always taken to clinically evaluate and phenotype the cases. However, it may be difficult to rigorously exclude relevant pathologies from the controls. For instance, in a study of myocardial infarction (MI) risk, cases are (relatively) simple to identify on the basis of a constellation of clinical and biochemical symptoms and signs. Information on controls may be limited to self-reported lack of symptoms, but atherosclerosis, which underlies MI risk, is endemic, so all controls will carry at least some atheromatous plaques. Diagnostic screening with invasive procedures (coronary angiography) cannot be used to screen healthy controls for cryptic disease. Therefore, such case–control studies will inevitably include misclassified controls, which will presumably reduce the power to detect true associations.


    MAKING SENSE OF THE RESULTS FROM LARGE-SCALE GENE-ASSOCIATION STUDIES
 
Irrespective of the design, case–control or family based, the investigator will eventually be faced with a huge number of test statistics to muse over when the large-scale genotyping experiment is completed. This scale of problem has been encountered by molecular biologists using microarray techniques for expression profiling, and similar statistical approaches are being considered for analysing gene-association experiments.

To illustrate how these statistical approaches might be applied, it can be useful to explore a simplified hypothetical scenario, so we now consider in detail the results of a genome-wide experiment in which statistics are computed on a gene-by-gene basis. Genotyping platforms are now available to analyse upwards of 500K SNPs, so the evidence for gene association across multiple SNPs might be well summarized using haplotype-based statistics. We can then imagine that the results from the genome-wide screen could be presented as a simple list of P-values (for each gene, the probability of obtaining a test statistic at least as extreme as that observed if the null hypothesis is true) for upwards of 25 000 genes. A major problem then is how best to evaluate our results taking this large number of tests into account. Now, the degree to which statistics from adjacent genes will be correlated due to long-range LD will depend on the ancestry of the population under study, but for simplicity, we will treat the results from each gene as being independent of each other; permutation techniques could be used to formally allow for inter-genic statistical correlation. Then it is simple to account for the inflation in type 1 error due to multiple testing by applying Bonferroni's correction (or the slightly less conservative Sidak's correction). If a total of 25 000 genes have been examined, then a significance level of 2 × 10⁻⁶ (0.05/25 000) would be interpreted as corresponding to a 5% experiment-wide significance level, providing a conventional interpretation of the results along the same lines as was done for genome-wide linkage studies.
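
The per-gene thresholds implied by the two corrections can be checked with a few lines of code. This is a minimal sketch assuming independent tests, as in the simplified scenario above; the function names are ours.

```python
import math


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level under Bonferroni's correction."""
    return alpha / n_tests


def sidak_threshold(alpha, n_tests):
    """Per-test significance level under Sidak's correction.

    Equivalent to 1 - (1 - alpha) ** (1 / n_tests), written with
    log1p/expm1 to avoid rounding error when n_tests is large.
    """
    return -math.expm1(math.log1p(-alpha) / n_tests)


n_genes = 25_000
bonf = bonferroni_threshold(0.05, n_genes)
sidak = sidak_threshold(0.05, n_genes)
```

The Sidak threshold comes out only a few per cent larger than the Bonferroni one here, which is why the two corrections are nearly interchangeable at this scale.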

At this level of stringency, genes of moderate effect size (genotypic relative risk, GRR = 1.4) would be detectable (85% power) in studies of 1000 cases and 1000 controls under favourable circumstances (e.g. common disease variant). It is, of course, an open question as to how many susceptibility genes of at least this magnitude exist for a given complex disease, but the oligogenic model with an exponential distribution of gene effects predicts that there will be a greater number of genes with smaller effects (1.2<GRR<1.4). Such genes would clearly be unreliably mapped unless the sample sizes are dramatically increased (~4K cases and 4K controls for 85% power). However, if there are multiple genes of this magnitude (say 20 genes with 1.2<GRR<1.4), then, although each has only a modest power (at least 4%) of being detected in a 1K+1K case–control study, there is an excellent chance that one or more will be detected. Consequently, it seems inevitable that susceptibility genes with ‘significant’ associations will include true positives for small-effect genes that will be difficult to replicate.
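
The chance that at least one of several small-effect genes is detected follows from the complement rule. The sketch below assumes that the genes are detected independently and that each has the same power; both are simplifications, and the 4% figure is the lower bound quoted above.

```python
def prob_at_least_one(power_per_gene, n_genes):
    """Probability that one or more of n_genes is detected.

    Assumes independent detection with equal power for every gene.
    """
    return 1.0 - (1.0 - power_per_gene) ** n_genes


# Lower bound from the text: 20 genes, each with 4% power.
p_any_low = prob_at_least_one(0.04, 20)

# Hypothetical higher per-gene power, for comparison.
p_any_higher = prob_at_least_one(0.10, 20)
```

With exactly 4% power per gene, the chance of at least one detection is a little over one half, and it rises quickly as the per-gene power increases.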

Investigators analysing microarray experiments to study expression patterns of tens of thousands of genes found that only a handful of genes surpassed the stringent thresholds in a conventional family-wise error rate (FWER) approach to analysis. However, it was expected that many more genes were likely to show important differential expression patterns. They have considered alternative approaches to deal with the multiple testing problem efficiently without being overly stringent. In particular, the ‘false discovery rate’ (FDR) method (24) has received considerable attention; this technique estimates the expected proportion of null results among significant results and is appropriate for situations in which multiple ‘hits’ are expected in large-scale experiments. For instance, consider the findings of Ozaki et al. (14), who tested 65671 SNPs for association with MI. Under a recessive model, 134 SNPs were associated at the P=0.001 level. The FDR is calculated as min(number of tests × nominal P-value / rank, 1), here min(65671 × 0.001/134, 1) = 0.49; this means that approximately half of these positively associated SNPs are expected to be truly null (i.e. unassociated).
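
The FDR arithmetic for the Ozaki et al. example, together with a minimal sketch of the Benjamini–Hochberg step-up procedure (24), is shown below. The function names and the toy list of P-values are ours, and the sketch assumes independent tests.

```python
def fdr_at_rank(n_tests, p_value, rank):
    """Estimated FDR when the rank-th smallest P-value equals p_value."""
    return min(n_tests * p_value / rank, 1.0)


def benjamini_hochberg(p_values, q=0.05):
    """Indices of tests declared significant by the step-up procedure at level q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, i in enumerate(order, start=1):
        # Keep the largest rank whose P-value falls under its own threshold.
        if p_values[i] <= q * rank / m:
            cutoff_rank = rank
    return sorted(order[:cutoff_rank])


# 65671 SNPs tested, 134 of them reaching P = 0.001.
ozaki_fdr = fdr_at_rank(65671, 0.001, 134)

# Hypothetical P-values for four tests.
significant = benjamini_hochberg([0.01, 0.02, 0.03, 0.5], q=0.05)
```

Unlike a family-wise correction, the threshold applied to each P-value relaxes with its rank, so several moderately small P-values can support one another.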

For another example of the application of these statistical methods, we review the results of a study of 25 candidate genes (25) in which the results of a gene-by-gene statistical test are ranked by their significance (Table 1). Using Sidak's method to adjust the type 1 error to allow for multiple testing, only the first ranked gene (F5) is significant [P-value = 1 − (1 − 0.001)²⁵ = 0.025]. The FDR interpretation of these results is that of the three genes (F5, OPRM1 and IL1RN) that show significant association (P<0.05), 23% are expected to be truly null (i.e. unassociated). Therefore, the conventional interpretation of these results would be that just one gene, F5, is significantly associated with disease; in contrast, the FDR analysis suggests that at least two of the three most associated genes are likely to be true positives.


Table 1. Frequentist and Bayesian analysis of candidate-gene data
 
Both these interpretations follow the convention that the investigator takes an objective standpoint and has no reason to favour one gene over another based on prior knowledge. The alternative approach of applying Bayesian methods has the potential to synthesize an interpretation that combines the present information on gene association with prior information (16,26). This information can take on various forms, qualities and reliabilities. For instance, candidate genes have usually been proposed on the basis of pathological links between biochemical pathways and disease (e.g. the renin–angiotensin system and hypertension). The validity of the link might be clear, but that does not necessarily mean that genetic variation in the pathway affects susceptibility. Additional sources of information include published gene association or linkage data, studies of genetic variation in animal models or differential expression data in diseased and healthy tissues.

This all seems sensible and straightforward as it mirrors the way that experts weigh diverse evidence, but quantification of the evidence is likely to be contentious. A Bayesian analysis of the results shown in Table 1 is included for three levels of prior confidence in the candidature of the genes. A prior probability of 0.04 (1/25) might correspond to an exhaustive list of 25 candidate genes, where we expect exactly one to be a true positive. If these 25 candidates were the first of 100 genes to be tested, a prior probability of 0.01 would be more appropriate. However, if we believed that a single gene across the whole genome would be truly associated with the trait, and the 25 candidates tested were no more likely to be associated, a priori, a prior probability of 1/25 000 might be more reasonable. It is evident that the highest ranked gene shows a noteworthy result in that the posterior probability of being associated is >95% provided that the prior probability for these genes being associated is high (0.04).
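
The text does not spell out how the posterior probabilities in Table 1 were computed. The sketch below therefore assumes the generic odds form of Bayes' theorem (posterior odds = prior odds × Bayes factor) and asks how strong the evidence would need to be to reach a 95% posterior probability under each of the three priors. The function names are ours, and the Bayes factors are illustrative rather than values taken from the study.

```python
def posterior_probability(prior, bayes_factor):
    """Posterior probability of association from a prior and a Bayes factor."""
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * bayes_factor
    return posterior_odds / (1.0 + posterior_odds)


def bayes_factor_needed(prior, target_posterior=0.95):
    """Bayes factor required to lift the prior to the target posterior."""
    target_odds = target_posterior / (1.0 - target_posterior)
    prior_odds = prior / (1.0 - prior)
    return target_odds / prior_odds


# The three priors discussed in the text.
priors = {"1 in 25": 0.04, "1 in 100": 0.01, "1 in 25 000": 1 / 25000}
needed = {label: bayes_factor_needed(p) for label, p in priors.items()}
```

Under these assumptions the required Bayes factor grows from a few hundred for a prior of 0.04 to several hundred thousand for a prior of 1/25 000, which shows how strongly the conclusion depends on the prior.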

Therefore, we have seen how the same data can be interpreted in a conventional frequentist test that corrects for multiple testing, by quantifying the FDR and by calculating the posterior probability of association. Each approach has strengths and weaknesses. For instance, if 50 (instead of 25) candidate genes had been studied, then the conventional and FDR interpretations would alter, but the Bayesian interpretation would be unchanged. In contrast, in the future, other data relevant to these genes may become available which might make you want to revise the prior probability of an association which could change your interpretation.

The advantages that can be gained by modelling the interplay between genes are a matter of considerable current interest and debate (27,28). Treating each gene as an independent entity does not fit well with an oligogenic model of complex disease and ignores the increasing body of evidence from model organisms for the role of epistasis in genetic associations (29–32). Joint analysis of multiple genes will take account of the correlation between their effects on disease association. This approach may help to identify associations that would otherwise be masked by strong marginal effects of other genes (33,34). Evaluating all possible pairs of genes will dramatically increase the number of tests performed. However, even with the increased burden of multiple testing, such an approach will be more powerful than single-gene analyses for several models of epistasis when the marginal effects are of the order we expect for complex diseases (35). There are, of course, standard statistical methods for selecting significant risk factors, and interactions between them, from a list of potential covariates. However, it is not clear how well these model selection techniques will perform with upwards of 25 000 candidate genes, particularly if we wish to take account of epistasis and interaction with environmental risk factors. These techniques tend to ‘over-fit’ the data and do not take account of uncertainty in the underlying genetic model, potentially inflating the false-positive error rate. Bayesian model averaging methods (36) and Bayesian networks (37) allow for this and have the potential to incorporate prior information on the number of genes involved and the size of genetic effects. We might also want to inform the prior probability of epistasis between genes in the same or different underlying biological pathways.
Although these approaches show some promise, they (and others reviewed in 27) have yet to be rigorously evaluated, so their respective abilities to glean further insights into the genetic architecture of complex disease are unclear.
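
To give a sense of the multiple testing burden when all pairs of genes are evaluated, the following sketch counts the pairwise tests for 25 000 genes and applies a Bonferroni correction at a 5% experiment-wide level. Treating the pairwise tests as independent is a simplifying assumption.

```python
import math

n_genes = 25_000

# Number of distinct gene pairs: n * (n - 1) / 2.
n_pairs = math.comb(n_genes, 2)

# Bonferroni threshold for a 5% experiment-wide level over all pairs.
pairwise_threshold = 0.05 / n_pairs
```

The threshold is roughly four orders of magnitude more stringent than the single-gene value, which is the penalty that any gain in power from modelling epistasis has to overcome.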


    FINAL REMARKS
 
In common with other new technologies and experimental approaches, genome-wide gene-association studies have had a great burden of expectation imposed on them. It now seems technically feasible to generate billions of genotypes to estimate genetic risks across the genome, and statistical geneticists will apply increasingly complex methods of analysis to extract the maximum information from these data (e.g. 35). Researchers are putting plans in place for longitudinal studies (e.g. BIOBANK, www.ukbiobank.ac.uk) to complement the case–control design which dominates the field for the time being. Experience suggests that it will take some considerable time and repeated study to confirm positive findings with confidence. The promise of genetic strategies is to reveal unexpected links between genes, biological pathways and disease, which could lead to novel therapeutic interventions. This is such a valuable objective that these expensive experiments must surely run their full course.


    ACKNOWLEDGEMENTS
 
M.F. thanks the British Heart Foundation, the Medical Research Council and the Wellcome Trust for supporting this work, and A.P.M. thanks the Leverhulme Trust and the Wellcome Trust for support.

Conflict of Interest statement. None declared.


    REFERENCES
 

  1. Mathew, C.G. and Lewis, C.M. (2004) Genetics of inflammatory bowel disease: progress and prospects. Hum. Mol. Genet., 13, R161–R168.[Abstract/Free Full Text]

  2. Farrall, M. (2004) Quantitative genetic variation: a post-modern view. Hum. Mol. Genet., 13, R1–R7.[Abstract/Free Full Text]

  3. Goring, H.H., Terwilliger, J.D. and Blangero, J. (2001) Large upward bias in estimation of locus-specific effects from genomewide scans. Am. J. Hum. Genet., 69, 1357–1369.[CrossRef][Web of Science][Medline]

  4. Wiltshire, S., Cardon, L.R. and McCarthy, M.I. (2002) Evaluating the results of genomewide linkage scans of complex traits by locus counting. Am. J. Hum. Genet., 71, 1175–1182.[CrossRef][Web of Science][Medline]

  5. Risch, N. and Merikangas, K. (1996) The future of genetic studies of complex human diseases. Science, 273, 1516–1517.[Abstract/Free Full Text]

  6. Corder, E.H., Saunders, A.M., Strittmatter, W.J., Schmechel, D.E., Gaskell, P.C., Small, G.W., Roses, A.D., Haines, J.L. and Pericak-Vance, M.A. (1993) Gene dose of apolipoprotein E type 4 allele and the risk of Alzheimer's disease in late onset families. Science, 261, 921–923.[Abstract/Free Full Text]

  7. Aird, I., Bentall, H.H. and Roberts, J.A. (1953) A relationship between cancer of stomach and the ABO blood groups. Br. Med. J., 4814, 799–801.

  8. Aird, I., Bentall, H.H., Mehigan, J.A. and Roberts, J.A. (1954) The blood groups in relation to peptic ulceration and carcinoma of colon, rectum, breast, and bronchus; an association between the ABO groups and peptic ulceration. Br. Med. J., 4883, 315–321.

  9. Schlosstein, L., Terasaki, P.I., Bluestone, R. and Pearson, C.M. (1973) High association of an HL-A antigen, W27, with ankylosing spondylitis. N. Engl. J. Med., 288, 704–706.

  10. Bell, G.I., Horita, S. and Karam, J.H. (1984) A polymorphic locus near the human insulin gene is associated with insulin-dependent diabetes mellitus. Diabetes, 33, 176–183.[Abstract]

  11. Ardlie, K.G., Kruglyak, L. and Seielstad, M. (2002) Patterns of linkage disequilibrium in the human genome. Nat. Rev. Genet., 3, 299–309.[CrossRef][Web of Science][Medline]

  12. International HapMap Consortium. (2003) The international HapMap project. Nature, 26, 789–795.

  13. Gewin, V. (2005) Array of possibilities opens up in genotyping. Nature, 435, 1159.[Medline]

  14. Ozaki, K., Ohnishi, Y., Iida, A., Sekine, A., Yamada, R., Tsunoda, T., Sato, H., Sato, H., Hori, M., Nakamura, Y. and Tanaka, T. (2002) Functional SNPs in the lymphotoxin-alpha gene that are associated with susceptibility to myocardial infarction. Nat. Genet., 32, 650–654.[CrossRef][Web of Science][Medline]

  15. Klein, R.J., Zeiss, C., Chew, E.Y., Tsai, J.Y., Sackler, R.S., Haynes, C., Henning, A.K., Sangiovanni, J.P., Mane, S.M., Mayne, S.T. et al. (2005) Complement factor H polymorphism in age-related macular degeneration. Science, 308, 385–389.[Abstract/Free Full Text]

  16. Colhoun, H.M., McKeigue, P.M. and Davey Smith, G. (2003) Problems of reporting genetic associations with complex outcomes. Lancet, 361, 865–872.

  17. Ioannidis, J.P., Ntzani, E.E., Trikalinos, T.A. and Contopoulos-Ioannidis, D.G. (2001) Replication validity of genetic association studies. Nat. Genet., 29, 306–309.

  18. Keavney, B., McKenzie, C., Parish, S., Palmer, A., Clark, S., Youngman, L., Delepine, M., Lathrop, M., Peto, R. and Collins, R. (2000) Large-scale test of hypothesised associations between the angiotensin-converting-enzyme insertion/deletion polymorphism and myocardial infarction in about 5000 cases and 6000 controls. International Studies of Infarct Survival (ISIS) Collaborators. Lancet, 355, 434–442.

  19. Risch, N. and Teng, J. (1998) The relative power of family-based and case-control designs for linkage disequilibrium studies of complex human diseases: I. DNA pooling. Genome Res., 8, 1273–1288.

  20. Bacanu, S.A., Devlin, B. and Roeder, K. (2000) The power of genomic control. Am. J. Hum. Genet., 66, 1933–1944.

  21. Devlin, B., Bacanu, S.A. and Roeder, K. (2004) Genomic control to the extreme. Nat. Genet., 36, 1129–1130.

  22. Marchini, J., Cardon, L.R., Phillips, M.S. and Donnelly, P. (2004) The effects of human population structure on large genetic association studies. Nat. Genet., 36, 512–517.

  23. PROCARDIS Consortium. (2004) A trio family study showing association of the lymphotoxin-alpha N26 (804A) allele with coronary artery disease. Eur. J. Hum. Genet., 12, 770–774.

  24. Benjamini, Y. and Hochberg, Y. (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. B, 57, 289–300.

  25. Hao, K., Wang, X., Niu, T., Xu, X., Li, A., Chang, W., Wang, L., Li, G., Laird, N. and Xu, X. (2004) A candidate gene association study on preterm delivery: application of high-throughput genotyping technology and advanced statistical methods. Hum. Mol. Genet., 13, 683–691.

  26. Wacholder, S., Chanock, S., Garcia-Closas, M., El Ghormli, L. and Rothman, N. (2004) Assessing the probability that a positive report is false: an approach for molecular epidemiology studies. J. Natl Cancer Inst., 96, 434–442.

  27. Hoh, J. and Ott, J. (2003) Mathematical multi-locus approaches to localizing complex human trait genes. Nat. Rev. Genet., 4, 701–709.

  28. Wang, W.Y., Barratt, B.J., Clayton, D.G. and Todd, J.A. (2005) Genome-wide association studies: theoretical and practical concerns. Nat. Rev. Genet., 6, 109–118.

  29. Routman, E.J. and Cheverud, J.M. (1997) Gene effects on a quantitative trait: two-locus epistatic effects measured at microsatellite markers and at estimated QTL. Evolution, 51, 1654–1662.

  30. Mackay, T.F. (2001) Quantitative trait loci in Drosophila. Nat. Rev. Genet., 2, 11–20.

  31. Williams, S.M., Haines, J.L. and Moore, J.H. (2004) The use of animal models in the study of complex disease: all else is never equal or why do so many human studies fail to replicate animal findings. Bioessays, 26, 170–179.

  32. Segre, D., Deluna, A., Church, G.M. and Kishony, R. (2005) Modular epistasis in yeast metabolism. Nat. Genet., 37, 77–83.

  33. Cordell, H.J. and Clayton, D.G. (2002) A unified stepwise regression approach for evaluating the relative effects of polymorphisms within a gene using case/control or family data: application to HLA in type 1 diabetes. Am. J. Hum. Genet., 70, 124–141.

  34. Simmonds, M.J., Howson, J.M., Heward, J.M., Cordell, H.J., Foxall, H., Carr-Smith, J., Gibson, S.M., Walker, N., Tomer, Y., Franklyn, J.A. et al. (2005) Regression mapping of association between the human leukocyte antigen region and Graves disease. Am. J. Hum. Genet., 76, 157–163.

  35. Marchini, J., Donnelly, P. and Cardon, L.R. (2005) Genome-wide strategies for detecting multiple loci that influence complex diseases. Nat. Genet., 37, 413–417.

  36. Raftery, A.E. (1996) Approximate Bayes factors and accounting for model uncertainty in generalized linear models. Biometrika, 83, 251–266.

  37. Sebastiani, P., Ramoni, M.F., Nolan, V., Baldwin, C.T. and Steinberg, M.H. (2005) Genetic dissection and prognostic modeling of overt stroke in sickle cell anemia. Nat. Genet., 37, 435–440.

