Human Molecular Genetics, 2001, Vol. 10, No. 20 2209-2214
© 2001 Oxford University Press

Sequence variation and disease in the wake of the draft human genome

Leo Goodstadt and Chris P. Ponting+

MRC Functional Genetics Unit, Department of Human Anatomy and Genetics, University of Oxford, South Parks Road, Oxford OX1 3QX, UK

Received July 1, 2001; Accepted July 17, 2001.


    ABSTRACT
 TOP
 ABSTRACT
 GENOME ANNOTATION RESOURCES AND...
 AMINO ACID ALLELIC VARIANTS
 FINDING HOMOLOGUES
 BESTROPHIN: A WORKED EXAMPLE...
 FUTURE DIRECTIONS
 REFERENCES
 
The sequencing phase of the human genome project will soon be over. In its wake, repertoires of sequence polymorphisms among the human population are being sampled and a battery of functional genomics projects, from gene and protein expression studies to whole proteome interaction experiments, are generating vast quantities of data. Now that the data, or the means to generate data, are available it is the application of this information in enhancing our understanding of biology that represents the next formidable challenge. Two prominent issues should be considered. First, existing data must be analysed using the best methods available. The prediction of enzymatic activity for bestrophin, whose gene is mutated in Best macular dystrophy, is described in this review. This is an example of the experimentally testable hypotheses that can result from such detailed and exhaustive analyses. Secondly, the torrents of data from high-throughput studies will need to be made more accessible to all using web-based resources that integrate and digest complementary data types. The internet sites that showcase the human genome sequence are blazing a new trail. Ultimately, the success of genome sequencing and functional genomics will be measured not by the quantity and accuracy of raw data generated, but by how rapidly they can be harnessed to span the divide between genotype and phenotype.

Inherited variations in the human genome provide a basis for phenotypic differences. Most of these are neutral and have little effect on an individual’s health. Single nucleotide polymorphisms (SNPs) are single base pair variants with allelic frequency values of at least 1%. These represent 90% of all polymorphisms in the human population. Of the more than 1.4 million SNPs identified in the draft human genome (1), only 60 000 are estimated to cause amino acid substitutions (1) (http://snp.cshl.org/). Of the estimated 1000 detrimental polymorphisms predicted in each individual (2), most will contribute to complex polygenic traits rather than being directly responsible for single gene disorders. In the human population, only approximately 1000 genes are known to be associated with Mendelian inheritable diseases (3) (so-called ‘disease genes’).

The recent availability of the human genome draft sequence (4,5) has already allowed a dramatic acceleration in disease gene discovery. Previously, this involved positional cloning after the use of genetic markers in linkage disequilibrium and association studies. The growing availability of high density cytogenetic SNP linkage maps now allows the loci of rare Mendelian disorders and even some complex traits with a polygenic basis to be identified.

A further revolution is needed, however, if these breakthroughs are to usher in a new age of ‘genetic medicine’ (6) in which (i) the genetic basis of all common heritable diseases, traits or predispositions can be identified; (ii) the genetic makeup of each individual can contribute to clinical diagnoses and prognoses; (iii) the heterogeneous origins of diseases in different patients can be unravelled even when they share similar symptoms; and (iv) the resulting treatments can be adjusted to match the pharmacogenetic profiles of patient and drugs. Some of these goals are within reach and may be achievable within the next 10 years; for example, by simply extending the use of association studies with SNP linkage disequilibrium profiles without actually identifying the allelic culprits (7).

Efforts to exploit genetic information in tackling diseases can be divided into two broad approaches: ‘discovery genetics’ and ‘discovery genomics’ (7) or, in approximate terms, ‘diseases in search of genes’ and ‘genes in search of a disease’. The great advantage of the former, in proceeding from known disorders, is that, by definition, any disease genes identified will be of immediate relevance to disease diagnosis and often treatment. Already, a growing proportion of the more easily tackled common Mendelian disorders with distinct phenotypes and large families have been, or are being, addressed. Many of the remaining polygenic traits, however, are not easy targets for linkage analysis or high-throughput screening. More worrying still, there is a danger that discovery genetics may miss many potential therapeutic targets which are themselves not disease genes.

Paralogues of known human disease genes represent additions to the standard gene targets selected by discovery genetics. Paralogues are homologous genes that arose from intra-genome duplications. Some human paralogous genes have been found to be mutated in similar diseases. For example, discovery of polycystin-2 was greatly facilitated by prior identification of polycystin-1; both of these genes are mutated in autosomal dominant polycystic kidney disease (8). The availability of the human genome draft sequence allowed an initial investigation into whether novel disease gene paralogues can be identified (4). The vast majority of the 286 candidates, however, appear to represent pseudogenes or to have arisen due to the error-prone nature of the initial draft genome sequence (C.Ponting, unpublished data). For example, novel paralogues of γ- and δ-sarcoglycans, which are mutated in human limb-girdle muscular dystrophies (9), and a dystrophin-related protein were predicted on chromosomes 8p22 and 2q34, respectively (unpublished data). However, attempts to identify these gene products using human cDNA libraries have not been successful (S.Phelps and D.Powell, unpublished data) probably indicating that they both represent pseudogenes.

‘Discovery genomics’ has the potential to deliver a greater range of gene targets for both disease diagnosis and therapeutics and yet poses a far more exacting challenge. Not only must the cellular roles of genes be accurately predicted but variations in molecular function and dysfunction must somehow be correlated with pathologies of entire systems, linking phenomena at vastly different scales. Even when the process of gene identification is successful, there is little guarantee that the disease in question will be a matter of priority and real significance for public health (10). The paucity of studies that have identified likely disease genes ab initio is an illustration of the many difficulties that remain to be overcome. One rare example of discovery genomics in action is the prediction of a link between high levels of iron found in dopaminergic neurons of patients suffering from neurodegenerative diseases and proteins believed to contain one or both of putative ferric reductase and catecholamine-binding domains (11). This hypothesis has yet to be tested empirically.

It is clear that the crucial task of predicting the molecular function and cellular role of genetic sequences can only be achieved by taking into account all available information from homology to gene expression patterns to analogies with previously described genotype–phenotype relationships. This requires not only that high-throughput data representing gene expression, tissue expression, protein localization and binding partner information be widely available but also that all the various data and results of different analyses be comprehensively cross-linked and integrated. It is only through such a confluence of experimental information that biological knowledge can be teased out of raw sequence data.


    GENOME ANNOTATION RESOURCES AND SEQUENCE VARIATION
 
Free and direct access to the human genome sequence and its annotation is provided by Ensembl (http://www.ensembl.org/; see also http://www.ensembl.org/genome/central/), by the National Center for Biotechnology Information (NCBI; http://www.ncbi.nlm.nih.gov/genome/guide/human/) and by the University of California at Santa Cruz (http://genome.cse.ucsc.edu/goldenPath/hgTracks.html). The greatest value of these web sites stems from their provision of different bioinformatics data cross-referenced and mapped directly onto the human genome sequence. Thus, specific regions of chromosomes may be viewed in the context of other vertebrate sequences, predicted genes and serial analysis of gene expression (SAGE) libraries. Annotated gene products are further labelled by predicted domains, known protein tertiary structures and exonic structures. The combination and juxtaposition of data from all these different sources immediately challenges scientists to associate disparate findings.

Two features of genome annotation with these resources are perhaps of particular interest to medical geneticists. The first is mapping of SNPs onto the genome. SNPs in disease genes can be used as potential candidates for causative phenotypic variations. The second is the highlighting of gene variants known to be associated with human disease. In both Ensembl and NCBI’s LocusLink (http://www.ncbi.nlm.nih.gov/LocusLink/), links from sequence to disease are provided using the Online Mendelian Inheritance in Man (OMIM) database, curated by Victor McKusick and colleagues, and accessible via NCBI’s Entrez system (http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=OMIM).

OMIM provides succinct synopses of clinical diagnoses and results from genetics experiments. Allelic variants are provided, thereby allowing links between gene sequences and phenotypic information. The undoubted utility of this database, however, is diminished by the low correspondence between allelic variant and sequence information. Only 60% of OMIM entries contain allelic variants that can all be mapped faithfully onto sequences linked to OMIM via LocusLink. Some OMIM entries are not even internally consistent. For example, OMIM entry 120150 for α1 chain collagen contains allelic variants W->C and G->C that centre on a single amino acid. Many of these discrepancies are likely to have arisen from a combination of sequence errors, alternative transcripts and gene mispredictions, in addition to annotation errors. It is advisable, therefore, that analyses of variants for any particular disease listed in OMIM should proceed by first retrieving the original sequences from the associated literature sources.
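The kind of consistency check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Variant` record, the `check_variants` function, and the toy sequence and variants are all hypothetical and are not drawn from OMIM, LocusLink or SwissProt. A variant is taken to map faithfully when the residue found at its position in the sequence equals the reported wild-type residue.

```python
from collections import defaultdict
from typing import NamedTuple


class Variant(NamedTuple):
    wild_type: str   # one-letter code of the residue reported as replaced
    position: int    # 1-based residue number in the protein sequence
    mutant: str      # one-letter code of the substituting residue


def check_variants(sequence: str, variants: list[Variant]) -> dict[str, list]:
    """Report variants that do not map onto `sequence` or contradict each other."""
    unmapped = []
    wild_types_at = defaultdict(set)
    for v in variants:
        wild_types_at[v.position].add(v.wild_type)
        in_range = 1 <= v.position <= len(sequence)
        if not in_range or sequence[v.position - 1] != v.wild_type:
            unmapped.append(v)
    # Two different wild-type residues reported at one position cannot both be right.
    conflicting = sorted(p for p, wts in wild_types_at.items() if len(wts) > 1)
    return {"unmapped": unmapped, "conflicting_positions": conflicting}


# Toy sequence and variants, invented for illustration only.
toy_sequence = "MGPWGC"
toy_variants = [
    Variant("G", 2, "C"),   # maps: residue 2 is G
    Variant("W", 2, "C"),   # does not map, and conflicts with the line above
    Variant("W", 4, "R"),   # maps: residue 4 is W
]
report = check_variants(toy_sequence, toy_variants)
print(report)
```

A check of this sort only flags disagreement. It cannot say whether the sequence, the transcript choice or the annotation is at fault, which is why returning to the original literature remains advisable.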

Another repository of allelic variant information is SwissProt (http://ca.expasy.org/sprot/). This is a protein sequence database that provides a high degree of annotation and a low level of redundancy (12). SwissProt lists allelic variants that are associated with human disease and provides helpful links to OMIM entries. In contrast to OMIM, SwissProt variants all map faithfully onto their corresponding sequences. Since SwissProt entries are linked through to the human genomic sequence from Ensembl, this means that missense mutations can be located within the human genome and thus may be compared alongside SNPs in protein coding regions.


    AMINO ACID ALLELIC VARIANTS
 
The latest version of SwissProt (June 18, 2001) contains 10 121 missense mutations from 734 protein sequences that have been linked to human disease by OMIM. An analysis of the frequency matrix for disease-associated missense mutations (Fig. 1) reveals, as expected, some differences from amino acid substitution rates seen in wild-type proteins. In particular, of the eight mutations that occur significantly more frequently (>10x) in disease-associated than in wild-type proteins (bars shown in red in Fig. 1), seven changes involve two amino acid types, cysteine (C->R, C->Y and C->G) and arginine (R->G, C->R, H->R, W->R and R->L). Other substitutions commonly found in wild-type proteins such as L->M, L->I, A->S and F->Y are relatively rare causes of disease, even though each of these mutations can arise from single base changes. Presumably, this is because similarities in physicochemical properties between these amino acids usually ensure conservation of protein function despite their substitutions.



 
Figure 1. The frequency matrix for 9747 missense mutations in SwissProt sequences corresponding to disease genes identified in the OMIM database. The height of each bar indicates the mutation frequency (F) between each pair of amino acids. The log2 ratios (R) of the missense mutation frequencies to those predicted using a PAM1 evolutionary model have been mapped to a colour gradient. Ratios greater than 10 are printed above the corresponding bars. Thus, bars coloured in shades of red [e.g. cysteine (C) to arginine (R)] represent mutations which are more numerous than predicted by the model while those in yellow [e.g. threonine (T) to alanine (A)] are within the expected range.
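The two quantities plotted in Figure 1 can be illustrated with a short Python sketch. It assumes that F is the share of all observed missense mutations contributed by one substitution and that R is the base-2 logarithm of F divided by the share expected under the evolutionary model. The function name `log2_enrichment` and all numbers below are hypothetical; they are not SwissProt counts or PAM1 values.

```python
import math


def log2_enrichment(observed_counts, expected_freqs):
    """Return {(from, to): (F, R)}: observed frequency F and R = log2(F / expected)."""
    total = sum(observed_counts.values())
    result = {}
    for pair, count in observed_counts.items():
        freq = count / total
        result[pair] = (freq, math.log2(freq / expected_freqs[pair]))
    return result


# Hypothetical numbers for illustration, not taken from SwissProt or PAM1.
observed = {("C", "R"): 30, ("T", "A"): 50, ("L", "I"): 20}
expected = {("C", "R"): 0.03, ("T", "A"): 0.50, ("L", "I"): 0.47}

enrichment = log2_enrichment(observed, expected)
for pair, (freq, log_ratio) in enrichment.items():
    print(pair, round(freq, 2), round(log_ratio, 2))
```

On a log2 scale a substitution seen exactly as often as the model predicts scores zero, over-represented substitutions score positive and under-represented ones negative, which is what makes a symmetric colour gradient natural.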

 
Cysteine and arginine are also prominent when amino acids are ranked according to the differences between mutation frequencies and background amino acid frequencies (Fig. 2). These two amino acids are both more highly substituted and more highly substituting in disease-associated variations. For cysteine, this is likely to arise from its unusual chemical properties, in particular the gain or loss, in disease variants, of disulphide-bridges or free thiols. For example, a C->Y variant, predicted to disrupt a disulphide bridge, occurs in the haemochromatosis gene product (HFE or HLA-H) that is associated with hereditary haemochromatosis in Northern Europeans (2).



 
Figure 2. Missense mutation frequencies in diseases for each amino acid type compared with the amino acid composition of all SwissProt sequences. (A) Wild-type amino acid frequencies at the mutated position; (B) frequency distribution of substituting amino acids. The base of each arrow represents the SwissProt composition value while the tip represents the corresponding missense mutation frequency. The length of a red or blue arrow is thus in proportion to an increase or decrease in amino acid composition at the missense position compared to the wild-type composition value. Arginines and cysteines, for example, are over-represented in disease-causing missense mutations, both in replaced (A) and replacing residues (B), while fewer diseases are due to mutations at positions with alanine than expected from random errors.

 
The pre-eminence of arginine in disease-associated missense variants may be due, in part, to the degenerate arrangement of the triplet genetic code. Mutations in a single base can cause substitutions between arginine and the highest number, 12, of other amino acid types. This factor alone, however, does not account for the popularity of arginine, since leucine also more frequently replaces, or is replaced by, highly occurring amino acids but features infrequently in disease-causing mutations. This suggests that the physicochemical properties of arginine, in particular its participation in salt bridges and its prevalence in solvent-accessible peripheries rather than the hydrophobic interiors of proteins, are important for molecular stability and function.
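The codon argument can be checked directly by enumerating every single-base change in the standard genetic code. The sketch below does this in Python; the function name `single_base_neighbours` is an illustrative choice, and changes to or from stop codons are ignored because only missense substitutions are of interest here.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order; '*' marks stop codons.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def single_base_neighbours():
    """Map each amino acid to the set of other amino acids one base change away."""
    neighbours = {aa: set() for aa in set(AMINO_ACIDS) if aa != "*"}
    for codon, aa in CODON_TABLE.items():
        if aa == "*":
            continue
        for pos, base in product(range(3), BASES):
            mutated = codon[:pos] + base + codon[pos + 1:]
            new_aa = CODON_TABLE[mutated]
            # Skip synonymous changes (including the unchanged codon) and stops.
            if new_aa not in ("*", aa):
                neighbours[aa].add(new_aa)
    return neighbours


neighbours = single_base_neighbours()
for aa in ("R", "L"):
    print(aa, len(neighbours[aa]), "".join(sorted(neighbours[aa])))
```

For arginine the enumeration reproduces the count of 12 quoted above. Leucine, which also has six codons, reaches fewer amino acid types, so codon degeneracy alone separates the two only modestly.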


    FINDING HOMOLOGUES
 
Identification of a disease-associated gene with amino acid allelic variations provides the starting point for an investigative chain pointing from genotype to a prospective phenotype. The first link in this chain is the determination of the molecular basis of disease. Often, the first clues for a gene’s role result from an analysis of its sequence, particularly by making inferences from the common molecular function of its homologues. Determination of those homologues that are orthologues leads to a greater refinement in prediction power since orthologues possess the greatest similarities in function. Orthologues are genes that arose from speciation rather than intra-genomic duplication events and are often the most sequence-similar genes from different species. It is important here to note that only a minority of human, fruit fly and Caenorhabditis elegans (nematode worm) proteins have detectable orthologues in each of the three organisms (4). It is assumed that it is this minority of proteins that governs homologous biological processes such as morphogenesis and cellular metabolism. This hypothesis is consistent with the finding that fruit fly versions of human genes are found in comparatively greater numbers for cancer, malformation syndromes, metabolic diseases, renal diseases and neurological diseases than for other disorders (13).

Methods to predict homology by sequence similarity have advanced in recent years to the extent that many divergent homologues can now be identified. Many such analyses have been employed successfully to understand the molecular function of monogenic disease gene products (reviewed in 14). Guidance for best practice in such analyses has been provided in the literature (15). Yet the published analyses of disease gene sequences frequently fail to exploit fully their predictive potential. In order to assist such analyses we suggest the following five-point approach.

(I) Query protein sequence databases using BLAST (http://www.ncbi.nlm.nih.gov/BLAST/) and PSI-BLAST (16)
BLAST interrogates databases searching for sequences with significant similarity to a reference gene, or its gene product. In the absence of biases in amino acid composition, a pair of sequences that are aligned with a score x and have been assigned an expect (E)-value of <2 × 10⁻³ is highly likely to be homologous. (An E-value represents the number of different alignments with scores equivalent to, or better than, x that are expected from the database search simply by chance.) For protein sequence database searches, PSI-BLAST (http://www.ncbi.nlm.nih.gov/Education/BLASTinfo/psi1.html) is preferred to gapped BLAST since it detects significantly greater numbers of homologues (17). PSI-BLAST queries protein databases in an iterative manner using previously found homologues to detect increasingly more subtle, yet significant, sequence similarities. Detection of divergent homologues is important since their molecular functions may be more conserved than their sequences initially may suggest.

(II) Use all available sequence databases
These should include non-redundant (nr) nucleotide and protein sequence databases, expressed sequence tag (EST) databases and predicted proteins from incompletely sequenced genomes. The latter should include the human and mouse drafts (e.g. http://www.ensembl.org/perl/blastview and http://mouse.ensembl.org/perl/blastview) and unfinished microbial genome sequences (http://www.ncbi.nlm.nih.gov/Microb_blast/unfinishedgenome.html). A useful ability to search predicted prokaryotic protein sequences using PSI-BLAST is provided by the ViruloGenome site (http://www.vge.ac.uk/blast/psiblast.html).

(III) Investigate sequences with marginal similarities
Results from PSI-BLAST searches are not symmetrical: a search of a database with sequence B might detect a significantly sequence-similar homologue A, but a search with A’s sequence might not detect B with significance. Thus, in an investigation of a sequence (A), it is sometimes worthwhile performing additional ‘reciprocal’ PSI-BLAST database searches using sequences (B) that just fail to be aligned with significant statistics (E > 2 × 10⁻³).

(IV) Search for protein repeats
Some proteins contain informative internal repetitions that may not be detectable using BLAST, but are found using Prospero (18) (http://www.well.ox.ac.uk/ariadne/). Prospero is implemented on the SMART web site (http://smart.embl-heidelberg.de/).

(V) Search for domains, repeats and motifs
Ninety-one percent of human disease gene products listed in SwissProt contain a domain that is recognized by either Pfam (http://www.sanger.ac.uk/Pfam/) or SMART (http://smart.embl-heidelberg.de/) resources. Consequently, it is suggested that protein sequences be annotated by domain, repeat or motif family using these resources prior to performing BLAST searches.
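Steps (I) and (III) amount to a simple triage of search hits by E-value, which can be sketched in Python as follows. Several things here are assumptions rather than part of the approach as published: the hits are assumed to be available as tab-separated text in the default column order of the BLAST+ tabular format (`-outfmt 6`), the upper bound `MARGINAL_E` for hits worth a reciprocal search is an arbitrary choice, and the hit lines themselves are invented placeholders.

```python
import csv
import io

SIGNIFICANT_E = 2e-3   # threshold quoted in step (I)
MARGINAL_E = 10.0      # arbitrary upper bound chosen for this sketch

# Default column order of BLAST+ tabular output (-outfmt 6).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def triage_hits(tabular_text):
    """Split hits into likely homologues and candidates for reciprocal searches."""
    significant, marginal = [], []
    reader = csv.DictReader(io.StringIO(tabular_text), fieldnames=COLUMNS, delimiter="\t")
    for row in reader:
        evalue = float(row["evalue"])
        if evalue < SIGNIFICANT_E:
            significant.append(row["sseqid"])
        elif evalue <= MARGINAL_E:
            marginal.append(row["sseqid"])
    return significant, marginal


# Invented hit lines; identifiers and scores are placeholders.
example = (
    "query\thitA\t35.0\t280\t170\t6\t17\t300\t20\t295\t8e-04\t45.1\n"
    "query\thitB\t24.0\t150\t105\t4\t40\t190\t33\t180\t5.3\t30.0\n"
    "query\thitC\t22.0\t60\t45\t1\t5\t64\t9\t68\t250\t22.0\n"
)
significant, marginal = triage_hits(example)
print("likely homologues:", significant)
print("try reciprocal search with:", marginal)
```

Each identifier in the second list would then be used as the query of its own PSI-BLAST search, and the original query accepted as a homologue only if it is recovered with significant statistics.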


    BESTROPHIN: A WORKED EXAMPLE OF DISCOVERY GENETICS
 
In order to illustrate how different genomes, sequence databases and analysis programs can be used to detect previously unforeseen evolutionary relationships, we shall describe the analysis of bestrophin, the product of the Best macular dystrophy or vitelliform macular dystrophy type-2 gene (VMD2). Patients with mutations in VMD2 are visually impaired with ‘egg-like’ lesions in the macular area (19,20) (http://www.uni-wuerzburg.de/humangenetics/vmd2.html). BLAST-based analysis of the bestrophin sequence shows that it is a member of a protein family with numerous representatives in C.elegans (21) (http://www.sanger.ac.uk/cgi-bin/Pfam/getacc?PF01062), but no other versions in mammals.

The existence of a bestrophin paralogue (BEST2) in mammals, however, can be established with BLAST-based searches of protein databases, EST databases and the human draft sequence. The human BEST2 sequence can be predicted from a human mRNA sequence (FLJ20132), from human genomic sequence in chromosome 19p13.2 (accession no. AC018761) and by comparison with an EST coding for mouse BEST2 (GeneInfo code 11655934). Whether mutations of the BEST2 gene result in visual defects remains to be investigated.

The molecular functions of bestrophin are unknown. In order to determine whether the identification of additional bestrophin homologues might provide some insight into their functions, we performed additional database searches. PSI-BLAST searches of the nr database at the NCBI identified no further homologues. A similar conclusion was drawn from PSI-BLAST searches of the nr database at the ViruloGenome site (http://www.vge.ac.uk/blast/psiblast.html). In the latter search, an Escherichia coli sequence (ORF b1520; GeneInfo code 7466763) showed marginal sequence similarity to bestrophin (E = 5.3). However, in a reciprocal search of the ViruloGenome nr database using this sequence and default search parameters, PSI-BLAST identified bestrophin as an E.coli b1520 homologue in round 2 with significant statistics (E = 8 × 10⁻⁴). The multiple sequence alignment of bestrophin and E.coli b1520 homologues is presented in Figure 3.



 
Figure 3. Multiple sequence alignment of bestrophin, its paralogue BEST2, and plant and bacterial homologues. Of 64 known missense mutations in the bestrophin protein sequence in patients with Best disease, 45 are contained within this alignment and are indicated by a red asterisk. Purple vertical lines indicate pairs of predicted transmembrane helices whose sequences have been excised from the alignment. An additional database search using the motif searching tool (23) and the region of 19 b1520-like homologues (overlined in red) identified C.elegans bestrophin homologues (C43G2.4 and F32B6.9) in a single round with significance (E < 2.5 × 10⁻²). Numbers in parentheses represent excised sequences. In descending order, GeneInfo and amino acid residue numbers (if appropriate) are: 6175195 (17–324), 8923137 (1–281), 6942151 (18–331), 7294298 (18–328), 7511281 (9–317), 7507956 (151–458), 11282347 (69–383), 7486458 (73–387), 7466763 (26–305) and 13473700 (17–295). Species abbreviations: At, Arabidopsis thaliana; Ce, C.elegans; Dm, Drosophila melanogaster; Ec, E.coli; Hs, Homo sapiens; Ml, Mesorhizobium loti. For the human BEST2 sequence, amino acids in lower case correspond to mouse sequence that is currently missing from human data. The multiple sequence alignment has been coloured using CHROMA (24) (http://www.lg.ndirect.co.uk/chroma/) and an 80% consensus threshold.

 
This example highlights the benefits of using PSI-BLAST, reciprocal searches and comprehensive databases (steps I, II and III above). Unfortunately, since none of the newly identified bestrophin homologues have been subjects of empirical investigation, the molecular function of bestrophin cannot simply be inferred from that of its homologues. In such circumstances, it is sometimes worthwhile to consider several ‘rules-of-thumb’ for predicting function by analogy (22). These include that: (i) active sites usually consist of conserved polar residues (C, D, E, H, K, N, Q, R, S and T); (ii) large aromatic residues (F, H, W and Y) are often found in protein–ligand binding sites; (iii) C, D, E, H, N and Q can coordinate zinc ions in active sites or zinc fingers; and (iv) H, K, R, S and T sometimes are involved in binding phosphate or sulphate groups.

For the bestrophin homologues, five amino acids are absolutely conserved (Fig. 3): S, R, P, Y and D. With the exception of proline (P), which is likely to have a structural, rather than a functional role, conservation of these amino acids suggests that the bestrophin family possesses catalytic activity. The precise type or specificity of this enzymatic activity, however, cannot be predicted directly.
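The rules of thumb listed above lend themselves to a simple lookup, sketched here in Python. The residue sets are copied from rules (i) to (iv); the dictionary `RULES`, its labels and the function `annotate_conserved` are hypothetical names introduced for illustration. The output is only a list of which rules each residue satisfies and is not a prediction of function in itself.

```python
# Residue sets copied from the four rules of thumb listed in the text.
RULES = {
    "polar, possible active-site residue": set("CDEHKNQRST"),
    "large aromatic, possible ligand-binding residue": set("FHWY"),
    "possible zinc ligand": set("CDEHNQ"),
    "possible phosphate or sulphate binding": set("HKRST"),
}


def annotate_conserved(residues):
    """List, for each conserved residue, the rules of thumb it satisfies."""
    return {aa: [name for name, members in RULES.items() if aa in members]
            for aa in residues}


# The five absolutely conserved residues reported for the bestrophin family.
annotations = annotate_conserved("SRPYD")
for aa, hits in annotations.items():
    print(aa, "->", "; ".join(hits) or "no rule applies")
```

Applied to the bestrophin residues, proline matches none of the four rules, in keeping with the structural role proposed for it, whereas the other four residues each match at least one.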


    FUTURE DIRECTIONS
 
The increasing availability of vast quantities of data from high-throughput projects represents an almost unprecedented wealth for all of biology. Two efforts from the biological community are required to transform these newfound riches into medical and scientific breakthroughs. The first is to make more effective use of current data and analytic methods so as to exploit fully the new genetic and molecular information. The second requirement is for more comprehensive integration of contrasting bioinformatics data types. New sets of focused and empirically useful hypotheses will be inspired by the conjunction and cross-referencing of complementary information for genome sequences, gene and tissue expression and protein post-translational modification, localisation and interaction, as well as comparisons of homologues across species, whether paralogues or orthologues. The ultimate goal must be to span the divides between genetic and molecular phenomena and cellular and whole organism physiology.


    ACKNOWLEDGEMENTS
 
We would like to thank Drs Pat Clissold and Richard Emes for their helpful suggestions and comments.


    FOOTNOTES
 
+ To whom correspondence should be addressed. Tel/Fax: +44 1865 272175; Email: chris.ponting@anat.ox.ac.uk


    REFERENCES
 
1 Sachidanandam, R., Weissman, D., Schmidt, S.C., Kakol, J.M., Stein, L.D., Marth, G., Sherry, S., Mullikin, J.C., Mortimore, B.J., Willey, D.L. et al. (2001) A map of human genome sequence variation containing 1.42 million single nucleotide polymorphisms. Nature, 409, 928–933.

2 Sunyaev, S., Ramensky, V., Koch, I., Lathe, W., III, Kondrashov, A.S. and Bork, P. (2001) Prediction of deleterious human alleles. Hum. Mol. Genet., 10, 591–597.

3 Antonarakis, S.E. and McKusick, V.A. (2000) OMIM passes the 1,000-disease-gene mark. Nat. Genet., 25, 11.

4 Lander, E.S., Linton, L.M., Birren, B., Nusbaum, C., Zody, M.C., Baldwin, J., Devon, K., Dewar, K., Doyle, M., FitzHugh, W. et al. (2001) Initial sequencing and analysis of the human genome. Nature, 409, 860–921.

5 Venter, J.C., Adams, M.D., Myers, E.W., Li, P.W., Mural, R.J., Sutton, G.G., Smith, H.O., Yandell, M., Evans, C.A., Holt, R.A. et al. (2001) The sequence of the human genome. Science, 291, 1304–1351.

6 Collins, F.S. and McKusick, V.A. (2001) Implications of the human genome project for medical science. J. Am. Med. Assoc., 285, 540–544.

7 Roses, A.D. (2000) Pharmacogenetics and the practice of medicine. Nature, 405, 857–865.

8 Schneider, M.C., Rodriguez, A.M., Nomura, H., Zhou, J., Morton, C.C., Reeders, S.T. and Weremowicz, S. (1996) A gene similar to PKD1 maps to chromosome 4q22: a candidate gene for PKD2. Genomics, 15, 1–4.

9 Bushby, K.M.D. (1999) The limb-girdle muscular dystrophies-multiple genes, multiple mechanisms. Hum. Mol. Genet., 8, 1875–1882.

10 Risch, N.J. (2000) Searching for genetic determinants in the new millennium. Nature, 405, 847–856.

11 Ponting, C.P. (2001) Domain homologues of dopamine ß-hydroxylase and ferric reductase: roles for iron metabolism in neurodegenerative disorders? Hum. Mol. Genet., 10, 1853–1858.

12 Bairoch, A. and Apweiler, R. (2000) The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000. Nucleic Acids Res., 28, 45–48.

13 Fortini, M.E., Skupski, M.P., Boguski, M.S. and Hariharan, I.K. (2000) A survey of human disease gene counterparts in the Drosophila genome. J. Cell Biol., 150, F23–F29.

14 Sreekumar, K.R., Aravind, L. and Koonin, E.V. (2001) Computational analysis of human disease-associated genes and their protein products. Curr. Opin. Genet. Dev., 11, 247–257.

15 Bork, P. and Koonin, E.V. (1998) Predicting functions from protein sequences - where are the bottlenecks? Nat. Genet., 18, 313–318.

16 Altschul, S.F., Madden, T.L., Schaffer, A.A., Zhang, J., Zhang, Z., Miller, W. and Lipman, D.J. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res., 25, 3389–3402.

17 Park, J., Karplus, K., Barrett, C., Hughey, R., Haussler, D., Hubbard, T. and Chothia, C. (1998) Sequence comparisons using multiple sequences detect three times as many remote homologues as pairwise methods. J. Mol. Biol., 284, 1201–1210.

18 Mott, R. (2000) Accurate formula for P-values of gapped local sequence and profile alignments. J. Mol. Biol., 300, 649–659.

19 Marquardt, A., Stöhr, H., Passmore, L.A., Krämer, F., Rivera, A. and Weber, B.H.F. (1998) Mutations in a novel gene, VMD2, encoding a protein of unknown properties cause juvenile-onset vitelliform macular dystrophy (Best’s disease). Hum. Mol. Genet., 7, 1517–1525.

20 Petrukhin, K., Koisti, M.J., Bakall, B., Li, W., Xie, G., Marknell, T., Sandgren, O., Forsman, K., Holmgren, G., Andreasson, S. et al. (1998) Identification of the gene responsible for Best macular dystrophy. Nat. Genet., 19, 241–247.

21 Sonnhammer, E.L. and Durbin, R. (1997) Analysis of protein domain families in Caenorhabditis elegans. Genomics, 46, 200–216.

22 Ponting, C.P. (2001) Issues in predicting protein function from sequence. Brief. Bioinformatics, 2, 19–29.

23 Tatusov, R.L., Altschul, S.F. and Koonin, E.V. (1994) Detection of conserved segments in proteins: iterative scanning of sequence databases with alignment blocks. Proc. Natl Acad. Sci. USA, 91, 12091–12095.

24 Goodstadt, L. and Ponting, C.P. (2001) CHROMA: Consensus-based colouring of multiple alignments for publication. Bioinformatics, in press.

