
Human Molecular Genetics 2006 15(Review Issue 1):R81-R87; doi:10.1093/hmg/ddl086

© The Author 2006. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org

Integrating biological data through the genome

Gabrielle A. Reeves*, Janet M. Thornton and the BioSapiens Network of Excellence

EMBL—European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK

* To whom correspondence should be addressed. Tel: +44 1223492536; Fax: +44 1223494486; Email: gabby{at}ebi.ac.uk

Received March 6, 2006; Revised March 14, 2006; Accepted March 30, 2006


    ABSTRACT
 
Owing to the ongoing success of the genome sequencing and structural genomics projects, the increase in both sequence and structural data is rapid. The development of tools for the annotation of sequence and structural data has become more important in the hope of keeping up with this data explosion. Scientists in this field have addressed these issues over the last 10 years and there now exists a wealth of methods and approaches to help interpret these data. However, there is no current way in which these methods can be incorporated easily so that the resulting annotations can be viewed together. This review discusses the development of these annotation methods and introduces the BioSapiens Network of Excellence, which has been formed in order to integrate the methods which have been developed in Europe.


    INTRODUCTION
 
The first draft of the human genome sequence was published in 2001 (1). In total, there are now over 300 completed genomes (Fig. 1). It is now possible to sequence a whole bacterium in a few days and the flood of new sequence data is likely to continue for the next decade. In addition, with the advent of the structural genomics initiatives, it is hoped that we will also witness a significant growth in the diversity of three-dimensional protein structures that are available, many of which are of unknown function. With this rapid increase in both structural and sequence data come new challenges in the interpretation and integration of these data. After all, DNA is merely a string of letters from which we must determine the coding and non-coding regions and the promoter and regulatory regions that control transcription and translation.


 
Figure 1. Number of sequenced genomes (data are non-redundant at the species level).

 
What is annotation and why is it needed?
Annotating the biological role of a molecule usually involves both experimental and in silico approaches. Genome annotation starts by defining the positions of all the genes along the sequence and by identifying their coding regions, regulatory sequences and promoters. From the gene sequence, the next task is the definition of the proteome, that is, all the proteins that can be encoded by a particular genome. Although this is simple in principle, it is complicated by alternatively spliced transcript variants and processing of preprotein sequences, such as removal of propeptides. It has been estimated that there are 10 times as many different proteins as there are genes. Once the proteins (and RNAs) and their cellular localization have been defined, secondary annotation to provide identification of biochemical and biological function is needed. Currently, around half of all proteins defined in most genome projects have no assigned function (2).

Protein families provide a powerful route to improved protein annotation. The three-dimensional structures of proteins provide detailed knowledge of residue locations and probable functional sites. Post-translational modifications are often important for function, and localization in the cell or organism is another important constraint. Function prediction for gene products can be made through sequence analysis. This can be combined with analysis of when and where each gene product is produced (transcriptomics). In addition, the identification of relevant protein–protein interactions will provide further clues for functional characterization, as can knowledge of the pathways and networks in which the proteins participate. Tools for comparative genomics, to map interactions and networks from one organism to another, will be critical. In addition, for humans, sequence variations among individuals are particularly important, especially in the context of disease and inherited disorders. A summary of the different categories of annotation can be seen in Box 1 of Figure 2.


 
Figure 2. Schematic representation showing the different types of annotations which are carried out on nucleic acid and protein sequences and structures (Box 1). Box 2 shows the three different types of methods for annotation: manual curation (including third party annotations), inference of functional information from structural or sequence homology, and computational prediction.

 
The process of annotation is highly complex and can be subdivided into a number of different categories (Fig. 2, Box 2). Annotations derived from experimental observations and manual curation provide the most accurate information. This information can be transferred to homologous sequences and structures. Most annotations in the databases are derived using this approach. In addition, some features can be computationally predicted. Manual curation provides robust and reliable annotations; however, the methods are slow and it is not possible to provide all annotations in this way. Methods for the transfer of annotations from one homologue to another rely on the quality of the annotation being transferred and on the quality of the method being used for the identification of homology. In addition, as the sequences or structures become more distantly related, the transfer of information becomes more risky. For all computationally predicted annotations, problems still remain with the accuracy and confidence of such annotations. Providing these annotations in conjunction with experimentally observed annotations adds weight. Methods for annotation of nucleic acid and protein sequence and structure are divided among the scientific community with surprisingly little communication between groups.
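The dependence of homology-based transfer on both the source annotation and the strength of the homology can be illustrated with a minimal sketch. The `Hit` record, the 40% identity cutoff and the function name are hypothetical choices made for illustration only; they are not taken from any database pipeline described in this review.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One homologue of the query sequence; all fields are illustrative."""
    accession: str
    percent_identity: float  # sequence identity to the query, 0-100
    annotation: str          # functional description attached to the hit
    curated: bool            # True if the annotation was manually curated


def transfer_annotation(hits, min_identity=40.0):
    """Return (annotation, evidence) from the best qualifying hit, or None.

    Only curated hits at or above the identity cutoff are considered, so the
    result depends on the quality of the source annotation as well as on the
    homology evidence. The cutoff value is an assumption, not a standard.
    """
    candidates = [h for h in hits
                  if h.curated and h.percent_identity >= min_identity]
    if not candidates:
        return None
    best = max(candidates, key=lambda h: h.percent_identity)
    evidence = (f"inferred by homology to {best.accession} "
                f"({best.percent_identity:.0f}% identity)")
    return best.annotation, evidence


hits = [
    Hit("HOM_A", 82.0, "serine protease", curated=True),
    Hit("HOM_B", 95.0, "hypothetical protein", curated=False),
    Hit("HOM_C", 31.0, "kinase", curated=True),
]
result = transfer_annotation(hits)
print(result)
```

Recording the evidence string alongside the transferred annotation keeps the inference traceable, so that a later reader can tell it apart from an experimentally observed annotation.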

Current status of genome annotation
Methods for the transfer of data from experimentalists to the databases are slow to develop. Currently, the majority of such information is transferred from the literature to the databases via manual curators. In addition, many of the current computational tools are inadequate and need further development and careful validation against experimental data. The results from annotation efforts must be integrated in such a way that the information is clear and incisive in order to guide experimentalists' future work.

As structural and sequence data increase, it will be impossible to experimentally validate all the predictions of protein functions. Indeed, even now, probably <1% of all known proteins have ever been experimentally characterized. Therefore, the need to improve computational approaches, by increasing the accuracy of functional inference from sequence and structure, is of paramount importance. The proteins encoded by most of the newly sequenced genomes have very limited annotations in UniProt Knowledgebase/Swiss-Prot (3) because Swiss-Prot depends on manual annotation to uphold its high standard of functional annotation. It is impossible for Swiss-Prot's curators to keep up with the current deluge of genomic sequence. Currently, those sequences which are released but are not annotated are stored in UniProtKB/TrEMBL, the computer-annotated supplement to Swiss-Prot in which annotations are inferred from Swiss-Prot to TrEMBL sequences by homology (4). Similarly, as the number of three-dimensional structures solved through structural genomics initiatives increases, so does the need for automated methods to derive functional information from protein structures and to feed this information back into genome annotation. Transcriptome and proteome data are now routinely generated, and these data must also be integrated into genome annotation so that functional inferences can be made about gene products on a genome-wide scale.

To date, European scientists have been very active in the field of genome and protein annotation, with Ensembl (5) and Swiss-Prot being the primary resources in use worldwide. Until recently, information has flowed from experimental studies to databanks via the literature through curators, by direct input from experimentalists or, in some cases, from bioinformatics analysis (for the protein sequence databases Swiss-Prot and TrEMBL and the nucleic acid databases EMBL, GenBank and DDBJ). However, the increase in bioinformatics tools providing functional annotations (see Table 1 for selected examples) has led to the need for a joint effort to provide an infrastructure to view the results in a single place. Many of the tools used in genome and protein sequence and structure annotation, prediction and validation and pathway analysis have been developed in Europe, and many of the secondary resources derived from protein sequences and structures are also European (Table 1). However, the groups that develop the methods to improve genome annotation are widely distributed throughout Europe and the best methods have not yet been incorporated into publicly available genome annotations. Furthermore, these methods are continually changing and improving, so keeping up to date becomes problematic. Annotation is not a one-pass activity but needs to be reiterated as knowledge increases in related areas. The fragmentation of currently available resources for genome annotation means that only a few bioinformatics experts know where to look for them. Consequently, most experimentalists cannot access all the best information about a genome. This problem will only worsen as annotation methods become more sophisticated and more bioinformatics laboratories are established to handle all the new data.


 
Table 1. A selection of tools and resources developed in Europe for genome and protein sequence and structure annotation
 
Over recent years, there has been a move towards the integration of annotations and annotation methods through the formation of consortium-based projects. The National Human Genome Research Institute began a public research consortium in September 2003 to annotate all functional elements in the human genome, in order to create an Encyclopaedia of DNA Elements (ENCODE). The ENCODE project is currently in its pilot phase, which aims to annotate selected regions constituting 1% of the human genome. In 2004, the European-based BioSapiens Network of Excellence was formed from 26 participating organizations from 14 countries. The aim of this consortium was to create a European Virtual Institute for Genome Annotation. This review outlines the goals of the BioSapiens Network of Excellence in providing an answer to the problems of integration and discusses the infrastructure for doing this. In addition, examples of advances made in each of the three categories of annotation (manual curation, computationally inferred and computationally predicted) within the BioSapiens Network of Excellence are discussed. The problems of ever-increasing data and functional information faced by manual annotators have been tackled by setting up the infrastructure for annotations by a third party (third party annotations). We have chosen to focus the final section of this review on the prediction of function from structure, where there have been significant advances in methods for inferring and computationally predicting annotations.


    BioSapiens NETWORK OF EXCELLENCE
 
The BioSapiens Network of Excellence coordinates and exploits the best developments in genome annotation in Europe and includes some of the best bioinformatics laboratories in the world, chosen for their complementary expertise to cover the current major challenges in genome annotation. The objective of the BioSapiens Network of Excellence is to provide an infrastructure to support a large-scale, concerted effort to annotate genome data by laboratories distributed around Europe, using both informatics tools and input from experimentalists. This has been achieved technically through the implementation of the distributed annotation system (DAS) (6).

DAS: an infrastructure to deliver annotations
The annotations generated by the institute are gradually being made available in the public domain and are easily accessible through a single portal on the web (http://www.biosapiens.info/) through a DAS. In this system, a central reference server is set up, against which a number of annotation servers can display their data. The final page shows the information from all the contributing annotation servers. This allows genome and protein annotations generated and stored in other institutes to be updated at source and made instantly visible through the EBI. Currently, the BioSapiens Network of Excellence has produced annotations at 19 partner sites providing 64 different distributed annotation sources. These comprise information for genomic sequences and protein sequences as well as for protein structures. Visualization of these data is provided by three DAS clients (software for viewing annotations presented in DAS server format): the Ensembl genome browser for both genomic and proteomic annotations, Dasty for protein sequence annotations and Spice for both protein sequence and structural annotations.
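The division of labour between one reference and many independent annotation sources can be sketched as follows. This is a simplified, in-memory illustration only: it does not use the DAS protocol or any network calls, and the `Feature` record, the source names and the `merge_annotations` function are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    source: str  # name of the annotation source that supplied the feature
    start: int   # 1-based, inclusive, in reference coordinates
    end: int
    label: str


def merge_annotations(reference_length, sources):
    """Combine features from several annotation sources on one reference.

    `sources` maps a source name to a list of (start, end, label) tuples.
    Features that fall outside the reference are dropped; the remainder are
    sorted by position so that a client could draw them as a single track.
    """
    merged = []
    for name, features in sources.items():
        for start, end, label in features:
            if 1 <= start <= end <= reference_length:
                merged.append(Feature(name, start, end, label))
    return sorted(merged, key=lambda f: (f.start, f.end, f.source))


# Two hypothetical partner sites annotating the same 200-residue reference.
sources = {
    "site_a": [(10, 40, "signal peptide"), (300, 320, "out of range")],
    "site_b": [(25, 120, "domain"), (5, 5, "modified residue")],
}
track = merge_annotations(reference_length=200, sources=sources)
for f in track:
    print(f.start, f.end, f.source, f.label)
```

Because each source is keyed only to the shared reference coordinates, a partner site can change its own feature list without any other source needing to be updated.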


    SOME RECENT ADVANCES IN ANNOTATION METHODS WITHIN THE BioSapiens NETWORK OF EXCELLENCE
 
Within the BioSapiens Network of Excellence, examples of advances in all three categories of annotation can be seen. The feasibility of establishing protocols for the addition and update of manually curated annotations from experimental studies has been examined. There have also been advances in algorithms for the prediction and inference of annotations. In this review, we will concentrate on the structure-to-function work package within the BioSapiens Network of Excellence and the development of methods within this domain.

Manual curation from experimental evidence: third party annotations
Within the BioSapiens Network of Excellence, a review was undertaken into the feasibility and progress of the integration of experimental data from a third party into new and pre-existing entries in major knowledgebases (third party annotations). The study was carried out by examining evidence from the existing TPA protocol implemented by the EMBL Nucleotide Sequence Database (7) in collaboration with GenBank (8) and DDBJ (9).

This system has been running for 4 years so far and has encountered a number of challenges and provided a number of solutions. First, with third party annotations comes a greater flow of information into the database. This has consequences for controlling both the quality of the input and the uniformity of the descriptions in the different database fields. The EMBL Nucleotide Sequence Database accepts only peer-reviewed, published information so that quality remains high. In addition, annotations are carefully controlled by the direct submission protocol. Curators then have a manageable task in communicating with the experimentalists so that the annotation can be added to the database. The second challenge is to encourage experimentalists to spend time inputting their data into these sources. To address this, submission directly into the database for both primary and third party sources is encouraged via a number of major journal publishers, with submission being a prerequisite for publication. The protocol has been so successful that the DDBJ/EMBL/GenBank collaboration has recently increased coverage by accepting data that lack experimental evidence, thus expanding their data set of third party annotations (10).

The UniProtKB/Swiss-Prot (11) team is proposing a similar protocol in order to widen the bottleneck in the flow of data from experimentalist to database. There will be differences between the existing EMBL Nucleotide Sequence Database protocol and the proposed Swiss-Prot systems. This is due to the higher level of ambiguity involved in describing the features of a protein sequence. Direct experimental group submissions would typically be free-text input, leading to interaction between curator and experimentalist to ensure that the highest quality annotation is achieved for the sequence. This is currently being explored with the yeast community as a test case in order to develop the scheme further. Other schemes are also proposed in order to exploit fully the knowledge available in the life-science community. The adopt-a-protein scheme would encourage individual scientists to be responsible for the annotations of one or more particular proteins. These experts would ensure that entries were up to date and of a uniform quality. In a similar vein, UniProt would like to extend the collaborative expert curation of protein families and also make use of a growing number of senior scientists who no longer run their own laboratories, to enhance both annotation quality and productivity. It remains to be seen whether these methods will need further development once direct submissions, both primary and third party, increase in number.


    NEW APPROACHES FOR DERIVING FUNCTIONAL ANNOTATIONS FROM STRUCTURE
 
Within the BioSapiens network, one of the major areas of development has been methods for the elucidation of function from structure. The inference of function at a structural level is more informative than at the sequence level. Structure provides us with information such as the three-dimensional distances between functional residues and the shape and electrostatic properties of the surface (Fig. 3). In addition, proteins with >35% sequence identity generally share a similar structure (12), allowing us to look deeper into their evolutionary relationships. Within this consortium, two automatic functional annotation pipelines have been developed, which use these structural properties.


 
Figure 3. The prediction of function from structure.

 
Prediction of function from structure: automatic annotation pipelines
FunCut (13) is an automatic protein annotation system which assigns functional information to a query sequence on the basis of the study of its homologous sequences. The method applies a clustering algorithm to classify sequences into protein subfamilies. The functional descriptions of the sequences related to the query protein (the information provided in the ‘keyword’ field of Swiss-Prot, the E.C. numbers and the Swiss-Prot description line) are filtered and weighted by the distance between subfamilies and transferred using a set of manually derived rules.
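The idea of weighting functional descriptions by the distance between subfamilies can be sketched in a few lines. This is not the FunCut algorithm: the 1 / (1 + distance) weight, the score threshold and the `weighted_keywords` function are assumptions chosen purely to illustrate the principle that keywords from distant subfamilies should count for less.

```python
from collections import defaultdict


def weighted_keywords(subfamilies, min_score=1.0):
    """Score keywords from related subfamilies, down-weighting distant ones.

    `subfamilies` is a list of (distance_to_query_subfamily, keywords) pairs.
    Both the weighting function and the threshold are illustrative
    assumptions, not the published rules.
    """
    scores = defaultdict(float)
    for distance, keywords in subfamilies:
        weight = 1.0 / (1.0 + distance)
        for keyword in set(keywords):  # count each keyword once per subfamily
            scores[keyword] += weight
    kept = {kw: score for kw, score in scores.items() if score >= min_score}
    # Highest score first; ties are broken alphabetically.
    return sorted(kept, key=lambda kw: (-kept[kw], kw))


subfamilies = [
    (0.0, ["Hydrolase", "Protease"]),     # the query's own subfamily
    (1.0, ["Hydrolase", "Zymogen"]),      # a neighbouring subfamily
    (3.0, ["Hydrolase", "Transferase"]),  # a distant subfamily
]
transferred = weighted_keywords(subfamilies)
print(transferred)
```

In this toy input, a keyword shared across all subfamilies accumulates the highest score, while keywords seen only in more distant subfamilies fall below the threshold and are not transferred.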

The ProFunc server (14) is a fully automated prediction server for predicting the likely function of proteins whose 3D structure is known. A number of sequence and structure-based methods are run in order to identify functional motifs or close relationships to functionally characterized proteins. The methods currently incorporated in the server include standard sequence searches (BLAST) (15), sequence motif scans (InterProScan and SUPERFAMILY) (16,17) and gene neighbour analysis. In addition, wholly structure-based methods including fold matching (SSM) (18), structural motifs (19) (DNA-binding HTH motifs and ‘nests’) (20,21) and 3D residue templates (including enzyme active-site templates from the CSA) (22) are also used.

Prediction of function from structure: advances in binding site prediction
The recognition of small molecules (ligands, metals and cofactors) by proteins remains a key factor in cellular processes. A greater understanding of the extent to which the conformational space of a ligand is restricted upon binding to a protein could advance the fields of docking, structure refinement and function prediction. With this in mind, an analysis of the conformational variability shown by three highly ubiquitous biological ligands, ATP, NAD and FAD, when bound to different proteins, has been carried out (23). The results show that the ligands bind to proteins in a wide array of conformations, including some energetically unfavourable orientations. The study provides a quantitative assessment of previous observations that ligands tend to unfold when binding to proteins.

A major goal in the annotation of uncharacterized protein structures is the identification of the location, shape and size of the ligand-binding site of that structure. The recently developed method SURFNET-ConSurf (24) identifies the location and shape of ligand-binding sites. It combines two known measures of ‘functionality’ in proteins: cleft volume and residue conservation. First, this two-step method uses the SURFNET program to identify clefts in the protein surface that are potential binding sites. These clefts are then trimmed by cutting away regions which are distant from highly conserved residues, using the ConSurf-HSSP database definitions. The largest remaining clefts are more likely to be those where ligands bind. When the algorithm was tested on a non-redundant set of 244 protein structures from the PDB, SURFNET-ConSurf identified a ligand-binding pocket in 75% of them.
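The two-step logic of trimming clefts by proximity to conserved residues and then ranking what remains can be sketched as below. This is a simplified illustration, not the SURFNET-ConSurf implementation: clefts are represented as bare lists of 3D points, the number of remaining points stands in for cleft volume, and the 5 Å cutoff and both function names are assumptions.

```python
import math


def trim_cleft(cleft_points, conserved_atoms, cutoff=5.0):
    """Keep only cleft points within `cutoff` angstroms of a conserved atom."""
    return [p for p in cleft_points
            if any(math.dist(p, a) <= cutoff for a in conserved_atoms)]


def rank_clefts(clefts, conserved_atoms, cutoff=5.0):
    """Trim every cleft, then rank clefts by how many points remain.

    `clefts` maps a cleft name to a list of (x, y, z) points. The point count
    is used here as a crude stand-in for the remaining cleft volume.
    """
    trimmed = {name: trim_cleft(points, conserved_atoms, cutoff)
               for name, points in clefts.items()}
    return sorted(trimmed.items(), key=lambda item: len(item[1]), reverse=True)


# Toy data: cleft_1 is larger overall, but most of it lies far from the
# single conserved atom, so it shrinks more when trimmed.
clefts = {
    "cleft_1": [(0, 0, 0), (1, 0, 0), (2, 0, 0),
                (20, 0, 0), (21, 0, 0), (22, 0, 0), (23, 0, 0)],
    "cleft_2": [(0, 3, 0), (1, 3, 0), (2, 3, 0), (3, 3, 0)],
}
conserved_atoms = [(1, 1, 0)]
ranked = rank_clefts(clefts, conserved_atoms)
for name, points in ranked:
    print(name, len(points))
```

The toy example shows why the trimming step matters: the cleft that is largest before trimming is no longer ranked first once the regions far from conserved residues have been cut away.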

Prediction of function from structure: protein–protein interactions
Another area of major importance in protein-structure annotation is the understanding of how proteins interact with each other and of the interfaces which exist in fully functional biological units. Previous software (PQS) (25) has been developed to distinguish between biological interactions and crystal-packing interactions. However, advances have been made in this field by the development of NOXclass (26). Protein–protein interfaces can be defined using different properties: interface area, ratio of interface area to protein surface area, amino-acid composition of the interface, correlation between the amino-acid compositions of interface and protein surface, interface shape complementarity and conservation of the interface. A two-stage SVM classifier trained with these interface properties distinguishes three types of protein–protein interfaces: non-biological interactions, and biological interactions in the form of obligate and non-obligate interactions. In obligate interactions, at least one of the two binding partners takes its native structure only after binding, but in non-obligate interactions, both interaction partners are structurally stable by themselves. The method is available as a web service (http://noxclass.bioinf.mpi-inf.mpg.de/).
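Two of the interface properties listed above, and the shape of a two-stage decision, can be sketched as follows. This is not NOXclass itself: the two stage classifiers are passed in as plain functions standing in for trained SVMs, and the thresholds used in the example, the feature names and all function names are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(residues):
    """Fraction of each of the 20 amino acids in a string of one-letter codes."""
    counts = Counter(residues)
    return [counts[aa] / len(residues) for aa in AMINO_ACIDS]


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant vectors."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def interface_features(interface_area, surface_area,
                       interface_residues, surface_residues):
    """Compute three of the interface properties named in the text."""
    return {
        "area": interface_area,
        "area_ratio": interface_area / surface_area,
        "composition_correlation": pearson(composition(interface_residues),
                                           composition(surface_residues)),
    }


def classify(features, is_biological, is_obligate):
    """Two-stage cascade: biological or not, then obligate or non-obligate."""
    if not is_biological(features):
        return "non-biological"
    return "obligate" if is_obligate(features) else "non-obligate"


# Hypothetical threshold rules standing in for the two trained classifiers.
stage_one = lambda f: f["area"] > 650.0
stage_two = lambda f: f["area_ratio"] > 0.2

features = interface_features(1500.0, 6000.0, "LLVVAAFF", "KKEEDDSL")
label = classify(features, stage_one, stage_two)
print(features["area_ratio"], label)
```

Keeping the stage classifiers as arguments separates the feature calculation from the learned decision rules, so trained models could be substituted for the threshold stand-ins without changing the cascade.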


    CONCLUSIONS
 
With the deluge of both new sequence and structure data comes the need to improve methods for the functional characterization of these data. The process of annotation is complex and can be done in a number of ways: first, by manual annotation via a curator who uses evidence directly from the experimentalist and from the literature; secondly, by the transfer of annotations from one homologue to another; and lastly, by computational prediction. Approaches for annotation in all three categories have been developed in earnest over the last 10 years. However, these methods lie scattered over the web with no way of integrating the resulting annotations. The incorporation of these annotations into a single resource is clearly the next step and the BioSapiens Network of Excellence was set up in order to complete such a task.


    ACKNOWLEDGEMENT
 
This work was funded through the BioSapiens Network of Excellence, by the European Commission within its FP6 Programme, under the thematic area ‘Life Sciences, Genomics and Biotechnology for Health,’ contract number LHSG-CT-2003-503265.

Conflict of Interest statement. None declared.


    REFERENCES
 

  1. Lander, E.S., Linton, L.M., Birren, B., Nusbaum, C., Zody, M.C., Baldwin, J., Devon, K., Dewar, K., Doyle, M., FitzHugh, W. et al. (2001) Initial sequencing and analysis of the human genome. Nature, 409, 860–921.[CrossRef][Medline]

  2. Marsden, R.L., Lee, D., Maibaum, M., Yeats, C. and Orengo, C.A. (2006) Comprehensive genome analysis of 203 genomes provides structural genomics with new insights into protein family space. Nucleic Acids Res., 34, 1066–1080.[Abstract/Free Full Text]

  3. Bairoch, A., Boeckmann, B., Ferro, S. and Gasteiger, E. (2004) Swiss-Prot: juggling between evolution and stability. Brief. Bioinform., 5, 39–55.[Abstract/Free Full Text]

  4. Boeckmann, B., Bairoch, A., Apweiler, R., Blatter, M.C., Estreicher, A., Gasteiger, E., Martin, M.J., Michoud, K., O'Donovan, C., Phan, I. et al. (2003) The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003. Nucleic Acids Res., 31, 365–370.[Abstract/Free Full Text]

  5. Birney, E., Andrews, D., Caccamo, M., Chen, Y., Clarke, L., Coates, G., Cox, T., Cunningham, F., Curwen, V., Cutts, T. et al. (2006) Ensembl 2006. Nucleic Acids Res., 34, D556–D561.[Abstract/Free Full Text]

  6. Dowell, R.D., Jokerst, R.M., Day, A., Eddy, S.R. and Stein, L. (2001) The distributed annotation system. BMC Bioinform., 2, 7.[CrossRef][Medline]

  7. Cochrane, G., Aldebert, P., Althorpe, N., Andersson, M., Baker, W., Baldwin, A., Bates, K., Bhattacharyya, S., Browne, P., van den, B.A. et al. (2006) EMBL Nucleotide Sequence Database: developments in 2005. Nucleic Acids Res., 34, D10–D15.[Abstract/Free Full Text]

  8. Benson, D.A., Karsch-Mizrachi, I., Lipman, D.J., Ostell, J. and Wheeler, D.L. (2006) GenBank. Nucleic Acids Res., 34, D16–D20.[Abstract/Free Full Text]

  9. Okubo, K., Sugawara, H., Gojobori, T. and Tateno, Y. (2006) DDBJ in preparation for overview of research activities behind data submissions. Nucleic Acids Res., 34, D6–D9.[Abstract/Free Full Text]

  10. Cochrane, G., Bates, K., Apweiler, R., Tateno, Y., Mashima, J., Kosuge, T., Mizrachi, I.K., Schafer, S. and Fetchko, M. (2006) Omics, in press.

  11. Wu, C.H., Apweiler, R., Bairoch, A., Natale, D.A., Barker, W.C., Boeckmann, B., Ferro, S., Gasteiger, E., Huang, H., Lopez, R. et al. (2006) The Universal Protein Resource (UniProt): an expanding universe of protein information. Nucleic Acids Res., 34, D187–D191.[Abstract/Free Full Text]

  12. Chothia, C. and Lesk, A.M. (1987) The evolution of protein structures. Cold Spring Harb. Symp. Quant. Biol., 52, 399–405.[Abstract/Free Full Text]

  13. Abascal, F. and Valencia, A. (2003) Automatic annotation of protein function based on family identification. Proteins, 53, 683–692.[CrossRef][Web of Science][Medline]

  14. Laskowski, R.A., Watson, J.D. and Thornton, J.M. (2005) ProFunc: a server for predicting protein function from 3D structure. Nucleic Acids Res., 33, W89–W93.[Abstract/Free Full Text]

  15. Altschul, S.F., Gish, W., Miller, W., Myers, E.W. and Lipman, D.J. (1990) Basic local alignment search tool. J. Mol. Biol., 215, 403–410.[CrossRef][Web of Science][Medline]

  16. Quevillon, E., Silventoinen, V., Pillai, S., Harte, N., Mulder, N., Apweiler, R. and Lopez, R. (2005) InterProScan: protein domains identifier. Nucleic Acids Res., 33, W116–W120.[Abstract/Free Full Text]

  17. Madera, M., Vogel, C., Kummerfeld, S.K., Chothia, C. and Gough, J. (2004) The SUPERFAMILY database in 2004: additions and improvements. Nucleic Acids Res., 32, D235–D239.[Abstract/Free Full Text]

  18. Krissinel, E. and Henrick, K. (2004) Secondary-structure matching (SSM), a new tool for fast protein structure alignment in three dimensions. Acta Crystallogr. D Biol. Crystallogr., 60, 2256–2268.[CrossRef][Medline]

  19. Laskowski, R.A., Watson, J.D. and Thornton, J.M. (2005) Protein function prediction using local 3D templates. J. Mol. Biol., 351, 614–626.[CrossRef][Web of Science][Medline]

  20. Shanahan, H.P., Garcia, M.A., Jones, S. and Thornton, J.M. (2004) Identifying DNA-binding proteins using structural motifs and the electrostatic potential. Nucleic Acids Res., 32, 4732–4741.[Abstract/Free Full Text]

  21. Watson, J.D., Laskowski, R.A. and Thornton, J.M. (2005) Predicting protein function from sequence and structural data. Curr. Opin. Struct. Biol., 15, 275–284.

  22. Torrance, J.W., Bartlett, G.J., Porter, C.T. and Thornton, J.M. (2005) Using a library of structural templates to recognise catalytic sites and explore their evolution in homologous families. J. Mol. Biol., 347, 565–581.

  23. Stockwell, G.R. and Thornton, J.M. (2006) Conformational diversity of ligands bound to proteins. J. Mol. Biol., 356, 928–944.

  24. Glaser, F., Morris, R.J., Najmanovich, R.J., Laskowski, R.A. and Thornton, J.M. (2006) A method for localizing ligand binding pockets in protein structures. Proteins, 62, 479–488.

  25. Henrick, K. and Thornton, J.M. (1998) PQS: a protein quaternary structure file server. Trends Biochem. Sci., 23, 358–361.

  26. Zhu, H., Domingues, F.S., Sommer, I. and Lengauer, T. (2006) NOXclass: prediction of protein–protein interaction types. BMC Bioinform., 7, 27.

  27. Blom, N., Gammeltoft, S. and Brunak, S. (1999) Sequence and structure-based prediction of eukaryotic protein phosphorylation sites. J. Mol. Biol., 294, 1351–1362.

  28. Julenius, K., Molgaard, A., Gupta, R. and Brunak, S. (2005) Prediction, conservation analysis, and structural characterization of mammalian mucin-type O-glycosylation sites. Glycobiology, 15, 153–164.

  29. Bendtsen, J.D., Nielsen, H., von Heijne, G. and Brunak, S. (2004) Improved prediction of signal peptides: SignalP 3.0. J. Mol. Biol., 340, 783–795.

  30. Emanuelsson, O., Nielsen, H., Brunak, S. and von Heijne, G. (2000) Predicting subcellular localization of proteins based on their N-terminal amino acid sequence. J. Mol. Biol., 300, 1005–1016.

  31. Jones, D.T., Taylor, W.R. and Thornton, J.M. (1994) A model recognition approach to the prediction of all-helical membrane protein structure and topology. Biochemistry, 33, 3038–3049.

  32. Orengo, C.A. and Taylor, W.R. (1996) SSAP: sequential structure alignment program for protein structure comparison. Methods Enzymol., 266, 617–635.

  33. Jones, D.T., Bryson, K., Coleman, A., McGuffin, L.J., Sadowski, M.I., Sodhi, J.S. and Ward, J.J. (2005) Prediction of novel and analogous folds using fragment assembly and fold recognition. Proteins, 61(Suppl. 7), 143–151.

  34. Laskowski, R.A., Rullmann, J.A., MacArthur, M.W., Kaptein, R. and Thornton, J.M. (1996) AQUA and PROCHECK-NMR: programs for checking the quality of protein structures solved by NMR. J. Biomol. NMR, 8, 477–486.

  35. Bryson, K., McGuffin, L.J., Marsden, R.L., Ward, J.J., Sodhi, J.S. and Jones, D.T. (2005) Protein structure prediction servers at University College London. Nucleic Acids Res., 33, W36–W38.

  36. Pearl, F., Todd, A., Sillitoe, I., Dibley, M., Redfern, O., Lewis, T., Bennett, C., Marsden, R., Grant, A., Lee, D. et al. (2005) The CATH Domain Structure Database and related resources Gene3D and DHS provide comprehensive domain family information for genome analysis. Nucleic Acids Res., 33, D247–D251.

  37. Andreeva, A., Howorth, D., Brenner, S.E., Hubbard, T.J., Chothia, C. and Murzin, A.G. (2004) SCOP database in 2004: refinements integrate structure and sequence family data. Nucleic Acids Res., 32, D226–D229.

  38. Mulder, N.J., Apweiler, R., Attwood, T.K., Bairoch, A., Bateman, A., Binns, D., Bradley, P., Bork, P., Bucher, P., Cerutti, L. et al. (2005) InterPro, progress and status in 2005. Nucleic Acids Res., 33, D201–D205.

  39. Finn, R.D., Mistry, J., Schuster-Bockler, B., Griffiths-Jones, S., Hollich, V., Lassmann, T., Moxon, S., Marshall, M., Khanna, A., Durbin, R. et al. (2006) Pfam: clans, web tools and services. Nucleic Acids Res., 34, D247–D251.

  40. Letunic, I., Copley, R.R., Pils, B., Pinkert, S., Schultz, J. and Bork, P. (2006) SMART 5: domains in the context of genomes and networks. Nucleic Acids Res., 34, D257–D260.

  41. Bateman, A., Coin, L., Durbin, R., Finn, R.D., Hollich, V., Griffiths-Jones, S., Khanna, A., Marshall, M., Moxon, S., Sonnhammer, E.L. et al. (2004) The Pfam protein families database. Nucleic Acids Res., 32, D138–D141.

  42. Hulo, N., Bairoch, A., Bulliard, V., Cerutti, L., De Castro, E., Langendijk-Genevaux, P.S., Pagni, M. and Sigrist, C.J. (2006) The PROSITE database. Nucleic Acids Res., 34, D227–D230.

  43. Servant, F., Bru, C., Carrere, S., Courcelle, E., Gouzy, J., Peyruc, D. and Kahn, D. (2002) ProDom: automated clustering of homologous domains. Brief. Bioinform., 3, 246–251.

  44. Bru, C., Courcelle, E., Carrere, S., Beausse, Y., Dalmar, S. and Kahn, D. (2005) The ProDom database of protein domain families: more emphasis on 3D. Nucleic Acids Res., 33, D212–D215.

  45. Attwood, T.K., Bradley, P., Flower, D.R., Gaulton, A., Maudling, N., Mitchell, A.L., Moulton, G., Nordle, A., Paine, K., Taylor, P. et al. (2003) PRINTS and its automatic supplement, prePRINTS. Nucleic Acids Res., 31, 400–402.

  46. Yeats, C., Maibaum, M., Marsden, R., Dibley, M., Lee, D., Addou, S. and Orengo, C.A. (2006) Gene3D: modelling protein structure, function and evolution. Nucleic Acids Res., 34, D281–D284.

  47. Thomas, P.D., Campbell, M.J., Kejariwal, A., Mi, H., Karlak, B., Daverman, R., Diemer, K., Muruganujan, A. and Narechania, A. (2003) PANTHER: a library of protein families and subfamilies indexed by function. Genome Res., 13, 2129–2141.

  48. Jensen, L.J., Ussery, D.W. and Brunak, S. (2003) Functionality of system components: conservation of protein function in protein feature space. Genome Res., 13, 2444–2449.




This article has been cited by other articles:

G. A. Reeves, K. Eilbeck, M. Magrane, C. O'Donovan, L. Montecchi-Palazzi, M. A. Harris, S. Orchard, R. C. Jimenez, A. Prlic, T. J. P. Hubbard, et al.
The Protein Feature Ontology: a tool for the unification of protein feature annotations
Bioinformatics, December 1, 2008; 24(23): 2767–2772.

F. M. McCarthy, S. M. Bridges, N. Wang, G. B. Magee, W. P. Williams, D. S. Luthe, and S. C. Burgess
AgBase: a unified resource for functional analysis in agriculture
Nucleic Acids Res., January 12, 2007; 35(suppl_1): D599–D603.

