Human Molecular Genetics, Vol 6, 1735-1744, Copyright © 1997 by Oxford University Press
JM Claverie
Research into new methods for identifying genes in anonymous genomic
sequences has been under way for more than 15 years. Over this period, the
field has evolved from designing programs that identify protein-coding
regions in compact mitochondrial or bacterial genomes to the challenge of
predicting the detailed organization of multi-exon vertebrate genes. The
best program currently available perfectly locates more than 80% of the
internal coding exons, and only 5% of its predictions do not overlap a real
exon. Given such accuracy, computational methods are indeed very useful;
however, they do not eliminate the need for experimental validation.
Although performance is satisfactory for identifying the coding moiety of
genes (internal coding exons), determining the full extent of the
transcript (the 5' and 3' extremities of the gene) and locating promoter
regions remain unreliable. As the human and mouse genome sequencing
projects enter production mode, the fully automated annotation of
megabase-long anonymous genomic sequences is the next big challenge in
bioinformatics.
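
The two accuracy figures quoted in the abstract read as exon-level measures: the share of real exons that are located perfectly, and the share of predicted exons that overlap no real exon. The following is a minimal sketch of how such figures could be computed. It assumes that exons are given as inclusive (start, end) coordinate pairs and that "perfectly located" means both boundaries match. The function name and the toy coordinates are hypothetical and are not taken from the article.

```python
def exon_level_accuracy(real_exons, predicted_exons):
    """Return (exact_fraction, wrong_fraction) for inclusive (start, end) exons.

    exact_fraction: share of real exons predicted with both boundaries correct.
    wrong_fraction: share of predicted exons that overlap no real exon.
    """
    real = set(real_exons)
    predicted = set(predicted_exons)
    if not real or not predicted:
        raise ValueError("both exon lists must be non-empty")

    # A real exon counts as exactly located only if an identical interval was predicted.
    exact = sum(1 for exon in real if exon in predicted)

    def overlaps(a, b):
        # Inclusive intervals overlap when each starts before the other ends.
        return a[0] <= b[1] and b[0] <= a[1]

    # A prediction is "wrong" when it shares no position with any real exon.
    wrong = sum(1 for p in predicted if not any(overlaps(p, r) for r in real))

    return exact / len(real), wrong / len(predicted)


# Hypothetical toy annotation: five real exons, five predictions.
real = [(100, 200), (300, 400), (500, 600), (700, 800), (900, 1000)]
predicted = [(100, 200), (300, 400), (500, 600), (700, 790), (1200, 1300)]

exact_fraction, wrong_fraction = exon_level_accuracy(real, predicted)
print(exact_fraction, wrong_fraction)  # 0.6 0.2
```

In this toy case the prediction (700, 790) overlaps a real exon but misses its 3' boundary, so it counts neither as exactly located nor as wrong. That is why the two figures do not sum to one.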
REVIEWS
Computational methods for the identification of genes in vertebrate genomic sequences
Structural and Genetic Information Laboratory, CNRS-EP.91, Marseille, France. jmc@igs.cnrs-mrs.fr