Human Molecular Genetics, 2002, Vol. 11, No. 5 535-546
© 2002 Oxford University Press
Different evolutionary processes shaped the mouse and human olfactory receptor gene families
Division of Human Biology, Fred Hutchinson Cancer Research Center, 1100 Fairview Avenue N., C3-168, Seattle, WA 98109, USA
Received November 2, 2001; Revised and Accepted December 20, 2001.
DDBJ/EMBL/GenBank accession nos BH405737–BH406512.
ABSTRACT
We report a comprehensive comparative analysis of human and mouse olfactory receptor (OR) genes. The OR family is the largest mammalian gene family known. We identify approximately 93% of an estimated 1500 mouse ORs, exceeding previous estimates and the number of human ORs by 50%. Only 20% are pseudogenes, giving a functional OR repertoire in mice that is three times larger than that of human. The proteins encoded by intact human ORs are less highly conserved than those of mouse, in patterns that suggest that even some apparently intact human OR genes may encode non-functional proteins. Mouse ORs are clustered in 46 genomic locations, compared to a much more dispersed pattern in human. We find orthologous clusters at syntenic human locations for most mouse genes, indicating that most OR gene clusters predate primate–rodent divergence. However, many recent local OR duplications in both genomes obscure one-to-one orthologous relationships, thereby complicating cross-species inferences about OR–ligand interactions. Local duplications are the major force shaping the gene family. Recent interchromosomal duplications of ORs have also occurred, but much more frequently in human than in mouse. In addition to clarifying the evolutionary forces shaping this gene family, our study provides the basis for functional studies of the transcriptional regulation and ligand-binding capabilities of the OR gene family.
INTRODUCTION
Mammals are able to detect and discriminate thousands of different odors (1). This capability is important to find food, identify mates and offspring, and avoid danger. The first step in the complex pathway resulting in the sense of smell is the interaction of odorant molecules with olfactory receptors (ORs) in the nose. ORs are G-protein-coupled seven-transmembrane-domain proteins that can trigger a signaling cascade in sensory neurons (2). Recognition of diverse odorants is achieved by using an estimated 1000 OR genes distributed around the rodent (3) and human genomes (4). It is the largest mammalian gene family known, comprising 1/30 to 1/50 of all genes in the genome. However, the evolution, transcriptional regulation and odorant binding capabilities of the OR gene family are still poorly understood.
Over 900 human OR genes were identified recently in the almost complete human genome sequence (4). Approximately 350 of these genes are intact and appear to be functional (5). Most human OR genes are clustered in the genome in arrays that can contain over 100 genes (4). Human OR genes have been found at over 40 locations in the human genome by fluorescence in situ hybridization (FISH) (6,7) and at over 100 locations by sequence analysis (4). The human regions containing OR genes show a bias for chromosomal bands near telomeres and centromeres (6,7). The human OR gene family appears to be evolving quickly: around half of human OR-containing genomic clones hybridize to more than one genomic location, indicating that large blocks of DNA containing these genes have duplicated recently (7). Some of these duplications are so recent that their copy number is polymorphic in the human population (8).
Individual humans vary in their ability to detect some odors, and some specific anosmias have been shown to be genetically determined (9,10). In no case has the molecular basis of variation or deficit in sensory perception been defined. Odorant–receptor relationships are not yet known for any human gene. Functional studies of human OR genes are hindered by the difficulties encountered in attempts to obtain live neurons from human donors (11) and to functionally express OR genes in heterologous cell lines (12). Studies in experimentally tractable model organisms, such as mouse, will be needed to determine the ligand-binding properties of OR genes and to understand how these genes are regulated. The identification of orthologous relationships between human and mouse OR genes will be key to translating data from mouse studies into an understanding of human olfaction. So far, the comparative analysis of only a few pairs of mouse and human orthologous clusters has been reported (13–17).
The murine OR gene family is much less well characterized than the human OR family. In one study, 21 OR genes were found at 11 different genomic locations (18), and various other studies have identified additional loci (19–21). Analysis of small samples of OR genes suggests that most mouse genes are functional, whereas a substantial fraction of OR genes in microsmatic species such as hominoids, Old World monkeys and dolphins are pseudogenes (22,23). Genomic sequences at several mouse OR loci have been recently characterized (13,14,16,24), but these studies give only a limited picture of the gene family. Knowledge of the entire gene family will provide the basis for studies of transcriptional regulation and receptor–ligand interactions in the mouse.
The recent availability of a whole genome shotgun sequence of the mouse (Celera Genomics) has enabled us to assemble an almost complete catalog of OR genes. With this catalog, we describe the evolution of the OR genes and report striking differences between the human and mouse OR gene families with respect to pseudogene content, protein sequence conservation and mechanisms of duplication. The differences we observe between orthologous gene clusters give insights into the pressures and processes that have acted on this gene family during rodent and primate evolution.
RESULTS
Human OR genes are more dispersed in the genome than mouse ORs
In order to determine the genomic distribution of the mouse OR gene family, we identified and mapped OR-containing bacterial artificial chromosome (BAC) clones. BAC clones covering 1.6% of the mouse genome (2471 clones) were positive in a hybridization screen for OR genes using probes made by degenerate PCR. A subset of 94 BACs was subjected to secondary PCR tests; 94% of these clones were confirmed to contain OR genes. Thus, only a small proportion of clones may be false positives due to low-stringency hybridization conditions.
FISH of 272 of the hybridization- and PCR-positive clones (approximately 1.3x clone coverage of the OR subgenome) shows that there are at least 32 cytogenetically distinct OR-containing locations in the mouse genome (Fig. 1), although additional loci can be identified by sequence analysis (see below). Half of the 272 FISH-mapped clones were chosen randomly and the remainder were chosen to ensure good coverage of OR-containing genomic regions (Materials and Methods). End sequences were obtained for the FISH-mapped clones to allow integration of cytogenetic and sequence information and independently verify the map location of OR clusters found by sequence analysis (see below). The number of mouse OR locations found by FISH is fewer than the 42 locations previously observed by FISH in the human genome (7), despite the fact that far more mouse OR clones than human OR clones were analyzed. This result indicates that the human OR gene family is more dispersed than the mouse family. The distribution of locations is also grossly different; only 8/32 (25%) of mouse OR locations are in subtelomeric or pericentromeric bands, compared to 23/42 (55%) of human locations (7).
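The coverage figures quoted above can be checked with simple arithmetic. The sketch below combines counts from this section with the library size (158 900 clones) and estimated 11.2-fold genome coverage given in Materials and Methods. How the 1.3x figure was actually derived is not stated in the text; scaling by the 94% PCR confirmation rate is an assumption made here for illustration, and all variable names are hypothetical.

```python
# Figures quoted in the text and in Materials and Methods
library_clones = 158_900    # arrayed mouse BAC clones screened
library_coverage = 11.2     # estimated fold genome coverage of the library
positive_clones = 2471      # OR hybridization-positive clones
confirmed_fraction = 0.94   # PCR-confirmed fraction in the 94-BAC subset
fish_mapped = 272           # clones mapped by FISH

# Share of library clones that were OR-positive, read as a share of the genome
or_genome_fraction = positive_clones / library_clones

# Assumption: true OR-containing clones cover the OR subgenome at the same
# depth as the whole library covers the genome
true_positive_clones = positive_clones * confirmed_fraction
fish_coverage = fish_mapped / true_positive_clones * library_coverage

print(f"OR-positive share of library: {or_genome_fraction:.1%}")
print(f"approximate FISH clone coverage: {fish_coverage:.1f}x")
```

Under these assumptions the two results round to 1.6% and 1.3x, in line with the values given in the text.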
FISH analysis can also detect recent duplications of OR-containing blocks to cytogenetically distinct chromosomal locations that might not be apparent from sequence analysis. Our FISH results show that such interchromosomal duplication events are less frequent in the mouse genome than in the human genome. Only 4% (10/272) of the mouse OR-containing clones resulted in FISH signals at two or more genomic locations (Fig. 1), while around half of human OR-containing clones do so (7). Multiple hybridization signals indicate recent large duplication(s) involving at least some of the sequence contained in the clone.
The mouse genome contains approximately 1500 OR genes
By searching Celera's mouse genome assembly, we have identified 866 intact, full-length OR genes and 340 apparent pseudogenes. Partial sequence data are available for an additional 187 genes, making a total of 1393 OR sequences. Our database-searching strategy used 34 OR protein queries to search the mouse genome (Materials and Methods). Our original BLAST searches were sensitive enough to find (and eliminate from further analysis) 95 sequences that matched a non-olfactory G protein-coupled receptor better than an OR. We are reasonably confident that our analysis is restricted to bona fide OR genes; the characteristic protein sequence motifs of this gene family are remarkably well conserved in the genes identified (see below). In addition, although intact sequences were not selected with a percent identity-based cutoff, all 866 intact genes share ≥40% amino acid identity with an annotated OR sequence from the public databases. To date, we have experimentally validated over 400 of the identified OR genes by sequencing cDNA clones derived from mouse olfactory neuroepithelium (J.Young, J.Ross, E.Williams, T.Newman, L.Tonnes-Priddy, R.Lane and B.Trask, manuscript in preparation).
Two subsets of the OR genes were analyzed further. The full-length dataset comprises the 1054 sequences of both genes and pseudogenes, but not sequences interrupted by repeats or by ends or gaps in scaffold sequences. The comprehensive dataset contains all 1468 OR sequences identified, including all partial sequences, some of which are redundant with one another (i.e. they represent short scaffold sequences, which should have assembled with other short scaffolds or into gaps in the larger scaffolds).
In order to assess the sensitivity of our method of database mining and to estimate the coverage and sequence error rate of the OR subgenome in the Celera assembly, a non-redundant set of all 155 previously identified mouse OR nucleotide sequences was downloaded from GenBank (25). Of these 155 sequences, 143 (93%) match a sequence in the comprehensive dataset with ≥98% identity over ≥200 bp. Therefore, we estimate that the complete mouse genome contains at least 1510 OR sequences (1393 ÷ 0.93).
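The extrapolation can be reproduced from the counts quoted in the text. The sketch below is illustrative only (variable names are not from the original analysis). It shows that the figure of 1510 corresponds to the unrounded recovery fraction 143/155, whereas dividing by the rounded value 0.93 gives a slightly lower number.

```python
# Counts quoted in the text
known_genbank = 155   # previously identified mouse OR sequences
recovered = 143       # of those, matched in the comprehensive dataset
identified = 1393     # OR sequences identified in the assembly

recovery = recovered / known_genbank
estimate_unrounded = identified / recovery   # uses 143/155
estimate_rounded = identified / 0.93         # uses the rounded percentage

print(f"recovery fraction:       {recovery:.3f}")
print(f"estimate using 143/155:  {estimate_unrounded:.0f}")
print(f"estimate using 0.93:     {estimate_rounded:.0f}")
```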
Failure to find 12 GenBank OR sequences is not due to insensitivity in our OR gene-finding method. When these sequences were used to search the entire Celera mouse genome assembly, none of the 12 genes was present. One of the non-matching GenBank sequences (GenBank accession no. X89682) is mislabeled as a mouse sequence; it must be of human origin, since it exactly matches several human genomic sequences in GenBank.
The mouse OR gene family has 20% pseudogenes
Of the 340 apparent pseudogenes in the comprehensive dataset that are not interrupted by gaps in the sequence data, 134 (39%) are interrupted by interspersed repeat sequences, 27 are not interrupted by any recognizable repeat, but do not align to other ORs over their entire length, and the remaining 179 are full length, but contain one or more stop codons and/or frameshifting errors.
Based on Celera sequence, 28% of OR sequences appear to encode pseudogenes, but some of the full-length pseudogenes are likely to be intact genes with sequencing errors, since the Celera assembly is still in draft form. To estimate the rate of frameshifting sequencing errors, we compared Celera and GenBank sequences for the 123 Celera sequences (including 15 pseudogenes) that matched a GenBank OR with ≥99.5% identity. Six of these Celera sequences have one or more single base pair insertions or deletions in their coding region as compared to the GenBank sequence. In all six cases, the Celera sequence appears to be a pseudogene, and the GenBank sequence appears intact, strongly suggesting that the discrepancy is due to an error in the Celera sequence (we encountered nine frameshift errors in 93 kb sequence surveyed). Given this error rate, we estimate that approximately 70 of the apparent pseudogenes are actually intact, yielding approximately 940 intact genes and 250 pseudogene sequences, or a pseudogene fraction of 20%. If this 20% rate is applied to our whole-genome estimate, we extrapolate a total of approximately 1210 intact genes and 300 pseudogenes.
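Some of the figures in this paragraph follow directly from the quoted counts. The sketch below reproduces only those steps: the uncorrected pseudogene fraction, the observed frameshift error rate and the whole-genome extrapolation. It does not attempt to reconstruct how the correction of approximately 70 genes was derived, since the text does not give that calculation; variable names are illustrative.

```python
# Counts quoted in the text
intact_observed = 866
pseudogenes_observed = 340
frameshift_errors = 9
surveyed_kb = 93
genome_estimate = 1510        # estimated total OR sequences
corrected_fraction = 0.20     # pseudogene fraction after error correction

# Uncorrected pseudogene fraction among genes with full sequence data
raw_fraction = pseudogenes_observed / (intact_observed + pseudogenes_observed)

# Frameshift errors per kb of surveyed sequence
errors_per_kb = frameshift_errors / surveyed_kb

# Applying the corrected fraction to the whole-genome estimate
pseudogenes_genome = genome_estimate * corrected_fraction
intact_genome = genome_estimate - pseudogenes_genome

print(f"uncorrected pseudogene fraction: {raw_fraction:.0%}")
print(f"frameshift errors per kb: {errors_per_kb:.3f}")
print(f"extrapolated intact / pseudogenes: "
      f"{intact_genome:.0f} / {pseudogenes_genome:.0f}")
```

The extrapolated values (about 1208 and 302) match the rounded figures of approximately 1210 and 300 given above.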
Mouse OR clusters map to 46 genomic locations
The 1468 OR genes in the comprehensive dataset derive from 243 of Celera's scaffold sequences, reflecting the clustered organization of these genes in the genome (see below). Scaffolds containing 1267 (86%) of these genes could be assigned to 46 genomic locations using two complementary methods (Fig. 1). End sequences of FISH-mapped BAC sequences were used to map 65 scaffold sequences; Celera used the genetic or radiation-hybrid map positions of markers in the sequence to map 75 scaffold sequences (details in legend to Fig. 1). The number of mouse OR locations is far fewer than the 104 OR-gene locations in the human genome (4). We have identified OR sequences in Celera scaffolds corresponding to every OR location detected by FISH with at least two clones. Sequence analysis uncovered an additional 12 OR-containing sites not detected by our FISH analysis of 272 BACs (see above). Missing locations are expected, since the clones analyzed by FISH represent only a 1.3-fold coverage of the OR subgenome. Of these 12 sites, eight contain only one OR gene.
OR genes are arranged in the mouse genome in clusters containing an average of 16 genes with average gene-to-gene spacing of 21 kb. We examined the spacing between all genes identified (comprehensive dataset) and found that the distance between neighboring genes on the same scaffold varies considerably, from 318 bp to >5 Mb, although 90% of distances are <40 kb. In the eight cases where gene-to-gene distance is >0.5 Mb, it appears that two distinct clusters are present in the same scaffold sequence (the spacing between the two clusters is much more than the average spacing within the clusters). Using 0.5 Mb as the cutoff distance for distinguishing OR clusters, the average gene-to-gene spacing within clusters is 21 kb, but is highly variable (SD = 26 kb). Gene-to-gene distances may partly reflect the requirement for space upstream of genes for 5' untranslated exons and transcriptional control regions. Only 10/40 (25%) of genes with another OR gene <5 kb upstream have full length sequence available and are apparently intact, compared to 59% for all genes, suggesting that genes without these upstream sequences degenerate into pseudogenes.
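The cluster definition used here (neighboring genes on the same scaffold belong to one cluster unless they are more than 0.5 Mb apart) can be sketched as follows. This is a minimal illustration, not the code used in the study: the coordinates are hypothetical, and for simplicity distances are measured between gene start positions rather than between gene boundaries.

```python
from statistics import mean

CLUSTER_GAP_BP = 500_000  # 0.5 Mb cutoff described in the text


def split_into_clusters(positions, max_gap=CLUSTER_GAP_BP):
    """Group gene positions on one scaffold into clusters.

    A new cluster starts whenever the distance to the previous gene
    exceeds max_gap.
    """
    clusters = []
    for pos in sorted(positions):
        if clusters and pos - clusters[-1][-1] <= max_gap:
            clusters[-1].append(pos)
        else:
            clusters.append([pos])
    return clusters


def within_cluster_spacings(clusters):
    """Return distances between consecutive genes inside each cluster."""
    return [b - a for c in clusters for a, b in zip(c, c[1:])]


# Hypothetical gene start coordinates (bp) on a single scaffold
example = [10_000, 31_000, 52_500, 70_000, 900_000, 915_000, 2_000_000]
clusters = split_into_clusters(example)
spacings = within_cluster_spacings(clusters)

print([len(c) for c in clusters])   # cluster sizes
print(mean(spacings))               # mean within-cluster spacing in bp
```

Singleton clusters contribute no spacings, so they affect the cluster-size distribution but not the average gene-to-gene distance.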
To estimate cluster size, we considered only OR sequences on Celera scaffolds spanning over 1 Mb, as these are more likely to contain complete OR clusters than are shorter scaffolds. There are 72 such clusters containing a total of 1164 genes in the comprehensive dataset. Of these clusters, 20 contain only one gene (1.7% of genes), but most genes (1018 or 87%) are in clusters of 10 or more genes. In contrast, 50 human genes are in singleton clusters (4), reflecting the greater genomic dispersion of the human gene family. Our ability to determine cluster size is limited by gaps between Celera scaffold sequences; it will therefore be an underestimate of true cluster size. With this caveat, cluster size ranges from 1 to 98 genes (mean = 16) and is highly variable (SD = 22). The physical size of clusters is also very variable and ranges from 910 bp (one gene) to 2 Mb (mean = 340 kb; SD = 470 kb).
OR proteins are less conserved in human than mouse
Alignment of protein translations of the 866 intact mouse OR genes reveals highly conserved motifs in some regions of transmembrane domains (TM) 2, 3, 6 and 7, as well as at several extracellular cysteine residues and some other small motifs (e.g. S-Y in TM5). Other positions in the protein are highly variable. Three positions are absolutely conserved in all 866 mouse sequences, and there are 16 positions where ≥99% of proteins have the same amino acid. These positions are less conserved in the 347 intact human OR genes reported by Zozulya et al. (5). There are no absolutely conserved positions in the human proteins, and only two positions where ≥99% of proteins have the same amino acid. Many other positions show lower conservation in the human proteins than in mouse. For example, an arginine residue is found in the conserved MAYDRYVAIC motif (TM3) in 98% of mouse sequences, but only 89% of human sequences (Fig. 2). Sequence conservation can also be measured using information theory, where the information content of each position in the sequence is scored on the basis of the distribution of amino acids present (26), with conserved positions scoring more than variable positions. The total information content of the mouse and human proteins is 668.6 ± 0.2 bits and 645.6 ± 1.1 bits, respectively, confirming that the human ORs are less conserved than the mouse family.
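The information-theoretic measure can be illustrated with a short sketch. It assumes the common formulation in which the information at an alignment position is log2(20) minus the Shannon entropy of the observed amino acid distribution. Whether the study applied small-sample corrections or handled gaps differently is not stated here, so the code omits both; the toy alignment is invented for illustration.

```python
from collections import Counter
from math import log2


def position_information(column, alphabet_size=20):
    """Information content (bits) of one alignment column.

    Computed as log2(alphabet_size) minus the Shannon entropy of the
    residue frequencies; no small-sample correction is applied.
    """
    counts = Counter(column)
    n = len(column)
    entropy = -sum((c / n) * log2(c / n) for c in counts.values())
    return log2(alphabet_size) - entropy


def total_information(alignment):
    """Sum per-column information over equal-length aligned sequences."""
    return sum(position_information(col) for col in zip(*alignment))


# Toy ungapped alignment around the TM3 motif (sequences are hypothetical)
toy = ["MAYDRYVAIC", "MAYDRYVAIC", "MAYDRFLAIC", "MAFDHYVAIC"]

print(f"{position_information('RRRR'):.2f} bits for a fully conserved column")
print(f"{position_information('RRHH'):.2f} bits for a 50/50 column")
print(f"{total_information(toy):.2f} bits in total for the toy alignment")
```

A fully conserved position scores the maximum of log2(20), about 4.32 bits, and each position loses information as its residue distribution becomes more even.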
Recent tandem events have shaped the OR gene family
An alignment of all human and mouse OR genes shows that genes near each other in the genome are often very similar in sequence, implying that tandem events (duplications and/or gene conversions) are the major evolutionary force shaping the diversity of this gene family (Fig. 3). For 823 (78%) of the 1054 mouse genes in the full-length dataset, the closest mouse relative resides in the same genomic cluster. In the human genome, tandem events are also the major force, with 484 (73%) of 665 full-length genes related most closely to another gene in the same cluster. These tandem duplications are also evident on a phylogenetic tree (Fig. 4).
However, a subset of human OR genes has recently duplicated interchromosomally, resulting in highly similar genes in distant genomic locations (Fig. 3B, arrow). There are 203 pairs of human genes that share >90% amino acid identity. Of these 203 very similar gene pairs, 120 (59%), involving 79 genes, map to different genomic clusters, indicating recent interchromosomal duplications (and/or tandem duplications followed by gross chromosomal rearrangements). In contrast, in mouse only 33/207 (16%) of the gene pairs of >90% identity are in different clusters, indicating that most recent duplications in mouse are local in nature. Of the 79 dispersed human genes, 40 are members of the very large OR7E subfamily of pseudogenes, which has recently scattered to at least 35 places around the genome (4). Many of the other dispersed genes are in subfamilies OR4F and OR4G, representing OR genes in a multicopy subtelomeric sequence block (27) or in subfamily OR4K, representing genes found near the centromeres of several, predominantly acrocentric, chromosomes (see below and Fig. 5).
On a smaller scale, the percent identity of sequence pairs is weakly and inversely correlated to their physical separation (Fig. 3C). Within clusters, neighboring genes are also often in the same transcriptional orientation. Of the 1225 neighbor pairs in our comprehensive dataset, 850 (69%) are in the same orientation. This percentage is significantly more than the 50% expected if assortment was random and indicates that the tandem duplications are not generally associated with inversions.
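The text does not say which statistical test was used for the orientation comparison. As one possible check, the sketch below applies a normal approximation to the binomial distribution under the null hypothesis that each neighbor pair is equally likely to be in the same or opposite orientation. Treating the pairs as independent is a simplifying assumption, since adjacent pairs share a gene.

```python
from math import erfc, sqrt

# Counts quoted in the text
same_orientation = 850
neighbor_pairs = 1225
p_null = 0.5   # expected share of same-orientation pairs if random

observed_share = same_orientation / neighbor_pairs
expected = neighbor_pairs * p_null
sd = sqrt(neighbor_pairs * p_null * (1 - p_null))

# z-score and one-sided upper-tail probability (normal approximation)
z = (same_orientation - expected) / sd
p_one_sided = 0.5 * erfc(z / sqrt(2))

print(f"observed share: {observed_share:.0%}")
print(f"z = {z:.2f}")
print(f"one-sided p = {p_one_sided:.1e}")
```

With a z-score above 13, the excess of same-orientation pairs is far beyond what random orientation would produce under this model.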
Mouse locations are syntenic to human OR loci
Phylogenetic trees constructed using all full-length mouse and human OR sequences show that most major clades contain both mouse and human sequences (Fig. 6). This pattern suggests that most OR subfamilies were present in the common ancestor. We could identify orthologous locations in the human genome for 27 of the mouse OR-containing genomic locations, which together contain 1170 (92%) of the 1267 mapped OR genes (Fig. 5). Thus, most OR clusters were present when the primate and rodent lineages diverged and still exist now. The chromosomal locations of most pairs of orthologous clusters correspond to known syntenic blocks (http://www.ncbi.nlm.nih.gov/Homology/index.html) and include several previously described orthologous OR clusters (13–17). Figure 4 illustrates that mouse and human genes from two pairs of locations that we identify as orthologous indeed belong, with few exceptions, to the same major phylogenetic clades. Some matches between mouse and human gene clusters (Fig. 5, labeled in red) are not part of known syntenic relationships. In human, three groups of such clusters represent the genes subject to interchromosomal duplications (see above), both interstitially and in subtelomeric and pericentromeric regions. In mouse, two groups of genes appear to have spread to multiple chromosomes, although not near telomeres or centromeres.
Of the 1054 mouse genes in the full-length dataset, 836 (79%) have a human match of ≥70% nucleotide identity over ≥200 bp, indicating that a potential ortholog can be found. However, several genes may share the same ortholog (below and Fig. 5C) due to the expansion of many clusters by local duplications in both the mouse and human genomes. This phenomenon is particularly common in mouse, and two striking examples are shown in Figure 5A. A more detailed examination of mouse–human orthologous relationships (e.g. Fig. 5C) reveals many changes in both species since the primate and rodent lineages diverged, with the result that few genes have a single ortholog. Most mouse genes in the full-length dataset (809/1054 or 77%) have a closer relative in mouse than in human. Similarly, most human genes (548/906 or 60%) have a closer relative in human than in mouse. This observation is supported by a phylogenetic tree (Fig. 6), which shows many groups of mouse or human sequences that are more similar to one another than to any sequence from the other species. These species-specific sequence groups could arise from duplication and/or gene conversion since primate–rodent divergence or from loss of the orthologous gene(s) (loss from the genome or because the datasets are incomplete). Most of the 809 mouse–mouse best matches arose since primate–rodent divergence, since their level of identity is greater than is typical for orthologous genes of this family (Fig. 7).
DISCUSSION
We report here the results of a comprehensive analysis of orthologous and syntenic relationships of mouse and human OR genes. This analysis reveals striking differences in the size, functional constraints and evolutionary processes that have acted on the OR family since primate–rodent divergence.
The availability of an almost complete genome sequence has allowed us to identify the sequences of approximately 1400 mouse OR genes. From this number, we estimate that there are at least 1500 OR genes in the mouse genome. This is 50% more than previously predicted (3), and ~50% more than is found in the human genome (4).
Frequent local duplications of OR genes in the rodent lineage are responsible for much of the size difference between the mouse and human OR families. These events are apparent on phylogenetic trees (Figs 4 and 6) and by comparison of the maps of the two OR subgenomes (Fig. 5). Several instances of new mouse OR genes created by local duplications were also noted in previous studies that compared clusters in the mouse and human genomes (13–15,17). Deletions of OR genes in the human lineage (14) have further exacerbated the differences between the two families. However, genes from both species persist in most major phylogenetic clades. In addition, murine clusters containing 92% of mapped OR genes have orthologous OR loci in the human genome (Fig. 5), showing that the gross arrangement of these gene clusters was established before primate–rodent divergence.
Despite their greater number, mouse OR genes are found at markedly fewer locations than human ORs (46 versus 104) (4). Whereas most recent activity in both mouse and human OR families has involved local duplication and/or gene conversion events, a subset of human OR genes has undergone recent duplication to distant locations in the genome, accounting for most of the increased genomic dispersion of the human OR gene family. These interchromosomal duplications often involve large blocks of sequence and are apparent by both our comparative sequence analyses and FISH (7,8). We observe three groups of human OR genes involved in interchromosomal duplications (Fig. 5): the OR7E genes (4), the OR4F and OR4G genes located in subtelomeric regions (8,27) and a previously undescribed group, the OR4K genes, in the pericentromeric regions of several, predominantly acrocentric, chromosomes. Duplication of the OR7E genes to at least 35 locations is especially surprising given that they all are pseudogenes (4); their duplication cannot be driven by selective pressure to increase the functional OR gene repertoire. In contrast, some of the subtelomeric and pericentromeric OR genes appear to be intact, and at least one is transcribed (28). The unusual evolutionary dynamics of these regions (27,29) may contribute to functional diversity in the human OR gene family.
The recent changes in the human and mouse OR families are reminiscent of the pattern of evolutionary changes observed for nematode chemosensory receptor genes (30,31) and other gene families for which sequence diversity is important, such as the eosinophil-associated RNase (32), MHC and immunoglobulin gene families (33). The evolution of these gene families is consistent with the gene birth-and-death model, where new gene family members have arisen by gene duplication, followed by divergence and maintenance of some duplicate genes, and deletion or accumulation of mutations in other genes (33). In this model, the balance between rates of duplication and loss determines the size and pseudogene content of the gene family. Both the mouse and human families show a high rate of gene birth, as evidenced by the fact that over half of all genes in both species match another gene within the same genome better than one in the genome of the other species. Both species are losing genes, although far fewer pseudogenes are found in mouse than in human.
Approximately 20% of mouse OR genes are pseudogenes. Our estimate of pseudogene content is higher than was estimated previously from analysis of 33 genes (22). However, our sequence data-mining strategy enabled us to identify 134 pseudogenes (50% of the total) that are interrupted by interspersed repeat sequences. These pseudogenes would not have been amplified under the degenerate primer-based strategy used previously. The human OR gene family contains a much greater proportion of pseudogenes (63%) (4). This marked difference suggests a greater selective pressure in mouse to maintain a large functional OR repertoire, but may also be partly due to a faster elimination of pseudogenes from the mouse than the human genome (34).
The differences in family size and pseudogene fraction mean that the functional OR repertoire of mouse is more than three times larger than that of human (1180 versus 350 intact genes). The smaller repertoire is consistent with the observation that humans have a poor sense of smell compared to other mammals (1). Humans' decreased dependence on olfaction for survival, compensated for by an increased reliance on vision and hearing, would result in lower selective pressure to maintain or expand olfactory capabilities.
Our analysis of proteins encoded by intact mouse OR genes shows that they are more conserved than human ORs. Two possible explanations for the increased diversity in the human family are (i) positive selection to provide a diverse repertoire of odorant binding receptors and (ii) lower selective constraints on protein sequence in the human OR family. If the increased diversity was due to positive selection, one would expect most of the increased diversity to be in the variable regions thought to be important in ligand binding. However, we observe that the highly conserved parts of the protein thought to be important for functions common to all ORs are less conserved in the human family than the mouse family (e.g. Fig. 2). Loss of these conserved residues suggests that some apparently intact human ORs may not encode functional proteins.
Although a single clear ortholog can be identified for some OR genes (13,16), most genes have more than one ortholog due to numerous changes in the mouse and human families since rodent–primate divergence. Local and interchromosomal duplications and/or gene-conversion events obscure many of the relationships between human and mouse genes that were once functional orthologs. Odorant ligands have been identified for a small number of rodent OR genes (35,36), and correlation with OR protein sequence could clarify structure–function relationships. Our analyses show that caution should be exercised when inferring receptor–ligand relationships across species, especially since even slight changes in receptor sequence can change the ligand that elicits the largest response (36).
OR genes are subject to a remarkable, but as yet undiscovered transcriptional control mechanism. Each OR gene is expressed in only one of four physical zones of the olfactory epithelium (37), and each olfactory neuron within a zone expresses only one allele (38) of a single OR gene (35,40). It is not known whether OR genes must be clustered in the genome for correct expression, or whether this arrangement exists simply because the gene family has expanded by tandem duplications. Control of expression may operate at the level of individual genes (via transcription factors or recombination), at the level of OR gene clusters (via a locus control mechanism or regulation of chromatin structure) or by stochastic mechanisms (37,38). The genomic context of expressed OR genes, as well as comparisons between the orthologous and paralogous OR genes and clusters identified by our study, will help elucidate these transcriptional control mechanism(s). Our comparative analysis of the mouse and human OR gene families will be useful in the study of this and other functional and evolutionary aspects of mammalian olfaction.
MATERIALS AND METHODS
Identification of BACs containing OR genes
Clones containing OR genes were identified by low-stringency hybridization with a probe generated by degenerate PCR of genomic DNA. Degenerate primers used for PCR matched conserved regions (transmembrane domains 2, 3, 6 and 7) of the gene family. One novel primer, TM3deg1 (5'-CAIA(C/T)IGCIAC(G/A)(A/T)AIC(T/G)(G/A)TC(G/A)TA-3') was designed and other primer sequences were as described previously: OR5B, OR3B (41); P24, P28 (37); and P26 and P27 (35). Various primer combinations (OR5B/OR3B and P24/P28 with annealing temperature 40°C; TM3deg1/P28 and P26/P27 at 45°C) were used to amplify segments of mouse genomic DNA, as no single set of primers was expected to identify all OR genes. These low annealing temperatures were empirically determined to generate OR-specific probes when tested on Southern blots of BACs with known OR gene content. An initial denaturing step of 94°C for 5 min was followed by 35 cycles of 94°C for 30 s, annealing at temperatures stated above for 1 min, and extension at 72°C for 1 min, with a final extension at 72°C for 10 min. PCR products were labeled by inclusion of digoxigenin-11-dUTP (Roche Molecular Biochemicals) in the reaction and hybridized to nylon filters on which 158 900 recombinant clones (an estimated 11.2-fold genome coverage) from a mouse BAC library (RPCI-23) had been arrayed (42). A probe generated from human genomic DNA using the same PCR method was hybridized to filters containing 109 657 clones (6.7x coverage) from a human BAC library (RPCI-11) (43). This strategy anticipates that some OR genes may not be sufficiently similar to the rest of the gene family for both PCR primers to bind, but would be similar enough in the intervening sequence to be detectable by low-stringency hybridization. Filters were therefore hybridized and washed at low stringency (hybridization at 30°C in 5x SSC with 50% formamide; final wash in 2x SSC, 0.1% SDS at 65°C), and detected using chemiluminescence according to protocols recommended by Roche. 
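The degeneracy of the TM3deg1 primer can be worked out directly from its sequence. The sketch below parses the notation used above, where parentheses enclose alternative bases and I denotes inosine. Treating inosine as a single non-degenerate position is an assumption made for counting purposes; function and variable names are illustrative.

```python
import re

# TM3deg1 as written in the text (5' to 3')
TM3DEG1 = "CAIA(C/T)IGCIAC(G/A)(A/T)AIC(T/G)(G/A)TC(G/A)TA"


def parse_primer(primer):
    """Return one tuple of allowed bases per primer position."""
    tokens = re.findall(r"\(([ACGT](?:/[ACGT])+)\)|([ACGTI])", primer)
    return [tuple(alt.split("/")) if alt else (base,) for alt, base in tokens]


positions = parse_primer(TM3DEG1)

length = len(positions)
inosines = sum(1 for p in positions if p == ("I",))

# Degeneracy: number of distinct sequences in the primer mixture,
# counting each inosine as a single base
degeneracy = 1
for allowed in positions:
    degeneracy *= len(allowed)

print(f"length: {length} nt, inosines: {inosines}, degeneracy: {degeneracy}-fold")
```

Six two-fold positions give a 64-fold degenerate mixture, with the four inosines providing additional pairing flexibility without increasing the number of distinct oligonucleotides.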
To ensure high sensitivity, we chose BACs with both strong and weak hybridization signals. Recognizing that this approach could give false positives, we used PCR to confirm the presence of OR genes in 94 of the BACs, testing them with seven degenerate primer combinations (OR5B/OR3B, P24/P28, P24/OR3B, TM3deg1/OR3B, TM3deg1/P28, OR5B/P28, P26/P27). We required that at least one primer pair give a product of the expected size. We obtained end sequences from OR-containing BAC clones as described previously (44) and from a publicly available resource (45).
Chromosomal localization of BAC clones by FISH
BACs were hybridized to mouse mitotic cells fixed to slides using procedures for FISH detailed elsewhere (46). Briefly, mouse metaphase spreads were prepared from spleen cell suspensions after lysis of red blood cells. Splenocytes were cultured for 48 h in lipopolysaccharide to stimulate cell cycling, arrested in mitosis by incubation in colcemid for 10 min prior to harvest, and fixed to glass slides using conventional cytogenetic methods. All BACs were streaked to obtain single colonies, and DNA was prepared using an Autogen 740 robot. BAC DNA was biotinylated by nick translation, and 200 ng was hybridized to chromosomes at 37°C in 50% formamide/2x SSC/10% dextran sulfate in the presence of 10 µg mouse Cot1 DNA, which suppresses labeling of interspersed repetitive elements. After washing in 50% formamide/2x SSC at 42°C, hybridization sites were labeled with avidin-FITC, the cells were washed, and they were then counterstained with DAPI applied in an antifade solution. Images were collected using a Zeiss Axiophot microscope equipped with Chroma Technology spectral filters, a Photometrics Quantix cooled CCD camera, and IpLab Spectrum software. If a clone gave multiple FISH signals, it was streaked to obtain single colonies a second time, in order to exclude the possibility that the multiplicity of signals was due to mixed clones in the probe. From the 2471 OR-positive BACs, 130 BACs were chosen randomly and additional BACs were chosen based on their position in the BC Cancer Agency Genome Sequence Center's (BCGSC) physical map (http://www.bcgsc.bc.ca/). We FISH-mapped clones from any contigs that contained at least two OR hybridization-positive BAC clones, but did not contain any of the 130 randomly chosen clones. We also used the BCGSC contigs to choose BACs overlapping clones that gave multiple FISH signals, and at chromosomal locations where only one randomly chosen BAC had been mapped.
When determining the number of OR-containing genomic loci, we counted only locations confirmed by having signals from at least two OR-containing clones.
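The counting rule in the preceding sentence can be sketched as follows. This is a minimal illustration only: the clone names and location labels are hypothetical, and the sketch assumes that a clone giving several FISH signals contributes to each of its locations.

```python
from collections import defaultdict

def confirmed_loci(fish_signals, min_clones=2):
    """Return locations hit by at least `min_clones` distinct OR-containing clones.

    fish_signals: iterable of (clone_name, location) pairs, one per FISH signal.
    """
    clones_by_locus = defaultdict(set)
    for clone, locus in fish_signals:
        clones_by_locus[locus].add(clone)  # a set ignores repeat signals from one clone
    return sorted(
        locus for locus, clones in clones_by_locus.items() if len(clones) >= min_clones
    )

# Hypothetical example: bac_C gives signals at two locations.
signals = [
    ("bac_A", "7F"), ("bac_B", "7F"),
    ("bac_C", "2E"), ("bac_C", "9A"), ("bac_D", "9A"),
    ("bac_A", "7F"),
]
loci = confirmed_loci(signals)
print(loci)
```

Here location 2E is not counted, because only one clone supports it.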
Sequence database mining
A local database of OR protein sequences was compiled by downloading from GenBank any sequences annotated with the keywords 'olfactory receptor' or 'odorant receptor'. Some lamprey OR sequences were removed, because their closest mammalian homologs were serotonin receptors rather than ORs (47). Other non-OR proteins were used as outgroups, including taste receptors, vomeronasal receptors, adrenergic receptors, melanocortin receptors and serotonin receptors. A similar set of mouse OR nucleotide sequences was downloaded from GenBank (234 sequences). This set was reduced to a non-redundant set of 155 sequences by taking only one representative of each group of sequences showing ≥97% sequence identity.
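One simple way to take a single representative per group of near-identical sequences is a greedy pass, sketched below. The text does not say how groups were formed, so the greedy strategy, the inclusive 97% threshold and the toy identity measure (which assumes equal-length, already aligned strings) are all assumptions made for illustration.

```python
def percent_identity(a, b):
    """Percent identical columns; assumes equal-length, pre-aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)

def nonredundant(seqs, threshold=97.0):
    """Greedy reduction: keep a sequence only if it is below `threshold`
    identity to every representative kept so far."""
    representatives = []
    for name, seq in seqs:
        if all(percent_identity(seq, kept) < threshold for _, kept in representatives):
            representatives.append((name, seq))
    return representatives

# Hypothetical 100-residue toy sequences.
seqs = [
    ("a", "A" * 100),
    ("b", "A" * 98 + "CC"),        # 98% identical to "a": treated as redundant
    ("c", "A" * 90 + "C" * 10),    # 90% identical to "a": kept
]
kept_names = [name for name, _ in nonredundant(seqs)]
print(kept_names)
```

A greedy pass depends on input order; a single-linkage clustering would be an alternative design with the same threshold.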
Celera's mouse genome assembly (http://www.celera.com/) was built from shotgun sequence reads representing a 5.25-fold coverage of the genome. At the time of our analysis (June 2001) it consisted of 19 778 scaffold sequences: sequences within which the order and orientation of the sequence should be correct, but containing gaps whose size can be estimated with reasonable accuracy. Scaffold sequences were searched using gapped tblastn (48) with 34 previously identified OR protein sequence queries, chosen based on phylogenetic diversity as assessed by preliminary sequence analysis (L.Linardopoulou and R.Lane, unpublished data) and by human OR gene classification (49), using one non-pseudogene member of each human OR family where possible. The query set consisted of HORDE genes OR1D5, OR2F1, OR3A3, OR4F5, OR5M8, OR6T1, OR7D2, OR8H2, OR9A4, OR10J1, OR11A2, OR12D3, OR13C5, OR51H1, OR52A1, OR55C1P and OR56A1 (4); genes from the mouse P2 cluster, I7, M50, B1, B2, B5, P2, P3 and P4 (14); four subtelomeric human OR genes, OR-7501A, OR-7501B and OR-7501C (8) and OR4F3; as well as miscellaneous other genes: C3 (36), HSHTPRH06 (50), K18 (37), OR11-8c (51) and an anonymous gene with GenPept accession no. AAC18915. Perl scripts were written to identify all genomic locations in scaffolds where an E score ≤10⁻⁵ was obtained with any of the query sequences and to extract these sequences with 1 kb of additional sequence on each side.
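The hit-extraction step can be sketched as below. The authors used Perl scripts that are not shown, so this Python version is only an illustration under stated assumptions: hits are given as hypothetical `(scaffold, start, end, evalue)` tuples with 1-based inclusive coordinates, the E-score cutoff of 1e-5 is treated as inclusive, and overlapping hits from different queries are merged into one location before 1 kb of flanking sequence is added.

```python
def or_candidate_regions(hits, scaffold_lengths, max_evalue=1e-5, flank=1000):
    """Return (scaffold, start, end) regions, 1-based inclusive, with flanks added."""
    kept = {}
    for scaffold, start, end, evalue in hits:
        if evalue > max_evalue:
            continue  # hit too weak
        lo, hi = sorted((start, end))  # minus-strand hits may have start > end
        kept.setdefault(scaffold, []).append((lo, hi))

    regions = []
    for scaffold in sorted(kept):
        merged = []
        for lo, hi in sorted(kept[scaffold]):
            if merged and lo <= merged[-1][1]:
                # overlaps the previous hit: same genomic location
                merged[-1] = (merged[-1][0], max(merged[-1][1], hi))
            else:
                merged.append((lo, hi))
        for lo, hi in merged:
            # add the flank, clamped to the scaffold ends
            regions.append(
                (scaffold, max(1, lo - flank), min(scaffold_lengths[scaffold], hi + flank))
            )
    return regions

# Hypothetical hits and scaffold lengths.
hits = [
    ("scaffold_1", 5000, 5900, 1e-40),
    ("scaffold_1", 5100, 5950, 1e-30),   # overlaps the first hit
    ("scaffold_1", 9000, 8100, 1e-20),   # minus-strand hit
    ("scaffold_1", 200, 700, 0.5),       # fails the E-score cutoff
    ("scaffold_2", 300, 1200, 1e-8),     # close to both scaffold ends
]
lengths = {"scaffold_1": 20000, "scaffold_2": 1500}
regions = or_candidate_regions(hits, lengths)
print(regions)
```

Merging before adding the flank keeps neighbouring genes in a cluster from being fused merely because their 1 kb flanks overlap.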
We used a modification of the method of Glusman et al. (4) to predict the OR protein sequence of each gene. Each potential OR gene and its flanking sequence was first screened for repeats using RepeatMasker (http://ftp.genome.washington.edu/RM/RepeatMasker.html) and the Repbase database (52). Sequences were then compared to a local database of full-length OR protein sequences using fastx33 (53) to identify the best reading frame, allowing for frameshifts in the sequence. Sequences were then extended outward codon by codon to try to find suitable start and end codons.
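The codon-by-codon extension can be pictured with the sketch below. What counts as a "suitable" start or end codon is not spelled out in the text, so this sketch assumes the most upstream in-frame ATG that is not separated from the match by a stop codon, and the first in-frame stop codon downstream. It works on the forward strand only and ignores the frameshift handling described above; the function name and coordinates are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def extend_orf(dna, start, end):
    """Extend an in-frame match outward codon by codon.

    dna: forward-strand nucleotide sequence.
    start, end: 0-based, half-open coordinates of the in-frame match.
    Returns (orf_start, orf_end); either value is None if no codon was found.
    """
    dna = dna.upper()

    # Walk upstream until an in-frame stop or the start of the sequence,
    # remembering the most upstream ATG seen (assumption: longest ORF).
    orf_start = None
    pos = start
    while pos >= 0:
        codon = dna[pos:pos + 3]
        if codon in STOP_CODONS:
            break
        if codon == "ATG":
            orf_start = pos
        pos -= 3

    # Walk downstream to the first in-frame stop codon.
    orf_end = None
    pos = end
    while pos + 3 <= len(dna):
        if dna[pos:pos + 3] in STOP_CODONS:
            orf_end = pos + 3  # include the stop codon
            break
        pos += 3

    return orf_start, orf_end

# Toy sequence: GG TAG ATG GCT AAA CCC GGG TTT TGA CC; the match covers AAA CCC GGG.
dna = "GG" "TAG" "ATG" "GCT" "AAA" "CCC" "GGG" "TTT" "TGA" "CC"
orf = extend_orf(dna, 11, 20)
print(orf, dna[orf[0]:orf[1]])
```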
After identification of potential OR genes (1986 sequences) and prediction of protein sequence, the following filters were applied to the data. (i) Pairs of OR gene fragments close in the genome and interrupted by repeat sequences, but appearing to be two halves of the same gene, were combined into one sequence (14 pairs). (ii) Sequences matching non-OR G-protein-coupled receptors better than ORs were eliminated (95 sequences). (iii) Sequences whose original blast hit was very weak and did not match OR genes by fastx33 comparison were eliminated (46 sequences). (iv) When only partial data were available (e.g. OR genes abutting an end or gap in Celera's scaffold sequence), we eliminated sequences with a match of ≥97% nucleotide identity over ≥200 bp to a sequence in the full-length data set (Results), reasoning that these sequences come from the same gene but, for some reason, were not properly assembled (184 sequences). (v) Apparent pseudogenes were required to match another OR gene (a previously identified OR or one of the 866 intact OR genes identified here) with ≥40% amino acid identity over 100 residues or ≥50% identity if between 25 and 99 amino acids. These criteria were chosen because all of the intact, full-length OR sequences we found matched a previously identified OR with ≥40% identity. These criteria are similar to those used by Glusman et al. (4) in their evaluation of the human OR family. Filter (v) resulted in the elimination of 178 sequences. Additional redundancy among the 262 partial sequences was eliminated by taking only one representative of each group of sequences sharing ≥97% sequence identity, leaving 187 unique sequences.
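The identity criteria of filter (v) amount to a length-dependent threshold, sketched below. The comparison operators are assumed to be inclusive, the 40% rule is assumed to apply to matches of 100 residues or more, and matches shorter than 25 amino acids are assumed to fail; none of these edge cases is stated explicitly in the text. The function name is hypothetical.

```python
def passes_pseudogene_filter(percent_identity, aligned_residues):
    """Decide whether a candidate pseudogene matches a known OR well enough to keep."""
    if aligned_residues >= 100:
        return percent_identity >= 40.0   # long matches: 40% identity suffices
    if 25 <= aligned_residues <= 99:
        return percent_identity >= 50.0   # short matches need 50% identity
    return False                          # assumption: matches under 25 aa are rejected

checks = [
    passes_pseudogene_filter(42.0, 150),  # long match above 40%
    passes_pseudogene_filter(42.0, 60),   # short match below 50%
    passes_pseudogene_filter(55.0, 60),   # short match above 50%
    passes_pseudogene_filter(90.0, 20),   # too short to evaluate
]
print(checks)
```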
Two human datasets were analyzed. The sequences presented by Zozulya et al. (5) were used to determine the degree of conservation of intact human OR proteins. Sequences were obtained from HORDE (http://bioinformatics.weizmann.ac.il/HORDE/) for other analyses.
Chromosomal localization of sequences
Two sources of chromosomal localization information were available for the OR-containing scaffold sequences. FISH-mapped BACs were cross-referenced to the scaffold sequences when one or both of their end sequences matched the scaffold sequence with ≥95% sequence identity over three-quarters of their length. Unmasked end sequences were used for one-third of the sequences when less than 50 bp of unique sequence remained after repeat-masking. Matches were rejected if more than one genomic region matched the BAC end sequence at this level of identity. These mapping data were supplemented by chromosomal localization data made available by Celera. Celera has mapped scaffolds based on the linkage or radiation-hybrid map position of any sequence tagged sites matching scaffold sequences. For the small number of unmapped OR-containing scaffolds, we used matching end sequences to choose an additional 36 BACs to FISH, and thus localized another 17 scaffolds. Remaining unmapped scaffolds were small and had no matching BAC end sequences in GenBank (July 2001) or in our own BAC end sequence database.
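The rule for cross-referencing a BAC end sequence to a scaffold can be sketched as follows. This is an illustration under assumptions: each candidate match is a hypothetical `(scaffold, percent_identity, aligned_length)` tuple representing one genomic region, the 95% threshold is treated as inclusive, and "three-quarters of their length" is read as at least 75% of the end-sequence length.

```python
def place_bac_end(end_length, matches, min_identity=95.0, min_fraction=0.75):
    """Return the scaffold a BAC end maps to, or None if the placement is
    missing or ambiguous.

    end_length: length of the BAC end sequence in bp.
    matches: list of (scaffold, percent_identity, aligned_length) tuples,
             one per matching genomic region.
    """
    qualifying = [
        scaffold
        for scaffold, identity, aligned in matches
        if identity >= min_identity and aligned >= min_fraction * end_length
    ]
    if len(qualifying) != 1:
        return None  # no acceptable match, or more than one region matched
    return qualifying[0]

# Hypothetical 600 bp end sequence.
unique = place_bac_end(600, [("scaffold_1", 98.5, 560), ("scaffold_2", 91.0, 590)])
ambiguous = place_bac_end(600, [("scaffold_1", 98.5, 560), ("scaffold_3", 96.0, 500)])
too_short = place_bac_end(600, [("scaffold_1", 99.0, 300)])
print(unique, ambiguous, too_short)
```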
Although HORDE supplies a chromosomal location for many human OR sequences, we updated and refined these positions by comparing each to the December 12, 2000 version of the UCSC genome assembly (http://genome.ucsc.edu/). We required a ≥99% nucleotide match over ≥50 bp to assign a map position.
Sequence analysis
An initial sequence alignment was obtained using CLUSTALW (54) and edited by hand. PAUP v4.0b6 (Version 4, Sinauer Associates, Sunderland, MA) was used to determine protein divergences and to generate a phylogenetic tree using the neighbor-joining method (gaps of more than one amino acid in size were coded as one gap plus missing data for the rest of the positions). Tree branches were colored using a custom Perl script. Alignments of all mouse genes with the 906 human OR genes identified by Glusman et al. (4) showed that 23 of the human sequences were not alignable to OR protein sequences; in fact, when they were used to search public sequence databases, OR genes were not among the best matching sequences. These sequences were therefore eliminated from subsequent analysis, along with any sequences that Glusman et al. (4) were unable to classify into an OR family according to their system of nomenclature. One sequence in the human data set (OR1E7) is of mouse origin and was also eliminated: OR1E7 exactly matches mouse sequences in both the public and Celera databases over its entire length. Six mouse sequences were not easily alignable and were removed. Gapped BLAST v2.2.1 (48) was used for other large-scale comparisons. We chose 41 mouse–human orthologous gene pairs conservatively; we used only gene pairs where neither gene was a pseudogene, and where there appear to have been no duplications in either species since primate–rodent divergence (i.e. the mouse gene was the best matching mouse gene of only one human gene, and this human gene was the best match of the same mouse gene and no other). The information content of protein sequence alignments was determined using alpro (26). For information content analysis, mouse and human alignments contained only equivalent residues; positions where most sequences had a gap in only one species were disregarded.
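The conservative ortholog definition given above (a reciprocal best match that is unique in both directions, with neither gene a pseudogene) can be sketched as follows. The gene names are hypothetical, and the two input dictionaries stand in for best-match tables that would come from the BLAST comparisons; how ties were broken is not described in the text and is not modelled here.

```python
from collections import Counter

def one_to_one_orthologs(best_mouse_for_human, best_human_for_mouse,
                         pseudogenes=frozenset()):
    """Return sorted (mouse, human) pairs that are unique reciprocal best matches.

    best_mouse_for_human: {human_gene: best matching mouse gene}
    best_human_for_mouse: {mouse_gene: best matching human gene}
    """
    # How many human genes choose each mouse gene, and vice versa.
    mouse_votes = Counter(best_mouse_for_human.values())
    human_votes = Counter(best_human_for_mouse.values())

    pairs = []
    for mouse, human in best_human_for_mouse.items():
        if best_mouse_for_human.get(human) != mouse:
            continue  # not reciprocal
        if mouse_votes[mouse] != 1 or human_votes[human] != 1:
            continue  # a duplication in one lineage obscures the relationship
        if mouse in pseudogenes or human in pseudogenes:
            continue  # both genes must be intact
        pairs.append((mouse, human))
    return sorted(pairs)

# Hypothetical best-match tables.
best_mouse_for_human = {"hA": "m1", "hB": "m2", "hC": "m2", "hD": "m4"}
best_human_for_mouse = {"m1": "hA", "m2": "hB", "m3": "hB", "m4": "hD"}
pairs = one_to_one_orthologs(best_mouse_for_human, best_human_for_mouse,
                             pseudogenes={"hD"})
print(pairs)
```

In this toy example only m1/hA survives: m2 is the best match of two human genes, m3 is not a reciprocal match, and hD is a pseudogene.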
We developed a custom database using acedb (http://www.acedb.org/) and used it to store and cross-reference information about clones and sequences. Our website (http://www.fhcrc.org/labs/trask/OR) provides a database where mapping information and orthologous relationships can be queried. Under the terms of our agreement with Celera Genomics, we are able to provide on our website only the sequences of the 445 genes described in this paper that we have confirmed experimentally by isolating and sequencing cDNA clones (J.Young, J.Ross, E.Williams, T.Newman, L.Tonnes-Priddy, R.Lane and B.Trask, manuscript in preparation). Additional gene sequences will be released as we find more matching cDNAs, and genes in our database will be linked to any publicly available matching sequences. Mouse BAC-end sequences generated from OR-positive BACs have GenBank accession nos BH405737–BH406512.
| ACKNOWLEDGEMENTS |
|---|
We thank Bob Lane, Tera Newman and Ger van den Engh for helpful discussions, Greg Mahairas and Steve Swartzell of the Institute for Systems Biology for BAC end sequencing, and Martha Ogilvie of Celera for assistance with batch BLAST searches. The data in this paper were generated in part through use of the Celera Discovery System™ and Celera Genomics' associated databases. This work was supported by NIH grant R01 DC04209.
| FOOTNOTES |
|---|
+ To whom correspondence should be addressed. Tel: +1 206 667 1470; Fax: +1 206 667 4023; Email: btrask@fhcrc.org
Present address: Lori Tonnes-Priddy, Epigenomics Inc., 1000 Seneca Street, Suite 300, Seattle, WA 98101, USA
| REFERENCES |
|---|
1 Stoddart,D.M. (1980) The Ecology of Vertebrate Olfaction. Chapman and Hall, London and New York.
2 Buck,L. and Axel,R. (1991) A novel multigene family may encode odorant receptors: a molecular basis for odor recognition. Cell, 65, 175–187.
3 Buck,L.B. (1992) The olfactory multigene family. Curr. Opin. Genet. Dev., 2, 467–473.
4 Glusman,G., Yanai,I., Rubin,I. and Lancet,D. (2001) The complete human olfactory subgenome. Genome Res., 11, 685–702.
5 Zozulya,S., Echeverri,F. and Nguyen,T. (2001) The human olfactory receptor repertoire. Genome Biol., 2, 1–12.
6 Rouquier,S., Taviaux,S., Trask,B.J., Brand-Arpon,V., van den Engh,G., Demaille,J. and Giorgi,D. (1998) Distribution of olfactory receptor genes in the human genome. Nat. Genet., 18, 243–250.
7 Trask,B.J., Massa,H., Brand-Arpon,V., Chan,K., Friedman,C., Nguyen,O.T., Eichler,E., van den Engh,G., Rouquier,S., Shizuya,H. et al. (1998) Large multi-chromosomal duplications encompass many members of the olfactory receptor gene family in the human genome. Hum. Mol. Genet., 7, 2007–2020.
8 Trask,B.J., Friedman,C., Martin-Gallardo,A., Rowen,L., Akinbami,C., Blankenship,J., Collins,C., Giorgi,D., Iadonato,S., Johnson,F. et al. (1998) Members of the olfactory receptor gene family are contained in large blocks of DNA duplicated polymorphically near the ends of human chromosomes. Hum. Mol. Genet., 7, 13–26.
9 Wysocki,C.J. and Beauchamp,G.K. (1984) Ability to smell androstenone is genetically determined. Proc. Natl Acad. Sci. USA, 81, 4899–4902.
10 Whissell-Buechy,D. and Amoore,J.E. (1973) Odour-blindness to musk: simple recessive inheritance. Nature, 242, 271–273.
11 Sosinsky,A., Glusman,G. and Lancet,D. (2000) The genomic structure of human olfactory receptor genes. Genomics, 70, 49–61.
12 Mombaerts,P. (1999) Molecular biology of odorant receptors in vertebrates. Annu. Rev. Neurosci., 22, 487–509.
13 Bulger,M., Bender,M.A., van Doorninck,J.H., Wertman,B., Farrell,C.M., Felsenfeld,G., Groudine,M. and Hardison,R. (2000) Comparative structural and functional analysis of the olfactory receptor genes flanking the human and mouse β-globin gene clusters. Proc. Natl Acad. Sci. USA, 97, 14560–14565.
14 Lane,R.P., Cutforth,T., Young,J., Athanasiou,M., Friedman,C., Rowen,L., Evans,G., Axel,R., Hood,L. and Trask,B.J. (2001) Genomic analysis of orthologous mouse and human olfactory receptor loci. Proc. Natl Acad. Sci. USA, 98, 7390–7395.
15 Lapidot,M., Pilpel,Y., Gilad,Y., Falcovitz,A., Sharon,D., Haaf,T. and Lancet,D. (2001) Mouse-human orthology relationships in an olfactory receptor gene cluster. Genomics, 71, 296–306.
16 Younger,R.M., Amadou,C., Bethel,G., Ehlers,A., Lindahl,K.F., Forbes,S., Horton,R., Milne,S., Mungall,A.J., Trowsdale,J. et al. (2001) Characterization of clustered MHC-linked olfactory receptor genes in human and mouse. Genome Res., 11, 519–530.
17 Dehal,P., Predki,P., Olsen,A.S., Kobayashi,A., Folta,P., Lucas,S., Land,M., Terry,A., Ecale Zhou,C.L., Rash,S. et al. (2001) Human chromosome 19 and related regions in mouse: conservative and lineage-specific evolution. Science, 293, 104–111.
18 Sullivan,S.L., Adamson,M.C., Ressler,K.J., Kozak,C.A. and Buck,L.B. (1996) The chromosomal distribution of mouse odorant receptor genes. Proc. Natl Acad. Sci. USA, 93, 884–888.
19 Fan,W., Liu,Y.C., Parimoo,S. and Weissman,S.M. (1995) Olfactory receptor-like genes are located in the human major histocompatibility complex. Genomics, 27, 119–123.
20 Carver,E.A., Issel-Tarver,L., Rine,J., Olsen,A.S. and Stubbs,L. (1998) Location of mouse and human genes corresponding to conserved canine olfactory receptor gene subfamilies. Mamm. Genome, 9, 349–354.
21 Strotmann,J., Hoppe,R., Conzelmann,S., Feinstein,P., Mombaerts,P. and Breer,H. (1999) Small subfamily of olfactory receptor genes: structural features, expression pattern and genomic organization. Gene, 236, 281–291.
22 Rouquier,S., Blancher,A. and Giorgi,D. (2000) The olfactory receptor gene repertoire in primates and mouse: evidence for reduction of the functional fraction in primates. Proc. Natl Acad. Sci. USA, 97, 2870–2874.
23 Freitag,J., Ludwig,G., Andreini,I., Rossler,P. and Breer,H. (1998) Olfactory receptors in aquatic and terrestrial vertebrates. J. Comp. Physiol. [A], 183, 635–650.
24 Hoppe,R., Weimer,M., Beck,A., Breer,H. and Strotmann,J. (2000) Sequence analyses of the olfactory receptor gene cluster mOR37 on mouse chromosome 4. Genomics, 66, 284–295.
25 Benson,D.A., Karsch-Mizrachi,I., Lipman,D.J., Ostell,J., Rapp,B.A. and Wheeler,D.L. (2000) GenBank. Nucleic Acids Res., 28, 15–18.
26 Schneider,T.D. and Stephens,R.M. (1990) Sequence logos: a new way to display consensus sequences. Nucleic Acids Res., 18, 6097–6100.
27 Mefford,H.C., Linardopoulou,E., Coil,D., van den Engh,G. and Trask,B.J. (2001) Comparative sequencing of a multicopy subtelomeric region containing olfactory receptor genes reveals multiple interactions between non-homologous chromosomes. Hum. Mol. Genet., 10, 2363–2372.
28 Linardopoulou,E., Mefford,H.C., Nguyen,O., Friedman,C., van den Engh,G., Farwell,D.G., Coltrera,M. and Trask,B.J. (2001) Transcriptional activity of multiple copies of a subtelomerically located olfactory receptor gene that is polymorphic in number and location. Hum. Mol. Genet., 10, 2373–2383.
29 Bailey,J.A., Yavor,A.M., Massa,H.F., Trask,B.J. and Eichler,E.E. (2001) Segmental duplications: organization and impact within the current human genome project assembly. Genome Res., 11, 1005–1017.
30 Robertson,H.M. (1998) Two large families of chemoreceptor genes in the nematodes Caenorhabditis elegans and Caenorhabditis briggsae reveal extensive gene duplication, diversification, movement, and intron loss. Genome Res., 8, 449–463.
31 Robertson,H.M. (2000) The large srh family of chemoreceptor genes in Caenorhabditis nematodes reveals processes of genome evolution involving large duplications and deletions and intron gains and losses. Genome Res., 10, 192–203.
32 Zhang,J., Dyer,K.D. and Rosenberg,H.F. (2000) Evolution of the rodent eosinophil-associated RNase gene family by rapid gene sorting and positive selection. Proc. Natl Acad. Sci. USA, 97, 4701–4706.
33 Nei,M., Gu,X. and Sitnikova,T. (1997) Evolution by the birth-and-death process in multigene families of the vertebrate immune system. Proc. Natl Acad. Sci. USA, 94, 7799–7806.
34 Graur,D., Shuali,Y. and Li,W.H. (1989) Deletions in processed pseudogenes accumulate faster in rodents than in humans. J. Mol. Evol., 28, 279–285.
35 Malnic,B., Hirono,J., Sato,T. and Buck,L.B. (1999) Combinatorial receptor codes for odors. Cell, 96, 713–723.
36 Krautwurst,D., Yau,K.W. and Reed,R.R. (1998) Identification of ligands for olfactory receptors by functional expression of a receptor library. Cell, 95, 917–926.
37 Ressler,K.J., Sullivan,S.L. and Buck,L.B. (1993) A zonal organization of odorant receptor gene expression in the olfactory epithelium. Cell, 73, 597–609.
38 Chess,A., Simon,I., Cedar,H. and Axel,R. (1994) Allelic inactivation regulates olfactory receptor gene expression. Cell, 78, 823–834.
39 Vassar,R., Ngai,J. and Axel,R. (1993) Spatial segregation of odorant receptor expression in the mammalian olfactory epithelium. Cell, 74, 309–318.
40 Ngai,J., Chess,A., Dowling,M.M., Necles,N., Macagno,E.R. and Axel,R. (1993) Coding of olfactory information: topography of odorant receptor expression in the catfish olfactory epithelium. Cell, 72, 667–680.
41 Ben-Arie,N., Lancet,D., Taylor,C., Khen,M., Walker,N., Ledbetter,D.H., Carrozzo,R., Patel,K., Sheer,D., Lehrach,H. et al. (1994) Olfactory receptor gene cluster on human chromosome 17: possible duplication of an ancestral receptor repertoire. Hum. Mol. Genet., 3, 229–235.
42 Osoegawa,K., Tateno,M., Woon,P.Y., Frengen,E., Mammoser,A.G., Catanese,J.J., Hayashizaki,Y. and de Jong,P.J. (2000) Bacterial artificial chromosome libraries for mouse sequencing and functional analysis. Genome Res., 10, 116–128.
43 Osoegawa,K., Mammoser,A.G., Wu,C., Frengen,E., Zeng,C., Catanese,J.J. and de Jong,P.J. (2001) A bacterial artificial chromosome library for sequencing the complete human genome. Genome Res., 11, 483–496.
44 Mahairas,G.G., Wallace,J.C., Smith,K., Swartzell,S., Holzman,T., Keller,A., Shaker,R., Furlong,J., Young,J., Zhao,S. et al. (1999) Sequence-tagged connectors: a sequence approach to mapping and scanning the human genome. Proc. Natl Acad. Sci. USA, 96, 9739–9744.
45 Zhao,S., Shatsman,S., Ayodeji,B., Geer,K., Tsegaye,G., Krol,M., Gebregeorgis,E., Shvartsbeyn,A., Russell,D., Overton,L. et al. (2001) Mouse BAC ends quality assessment and sequence analyses. Genome Res., 11, 1736–1745.
46 Trask,B. (1999) In Birren,B., Green,E.D., Hieter,P., Slapholz,S., Myers,R.M., Riethman,H. and Roskams,J. (eds), Genome Analysis: A Laboratory Manual. Cold Spring Harbor Laboratory Press, NY. Vol. 4, pp. 303–413.
47 Berghard,A. and Dryer,L. (1998) A novel family of ancient vertebrate odorant receptors. J. Neurobiol., 37, 383–392.
48 Altschul,S.F., Madden,T.L., Schaffer,A.A., Zhang,J., Zhang,Z., Miller,W. and Lipman,D.J. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res., 25, 3389–3402.
49 Glusman,G., Bahar,A., Sharon,D., Pilpel,Y., White,J. and Lancet,D. (2000) The olfactory receptor gene superfamily: data mining, classification, and nomenclature. Mamm. Genome, 11, 1016–1023.
50 Parmentier,M., Libert,F., Schurmans,S., Schiffmann,S., Lefort,A., Eggerickx,D., Ledent,C., Mollereau,C., Gerard,C., Perret,J. et al. (1992) Expression of members of the putative olfactory receptor gene family in mammalian germ cells. Nature, 355, 453–455.
51 Buettner,J.A., Glusman,G., Ben-Arie,N., Ramos,P., Lancet,D. and Evans,G.A. (1998) Organization and evolution of olfactory receptor genes on human chromosome 11. Genomics, 53, 56–68.
52 Jurka,J., Benson,D.A., Karsch-Mizrachi,I., Lipman,D.J., Ostell,J., Rapp,B.A. and Wheeler,D.L. (2000) Repbase update: a database and an electronic journal of repetitive elements. Trends Genet., 16, 418–420.
53 Pearson,W.R., Wood,T., Zhang,Z. and Miller,W. (1997) Comparison of DNA sequences with protein sequences. Genomics, 46, 24–36.
54 Thompson,J.D., Higgins,D.G. and Gibson,T.J. (1994) CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res., 22, 4673–4680.
||||
![]() |
R. J. Mural, M. D. Adams, E. W. Myers, H. O. Smith, G. L. G. Miklos, R. Wides, A. Halpern, P. W. Li, G. G. Sutton, J. Nadeau, et al. A Comparison of Whole-Genome Shotgun-Derived Mouse Chromosome 16 and the Human Genome Science, May 31, 2002; 296(5573): 1661 - 1671. [Abstract] [Full Text] [PDF] |
||||
![]() |
J. M. Young and B. J. Trask The sense of smell: genomics of vertebrate odorant receptors Hum. Mol. Genet., May 15, 2002; 11(10): 1153 - 1160. [Abstract] [Full Text] [PDF] |
||||
![]() |
E. W. Myers, G. G. Sutton, H. O. Smith, M. D. Adams, and J. C. Venter On the sequencing and assembly of the human genome PNAS, April 2, 2002; 99(7): 4145 - 4146. [Full Text] [PDF] |
||||
![]() |
E. W. Myers, G. G. Sutton, H. O. Smith, M. D. Adams, and J. C. Venter On the sequencing and assembly of the human genome PNAS, April 2, 2002; 99(7): 4145 - 4146. [Full Text] [PDF] |
||||