Human Molecular Genetics Advance Access originally published online on August 7, 2007
Human Molecular Genetics 2007 16(21):2572-2582; doi:10.1093/hmg/ddm209

© The Author 2007. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org

Characterization and evolution of the novel gene family FAM90A in primates originated by multiple duplication and rearrangement events

Nina Bosch1, Mario Cáceres1, Maria Francesca Cardone2, Anna Carreras1, Ester Ballana1, Mariano Rocchi2, Lluís Armengol1 and Xavier Estivill1,3,*

1 Genes and Disease Program, Center for Genomic Regulation (CRG-UPF) and CIBERESP, Barcelona, Catalonia, Spain, 2 Department of Genetics and Microbiology, University of Bari, Bari, Italy and 3 Experimental and Health Sciences Department, Pompeu Fabra University, Barcelona, Catalonia, Spain

* To whom correspondence should be addressed at: Genes and Disease Program, Center for Genomic Regulation (CRG-UPF), and CIBERESP, Plaça Charles Darwin s/n (Carrer Dr Aiguader, 88), PRBB Building, Room 521, 08003 Barcelona, Catalonia, Spain. Tel: +34-933160159; Fax: +34-933160099; Email: xavier.estivill@crg.es

Received April 17, 2007; Accepted July 25, 2007


    ABSTRACT
 
Genomic plasticity of the human chromosome 8p23.1 region is strongly influenced by two groups of complex segmental duplications (SDs), termed REPD and REPP, that mediate different kinds of rearrangements. Part of the difficulty in explaining the wide range of phenotypes associated with 8p23.1 rearrangements is that REPP and REPD are not yet well characterized, probably because of their polymorphic status. Here, we describe a novel primate-specific gene family, named FAM90A (family with sequence similarity 90), found within these SDs. According to the current human reference sequence assembly, the FAM90A family includes 24 members along the 8p23.1 region plus a single member on chromosome 12p13.31, and shows copy number variation (CNV) between individuals. These genes can be classified into subfamilies I and II, which differ in their upstream and 5'-untranslated region sequences, but both share the same open reading frame and are ubiquitously expressed. Sequence analysis and comparative fluorescence in situ hybridization studies showed that FAM90A subfamily II underwent a large expansion in the hominoid lineage, whereas subfamily I members were likely generated sometime around the divergence of orangutan and African great apes by a fusion process. In addition, the analysis of the Ka/Ks ratios provides evidence of functional constraint on some FAM90A genes in all species. The characterization of the FAM90A gene family contributes to a better understanding of the structural polymorphism of the human 8p23.1 region and constitutes a good example of how SDs, CNVs and rearrangements within them can promote the formation of new gene sequences with potential functional consequences.


    INTRODUCTION
 
Genome duplication is widely accepted as one of the main mechanisms for the birth of new genes and the expansion of gene families, as well as an opportunity to adopt new functions (1). This is exemplified by the formation of the homeobox (2), globin (3) or primate-specific morpheus tandem gene clusters (4), which are likely the result of recombination between misaligned homologous sequences (5). Moreover, it has been postulated that segmental duplications (SDs; i.e. duplicated segments of genomic DNA of size > 1 kb that share > 90% sequence identity) mediate genomic rearrangements via non-allelic homologous recombination (NAHR) (6). Thus far, a significant number of genomic disorders, such as the Williams–Beuren, Prader–Willi or DiGeorge syndromes, are known to arise from NAHR between SDs (7–9), and the presence of SDs has been correlated with evolutionary breakpoints in primates and other mammals (10–12). In addition, genomic copy number gains or losses, also known as copy number variants (CNVs), are frequently the result of this kind of recombination event and have been associated with SDs as well (13).

Besides genomic gains and losses, depending on the relative position and orientation of SDs, NAHR can also lead to the inversion of the intervening genomic sequence. Occasionally, these inversions are involved in human disease, such as in Hunter syndrome (14) or hemophilia A (15). Several other inversion variants do not have any evident phenotypic effects for the carriers, but result in a higher risk of chromosomal rearrangements in the offspring (16,17) or a small increase in fertility (18). The polymorphic inversion found in the 8p23.1 region in 26% of the European and Japanese general population (19,20) has no apparent phenotypic consequences, but incorrect pairing between normal and inverted chromosomes during meiosis leads to different types of rearrangements that have been associated with a wide range of phenotypes in the offspring (21–25). Furthermore, duplications of the 8p23.1 segment have been repeatedly related to the presence of benign euchromatic variants (26–29).

The underlying basis for all these rearrangements is likely the presence of two poorly characterized sets of complex SDs, REPP and REPD, on the proximal and distal portions of 8p23.1. Although several studies have recently improved our knowledge of the REPP and REPD genomic architecture, even in the latest effort to obtain an accurate sequence of human chromosome 8 by the International Human Genome Sequence Consortium, two of the four gaps remaining in this chromosome were still located within these regions and they seemed to be refractory to current cloning and mapping technologies (30). Incomplete and incorrectly assembled sequences are common in regions containing SDs and hence, these genomic fragments deserve special attention in order to obtain a proper characterization (31). Another phenomenon that complicates the analysis of the 8p23.1 region is the polymorphic status of different components within the SDs, such as the CNVs of the alpha- and beta-defensin gene clusters (32–36). These polymorphic genomic regions are frequently related to hotspots in evolution, and they often contain genes related to adaptation to the environment (37). Furthermore, the 8p23.1 region also shows a strikingly high polymorphism rate in the human population that is exceeded only by some regions of the Y chromosome (30).

In our effort to better understand the evolution and dynamics of these complex regions, we have identified a novel gene family (FAM90A) with at least 24 members distributed along the REPP and REPD plus one single copy on chromosome 12p13.31 per haploid genome. In the current work, we examine the sequence of the different copies, identify the functional ones and describe two different subfamilies of these genes. We also report the presence of a variable number of FAM90A members at the genomic level in individuals of the general population and in non-human primates, which brings in additional complexity to the SDs on 8p23.1. Finally, a mechanism by which these copies could have evolved and expanded through the primate lineage is hypothesized.


    RESULTS
 
In silico analysis of the low copy repeats in the 8p23.1 region
Our first approach to obtain a better characterization of the 8p23.1 SDs was an in-silico analysis of the REPD and REPP sequences in the current human genome reference assembly (NCBI Build 36.1). As previously described, these SDs are formed by several segments that are duplicated in many other parts of the genome, including olfactory receptors (ORs) and the alpha- and beta-defensin genes related to the immune response, which are clustered around the ~100 kb gap placed within REPD (35) (Fig. 1). In addition, the alignment of the REPD and REPP sequences against themselves using the Pipmaker algorithm (38) detected a 7.6 kb fragment that was tandemly repeated at multiple locations. Along REPD, four different clusters, A–D, were found, which contain six, five, eight and three copies of the 7.6 kb module (HsaCopy1–22), respectively (Fig. 1). Furthermore, within REPP, there are two additional sequences that share high identity with 6.4 kb of the 7.6 kb module (HsaCopy23 and 24; Fig. 1), totaling the 24 copies of this sequence present in the latest assembly of the 8p23.1 region. Interestingly, clusters A and B on REPD are in opposite direction and located on each side of a group of beta-defensin genes, and the same happens with clusters C and D (Fig. 1). Therefore, the whole genomic region appears to be duplicated and distributed as specular images on both sides of the gap.
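The copy counts given above can be tallied explicitly as a quick consistency check. This is a minimal sketch: the variable names are illustrative, and the assumption that clusters A to D are numbered consecutively in that order (HsaCopy1 first) is ours, based on the distal-to-proximal naming; only the counts themselves come from the text.

```python
# Copies of the 7.6 kb module per REPD cluster, as stated in the text.
repd_clusters = {"A": 6, "B": 5, "C": 8, "D": 3}

# Two further copies (HsaCopy23 and HsaCopy24) lie within REPP.
repp_copies = 2

repd_total = sum(repd_clusters.values())    # expected to cover HsaCopy1-22
total_8p23_1 = repd_total + repp_copies     # all copies in the 8p23.1 assembly

# Assumption: copies are numbered consecutively through clusters A, B, C, D.
copy_ranges = {}
start = 1
for name, n_copies in repd_clusters.items():
    copy_ranges[name] = (start, start + n_copies - 1)
    start += n_copies

print(repd_total, total_8p23_1)
print(copy_ranges)
```

Under this numbering assumption, HsaCopy6 falls at the end of cluster A and HsaCopy20 at the start of cluster D.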


Figure 1. Schematic representation of the 8p23.1 region. (A) Ideogram of human chromosome 8 showing magnification of the 8p23.1 region. (B) Wide-colored arrows indicate the orientation of the FAM90A clusters (named A–D) and other single FAM90A copies. Distance between cluster D and HsaCopy23 is 3.4 Mb and coincides with the location of the polymorphic inversion affecting 8p23.1 (19). (C) FAM90A clusters, represented with the same colors as in (B), and their positions with regard to alpha- and beta-defensin gene clusters and olfactory receptors. Thin arrows show the directions of transcription and the numbers underneath are the sizes of the clusters in kilo-base pair. Besides FAM90A clusters, REPD is composed of an alpha-defensin cluster (DEFA6, DEFA4, DEFA3, DEFT1, DEFA3, DEFT1, DEFA3 and DEFA5), two copies of OR7E125P and OR7E154P pseudogenes, two copies of a beta-defensin cluster (DEFB4, DEFB103A, DEFB104, DEFB106, DEFB105 and DEFB107), an estimated 100 kb gap and OR7E96P. REPP contains three different OR pseudogenes (OR7E158P, OR7E161P and OR7E160P), followed by FAM90A HsaCopy23, an estimated 100 kb gap, and FAM90A HsaCopy24.

 
Characterization of the novel FAM90A gene family
A more detailed analysis of the 7.6 kb module revealed that it is composed of 6.3 kb of unique sequence that includes 285 bp related to a LINE repetitive element (L1MB3) and 1033 bp corresponding to a long terminal repeat of the ERVK family of endogenous retrovirus (LTR5A) at the end. In order to screen whether this module resembled any other human sequence, identity searches were performed against the human genome and the RefSeq database. Besides the copies in the 8p23.1 region, the best match was the family with sequence similarity 90 member A1 (FAM90A1) gene (GenBank: NM_018088), located as a single copy on chromosome 12p13.31. The transcribed portion of this gene (6342 bp) shares 96% nucleotide identity with the two copies found in REPP and more than 93% identity with the 7.6 kb module repeated in REPD. Thus, these sequences constitute a novel gene family (FAM90A) in the human genome, and we have named the different members in 8p23.1 as HsaCopy1–24 (distal to proximal) to avoid possible confusion with the current nomenclature (Supplementary Material, Table S1).

Taking as reference FAM90A1 annotation, which contains six exons and results in a 2342 bp long mRNA, we predicted the gene structure of the 24 copies on 8p23.1 SDs (Fig. 2). On the basis of the gene structure and sequence similarity, the different members of this family could be divided into subfamilies I and II. Subfamily I includes FAM90A1 and both single copies on REPP (HsaCopy23 and 24), and subfamily II is formed by the rest of the members on REPD SDs (HsaCopy1–22). The main difference between the two groups is that the initial 1036 bp of the FAM90A1 gene, which contains the first untranslated exon (exon 1) plus 302 and 98 bp homologous to an AluSx and an MIRb repetitive element, is exclusive of the three subfamily I copies (Fig. 2). However, this 1 kb sequence is found at multiple locations on different chromosomes, including an intron of the ALG1 gene on chromosome 16, and constitutes part of a SD with at least eight additional copies in the human genome. Instead, subfamily II copies have a 1244 bp sequence, which is only found to be associated with FAM90A members (Fig. 2).


Figure 2. Exon–intron structure of FAM90A subfamily I and subfamily II genes. Filled boxes correspond to non-translated exons and open boxes to the coding sequence. The alternatively spliced exon is represented by a dotted line. Repetitive elements are symbolized as gray and black rectangles. Nonsense substitutions in HsaCopy3, 23 and 24 are represented by asterisks. Frameshift insertions and deletions in HsaCopy7 and 9 are depicted as downward or upward triangles, respectively (Supplementary Material, Table S1). The black arrow indicates the alternative GC donor splice site present in nine members of subfamily II. 5'- and 3'-UTR are represented by black lines and the coding sequence (CDS) as a white rectangle below the diagrams.

 
From the predicted mRNA sequences, we have also determined the coding sequence conservation in the different 8p23.1 FAM90A copies. The coding sequence of the FAM90A1 gene consists of four exons (exons 3–6) and has an ORF with the potential to encode a protein of 464 amino acids (Swiss-Prot: Q9NVZ6). This ORF is conserved in most of the 8p23.1 copies, indicating that they also have protein coding capacity (Supplementary Material, Table S1 and Fig. 2). The only exceptions are HsaCopy3, 7, 9, 23 and 24, which have a total of six potential gene-inactivating mutations that probably result in non-functional proteins. In addition, there are 10 FAM90A subfamily II copies (HsaCopy1–5, 10–11 and 20–22) with a T to C mutation in the donor splice site of exon 5 that changes the common GT to the alternative donor splice site GC (Fig. 2), although that should not affect the encoded protein (39).
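The kind of check described here, whether a predicted coding sequence keeps an intact ORF or carries a premature stop or frameshift, can be sketched as follows. This is an illustrative sketch only, not the procedure used in the study: the function names and the toy sequences are hypothetical, and real FAM90A sequences would have to be supplied by the reader. For a 464-residue protein, the expected coding sequence length including the stop codon is 3 × (464 + 1) = 1395 bp.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_inframe_stop(cds):
    """Return the 0-based codon index of the first in-frame stop codon, or None.

    `cds` is assumed to begin at the first base of the start codon.
    Trailing bases that do not fill a complete codon are ignored.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    for i in range(0, usable, 3):
        if cds[i:i + 3] in STOP_CODONS:
            return i // 3
    return None


def is_intact_orf(cds, n_amino_acids):
    """True if the sequence has the expected length and its only stop is terminal."""
    expected_len = 3 * (n_amino_acids + 1)  # coding codons plus one stop codon
    return len(cds) == expected_len and first_inframe_stop(cds) == n_amino_acids


# Toy sequences (hypothetical, for illustration only).
intact = "ATG" + "GCT" * 3 + "TAA"             # 4 codons followed by a stop
nonsense = "ATG" + "GCT" + "TGA" + "GCT" + "TAA"  # premature stop at codon 2
frameshift = intact[:3] + intact[4:]            # one base deleted after ATG

for label, seq in [("intact", intact), ("nonsense", nonsense), ("frameshift", frameshift)]:
    print(label, first_inframe_stop(seq), is_intact_orf(seq, 4))
```

A nonsense substitution shows up as an early stop index, whereas a frameshift changes the length and usually moves or removes the in-frame stop, so both cases fail the intact-ORF test.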

Variation in FAM90A clusters in humans
After the identification of the FAM90A clusters in the human genome reference sequence, we pursued the experimental characterization of the novel gene family in 20 unrelated individuals using pulsed-field gel electrophoresis (PFGE) and Southern blot analysis (Fig. 3). Genomic DNA was digested with the Acc65I restriction enzyme that allowed us to isolate the different clusters on REPD, plus the three single copies on REPP and 12p13.31. As a probe, we used a 700 bp DNA fragment corresponding to the second intron of FAM90A1. The expected restriction pattern from the reference assembly included seven different fragments, ranging from 77 to 14 kb. However, the PFGE patterns showed high variability between individuals (Fig. 3), indicating that there are structural changes in this region and that FAM90A members are polymorphic in copy number in the human population.


Figure 3. Southern blot hybridization of PFGE of Acc65I-digested genomic DNA from 20 individuals with a probe corresponding to the second intron of FAM90A1. Size of marker fragments is indicated on both sides. Black bars between the panels correspond to the approximate location of the seven expected digestion fragments based on the reference assembly (cluster C, 77 kb; cluster A, 68 kb; cluster B, 54 kb; FAM90A1 fragment, 49 kb; cluster D, 45 kb and HsaCopy23 and HsaCopy24, two 13.6 kb fragments).

 
To confirm these results, we examined the information available from the complete sequences of other human bacterial artificial chromosome (BAC) clones not included in the genome assembly. By BLAST analysis of the 7.6 kb module against the non-redundant database, we found three BAC sequences with entire FAM90A subfamily II clusters that have different numbers of copies from those in the reference human genome sequence (Supplementary Material, Table S2). These clones likely belong to a different allele from the human reference sequence, although we cannot rule out the possibility that they represent additional copies of the clusters on the 8p23.1 region or part of the non-assembled genomic material located in the REPD and REPP gaps. Therefore, these results independently support the existence of variability in the number of FAM90A copies in human chromosomes.

Expression of FAM90A gene family members
In order to examine the expression of FAM90A genes, we carried out RT–PCR from total adult RNA of 12 different tissues with primers binding to exons 3 and 4 of most FAM90A copies. A fragment of the expected size of the mRNA (351 bp) was amplified in all tissues tested (Fig. 4). We also repeated the same RT–PCR with RNA from lymphoblastoid cell lines from seven individuals from the general population. The expected fragment from the FAM90A mRNA was detected in all individuals, suggesting that these genes are widely expressed in humans.


Figure 4. RT–PCR analysis of the expression pattern of FAM90A gene family. (A) RT–PCR of FAM90A members in 12 different human tissues RNA. (B) RT–PCR of FAM90A members in lymphoblastoid cell line RNA from seven individuals of the general population. (C) RT–PCR with primers specific to FAM90A members of subfamily I and to multiple members of subfamily II on 13 human tissues. A 1 kb molecular size ladder is shown on the left side. Right lanes correspond to genomic DNA.

 
To investigate the expression of the two different FAM90A subfamilies, we searched for expressed sequence tags (ESTs) in public databases supporting the expression of particular FAM90A copies. Besides the reference mRNA of FAM90A1, there were several additional full-length mRNAs and ESTs matching the exons of this gene. The main difference was that approximately half of these sequences include an alternatively spliced exon between exons 2 and 3 (Fig. 2). Moreover, there was evidence of expression of exons 2–6 of HsaCopy23 and 24 on the 8p23.1 proximal duplicons from two additional ESTs (GenBank: AL832996 and CX762572); one of them was fully sequenced by us, confirming the predicted intron–exon structure. Conversely, no spliced ESTs specifically matching subfamily II copies could be detected, although there were some corresponding to different parts of the repeated module in the REPD clusters.

Three different primer pairs which are specific, respectively, for FAM90A1, HsaCopy23 and 24 on REPP and most subfamily II copies on REPD were used to further study the tissue expression of subfamily I and II members (Fig. 4). Consistent with the previous results, FAM90A1 was expressed in all 13 tissues tested, whereas we could not detect HsaCopy23 and 24 expression in heart, testis, placenta and prostate. Regarding expression of subfamily II members, we detected a clear band of the expected size on all tissues with the exception of kidney and lung, whereas a larger size band, corresponding perhaps to an alternative transcript, was observed in prostate.

FAM90A genes in primates and other species
To investigate the presence of members of this family throughout mammals, we performed exhaustive similarity searches of the FAM90A nucleotide and protein sequences against the available genome assemblies and non-redundant sequences in the databases. No significant similarity to FAM90A was found in non-primate species, but complete or partial copies of this gene were identified in chimpanzees, rhesus macaque and baboon.

Owing to the complexity of the REPD and REPP SDs, the syntenic region in the chimpanzee genome is not well resolved yet. However, there are two chimpanzee BACs (GenBank: AC183981 and AC184710), which contain sequences homologous to human subfamily II clusters, including, respectively, seven (PtrCopy1–7) and six (PtrCopy8–13) full-length FAM90A copies. In addition, we found homologous copies to FAM90A subfamily I members in chromosome 12 (PtrCopy14), chromosome 8 (PtrCopy15) and chromosome 11 (PtrCopy16) of the current chimpanzee genome assembly. Thus, organization of this gene family in chimpanzees is very similar to that in humans.

A different scenario is found in rhesus macaque and baboon. In the rhesus macaque genome assembly, there are only two ~4 kb fragments with sequence similarity to parts of FAM90A subfamily II members, one of which is represented in two overlapping baboon BACs (GenBank: AC116559 and AC116558). In addition, a rhesus contig sequence of 14.5 kb (GenBank: NW_001158155) includes a subfamily II copy (MmuCopy1) flanked by smaller fragments that match the end and the beginning of another copy, respectively. However, none of the rhesus or baboon sequences has an LTR inserted at the end of the gene. Moreover, no subfamily I copies have been identified in rhesus, baboon or orangutan, indicating that this gene arrangement is most likely specific to African great apes. The only position in the rhesus genome with similarity to the initial 1028 bp fragment of this subfamily that is duplicated in humans is located in the ALG1 gene intron on chromosome 20 (syntenic to human chromosome 16).

Genomic distribution of FAM90A genes in primates
Four human BACs and a 4.8 kb PCR-amplified FAM90A fragment were used as probes to investigate the chromosomal distribution of the FAM90A gene family on human, chimpanzee, gorilla, orangutan and rhesus macaque by fluorescence in situ hybridization (FISH) studies. Representative results of comparative FISH experiments from primate chromosome spreads are shown in Supplementary Material, Figures S1 and S2, and a summary of the chromosomal signals of all probes is reported in Table 1. The BAC clones corresponding to the FAM90A1 region (12p13.31) and the REPP SDs (8p23.1) produced very similar results in human, chimpanzee and gorilla, with signals on several chromosomes matching the known locations of the SDs contained within the probes. Interestingly, in orangutan and rhesus macaque, these BACs did not hybridize to the homolog of human chromosome 12. A different FISH pattern was observed with the BACs of the FAM90A REPD clusters and the 4.8 kb FAM90A fragment, with hybridization to the homolog of human chromosome 8 in all species. However, on interphase nuclei, several close signals could be distinguished on human and chimpanzee chromosome 8, whereas single signals appeared on gorilla and orangutan (Supplementary Material, Fig. S2). In the case of rhesus macaque, we did not find any signal with the FAM90A fragment, probably due to the sequence divergence and the resolution limit of the FISH technique. Moreover, we performed PFGE and Southern experiments in chimpanzee, gorilla and orangutan using the same probe as in the human experiments (Supplementary Material, Fig. S3). Consistent with the FISH results, in all three great ape species, fragments with homology to FAM90A can be seen, but their sizes and numbers are different when compared with humans. Gorilla and orangutan have only two fragments of high molecular weight, whereas chimpanzee had also some smaller bands that could be shared with those found on human individuals.


Table 1. Summary of FISH experiments performed on chromosome spreads from primate species with four human BAC clones and a PCR fragment from the FAM90A gene

 
A more accurate estimate of the number of FAM90A copies in primates was obtained by real-time quantitative PCR with genomic DNA from human, chimpanzee, bonobo, gorilla and orangutan. The primers and probe were initially designed in regions conserved between the human FAM90A copies, but the availability of chimpanzee sequence subsequently showed that one of the primers had mismatches in most chimpanzee copies. In addition, sequence divergence between humans and macaques made it impossible to design suitable common primers and probe for use with this species. To control for differences in DNA concentration between samples, FAM90A copy number was estimated in each species in comparison with a single-copy ultra-conserved region from human chromosome 6p21 (see Materials and Methods). We detected a 14.6-, 4.2-, 8.4- and 9.0-fold increase in human, chimpanzee, bonobo and gorilla, respectively, with respect to orangutan, which was the species with the lowest number of FAM90A copies. Thus, it is likely that there has been an expansion of the FAM90A gene family in African great apes, although the real number of copies in non-human primates could be underestimated because of sequence changes affecting the primers or probe.
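One common way to turn such real-time PCR measurements into a fold difference relative to a calibrator species is the comparative Ct calculation, sketched below. This is an assumption for illustration: this section does not state the exact formula used (it refers to Materials and Methods), the function name is hypothetical, the Ct values are invented, and the default efficiency of 2.0 assumes perfect doubling per cycle for both amplicons.

```python
def relative_copy_number(ct_target, ct_reference,
                         ct_target_cal, ct_reference_cal,
                         efficiency=2.0):
    """Fold difference in target copy number relative to a calibrator sample.

    ct_target / ct_reference: Ct values of the multi-copy target and the
        single-copy reference region in the sample of interest.
    ct_target_cal / ct_reference_cal: the same two values in the calibrator
        (here, the species with the fewest copies).
    efficiency: amplification factor per cycle (2.0 = perfect doubling,
        an assumption).
    """
    delta_sample = ct_target - ct_reference        # normalizes for DNA input
    delta_calibrator = ct_target_cal - ct_reference_cal
    delta_delta = delta_sample - delta_calibrator
    return efficiency ** (-delta_delta)


# Hypothetical Ct values, not data from the study.
fold = relative_copy_number(ct_target=21.0, ct_reference=25.0,
                            ct_target_cal=24.0, ct_reference_cal=25.0)
print(fold)
```

Normalizing each sample to the single-copy reference first is what removes the effect of differing DNA concentrations; primer mismatches in a given species would raise its target Ct and therefore lower its apparent fold value, which is the underestimation caveat noted in the text.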

Evolutionary analysis of FAM90A family genes
To assess the relationship between all the FAM90A members identified in primates, we carried out a phylogenetic analysis of different parts of these genes. Figure 5 shows the neighbor-joining tree based on the common sequence to both FAM90A subfamilies, except the LTR, using the rhesus MmuCopy1 as outgroup. According to this, subfamily II members in humans and chimpanzees form two separate clades that share a common origin. In addition, FAM90A copies located in the same cluster tend to form groups with high bootstrap values (e.g. HsaCopy1–5 or HsaCopy13–18), although there are exceptions involving mostly the copies located at the ends (e.g. HsaCopy12, 19 or 20). Finally, most subfamily I members (96% average identity) do not group together in the tree and instead appear to be located in independent branches, suggesting an old origin of these copies. The only exceptions are HsaCopy23 and 24 (99.5% identity) and the two copies located in chromosome 12 of each species, FAM90A1 and PtrCopy14, which are likely syntenic (97.9% identity). When phylogenetic trees were built based only on the LTR sequences, we obtained approximately the same associations between the different FAM90A copies (data not shown). The main difference in this case was that HsaCopy6 and 20 consistently group with members of subfamily I, suggesting that there has been gene conversion between the ends of clusters A and D and subfamily I sequences.


Figure 5. Evolutionary analysis of FAM90A copies in primates. Phylogenetic tree was obtained by neighbor-joining using the sequence common to all the available full-length FAM90A members (5081 bp). Bootstrap values of 1000 replicates are indicated in boldface for the main nodes and only branches with more than 70% bootstrap support are represented in the tree. Branch lengths correspond to the number of nucleotide substitutions per position according to the BASEML module of PAML (40). Ka/Ks ratios obtained using the free-ratio model of the PAML CODEML module (40) are indicated in italics above the main branches, with dashes representing branches with Ks = 0. Brackets indicate the average Ka/Ks values of copies forming different clades within the tree. Gene-inactivating mutations are represented by asterisks.

 
The rates of synonymous (Ks) and non-synonymous (Ka) substitutions in the coding sequence of FAM90A genes were estimated with the maximum likelihood analysis program PAML (40). In general, the Ka/Ks ratio for this family was high, with an average of 0.91 over the whole tree, close to the value of 1 expected in the absence of functional constraint on the coding sequence (neutral evolution). There was some variation in Ka/Ks ratios between different groups of FAM90A genes (Fig. 5), although these differences were not statistically significant. We also divided the coding sequence of the gene into three parts of equal length and calculated the Ka/Ks separately for each of those. Significant differences in the evolutionary rate of the different parts of the protein were detected (2Δℓ = 201.33, df = 128, P < 0.0001), with the C-terminal region (average Ka/Ks = 0.77) showing lower Ka/Ks ratios than the N-terminal (average Ka/Ks = 0.99) and central (average Ka/Ks = 0.92) regions.
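The reported likelihood-ratio statistic (201.33 with 128 degrees of freedom) can be checked against the chi-square distribution with a few lines of code. This is a sketch under stated assumptions: it treats the statistic as chi-square distributed under the null model, and it uses the exact finite-sum form of the upper tail that holds only when the degrees of freedom are even (as they are here). The function name is illustrative and is not part of PAML.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with even df.

    Uses the closed form exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!,
    which is exact when df is a positive even integer.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2.0
    term = 1.0       # (x/2)^0 / 0!
    total = term
    for i in range(1, df // 2):
        term *= half / i   # builds (x/2)^i / i! incrementally
        total += term
    return math.exp(-half) * total


# Statistic and degrees of freedom as reported in the text.
p_value = chi2_sf_even_df(201.33, 128)
print(p_value)
print(p_value < 0.0001)
```

For odd degrees of freedom this shortcut does not apply, and a general chi-square routine from a statistics library would be needed instead.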


    DISCUSSION
 
Duplication plays a central role in genome evolution (1,41). With the description of the novel gene family FAM90A, we present an extreme example of gene expansion by duplication and generation of new gene conformations with different upstream and untranslated region (UTR) sequences by rearrangement and intra- and/or interchromosomal duplication events. In addition, we have observed considerable variation in the organization and the number of copies of these genes in the human population, which is consistent with previously reported CNVs (42) and provides further information on the complexity of the genomic architecture of the SDs encompassing the 8p23.1 region (32,34). The mechanism underlying the generation of the complex structure of these SDs is far from clear. However, the analysis of the available genome sequence information and the experimental results presented here allow us to make several inferences about the evolution of the 8p23.1 genomic region and the FAM90A gene family in primates.

The generation of the known copies of FAM90A genes in human and chimpanzee genomes from the common ancestor of hominoids and Old World monkeys involved probably a complex process of duplications and rearrangements (Fig. 6). On the basis of the genome sequence and FISH data, subfamily I members are absent in rhesus macaque and orangutan, and we propose that subfamily II preceded subfamily I genes. Both subfamilies would have originated from FAM90A sequences similar to those found in macaque chromosome 8, which have a structure resembling the tandem repeated modules of subfamily II clusters. Next, there was an insertion of an LTR of the ERV-K family, with the generation of the 6 bp target site duplications typical of these elements. From there, the whole genomic region duplicated and the two FAM90A subfamilies diverged. For subfamily I copies, sometime around the divergence of orangutan and African great apes (~15 Ma ago), there was a deletion that fused the FAM90A sequences with one of the SDs from the ALG1 gene region (originally located as a single copy in the homolog of human chromosome 16). This resulted in co-opting part of the ALG1 intron as the upstream sequences and exon 1 of the new FAM90A genes. Then, the whole region went through a series of several independent duplication and gene loss events, which gave rise to the different subfamily I copies, found nowadays in the human and chimpanzee genomes.


Figure 6. Schematic diagram of the most parsimonious evolutionary model for the origin of FAM90A genes in humans and chimpanzees. The main steps in the generation of the two FAM90A subfamilies are shown, with the approximate time period when they occurred indicated on the left, using the origin of different phylogenetic groups in the primate lineage as reference. The different sequences included in FAM90A copies are represented as orange and red rectangles, and coding and non-coding exons are represented in black. Sequences homologous to L1MB3 and LTR5A elements are shown as dashed gray boxes. Purple, green, yellow and blue rectangles symbolize the flanking sequences, and the ones found in current FAM90A copies are numbered. Arrows on top of the diagrams correspond to the subfamily I and II modules, as shown in Figure 2. Part of the intronic sequence of the ALG1 gene included in subfamily I members is also shown. The ancestral ape chromosomal nomenclature of Roman numerals is used to indicate the chromosomal location. The FAM90A copies currently present in the human and chimpanzee reference genomes are included within rectangles. The number of subfamily II modules drawn in clusters A–D is only approximate and does not represent the actual number of copies. The organization of the subfamily II clusters in chimpanzees is not well resolved because of genome sequence assembly problems.

 
The precursor of the subfamily II clusters could have originated by a tandem duplication of the 7.6 kb module, either by a mechanism similar to slippage during DNA replication or by NAHR between the repeated regions at the beginning and end of each module (Fig. 6). This was followed by a deletion that eliminated part of the cluster and resulted in a ~500 bp shorter LTR at one end. Then, a duplication of the original cluster, a deletion of the other border and flanking region in one of the clusters, and an additional duplication of the region encompassing the two clusters generated the current distribution of the REPP FAM90A clusters in humans, with clusters A and B being roughly a mirror image of clusters C and D. This series of events is consistent with the distribution and identity of the flanking sequences between the clusters (Fig. 6). Moreover, FISH, Southern and quantitative PCR analyses suggest that the duplication of the clusters and the expansion of FAM90A occurred in the common ancestor of humans, chimpanzees and gorillas. After that, the evolution of the different FAM90A members has probably been characterized by NAHR events, which could result in variation in the copy number of these genes between individuals and in other structural changes, such as the polymorphic inversion affecting the 8p23.1 region (23). In addition, according to the phylogenetic analysis, there is evidence of a great degree of sequence homogenization by gene conversion between copies located in the same cluster, between copies located at the ends of clusters with the same flanking sequences, and even between the ends of subfamily II clusters and subfamily I members (Fig. 5).

Comparative genomic analysis revealed that the FAM90A family is exclusive to primates and that the most closely related sequences are two hypothetical genes from cow (LOC615167) and dog (LOC609215), which share too low an identity with FAM90A proteins (45% identity over 234 amino acids and 35% identity over 307 amino acids, respectively) to be considered orthologs. There are several described examples of novel primate genes created by fusion processes, such as the hominoid-specific chimeric genes derived from the melanin-concentrating hormone gene (43). In addition, several expansion processes similar to that of FAM90A genes have been reported across primate species (44), including the Krüppel-associated box zinc finger gene clusters (45), the neuroblastoma breakpoint gene family (NBPF) (46), the morpheus genes on human chromosome 16 (4) and the extreme amplification of the sequences encoding the DUF1220 protein domain in humans (47). Together, these examples stress the importance of SDs and structural changes in the generation of genomic variation and new gene sequences during evolution (41).
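The identity figures quoted above are ratios of identical positions to aligned length; 45% identity over 234 amino acids, for instance, corresponds to roughly 105 identical residues. A minimal sketch of that calculation follows. It assumes two protein strings that have already been aligned, with "-" marking gaps; the function name and the toy sequences are hypothetical, and the counting convention of the search tools actually used may differ in detail.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of alignment columns that carry the same residue.

    Columns that are gaps in both sequences are ignored; a gap facing a
    residue counts toward the aligned length but never as a match.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (a, b)
        for a, b in zip(aligned_a, aligned_b)
        if not (a == "-" and b == "-")
    ]
    if not columns:
        raise ValueError("alignment has no informative columns")
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


if __name__ == "__main__":
    # Toy alignment, not real FAM90A sequence data.
    print(percent_identity("MKC-HLT", "MRCAHLT"))
```

Low identity values by themselves do not settle orthology; they are one line of evidence to weigh alongside alignment length and synteny.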

In most cases, the function of the genes expanded in the primate genome is not yet clear. In FAM90A proteins, the only known feature is a 19 amino acid motif corresponding to a CCHC zinc-finger domain that could be involved in DNA or RNA binding (Supplementary Material, Fig. S4). The evolutionary analysis of the Ka/Ks ratios is compatible with an overall low functional constraint acting on FAM90A proteins, but at least one FAM90A member per species has Ka/Ks values clearly lower than 1, indicating that they could encode functional proteins. These include FAM90A1 (Ka/Ks = 0.35), HsaCopy10 and 11, and HsaCopy21 and 22 in humans (average Ka/Ks = 0.63), the available subfamily II clusters in chimpanzees (average Ka/Ks = 0.48) and MmuCopy1 in rhesus (Ka/Ks = 0.78). However, there are also several copies that have accumulated inactivating mutations in the coding sequence through a process of pseudogenization and have Ka/Ks ratios close to 1, such as ΨHsaCopy3, 7, 9, 23 and 24. In addition, there might be other copies with Ka/Ks ratios close to 1 that have lost the ability to be transcribed. Therefore, together with the existence of CNVs and gene conversion events in this region, it is very difficult to define the number of functional FAM90A genes in each individual.

Besides the possible changes in the FAM90A coding sequence, in this work we describe the generation of a novel gene conformation that is exclusive to African great apes, subfamily I, which has the same coding capacity but a different first 5'-UTR exon and different upstream sequences. Our RT–PCR analysis has shown that both subfamily I and subfamily II are ubiquitously expressed in diverse human tissues. Interestingly, subfamily I members, and in particular FAM90A1, are the FAM90A genes for which the largest number of ESTs support mRNA expression in humans. It is thus tempting to speculate that the acquisition of the new regulatory sequences resulted in a different expression profile that has been favored by natural selection. Preliminary analyses did not find evidence of an acceleration of nucleotide changes in the ALG1 intron sequences co-opted as 5'-UTR and upstream regions of subfamily I members (M.C., unpublished data). However, more exhaustive expression and regulation analyses of subfamily I and subfamily II genes are required to assess the potential functional role of these novel genes during primate evolution.


    MATERIALS AND METHODS
 
Sequence analysis
In silico analysis of the 8p23.1 region sequence was performed on the NCBI build 36.1 human genome assembly (http://genome.ucsc.edu/). The analyzed region was divided into four fragments determined by the positions of the distal and proximal 8p23.1 gaps: 6.5–7.4, 7.5–8, 11.8–12 and 12.2–12.5 Mb (Fig. 1). Repetitive elements in the four fragments were masked using RepeatMasker (http://www.repeatmasker.org) and the remaining sequences were aligned with PipMaker (38) in order to identify duplicated segments between them. BLAST-based algorithms (48), the Celera database, Biology Workbench tools (49) and NCBI's Entrez Gene (www.ncbi.nlm.nih.gov/entrez) were used to obtain information on the FAM90A gene. Functional domains in the FAM90A protein were identified with InterProScan software (http://www.ebi.ac.uk/InterProScan).

Pulsed-field gel electrophoresis and Southern blotting
High-quality genomic DNA was isolated in agarose plugs prepared from lymphoblastoid cell lines derived from blood samples of different human donors, plus one individual each of chimpanzee (Pan troglodytes), gorilla (Gorilla gorilla) and orangutan (Pongo pygmaeus). DNA plugs were digested with the Acc65I restriction enzyme, separated by pulsed-field gel electrophoresis (PFGE) using a CHEF MAPPER system (Bio-Rad) at 6 V/cm for 19 h on a 1% agarose gel and blotted onto a positively charged nylon membrane (Hybond-N+, Amersham). The filter was pre-hybridized at 42°C for 4 h and the hybridization was performed overnight at 42°C using a 700 bp fragment from the second intron of FAM90A1 as a probe. The probe was labeled with the PCR DIG Probe Synthesis kit (Roche) and was detected with Anti-Digoxigenin-AP antibody and CDP-STAR reagent (Roche).

RT–PCR expression analysis
Total RNA was extracted from lymphoblastoid cell lines of human individuals according to a standard protocol using Trizol reagent (Invitrogen). Isolated RNA (5 µg) from control individuals, as well as commercial total adult RNA from ovary, liver, spleen, lung, placenta, kidney, thymus, heart, skeletal muscle, testes and colon (Stratagene) and brain (Ambion), was incubated with DNase I (Ambion) for 30 min at 37°C, and cDNA was synthesized from 1 µg of the DNase I-treated RNA by reverse transcription using the SuperScript First Strand Synthesis System (Invitrogen). PCRs were performed in a 12.5 µl reaction volume with 2 µl of cDNA using standard cycling conditions. Primers to amplify FAM90A members were designed with Primer3 (50) and are listed in Supplementary Material, Table S3. Information on ESTs matching particular FAM90A copies was obtained from the UCSC genome browser, and several IMAGE clones corresponding to cDNAs of FAM90A members were directly sequenced to confirm their identity.

Comparative FISH study
Metaphase spreads were obtained from human and primate cell lines (lymphoblasts or fibroblasts), including common chimpanzee, gorilla, orangutan and rhesus monkey (Macaca mulatta). Epstein–Barr virus-transformed human lymphoblasts were grown in standard RPMI media containing 10% fetal calf serum and antibiotics. DNA extraction from BACs was performed as described previously (51) and FISH experiments were performed as described by Lichter et al. (52). Digital images were obtained using a Leica DMRXA2 epifluorescence microscope equipped with a cooled CCD camera (Princeton Instruments, Princeton, NJ, USA). Cy3-dCTP, FluorX-dCTP, DEAC, Cy5-dCTP and DAPI fluorescence signals, detected with specific filters, were recorded separately as gray scale images. Pseudocoloring and merging of images were performed using Adobe Photoshop™ software.

Real-time PCR analysis
For the quantitative real-time PCR amplification, two sets of Universal Probe Library probes and primer pairs were used: one targeting the last exon of FAM90A genes and the other a single-copy ultra-conserved region on human chromosome 6p12.31. The probes and primer sets were designed at the ProbeFinder Design assay center website (https://www.roche-applied-science.com/sis/rtpcr/upl/adc.jsp) and the selected primers targeted regions identical in the great majority of FAM90A sequences (Supplementary Material, Table S3). Real-time PCR was performed in the LightCycler® 480 instrument (Roche Molecular Diagnostics), using the following program conditions for both amplicons: 10 min at 95°C followed by 45 cycles of 15 s at 95°C, 1 min at 59°C and 30 s at 72°C. Individual 10 µl reactions were carried out in triplicate for each sample in a 384-multiwell plate. Independent genomic DNA-based standard curves were used to determine the efficiencies of the FAM90A target amplification in each species. Estimates of FAM90A copy number were obtained from the crossing-point (Cp) values as described by Pfaffl (53).
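The efficiency-corrected relative quantification of Pfaffl (53) can be sketched as below. This is only an illustration under stated assumptions, not the authors' analysis script: the function names and the Cp values are hypothetical, and the amplification efficiency is taken from the slope of a standard curve (Cp against log10 of input DNA) through the commonly used relation E = 10^(-1/slope).

```python
def efficiency_from_slope(slope: float) -> float:
    """Amplification efficiency E from a standard-curve slope.

    The slope is that of Cp plotted against log10(input amount);
    E = 2.0 corresponds to perfect doubling in every cycle.
    """
    if slope >= 0:
        raise ValueError("a standard-curve slope is expected to be negative")
    return 10 ** (-1.0 / slope)


def pfaffl_ratio(
    e_target: float,
    e_ref: float,
    cp_target_calibrator: float,
    cp_target_sample: float,
    cp_ref_calibrator: float,
    cp_ref_sample: float,
) -> float:
    """Relative amount of target in a sample versus a calibrator.

    ratio = E_target ** dCp_target / E_ref ** dCp_ref, where each dCp
    is Cp(calibrator) minus Cp(sample) for that amplicon.
    """
    d_target = cp_target_calibrator - cp_target_sample
    d_ref = cp_ref_calibrator - cp_ref_sample
    return (e_target ** d_target) / (e_ref ** d_ref)


if __name__ == "__main__":
    # Hypothetical values: a slope near -3.32 gives E close to 2.
    e_fam90a = efficiency_from_slope(-3.32)
    e_single_copy = efficiency_from_slope(-3.32)
    print(pfaffl_ratio(e_fam90a, e_single_copy, 25.0, 24.0, 26.0, 26.0))
```

In practice the Cp passed in for each sample would be the mean of its triplicate reactions, and the ratio is a relative value that needs a calibrator of known copy number to be read as an absolute count.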

Evolutionary analyses
To identify sequences homologous to FAM90A genes in other species, similarity searches against available genome assemblies and non-redundant databases were performed using BLAT and BLAST. In addition, BLASTP and TBLASTN searches with the FAM90A protein were performed on the NCBI website (http://www.ncbi.nlm.nih.gov/blast). Multiple sequence alignments were carried out with the MUSCLE program with default parameters (54). Phylogenetic trees of the sequence common to all FAM90A copies were obtained by neighbor-joining from 1000 bootstrap replicates using the PHYLIP software package (http://evolution.genetics.washington.edu/phylip/doc/main.html), and similar results were obtained using the UPGMA, DNA parsimony and maximum likelihood methods. Distances of the different branches were calculated using the BASEML module of the PAML program (40). Synonymous (Ks) and non-synonymous (Ka) substitution rates along different branches were calculated by maximum likelihood under the codon substitution model implemented in PAML (40). To compare the Ka/Ks ratios of different parts of the tree, a likelihood ratio test was performed, as described previously (55).
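The likelihood ratio test mentioned above compares two nested models through their maximized log-likelihoods: twice the difference is referred to a chi-square distribution whose degrees of freedom equal the number of extra free parameters. The sketch below shows that arithmetic only. The log-likelihood values are hypothetical rather than taken from this study, the function name is an assumption, and the closed-form tail probabilities cover just one or two degrees of freedom so that no external library is needed.

```python
import math


def likelihood_ratio_test(lnl_null: float, lnl_alt: float, df: int = 1):
    """Return (statistic, p_value) for two nested models.

    statistic = 2 * (lnL_alt - lnL_null). The chi-square tail is
    computed in closed form, so only df = 1 or df = 2 is supported.
    """
    statistic = 2.0 * (lnl_alt - lnl_null)
    if statistic < 0:
        # The richer model cannot fit worse if both were fully optimized.
        raise ValueError("alternative model has lower likelihood than null")
    if df == 1:
        p_value = math.erfc(math.sqrt(statistic / 2.0))
    elif df == 2:
        p_value = math.exp(-statistic / 2.0)
    else:
        raise ValueError("this sketch handles only df = 1 or df = 2")
    return statistic, p_value


if __name__ == "__main__":
    # Hypothetical log-likelihoods: one Ka/Ks ratio for the whole tree
    # (null) versus a separate ratio for one group of branches (alt).
    stat, p = likelihood_ratio_test(lnl_null=-1000.0, lnl_alt=-998.0, df=1)
    print(f"2*dlnL = {stat:.2f}, p = {p:.4f}")
```

A small p-value here would indicate that allowing a separate Ka/Ks ratio for the chosen branches fits the data significantly better than a single ratio for the whole tree.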


    SUPPLEMENTARY MATERIAL
 
Supplementary Material is available at HMG Online.


    ACKNOWLEDGEMENTS
 
We thank Susana de la Luna, Baldo Oliva and Arcadi Navarro for helpful discussions and the ‘Centre de Transfusió i Banc de Teixits de l'Hospital Vall d'Hebrón’ (Barcelona, Spain) for the material from blood donors. Financial support was received from Genome Spain, Genome Canada, the Spanish Ministry of Science and Education (SAF 2002-00799) and the Spanish Ministry of Health (184) (Instituto de Salud Carlos III). N.B. is a recipient of a BEFI fellowship from ‘Instituto de Salud Carlos III FIS-ISCIII’. M.C. was supported by the ‘Ramón y Cajal’ Program (Spanish Ministry of Science and Education). E.B. is a recipient of an FI fellowship from the Departament d'Universitats i Societat de la Informació, Generalitat de Catalunya (2003FI00066).

Conflict of Interest statement. None declared.


    REFERENCES
 

  1. Ohno S. The spontaneous mutation rate revisited and the possible principle of polymorphism generating more polymorphism. Can. J. Genet. Cytol. (1969) 11:457–467.

  2. Holland P.W., Takahashi T. The evolution of homeobox genes: implications for the study of brain development. Brain Res. Bull. (2005) 66:484–490.

  3. Shen S.H., Slightom J.L., Smithies O. A history of the human fetal globin gene duplication. Cell (1981) 26:191–203.

  4. Johnson M.E., Viggiano L., Bailey J.A., Abdul-Rauf M., Goodwin G., Rocchi M., Eichler E.E. Positive selection of a gene family during the emergence of humans and African apes. Nature (2001) 413:514–519.

  5. Babushok D.V., Ostertag E.M., Kazazian H.H. Jr. Current topics in genome evolution: molecular mechanisms of new gene formation. Cell. Mol. Life Sci. (2006) 64:542–554.

  6. Lupski J.R. Genomic disorders: structural features of the genome can lead to DNA rearrangements and human disease traits. Trends Genet. (1998) 14:417–422.

  7. Bayes M., Magano L.F., Rivera N., Flores R., Perez Jurado L.A. Mutational mechanisms of Williams–Beuren syndrome deletions. Am. J. Hum. Genet. (2003) 73:131–151.

  8. Emanuel B.S., Shaikh T.H. Segmental duplications: an ‘expanding’ role in genomic instability and disease. Nat. Rev. Genet. (2001) 2:791–800.

  9. Lupski J.R., Stankiewicz P. Genomic disorders: molecular mechanisms for rearrangements and conveyed phenotypes. PLoS Genet. (2005) 1:e49.

  10. Tuzun E., Bailey J.A., Eichler E.E. Recent segmental duplications in the working draft assembly of the brown Norway rat. Genome Res. (2004) 14:493–506.

  11. Armengol L., Pujana M.A., Cheung J., Scherer S.W., Estivill X. Enrichment of segmental duplications in regions of breaks of synteny between the human and mouse genomes suggest their involvement in evolutionary rearrangements. Hum. Mol. Genet. (2003) 12:2201–2208.

  12. Murphy W.J., Larkin D.M., Everts-van der Wind A., Bourque G., Tesler G., Auvil L., Beever J.E., Chowdhary B.P., Galibert F., Gatzke L., et al. Dynamics of mammalian chromosome evolution inferred from multispecies comparative maps. Science (2005) 309:613–617.

  13. Conrad B., Antonarakis S.E. Gene Duplication: A Drive for Phenotypic Diversity and Cause of Human Disease. Annu. Rev. Genomics Hum. Genet. (2007) 38:75–81.

  14. Bondeson M.L., Dahl N., Malmgren H., Kleijer W.J., Tonnesen T., Carlberg B.M., Pettersson U. Inversion of the IDS gene resulting from recombination with IDS-related sequences is a common cause of the Hunter syndrome. Hum. Mol. Genet. (1995) 4:615–621.

  15. Lakich D., Kazazian H.H. Jr, Antonarakis S.E., Gitschier J. Inversions disrupting the factor VIII gene are a common cause of severe haemophilia A. Nat. Genet. (1993) 5:236–241.

  16. Osborne L.R., Li M., Pober B., Chitayat D., Bodurtha J., Mandel A., Costa T., Grebe T., Cox S., Tsui L.C., et al. A 1.5 million-base pair inversion polymorphism in families with Williams–Beuren syndrome. Nat. Genet. (2001) 29:321–325.

  17. Gimelli G., Pujana M.A., Patricelli M.G., Russo S., Giardino D., Larizza L., Cheung J., Armengol L., Schinzel A., Estivill X., et al. Genomic inversions of human chromosome 15q11–q13 in mothers of Angelman syndrome patients with class II (BP2/3) deletions. Hum. Mol. Genet. (2003) 12:849–858.

  18. Stefansson H., Helgason A., Thorleifsson G., Steinthorsdottir V., Masson G., Barnard J., Baker A., Jonasdottir A., Ingason A., Gudnadottir V.G., et al. A common inversion under selection in Europeans. Nat. Genet. (2005) 37:129–137.

  19. Giglio S., Broman K.W., Matsumoto N., Calvari V., Gimelli G., Neumann T., Ohashi H., Voullaire L., Larizza D., Giorda R., et al. Olfactory receptor-gene clusters, genomic-inversion polymorphisms, and common chromosome rearrangements. Am. J. Hum. Genet. (2001) 68:874–883.

  20. Sugawara H., Harada N., Ida T., Ishida T., Ledbetter D.H., Yoshiura K., Ohta T., Kishino T., Niikawa N., Matsumoto N. Complex low-copy repeats associated with a common polymorphic inversion at human chromosome 8p23. Genomics (2003) 82:238–244.

  21. Giorda R., Ciccone R., Gimelli G., Pramparo T., Beri S., Bonaglia M.C., Giglio S., Genuardi M., Argente J., Rocchi M., et al. Two classes of low-copy repeats comediate a new recurrent rearrangement consisting of duplication at 8p23.1 and triplication at 8p23.2. Hum. Mutat. (2007) 28:459–468.

  22. Barber J.C., Maloney V., Hollox E.J., Stuke-Sontheimer A., du Bois G., Daumiller E., Klein-Vogler U., Dufke A., Armour J.A., Liehr T. Duplications and copy number variants of 8p23.1 are cytogenetically indistinguishable but distinct at the molecular level. Eur. J. Hum. Genet. (2005) 13:1131–1136.

  23. Giglio S., Calvari V., Gregato G., Gimelli G., Camanini S., Giorda R., Ragusa A., Guerneri S., Selicorni A., Stumm M., et al. Heterozygous submicroscopic inversions involving olfactory receptor-gene clusters mediate the recurrent t(4;8)(p16;p23) translocation. Am. J. Hum. Genet. (2002) 71:276–285.

  24. Giglio S., Graw S.L., Gimelli G., Pirola B., Varone P., Voullaire L., Lerzo F., Rossi E., Dellavecchia C., Bonaglia M.C., et al. Deletion of a 5-cM region at chromosome 8p23 is associated with a spectrum of congenital heart defects. Circulation (2000) 102:432–437.

  25. Floridia G., Piantanida M., Minelli A., Dellavecchia C., Bonaglia C., Rossi E., Gimelli G., Croci G., Franchi F., Gilgenkrantz S., et al. The same molecular mechanism at the maternal meiosis I produces mono- and dicentric 8p duplications. Am. J. Hum. Genet. (1996) 58:785–796.

  26. Barber J.C. Directly transmitted unbalanced chromosome abnormalities and euchromatic variants. J. Med. Genet. (2005) 42:609–629.

  27. Tsai C.H., Graw S.L., McGavran L. 8p23 duplication reconsidered: is it a true euchromatic variant with no clinical manifestation? J. Med. Genet. (2002) 39:769–774.

  28. Harada N., Takano J., Kondoh T., Ohashi H., Hasegawa T., Sugawara H., Ida T., Yoshiura K., Ohta T., Kishino T., et al. Duplication of 8p23.2: a benign cytogenetic variant? Am. J. Med. Genet. (2002) 111:285–288.

  29. Engelen J.J., Moog U., Evers J.L., Dassen H., Albrechts J.C., Hamers A.J. Duplication of chromosome region 8p23.1->p23.3: a benign variant? Am. J. Med. Genet. (2000) 91:18–21.

  30. Nusbaum C., Mikkelsen T.S., Zody M.C., Asakawa S., Taudien S., Garber M., Kodira C.D., Schueler M.G., Shimizu A., Whittaker C.A., et al. DNA sequence and analysis of human chromosome 8. Nature (2006) 439:331–335.

  31. Cheung J., Estivill X., Khaja R., MacDonald J.R., Lau K., Tsui L.C., Scherer S.W. Genome-wide detection of segmental duplications and potential assembly errors in the human genome sequence. Genome Biol. (2003) 4:R25.

  32. Taudien S., Galgoczy P., Huse K., Reichwald K., Schilhabel M., Szafranski K., Shimizu A., Asakawa S., Frankish A., Loncarevic I.F., et al. Polymorphic segmental duplications at 8p23.1 challenge the determination of individual defensin gene repertoires and the assembly of a contiguous human reference sequence. BMC Genomics (2004) 5:92.

  33. Aldred P.M., Hollox E.J., Armour J.A. Copy number polymorphism and expression level variation of the human α-defensin genes DEFA1 and DEFA3. Hum. Mol. Genet. (2005) 14:2045–2052.

  34. Hollox E.J., Armour J.A., Barber J.C. Extensive normal copy number variation of a beta-defensin antimicrobial-gene cluster. Am. J. Hum. Genet. (2003) 73:591–600.

  35. Linzmeier R.M., Ganz T. Human defensin gene copy number polymorphisms: comprehensive analysis of independent variation in alpha- and beta-defensin regions at 8p22–p23. Genomics (2005) 86:423–430.

  36. Ballana E., Gonzalez J.R., Bosch N., Estivill X. Inter-population variability of DEFA3 gene absence: correlation with haplotype structure and population variability. BMC Genomics (2007) 8:14.

  37. Bailey J.A., Gu Z., Clark R.A., Reinert K., Samonte R.V., Schwartz S., Adams M.D., Myers E.W., Li P.W., Eichler E.E. Recent segmental duplications in the human genome. Science (2002) 297:1003–1007.

  38. Schwartz S., Zhang Z., Frazer K.A., Smit A., Riemer C., Bouck J., Gibbs R., Hardison R., Miller W. PipMaker—a web server for aligning two genomic DNA sequences. Genome Res. (2000) 10:577–586.

  39. Wu Q., Krainer A.R. AT-AC pre-mRNA splicing mechanisms and conservation of minor introns in voltage-gated ion channel genes. Mol. Cell Biol. (1999) 19:3225–3236.

  40. Yang Z. PAML: a program package for phylogenetic analysis by maximum likelihood. Comput. Appl. Biosci. (1997) 13:555–556.

  41. Bailey J.A., Eichler E.E. Primate segmental duplications: crucibles of evolution, diversity and disease. Nat. Rev. Genet. (2006) 7:552–564.

  42. Redon R., Ishikawa S., Fitch K.R., Feuk L., Perry G.H., Andrews T.D., Fiegler H., Shapero M.H., Carson A.R., Chen W., et al. Global variation in copy number in the human genome. Nature (2006) 444:444–454.

  43. Courseaux A., Nahon J.L. Birth of two chimeric genes in the Hominidae lineage. Science (2001) 291:1293–1297.

  44. Fortna A., Kim Y., MacLaren E., Marshall K., Hahn G., Meltesen L., Brenton M., Hink R., Burgers S., Hernandez-Boussard T., et al. Lineage-specific gene duplication and loss in human and great ape evolution. PLoS Biol. (2004) 2:E207.

  45. Eichler E.E., Hoffman S.M., Adamson A.A., Gordon L.A., McCready P., Lamerdin J.E., Mohrenweiser H.W. Complex beta-satellite repeat structures and the expansion of the zinc finger gene cluster in 19p12. Genome Res. (1998) 8:791–808.

  46. Vandepoele K., Van Roy N., Staes K., Speleman F., van Roy F. A novel gene family NBPF: intricate structure generated by gene duplications during primate evolution. Mol. Biol. Evol. (2005) 22:2265–2274.

  47. Popesco M.C., Maclaren E.J., Hopkins J., Dumas L., Cox M., Meltesen L., McGavran L., Wyckoff G.J., Sikela J.M. Human lineage-specific amplification, selection, and neuronal expression of DUF1220 domains. Science (2006) 313:1304–1307.

  48. McGinnis S., Madden T.L. BLAST: at the core of a powerful and diverse set of sequence analysis tools. Nucleic Acids Res. (2004) 32:W20–W25.

  49. Subramaniam S. The biology workbench—a seamless database and analysis environment for the biologist. Proteins (1998) 32:1–2.

  50. Rozen S., Skaletsky H. Primer3 on the WWW for general users and for biologist programmers. Methods Mol. Biol. (2000) 132:365–386.

  51. Ventura M., Archidiacono N., Rocchi M. Centromere emergence in evolution. Genome Res. (2001) 11:595–599.

  52. Lichter P., Tang C.J., Call K., Hermanson G., Evans G.A., Housman D., Ward D.C. High-resolution mapping of human chromosome 11 by in situ hybridization with cosmid clones. Science (1990) 247:64–69.

  53. Pfaffl M.W. A new mathematical model for relative quantification in real-time RT–PCR. Nucleic Acids Res. (2001) 29:e45.

  54. Edgar R.C. MUSCLE: a multiple sequence alignment method with reduced time and space complexity. BMC Bioinformatics (2004) 5:113.

  55. Yang Z. Likelihood ratio tests for detecting positive selection and application to primate lysozyme evolution. Mol. Biol. Evol. (1998) 15:568–573.

  56. Wienberg J., Jauch A., Stanyon R., Cremer T. Molecular cytotaxonomy of primates by chromosomal in situ suppression hybridization. Genomics (1990) 8:347–350.

