Human Molecular Genetics Advance Access originally published online on September 22, 2004
Human Molecular Genetics 2004 13(22):2737-2751; doi:10.1093/hmg/ddh301
Human Molecular Genetics, Vol. 13, No. 22 © Oxford University Press 2004; all rights reserved
Evolution of the tumor suppressor BRCA1 locus in primates: implications for cancer predisposition


1Genetic Information Research Institute, Mountain View, CA, USA and 2Laboratory of Biosystems and Cancer, National Cancer Institute, Bethesda, MD, USA
Received June 7, 2004; Revised September 1, 2004; Accepted September 14, 2004
DDBJ/EMBL/GenBank accession nos
| ABSTRACT |
|---|
Germ-line mutations in the BRCA1 gene predispose affected individuals to breast and ovarian cancer syndromes. In an attempt to systematically analyze a broader spectrum of genetic changes ranging from frequent exon deletions and duplications to amino acid replacements and protein truncations, we isolated and characterized full size BRCA1 homologues from a representative group of non-human primates. Our analysis represents the first comprehensive sequence comparison of primate BRCA1 loci and corresponding proteins. The comparison revealed an unusually high proportion of indels in non-coding DNA. The major force driving evolutionary changes in non-coding BRCA1 sequences was Alu-mediated rearrangements, including Alu transpositions and Alu-associated deletions, indicating that structural instability of this locus may be intrinsic in anthropoids. Analysis of the non-synonymous/synonymous ratio in coding portions of the gene revealed the presence of both conserved and rapidly evolving regions in the BRCA1 protein. Previously, a rapidly evolving region with evidence of positive evolutionary selection in human and chimpanzee had been identified only in exon 11. Here, we show that most of the internal BRCA1 sequence is variable between primates and evolved under positive selection. In contrast, the terminal regions of BRCA1, which encode the RING finger and BRCT domains, experienced negative selection, which left them almost identical between the compared primates. Distribution of the reported missense mutations, but not frameshift and nonsense mutations, is positively correlated with BRCA1 protein conservation. Finally, on the basis of protein sequence conservation, we identified missense changes that are likely to compromise BRCA1 function.
| INTRODUCTION |
|---|
The BRCA1 gene (MIM 113705) on chromosome 17q21.31 was identified on the basis of its linkage to early onset breast and breast–ovarian syndromes in women (1,2). The lifetime risk of breast and ovarian cancer among female BRCA1 mutation carriers is 82% and 54%, respectively (3). The BRCA1 gene contains 24 exons (22 coding and 2 non-coding) and covers a span of ~90 kb (4). Its coding region comprises ~5.6 kb and encodes a protein of 1863 amino acids. Exon 11, with 3427 bp, accounts for 61% of the CDS. The remaining exons are small, ranging from 37 to 311 bp.
BRCA1 has been implicated in a diverse array of biological processes, including the cellular response to DNA damaging agents, transcriptional regulation, ubiquitination and chromosome remodeling (reviewed in 5). Despite extensive studies, the function of the BRCA1 protein remains unclear. At present, over 30 proteins have been identified that bind to it directly, indirectly or as part of a larger multiprotein complex. BRCA1 contains three distinct protein-interacting regions: the RING finger domain, the RAD51 interaction domain and the BRCT domain. The N-terminal RING finger domain encompassing exons 2–6 has been implicated in interactions with at least five different proteins, including formation of stable heterodimers with BARD1 (reviewed in 6). The RING domain functions in vitro as an E3 ubiquitin ligase where it catalyzes the synthesis of monoubiquitin- and polyubiquitin-targeted proteins. The activity is greatly increased when BRCA1 is in a complex with its N-terminal binding partner BARD1. In vivo, BRCA1 and BARD1 co-localize in a cell-cycle-dependent manner and in response to DNA damage, suggesting a role for BRCA1 ubiquitin conjugation in DNA repair. The central RAD51 interaction domain is located at exon 11. It is involved in DNA repair and contains multiple protein-binding sites, including those for RAD51, the RAD50 complex and MSH2 (5). The C-terminal region was identified in vitro as a transcriptional co-regulator with some specificity for p53 and STAT1 co-activation. The region contains two ~90 amino acid sequence repeats called BRCT (BRCA1 C-terminal) (7) that have a weak similarity to other proteins involved in DNA repair, such as the yeast protein RAD9 and the mammalian protein XRCC1. This domain is particularly rich in protein–protein interaction sites (reviewed in 6), including binding domains for DNA helicase BACH1 (BRCA1-associated C-terminal helicase), which is involved in the repair of double-strand DNA breaks (8).
The Breast Cancer Information Core (BIC; http://research.nhgri.nih.gov/projects/bic/) is a database of more than 8500 mutations (including polymorphisms and rare variants) scattered along BRCA1, but only some of them are known to affect BRCA1 function. The vast majority of the disease-associated mutations result in truncated reading frames. The mutations include large genomic deletions and duplications involving one or more BRCA1 exons (9,10) caused mostly by recombination between Alu repeats, which are particularly numerous in BRCA1 (4). Nearly one-third of reported BRCA1 changes are missense mutations, and the functional consequence of most of them is uncertain. The reported disease-associated mutations are concentrated at the terminal RING and BRCT domains. Although those domains encompass only 13% of the entire protein, they house more than 90% of the missense changes known to be deleterious (11–13). It should be noted, however, that the sequence variants in the BIC database are based on voluntary submissions and do not represent an unbiased set of BRCA1 mutations.
Predictions regarding missense changes can be strengthened by comparative evolutionary analysis. Such analysis may be particularly helpful in the identification of low-penetrance missense changes in functionally important regions. Phylogenetic approaches can also determine whether certain residues have evolved more rapidly than predicted by a neutral theory, reflecting the action of positive (diversifying) selection. So far, complete CDS of BRCA1 are available for only a few vertebrate species. A recent analysis of partial BRCA1 exon 11 sequences in various mammalian species allowed the prediction of several missense mutations that would be more likely to affect BRCA1 function (14,15). Comparison of exon 11 sequences in non-human primates also revealed that the RAD51 interaction domain experienced strong positive selection during human evolution (16).
Here, we describe the isolation of genomic clones containing the entire BRCA1 gene from chimpanzee, gorilla, orangutan and rhesus macaque. Comparison of the homologues allowed us to follow evolutionary changes in coding and non-coding regions of the BRCA1 gene in primates and to extend the number of predicted amino acid changes that would affect gene function.
| RESULTS |
|---|
Genomic organization of the BRCA1 genes in primates
We isolated BRCA1 gene homologues from chimpanzee, gorilla, orangutan and rhesus macaque by transformation-associated recombination (TAR) cloning (Supplementary Material, Fig. S1) (17). The targeting hooks were developed from a promoter sequence and the 3'-untranslated region of human BRCA1. The vector allowed us to isolate the entire BRCA1 gene as ~95 kb genomic fragments. The yield of BRCA1-positive clones from the four primate DNAs was approximately the same as that from human DNA (~1%). We isolated at least three independent genomic clones for each species. We converted one randomly chosen TAR clone from each species into a BAC and sequenced it with high accuracy using BAC DNA as a template. The overall BRCA1 structure was conserved in all five primates (Fig. 1). A conserved promoter region containing CpG islands was followed by 24 exons with conserved exon–intron boundaries. Multiple alignment of the genes revealed an unusually high proportion of insertions and deletions (indels) (Table 1; Fig. 1B and G). Pairwise identity in the aligned segments (with indels excluded) ranged from 93% up to 99% for human–chimpanzee. The identity dropped to 72–73% (between macaque and hominoids) when indels were included (Table 1). Many of the rearrangements were linked to the activity of Alu repeats (see later).
The promoter region and the 5' end of the human BRCA1 gene are duplicated ~40 kb upstream, and homologous recombination in this segment occasionally causes deletion of the promoter region (18,19). We analyzed polymorphisms in human genomic clones containing the duplicated segments that were available in sequence databases for evidence of gene conversion (i.e. correspondence between the 5' BRCA1 pseudogene and the functional copy) between the segments, but did not detect any (data not shown). Nor did analysis of BIC mutations detect any hallmark of gene conversion (data not shown).
Alu repeats shape BRCA1 genes in primates
Most of the detected long indels appear to be associated with Alu sequences. Alu elements are ~280 bp long, and in order to detect possible Alu-mediated rearrangements we concentrated on 45 indels ≥250 bp. The majority of the long rearrangements took place in the lineage leading to hominoid primates and in the macaque branch (Fig. 1B; Supplementary Material, Table S1). The ancestral orangutan–gorilla–chimpanzee–human lineage (25–14 million years ago, MYA; 20) accumulated nine Alu insertions (six from the AluY and three from the AluS subfamilies), resulting in a gain of 2755 bp (Alu insertions and duplications of the target sites). The same lineage exhibited two deletions not found in the rhesus macaque: one deletion (655 bp) was caused by homologous Alu–Alu recombination, the other (317 bp) probably by non-homologous recombination. The lineage leading to rhesus macaque was particularly rich in indel variety. We detected two deletions caused by homologous Alu–Alu recombination and two caused by non-homologous recombination, leading to a total loss of 3647 bp in the primate consensus sequence. There were 23 macaque-specific Alu insertions (19 from AluY, three from AluS subfamilies and one short Alu fragment) that together with target-site duplication contributed 7547 bp. Indel variation, and especially Alu retrotranspositions, ceased in recent hominoid lineages (Fig. 1B). After separation of the orangutan, there were three deletions in the lineage leading to human, chimpanzee and gorilla (14–7 MYA; 20); one (671 bp) was caused by homologous Alu–Alu recombination and two (519 and 1279 bp) were probably caused by non-homologous recombination. There was one 5353 bp deletion in chimpanzee caused by Alu–Alu recombination and one gorilla-specific AluYc1 insertion. There was also an orangutan-specific 280 bp deletion in the non-repetitive DNA. In addition, the last intron contained a cluster of Alu sequences that was unstable, because no sequence pairs share the same indel pattern.
To summarize, all 33 insertions >250 bp were caused by Alu retroposition. Alu sequences also contributed significantly to long (>250 bp) deletions. Five deletions were caused by homologous Alu–Alu recombination and six by non-homologous recombination. Distribution of the long deletions along the BRCA1 genes was not random (Fig. 1B); large deletions were, as expected, found in the largest introns. Alu insertions were more dispersed. However, there appears to be a hotspot for retroposition, with eight independent Alu insertions at positions around 8–10.5 kb (intron 3). In conclusion, our analysis shows that Alus were the main force shaping primate BRCA1 genes.
Most Alu repeats involved in disease-associated genomic rearrangements are retained in non-human primates
The human BRCA1 gene contains 129 Alu elements, which is equivalent to ~42% of the sequence or ~1 per 0.7 kb (4; Figs 1F and 2). This high density of Alus appears to be the main source of large genomic rearrangements identified in patients with a hereditary predisposition to breast and ovarian cancers (9,10,21–27). So far, 19 different types of Alu-associated germ-line BRCA1 rearrangements ranging in size from 0.5 to 23.8 kb have been described in the literature (Supplementary Material, Table S2; Fig. 2).
Analysis of the junction regions revealed that at least 26 different Alu elements are involved in the rearrangements (Supplementary Material, Table S2; Fig. 2). We found these high-risk elements to be remarkably stable in hominoid primates, having been conserved in chimpanzee, gorilla, orangutan and rhesus macaque. Whereas in rhesus macaque a high-risk Alu92 fused with Alu93, and a high-risk Alu99 fused with Alu104 deleting another high-risk Alu101 and an Alu102 element, there was no loss of dangerous Alus in the hominoid primates. Only one of seven Alu insertions in the hominoid lineage, Alu10, was linked with the disease-associated rearrangements. Alu10 inserted after macaque separated from the other lineages and was involved in a deletion in the BRCA1 promoter region that resulted from its recombination with an upstream Alu repeat (28).
Structure and evolution of the BRCA1 CDS and protein
We analyzed the BRCA1 CDS from five primate species (Fig. 3). All of them encode proteins that comprise 1863 amino acids. The rhesus macaque protein has an in-frame 3 bp deletion in exon 11 that resulted in the loss of serine 287 and one 3 bp insertion in exon 11 that inserted isoleucine 1020. Many of the base substitutions in primate homologues (70/300 or 23% of all variable positions) involved CpG.
Table 2 shows pairwise identities in the BRCA1 CDS. As expected, coding regions tend to be conserved more than full-length genes with promoters and introns. For proteins (Table 3), human–chimpanzee identity was 98.22% and human–gorilla identity was 98.01%. The greatest identity (98.65%) was found for gorilla and chimpanzee. At the same time, primate BRCA1s are relatively distinct from the mammalian examples shown in Table 3, and the chicken protein shows so little similarity to the mammalian protein that in many places we could not align the two.
DNA and protein conservation profiles fluctuate significantly along the sequence lengths (Fig. 3F). The terminal RING and BRCT domains are the most conserved, and the central parts are variable. We detected a similar pattern of conservation when we compared the human BRCA1 protein with the canine, rat, murine and chicken orthologues (29–31; Supplementary Material, Figs S2 and S3).
Analysis of substitutions in individual primate branches revealed an increased non-synonymous/synonymous ratio (ω = Ka/Ks) in the human and chimpanzee lineages (Fig. 4), indicating positive selection (adaptive evolution). Positive selection was particularly strong in exon 11, as has been previously shown (16,32), but it was also detectable in exons 12–16 (Fig. 3D). This segment contains three human-specific non-synonymous substitutions (one non-conservative) and no synonymous ones. The chimpanzee branch exhibits conservative replacement of two residues in the same region. In both terminal lineages, non-conservative amino acid changes appear primarily in the first half of the BRCA1 protein (Fig. 3D, red bars), whereas conservative changes appear primarily in the second half. Terminal segments, on the other hand, have been under conservative negative selection (Fig. 3D–F). Indeed, both DNA and protein sequence conservation are significantly negatively correlated with the ω ratio: the correlation is −0.399 in the case of DNA identity (P=0.040; Spearman's rank coefficient) and −0.663 (P<0.001) for protein identity (Supplementary Material, Table S3). The non-synonymous rate varies significantly along the CDS; however, detailed analysis of the codon adaptation index (CAI) rules out the possibility that this variation is driven by selection on optimal codon usage (Fig. 3E; Supplementary Material, Table S3).
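The non-synonymous/synonymous ratio discussed above can be illustrated with a minimal sketch. It assumes that the numbers of non-synonymous and synonymous sites and differences have already been counted for a pair of sequences, and it applies a simple Jukes-Cantor correction; this is a generic textbook estimate, not necessarily the procedure used in this study. The function names and all counts are hypothetical.

```python
import math


def jukes_cantor(p):
    """Convert a proportion of differing sites into substitutions per site
    using the Jukes-Cantor correction (an assumed, simplified model)."""
    if not 0 <= p < 0.75:
        raise ValueError("proportion of differences must be in [0, 0.75)")
    return -0.75 * math.log(1 - 4 * p / 3)


def omega(nonsyn_diffs, nonsyn_sites, syn_diffs, syn_sites):
    """Return Ka/Ks for pre-counted differences and sites.

    Returns infinity when Ks is zero but Ka is positive, and NaN when
    both rates are zero (the ratio is then undefined).
    """
    ka = jukes_cantor(nonsyn_diffs / nonsyn_sites)
    ks = jukes_cantor(syn_diffs / syn_sites)
    if ks == 0:
        return math.inf if ka > 0 else math.nan
    return ka / ks


# Hypothetical counts for illustration only (not data from this study).
w_example = omega(nonsyn_diffs=30, nonsyn_sites=4200, syn_diffs=8, syn_sites=1390)

# A segment with non-synonymous changes but no synonymous ones has an
# unbounded ratio; the site counts here are again hypothetical.
w_no_syn = omega(nonsyn_diffs=3, nonsyn_sites=900, syn_diffs=0, syn_sites=300)
```

A value above 1 is conventionally read as a signal of positive selection and a value well below 1 as negative selection. With very few substitutions, as in the second call, the ratio is unstable and should be interpreted cautiously.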
Conservation of specific structures in the RING and BRCT domains and sites of phosphorylation
The majority of the known cancer-causing BRCA1 missense mutations are localized in the RING finger and BRCT domains (33 and references therein). Using the available crystal structure of the domains (12,34), we compared primate BRCA1 with known BRCA1 proteins to investigate in detail the interspecies conservation within the RING and BRCT domains.
The RING domain found in various proteins is characterized by a conserved pattern of eight cysteine and histidine residues forming a pair of Zn2+-binding sites (I and II). The BRCA1 RING domain is, as expected, strongly conserved within those sites (Fig. 5). In addition, the regions close to the active sites (the central α-helix, a β-strand and adjacent segments) are strictly conserved, not only among the analyzed primates but also in Xenopus, chicken, dog, mouse and rat. In primates, the few replacements observed were limited mostly to the long N- and C-terminal α-helices. Surprisingly, in vitro mutations in the site II domain do not disrupt the conformation needed for its proper dimerization with its heterodimeric partner BARD1, suggesting that the main function of the conserved residues near sites I and II is interaction with other proteins, such as the ubiquitin conjugation enzymes (35).
The C-terminus of BRCA1 has a more complex structure, consisting of two BRCT repeats connected by a 23 amino acid linker (7,34; Fig. 6). The peripheral regions harbor the majority of variable sites in the N-terminal repeat. The inner region consists of three highly conserved structural motifs: sheets β3–4 and helix α2. The linker region is variable except for the alpha helix αL. The C-terminal repeats are relatively more flexible, but sheet β2' and some neighboring residues are highly conserved. Interestingly, the above-mentioned conserved region overlaps the BACH1 helicase interaction regions (36). The majority of replacements occurred between mammals and other vertebrates; changes were nearly absent in primates. Gaps were not allowed except at the C-end of the linker and the most C-terminal part of the BRCA1 proteins (Fig. 6).
The two BRCT repeats in BRCA1 interact through three α-helices: α2 from the N-terminus and α1' and α3' from the C-terminal repeat. Similar tandem BRCT repeats are common in other proteins, such as 53BP1, RAD9, RAD4 and DNA ligase IV (34). The sequence alignment between the proteins shows conservation of the α1' and α3' helices and, to a lesser extent, of α2'. On the other hand, comparative analysis of several BRCA1 proteins reveals that the α2' helix is the most conserved of all helices. Both α1' and α3' contain several changes, including non-conservative replacements.
BRCA1 is phosphorylated at multiple sites, mainly on serine and to a lesser extent on tyrosine residues (37). Several phosphorylation sites, modified by at least three different kinases, have been identified so far in human BRCA1 (Fig. 3G, bottom). All these positions are invariant in primates. Serine 988, 1280, 1387, 1457 and 1524 and tyrosine 1394 are invariant in canine, rat and mouse. Serine 1143 is replaced by phenylalanine in rat and mouse; serine 1423 is replaced by asparagine in mouse. On the other hand, 57% (128/224) of serine residues as well as 74% (23/31) of tyrosine residues were variable in mammals.
Human mutations and sequence conservation
We used the April 2004 version of the BIC database to collect BRCA1 mutations. The set included 8588 mutations scattered over 1090 mostly exonic nucleotide positions (885 protein positions); 32% were missense, 54% were frameshift and 12% were nonsense mutations; other categories, such as synonymous mutations, splice variants or large indels, were less frequent. Out of 1246 distinct (non-redundant) protein mutations, 38% (473) were missense, 40% (502) frameshift and 14% (176) nonsense. To reduce the bias caused by a high proportion of founder mutations in the BIC, in the following parts we concentrated mostly on non-redundant sites with known mutations rather than on total number of mutations.
We investigated whether CpG-induced changes were overrepresented in the BIC mutation database. The human BRCA1 CDS contains 43 CpG dinucleotides (86 bp). We found 54 non-redundant DNA mutations within the CpGs (62.8% of CpG positions; Fig. 3C). The BRCA1 CDS has 5503 non-CpG positions (5589 − 86 CpG sites) that have mutated in 1016 different places (18.5% of positions). Thus, the mutation frequency at CpG positions is greatly increased (P<0.0001; chi-square test). Most of the mutations were missense mutations (46 mutations in 86 CpG sites, 411 mutations in 5503 non-CpG sites; P<0.0001; chi-square test). In addition, comparison of the total (redundant) number of mutations revealed a significant bias towards CpG sites (data not shown). Although the observed frequencies of CpG mutations could be influenced by biased submission into the BIC, our results indicate that the CpG dinucleotides present an increased risk of mutation, confirming the preliminary results obtained by Rodenhiser et al. (38), who analyzed the relatively small number of mutations available in the BIC at that time.
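The CpG comparison above can be checked with a short sketch that runs a Pearson chi-square test on the 2×2 tables implied by the reported counts (mutated versus unmutated positions, CpG versus non-CpG). The text does not say whether a continuity correction was applied, so the uncorrected statistic is used here as an assumption; the helper name `chi_square_2x2` is ours.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table
    [[a, b], [c, d]], with the P-value for 1 degree of freedom."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom, the upper-tail probability is erfc(sqrt(x/2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


CPG_SITES = 86        # 43 CpG dinucleotides in the CDS
NON_CPG_SITES = 5503  # remaining CDS positions

# Positions with any non-redundant mutation: 54 in CpGs, 1016 elsewhere.
chi2_any, p_any = chi_square_2x2(54, CPG_SITES - 54, 1016, NON_CPG_SITES - 1016)

# Positions with missense mutations: 46 in CpGs, 411 elsewhere.
chi2_mis, p_mis = chi_square_2x2(46, CPG_SITES - 46, 411, NON_CPG_SITES - 411)

print(f"any mutation: chi2 = {chi2_any:.1f}, P = {p_any:.2e}")
print(f"missense:     chi2 = {chi2_mis:.1f}, P = {p_mis:.2e}")
```

Both P-values fall far below the 0.0001 threshold quoted in the text, which is consistent with the reported result under these assumptions.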
Figure 7C shows that the density of sites harboring at least one mutation as well as the distribution of the number of mutations are non-random, and there are two peaks in the N- and C-terminal domains. A separate analysis of missense, nonsense and frameshift mutation sites (Fig. 7D) revealed that the pattern is the strongest for missense mutations, and the density of positions with missense mutations was quite similar to the conservation profiles in Figure 3F and Supplementary Material, Figures S2 and S3. A correlation analysis (Supplementary Material, Table S4) showed that the density of missense sites correlates positively with DNA identity in primates and protein conservation in both primates and mammals. The obvious interpretation is that most mutations are not tolerated in high conservation areas and thus their abundance in the database reflects a detection bias for deleterious replacements. Indeed, BIC missense mutations tend to be non-conservative in the conserved regions compared with the flexible regions (Fig. 7E; Supplementary Material, Table S4), strongly supporting detection bias.
The distribution of frameshift and nonsense mutations, on the other hand, is independent of sequence conservation (Fig. 7D; Supplementary Material, Table S4). Therefore, it seems that frameshift and nonsense mutations have a similar phenotype. Presumably, the nonsense-mediated mRNA decay pathway (39,40) prevents the production of such truncated proteins. A disproportionate representation of founder mutations (such as those found in Ashkenazi Jewish individuals) in the BIC was partially eliminated by our focus on mutation sites and not on the actual number of mutations. Although some BIC bias may affect our results on frameshift and nonsense mutations, it is unlikely that it would strongly affect our conclusions on the distribution of missense mutations, because a typical (random-like) noise should decrease, not increase, the statistical significance of the results.
Prediction of deleterious missense mutations
We applied two related computer-based methods for prediction of deleterious missense mutations: (i) predictions using the SIFT program (41), and (ii) the ancestral sequence (AS) method (14). Both methods use evolutionary conservation of the BRCA1 protein to predict deleterious changes. Only highly variable positions are expected to tolerate non-conservative mutations (as estimated by a protein substitution matrix), whereas conserved positions are expected to tolerate only replacements that have similar physicochemical properties (conservative changes). The primate alignment contains a small number of replacements and thus cannot be used efficiently to distinguish polymorphic changes. Including chicken BRCA1 in the comparison, on the other hand, lowers alignment quality in many places and is likely to produce a misleading conservation profile and thus bias the statistics. The mammalian alignment seems to provide the best balance, and indeed, its predictive value was clearly superior (Supplementary Material, Table S4). We therefore used five primate proteins and a canine, rat and mouse protein to predict the effect of mutations on human BRCA1. For all 473 different missense mutations reported in the BIC database, we predicted the effect on protein function and thus the predisposition to an increased risk of cancers (Supplementary Material, Table S5). In addition, using SIFT, we obtained a complete matrix of all possible replacements (even unreported ones) in the BRCA1 protein and an estimate of their deleterious effect (Supplementary Material, Table S6).
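The conservation logic described above can be sketched as a toy classifier. This is neither the SIFT program nor the AS method; it is a deliberately simplified illustration that replaces the substitution matrix with fixed physicochemical groups. The grouping, the function names and the example alignment columns are all hypothetical.

```python
# Assumed physicochemical grouping of amino acids (illustrative only).
GROUPS = [
    set("AVLIMC"),  # aliphatic / sulfur-containing
    set("FWY"),     # aromatic
    set("STNQ"),    # polar uncharged
    set("KRH"),     # basic
    set("DE"),      # acidic
    set("G"),       # glycine
    set("P"),       # proline
]


def group_of(residue):
    """Return the index of the physicochemical group of a residue."""
    for index, members in enumerate(GROUPS):
        if residue in members:
            return index
    raise ValueError(f"unknown residue: {residue!r}")


def predict(column, mutant):
    """Classify a missense change at one alignment column.

    column: residues at this position in the aligned species
            (one letter per species, no gaps).
    mutant: the residue introduced by the mutation.
    """
    if mutant in column:
        return "tolerated"        # already seen in some species
    observed = {group_of(residue) for residue in column}
    if len(observed) > 1:
        return "tolerated"        # variable position
    # Conserved position: accept only changes within the same group.
    return "tolerated" if group_of(mutant) in observed else "deleterious"


# Hypothetical eight-species columns (not taken from the real alignment).
print(predict("CCCCCCCC", "G"))  # invariant cysteine changed to glycine
print(predict("LLLLLIIV", "M"))  # conservative change at an aliphatic site
print(predict("SSSSNKRT", "D"))  # change at a variable position
```

Real predictors weigh substitutions with a scoring matrix and sequence weights rather than a hard group membership test, so this sketch should be read only as an illustration of the conserved-versus-variable reasoning.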
Tables 4 and 5 summarize the results. The proportion of BIC missense mutations predicted to be tolerated was about 38% by the SIFT method and 45% by the AS method. The other mutations would be expected to affect protein function and therefore predispose to breast and ovarian cancers. Both methods correctly predicted that a very small fraction of replacements would be tolerated in the area coding the conserved terminal RING and BRCT domains. It has been estimated that <10% of missense mutations are tolerated in these segments (11–13); our analysis predicted 2–15%. The predicted deleterious mutations listed in Supplementary Material, Table S5 would be a good set for correlations of phenotype with missense mutations in breast–ovarian cancer families and for studying the effects of the mutations on BRCA1 structure and function.
| DISCUSSION |
|---|
This work represents the first systematic study of evolutionary changes in the entire BRCA1 locus in non-human primates. Our comparative analysis of BRCA1 genes was simplified by the use of TAR cloning, a technique that allows the direct isolation of gene homologues (17,42). Using this cloning strategy, we isolated the genomic BRCA1 clones containing 5', 3' and all intron sequences from chimpanzee, gorilla, orangutan and rhesus macaque. This allowed us to develop sequence data that are either not present in the sequence databases or present as poor quality draft sequences.
Interspecies comparisons revealed the existence of an unexpectedly high number of indels. The proportion of BRCA1 indels was approximately three times higher than previously observed for full-length ASPM genes from the same primates (42). Most long indels were associated with Alu sequences. The majority of Alu insertions took place in the ancestral lineage leading to hominoid primates after the split of Hominidae (25–14 MYA) and the rhesus macaque branch; more recent hominoid lineages acquired mostly deletions. As no significant rearrangements involving other repetitive sequences were observed, we concluded that Alu repeats were the main contributors to evolution of the BRCA1 non-CDS, and that the BRCA1 gene represents a genomic hotspot for both retroposition and recombination of Alu repeats.
Our analysis of BRCA1 gene homologues revealed that most Alu elements involved in genomic rearrangements in humans are retained in non-human primates. The fact that the high-risk repeats were not eliminated by selection during primate evolution suggests a role in gene expression. Alternatively, ineffective selection against late-onset diseases may explain the tolerance of many dangerous Alu repeats in ancestral lineages.
Recombination between Alu repeats is an important contributor to genetic disorders (reviewed in 43). Genomic rearrangements may account for up to 30% of all BRCA1 mutations identified in breast cancer families (9,10,21–27). Given the high frequency of germ-line Alu-mediated BRCA1 rearrangements, it would not be surprising if Alus also contribute to at least some cases of sporadic breast and ovarian cancers by stimulating somatic recombinations, as has been recently suggested (44).
Using partial BRCA1 CDS derived from exon 11, Huttley et al. (16) showed that the RAD51-interacting domain evolved under positive selection in human and chimpanzee. Comparison of primate BRCA1 proteins has shown that the positive selection was not restricted to the RAD51-interacting domain but extended to most of the protein sequence, including the part encoded in exons 12–16. The terminal parts of BRCA1 encoding the RING and BRCT domains experienced strong conservation both in human and non-human primate lineages as was previously reported for other vertebrates (29–31). Such a mosaic of positive and negative selection has been previously described for other proteins (42,45,46).
Our analysis revealed that the most conserved sequences form specific tertiary structures. In the RING domain, the most conserved residues were closely packed around the Zn2+-binding sites. We speculate that these Zn2+-binding sites interact with ubiquitin conjugation enzymes (35). Surprisingly, the most conserved part of the BRCT domain is not at an interface where the two BRCT repeats interact, but is found mostly around the inner part of the N-terminal repeat. It co-localizes with the critical residues for BRCA1–BACH1 interaction (36). BACH1 functions with BRCA1 as a mediator of double-strand break repair, and deleterious BACH1 mutations seem to predispose affected individuals to early-onset breast cancer (8,47). On the basis of interspecies BRCA1 comparisons, the majority, if not all, of the mutations in the RING Zn2+-binding region and in the most conserved regions of the BRCT domain represent a group of mutations that strongly predispose to breast and ovarian cancers. Similarly, the strong conservation of certain phosphorylation sites indicates a critical role for them in protein function and therefore suggests that altered residues may result in cancer predisposition.
Amino acid substitutions resulting from nonsense, missense and frameshift mutations in the BIC database for BRCA1 were unevenly distributed along the protein. Although the frameshift and nonsense mutants did not exhibit any specific clustering, the frequency of missense mutants correlated positively with BRCA1 conservation. This strongly suggests that the clustering of missense mutants within conservative regions is driven by different phenotypic manifestations in the conserved and variable regions, and therefore by a detection bias for deleterious mutations.
Most of the 473 independent missense mutations reported in the BIC prior to April 2004 play an unknown role in breast cancer susceptibility. The effect of the mutations has been difficult to characterize because the function of some regions of the BRCA1 protein is poorly understood. Recently, interspecies comparisons have been made in an attempt to predict the role of missense changes in breast cancer susceptibility (14,15). By aligning exon 11 sequences from 57 eutherian and 8 marsupial mammals and categorizing amino acid sites by the degree of conservation, investigators identified 21 missense mutations that are likely to influence gene function and thereby contribute to cancer susceptibility. In our work, we applied a similar approach to analyze complete sequences of eight BRCA1 homologues. Our prediction yielded 55–62% deleterious missense mutations in the BIC, including those identified previously (14,15).
Our analysis is likely to overpredict deleterious mutations owing to the small number of sequences we used for comparison. At the same time, the use of BRCA1 sequences from more distant species might disguise some deleterious mutations as a result of fixation of mutations that are deleterious in humans (48). These cases can be explained either by compensatory effects of other mutations or by relaxed selection of late-onset phenotypes in the distantly related species (49,50). Therefore, adding the BRCA1 protein sequences from other primate species to the analysis may produce better estimates of mutation effects. While we will sequence other primate genes in future work, the predicted deleterious missense mutations in Supplementary Material (Tables S5 and S6) may be helpful for further detailed analyses of phenotypic correlation of missense mutations in breast–ovarian cancer families.
In conclusion, comparison of primate BRCA1 gene homologues allowed us to reconstruct an evolutionary history of the entire BRCA1 locus. Alu repeats, CpG dinucleotides and a mixture of positive selection and conservation of the CDS were the main factors that shaped BRCA1 evolution in primates. Interspecies sequence comparisons also provided a basis for the identification of conserved amino acid residues in BRCA1 and for the prediction of missense changes that compromise BRCA1 function. Missense mutations that confer the highest predisposition to breast and ovarian cancers are located in the evolutionarily conserved regions, phosphorylated residues and especially in specific protein-binding domains. Genomic clones of BRCA1 homologues with regulatory elements may also be used for comparative gene expression studies to identify the role of intron regions in gene regulation.
| MATERIALS AND METHODS |
|---|
TAR cloning of BRCA1 gene homologues by in vivo recombination in yeast
To isolate the full BRCA1 gene from the chimpanzee (Pan troglodytes), gorilla (Gorilla gorilla), orangutan (Pongo pygmaeus) and rhesus macaque (Macaca mulatta) genomes, we used TAR vector pVC-BC1 containing 5' and 3' targeting sequences (hooks) of the human BRCA1 gene (51). The 5' targeting sequence of 769 bp corresponds to positions 2147–2915 in the genomic sequence L78833 (GI: 1698398) and the 3' targeting sequence of 325 bp corresponds to positions 82 936–83 260 in the same sequence. We PCR-amplified the 5' BRCA1 sequence from genomic DNA using the primer pair BC1 (5'-CTCGAGGTCACTAAAACGAT-3') and BC2 (5'-GAATTCCAGCATGCGTTGCGG-3'), and the 3' BRCA1 sequence using the primer pair BC3 (5'-GAATTCCAATTGGGCAGATGTGT-3') and BC4 (5'-GGATCCAAGGGAGACTTCAAG-3'). We cloned the PCR products into a polylinker of a basic TAR vector as XhoI–EcoRI and EcoRI–BamHI fragments. Before performing the transformation experiments, we linearized the TAR cloning vector with EcoRI to release targeting hooks. We prepared genomic DNA samples from chimpanzee, gorilla, orangutan and rhesus macaque fibroblast culture cell lines (Coriell Institute for Medical Research) on agarose plugs. For transformations, we used the highly transformable Saccharomyces cerevisiae strain VL6-48 (MATα, his3-Δ200, trp1-Δ1, ura3-52, lys2, ade2-101, met14), which has HIS3 deleted (52). Spheroplast transformation experiments were carried out as previously described (52). The yield of transformants per µg vector, 2 µg genomic DNA, and 5 × 10⁸ spheroplasts was 310 colonies. We obtained approximately 300 His+ transformants for each species. To identify clones positive for BRCA1, we examined yeast transformants by PCR using diagnostic primers 11F (5'-CTCAGTTCAGAGGCAACGAA-3') and 11R (5'-GGAGCCCACTTCATTAGTAC-3') specific for BRCA1 exon 11. These primers amplify a 302 bp sequence. We isolated yeast genomic DNA from individual transformants or pools and PCR-amplified it as previously described (52). The yield of BRCA1-positive clones from human, chimpanzee, gorilla, orangutan and rhesus macaque genomic DNAs was
1%. To confirm that the copies were complete, we PCR-analyzed three independent TAR YAC isolates for each species using a set of primers specific for each of the 24 exons (1). We obtained the same size PCR products for each species isolate with each primer pair. Finally, we examined Alu profiles of the YACs after TaqI digestion and found that they were indistinguishable (data not shown). From these studies we concluded that we had isolated non-rearranged genomic copies of all the BRCA1 gene homologues. We retrofitted the individual BRCA1 YACs corresponding to each species into BACs by homologous recombination in yeast using BAC/NeoR retrofitting vector BRV1 and used them to transform a recA DH10B Escherichia coli strain (52). Before sequencing, we confirmed the integrity of the inserts in BACs by NotI, HindIII, EcoRI and PstI digestion. Chimpanzee, gorilla, orangutan and rhesus macaque TAR clones containing full-size BRCA1 genes were directly sequenced from BAC DNAs (53). Identical Alu profiles of independent TAR isolates were taken as confirmation of indels. In addition, some indels were confirmed by PCR amplification from genomic DNAs (data not shown). All sequences were named and numbered according to the clone/accession identifier. Sequences were deposited into GenBank under accessions AY365046, AY589040, AY589041 and AY589042.
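As a quick consistency check on the hook design, the following Python sketch tests which restriction site each hook primer carries at its 5' end and recomputes the hook lengths from their coordinates. The primer sequences are those given in the text, and the recognition sequences are the standard ones for XhoI, EcoRI and BamHI. The coordinate ranges assume the hook positions read as 2147–2915 and 82 936–83 260 in L78833, a reading that reproduces the stated lengths of 769 and 325 bp. The function names are illustrative only.

```python
# Standard recognition sequences of the enzymes used for hook cloning.
SITES = {"XhoI": "CTCGAG", "EcoRI": "GAATTC", "BamHI": "GGATCC"}

# Hook primers as listed in the text (5'->3').
PRIMERS = {
    "BC1": "CTCGAGGTCACTAAAACGAT",
    "BC2": "GAATTCCAGCATGCGTTGCGG",
    "BC3": "GAATTCCAATTGGGCAGATGTGT",
    "BC4": "GGATCCAAGGGAGACTTCAAG",
}


def five_prime_site(primer):
    """Return the enzyme whose recognition site starts the primer, or None."""
    for enzyme, site in SITES.items():
        if primer.upper().startswith(site):
            return enzyme
    return None


def span_length(start, end):
    """Length in bp of an inclusive, 1-based coordinate range."""
    return end - start + 1


if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        print(name, five_prime_site(seq))
    print("5' hook:", span_length(2147, 2915), "bp")
    print("3' hook:", span_length(82936, 83260), "bp")
```

The primer pairs BC1/BC2 and BC3/BC4 carry XhoI/EcoRI and EcoRI/BamHI sites, respectively, which matches the orientation in which the two hooks were cloned.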
Sequence analysis
We aligned genomic sequences using MAVID (54; http://baboon.math.berkeley.edu/mavid/) and proteins and protein-coding DNA sequences using Dialign2.1 (55; http://bibiserv.techfak.uni-bielefeld.de/dialign/). We edited alignments manually with the Seaview editor (56; http://pbil.univ-lyon1.fr/software/seaview.html). For prediction of CpG islands we used cpgplot (EMBOSS; 57; http://www.hgmp.mrc.ac.uk/Software/EMBOSS/) with the default parameters (length 200; CpG/GpC 0.6; GC 0.5). We determined the CAI by cai (EMBOSS) with a human codon usage library. Censor (58; http://www.girinst.org/Censor_ServerData_Entry_Forms.html), RepeatMasker (A.F.A. Smit and P. Green unpublished data; http://www.repeatmasker.org/), Repbase Update libraries (59; http://www.girinst.org/Repbase_Update.html) and TandemRepeatFinder (60; http://tandem.bu.edu/trf/trf.html) were used to identify repetitive elements. The segmental duplication of the 5' BRCA1 region was localized by local BLAT searches (61). Human single nucleotide polymorphism (SNP) data were extracted from dbSNP (http://www.ncbi.nlm.nih.gov/SNP/). To avoid possible paralogous sequence variants from the 5' segmental duplication, we extracted all chromosome 17 SNPs and localized them on chromosome 17 using local BLAT (61) searches; only SNPs with best hits within the BRCA1 gene were considered.
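The CpG island thresholds quoted above can be made concrete with a short Python sketch. It computes the observed/expected CpG ratio and the GC fraction of a single sequence and checks them against the three values from the text. This is a simplified illustration, not the cpgplot implementation (which scans the sequence in sliding windows). Treating the thresholds as inclusive minimums is an assumption, and the function names are hypothetical.

```python
def cpg_stats(seq):
    """Return (observed/expected CpG ratio, GC fraction) for a DNA string."""
    seq = seq.upper()
    n = len(seq)
    c, g = seq.count("C"), seq.count("G")
    cpg = seq.count("CG")
    # Expected CpG count under independence is (C count * G count) / length.
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    gc = (c + g) / n if n else 0.0
    return obs_exp, gc


def passes_cpg_thresholds(seq, min_len=200, min_obs_exp=0.6, min_gc=0.5):
    """Check a candidate region against length, obs/exp and GC thresholds.

    Assumption: all three thresholds are inclusive lower bounds.
    """
    obs_exp, gc = cpg_stats(seq)
    return len(seq) >= min_len and obs_exp >= min_obs_exp and gc >= min_gc


if __name__ == "__main__":
    print(cpg_stats("CG" * 100))
    print(passes_cpg_thresholds("CG" * 100))
    print(passes_cpg_thresholds("AT" * 100))
```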
We used SNAP (http://www.hiv.lanl.gov/content/hiv-db/SNAP/WEBSNAP/SNAP.html) to detect synonymous and non-synonymous substitutions (62). The Gonnet PAM250 matrix (63) was applied to classify substitutions and human mutations as conservative or non-conservative. We considered a change conservative if its score was >0.5 (14). Human BRCA1 mutations were downloaded from the BIC (64). We used SIFT (41; http://blocks.fhcrc.org/sift/SIFT.html) to predict deleterious missense mutations. Protein structures were visualized in PyMOL (DeLano Scientific, San Carlos, CA; http://www.pymol.org).
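To illustrate the distinction between synonymous and non-synonymous substitutions, the Python sketch below compares two aligned coding sequences codon by codon using the standard genetic code. It is a deliberately simplified illustration and not the SNAP algorithm: it only classifies codon pairs that differ at a single base, sets aside codons with multiple differences or gaps, and does not normalize by the number of synonymous and non-synonymous sites. The input sequences are hypothetical.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def count_substitutions(seq_a, seq_b):
    """Return (synonymous, non_synonymous, skipped) counts for aligned CDS strings.

    Only codon pairs differing at exactly one base are classified; pairs with
    several differences, gaps or ambiguous bases are counted as skipped.
    """
    if len(seq_a) != len(seq_b) or len(seq_a) % 3:
        raise ValueError("sequences must be aligned, in frame and of equal length")
    syn = nonsyn = skipped = 0
    for i in range(0, len(seq_a), 3):
        ca, cb = seq_a[i:i + 3].upper(), seq_b[i:i + 3].upper()
        if ca == cb:
            continue
        ndiff = sum(x != y for x, y in zip(ca, cb))
        if ndiff > 1 or ca not in CODON_TABLE or cb not in CODON_TABLE:
            skipped += 1
        elif CODON_TABLE[ca] == CODON_TABLE[cb]:
            syn += 1
        else:
            nonsyn += 1
    return syn, nonsyn, skipped


if __name__ == "__main__":
    # Hypothetical example: CTT->CTC (Leu, synonymous), GAA->GAT (Glu->Asp),
    # AAA->CGA (two differences, skipped).
    print(count_substitutions("ATGCTTGAAAAA", "ATGCTCGATCGA"))
```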
We applied the codon maximum-likelihood method in codeml in PAML v. 3.13 (65; http://abacus.gene.ucl.ac.uk/software/paml.html) for reconstruction of phylogenetic trees, ancestral sequences and detection of positive selection. Branch lengths and ancestral sequences were reconstructed using a free ratio model for individual branches. Phylogenetic trees were drawn in TREEVIEW (66).
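A common way to assess positive selection from codeml output is a likelihood-ratio test between two nested codon models. The text above does not state which models were compared, so the Python sketch below is only an illustration of the general procedure, and the log-likelihood values in it are hypothetical placeholders. The test statistic is twice the log-likelihood difference, compared with a chi-square distribution; the closed-form tail probabilities used here cover 1 and 2 degrees of freedom only.

```python
import math


def lrt_statistic(lnl_null, lnl_alt):
    """Twice the log-likelihood gain of the alternative over the null model."""
    return 2.0 * (lnl_alt - lnl_null)


def chi2_sf(x, df):
    """Upper-tail chi-square probability for df = 1 or 2 (closed forms)."""
    if x <= 0:
        return 1.0
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    raise ValueError("this sketch supports only 1 or 2 degrees of freedom")


if __name__ == "__main__":
    # Hypothetical log-likelihoods for a null and an alternative model.
    stat = lrt_statistic(lnl_null=-100.0, lnl_alt=-96.0)
    print(stat, chi2_sf(stat, df=1))
```

For more than 2 degrees of freedom, a statistics library would be needed in place of the closed forms above.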
| SUPPLEMENTARY MATERIAL |
|---|
Supplementary Material is available at HMG Online.
| ACKNOWLEDGEMENTS |
|---|
We thank Marco Montagna, Tom Scholl and Sylvie Mazoyer for providing data on Alu-associated genomic rearrangements in BRCA1, Andrew Gentles for corrections and Miriam Bloom (SciWrite Biomedical Writing and Editing Services) for professional editing. This work was supported in part by National Institutes of Health Grant 2 P41 LM 06252-04A1 (J.J.).
| FOOTNOTES |
|---|
* To whom correspondence should be addressed. Tel: +1 3014967941; Fax: +1 3014802772; Email: larionov{at}mail.nih.gov
The authors wish it to be known that, in their opinion, the first two authors should be regarded as joint First Authors.
DDBJ/EMBL/GenBank accession nos: AY365046 and AY589040–AY589042.
| REFERENCES |
|---|
- Futreal, P.A., Liu, Q., Shattuck-Eidens, D., Cochran, C., Harshman, K., Tavtigian, S., Bennett, L.M., Haugen-Strano, A., Swensen, J., Miki, Y. et al. (1994) BRCA1 mutations in primary breast and ovarian carcinomas. Science, 266, 120–122.
- Miki, Y., Swensen, J., Shattuck-Eidens, D., Futreal, P.A., Harshman, K., Tavtigian, S., Liu, Q., Cochran, C., Bennett, L.M., Ding, W. et al. (1994) A strong candidate for the breast and ovarian cancer susceptibility gene BRCA1. Science, 266, 66–71.
- King, M.C., Marks, J.H. and Mandell, J.B. for The New York Breast Cancer Study Group. (2003) Breast and ovarian cancer risks due to inherited mutations in BRCA1 and BRCA2. Science, 302, 643–646.
- Smith, T.M., Lee, M.K., Szabo, C.I., Jerome, N., McEuen, M., Taylor, M., Hood, L. and King, M.C. (1996) Complete genomic sequence and analysis of 117 kb of human DNA containing the gene BRCA1. Genome Res., 6, 1029–1049.
- Rosen, E.M., Fan, S., Pestell, R.G. and Goldberg, I.D. (2003) BRCA1 gene in breast cancer. J. Cell Physiol., 196, 19–41.
- Welcsh, P.L. and King, M.C. (2001) BRCA1 and BRCA2 and the genetics of breast and ovarian cancer. Hum. Mol. Genet., 10, 705–713.
- Koonin, E.V., Altschul, S.F. and Bork, P. (1996) BRCA1 protein products... Functional motifs... Nat. Genet., 13, 266–268.
- Cantor, S.B., Bell, D.W., Ganesan, S., Kass, E.M., Drapkin, R., Grossman, S., Wahrer, D.C., Sgroi, D.C., Lane, W.S., Haber, D.A. and Livingston, D.M. (2001) BACH1, a novel helicase-like protein, interacts directly with BRCA1 and contributes to its DNA repair function. Cell, 105, 149–160.
- Puget, N., Torchard, D., Serova-Sinilnikova, O.M., Lynch, H.T., Feunteun, J., Lenoir, G.M. and Mazoyer, S. (1997) A 1-kb Alu-mediated germ-line deletion removing BRCA1 exon 17. Cancer Res., 57, 828–831.
- Petrij-Bosch, A., Peelen, T., van Vliet, M., van Eijk, R., Olmer, R., Drusedau, M., Hogervorst, F.B., Hageman, S., Arts, P.J., Ligtenberg, M.J. et al. (1997) BRCA1 genomic deletions are major founder mutations in Dutch breast cancer patients. Nat. Genet., 17, 341–345.
- Monteiro, A.N., August, A. and Hanafusa, H. (1996) Evidence for a transcriptional activation function of BRCA1 C-terminal region. Proc. Natl Acad. Sci. USA, 93, 13595–13599.
- Brzovic, P.S., Rajagopal, P., Hoyt, D.W., King, M.C. and Klevit, R.E. (2001) Structure of a BRCA1–BARD1 heterodimeric RING–RING complex. Nat. Struct. Biol., 8, 833–837.
- Vallon-Christersson, J., Cayanan, C., Haraldsson, K., Loman, N., Bergthorsson, J.T., Brondum-Nielsen, K., Gerdes, A.M., Moller, P., Kristoffersson, U. et al. (2001) Functional analysis of BRCA1 C-terminal missense mutations identified in breast and ovarian cancer families. Hum. Mol. Genet., 10, 353–360.
- Fleming, M.A., Potter, J.D., Ramirez, C.J., Ostrander, G.K. and Ostrander, E.A. (2003) Understanding missense mutations in the BRCA1 gene: an evolutionary approach. Proc. Natl Acad. Sci. USA, 100, 1151–1156.
- Ramirez, C.J., Fleming, M.A., Potter, J.D., Ostrander, G.K. and Ostrander, E.A. (2004) Marsupial BRCA1: conserved regions in mammals and the potential effect of missense changes. Oncogene, 23, 1780–1788.
- Huttley, G.A., Easteal, S., Southey, M.C., Tesoriero, A., Giles, G.G., McCredie, M.R., Hopper, J.L. and Venter, D.J. (2000) Adaptive evolution of the tumour suppressor BRCA1 in humans and chimpanzees. Australian breast cancer family study. Nat. Genet., 25, 410–413.
- Kouprina, N. and Larionov, V. (2003) Exploiting the yeast Saccharomyces cerevisiae for the study of the organization and evolution of complex genomes. FEMS Microbiol. Rev., 27, 629–649.
- Brown, M.A., Lo, L.J., Catteau, A., Xu, C.F., Lindeman, G.J., Hodgson, S. and Solomon, E. (2002) Germline BRCA1 promoter deletions in UK and Australian familial breast cancer patients: Identification of a novel deletion consistent with BRCA1:psiBRCA1 recombination. Hum. Mutat., 19, 435–442.
- Puget, N., Gad, S., Perrin-Vidoz, L., Sinilnikova, O.M., Stoppa-Lyonnet, D., Lenoir, G.M. and Mazoyer, S. (2002) Distinct BRCA1 rearrangements involving the BRCA1 pseudogene suggest the existence of a recombination hot spot. Am. J. Hum. Genet., 70, 858–865.
- Goodman, M. (1999) The genomic record of Humankind's evolutionary roots. Am. J. Hum. Genet., 64, 31–39.
- Montagna, M., Santacatterina, M., Torri, A., Menin, C., Zullato, D., Chieco-Bianchi, L. and D'Andrea, E. (1999) Identification of a 3 kb Alu-mediated BRCA1 gene rearrangement in two breast/ovarian cancer families. Oncogene, 18, 4160–4165.
- Puget, N., Sinilnikova, O.M., Stoppa-Lyonnet, D., Audoynaud, C., Pages, S., Lynch, H.T., Goldgar, D., Lenoir, G.M. and Mazoyer, S. (1999) An Alu-mediated 6-kb duplication in the BRCA1 gene: a new founder mutation? Am. J. Hum. Genet., 64, 300–302.
- Puget, N., Stoppa-Lyonnet, D., Sinilnikova, O.M., Page, S., Lynch, H.T., Lenoir, G.M. and Mazoyer, S. (1999) Screening for germ-line rearrangements and regulatory mutations in BRCA1 led to the identification of four new deletions. Cancer Res., 59, 455–461.
- Rohlfs, E.M., Chung, C.H., Yang, Q., Skrzynia, C., Grody, W.W., Graham, M.L. and Silverman, L.M. (2000) In-frame deletions of BRCA1 may define critical functional domains. Hum. Genet., 107, 385–390.
- Rohlfs, E.M., Puget, N., Graham, M.L., Weber, B.L., Garber, J.E., Skrzynia, C., Halperin, J.L., Lenoir, G.M., Silverman, L.M. and Mazoyer, S. (2000) An Alu-mediated 7.1 kb deletion of BRCA1 exons 8 and 9 in breast and ovarian cancer families that results in alternative splicing of exon 10. Genes Chromosomes Cancer, 28, 300–307.
- Montagna, M., Dalla Palma, M., Menin, C., Agata, S., De Nicolo, A., Chieco-Bianchi, L. and D'Andrea, E. (2003) Genomic rearrangements account for more than one-third of the BRCA1 mutations in northern Italian breast/ovarian cancer families. Hum. Mol. Genet., 12, 1055–1061.
- Hogervorst, F.B., Nederlof, P.M., Gille, J.J., McElgunn, C.J., Grippeling, M., Pruntel, R., Regnerus, R., van Welsem, T., van Spaendonk, R., Menko, F.H. et al. (2003) Large genomic deletions and duplications in the BRCA1 gene identified by a novel quantitative method. Cancer Res., 63, 1449–1453.





