A transcription map of the DiGeorge and velo-cardio-facial syndrome minimal critical region on 22q11
Weilong Gong1, Beverly S. Emanuel1,2, Joelle Collins1, David H. Kim1, Zhili Wang3, Feng Chen3, Guozhong Zhang3, Bruce Roe3 and Marcia L. Budarf1,2,*
1The Division of Human Genetics and Molecular Biology, The Children's Hospital of Philadelphia, Philadelphia, PA, USA, 2The Department of Pediatrics, University of Pennsylvania School of Medicine, Philadelphia, PA, USA and 3Departments of Chemistry and Biochemistry, University of Oklahoma, Norman, OK, USA
Received January 15, 1996; Revised and Accepted March 27, 1996
GenBank accession nos. L77559-L77571
The majority of patients with DiGeorge syndrome (DGS) and velo-cardio-facial syndrome (VCFS) have a microdeletion of 22q11. Using translocation breakpoints and fluorescence in situ hybridization analysis (FISH), the minimal DiGeorge critical region (MDGCR) has been narrowed to 250 kb in the vicinity of D22S75 (N25). The construction of a detailed transcription map covering the MDGCR is an essential first step toward the identification of genes important to the etiology of DGS/VCFS, two complex disorders. We have identified a minimum of 11 transcription units encoded in the MDGCR using a combination of methods including cDNA selection, RT-PCR, RACE and genomic sequencing. This approach is somewhat unique and may serve as a model for gene identification. Of the 11 transcripts, one is the previously reported DGCR2/IDD/LAN gene, and three revealed a high level of similarity to mammalian genes: a Mus musculus serine/threonine kinase, a rat tricarboxylate transport protein and a bovine clathrin heavy chain. The remaining transcripts do not demonstrate any significant homology to genes of known function. The identification of these transcription units in the MDGCR will facilitate their further characterization and help elucidate their role in the etiology of DGS/VCFS.
DiGeorge syndrome (DGS) is a developmental anomaly of the derivatives of the 3rd and 4th pharyngeal pouches. It is associated with a spectrum of malformations, including absence or hypoplasia of the thymus and parathyroid glands, cardiovascular anomalies and mild craniofacial dysmorphia. It has been proposed that the primary defect in DGS is the failure of cephalic neural crest cells to migrate properly during early embryonic development (1 ,2 ). Previously, cytogenetic studies of patients with DGS demonstrated that ~20% have chromosomal abnormalities, with the majority of these chromosomal rearrangements involving the loss of the proximal long arm of chromosome 22 (3 ). These results suggested that monosomy for 22q11 may play a significant role in the etiology of DGS. Subsequently, molecular studies have demonstrated the validity of this hypothesis (4 ,5 ) and microdeletions have been detected in 89% of the patients we have studied with DGS (6 ).
Velo-cardio-facial syndrome (VCFS) is a common autosomal dominant disorder characterized by cleft palate, cardiac anomalies, typical facies and learning disabilities. Due to the phenotypic overlap between VCFS and DGS, it was postulated that both diseases might share a common pathogenesis or be etiologically related (7 ). Using the 22q11.2 markers deleted in patients with DGS, it was possible to demonstrate that the majority of VCFS patients are hemizygous for the same region (8 ). Currently, over 85% of the patients we have studied with a clinical diagnosis of VCFS have microdeletions of 22q11.2. These findings indicate that haploinsufficiency of this region is a major factor in the development of this disorder (6 ).
The majority of DGS/VCFS patients have a large deletion which includes a common set of markers in 22q11.2 (4 ,6 ). This `commonly deleted region' has been estimated to be greater than 1.5 Mb based on pulsed-field gel analysis (9 ). However, individual patients can have deletions which extend either proximally, distally or in both directions (4 ). Using translocation breakpoints and fluorescence in situ hybridization analysis (FISH), the region critical to DGS/VCFS has been narrowed to a 250 kb area in the proximal portion of the commonly deleted region (9 ,10 ). This region (Fig. 1 A, B) includes the marker D22S75 which is the most consistently deleted marker in our patient studies (6 ,10 ). Further, the breakpoint region of ADU, the only known balanced translocation patient with the DGS/VCFS phenotype, maps to the proximal portion of this 250 kb region (9 ,11 ). These data suggest that one or more of the genes in this minimal DGS/VCFS critical region (MDGCR) are strong candidates for involvement in the pathogenesis of these diseases.
A cosmid contig representing a 250 kb genomic region (MDGCR) containing the marker D22S75 (clone name N25) and the balanced (2;22)(q14;q11.21) translocation breakpoint (ADU) has been constructed (Fig. 1 ). From this contig, seven minimally overlapping cosmids were used to isolate region-specific cDNAs by cDNA selection (Fig. 1 C). To increase the complexity of the starting material, poly(A)RNA from fetal brain, fetal liver and adult skeletal muscle was used to synthesize the cDNA utilized in the cDNA selection. In total, 567 colonies were selected and used as a cDNA reference sublibrary. The sizes of the cDNA inserts were determined using PCR with primers specific for each linker (see Materials and Methods). The average insert size in the cDNA sublibrary was 550 bp (ranging from 350 bp to 1.2 kb).
The next step was to regionally assign the cDNA clones by hybridizing each of the seven cosmids to the nylon filters containing the arrayed cDNAs. This resulted in the identification of 429 cDNAs, including the 57 clones detected by the positive control probes previously mapped to the MDGCR, and five of 34 Alu-positive cDNAs (see Materials and Methods). An additional 50 cDNAs were detected by cDNA walking or with the use of RT-PCR products generated during subsequent steps (see below). Since these latter probes were generated from expressed sequences, they had greater sensitivity than the cosmid probes. In total, 479 cDNAs from the cDNA sublibrary (85%) mapped back to the 250 kb MDGCR. Of the 479 cDNAs, 129 clones were derived from fetal brain, 122 clones from fetal liver and 228 clones from adult skeletal muscle. The remaining 88 clones (15%) either contained repetitive sequences (29/88), small or no insert (23/88), multiple inserts (6/88), or they grew poorly (30/88).
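The clone counts reported above can be cross-checked with a short tally. The following is a minimal sketch in Python; the variable and function names are illustrative and the numbers are taken directly from the text. Note that 479/567 evaluates to approximately 84.5%, which the text reports as 85%.

```python
# Clone counts as reported in the text; all names here are illustrative.
TOTAL_CLONES = 567
MAPPED_BY_COSMID_PROBES = 429
MAPPED_BY_WALKING_OR_RTPCR = 50
MAPPED_BY_TISSUE = {
    "fetal brain": 129,
    "fetal liver": 122,
    "adult skeletal muscle": 228,
}
EXCLUDED = {
    "repetitive sequences": 29,
    "small or no insert": 23,
    "multiple inserts": 6,
    "poor growth": 30,
}


def tally_sublibrary():
    """Recompute the totals and percentages quoted for the cDNA sublibrary."""
    mapped = MAPPED_BY_COSMID_PROBES + MAPPED_BY_WALKING_OR_RTPCR
    remaining = TOTAL_CLONES - mapped
    return {
        "mapped": mapped,
        "remaining": remaining,
        "mapped_percent": round(100.0 * mapped / TOTAL_CLONES, 1),
        "tissue_total": sum(MAPPED_BY_TISSUE.values()),
        "excluded_total": sum(EXCLUDED.values()),
    }


if __name__ == "__main__":
    for key, value in tally_sublibrary().items():
        print(key, value)
```

The tissue breakdown and the exclusion categories each sum to the totals they are meant to partition (479 and 88, respectively).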
Based on clone overlap, the smallest number of cDNA contigs that could be assembled was 16 (Fig. 1 D) from which primers for PCR were generated (Fig. 1 B and Table 1 ). For details of cDNA contig construction see Materials and Methods. To further assemble the contigs into transcription units, results from Northern blot experiments were compared. If clones from adjacent contigs detected the same size transcript(s) and tissue distribution on Northern blots, this was taken as evidence that the clones could be part of the same transcript. In a few cases, searches against nucleotide and/or protein sequence databases demonstrated that non-overlapping clones had similarity to the same entry, suggesting that they could be from the same transcription unit. To verify that non-overlapping clones were part of the same transcript, primers to sequences at the adjacent ends of two neighboring cDNA contigs were synthesized and tested for the ability to generate a specific PCR product from primary cDNA. The RT-PCR product generated from these experiments was isolated and sequenced to confirm the specificity of the reaction. Using this approach, we assembled 11 transcription units (DGS-A to DGS-K) in the MDGCR (Fig. 1 E and Table 2 ). Listed in Table 3 are the number and tissue distribution of the cDNA clones identified for each contig. Not all clones were analyzed for gene assembly because they were duplicates or represented smaller clones of the same region.
*TH: total human DNA; FB: fetal brain cDNA; FL: fetal liver cDNA; ASM: adult skeletal muscle cDNA. A dash (-) indicates that no specific PCR product was generated. PCR reaction conditions for the generation of these STSs are described in Materials and Methods.
DGS-A lies in the most centromeric region of the MDGCR and was assembled from five cDNA clones which constitute the minimal overlap for contig 1 (C-1 in Fig. 1 D and E). Northern blot hybridization with cDNAs from this contig did not give a positive signal for any of the 16 adult and four fetal tissues tested, but RT-PCR products of the expected size were successfully amplified from fetal brain and skeletal muscle mRNA using primers derived from this contig (D22S1566, Fig. 1 B and Table 1 ). No specific amplification occurred when the reverse transcriptase was omitted from the first step of cDNA synthesis, indicating that the RT-PCR product was not due to genomic DNA contamination. These results suggest that DGS-A is a low abundance transcript. Based on the combined sequence data, DGS-A represents 2.3 kb of expressed sequence containing no introns. The presence of a poly(A) tail in the 3' end of DGS-A indicates its orientation as being centromere to telomere (5' -> 3') (Fig. 1 E). Further, six ESTs (Table 2 ) were detected from the EST database (dbEST) with greater than 97% identity to this transcript. The dbEST ESTs were derived from fetal brain cDNAs. Searching nucleotide and amino acid sequence databases using the BLAST e-mail server at the National Center for Biotechnology Information (NCBI), a match was obtained with the human membrane protein-like protein (HMPL, accession no. U21556). There was 86% identity over 920 bp. However, the significance of this high level of similarity is difficult to assess because there is little information regarding HMPL which is not yet published.
DGS-B was represented by contig 2 (C-2 in Fig. 1 D and E) and on Northern blots identified a 1.6 kb message in several tissues with strongest signal in heart and skeletal muscle (Fig. 2 and Table 2 ). This transcript maps 4.8 kb proximal to the (2;22) balanced translocation breakpoint of a DGS proband, ADU, and has been previously described as DGCR4 (9 ). Similarity searches of nucleotide and protein sequence databases did not find any significant matches. Primers between cDNA contig 1/DGS-A and contig 2/DGS-B failed to amplify a RT-PCR product, suggesting that the two contigs are not derived from a single gene. This result is consistent with the Northern blot analysis of the two contigs.
The differences in size and tissue distribution of the transcripts we have identified suggest that a minimum of 11 transcription units are encoded in the MDGCR. Several methods are available to identify the sense strand of a gene. These include: (i) searching for a poly(A) tail or polyadenylation signal in the cDNA sequence; (ii) analysis of consensus sequences at splice junctions; (iii) comparison of the new cDNAs to the orientation of known genes with which they share a high degree of homology; and (iv) isolation of 5'- or 3'- end sequence of a given cDNA clone using the RACE methods. We have used a combination of these methods in our analysis. Furthermore, all of these approaches were assisted by comparison of the cDNA sequence to the corresponding genomic sequence.
Prior to these experiments, two partial cDNAs from the MDGCR, Lan and N25-wa (CLTCL) had been completely sequenced. The sequence data from both cDNAs demonstrate a poly(A) tail and indicate that these genes are arranged from telomere to centromere. The poly(A) tail present in the sequence of DGS-A indicates its orientation is from centromere to telomere. The sequence of the remaining cDNAs failed to show either a polyadenylation signal or a poly(A) tail, necessitating additional analysis.
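The first of these methods, inferring orientation from a poly(A) tail, can be illustrated with a short script. This is a sketch under stated assumptions, not the procedure used in this study: the minimum tail length of 10 residues, the 40 bp search window, and the function names are all illustrative choices, and only the two most common polyadenylation hexamers (AATAAA and ATTAAA) are searched for.

```python
import re

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def polya_evidence(seq, min_tail=10, window=40):
    """Look for a terminal poly(A) run and an upstream AATAAA/ATTAAA signal.

    min_tail and window are illustrative thresholds, not values from the paper.
    Returns None when the sequence does not end in a poly(A) run.
    """
    seq = seq.upper()
    tail = re.search(r"A{%d,}$" % min_tail, seq)
    if tail is None:
        return None
    offset = max(0, tail.start() - window)
    signal = re.search(r"A[AT]TAAA", seq[offset:tail.start()])
    return {
        "tail_start": tail.start(),
        "tail_length": len(seq) - tail.start(),
        "signal_start": None if signal is None else offset + signal.start(),
    }


def infer_sense_strand(clone_seq, **kwargs):
    """Return '+' if the clone as read ends in poly(A), '-' if its reverse
    complement does (i.e. the clone starts with poly(T)), otherwise None."""
    if polya_evidence(clone_seq, **kwargs):
        return "+"
    if polya_evidence(reverse_complement(clone_seq), **kwargs):
        return "-"
    return None
```

A result of None corresponds to the situation described above for most of the cDNAs, where neither a tail nor a signal is present and other evidence is required.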
The consensus splice site (c/a)ag*gt(a/g)agt.......ncag*g(t/a) (18 ) found in the sequences of LAN, DGS-I, CTP and CLTCL indicates that the direction of transcription of these genes is from telomere to centromere. The partial sequences of DGS-B, -E, -F, -G and -H are devoid of introns, indicating that they could be intronless genes or 3' or 5' untranslated sequences. For these transcripts, 3' RACE was performed using primers derived from the cDNA contigs. The sequence of the 3' RACE fragments for DGS-G and -H showed a poly(A) signal and a poly(A) tail, indicating that the two genes are transcribed in opposite orientation as shown in Figure 1 E. In addition, 3' RACE fragments of DGS-D, -I and CTP were also generated, which confirmed the orientation which had been predicted by the presence of consensus splice sites.
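The use of splice junctions to infer the direction of transcription can be sketched as follows. This simplified example checks only the nearly invariant GT...AG dinucleotides at the intron boundaries rather than the full consensus quoted above, and it assumes that intron coordinates have already been obtained by comparing cDNA to genomic sequence; the function names and coordinate convention (0-based, end-exclusive) are illustrative.

```python
def intron_strand(genomic, intron_start, intron_end):
    """Classify one intron by its boundary dinucleotides.

    genomic[intron_start:intron_end] is the putative intron on the strand
    as given. GT...AG implies the gene is on this strand ('+'); CT...AC is
    the reverse complement of GT...AG and implies the opposite strand ('-').
    """
    intron = genomic[intron_start:intron_end].upper()
    if len(intron) < 4:
        return None
    first, last = intron[:2], intron[-2:]
    if first == "GT" and last == "AG":
        return "+"
    if first == "CT" and last == "AC":
        return "-"
    return None


def transcript_strand(genomic, introns):
    """Return the strand only when every classifiable intron agrees."""
    votes = {intron_strand(genomic, start, end) for start, end in introns}
    votes.discard(None)
    return votes.pop() if len(votes) == 1 else None
```

For intronless sequences the list of introns is empty and the function returns None, which mirrors why 3' RACE was needed for DGS-B, -E, -F, -G and -H.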
The identification of a minimum of 11 transcription units in the 250 kb MDGCR indicates that this region has a high density of genes, with an average of one every 20-25 kb. CpG islands have been used as markers for genes because of their association with the 5' end of all housekeeping genes and an estimated 40% of tissue-specific transcripts (19,20). Further, as might be expected, it has been noted that regions rich in genes have a high density of CpG islands (22). Restriction mapping of cosmids in the MDGCR using rare-cutting restriction enzymes whose sites are frequently found in CpG islands (BssHII, NotI and SacII) demonstrates that the MDGCR has an abundance of these sites. The mapping data show that there are a minimum of five NotI sites, 10 BssHII sites and >20 SacII sites. Three regions have five or more of these restriction sites within 1 kb, indicating potential CpG islands (shown as diamonds in Fig. 1B). The most telomeric of these islands corresponds to the 5' region of CTP. Northern analysis of CTP demonstrates that it is widely expressed. Further, several ESTs have been identified for CTP, consistent with a `housekeeping' function (Fig. 2 and Table 2). Transcripts have not yet been identified for the two more proximal CpG islands, making them attractive targets for further characterization. Given the abundant and widespread expression of LAN, it is somewhat surprising that restriction mapping failed to detect a CpG island in association with this gene. GRAIL (12) analysis of this region predicts a CpG island in the 5' region of LAN. The greater sensitivity of GRAIL in this case reflects the fact that, although the 5' region of LAN is CpG-rich, it contains relatively few sites for rare-cutting restriction enzymes.
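The criterion of five or more rare-cutter sites within 1 kb can be expressed as a simple clustering rule. The sketch below applies it to site coordinates, which could come either from a restriction map or from scanning genomic sequence for the recognition sequences of BssHII (GCGCGC), NotI (GCGGCCGC) and SacII (CCGCGG). The function names and the merging of overlapping windows are illustrative assumptions, not a description of how the map in Figure 1B was drawn.

```python
import re

# Recognition sequences of the three rare-cutting enzymes named in the text.
RARE_CUTTERS = {"BssHII": "GCGCGC", "NotI": "GCGGCCGC", "SacII": "CCGCGG"}


def site_positions(seq):
    """Return sorted (position, enzyme) pairs for every site in seq.

    A lookahead is used so that overlapping occurrences are all reported.
    """
    seq = seq.upper()
    hits = []
    for enzyme, site in RARE_CUTTERS.items():
        for match in re.finditer("(?=%s)" % site, seq):
            hits.append((match.start(), enzyme))
    return sorted(hits)


def candidate_islands(positions, window=1000, min_sites=5):
    """Return (first_site, last_site) spans holding >= min_sites sites
    within `window` bp; overlapping spans are merged."""
    pos = sorted(positions)
    clusters = []
    for i in range(len(pos)):
        j = i
        while j + 1 < len(pos) and pos[j + 1] - pos[i] < window:
            j += 1
        if j - i + 1 >= min_sites:
            if clusters and pos[i] <= clusters[-1][1]:
                clusters[-1] = (clusters[-1][0], max(clusters[-1][1], pos[j]))
            else:
                clusters.append((pos[i], pos[j]))
    return clusters
```

As the LAN example illustrates, a CpG-rich region with few rare-cutter sites would not be flagged by a rule of this kind, so it is best treated as a coarse first pass.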
It is possible to compare the success of GRAIL (12 ) to cDNA selection in the identification of open reading frames (ORF). GRAIL2 analysis of 160 kb of genomic sequence from the MDGCR (accession nos L77569 and L77570) predicts 11 excellent ORFs in the `forward strand' (defined as the centromere -> telomere strand in the 5' -> 3' direction). Experimentally, we were able to confirm only one of these or ~10%. This apparent lack of consensus between the two approaches may be explained by a coding strand bias because on the reverse strand (i.e. telomere -> centromere) 43 `excellent' ORFs were predicted and we were able to verify 38 of these (88%). The best correspondence between GRAIL and our experimental data was for genes with multiple exons. In these instances, GRAIL correctly identified approximately 90% of the exons. In many cases, both the splice donor and splice acceptor sites were accurately identified and in the majority at least one of these was correct. For intronless genes and in areas where there are multiple small transcripts GRAIL was less successful. GRAIL 1a, which does not use splice donor and acceptor information in predicting ORFs, gave somewhat different results from GRAIL2, but it was no more accurate at predicting intronless genes in this region.
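The strand-specific confirmation rates quoted above follow directly from the reported counts. A minimal sketch, with illustrative names:

```python
# (confirmed, predicted) counts of 'excellent' GRAIL2 ORFs, from the text.
GRAIL2_ORFS = {
    "forward (centromere -> telomere)": (1, 11),
    "reverse (telomere -> centromere)": (38, 43),
}


def confirmation_rate(confirmed, predicted):
    """Percentage of predicted ORFs that were confirmed experimentally."""
    return 100.0 * confirmed / predicted


if __name__ == "__main__":
    for strand, (confirmed, predicted) in GRAIL2_ORFS.items():
        rate = confirmation_rate(confirmed, predicted)
        print("%s: %d/%d = %.0f%%" % (strand, confirmed, predicted, rate))
```

The forward-strand rate evaluates to about 9%, which the text rounds to ~10%, and the reverse-strand rate to about 88%.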
Lastly, BLASTN searches of GenBank dbEST demonstrated that six of the 11 transcription units identified ESTs. The number of ESTs for the six transcripts ranged from one to more than 45. Two of the six transcripts appear to be intronless (DGS-A and -G) and thus the ESTs did not provide any information with respect to intron/exon boundaries. Interestingly, the gene with the most ESTs, LAN, has a large 3' UTR (2.5 kb) and none of the ESTs extend past this region. DGS-I and CTP have much smaller 3' UTRs and approximately two thirds of the ESTs extend into coding regions, including several 5' exons.
In a positional cloning approach, identification of expressed sequences from a genomic region containing a disease locus is a major step toward isolation of the disease gene(s). The construction of a detailed transcription map is particularly important for DGS/VCFS because, at present, it is not known whether the major features of these complex syndromes are due to the loss of function of a single or multiple genes. Although the majority of patients have large deletions of 22q11.2, a small number of patients have the phenotypic features of DGS/VCFS but have no detectable deletion. It has not been possible to demonstrate linkage to 22q11.2 or any other chromosomal region for these non-deleted patients because the cases are usually sporadic or from small nuclear families. There is one DGS patient, ADU, who has a balanced (2;22)(q14.1;q11.2) translocation (11 ) which suggests that a single disrupted gene may be responsible for the phenotype. We have reported the cloning of this translocation breakpoint and the identification of transcripts in the surrounding region (9 ). However, at this time we have been unable to detect mutations in any of these putative transcripts in non-deleted DGS/VCFS patients, leaving open the possibility that the ADU translocation may have a positional effect on genes proximal or distal to the breakpoint.
The DGCR, the region of 22q11 deleted in the majority of patients with DGS or VCFS, is greater than 1.5 Mb. Using a limited number of patients with smaller deletions, it has been possible to narrow the region critical to the phenotype to the proximal 250 kb of the DGCR, the MDGCR (10 ). We have established a detailed transcription map covering the MDGCR. Although several genes have been previously described which map within the larger (>1.5 Mb), commonly deleted region associated with DGS/VCFS, including TUPLE1, COMT and ZNF74 (for review see 9 ), only three previously reported genes have been mapped to the region we define as the MDGCR.
The first transcript to be described was N25-wa, which was isolated by screening a cDNA library with NotI linking clone N25 (16 ). This corresponds to CLTCL (DGS-K) which demonstrates high homology to clathrin heavy chain (Fig. 3 C). Clathrin heavy chain is one of the major structural components of coated pits and coated vesicles, and is ubiquitously expressed. The coated pits/vesicles are involved in intracellular vesicular transport and in uptake of membrane-bound ligands and extracellular fluid. The primary structure of clathrin heavy chain is highly conserved with significant identity of amino acids among rat, bovine and human clathrin (23 ). In 1991 Dodge et al. reported the isolation and mapping of a partial cDNA for the human clathrin heavy chain to 17q11-qter (17 ). Thus, the transcript we have identified in the MDGCR appears to represent a second distinct locus. The expression of CLTCL is limited to adult skeletal muscle and it demonstrates lower homology to other mammalian clathrin heavy chains, supporting the hypothesis that CLTCL represents a different but related gene. The role of the coated pits in receptor-mediated endocytosis suggests a possible mechanism for involvement of CLTCL in DGS/VCFS by perturbation of receptor signaling during neural crest cell migration. Further, it has been reported that Drosophila embryos, homozygous or hemizygous for clathrin heavy chain mutations, fail to hatch at the first larval stage (24 ). Nonetheless, it is somewhat puzzling that in contrast to this apparent second locus in humans, there appears to be only a single locus for clathrin heavy chain in other species, such as rat, Drosophila and yeast (23 -25 ). Further, as yet, we have been unable to detect a homolog for CLTCL in the mouse (Galili et al., unpublished). This will make it more difficult to assess the function of the CLTCL gene in humans and its role in DGS/VCFS.
The second gene to be isolated and mapped to the MDGCR is the DGCR2/IDD/LAN gene (9 ,14 ,15 ). The 3' end of this gene is approximately 10 kb telomeric to the ADU breakpoint. From the deduced amino acid sequence the LAN protein is predicted to be an integral membrane protein. The N-terminus contains Cys-rich repeats and has similarity to the low-density lipoprotein receptor and other proteins containing this motif, including basement membrane proteins such as perlecans. LAN is widely expressed in adult and fetal tissues (Fig. 2 , refs 9 ,14 ,15 ). Because the LAN gene product could be involved in cell-cell or cell matrix interactions via ligand binding, and thus potentially play a role in neural crest cell migration, it can be considered an attractive candidate for DGS/VCFS. However, to date, mutations in this gene have not been identified in non-deleted DGS/VCFS patients. Lastly, we reported a partial cDNA (ac2b1) corresponding to DGS-B which appears to recognize the same sized transcript on Northern blot analysis as two GRAIL predicted exons in its vicinity, nex2.2 and nex3 (9 ). Since DGS-B does not have any homology to known genes, its potential as a DGS/VCFS candidate gene is based on its close proximity to the ADU breakpoint region.
Of the remaining eight transcription units in this report, only two, CTP (DGS-J) and DGS-G, demonstrated a high level of similarity to known genes. CTP had 98% similarity (Fig. 3B) to a rat mitochondrial tricarboxylate transporter, also referred to as citrate transport protein. CTP is a mitochondrial inner membrane protein predicted to have six hydrophobic membrane-spanning α-helices with five connecting hydrophilic segments (26). Its function is to exchange a tricarboxylate along with a proton for another tricarboxylate/H+, a dicarboxylate or phosphoenolpyruvate across the inner mitochondrial membrane. This electroneutral exchange supplies NAD+ and NADPH for glycolysis and lipid biosynthesis, as well as a carbon source for the triacylglycerol and sterol biosynthetic pathways. Although no direct genetic studies are available, rats experimentally made insulinopenic showed a decreased level of CTP activity, indicating that CTP levels are regulated in part by insulin (27-29). Reduced levels of CTP due to haploinsufficiency may play a modifying role in DGS/VCFS by affecting glucose metabolism. Further, epidemiological studies have shown that infants of diabetic mothers are at an increased risk for conotruncal heart defects (30), suggesting that perturbations of glucose metabolism affect this developmental field.
The last transcription unit to demonstrate significant homology to any known gene is DGS-G, which shows 94% similarity to a mouse serine/threonine kinase, TSK-1 (Fig. 3A). Like DGS-G, TSK-1 is a 1.6 kb transcript expressed exclusively in the testis (31). Unfortunately, neither the function of TSK-1 in the testis nor its expression pattern in the mouse embryo is known (31). Comparison of the kinase domain of DGS-G to other kinases indicates that DGS-G has all 14 of the most highly conserved amino acids and has the consensus in subdomains VI and VIII predicted for Ser/Thr kinases (Fig. 3A; 32). Based on sequence homology of the catalytic domain, DGS-G belongs to the SNF1 subfamily of Ser/Thr kinases. Three other members of this family are 5'-AMP-activated protein kinase, par-1 and msk. These kinases are involved in lipid metabolism (33,34), in establishing polarity in early C. elegans embryos (35) and in early expression in the myocardial cells of the developing mouse heart (36), respectively. Although msk has very restricted expression in the developing heart of the embryo, it is abundantly, though not exclusively, expressed in adult testis (36). Thus, the expression of DGS-G in adult testis does not preclude its playing a role in embryonic development. Further examples include Hoxa-4 and int-1, which are expressed in the central nervous system of developing embryos but demonstrate adult expression exclusively in testis (37-39). Given the central role of protein kinases in coordinating the eukaryotic cell's response to external and internal signals, DGS-G is an appealing candidate for DGS/VCFS.
The remaining six transcription units do not show significant homology to genes with known function and therefore cannot be assessed for likely involvement in DGS/VCFS based on their predicted role. Northern blot analysis demonstrated that one of the transcripts (CLTCL) was expressed exclusively in skeletal muscle, four transcripts were more highly expressed in this tissue, and one was expressed only in skeletal muscle and heart. Further, all genes except for DGS-G and CLTCL seem to be abundant in the heart by Northern analysis. This tissue-specific expression pattern in heart and skeletal muscle may be an indication that these genes are important in cardiac development.
In summary, the 11 transcription units described in this manuscript are all candidates for the abnormalities associated with DGS/VCFS because they fall within the MDGCR and are deleted in the majority of patients with DGS/VCFS. Additional studies of the small number of non-deleted DGS/VCFS patients, aimed at the identification of small rearrangements or point mutations in these genes, are underway. These studies will be necessary to determine how these genes contribute to the various phenotypic abnormalities associated with these disorders.
The probe N25 (D22S75) was used to initiate the cosmid contig in the MDGCR. The chromosome 22-specific library, LL22N03, constructed at the Biomedical Sciences Division, Lawrence Livermore National Laboratory, was the source of all cosmids represented in this contig. High density filters were prepared from the arrayed cosmid library and screened by colony hybridization. Primary positive clones were verified by Southern blot analysis of HindIII-digested cosmid DNA. Cosmids positive by Southern blot analysis were further analyzed by single and double restriction digestion with BssHII, MluI, NotI, NruI, SacII, SalI and SfiI, followed by pulsed-field gel electrophoresis. A restriction map was constructed after each cosmid walk and a terminal fragment suitable for further walking was identified. The cosmid library is estimated to provide approximately 7× coverage of chromosome 22, and on average we identified seven cosmids per walk. A total of nine cosmid walks were performed. Based on the hybridization and restriction mapping data, a 250 kb cosmid contig encompassing D22S75 and the ADU breakpoint was constructed. The cosmids shown in Figure 1C represent the minimal tiling path used for cDNA selection.
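The observation of about seven cosmids per walk is what would be expected from a library of roughly 7× coverage. As a rough illustration, and assuming that clones are distributed randomly along the chromosome (a Poisson approximation that is not stated in the text), the probability that any given locus is represented in the library can be estimated as follows; the function name is illustrative.

```python
import math


def prob_locus_represented(coverage):
    """Probability that at least one clone covers a given locus, assuming
    clone positions are random so that clone depth at a locus is Poisson
    distributed with mean equal to the library coverage."""
    return 1.0 - math.exp(-coverage)


if __name__ == "__main__":
    # Under the same assumption, the expected number of clones at a locus
    # equals the coverage itself, i.e. about seven for this library.
    print("P(represented at 7x) = %.4f" % prob_locus_represented(7))
```

Under this approximation a locus has a better than 99.9% chance of being present in a 7× library, although real libraries can show cloning bias that a random model does not capture.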
cDNA selection (40,41) was performed using a modified protocol (42). cDNAs were synthesized from poly(A) mRNA prepared from fetal brain (FB), fetal liver (FL) and adult skeletal muscle (ASM) (Clontech). Reverse transcription was performed separately for each tissue using 2.5 µg of mRNA, 150 ng of random hexamers (GIBCO BRL) and 500 U reverse transcriptase (GIBCO BRL) in a 50 µl reaction. cDNA from each source was tagged using linkers which could be distinguished because they differed at the last 5-6 bp. The sequence of the primers is as follows:
These `tissue specific' linkers were ligated to the blunt-ended cDNAs. Each ligation reaction was passed through a Chroma spin-1000 (Clontech) to remove small cDNA molecules (<420 bp) from the samples and then dissolved in 50 µl H2O. Five µl from each of the cDNA samples was PCR amplified separately using the primers specific for each source in a 100 µl reaction containing 10 mM Tris-HCl, pH 8.3, 2.5 mM MgCl2, 50 mM KCl, 0.25 mM each of dNTPs, 0.5 mM primer and 2.5 U Taq polymerase. The PCR reaction was performed with a 2 min denaturation step at 94°C, followed by 30 cycles of denaturation (45 s at 94°C), annealing (45 s at 65°C) and extension (5 min at 72°C), and a final extension of 7 min at 72°C, using a 9600 thermal cycler (Perkin-Elmer).
Purified DNA (100 ng each) from seven cosmids covering the MDGCR was pooled and biotinylated using a nick translation kit (BRL). The human repeats present in genomic DNA were suppressed by prehybridization with human Cot-1 DNA (500 µg/µl) and total human placental DNA (500 µg/µl) for 1-3 h, and then hybridized to the amplified cDNAs in solution. The cosmid/cDNA complexes were captured on streptavidin-coated magnetic beads (Dynal), which were pretreated with 10 µg of human Cot-1 DNA (BRL) for 1 h at room temperature. The specific cDNAs were separated from the beads by heating for 10 min at 75°C and PCR amplified. After a second round of selection, the eluted cDNAs were PCR amplified with the primers described above with the addition of a 12-nucleotide (CUA)4 sequence to the 5'-end. The PCR products were treated with uracil DNA glycosylase, cloned into vector pAMP 10 (BRL) and transformed into DH5α cells. Single colonies were plated on LB agar and then picked into wells of 96-well microtiter dishes. Gridded arrays on nylon membranes (Hybond+, Amersham) were prepared using a Biomek 1000 robot (Beckman).
To assess the specificity of the cDNA sublibrary, two previously isolated cDNA clones and one RT-PCR product which we had previously mapped in the MDGCR were used as positive controls. These probes identified ten percent of the cDNA sublibrary (57/567 clones), indicating that the library was greatly enriched for cDNAs originating from the MDGCR. To avoid analysis of cDNAs selected by cross-hybridization to non-specific sequences, the sublibrary filters were hybridized to Alu (Blur 8) and pAMP10 probes. Of the 567 cDNAs, 34 cDNA clones gave strong hybridizing signals to the Alu probe, suggesting they contain sequences homologous to this highly repetitive element. Five of the Alu-containing clones mapped back to the MDGCR. Three of these are derived from the 3' untranslated region of LAN and the other two appear to be hnRNA from the LAN gene. Hybridization with the pAMP10 vector identified an additional 23 clones with a strong signal after a short exposure time. After PCR-amplification and restriction analysis with SpeI, these clones were found to have small or no inserts and they were excluded from further analysis. An additional 36 clones were eliminated from further characterization because they either contained multiple inserts or gave weak signals to all hybridization probes used, suggesting that these clones did not grow well.
To estimate the enrichment achieved by our cDNA selection, we compared the abundance of the partial cDNA N25-wa in the sublibrary to results from screening a conventional cDNA library. Seventeen skeletal muscle-derived cDNAs were identified from the cDNA sublibrary. In contrast, only three positive clones were identified when N25-wa was used to probe a skeletal muscle cDNA library consisting of 10^6 recombinant clones. This indicates that the cDNAs in the sublibrary were enriched by a factor of 10^4.
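The enrichment factor can be reproduced as a ratio of clone frequencies. The sketch below assumes that the 17 positive clones are counted against the full 567-clone sublibrary and that the conventional library contained one million recombinants; the function name is illustrative.

```python
def fold_enrichment(hits_selected, size_selected, hits_reference, size_reference):
    """Ratio of the target frequency in the selected sublibrary to its
    frequency in the conventional (reference) library."""
    freq_selected = hits_selected / size_selected
    freq_reference = hits_reference / size_reference
    return freq_selected / freq_reference


if __name__ == "__main__":
    # 17 of 567 sublibrary clones versus 3 of 1,000,000 library clones.
    print("%.0f-fold" % fold_enrichment(17, 567, 3, 1_000_000))
```

The result is just under 10,000-fold, consistent with the order of magnitude stated above.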
Probes including cosmid inserts, cDNAs and PCR products were labeled with [alpha]-32P dCTP by using the random priming method (43 ). Human repetitive sequences were blocked by prehybridization with sheared human placental DNA (250 mg/ml) and human Cot-1 DNA (125 mg/ml). The prehybridization was carried out in 0.5 M Na2P04, pH 7.3, 7% SDS, 1 mM EDTA, pH 8 at 65oC for 3-4 h and hybridization was performed under the same conditions for 16-24 h. The filters were washed twice with 0.2* SSC, 0.1% SDS at 65oC for 15-25 min after Southern hybridization or twice with 0.1* SSC, 0.1% SDS at 65oC for 15-25 min after Northern hybridization. They were then exposed to Kodak X-OMAT film for several hours to several days at -70oC with an intensifying screen.
cDNA was synthesized in a 50 µl reaction using 100 ng of poly(A) RNA extracted from various tissues. The RNA was heated with random and oligo(dT) primers for 5 min at 65°C and cooled to room temperature for 10 min. Reverse transcription was performed at 37°C for 1 h after adding 5 µl 10× RT buffer (Stratagene), 20 U RNase inhibitor (Stratagene), 2 µl of 0.1 M dNTPs and 50 U MMLV reverse transcriptase. The cDNA mixture was then heated for 5 min at 90°C. For PCR amplification, 2 µl of cDNA was used per 50 µl reaction.
Primer pairs for PCR were generated for the cDNA contigs (Fig. 1B and Table 1). Sequence data from an ABI automated sequencer were analyzed (Staden package; 21) and STSs were chosen using PRIMER version 0.5 (M.J. Daly, S. Lincoln and E.S. Lander, Whitehead Institute, Cambridge, MA, 1991). Using the following conditions, a unique PCR fragment was obtained for each primer pair. PCR was performed in 20 µl reactions using approximately 50 ng genomic DNA or 5 ng cDNA synthesized from poly(A) RNA in 1× PCR buffer: 10 mM Tris-HCl, pH 8.3, 1.0-1.5 mM MgCl2, 50 mM KCl, 1 µM primers (final concentration) and 0.5 U Taq polymerase (Perkin Elmer Cetus or Boehringer-Mannheim). PCR conditions were: a 5 min denaturation step at 95°C followed by 30 cycles of denaturation at 95°C for 15 s, annealing at a temperature determined for each STS for 15 s, and extension at 72°C for 1 min 22 s, and lastly a 7 min extension at 72°C. Primer sequences are summarized in Table 1.
Consistent with GDB nomenclature, we have called these PCR products sequence tagged sites (STSs) rather than expressed sequence tags (ESTs). In general, the term EST has been used to refer to partial sequence obtained from randomly isolated cDNAs. In contrast, the cDNA sequences which were used for STS generation in this study have been precisely mapped, amplify the same size fragment in cDNA as genomic DNA, and have been completely sequenced. These STSs were used in the construction of the cDNA contigs and serve as landmarks for the transcripts.
Marathon-Ready™ human fetal and skeletal muscle cDNAs (Clontech) were used in PCR with an anchor primer provided by the manufacturer and a gene-specific primer. PCR was performed in 50 µl reactions using 1× PCR buffer (Clontech): 40 mM Tricine-KOH, 15 mM KOAc, 3.5 mM Mg(OAc)2, 75 µg/ml bovine serum albumin and 0.25 U KlenTaq-1 DNA polymerase (Clontech). PCR conditions were: a 1 min denaturation step at 94°C followed by 30 cycles of denaturation at 94°C for 30 s and combined annealing and extension at 68°C for 3 min, and lastly a 3 min extension at 68°C. The majority of PCR reactions were performed on Perkin Elmer 9600 thermal cyclers. PCR products were analyzed by gel electrophoresis using 1.5% agarose.
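For planning purposes, the two cycling programs described above (the STS PCR and the anchor-primer PCR) can be written down as data and their programmed hold times summed. This is only a sketch: the `Program` class and its field names are hypothetical, and the totals ignore ramp times between temperatures, so real run times on a thermal cycler will be longer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Program:
    """Programmed hold times of a PCR protocol, all in seconds."""
    initial_s: int    # initial denaturation
    cycles: int       # number of cycles
    steps_s: tuple    # hold time of each step within one cycle
    final_s: int      # final extension

    def hold_seconds(self) -> int:
        # Sum of programmed holds only; temperature ramping is not modeled.
        return self.initial_s + self.cycles * sum(self.steps_s) + self.final_s


# STS PCR: 5 min, then 30 x (15 s, 15 s, 1 min 22 s), then 7 min.
STS_PCR = Program(initial_s=5 * 60, cycles=30, steps_s=(15, 15, 82), final_s=7 * 60)

# Anchor-primer PCR: 1 min, then 30 x (30 s, 3 min), then 3 min.
RACE_PCR = Program(initial_s=60, cycles=30, steps_s=(30, 180), final_s=3 * 60)

for name, program in (("STS PCR", STS_PCR), ("anchor-primer PCR", RACE_PCR)):
    print(f"{name}: {program.hold_seconds() / 60:.0f} min of programmed holds")
```

With the times given in the text, the programmed holds come to 68 min for the STS PCR and 109 min for the anchor-primer PCR.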
Double-stranded plasmid DNA was prepared and purified using the Wizard Mini Preps DNA Purification System (Promega) and sequenced from both ends on an ABI 370A sequencer using the universal forward and reverse M13 fluorescent primers. PCR products were purified from agarose using the SpinBind DNA purification kit (FMC) and directly sequenced using the primers used for PCR amplification.
The authors wish to thank Dr Elizabeth Goldmuntz for providing probes, Dr Vahe Bedian at the University of Pennsylvania Sequencing Facility for the sequencing of PCR products and cDNAs, and Drs Charles Bailey and Susan Holmes for critical reading of the manuscript. We would also like to acknowledge the invaluable assistance of Charles Bailey in the computational analysis. The chromosome-specific cosmid library LL22NC03 used in this study was constructed at the Biomedical Sciences Division, Lawrence Livermore National Laboratory, Livermore, CA 94550 under the auspices of the National Laboratory Gene Library Project sponsored by the U.S. Department of Energy. The authors would especially like to thank Dr Pieter de Jong and Jeffrey Garnes for providing this cosmid library. These studies were supported in part by CA39926 (B.S.E.), HG00425 (B.S.E., M.L.B.), HL51533 (M.L.B., B.R., B.S.E.), DC02027 (M.L.B., B.S.E., B.R.) and HG00313 (B.R.) from the National Institutes of Health.
1 Kirby, M.L. and Bockman, D.E. (1984) Neural crest and normal development: A new perspective. Anat. Rec. 209, 1-6.
2 Lammer, E.J. and Opitz, J.M. (1986) The DiGeorge anomaly as a developmental field defect. Am. J. Med. Genet. 29, 113-127.
3 Greenberg, F., Elder, F., Haffner, P., Northup, H., and Ledbetter, D. (1988) Cytogenetic findings in a prospective series of patients with DiGeorge anomaly. Am. J. Hum. Genet. 43, 605-611.
4 Driscoll, D.A., Budarf, M.L. and Emanuel, B.S. (1992a) A genetic etiology for DiGeorge syndrome: consistent deletions and microdeletions of 22q11. Am. J. Hum. Genet. 50, 924-933.
5 Carey, A.H., Kelly, D., Halford, S., Wadey, R., Wilson, D., Goodship, J., Burn, J., Paul, T., Sharkey, A., Dumanski, J. et al. (1992) Molecular genetic study of the frequency of monosomy 22q11 in DiGeorge syndrome. Am. J. Hum. Genet. 51, 964-970.
6 Driscoll, D.A., Salvin, J., Sellinger, B., Budarf, M.L., McDonald-McGinn, D.M., Zackai, E.H., Emanuel, B.S. (1993) Prevalence of 22q11 microdeletions in DiGeorge and velocardiofacial syndromes: implications for genetic counseling and prenatal diagnosis. J. Med. Genet. 30, 813-817.
8 Driscoll, D.A., Spinner, N.B., Budarf, M.L., McDonald-McGinn, D.M., Zackai, E.H., Goldberg, R.B., Shprintzen, R.J., Saal, H.M., Zonana, J., Jones, M.C., Mascarello, J.T., Emanuel, B.S. (1992b) Deletions and microdeletions of 22q11.2 in velo-cardio-facial syndrome. Am. J. Med. Genet. 44, 261-268.
9 Budarf, M.L., Collins, J., Gong, W., Roe, B., Wang, Z., Sellinger, B., Michaud, D., Driscoll, D. and Emanuel, B.S. (1995) Cloning a balanced translocation associated with DiGeorge syndrome and identification of a disrupted candidate gene. Nature Genet. 10, 269-288.
10 Li, M., Budarf, M.L., Sellinger, B., Jaquez, M., Matalon, R., Ball, S., Pagon, R.A., Rosengren, S.S., Emanuel, B.S., Driscoll, D.A. (1994) Narrowing the DiGeorge region (DGCR) using DGS-VCFS associated translocation breakpoints. Am. J. Hum. Genet. 55, A10.
11 Augusseau, S., Jouk, S., Jalbert, P. and Prieur, M. (1986) DiGeorge syndrome and 22q11 rearrangements. Hum. Genet. 74, 206.
12 Uberbacher, E.C. and Mural, R.L. (1991) Locating protein-coding regions in human DNA sequences by a multiple sensor-neural network approach. Proc. Natl Acad. Sci. USA 88, 11261-11265.
13 Altschul, S.F., Gish, W., Miller, W., Myers, E.W. and Lipman, D.J. (1990) Basic local alignment search tool. J. Mol. Biol. 215, 403-410.
14 Demczuk, S., Aledo, R., Zucman, J., Delattre, O., Desmaze, C., Dauphinot, L., Jalbert, P., Rouleau, G.A., Thomas, G. and Aurias, A. (1995) Cloning of a balanced translocation breakpoint in the DiGeorge syndrome critical region and isolation of a novel potential adhesion receptor gene in its vicinity. Hum. Mol. Genet. 4, 551-558.
15 Wadey, R., Daw, S., Taylor, C., Atif, U., Kamath, S., Halford, S., O'Donnell, H., Wilson, D., Goodship, J., Burn, J. and Scambler, P. (1995) Isolation of a gene encoding an integral membrane protein from the vicinity of a balanced translocation breakpoint associated with DiGeorge syndrome. Hum. Mol. Genet. 4, 1027-1033.
16 Emanuel, B.S., Driscoll, D., Goldmuntz, E., Baldwin, S., Biegel, J., Zackai, E.H., McDonald-McGinn, D., Sellinger, B., Gorman, N., Williams, S., and Budarf, M.L. (1993) Molecular and phenotypic analysis of the chromosome 22 microdeletion syndromes. In: Phenotypic Mapping of Down Syndrome and Other Aneuploid Conditions, ed. Epstein, C.J., Wiley Liss, NY, 207-224.
17 Dodge, G.R., Kovalszky, I., McBride, O.W., Yi, H.F., Chu, M-L., Saitta, B., Stokes, D.G., and Iozzo, R.V. (1991) Human clathrin heavy chain (CLTC): partial molecular cloning, expression, and mapping of the gene to human chromosome 17q11-qter. Genomics 11, 174-178.
18 Padgett, R.A., Grabowski, P.J., Konarska, M.M., Seiler, S. and Sharp, P.A. (1986) Splicing of messenger RNA precursors. Annu. Rev. Biochem. 55, 1119-1150.
19 Bird, A.P. (1987) CpG islands as gene markers in the vertebrate nucleus. Trends Genet. 3, 324-347.
20 Larsen, F., Gundersen, G., Lopez, R., and Prydz, H. (1992) CpG islands as gene markers in the human genome. Genomics 13, 1095-1107.
21 Dear, S. and Staden, R. (1991) A sequence assembly and editing program for efficient management of large projects. Nucleic Acids Res. 19, 3907-3911.
22 Craig, J.M. and Bickmore, W.A. (1994) The distribution of CpG islands in mammalian chromosomes. Nature Genet. 7, 376-382.
23 Kirchhausen, T., Harrison, S.C., Chow, E.P., Mattaliano, R.J., Ramachandran, K.L., Smart, J. and Brosius, J. (1987) Clathrin heavy chain: molecular cloning and complete primary structure. Proc. Natl Acad. Sci. USA 84, 8805-8809.
24 Bazinet, C., Katzen, A.L., Morgan, M., Mahowald, A.P., Lemmon, S.K. (1993) The Drosophila clathrin heavy chain gene: clathrin function is essential in a multicellular organism. Genetics 134, 1119-1134.
25 Payne, G.S. and Schekman, R. (1985) A test of clathrin function in protein secretion and cell growth. Science 230, 1009-1014.
26 Kaplan, R.S., Mayor, J.A. and Wood, D.O. (1993a) The mitochondrial tricarboxylate transport protein. J. Biol. Chem. 268, 13682-13690.
27 Kaplan, R.S., Oliveira, D.L. and Wilson, G.L. (1990) Streptozotocin-induced alterations in the levels of functional mitochondrial anion transport proteins. Arch. Biochem. Biophys. 280, 181-191.
28 Kaplan, R.S., Mayor, J.A., Blackwell, R., Maughon, R.H. and Wilson, G.L. (1991) The effect of insulin supplementation on diabetes-induced alterations in the extractable levels of functional mitochondrial anion transport proteins. Arch. Biochem. Biophys. 287, 305-311.
29 Kaplan, R.S., Mayor, J.A., Blackwell, R., Wilson, G.L. and Schaffer, S.W. (1991) Functional levels of mitochondrial anion transport proteins in non-insulin-dependent diabetes mellitus. Mol. Cell Biol. 107, 79-86.
30 Ferencz, C. (1990) A case-control study of cardiovascular malformation in liveborn infants: the morphogenetic relevance of epidemiologic findings. In: Development Cardiology: Morphogenesis and Function (Clark EB and Takao A, eds.) Futura Publishing Company, Inc., Mount Kisco, NY pp. 523-539.
31 Bielke, W., Blaschke, R.J., Miescher, G.C., Zurcher, G., Andres, A.-C., and Ziemiecki, A. (1994) Characterization of a novel murine testis-specific serine/threonine kinase. Gene 139, 235-239.
32 Hanks, S.K., Quinn, A.M., and Hunter, T. (1988) The protein kinase family: conserved features and deduced phylogeny of the catalytic domains. Science 241, 42-51.
33 Stapleton, D., Gao, G., Michell, B.J., Widmer, J., Mitchelhill, K., Teh, T., House, C.M., Witters, L.A. and Kemp, B.E. (1994) Mammalian 5'-AMP-activated protein kinase non-catalytic subunits are homologs of proteins that interact with yeast Snf1 protein kinase. J. Biol. Chem. 269, 29343-29346.
34 Woods, A., Munday, M.R., Scott, J., Yang, X., Carlson, M. and Carling, D. (1994) Yeast SNF1 is functionally related to mammalian AMP-activated protein kinase and regulates acetyl-CoA carboxylase in vivo. J. Biol. Chem. 269, 19509-19515.
35 Guo, S. and Kemphues, K.J. (1995) par-1, a gene required for establishing polarity in C. elegans embryos, encodes a putative Ser/Thr kinase that is asymmetrically distributed. Cell 81, 611-620.
36 Ruiz, J.C., Conlon, F.L., Robertson, E.J. (1994) Identification of novel protein kinases expressed in the myocardium of the developing mouse heart. Mechan. Develop. 48, 153-164.
37 Rubin, M.R., Toth, L.E., Patel, M.D., D'Eustachio, P. and Nguyen-Huu, M.C. (1986) A mouse homeo box gene is expressed in spermatocytes and embryos. Science 233, 663-667.
38 Wolgemuth, D.J., Viviano, C.M., Gizang-Ginsberg, E., Frohman, M.A., Joyner, A.L. and Martin, G.R. (1987) Differential expression of the mouse homeobox-containing gene Hox-1.4 during male germ cell differentiation and embryonic development. Proc. Natl Acad. Sci. USA 84, 5813-5817.
39 Shackleford, G.M. and Varmus, H.E. (1987) Expression of the proto-oncogene int-1 is restricted to postmeiotic male germ cells and the neural tube of mid-gestational embryos. Cell 50, 89-95.
40 Lovett, M., Kere, J. and Hinton, D. (1991) Direct selection: A method for the isolation of cDNAs encoded by large genomic regions. Proc. Natl Acad. Sci. USA 88, 9628-9632.
41 Parimoo, S., Patanjali, S.R., Shukla, H., Chaplin, D.D. and Weissman, S.M. (1991) cDNA selection: Efficient PCR approach for the selection of cDNAs encoded in large chromosomal DNA fragments. Proc. Natl Acad. Sci. USA 88, 9623-9627.
42 Korn, B., Sedlacek, Z., Manca, A., Kioschis, P., Lehrach, H. and Poustka, A. (1992) A strategy for the selection of transcribed sequences in the Xq28 region. Hum. Mol. Genet. 1, 235-242.
43 Feinberg, A.P. and Vogelstein, B. (1983) A technique for radiolabeling DNA restriction endonuclease fragments to high specific activity. Anal. Biochem. 132, 6-13.
*To whom correspondence should be addressed
Copyright Oxford University Press, 1996