Human Molecular Genetics, 2001, Vol. 10, No. 4 339-352
© 2001 Oxford University Press
Sequence, structure and pathology of the fully annotated terminal 2 Mb of the short arm of human chromosome 16
1MRC Molecular Haematology Unit, Weatherall Institute for Molecular Medicine, John Radcliffe Hospital, Oxford OX3 9DS, UK, 2The Sanger Centre, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK and 3Los Alamos National Laboratory, Los Alamos, NM 87545, USA
Received 22 November 2000 ; Revised and Accepted 16 December 2000.
ABSTRACT
We have sequenced 1949 kb from the terminal Giemsa light band of human chromosome 16p, enabling us to fully annotate the region extending from the telomeric repeats to the previously published tuberous sclerosis disease 2 (TSC2) and polycystic kidney disease 1 (PKD1) genes. This region can be subdivided into two GC-rich, Alu-rich domains and one GC-rich, Alu-poor domain. The entire region is extremely gene rich, containing 100 confirmed genes and 20 predicted genes. Many of the genes encode widely expressed proteins orchestrating basic cellular processes (e.g. DNA recombination, repair, transcription, RNA processing, signal transduction, intracellular signalling and mRNA translation). Others, such as the α globin genes (HBA1 and HBA2), PDIP and BAIAP3, are specialized tissue-restricted genes. Some of the genes have been previously implicated in the pathophysiology of important human genetic diseases (e.g. asthma, cataracts and the ATR-16 syndrome). Others are known disease genes for α thalassaemia, adult polycystic kidney disease and tuberous sclerosis. There is also linkage evidence for bipolar affective disorder, epilepsy and autism in this region. Sixty-three chromosomal deletions reported here and elsewhere allow us to interpret the results of removing progressively larger numbers of genes from this well defined human telomeric region.
INTRODUCTION
Although the Human Genome Project is nearing completion, the extent to which the current sequence is accurately assembled and annotated varies considerably from one region to another. In addition to identifying genes, fully annotated sequence will allow us to address global relationships between chromosome structure and function. In particular, we will be able to relate long-range, primary DNA sequence to the key processes of nuclear metabolism including transcription, replication, recombination, repair, methylation, chromatin assembly and nuclear positioning. Extensive preliminary data already suggest that correlations exist between chromosome banding, DNA sequence composition and these processes (1,2).
On a smaller scale, it is known that cis-acting sequences which control expression of specific genes may be located tens or hundreds of kilobases from the gene they regulate. It will therefore be important to establish whether regions of the genome that encode proteins are organized at a level above the unit of the gene and address the question of whether sequence analysis can help identify structurally discrete chromosomal domains that contribute to or reflect function.
The terminal 285 kb of 16p13.3, which includes the α globin genes, has been previously characterized using a variety of functional assays, allowing us to relate primary sequence to known biological function (3). Here, we have sequenced the terminal ~2 Mb of human chromosome 16p, enabling us to fully annotate a contig extending from the telomeric repeats to the previously published tuberous sclerosis disease 2 (TSC2) and polycystic kidney disease 1 (PKD1) genes (4,5). This segment of the Giemsa light band 16p13.3 is GC-rich and Alu dense, containing many putative CpG islands and genes. In addition, 63 deletions from this 2 Mb region and their corresponding phenotypes, including the ATR-16 syndrome, are reported here and elsewhere (6–11), allowing us to interpret the effects of deleting progressively larger numbers of genes from this well defined chromosomal region.
Given its very high gene density and proximity to a human telomere, it is not surprising that, in addition to α thalassaemia (12), the ATR-16 syndrome (6), tuberous sclerosis (4) and the adult form of polycystic kidney disease (5), several previously characterized human genetic disease genes may also lie in this gene-rich region. These include genes implicated in asthma (13), cataracts with micro-ophthalmia (14), susceptibility to bipolar affective disorder (15), epilepsy (16) and various forms of autism (17–19). This highly annotated sequence extending 2 Mb from the 16p telomere should facilitate rapid identification of disease genes falling in this region. These data therefore provide an ideal opportunity to evaluate the extent to which DNA sequence analysis of the human genome will contribute to our understanding of chromosome structure, function and pathology.
RESULTS
Construction and sequencing of the 16p13.3 contig
A physical map of overlapping cosmids spanning the terminal 2 Mb of chromosome 16p was constructed (Fig. 1H) using three different restriction enzymes and multiple hybridizations with internal and end-clone fragments (see Materials and Methods). In most cases, several (3–14) cosmids were identified at each screening stage from a chromosome 16-specific cosmid library (20), providing significant depth (average, six cosmids), and hence confidence, in the resulting map. Clones chosen to represent the minimal tiling path were analysed using fluorescence in situ hybridization (FISH) to ensure that they mapped uniquely to the terminal region of 16p. In three instances, the genomic DNA was represented by a single cosmid [c398F6 (AL023882), c313F9/c305F3 (AL031707/AL031706) and c381G6 (AL031598) (Fig. 1H)]. Restriction maps of these regions were later confirmed using sequence data from the clones themselves but their structure has not yet been confirmed in genomic DNA. Two regions (around co-ordinates 426–485 and 1775–1949 kb) are not represented in the chromosome 16 cosmid library. In these cases, a human PAC library (RPCI) (21) was screened with flanking probes and recombinants spanning each gap [PAC196A12 (AL049542) and PAC76P10 (AL132867) (Fig. 1H)] were identified. The provenance of these clones was confirmed from sequence data (see further results on PAC76P10 below), restriction mapping and FISH analysis.
Sixty-five cosmids and PACs representing the minimal tiling path were sequenced. Of these, 59 are finished with an error rate of <1 in 10 000 bp. As described for other areas of the human genome (22,23), some small segments were consistently difficult to sequence but, in most cases, this could be overcome by applying methods for analysing sequence with a high GC content (see Materials and Methods). Six clones remain unfinished due to difficulties in obtaining sequence.
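The finishing standard quoted above can be put on the phred quality scale that is commonly used for sequence accuracy. The following is a minimal sketch; the function name `phred_quality` is illustrative and does not come from the software used in this study.

```python
import math


def phred_quality(error_probability: float) -> float:
    """Phred-scaled quality: Q = -10 * log10(error probability)."""
    return -10.0 * math.log10(error_probability)


# Threshold quoted in the text: fewer than 1 error in 10 000 bp.
finished_error_rate = 1 / 10_000
print(round(phred_quality(finished_error_rate)))  # 40
```

On this scale, an error rate below 1 in 10 000 bp corresponds to a quality above Q40.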
Clone c357D8 (green box in Fig. 1H) is contiguous, but includes regions of single-stranded sequence and poor quality data, invariably flanked by long tracts of Alu repeats. The missing strands have so far proved impossible to sequence using a variety of technologies.
Five clones (blue boxes in Fig. 1H) each contain a single gap (850, <100, 600 and 1250 bp and ~8 kb) not represented in the M13 shotgun libraries of the individual clones. These gaps are flanked by repeat sequences with tracts of very high GC content and it appears that polymerase consistently stalls at specific sequences. Despite considerable effort, completing these gaps, one of which contains an exon belonging to CACNA1H (gene no. 72), is beyond the scope of this current study, but efforts are continuing to complete these clones to a similar standard. It is interesting that 5 of the 16 ATR-16 centromeric chromosomal breakpoints fall within the 182 kb region spanned by four of these same clones, c344F5, c357D8, c303A1 and c333E1 (Fig. 1H), suggesting that there may be some link between chromosome breakage and segments that are difficult to clone, PCR and sequence. There is also evidence for a high degree of genetic recombination occurring in and around these clones (see Relationship between structure, gene expression and recombination).
Overview of the telomeric region of 16p13.3
The overall structure of this area is consistent with previous observations on GC-rich telomeric regions of the human genome (1,3) but provides further detail and resolution. The average GC content of the entire 1949 kb sequence is 57.5%, ranging from 47.2 to 65.3% when subdivided into 100 kb fragments (Fig. 2A), which is higher than the average for the human genome (~42%) (22). Superficial observation of the GC content suggested that this 2 Mb segment may be divided into three: Region I (1–500 kb) with an average GC content of 54.1%, Region II (501–1500 kb) with an average GC content of 60.9% and Region III (1501–1949 kb) with an average GC content of 53.7% (Fig. 2A). Assessing the GC content using a 20 kb moving average plotted at the midpoint showed that the regular ~90 kb wavelength in GC content previously noted in the terminal 285 kb (3) extends throughout Region I (Fig. 1F). Genes that are transcribed towards the centromere appear to lie within the peaks of this wave pattern and those transcribed towards the telomere appear to lie within the troughs. The GC content remains relatively constant across Region II with a slight dip between co-ordinates 800 and 1000 kb (Fig. 2A), then, in Region III, further modulation in GC content occurs (Fig. 1F).
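The two GC profiles described above (fixed 100 kb fragments and a 20 kb moving average plotted at the midpoint) can be sketched as a single windowing routine. This is an illustrative reconstruction, not the software used in this study. The function names are hypothetical, and the step size of the moving average is an assumption because it is not stated in the text.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G and C among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    return gc / (gc + at) if (gc + at) else 0.0


def windowed_gc(seq: str, window: int, step: int):
    """Return (window midpoint, GC fraction) for every full window."""
    return [
        (start + window // 2, gc_fraction(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


# Toy sequence: a GC-only block followed by an AT-only block.
toy = "GGCC" * 5 + "ATAT" * 5
print(windowed_gc(toy, window=20, step=20))  # [(10, 1.0), (30, 0.0)]
```

For the real sequence, `window=100_000, step=100_000` would give non-overlapping 100 kb fragments, and `window=20_000` with a smaller step (for example 1000, an assumed value) would give a moving average.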
The repetitive elements are summarized in Table 1. As for other GC-rich isochores, the average Alu density is high (19.6%) whereas the density of LINE repeats is low (5.1%). The highest Alu density (29.5%) occurs in Region I (Figs 1G and 2B). In Region II, the Alu density is lower (12.7%) and this segment contains many tandem and simple repeats, a relative increase in the frequency of LINEs and fewer low complexity repeats than adjacent regions. Alu density increases again (24.2%) in Region III.
When the entire masked sequence was compared with itself, a 34 kb region was identified (between co-ordinates 1213 and 1247 kb) composed of one direct and two inverted repeats, the largest of which is 10 kb (data not shown). This is the same region that contains the clones which proved most difficult to clone and sequence; also located within this region are four members of the mast cell tryptase gene family: human transmembrane tryptase, tryptase beta III, tryptase beta I and tryptase beta II (13,24–26). Directly centromeric to these are three more tryptase-like genes that appear to be pseudogenes. Although the tryptase genes beta I–III have previously been localized to 16p13.3 (13), the order and number of tryptase genes presented here (based on genomic sequence data) differ in some respects from those in the previous report [based on restriction map data (13)].
Synteny between human and mouse sequences
As previously shown, the telomeric 172 kb segment containing the α globin cluster and five of the genes located telomerically are syntenic to an interstitial fragment of mouse chromosome 11 (27). The remainder of the 2 Mb region between LUC7L (gene no. 16) up to and including at least the mouse orthologue of the PKD1 gene, which lies beyond the region sequenced here, is syntenic to mouse chromosome 17 (28–30).
Identification of genes and estimates of gene density
The entire sequence was masked for repeats and initially annotated by sequence homology using the BLAST suite of programs (31,32) to search nucleotide [dbEST (33) and EMBL (34)] and protein [SWISS-PROT and TrEMBL (35)] sequence databases. The sequence was also analysed with the exon prediction programs GRAIL1.3, MZEF and XPOUND and the gene prediction programs GENSCAN, FGENES and FGENESH (36–41; V.V. Solovyev, unpublished data, see http://genomic.sanger.ac.uk). All sequences and analyses were processed using an automated system and stored in ACEDB (http://www.acedb.org). After extensive review and editing of these data, we classified (Table 2) and characterized (Table 3) 120 genes in the 1949 kb telomeric region. We found corroborative evidence (spliced ESTs, peptide homology or CpG island) for 105; the remainder were supported by GENSCAN predictions alone.
Of the ab initio gene prediction programs used, GENSCAN was found to be the most accurate at predicting known genes (category A and B, Table 2). The accuracy increased when genes and exons were predicted by more than one program. In some regions, GENSCAN either over- or under-predicted. For example, around 844–960 kb, GENSCAN predicted six genes which, from further analysis, appear to be one gene; in another case (630–669 kb), seven closely spaced genes (including two category A) were predicted to be a single transcript. Close inspection of this region revealed at least five clusters of ESTs and five CpG islands. We were able to amplify each putative gene from HeLa mRNA using internal primers but failed to amplify between these genes (data not shown), supporting the interpretation that they are separate genes.
Within this 2 Mb region, gene density is not uniform (Fig. 1C). On average, there is approximately one gene every 16 kb in this region, consistent with previous observations that telomeric, Giemsa light bands are gene dense (1,3,42). There seems to be no bias towards small or large genes. The smallest genomic coverage is TRG4 (gene no. 36), which has a single exon spanning 75 bp; the largest is C16orf26 (gene no. 62) spanning 116 kb. CACNA1H (gene no. 72) has the greatest number of exons (35) and there are several genes with only one exon, of which the largest is IGFALS (gene no. 106). CRAMP1L (gene no. 99) has the largest exon at 4029 bp. The smallest exon (9 bp) lies in NUBP2 (gene no. 105).
The association of CpG islands and genes
We have shown previously that, in the terminal 285 kb, most of the genes are associated with CpG islands (3), with each island lying at the 5' end of its associated gene, spanning the promoter. Putative CpG islands were initially identified from the frequency of CpG dinucleotides (15 CpGs in at least two adjacent 200 bp windows) but discarding sequences containing GC-rich repetitive DNA. Thus, we identified 84 CpG islands within the entire 1949 kb region equally distributed throughout Regions I–III. In contrast, computational methods, such as CPGREPORT [EMBOSS (43)], when searching with conventional criteria for CpG islands (CpG observed/expected frequency > 0.6, %GC > 50, over 200 bp), overestimated and identified 234 putative islands.
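The conventional criteria quoted above can be illustrated with a short sketch. It uses the widely used observed/expected definition, (number of CpG × window length) / (number of C × number of G). It is neither the CPGREPORT implementation nor the windowed CpG count used in this study, and the function names are hypothetical.

```python
def cpg_stats(window: str):
    """Return (%GC, CpG observed/expected ratio) for one sequence window."""
    s = window.upper()
    n = len(s)
    c, g = s.count("C"), s.count("G")
    cpg = s.count("CG")  # CpG dinucleotides cannot overlap one another
    gc_percent = 100.0 * (c + g) / n if n else 0.0
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return gc_percent, obs_exp


def meets_conventional_criteria(window: str, min_len: int = 200,
                                min_gc: float = 50.0,
                                min_ratio: float = 0.6) -> bool:
    """Apply the thresholds quoted in the text to a single window."""
    gc_percent, obs_exp = cpg_stats(window)
    return len(window) >= min_len and gc_percent > min_gc and obs_exp > min_ratio


print(meets_conventional_criteria("CG" * 100))  # True
print(meets_conventional_criteria("AT" * 100))  # False
```

Because these thresholds are evaluated on sequence composition alone, GC-rich repeats can satisfy them, which is consistent with the overestimate reported above.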
The presence of a CpG island is invariably thought to indicate the presence of a nearby gene (44). In the region studied here, 79 genes are associated with CpG islands and 41 are not. This is consistent with previous observations showing that most housekeeping genes and half of all tissue-restricted genes are associated with CpG islands (45); our observations [66% (79/120)], are somewhat higher than the genome average of 56% reported previously (44). Seven putative bi-directional CpG islands are each associated with two genes, one in either transcriptional orientation. Five of these appear to contain two CpG peaks very close together, possibly incorporating two separate CpG islands. Three genes [C16orf8 (gene no. 5), TMEM6 (gene no. 23) and SOX8 (gene no. 63)] each have two associated CpG islands.
Five CpG islands are associated with a predicted gene for which there is no other corroborative evidence (EST or protein homology matches, category F in Table 2). For nine CpG islands, we found no convincing evidence for any gene close by. These putative CpG islands may not be biologically active (unmethylated) or may be associated with genes which do not conform to the current prediction criteria or may be expressed in a tissue or developmental stage-specific manner so that they are not represented in any of the current sequence databases. Unless these orphan CpG islands mark some other chromosomal element, it seems likely that additional genes near these CpG islands will be identified in the future, in which case the gene density in this region may be somewhat higher than currently estimated.
Relationship between structure, gene expression and recombination
Many of the 79 genes associated with CpG islands are widely expressed, although some (e.g. the α globin genes and PDIP) are expressed in a highly tissue-specific manner. We detected no pattern to the orientation of genes in this region: 57.5% (69/120) of the genes are transcribed towards the centromere and 42.5% (51/120) towards the telomere.
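One way to check whether a 69:51 split departs from an even distribution of orientations is an exact two-sided binomial test. The sketch below is an illustrative calculation only. No such test is reported in this study, and it assumes that gene orientations are independent.

```python
from math import comb


def two_sided_binomial_p(k: int, n: int) -> float:
    """Exact two-sided p-value for k successes in n trials when p = 0.5."""
    extreme = max(k, n - k)
    # Upper-tail probability of observing at least `extreme` successes.
    tail = sum(comb(n, i) for i in range(extreme, n + 1)) / 2 ** n
    # The null distribution is symmetric, so the two-sided value is twice the tail.
    return min(1.0, 2 * tail)


p_value = two_sided_binomial_p(69, 120)
print(f"p = {p_value:.3f}")
```

Under these assumptions the p-value lies above the conventional 0.05 threshold, in line with the statement that no orientation bias was detected.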
We have previously shown, using an in situ hybridization assay (46), that the terminal 50 kb of 16p replicates (or separates) later in S phase than the adjacent 250 kb which replicates early in the cell cycle (46). Provisional data suggest that most of the 2 Mb region also replicates early in the cell cycle although there is a remarkable dip in replication, or separation of chromatids, in the central portion of Region II which is currently under investigation (V. Buckle, unpublished data). Although this occurs in a relatively gene-poor region of this contig, at present there appears to be no clear correlation between replication timing and GC content.
Microsatellite markers allow us to relate the physical map to recombination events recorded in the CEPH consortium linkage map of chromosome 16 (Fig. 1I) (47). Data from the CEPH map indicate that this 2 Mb region has a higher recombination rate (male 12.3 cM, female 8.0 cM, sex-averaged 10.5 cM) than the genome average (1.1 cM/Mb). Recombination events occur most frequently between co-ordinates 900 and 1200 kb (male 6.1 cM and female 5.2 cM for a region of just 0.3 Mb).
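The map distances above are given in cM, whereas the genome average is given in cM/Mb, so the comparison is easier to follow after normalizing by physical length. The following sketch assumes the CEPH distances apply to the full 1949 kb, which is an approximation if the markers do not span the whole region.

```python
def cm_per_mb(genetic_cm: float, physical_mb: float) -> float:
    """Recombination rate as genetic distance per unit physical distance."""
    return genetic_cm / physical_mb


GENOME_AVERAGE = 1.1  # cM/Mb, as quoted in the text

region_rate = cm_per_mb(10.5, 1.949)   # sex-averaged, whole region (assumed span)
hotspot_male = cm_per_mb(6.1, 0.3)     # co-ordinates 900-1200 kb
hotspot_female = cm_per_mb(5.2, 0.3)

print(f"region: {region_rate:.1f} cM/Mb "
      f"({region_rate / GENOME_AVERAGE:.1f}x genome average)")
print(f"900-1200 kb: male {hotspot_male:.1f}, female {hotspot_female:.1f} cM/Mb")
```

With these inputs the region as a whole recombines at roughly five times the genome average, and the 900–1200 kb interval at considerably more.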
Many chromosomal rearrangements have been reported from this segment of chromosome 16 including truncations (6,7,48), interstitial deletions (49) and translocations (10). All known breakpoints, both telomeric and centromeric, are plotted in Figure 1B. While initial inspection suggests that some breakpoints cluster around the α globin complex, this can be explained by the fact that many of these (red bars in Fig. 1B) are highly selected deletions that cause α thalassaemia. Five breakpoints, associated with ATR-16 syndrome, cluster close to the inverted and tandem repeats (at 1121–1304 kb) which encompass the tryptase gene cluster. Further observations are required to establish if this represents a preferred site of chromosomal breakage.
The effects of monosomy for 16p13.3
Deletions that remove the α globin genes (co-ordinates 162–168 kb) give rise to the well-defined haematological phenotype of α thalassaemia. Small deletions (within co-ordinates 129.5–178.2 kb) from the α globin cluster are very common (1–90% carrier frequency) in individuals from tropical and subtropical regions of the world. These deletions are confined to the α globin gene cluster and the surrounding genes remain intact (12; see Conclusions). We recently reported a series of 21 rare, interstitial deletions that remove the α globin genes and a variable number of genes flanking the α globin cluster (11). The largest of these extends for 268 kb and removes 15 functional genes (larger black bar in Fig. 1A). Two heterozygotes for this deletion have α thalassaemia but otherwise appear phenotypically normal (11), demonstrating that, apart from the α globin genes, none of the genes in the terminal 268 kb region of 16p is haploinsufficient.
Here, we have extended this analysis by investigating 16 individuals with still larger deletions (up to 2 Mb) from chromosome 16p (Fig. 1A). All were initially brought to our attention because they have α thalassaemia. Fourteen also have a variety of developmental abnormalities and all have some degree of learning difficulty and therefore have alpha thalassaemia with mental retardation syndrome [ATR-16, OMIM 141750 (6,50)]. Most of these patients have unbalanced translocations making it impossible to distinguish phenotypic features due to monosomy for 16p or trisomy for the other unbalanced chromosome (10,51). However, five patients appear to have pure monosomy of 16p based on cytogenetic studies, multiprobe FISH analysis (52) and, in some cases, analysis of the chromosomal breakpoint (Fig. 1A, red bars).
Two such patients (BA: deletion ~757 kb and TN: deletion ~951 kb) have no reported physical abnormalities but their cognitive abilities fall in the low-average range in contrast to their close relatives. Three patients with substantially larger deletions (BO: deletion ~1900 kb; IM: deletion ~2000 kb; LIN: deletion ~2000 kb) were previously shown to have facial dysmorphism with a variety of physical abnormalities and significant learning difficulties (6,8,9). The patient GS (deletion ~1595 kb), who is also dysmorphic with learning difficulties, has an unbalanced translocation involving satellite DNA from chromosome 21p. Since there may be no contribution to the phenotype from this additional material, this patient's phenotype is predominantly due to monosomy for 16p. It is already known that removal of the gene TSC2 causes the clearly defined phenotype of tuberous sclerosis (4,53,54) and thus TSC2 at ~2050 kb effectively delimits the ATR-16 phenotype to this terminal 16p region.
The simplest conclusion is that the larger the region of monosomy, the more genes are deleted and the more severe the phenotype. Although it is clear that deletion of some genes may contribute more than others and, in some cases, direct disruption of a gene at a breakpoint may have a bearing on phenotype, there may not be a critical gene that explains all features of ATR-16 syndrome. However, the interpretation of these data is complex and is discussed further below.
Relationship to other known diseases
Previous reports have implicated the terminal region of 16p13.3 in several important human genetic diseases in addition to α thalassaemia, ATR-16 syndrome, tuberous sclerosis and adult polycystic kidney disease.
The pathophysiology of asthma may involve members of the tryptase gene family (55–59). Here, we have shown that four mast cell tryptase genes, and three putative tryptase pseudogenes, lie in a 60 kb region, 1240 kb from the telomere of chromosome 16, as reported previously (13). Although these genes are not exclusively responsible for this polygenic disorder, they appear to play an important role in its pathophysiology (55–59).
One of the markers (D16S521), previously linked to a bipolar affective disorder (15), lies 34 kb from the telomere although other markers linked to this disease lie at least 2.5 Mb from the telomere (60–62). Autosomal recessive, idiopathic myoclonic epilepsy of infancy has been mapped to a broad region between D16S3024 (at 1594 kb from the telomere) and D16S423 (16). While this includes some of the 2 Mb contig, the highest LOD score (θ = 0) corresponds to D16S3027 located at least 2.5 Mb from the telomere.
Significant linkage exists between autism and markers in 16p13.3 (18,19) although the peak probability of linkage lies beyond this 2 Mb region. Autism with Tourette's syndrome has been reported in patients trisomic for 16p13.1-pter (17).
Cataracts with micro-ophthalmia (CATM) maps to 16p13.3 in a single family with a translocation involving chromosomes 2 and 16 (14). Both balanced and unbalanced translocations are associated with CATM indicating that a gene on one of these chromosomes is disrupted by the translocation. The breakpoint in chromosome 16 has been localized to band p13.3 by cytogenetic studies. Although the breakpoint in this family has not yet been refined, SOLH (gene no. 30) is a candidate because of its role in eye formation (63,64).
CONCLUSIONS
We have completed and fully annotated the sequence of a human telomere extending 2 Mb from the most terminal (TTAGGG)n repeats. This work highlights some deficiencies in the current public databases (such as: http://www.ncbi.nlm.nih.gov/genome/guide/HsChr16.shtml and http://www.ensembl.org) in which some of the released sequence generated here appears to be misassembled and only sparsely annotated.
The entire region is rich in CpG islands and genes, consistent with previous predictions that the greatest density of genes will occur in GC-rich, telomeric regions of the genome. It is interesting that this relatively small segment of the human genome (0.07%) contains 120 confirmed genes, predicted genes and pseudogenes; this is approximately half as many as identified from the whole of chromosome 21 (284 genes in 33.5 Mb, 1.12% of the genome) (23). It is also approximately three times as gene dense as the equivalent subtelomeric region of human chromosome 22q (22). This extreme variation in gene density emphasizes the difficulty in accurately predicting the number of genes in the human genome using isolated segments of the genome.
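The gene density comparison above can be checked with simple arithmetic on the figures quoted in the text. The variable names below are illustrative.

```python
# Figures quoted in the text.
genes_16p, length_16p_mb = 120, 1.949      # terminal region of 16p
genes_chr21, length_chr21_mb = 284, 33.5   # chromosome 21 (23)

density_16p = genes_16p / length_16p_mb          # genes per Mb
density_chr21 = genes_chr21 / length_chr21_mb    # genes per Mb
kb_per_gene_16p = 1000 * length_16p_mb / genes_16p

print(f"16p terminal region: {density_16p:.1f} genes/Mb "
      f"(one gene every {kb_per_gene_16p:.0f} kb)")
print(f"chromosome 21: {density_chr21:.1f} genes/Mb")
print(f"ratio: {density_16p / density_chr21:.1f}x")
```

On these figures the terminal 16p region carries about 62 genes per Mb against about 8.5 for chromosome 21, a roughly sevenfold difference.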
This telomeric sequence appears to be divisible into three segments (Regions I–III) on the basis of GC content and Alu density, consistent with previous observations on isochores and chromosome flavors (65,66). Throughout these segments there is marked variation in GC content [particularly in Region I (Figs 1F and 2A)] which was not seen when the same sequence was randomized, suggesting a biological basis for this phenomenon. At present the mechanism underlying this variation cannot be clearly related to transcription, replication or recombination. It remains possible that these variations reflect or contribute to some aspect of the higher order folding or organization of the chromosome.
At a higher level of resolution, we examined the distribution of genes along the chromosome. Again, no clear patterns emerge with respect to size, type or orientation but it is clear that tissue-restricted genes are intermingled with widely expressed genes. The question arises of how such genes are independently regulated and it has been frequently proposed that each gene may be sequestered in an independent structural and functional domain. Despite the popularity of such models (67,68), to date, no DNA sequence basis for subdivisions of the chromosome has emerged. Given the considerable overlap between genes and regulatory elements in the well-characterized terminal 285 kb region of 16p it seems unlikely that this region could be simply subdivided into independently-regulated chromosomal domains as described by Prioleau et al. (69). The identification and characterization of putative chromatin boundary elements in other segments of the genome (70,71) suggest that if such chromosomal subdivisions exist in this telomeric region of the chromosome they may be difficult to predict from primary sequence.
The distribution of all known chromosomal breakpoints in this area is quite uneven and presumably reflects complex interactions between ascertainment bias, natural selection and the locations of preferred sites of recombination. One group (red bars in Fig. 1B), selected because they cause α thalassaemia, cluster around the α globin genes. It is interesting that none of the common, highly selected forms of α thalassaemia (12) removes the flanking genes even though rare individuals with larger deletions appear phenotypically normal. Presumably, although deletions extending into these highly conserved, widely expressed genes can be tolerated in rare heterozygotes (11), as a group, such individuals may be at some selective disadvantage. These deletions would almost certainly be lethal in homozygotes. The second group (blue bars in Fig. 1B) was identified because these individuals have α thalassaemia with learning difficulties and, in most cases, additional developmental abnormalities. None of these breakpoints falls in the region 268–757 kb. Presumably, although such telomeric deletions would cause α thalassaemia, one might predict that they do not produce any easily discernible phenotype and therefore do not commonly come to medical attention. The breakpoints clustered ~1200 kb from the 16p telomere occur near a repetitive region that contains a block of tryptase genes and pseudogenes, has a high rate of recombination and contains cosmids that have proven difficult to clone and sequence. It remains to be determined whether this is a preferred site of chromosome breakage.
The acquisition of fully annotated sequence has enabled us to begin to relate long-range DNA sequence to chromosome structure, function and pathology. Clearly this sequence resource will now enable us to extend these studies and construct microarrays to specifically analyse this region in a systematic, unbiased manner.
MATERIALS AND METHODS
Physical mapping
The clones to complete the physical map were obtained from either the Los Alamos chromosome 16-specific cosmid library (20) or from the HGMP PAC library RPCI1 (21) by direct radioactive hybridization of filters using precisely known markers or probes derived from the clones themselves. Probes were labelled using the MegaPrime DNA Labelling Kit and [α-32P]dCTP (Amersham Pharmacia Biotech). Clones were grown up using standard conditions [using ampicillin for SuperCos1 based cosmids (Stratagene) or kanamycin for the PACs]. DNA was prepared using standard techniques (Hybaid and Qiagen). Clones were digested separately with EcoRI, NotI and HindIII and electrophoresed on 0.8% agarose gels. The gels were stained with Vistra Green (Amersham Pharmacia Biotech) and scanned on a Storm PhosphorImager (Molecular Dynamics, Amersham Pharmacia Biotech) to identify exact fragment sizes. Southern blots of the gels were hybridized using end fragments generated from the clones themselves, using the Gene Images DNA Labelling kit (Amersham Pharmacia Biotech) to identify positive clones and combined with the restriction enzyme data to allow an accurate map of the growing contig to be established. Based on these data, a minimum tiling path of clones was chosen to represent the physical map (Fig. 1H). All of these clones were analysed using standard FISH techniques to confirm their location on human chromosome 16p13.3 (72).
EMBL IDs with accession numbers in brackets for the minimum tiling path are as follows, in order, telomeric to centromeric: HSPTEL (Z84812), HSLAW2 (Z84723), HSNFG9 (Z69719), HSRA36 (Z69720), HSGG4 (Z84722), HSX94 (Z84813), HS24F8 (Z69666), HSGG1 (Z84721), HScos12 (Z69706), HSRJ14 (Z69890), HS310H5 (Z69705), HS314G4 (Z69667), HS419C1 (Z99754), HS333B10 (Z81450), HS415C1 (Z98272), HS367G8 (Z97634), HS359F1 (AL023881), HSC196A12 (AL049542), HS356B8 (Z98882), HS366D1 (Z97986), HS407A10 (Z98883), HS338H10 (Z98881), HS398G5 (Z84479), HS349E10 (AL022341), HS313D11 (Z92544), HS380A1 (Z97653), HS444G9 (Z98258), HS335H7 (AL031258), HS321D2 (AL031033), HS398F6 (AL023882), HS360A4 (AL031008), HS360B4 (AL031716), HS306A4 (AL008727), HS366D3 (Z93041), HS443D9 (Z92845), HS394H11 (Z99757), HS422E10 (AL024496), HS313F9 (AL031707), HS305F3 (AL031706), HS349E11 (AL031713), HS381G6 (AL031598), HS344F5 (AL031712), HS302G6 (AL031703), HS357D8 (AL031715), HS303A1 (AL031704), HS333E1 (AL031711), HS358B7 (AL031714), HS316G12 (AL031709), HS399E4 (AL031721), HS312E8 (AL032819), LA16438F12 (AL137252), HS390E6 (AL031600), HS305C8 (AL031705), HS385E7 (AL031720), HS380F5 (AL031719), HS313F4 (Z97633), HS425C2 (AL133297), HS395F10 (Z97652), HS315G5 (AL031708), HS431H6 (AL031009), HS329F2 (AL031710), HS361A3 (AL031717), HS371H6 (AL031718) and HSAC76P10 (AL132867). The GenBank accession number for the complete 1949 kb is AE005175.
End-clone production
The terminal fragments for chosen clones were obtained in the following manner. The DNA was cleaved with SacI (or XhoI or ApaI) for SuperCos1 cosmids or XhoI (or ApaI) for the CyPAC2n clones (these enzymes were chosen because they did not cut within the vector and could be heat-inactivated). The digests were heat-inactivated and ligase and ligase buffer (Promega) were added according to the manufacturer's instructions. The ligations and transformations were performed using standard protocols. The DNA was extracted using Hybaid's Miniprep kit and a test quantity of DNA digested with the original enzyme described above to confirm that the new clone produced a single linear fragment. The correct subclones were then digested with the original enzyme and NotI to release the vector from the two terminal fragments. This digest was electrophoresed on a 0.8% low melting point agarose preparative gel and the terminal fragments excised. No further purification was required before labelling either radioactively or non-radioactively as described above, except for incubation at 65°C for 5 min to melt the agarose slice.
Sequencing
The cosmids and PACs were sequenced using a standard shotgun approach (73). In brief, the clone DNA was sonicated and 1.4–2.0 kb sized fragments were ligated into M13 or pUC vectors and transformed. Restriction digest data were used to estimate the size of each clone and around 200 sequence reads per 10 kb were generated using fluorescent dye-labelled terminators and primers on ABI 373A and ABI 377 sequencing machines (PE Applied Biosystems). The M13 subclones were sequenced using forward primers, while both forward and reverse primers were used to sequence the pUC subclones.
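As a rough guide to what a density of 200 reads per 10 kb implies, the mean redundancy of a shotgun project can be estimated as c = NL/G (N reads of length L over a target of length G), with the fraction of bases left unsampled approximated by e^(-c) under a Poisson model. The sketch below is illustrative only: the average read length of 500 bp is an assumption and is not a figure reported for this project.

```python
import math


def shotgun_coverage(n_reads: int, read_len_bp: int, target_len_bp: int) -> float:
    """Mean sequencing redundancy c = N * L / G."""
    return n_reads * read_len_bp / target_len_bp


def expected_uncovered_fraction(coverage: float) -> float:
    """Poisson estimate of the fraction of bases not covered by any read."""
    return math.exp(-coverage)


# ASSUMPTION: average usable read length; not stated in the text.
ASSUMED_READ_LEN_BP = 500

if __name__ == "__main__":
    c = shotgun_coverage(n_reads=200, read_len_bp=ASSUMED_READ_LEN_BP,
                         target_len_bp=10_000)
    print(f"estimated redundancy: {c:.1f}x")
    print(f"expected uncovered fraction: {expected_uncovered_fraction(c):.2e}")
```

Shorter assumed reads lower the estimate proportionally, which is one reason gaps can remain after the shotgun phase and require directed finishing.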
The sequence reads were base-called using phred (74) and assembled using phrap (http://www.phrap.org) into a GAP database (75) for editing. Standard finishing methods were employed to bring about gap closure and resolve sequence ambiguities. Various software tools were used to check the quality of the sequence and restriction digests were used to confirm the assembly of each clone.
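The quality values that phred attaches to each base call are related to the estimated error probability P by Q = -10 log10(P), so that Q20 corresponds to an error rate of 1 in 100 and Q30 to 1 in 1000. The helper functions below are a minimal sketch of that relationship; the function names and the example quality values are hypothetical and are not part of phred, phrap or GAP.

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Convert a phred quality value Q into an error probability P."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Convert an error probability P into a phred quality value Q."""
    return -10 * math.log10(p)


def expected_errors(qualities: list[float]) -> float:
    """Expected number of miscalled bases in a read with these quality values."""
    return sum(phred_to_error_prob(q) for q in qualities)


if __name__ == "__main__":
    # Hypothetical per-base qualities for a short stretch of one read.
    example_qualities = [40, 40, 30, 20, 10]
    print(f"P(error) at Q20: {phred_to_error_prob(20):.4f}")
    print(f"expected errors: {expected_errors(example_qualities):.4f}")
```

Summing the per-base error probabilities in this way gives a simple estimate of how many errors to expect in a read or consensus, which is the kind of figure used when judging whether a region needs further finishing.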
Sequencing gaps that failed to be resolved by standard shotgun and finishing approaches were tackled by a number of techniques. (i) Using an oligo-screening strategy to identify further M13 clones that may extend the gap sequence or close the gap altogether (76). (ii) Sequencing subclones from a short insert library generated either from a pUC subclone that spanned the gap or from a subcloned restriction fragment (77). (iii) PCR across the gap and direct sequencing of the PCR product using the original and internal primers. (iv) Direct sequencing of the cosmid DNA using primers that flank the gap [using 3 µg template DNA, 16 µl of standard ABI BigDye (PE Applied Biosystems) sequencing mix and 45 cycles]. (v) Application of the previous methods (iii and iv) but substituting ABI BigDye dGTP, increasing the PCR and sequencing denaturing temperature to 98°C and/or adding 1 M betaine to the PCR and sequencing reactions. (vi) Using standard manual sequencing techniques (Amersham).
Phenotypes of patients with 16p monosomy
The clinical features of these patients are briefly described here but have been or will be presented in detail elsewhere.
Patient BA.
A preliminary report of this patient (78) described her as a phenotypically normal 14-year-old girl with a marked discrepancy between verbal and performance IQs, measured at 89 and 75, respectively. The chromosomal breakpoint in this patient lies in c335H7, 757 kb from the 16p telomere.
Patients TN.
Two brothers have delayed development for speech and walking; one also has a left iris coloboma. Their mother also has this deletion and is clearly intellectually different from her siblings (unpublished data). The TN breakpoint lies in c443D9, 951 kb from the 16p telomere.
Patient GS.
A boy aged 3 years. He had moderate delay in receptive language abilities and severe delay in expressive language abilities (unpublished data). The breakpoint lies between c313F4 and c395F10, 1595 kb from the 16p telomere.
Patient BO.
This patient was described in detail by Wilkie et al. (6) and references therein. At 15 years of age, he was moderately to severely retarded (IQ 53) with mild facial dysmorphism and minor congenital abnormalities. The breakpoint lies in PAC76P10, 1900 kb from the 16p telomere.
Patient IM.
This patient was described as having developmental delay and at 8 years of age had the mental ability of a 5-year-old (8). The breakpoint lies telomeric to the PKD1 region, 2000 kb from the 16p telomere.
Patient LIN.
This patient was described as having developmental delay with sign language developing at 2 years of age and walking by 2 years (9). She also has a variety of mild dysmorphic features. The breakpoint lies telomeric to the PKD1 region, 2000 kb from the 16p telomere.
Breakpoint analysis
Chromosome 16p deletion patients were analysed using standard FISH techniques as described previously (72), with selected clones from the minimum tiling path to localize the chromosome 16 breakpoint to one or two clones. Multiprobe FISH analyses were performed as described previously (52).
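The logic of narrowing a terminal deletion breakpoint to one or two clones can be sketched as follows. This is an illustrative simplification, not the scoring procedure actually used: it assumes that each tested clone is scored on the deleted homologue as "absent" (fully deleted), "present" (fully retained) or "reduced" (spanning the breakpoint), and the clone names are placeholders rather than members of the tiling path.

```python
def localize_breakpoint(tiling_path: list[str],
                        signals: dict[str, str]) -> tuple[str, ...]:
    """Localize a terminal deletion breakpoint from FISH results.

    tiling_path: clone names ordered from telomere to centromere.
    signals: FISH result on the deleted homologue for each tested clone,
             one of "absent", "present" or "reduced" (assumed scoring scheme).
    Returns one clone (breakpoint within it), two clones (breakpoint between
    the last deleted and first retained tested clone), or an empty tuple if
    these probes do not localize a breakpoint.
    """
    last_absent = None
    for clone in tiling_path:
        if clone not in signals:
            continue  # only selected clones are hybridized
        status = signals[clone]
        if status == "absent":
            last_absent = clone
        elif status == "reduced":
            return (clone,)
        elif status == "present":
            return (last_absent, clone) if last_absent is not None else ()
        else:
            raise ValueError(f"unknown FISH status for {clone}: {status}")
    return ()  # every tested clone deleted: breakpoint lies further centromeric


if __name__ == "__main__":
    # Placeholder clone names and made-up results, for illustration only.
    path = ["cloneA", "cloneB", "cloneC", "cloneD"]
    print(localize_breakpoint(path, {"cloneA": "absent", "cloneC": "present"}))
    print(localize_breakpoint(path, {"cloneA": "absent", "cloneB": "reduced"}))
```

When the result is a pair of clones, hybridizing the untested clones that lie between them is the natural next step for refining the interval.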
| ACKNOWLEDGEMENTS |
|---|
Members of the Sanger Centre team are: Rachael Ainscough, Claire Bagguley, Karen Barlow, Caroline Baynes, Lisa Beard, Victoria Cobley, Gerard Coville, Sancha Donnelly, Andrew Ellington, Kerry Fleming, Debbie Frame, John Frankland, Audrey Fraser, Lisa Gilby, Rebekah Hall, Gretta Hall-Tamlyn, Sarah Holmes, Bijay Jassal, Matthew Jones, Jo Kershaw, Andrew Kimberley, Andrew King, Julia Lightning, Madeleine Moore, Chantal Percy, Adelaide Pettett, Ratna Shownkeen, Matthew Sims, Charlie Steward, Daniel Thomas, Karen Thomas, Justine Wallis, David Willey, Laurens Wilming and John Woodward. The authors would also like to thank: Richard Gibbons, Jane Rogers (Sanger Centre), M. Gardner, M. Descartes, Helen Brown, Ahmed Daghir, Hadley Wood, Christopher Ward, Peter Harris, the HUGO Nomenclature Committee (H. Wain, M. Lush, M. Wright, R. Lovering, E. Bruford and S. Povey), the Medical Research Council, the Wellcome Trust (J.F.) and the UK HGMP Resource Centre.
| FOOTNOTES |
|---|
+ Christine Lloyd headed the production of the sequence at the Sanger Centre. A full list of past and present members of staff who contributed to generating this sequence is given in the Acknowledgements.
§ To whom correspondence should be addressed. Tel: +44 1865 222393; Fax: +44 1865 222500; Email: drhiggs@molbiol.ox.ac.uk
| REFERENCES |
|---|
1 Craig, J.M. and Bickmore, W.A. (1993) Chromosome bands – flavours to savour. Bioessays, 15, 349–354.
2 Bernardi, G. (2000) Isochores and the evolutionary genomics of vertebrates. Gene, 241, 3–17.
3 Flint, J., Thomas, K., Micklem, G., Raynham, H., Clark, K., Doggett, N.A., King, A. and Higgs, D.R. (1997) The relationship between chromosome structure and function at a human telomeric region. Nature Genet., 15, 252–257.
4 European Chromosome 16 Tuberous Sclerosis Consortium (1993) Identification and characterization of the tuberous sclerosis gene on chromosome 16. Cell, 75, 1305–1315.
5 European Polycystic Kidney Disease Consortium (1994) The polycystic kidney disease 1 gene encodes a 14 kb transcript and lies within a duplicated region on chromosome 16. Cell, 77, 881–894.
6 Wilkie, A.O.M., Buckle, V.J., Harris, P.C., Lamb, J., Barton, N.J., Reeders, S.T., Lindenbaum, R.H., Nicholls, R.D., Barrow, M., Bethlenfalvay, N.C. et al. (1990) Clinical features and molecular analysis of the α thalassaemia/mental retardation syndromes. I. Cases due to deletions involving chromosome band 16p13.3. Am. J. Hum. Genet., 46, 1112–1126.
7 Lamb, J., Harris, P.C., Wilkie, A.O.M., Wood, W.G., Dauwerse, J.G. and Higgs, D.R. (1993) De novo truncation of chromosome 16p and healing with (TTAGGG)n in the α-thalassemia/mental retardation syndrome (ATR-16). Am. J. Hum. Genet., 52, 668–676.
8 Fei, Y.J., Liu, J.C., McKie, V.C. and Huisman, T.H. (1992) Hb H disease and mild mental retardation in a black girl with a Hb S heterozygosity. Hemoglobin, 16, 431–434.
9 Lindor, N.M., Valdes, M.G., Wick, M., Thibodeau, S.N. and Jalal, S. (1997) De novo 16p deletion: ATR-16 syndrome. Am. J. Med. Genet., 72, 451–454.
10 Lamb, J., Wilkie, A.O.M., Harris, P.C., Buckle, V.J., Lindenbaum, R.H., Barton, N.J., Reeders, S.T., Weatherall, D.J. and Higgs, D.R. (1989) Detection of breakpoints in submicroscopic chromosomal translocation, illustrating an important mechanism for genetic disease. Lancet, 2, 819–824.
11 Horsley, S.W., Daniels, R.J., Anguita, E., Raynham, H.A., Peden, J.F., Villegas, A., Vickers, M.A., Green, S., Chui, D.H.K., Ayyub, H., et al. (2001) Monosomy for the most telomeric, gene-rich region of human chromosome 16p causes minimal phenotypic effects. Eur. J. Hum. Genet., in press.
12 Higgs, D.R., Vickers, M.A., Wilkie, A.O.M., Pretorius, I.-M., Jarman, A.P. and Weatherall, D.J. (1989) A review of the molecular genetics of the human α-globin gene cluster. Blood, 73, 1081–1104.
13 Pallaoro, M., Fejzo, M.S., Shayesteh, L., Blount, J.L. and Caughey, G.H. (1999) Characterization of genes encoding known and novel human mast cell tryptases on chromosome 16p13.3. J. Biol. Chem., 274, 3355–3362.
14 Yokoyama, Y., Narahara, K., Tsuji, K., Ninomiya, S. and Seino, Y. (1992) Autosomal dominant congenital cataract and microphthalmia associated with a familial t(2;16) translocation. Hum. Genet., 90, 177–178.
15 Detera-Wadleigh, S.D., Barden, N., Craddock, N., Ewald, H., Foroud, T., Kelsoe, J. and McQuillin, A. (1999) Chromosomes 12 and 16 Workshop. Am. J. Med. Genet., 88, 255–259.
16 Zara, F., Gennaro, E., Stabile, M., Carbone, I., Malacarne, M., Majello, L., Santangelo, R., Antonio de Falco, F. and Bricarelli, F.D. (2000) Mapping of a locus for a familial autosomal recessive idiopathic myoclonic epilepsy of infancy to chromosome 16p13. Am. J. Hum. Genet., 66, 1552–1557.
17 Hebebrand, J., Martin, M., Körner, J., Roitzheim, B., de Braganca, K., Werner, W. and Remschmidt, H. (1994) Partial trisomy 16p in an adolescent with autistic disorder and Tourette's syndrome. Am. J. Med. Genet., 54, 268–270.
18 International Molecular Genetic Study of Autism Consortium (1998) A full genome screen for autism with evidence for linkage to a region on chromosome 7q. Hum. Mol. Genet., 7, 571–578.
19 Philippe, A., Martinez, M., Guilloud-Bataille, M., Gillberg, C., Råstam, M., Sponheim, E., Coleman, M., Zappella, M., Aschauer, H., van Malldergerme, L. et al. (1999) Genome-wide scan for autism susceptibility genes. Hum. Mol. Genet., 8, 805–812.
20 Stallings, R.L., Torney, D.C., Hildebrand, C.E., Longmire, J.L., Deaven, L.L., Jett, J.H., Doggett, N.A. and Moyzis, R.K. (1990) Physical mapping of human chromosomes by repetitive sequence fingerprinting. Proc. Natl Acad. Sci. USA, 87, 6218–6222.
21 Ioannou, P.A., Amemiya, C.T., Garnes, J., Kroisel, P.M., Shizuya, H., Chen, C., Batzer, M.A. and de Jong, P.J. (1994) A new bacteriophage P1-derived vector for the propagation of large human DNA fragments. Nature Genet., 6, 84–89.
22 Dunham, I., Shimizu, N., Roe, B.A., Chissoe, S., Hunt, A.R., Collins, J.E., Bruskiewich, R., Beare, D.M., Clamp, M., Smink, L.J. et al. (1999) The DNA sequence of human chromosome 22. Nature, 402, 489–495.
23 The Chromosome 21 Mapping and Sequencing Consortium (2000) The DNA sequence of human chromosome 21. Nature, 405, 311–319.
24 Wong, G.W., Tang, Y., Feyfant, E., Sali, A., Li, L., Li, Y., Huang, C., Friend, D.S., Krilis, S.A. and Stevens, R.L. (1999) Identification of a new member of the tryptase family of mouse and human mast cell proteases which possesses a novel COOH-terminal hydrophobic extension. J. Biol. Chem., 274, 30784–30793.
25 Miller, J.S., Moxley, G. and Schwartz, L.B. (1990) Cloning and characterization of a second complementary DNA for human tryptase. J. Clin. Invest., 86, 864–870.
26 Vanderslice, P., Ballinger, S.M., Tam, E.K., Goldstein, S.M., Craik, C.S. and Caughey, G.H. (1990) Human mast cell tryptase: multiple cDNAs and genes reveal a multigene serine protease family. Proc. Natl Acad. Sci. USA, 87, 3811–3815.
27 Flint, J., Tufarelli, C., Peden, J., Clark, K., Daniels, R.J., Hardison, R., Miller, W., Philipsen, S., Tan-Un, K.C., McMorrow, T. et al. (2001) Comparative genome analysis delimits a chromosomal domain and identifies key regulatory elements in the α globin cluster. Hum. Mol. Genet., 10, 371–382.
28 Tufarelli, C., Frischauf, A.-M., Hardison, R., Flint, J. and Higgs, D.R. (2001) Characterisation of a widely expressed gene (LUC7-LIKE) defining the centromeric boundary of the human α globin domain. Genomics, 71, in press.
29 Olsson, P.G., Sutherland, H.F., Nowicka, U., Korn, B., Poutska, A. and Frischauf, A.M. (1995) The mouse homologue of the tuberin gene (TSC2) maps to a conserved synteny group between mouse chromosome 17 and human 16p13.3. Genomics, 25, 339–340.
30 Olsson, P.G., Lohning, C., Horsley, S., Kearney, L., Harris, P.C. and Frischauf, A. (1996) The mouse homologue of the polycystic kidney disease gene (Pkd1) is a single-copy gene. Genomics, 34, 233–235.
31 Altschul, S.F., Gish, W., Miller, W., Myers, E.W. and Lipman, D.J. (1990) Basic local alignment search tool. J. Mol. Biol., 215, 403–410.
32 Altschul, S.F., Madden, T.L., Schäffer, A.A., Zhang, J., Zhang, Z., Miller, W. and Lipman, D.J. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res., 25, 3389–3402.
33 Boguski, M.S., Lowe, T.M. and Tolstoshev, C.M. (1993) dbEST – database for expressed sequence tags. Nature Genet., 4, 332–333.
34 Baker, W., van den Broek, A., Camon, E., Hingamp, P., Sterk, P., Stoesser, G. and Tuli, M.A. (2000) The EMBL nucleotide sequence database. Nucleic Acids Res., 28, 19–23.
35 Bairoch, A. and Apweiler, R. (2000) The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000. Nucleic Acids Res., 28, 45–48.
36 Xu, Y., Mural, R., Shah, M. and Uberbacher, E. (1994) Recognizing exons in genomic sequence using GRAIL II. Genet. Eng., 16, 241–253.
37 Solovyev, V.V., Salamov, A.A. and Lawrence, C.B. (1995) Identification of human gene structure using linear discriminant functions and dynamic programming. Ismb, 3, 367–375.
38 Zhang, M.Q. (1997) Identification of protein coding regions in the human genome by quadratic discriminant analysis. Proc. Natl Acad. Sci. USA, 94, 565–568.
39 Thomas, A. and Skolnick, M.H. (1994) A probabilistic model for detecting coding regions in DNA sequences. IMA J. Math. Appl. Med. Biol., 11, 149–160.
40 Uberbacher, E.C. and Mural, R.J. (1991) Locating protein coding regions in human DNA sequences using a multiple sensor-neural network approach. Proc. Natl Acad. Sci. USA, 88, 11261–11265.
41 Burge, C. and Karlin, S. (1997) Prediction of complete gene structures in human genomic DNA. J. Mol. Biol., 268, 78–94.
42 Saccone, S., De Sario, A., Della Valle, G. and Bernardi, G. (1992) The highest gene concentrations in the human genome are in telomeric bands of metaphase chromosomes. Proc. Natl Acad. Sci. USA, 89, 4913–4917.
43 Rice, P., Longden, O. and Bleasby, A. (2000) EMBOSS: The European Molecular Biology Open Software Suite. Trends Genet., 16, 276–277.
44 Cross, S.H. and Bird, A.P. (1995) CpG islands and genes. Curr. Opin. Genet. Dev., 5, 309–314.
45 Larsen, F., Gundersen, G., Lopez, R. and Prydz, H. (1992) CpG islands as gene markers in the human genome. Genomics, 13, 1095–1107.
46 Smith, Z.E. and Higgs, D.R. (1999) The pattern of replication at a human telomeric region (16p13.3): its relationship to chromosome structure and gene expression. Hum. Mol. Genet., 8, 1373–1386.
47 Kozman, H.M., Keith, T.P., Donis-Keller, H., White, R.L., Weissenbach, J., Dean, M., Vergnaud, G., Kidd, K., Gussella, J., Royle, N.J. et al. (1995) The CEPH Consortium linkage map of human chromosome 16. Genomics, 25, 44–58.
48 Flint, J., Craddock, C.F., Villegas, A., Bentley, D.P., Williams, H.J., Galanello, R., Cao, A., Wood, W.G., Ayyub, H. and Higgs, D.R. (1994) Healing of broken human chromosomes by the addition of telomeric repeats. Am. J. Hum. Genet., 55, 505–512.
49 Hatton, C., Wilkie, A.O.M., Drysdale, H.C., Wood, W.G., Vickers, M.A., Sharpe, J., Ayyub, H., Pretorius, I.-M., Buckle, V.J. and Higgs, D.R. (1990) Alpha thalassemia caused by a large (62 kb) deletion upstream of the human α globin gene cluster. Blood, 76, 221–227.
50 Weatherall, D.J., Higgs, D.R., Bunch, C., Old, J.M., Hunt, D.M., Pressley, L., Clegg, J.B., Bethlenfalvay, N.C., Sjolin, S., Koler, R.D. et al. (1981) Hemoglobin H disease and mental retardation. A new syndrome or a remarkable coincidence? N. Engl. J. Med., 305, 607–612.
51 Gibbons, R.J. and Higgs, D.R. (2001) The alpha thalassemia/mental retardation syndromes. In Steinberg, M.H., Forget, B.G., Higgs, D.R. and Nagel, R.L. (eds), Disorders of Hemoglobin. Cambridge University Press, Cambridge, UK.
52 Knight, S.J., Horsley, S.W., Regan, R., Lawrie, N.M., Maher, E.J., Cardy, D.L., Flint, J. and Kearney, L. (1997) Development and clinical application of an innovative fluorescence in situ hybridization technique which detects submicroscopic rearrangements involving telomeres. Eur. J. Hum. Genet., 5, 1–8.
53 Harris, P.C. (1997) The TSC2/PKD1 contiguous gene syndrome. Contrib. Nephrol., 122, 76–82.
54 Cheadle, J.P., Reeve, M.P., Sampson, J.R. and Kwiatkowski, D.J. (2000) Molecular genetic advances in tuberous sclerosis. Hum. Genet., 107, 97–114.
55 De Sanctis, G.T., Merchant, M., Beier, D.R., Dredge, R.G., Grobholz, J.K., Martin, T.R., Lander, E.S. and Drazen, J.M. (1995) Quantitative locus analysis of airway hyperresponsiveness in A/J and C57BL/6J mice. Nature Genet., 11, 150–154.
56 Caughey, G.H. (1997) Of mites and men: trypsin-like proteases in the lungs. Am. J. Respir. Cell Mol. Biol., 16, 621–628.
57 Hunt, J.E., Friend, D.S., Gurish, M.F., Feyfant, E., Sali, A., Huang, C., Ghildyal, N., Stechschulte, S., Austen, K.F. and Stevens, R.L. (1997) Mouse mast cell protease 9, a novel member of the chromosome 14 family of serine proteases that is selectively expressed in uterine mast cells. J. Biol. Chem., 272, 29158–29166.
58 Johnson, P.R., Ammit, A.J., Carlin, S.M., Armour, C.L., Caughey, G.H. and Black, J.L. (1997) Mast cell tryptase potentiates histamine-induced contraction in human sensitized bronchus. Eur. Respir. J., 10, 38–43.
59 Rice, K.D., Tanaka, R.D., Katz, B.A., Numerof, R.P. and Moore, W.R. (1998) Inhibitors of tryptase for the treatment of mast cell-mediated diseases. Curr. Pharm. Des., 4, 381–396.
60 McInnes, L.A., Escamilla, M.A., Service, S.K., Reus, V.I., Leon, P., Silva, S., Rojas, E., Spesny, M., Baharloo, S., Blakenship, K. et al. (1996) A complete genome screen for genes predisposing to severe bipolar disorder in two Costa Rican pedigrees. Proc. Natl Acad. Sci. USA, 93, 13060–13065.
61 Ewald, H., Mors, P., Flint, T., Koed, K., Eiberg, H. and Kruse, T.A. (1995) A possible locus for manic depressive illness on chromosome 16p13. Psychiatr. Genet., 5, 71–81.
62 Edenberg, H.J., Foroud, T., Conneally, P.M., Sorbel, J.J., Carr, K., Crose, C., Willig, C., Zhao, J., Miller, M., Bowman, E. et al. (1997) Initial genomic scan of the NIMH genetics initiative bipolar pedigrees: chromosomes 3, 5, 15, 16, 17 and 22. Am. J. Med. Genet., 74, 238–246.
63 Kamei, M., Webb, G.C., Young, I.G. and Campbell, H.D. (1998) SOLH, a human homologue of the Drosophila melanogaster small optic lobes gene is a member of the calpain and zinc-finger gene families and maps to human chromosome 16p13.3 near CATM (cataract with microphthalmia). Genomics, 51, 197–206.
64 Kamei, M., Webb, G.C., Heydon, K., Hendry, I.A., Young, I.G. and Campbell, H.D. (2000) Solh, the mouse homologue of the Drosophila melanogaster small optic lobes gene: organization, chromosomal mapping and localization of gene product to the olfactory bulb. Genomics, 64, 82–89.
65 Bernardi, G., Olofsson, B., Filipski, J., Zerial, M., Salinas, J., Cuny, G., Meunier-Rotival, M. and Rodier, F. (1985) The mosaic genome of warm-blooded vertebrates. Science, 228, 953–957.
66 Holmquist, G.P. (1992) Review article: Chromosomal bands, their chromatin flavors and their functional features. Am. J. Hum. Genet., 51, 17–37.
67 Kitzberg, D., Selig, S. and Cedar, H. (1991) Chromosome structure and eukaryotic gene organization. Curr. Opin. Genet. Dev., 1, 534–537.
68 Kellum, R. and Elgin, S.C. (1998) Chromatin boundaries: punctuating the genome. Curr. Biol., 8, R521–R524.
69 Prioleau, M.N., Nony, P., Simpson, M. and Felsenfeld, G. (1999) An insulator element and condensed chromatin region separate the chicken beta-globin locus from an independently regulated erythroid-specific folate receptor gene. EMBO J., 18, 4035–4048.
70 Bell, A.C. and Felsenfeld, G. (1999) Stopped at the border: boundaries and insulators. Curr. Opin. Genet. Dev., 9, 191–198.
71 Sun, F.L. and Elgin, S.C. (1999) Putting boundaries on silence. Cell, 99, 459–462.
72 Buckle, V.J. and Rack, K. (1993) Fluorescent in situ hybridisation. In Davies, K.E. (ed.), Human Genetic Diseases. IRL Press, Oxford, UK, pp. 59–80.
73 Bankier, A.T., Weston, K.M. and Barrell, B.G. (1987) Random cloning and sequencing by the M13/dideoxynucleotide chain termination method. Methods Enzymol., 155, 51–93.
74 Ewing, B. and Green, P. (1998) Base-calling of automated sequencer traces using phred. II. Error probabilities. Genome Res., 8, 186–194.
75 Bonfield, J.K., Smith, K.F. and Staden, R. (1995) A new DNA sequence assembly program. Nucleic Acids Res., 23, 4992–4999.
76 Flint, J., Sims, M., Clark, K., Staden, R. and Thomas, K. (1998) An oligo-screening strategy to fill gaps found during shotgun sequencing projects. DNA Seq., 8, 241–245.
77 McMurray, A.A., Sulston, J.E. and Quail, M.A. (1998) Short-insert libraries as a method of problem solving in genome sequencing. Genome Res., 8, 562–566.
78 Waye, J.S., Chui, D.H.K., Higgs, D.R., Hetherington, R. and Olivieri, N.F. (1995) De novo deletion of the entire α globin gene cluster in a girl with Hb H disease (Abstract). Blood, 86, 8a.
79 Brook-Carter, P.T., Peral, B., Ward, C.J., Thompson, P., Hughes, J., Maheshwar, M.M., Nellist, M., Gamble, V., Harris, P.C. and Sampson, J.R. (1994) Deletion of the TSC2 and PKD1 genes associated with severe infantile polycystic kidney disease – a contiguous gene syndrome. Nature Genet., 8, 328–332.
80 Burn, T.C., Connors, T.D., Van Raay, T.J., Dackowski, W.R., Millholland, J.M., Klinger, K.W. and Landes, G.M. (1996) Generation of a transcriptional map for a 700-kb region surrounding the polycystic kidney disease type 1 (PKD1) and tuberous sclerosis type 2 (TSC2) disease genes on human chromosome 16p13.3. Genome Res., 6, 525–537.
81 Aspinwall, R., Rothwell, D.G., Roldan-Arjona, T., Anselmino, C., Ward, C.J., Cheadle, J.P., Sampson, J.R., Lindahl, T., Harris, P.C. and Hickson, I.D. (1997) Cloning and characterization of a functional human homolog of E.coli endonuclease III. Proc. Natl Acad. Sci. USA, 94, 109–114.

