Human Molecular Genetics, 2001, Vol. 10, No. 4 339-352
© 2001 Oxford University Press

Sequence, structure and pathology of the fully annotated terminal 2 Mb of the short arm of human chromosome 16

Rachael J. Daniels1, John F. Peden1, Christine Lloyd2,+, Sharon W. Horsley1, Kevin Clark1, Cristina Tufarelli1, Lyndal Kearney1, Veronica J. Buckle1, Norman A. Doggett3, Jonathan Flint1 and Douglas R. Higgs1,§

1MRC Molecular Haematology Unit, Weatherall Institute for Molecular Medicine, John Radcliffe Hospital, Oxford OX3 9DS, UK, 2The Sanger Centre, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK and 3Los Alamos National Laboratory, Los Alamos, NM 87545, USA

Received 22 November 2000; Revised and Accepted 16 December 2000.


    ABSTRACT
We have sequenced 1949 kb from the terminal Giemsa light band of human chromosome 16p, enabling us to fully annotate the region extending from the telomeric repeats to the previously published tuberous sclerosis disease 2 (TSC2) and polycystic kidney disease 1 (PKD1) genes. This region can be subdivided into two GC-rich, Alu-rich domains and one GC-rich, Alu-poor domain. The entire region is extremely gene rich, containing 100 confirmed genes and 20 predicted genes. Many of the genes encode widely expressed proteins orchestrating basic cellular processes (e.g. DNA recombination, repair, transcription, RNA processing, signal transduction, intracellular signalling and mRNA translation). Others, such as the α globin genes (HBA1 and HBA2), PDIP and BAIAP3, are specialized tissue-restricted genes. Some of the genes have been previously implicated in the pathophysiology of important human genetic diseases (e.g. asthma, cataracts and the ATR-16 syndrome). Others are known disease genes for α thalassaemia, adult polycystic kidney disease and tuberous sclerosis. There is also linkage evidence for bipolar affective disorder, epilepsy and autism in this region. Sixty-three chromosomal deletions reported here and elsewhere allow us to interpret the results of removing progressively larger numbers of genes from this well defined human telomeric region.


    INTRODUCTION
Although the Human Genome Project is nearing completion, the extent to which the current sequence is accurately assembled and annotated varies considerably from one region to another. In addition to identifying genes, fully annotated sequence will allow us to address global relationships between chromosome structure and function. In particular, we will be able to relate long-range, primary DNA sequence to the key processes of nuclear metabolism including transcription, replication, recombination, repair, methylation, chromatin assembly and nuclear positioning. Extensive preliminary data already suggest that correlations exist between chromosome banding, DNA sequence composition and these processes (1,2).

On a smaller scale, it is known that cis-acting sequences which control expression of specific genes may be located tens or hundreds of kilobases from the gene they regulate. It will therefore be important to establish whether regions of the genome that encode proteins are organized at a level above the unit of the gene, and whether sequence analysis can help identify structurally discrete chromosomal domains that contribute to or reflect function.

The terminal 285 kb of 16p13.3, which includes the α globin genes, has been previously characterized using a variety of functional assays, allowing us to relate primary sequence to known biological function (3). Here, we have sequenced the terminal ~2 Mb of human chromosome 16p, enabling us to fully annotate a contig extending from the telomeric repeats to the previously published tuberous sclerosis disease 2 (TSC2) and polycystic kidney disease 1 (PKD1) genes (4,5). This segment of the Giemsa light band 16p13.3 is GC-rich and Alu-dense, and contains many putative CpG islands and genes. In addition, 63 deletions from this 2 Mb region and their corresponding phenotypes, including the ATR-16 syndrome, are reported here and elsewhere (6–11), allowing us to interpret the effects of deleting progressively larger numbers of genes from this well defined chromosomal region.

Given its very high gene density and proximity to a human telomere, it is not surprising that, in addition to α thalassaemia (12), the ATR-16 syndrome (6), tuberous sclerosis (4) and the adult form of polycystic kidney disease (5), genes underlying several other previously mapped human genetic diseases may also lie in this gene-rich region. These include asthma (13), cataracts with micro-ophthalmia (14), susceptibility to bipolar affective disorder (15), epilepsy (16) and various forms of autism (17–19). This highly annotated sequence extending 2 Mb from the 16p telomere should facilitate rapid identification of disease genes falling in this region. These data therefore provide an ideal opportunity to evaluate the extent to which DNA sequence analysis of the human genome will contribute to our understanding of chromosome structure, function and pathology.


    RESULTS
Construction and sequencing of the 16p13.3 contig
A physical map of overlapping cosmids spanning the terminal 2 Mb of chromosome 16p was constructed (Fig. 1H) using three different restriction enzymes and multiple hybridizations with internal and end-clone fragments (see Materials and Methods). In most cases, several (3–14) cosmids were identified at each screening stage from a chromosome 16-specific cosmid library (20), providing significant depth (average, six cosmids), and hence confidence, in the resulting map. Clones chosen to represent the minimal tiling path were analysed using fluorescence in situ hybridization (FISH) to ensure that they mapped uniquely to the terminal region of 16p. In three instances, the genomic DNA was represented by a single cosmid [c398F6 (AL023882), c313F9/c305F3 (AL031707/AL031706) and c381G6 (AL031598) (Fig. 1H)]. Restriction maps of these regions were later confirmed using sequence data from the clones themselves, but their structure has not yet been confirmed in genomic DNA. Two regions (around co-ordinates 426–485 and 1775–1949 kb) are not represented in the chromosome 16 cosmid library. In these cases, a human PAC library (RPCI) (21) was screened with flanking probes, and recombinants spanning each gap [PAC196A12 (AL049542) and PAC76P10 (AL132867) (Fig. 1H)] were identified. The provenance of these clones was confirmed from sequence data (see further results on PAC76P10 below), restriction mapping and FISH analysis.
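Because restriction maps were checked against the clone sequences themselves, it may help to see how fragment sizes can be predicted from sequence for comparison with sizes measured on gels. The sketch below assumes a complete digest of a linear sequence with the three enzymes named in Materials and Methods; the `digest` helper and the toy sequence are hypothetical illustrations, not the software used in this study.

```python
import re

# Recognition site and top-strand cut offset for the enzymes named in
# Materials and Methods: EcoRI G^AATTC, HindIII A^AGCTT, NotI GC^GGCCGC.
ENZYMES = {
    "EcoRI": ("GAATTC", 1),
    "HindIII": ("AAGCTT", 1),
    "NotI": ("GCGGCCGC", 2),
}


def digest(seq: str, enzyme: str) -> list[int]:
    """Fragment lengths (bp), in order, from a complete digest of a linear sequence."""
    site, offset = ENZYMES[enzyme]
    seq = seq.upper()
    # All three sites are palindromic, so scanning one strand finds every site;
    # the zero-width lookahead also reports overlapping matches.
    cuts = [m.start() + offset for m in re.finditer(f"(?={site})", seq)]
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:]) if end > start]


# Hypothetical 23 bp "clone" with one EcoRI site and one HindIII site.
toy_clone = "AAA" + "GAATTC" + "AAAA" + "AAGCTT" + "AAAA"
for name in ENZYMES:
    print(name, digest(toy_clone, name))
# EcoRI [4, 19]
# HindIII [14, 9]
# NotI [23]
```

For real cosmids the predicted sizes would be compared with the Vistra Green-stained gel bands; the vector sequence would have to be included for the predictions to match.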



Figure 1. Summary of the key features of the 2 Mb region, which can also be accessed with further details at http://www.molbiol.ox.ac.uk/~haem/HMG/16p.html. (A) The bars represent the material deleted from a series of individuals. The smaller black bar denotes the range of the most common α globin deletions (12) and the larger black bar represents an individual with the largest known deletion with no phenotype other than α thalassaemia (11). Blue bars show the extent of 16p deleted material from individuals with ATR-16 who also have additional aneuploidies, and the chromosomal origin of the translocated material is shown at the end of the deletion. The deletion in patient GS, who has an unbalanced translocation involving an acrocentric p arm, is represented in green. Red bars show the extent of deleted material from ATR-16 individuals (BA, TN, BO, IM, LIN) currently presumed to be purely monosomic for this region of the genome. The large yellow bar represents the 16p deleted material from a patient with an unbalanced translocation who has both tuberous sclerosis and polycystic kidney disease, and the small yellow bar represents the extent of interstitial deletions causing tuberous sclerosis and polycystic kidney disease (79). (In most cases, the breakpoint is given as the midpoint of the cosmid in which the FISH signal changes; thus, the actual breakpoint could lie 30 kb in either direction.) (B) Breakpoints (both telomeric and centromeric where relevant) in this region from both α thalassaemia (red lines) and ATR-16 individuals (blue lines). (C) Genes identified throughout the region. The black oval denotes the telomeric repeat (TTAGGG)n region. Green boxes above the line show genes transcribed towards the centromere. Blue boxes below the line show genes transcribed towards the telomere. The red bar indicates the end of our analysis and annotation, but the physical map is contiguous with the PKD1 region, as shown (4,5,80,81).
(D) Putative CpG islands. (E) Frequency of CpG dinucleotides per 200 bp. (F) Percentage GC content over a 200 bp window in grey. The blue line shows the percentage GC content of a moving window of 20 kb, stepping by 200 bp. The green line shows the average for the whole genome, currently estimated to be 42%. (G) Percentage of Alu (in red) and LINE (in blue) repeats as a proportion of total bases across a 1000 bp window, smoothed over 10 kb. (H) Minimal tiling path of clones. The filled red boxes represent the fully sequenced clones. Clones that contain sequencing gaps are shown in blue; the contiguous but unfinished clone is shown in green. (I) Polymorphic markers from the CEPH consortium chromosome 16 linkage map with the sex-averaged, male and female genetic distances between them shown in centiMorgans (47). (J) Scale bar in kilobases. Regions I, II and III are indicated.

 
Sixty-five cosmids and PACs representing the minimal tiling path were sequenced. Of these, 59 are finished with an error rate of <1 in 10 000 bp. As described for other areas of the human genome (22,23), some small segments were consistently difficult to sequence but, in most cases, this could be overcome by applying methods for analysing sequence with a high GC content (see Materials and Methods). Six clones remain unfinished due to difficulties in obtaining sequence.

Clone c357D8 (green box in Fig. 1H) is contiguous, but includes regions of single-stranded sequence and poor quality data, invariably flanked by long tracts of Alu repeats. The missing strands have so far proved impossible to sequence using a variety of technologies.

Five clones (blue boxes in Fig. 1H) each contain a single gap (850, <100, 600 and 1250 bp and ~8 kb) not represented in the M13 shotgun libraries of the individual clones. These gaps are flanked by repeat sequences with tracts of very high GC content and it appears that polymerase consistently stalls at specific sequences. Despite considerable effort, completing these gaps, one of which contains an exon belonging to CACNA1H (gene no. 72), is beyond the scope of this current study, but efforts are continuing to complete these clones to a similar standard. It is interesting that 5 of the 16 ATR-16 centromeric chromosomal breakpoints fall within the 182 kb region spanned by four of these same clones, c344F5, c357D8, c303A1 and c333E1 (Fig. 1H), suggesting that there may be some link between chromosome breakage and segments that are difficult to clone, PCR and sequence. There is also evidence for a high degree of genetic recombination occurring in and around these clones (see Relationship between structure, gene expression and recombination).

Overview of the telomeric region of 16p13.3
The overall structure of this area is consistent with previous observations on GC-rich telomeric regions of the human genome (1,3) but provides further detail and resolution. The average GC content of the entire 1949 kb sequence is 57.5%, higher than the average for the human genome (~42%) (22), and ranges from 47.2 to 65.3% when the sequence is subdivided into 100 kb fragments (Fig. 2A). Inspection of the GC content suggested that this 2 Mb segment may be divided into three: Region I (1–500 kb) with an average GC content of 54.1%, Region II (501–1500 kb) with an average GC content of 60.9% and Region III (1501–1949 kb) with an average GC content of 53.7% (Fig. 2A). Assessing the GC content using a 20 kb moving average plotted at the midpoint showed that the regular ~90 kb wavelength in GC content previously noted in the terminal 285 kb (3) extends throughout Region I (Fig. 1F). Genes that are transcribed towards the centromere appear to lie within the peaks of this wave pattern and those transcribed towards the telomere appear to lie within the troughs. The GC content remains relatively constant across Region II, with a slight dip between co-ordinates 800 and 1000 kb (Fig. 2A); in Region III, the GC content fluctuates again (Fig. 1F).
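As a quick consistency check, the length-weighted mean of the three regional averages should reproduce the overall figure of 57.5%. This sketch assumes only the region lengths implied by the stated co-ordinates (500, 1000 and 449 kb); the function name is illustrative.

```python
# Region boundaries (kb, inclusive) and mean GC (%) as stated in the text.
REGIONS = {
    "I": (1, 500, 54.1),
    "II": (501, 1500, 60.9),
    "III": (1501, 1949, 53.7),
}


def length_weighted_gc(regions: dict[str, tuple[int, int, float]]) -> float:
    """Overall %GC implied by per-region means, weighting each by its length."""
    total_kb = 0
    weighted = 0.0
    for start, end, gc in regions.values():
        length_kb = end - start + 1
        total_kb += length_kb
        weighted += gc * length_kb
    return weighted / total_kb


print(f"{length_weighted_gc(REGIONS):.1f}%")  # 57.5%, matching the whole-sequence value
```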



Figure 2. (A and B) Variation in GC and Alu content. These graphs show (A) the percentage of GC nucleotides and (B) the percentage of Alu repeats in 100 kb segments (non-overlapping). The shaded box divides the three regions, I, II and III. The horizontal lines represent the average for each region.

 
The repetitive elements are summarized in Table 1. As for other GC-rich isochores, the average Alu density is high (19.6%) whereas the density of LINE repeats is low (5.1%). The highest Alu density (29.5%) occurs in Region I (Figs 1G and 2B). In Region II, the Alu density is lower (12.7%) and this segment contains many tandem and simple repeats, a relative increase in the frequency of LINEs and fewer low complexity repeats than adjacent regions. Alu density increases again (24.2%) in Region III.
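Repeat density here is expressed as the percentage of bases in a window that fall within a repeat (Fig. 1G uses 1000 bp windows). A minimal sketch of that calculation, assuming repeat hits are available as half-open (start, end) co-ordinates from a repeat-masking run, merges overlapping hits so no base is counted twice. The hit co-ordinates below are hypothetical.

```python
def percent_covered(intervals, win_start, win_end):
    """Percentage of bases in the half-open window [win_start, win_end) covered
    by at least one half-open repeat interval (start, end). Overlapping
    intervals are merged so that no base is counted twice."""
    clipped = sorted(
        (max(s, win_start), min(e, win_end))
        for s, e in intervals
        if e > win_start and s < win_end
    )
    covered = 0
    run_start = run_end = None
    for s, e in clipped:
        if run_end is None or s > run_end:
            # Close the previous merged run (if any) and start a new one.
            if run_end is not None:
                covered += run_end - run_start
            run_start, run_end = s, e
        else:
            run_end = max(run_end, e)
    if run_end is not None:
        covered += run_end - run_start
    return 100.0 * covered / (win_end - win_start)


# Hypothetical Alu hits (bp) overlapping one 1000 bp window.
alu_hits = [(100, 400), (350, 600), (800, 1100)]
print(percent_covered(alu_hits, 0, 1000))  # 70.0
```

Smoothing over 10 kb, as in Fig. 1G, would then be a moving average of these per-window percentages.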


Table 1. Summary of repetitive elements
 
When the entire masked sequence was compared with itself, a 34 kb region (between co-ordinates 1213 and 1247 kb) was identified that is composed of one direct and two inverted repeats, the largest of which is 10 kb (data not shown). This is the same region that contains the clones which proved most difficult to clone and sequence. Also located within this region are four members of the mast cell tryptase gene family: human transmembrane tryptase, tryptase beta III, tryptase beta I and tryptase beta II (13,24–26). Directly centromeric to these are three more tryptase-like genes that appear to be pseudogenes. Although the tryptase genes beta I–III have previously been localized to 16p13.3 (13), the order and number of tryptase genes presented here (based on genomic sequence data) differ in some respects from the previous report [based on restriction map data (13)].
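The self-comparison distinguishes direct repeats (same orientation) from inverted repeats (reverse-complement orientation). The paper does not name the tool used; the sketch below is a toy, dot-plot-style illustration of the principle using exact k-mer matches, with a hypothetical sequence and helper names. A 2 Mb comparison would use a dedicated alignment program.

```python
from collections import defaultdict

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def repeat_kmer_pairs(seq: str, k: int):
    """Return (direct, inverted) lists of position pairs (i, j) with i < j, where
    the k-mer at j equals the k-mer at i (direct) or its reverse complement (inverted)."""
    seq = seq.upper()
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    direct, inverted = [], []
    for j in range(len(seq) - k + 1):
        kmer = seq[j:j + k]
        direct.extend((i, j) for i in index.get(kmer, []) if i < j)
        inverted.extend((i, j) for i in index.get(revcomp(kmer), []) if i < j)
    return direct, inverted


# Hypothetical sequence: a 5 bp unit, a direct copy, then an inverted copy.
toy = "AACCG" + "TTTT" + "AACCG" + "TTTT" + "CGGTT"
direct, inverted = repeat_kmer_pairs(toy, 5)
print(direct)    # [(0, 9), (1, 10), (2, 11), (3, 12), (4, 13)]
print(inverted)  # [(0, 18), (9, 18)]
```

Runs of consecutive pairs along a diagonal (as in the direct list) correspond to the extended repeats seen in a dot plot.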

Synteny between human and mouse sequences
As previously shown, the telomeric 172 kb segment containing the α globin cluster and five of the genes located telomerically is syntenic to an interstitial fragment of mouse chromosome 11 (27). The remainder of the 2 Mb region, from LUC7L (gene no. 16) up to and including at least the mouse orthologue of the PKD1 gene (which lies beyond the region sequenced here), is syntenic to mouse chromosome 17 (28–30).

Identification of genes and estimates of gene density
The entire sequence was masked for repeats and initially annotated by sequence homology using the BLAST suite of programs (31,32) to search nucleotide [dbEST (33) and EMBL (34)] and protein [SWISS-PROT and TrEMBL (35)] sequence databases. The sequence was also analysed with the exon prediction programs GRAIL1.3, MZEF and XPOUND and the gene prediction programs GENSCAN, FGENES and FGENESH (36–41; V.V. Solovyev, unpublished data, see http://genomic.sanger.ac.uk). All sequences and analyses were processed using an automated system and stored in ACEDB (http://www.acedb.org). After extensive review and editing of these data, we classified (Table 2) and characterized (Table 3) 120 genes in the 1949 kb telomeric region. We found corroborative evidence (spliced ESTs, peptide homology or CpG island) for 105; the remainder were supported by GENSCAN predictions alone.


Table 2. Summary of gene types identified within this region
 

Table 3. Genes identified within the terminal 1949 kb of 16p
 
Of the ab initio gene prediction programs used, GENSCAN was found to be the most accurate at predicting known genes (categories A and B, Table 2). Accuracy increased when genes and exons were predicted by more than one program. In some regions, GENSCAN either over- or under-predicted. For example, around 844–960 kb, GENSCAN predicted six genes which, on further analysis, appear to be a single gene; conversely, at 630–669 kb, seven closely spaced genes (including two in category A) were predicted to be a single transcript. Close inspection of this region revealed at least five clusters of ESTs and five CpG islands. We were able to amplify each putative gene from HeLa mRNA using internal primers but failed to amplify between these genes (data not shown), supporting the interpretation that they are separate genes.
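The observation that accuracy improved when several programs agreed can be illustrated with a simple voting scheme. This is a minimal sketch, assuming each program's output has been reduced to a set of (start, end) exon co-ordinates and that agreement means identical boundaries; the co-ordinates are hypothetical and this is not the review procedure used by the authors.

```python
from collections import Counter


def consensus_exons(predictions: dict[str, set[tuple[int, int]]],
                    min_programs: int = 2) -> list[tuple[int, int]]:
    """Exons (start, end) predicted with identical boundaries by at least
    `min_programs` of the programs in `predictions`."""
    support = Counter()
    for exons in predictions.values():
        support.update(set(exons))  # each program votes once per exon
    return sorted(exon for exon, n in support.items() if n >= min_programs)


# Hypothetical exon calls from three programs over the same stretch of sequence.
calls = {
    "GENSCAN": {(100, 250), (400, 520), (900, 1000)},
    "FGENESH": {(100, 250), (400, 520)},
    "MZEF": {(400, 520), (700, 760)},
}
print(consensus_exons(calls))                  # [(100, 250), (400, 520)]
print(consensus_exons(calls, min_programs=3))  # [(400, 520)]
```

Requiring exact boundary matches is deliberately strict; a looser version might count overlapping exons as agreeing.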

Within this 2 Mb region, gene density is not uniform (Fig. 1C). On average, there is approximately one gene every 16 kb, consistent with previous observations that telomeric Giemsa light bands are gene dense (1,3,42). There seems to be no bias towards small or large genes. The gene with the smallest genomic span is TRG4 (gene no. 36), which has a single exon spanning 75 bp; the largest is C16orf26 (gene no. 62), spanning 116 kb. CACNA1H (gene no. 72) has the greatest number of exons (35), and there are several single-exon genes, of which the largest is IGFALS (gene no. 106). CRAMP1L (gene no. 99) has the largest exon, at 4029 bp, and the smallest exon (9 bp) lies in NUBP2 (gene no. 105).

The association of CpG islands and genes
We have shown previously that, in the terminal 285 kb, most of the genes are associated with CpG islands (3), with the island lying at the 5' end of its associated gene, spanning the promoter. Putative CpG islands were initially identified from the frequency of CpG dinucleotides (15 CpGs in at least two adjacent 200 bp windows), discarding sequences containing GC-rich repetitive DNA. By this criterion, we identified 84 CpG islands within the entire 1949 kb region, equally distributed throughout Regions I–III. In contrast, computational methods such as CPGREPORT [EMBOSS (43)], searching with conventional criteria for CpG islands (CpG observed/expected frequency > 0.6, %GC > 50, over 200 bp), overestimated the number, identifying 234 putative islands.
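The two CpG island criteria contrasted above can be written out directly. In this sketch, "15 CpGs in at least two adjacent 200 bp windows" is read as two adjacent, non-overlapping windows that each contain at least 15 CpGs; that reading, the helper names and the toy sequences are our assumptions, and the input is assumed to have had GC-rich repeats removed beforehand, as the authors did. The observed/expected ratio uses the commonly applied definition CpG × N / (C × G). This is not the code used in the study or in CPGREPORT.

```python
def cpg_stats(window: str) -> tuple[int, float, float]:
    """Return (CpG count, %GC, CpG observed/expected) for one window."""
    w = window.upper()
    n = len(w)
    c, g = w.count("C"), w.count("G")
    cpg = w.count("CG")  # "CG" cannot overlap itself, so str.count is exact
    gc_pct = 100.0 * (c + g) / n if n else 0.0
    # Observed/expected CpG frequency: CpG * N / (C * G)
    obs_exp = cpg * n / (c * g) if c and g else 0.0
    return cpg, gc_pct, obs_exp


def conventional_island(window: str) -> bool:
    """Conventional criteria quoted in the text: >= 200 bp, %GC > 50, obs/exp > 0.6."""
    _, gc_pct, obs_exp = cpg_stats(window)
    return len(window) >= 200 and gc_pct > 50 and obs_exp > 0.6


def dense_cpg_runs(seq: str, window: int = 200, min_cpg: int = 15) -> list[int]:
    """Start co-ordinates of pairs of adjacent, non-overlapping windows that each
    contain >= min_cpg CpGs. CpGs spanning a window boundary are not counted."""
    counts = [cpg_stats(seq[i:i + window])[0]
              for i in range(0, len(seq) - window + 1, window)]
    return [k * window for k in range(len(counts) - 1)
            if counts[k] >= min_cpg and counts[k + 1] >= min_cpg]


print(conventional_island("CG" * 100), conventional_island("AT" * 100))  # True False
print(dense_cpg_runs("CG" * 200))  # [0]
```

The frequency-based rule demands a sustained, high CpG count over at least 400 bp, which helps explain why it flags fewer candidates than the ratio-based conventional criteria.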

The presence of a CpG island is invariably thought to indicate the presence of a nearby gene (44). In the region studied here, 79 genes are associated with CpG islands and 41 are not. This is consistent with previous observations showing that most housekeeping genes and half of all tissue-restricted genes are associated with CpG islands (45); our observation [66% (79/120)] is somewhat higher than the genome average of 56% reported previously (44). Seven putative ‘bi-directional’ CpG islands are each associated with two genes, one in either transcriptional orientation. Five of these appear to contain two CpG peaks very close together, possibly incorporating two separate CpG islands. Three genes [C16orf8 (gene no. 5), TMEM6 (gene no. 23) and SOX8 (gene no. 63)] each have two associated CpG islands.

Five CpG islands are associated with a predicted gene for which there is no other corroborative evidence (EST or protein homology matches, category F in Table 2). For nine CpG islands, we found no convincing evidence for any gene close by. These putative CpG islands may not be biologically active (unmethylated) or may be associated with genes which do not conform to the current prediction criteria or may be expressed in a tissue or developmental stage-specific manner so that they are not represented in any of the current sequence databases. Unless these ‘orphan CpG islands’ mark some other chromosomal element, it seems likely that additional genes near these CpG islands will be identified in the future, in which case the gene density in this region may be somewhat higher than currently estimated.

Relationship between structure, gene expression and recombination
Many of the 79 genes associated with CpG islands are widely expressed, although some (e.g. the α globin genes and PDIP) are expressed in a highly tissue-specific manner. We detected no pattern in the orientation of genes in this region: 57.5% (69/120) of the genes are transcribed towards the centromere and 42.5% (51/120) towards the telomere.
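One way to check that a 69:51 split is compatible with no orientation bias is an exact two-sided binomial test against a 50:50 expectation. The paper does not report a test; this is an illustrative sketch using only the Python standard library, and the helper name is ours.

```python
from math import comb


def binomial_two_sided_p(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial test: sum the probabilities of all outcomes
    that are no more likely than the observed count k."""
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = pmf[k]
    # The small relative tolerance guards against floating-point ties.
    return min(1.0, sum(q for q in pmf if q <= observed * (1 + 1e-9)))


# 69 of 120 genes transcribed towards the centromere.
p_value = binomial_two_sided_p(69, 120)
print(f"two-sided p = {p_value:.3f}")
```

The resulting p-value is well above 0.05, consistent with the statement that there is no orientation bias.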

We have previously shown, using an in situ hybridization assay (46), that the terminal 50 kb of 16p replicates (or separates) later in S phase than the adjacent 250 kb which replicates early in the cell cycle (46). Provisional data suggest that most of the 2 Mb region also replicates early in the cell cycle although there is a remarkable dip in replication, or separation of chromatids, in the central portion of Region II which is currently under investigation (V. Buckle, unpublished data). Although this occurs in a relatively gene-poor region of this contig, at present there appears to be no clear correlation between replication timing and GC content.

Microsatellite markers allow us to relate the physical map to recombination events recorded in the CEPH consortium linkage map of chromosome 16 (Fig. 1I) (47). Data from the CEPH map indicate that this 2 Mb region spans a genetic distance of 12.3 cM in males, 8.0 cM in females and 10.5 cM sex-averaged, corresponding to a recombination rate well above the genome average (1.1 cM/Mb). Recombination events occur most frequently between co-ordinates 900 and 1200 kb (6.1 cM in males and 5.2 cM in females over a region of just 0.3 Mb).
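The genetic distances above translate into recombination rates (cM/Mb) by dividing by physical length. This short sketch uses only the figures quoted in the text; the helper name is illustrative.

```python
REGION_MB = 1.949              # length of the sequenced region
GENOME_AVG_CM_PER_MB = 1.1     # genome average quoted in the text
HOTSPOT_MB = 0.3               # co-ordinates 900-1200 kb

genetic_length_cm = {"male": 12.3, "female": 8.0, "sex-averaged": 10.5}
hotspot_cm = {"male": 6.1, "female": 5.2}


def rate(cm: float, mb: float) -> float:
    """Recombination rate in cM per Mb."""
    return cm / mb


for sex, cm in genetic_length_cm.items():
    r = rate(cm, REGION_MB)
    print(f"{sex}: {r:.1f} cM/Mb ({r / GENOME_AVG_CM_PER_MB:.1f}x genome average)")
for sex, cm in hotspot_cm.items():
    print(f"{sex}, 900-1200 kb: {rate(cm, HOTSPOT_MB):.1f} cM/Mb")
```

This gives roughly 5.4 cM/Mb sex-averaged across the whole region (about five times the genome average) and roughly 17 to 20 cM/Mb in the 900–1200 kb interval.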

Many chromosomal rearrangements have been reported from this segment of chromosome 16 including truncations (6,7,48), interstitial deletions (49) and translocations (10). All known breakpoints, both telomeric and centromeric, are plotted in Figure 1B. While initial inspection suggests that some breakpoints cluster around the α globin complex, this can be explained by the fact that many of these (red bars in Fig. 1B) are highly selected deletions that cause α thalassaemia. Five breakpoints, associated with ATR-16 syndrome, cluster close to the inverted and tandem repeats (at 1121–1304 kb) which encompass the tryptase gene cluster. Further observations are required to establish if this represents a preferred site of chromosomal breakage.

The effects of monosomy for 16p13.3
Deletions that remove the α globin genes (co-ordinates 162–168 kb) give rise to the well-defined haematological phenotype of α thalassaemia. Small deletions (within co-ordinates 129.5–178.2 kb) from the α globin cluster are very common (1–90% carrier frequency) in individuals from tropical and subtropical regions of the world. These deletions are confined to the α globin gene cluster and the surrounding genes remain intact (12; see Conclusions). We recently reported a series of 21 rare, interstitial deletions that remove the α globin genes and a variable number of genes flanking the α globin cluster (11). The largest of these extends for 268 kb and removes 15 functional genes (larger black bar in Fig. 1A). Two heterozygotes for this deletion have α thalassaemia but otherwise appear phenotypically normal (11), demonstrating that, apart from the α globin genes, none of the genes in the terminal 268 kb region of 16p is haploinsufficient.

Here, we have extended this analysis by investigating 16 individuals with still larger deletions (up to 2 Mb) from chromosome 16p (Fig. 1A). All were initially brought to our attention because they have α thalassaemia. Fourteen also have a variety of developmental abnormalities, and all have some degree of learning difficulty; they therefore have the α thalassaemia with mental retardation syndrome [ATR-16, OMIM 141750 (6,50)]. Most of these patients have unbalanced translocations, making it impossible to distinguish phenotypic features due to monosomy for 16p from those due to trisomy for the other unbalanced chromosome (10,51). However, five patients appear to have pure monosomy of 16p based on cytogenetic studies, multiprobe FISH analysis (52) and, in some cases, analysis of the chromosomal breakpoint (Fig. 1A, red bars).

Two such patients (BA: deletion ~757 kb; TN: deletion ~951 kb) have no reported physical abnormalities, but their cognitive abilities fall in the low-average range, in contrast to those of their close relatives. Three patients with substantially larger deletions (BO: deletion ~1900 kb; IM: deletion ~2000 kb; LIN: deletion ~2000 kb) were previously shown to have facial dysmorphism with a variety of physical abnormalities and significant learning difficulties (6,8,9). Patient GS (deletion ~1595 kb), who is also dysmorphic with learning difficulties, has an unbalanced translocation involving satellite DNA from chromosome 21p. Since this additional material may make no contribution to the phenotype, this patient’s phenotype is probably due predominantly to monosomy for 16p. It is already known that removal of the TSC2 gene causes the clearly defined phenotype of tuberous sclerosis (4,53,54), and thus TSC2 at ~2050 kb effectively delimits the ATR-16 phenotype to this terminal 16p region.

The simplest conclusion is that the larger the region of monosomy, the more genes are deleted and the more severe the phenotype. Although it is clear that deletion of some genes may contribute more than others and, in some cases, direct disruption of a gene at a breakpoint may have a bearing on phenotype, there may not be a critical gene that explains all features of ATR-16 syndrome. However, the interpretation of these data is complex and is discussed further below.
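The idea that larger monosomies remove more genes, and that a breakpoint may also interrupt a gene, can be made concrete by comparing a terminal deletion size with gene co-ordinates measured from the telomere. In this sketch the gene spans are hypothetical placeholders except the α globin interval (162–168 kb) given in the text; the deletion sizes are the approximate values reported above, and real co-ordinates would come from Table 3.

```python
def genes_removed(genes, deletion_kb):
    """For a terminal deletion removing co-ordinates 0..deletion_kb (kb from the
    16p telomere), return (genes wholly deleted, genes interrupted by the breakpoint).
    `genes` is a list of (name, start_kb, end_kb) tuples."""
    deleted = [name for name, start, end in genes if end <= deletion_kb]
    interrupted = [name for name, start, end in genes if start < deletion_kb < end]
    return deleted, interrupted


# Hypothetical gene spans; only the alpha globin interval (162-168 kb) is from the text.
genes = [
    ("gene_A", 10, 40),
    ("alpha_globin", 162, 168),
    ("gene_C", 740, 800),
    ("gene_D", 1200, 1260),
]
# Approximate deletion sizes reported for BA, TN, GS and BO.
for patient, size_kb in [("BA", 757), ("TN", 951), ("GS", 1595), ("BO", 1900)]:
    deleted, interrupted = genes_removed(genes, size_kb)
    print(patient, len(deleted), deleted, interrupted)
```

Given the ±30 kb uncertainty in most breakpoint positions noted in the Figure 1 legend, a gene flagged as interrupted would need breakpoint analysis before any conclusion about its disruption.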

Relationship to other known diseases
Previous reports have implicated the terminal region of 16p13.3 in several important human genetic diseases in addition to α thalassaemia, ATR-16 syndrome, tuberous sclerosis and adult polycystic kidney disease.

The pathophysiology of asthma may involve members of the tryptase gene family (55–59). Here, we have shown that four mast cell tryptase genes, and three putative tryptase pseudogenes, lie in a 60 kb region, 1240 kb from the telomere of chromosome 16, as reported previously (13). Although these genes are not exclusively responsible for this polygenic disorder, they appear to play an important role in its pathophysiology (55–59).

One of the markers (D16S521) previously linked to a bipolar affective disorder (15) lies 34 kb from the telomere, although other markers linked to this disease lie at least 2.5 Mb from the telomere (60–62). Autosomal recessive, idiopathic myoclonic epilepsy of infancy has been mapped to a broad region between D16S3024 (at 1594 kb from the telomere) and D16S423 (16). While this includes some of the 2 Mb contig, the highest LOD score (θ = 0) corresponds to D16S3027, located at least 2.5 Mb from the telomere.
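A LOD score compares the likelihood of linkage at recombination fraction θ with that of free recombination (θ = 0.5); a maximum at θ = 0 means no recombinants were seen between disease and marker. The family data and analysis method of the epilepsy study (16) are not given here, so this sketch covers only the textbook case of phase-known, fully informative meioses, with hypothetical counts.

```python
from math import log10


def two_point_lod(theta: float, recombinants: int, meioses: int) -> float:
    """LOD score for `meioses` phase-known, fully informative meioses,
    `recombinants` of which are recombinant, at recombination fraction theta."""
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    if theta == 0.0 and recombinants > 0:
        return float("-inf")  # a single recombinant excludes theta = 0
    non_recombinants = meioses - recombinants
    likelihood = theta**recombinants * (1 - theta) ** non_recombinants
    # Compare with the null hypothesis of free recombination (theta = 0.5).
    return log10(likelihood) - meioses * log10(0.5)


# Hypothetical data: 10 informative meioses, no recombinants.
for theta in (0.0, 0.05, 0.1, 0.2):
    print(theta, round(two_point_lod(theta, 0, 10), 2))
```

With no recombinants the score is highest at θ = 0 (here 10 × log10 2, about 3.01) and falls as θ increases.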

Significant linkage exists between autism and markers in 16p13.3 (18,19) although the peak probability of linkage lies beyond this 2 Mb region. Autism with Tourette’s syndrome has been reported in patients trisomic for 16p13.1-pter (17).

Cataracts with micro-ophthalmia (CATM) maps to 16p13.3 in a single family with a translocation involving chromosomes 2 and 16 (14). Both balanced and unbalanced translocations are associated with CATM indicating that a gene on one of these chromosomes is disrupted by the translocation. The breakpoint in chromosome 16 has been localized to band p13.3 by cytogenetic studies. Although the breakpoint in this family has not yet been refined, SOLH (gene no. 30) is a candidate because of its role in eye formation (63,64).


    CONCLUSIONS
We have completed and fully annotated the sequence of a human telomere extending 2 Mb from the most terminal (TTAGGG)n repeats. This work highlights some deficiencies in the current public databases (such as: http://www.ncbi.nlm.nih.gov/genome/guide/HsChr16.shtml and http://www.ensembl.org) in which some of the released sequence generated here appears to be misassembled and only sparsely annotated.

The entire region is rich in CpG islands and genes, consistent with previous predictions that the greatest density of genes will occur in GC-rich, telomeric regions of the genome. It is interesting that this relatively small segment of the human genome (0.07%) contains 120 confirmed genes, predicted genes and pseudogenes; this is approximately half as many as identified from the whole of chromosome 21 (284 genes in 33.5 Mb, 1.12% of the genome) (23). It is also approximately three times as gene dense as the equivalent subtelomeric region of human chromosome 22q (22). This extreme variation in gene density emphasizes the difficulty in accurately predicting the number of genes in the human genome using isolated segments of the genome.
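The comparison with chromosome 21 can be reproduced from the quoted numbers alone. This sketch derives the genome size implied by the chromosome 21 figures, the share of the genome occupied by the 1949 kb region, and the gene densities; the variable names are illustrative.

```python
# Figures quoted in the text.
REGION_GENES, REGION_MB = 120, 1.949
CHR21_GENES, CHR21_MB, CHR21_GENOME_PCT = 284, 33.5, 1.12

genome_mb = CHR21_MB / (CHR21_GENOME_PCT / 100)   # genome size implied by the chr 21 figures
region_pct = 100 * REGION_MB / genome_mb          # share of the genome in this region
density_16p = REGION_GENES / REGION_MB            # genes per Mb, terminal 16p
density_chr21 = CHR21_GENES / CHR21_MB            # genes per Mb, chromosome 21
spacing_kb = 1000 * REGION_MB / REGION_GENES      # mean distance between genes

print(f"implied genome size  {genome_mb:.0f} Mb")
print(f"share of genome      {region_pct:.3f} %")
print(f"16p13.3 density      {density_16p:.1f} genes/Mb (one per {spacing_kb:.1f} kb)")
print(f"chr 21 density       {density_chr21:.1f} genes/Mb")
print(f"ratio                {density_16p / density_chr21:.1f}x")
```

The region accounts for about 0.065% of the genome (rounded to 0.07% above) and carries roughly 62 genes/Mb against about 8.5 genes/Mb for chromosome 21, around a sevenfold difference; the mean spacing of about 16 kb matches the figure given in Results.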

This telomeric sequence appears to be divisible into three segments (Regions I–III) on the basis of GC content and Alu density consistent with previous observations on isochores and chromosome ‘flavors’ (65,66). Throughout these segments there is marked variation in GC content [particularly in Region I (Figs 1F and 2A)] which was not seen when the same sequence was randomized, suggesting a biological basis for this phenomenon. At present the mechanism underlying this variation cannot be clearly related to transcription, replication or recombination. It remains possible that these variations reflect or contribute to some aspect of the higher order folding or organization of the chromosome.
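The randomization control can be sketched as follows: compute GC content in moving windows (the 20 kb window stepping by 200 bp of Fig. 1F), then repeat the calculation on a shuffled copy of the same sequence, which has the same base composition but no positional structure. The paper does not say which statistic was compared; using the standard deviation of windowed GC is our assumption, and the test sequence is synthetic.

```python
import random
from statistics import pstdev


def sliding_gc(seq: str, window: int = 20_000, step: int = 200) -> list[float]:
    """Percent GC in windows of `window` bp, advancing `step` bp at a time."""
    seq = seq.upper()
    # Prefix sums of G/C counts make each window an O(1) lookup.
    prefix = [0]
    for base in seq:
        prefix.append(prefix[-1] + (base in "GC"))
    return [100.0 * (prefix[i + window] - prefix[i]) / window
            for i in range(0, len(seq) - window + 1, step)]


def gc_variation_vs_shuffled(seq: str, window: int = 20_000, step: int = 200,
                             seed: int = 0) -> tuple[float, float]:
    """Standard deviation of windowed %GC for the sequence and for a shuffled copy
    with identical base composition."""
    observed = pstdev(sliding_gc(seq, window, step))
    bases = list(seq)
    random.Random(seed).shuffle(bases)
    shuffled = pstdev(sliding_gc("".join(bases), window, step))
    return observed, shuffled


# Synthetic example: alternating 10 kb GC-rich and AT-rich blocks, 2 kb windows.
toy = ("G" * 10_000 + "A" * 10_000) * 4
observed_sd, shuffled_sd = gc_variation_vs_shuffled(toy, window=2_000, step=200)
print(f"observed SD {observed_sd:.1f}, shuffled SD {shuffled_sd:.1f}")
```

Real sequence would show a much smaller contrast than this synthetic case, but the principle is the same: variation that disappears on shuffling reflects the order of bases, not merely their overall composition.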

At a higher level of resolution, we examined the distribution of genes along the chromosome. Again, no clear patterns emerge with respect to size, type or orientation but it is clear that tissue-restricted genes are intermingled with widely expressed genes. The question arises of how such genes are independently regulated and it has been frequently proposed that each gene may be sequestered in an independent structural and functional domain. Despite the popularity of such models (67,68), to date, no DNA sequence basis for subdivisions of the chromosome has emerged. Given the considerable overlap between genes and regulatory elements in the well-characterized terminal 285 kb region of 16p it seems unlikely that this region could be simply subdivided into independently-regulated chromosomal domains as described by Prioleau et al. (69). The identification and characterization of putative chromatin ‘boundary elements’ in other segments of the genome (70,71) suggest that if such chromosomal subdivisions exist in this telomeric region of the chromosome they may be difficult to predict from primary sequence.

The distribution of all known chromosomal breakpoints in this area is quite uneven and presumably reflects complex interactions between ascertainment bias, natural selection and the locations of preferred sites of recombination. One group (red bars in Fig. 1B), selected because they cause α thalassaemia, cluster around the α globin genes. It is interesting that none of the common, highly selected forms of α thalassaemia (12) removes the flanking genes even though rare individuals with larger deletions appear phenotypically normal. Presumably, although deletions extending into these highly conserved, widely expressed genes can be tolerated in rare heterozygotes (11), as a group, such individuals may be at some selective disadvantage. These deletions would almost certainly be lethal in homozygotes. The second group (blue bars in Fig. 1B) was identified because these individuals have α thalassaemia with learning difficulties and, in most cases, additional developmental abnormalities. None of these breakpoints falls in the region 268–757 kb. Presumably such telomeric deletions, although they would cause α thalassaemia, do not produce any easily discernible additional phenotype and therefore do not commonly come to medical attention. The breakpoints clustered ~1200 kb from the 16p telomere occur near a repetitive region that contains a block of tryptase genes and pseudogenes, has a high rate of recombination and contains cosmids that have proven difficult to clone and sequence. It remains to be determined whether this is a preferred site of chromosome breakage.

The acquisition of fully annotated sequence has enabled us to begin relating long-range DNA sequence to chromosome structure, function and pathology. This sequence resource will now allow us to extend these studies, for example by constructing microarrays to analyse this region specifically, systematically and without bias.


    MATERIALS AND METHODS
Physical mapping
Clones to complete the physical map were obtained either from the Los Alamos chromosome 16-specific cosmid library (20) or from the HGMP PAC library RPCI1 (21) by direct radioactive hybridization of filters with precisely mapped markers or with probes derived from the clones themselves. Probes were labelled using the MegaPrime DNA Labelling Kit and [α-32P]dCTP (Amersham Pharmacia Biotech). Clones were grown under standard conditions [using ampicillin for SuperCos1-based cosmids (Stratagene) or kanamycin for the PACs], and DNA was prepared using standard techniques (Hybaid and Qiagen). Clones were digested separately with EcoRI, NotI and HindIII and electrophoresed on 0.8% agarose gels. The gels were stained with Vistra Green (Amersham Pharmacia Biotech) and scanned on a Storm PhosphorImager (Molecular Dynamics, Amersham Pharmacia Biotech) to determine fragment sizes precisely. Southern blots of the gels were hybridized with end fragments generated from the clones themselves, labelled using the Gene Images DNA Labelling kit (Amersham Pharmacia Biotech), to identify positive clones; these results were combined with the restriction enzyme data to establish an accurate map of the growing contig. Based on these data, a minimum tiling path of clones was chosen to represent the physical map (Fig. 1H). All of these clones were analysed by standard FISH techniques to confirm their location on human chromosome 16p13.3 (72).
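A minimum tiling path is, in abstract terms, a smallest set of overlapping clones that still spans the whole contig. Assuming each clone has already been reduced to start and end coordinates (the authors derived overlaps from restriction fingerprints and end-fragment hybridization, which this sketch does not model), the classic greedy interval-cover algorithm below picks such a set. The function name and the clone names and kb positions in the usage line are hypothetical.

```python
from typing import List, Optional, Tuple

# A clone is modelled as (name, start_kb, end_kb) along the contig.
Clone = Tuple[str, float, float]


def minimum_tiling_path(clones: List[Clone], start: float, end: float) -> List[Clone]:
    """Greedy minimum interval cover of [start, end].

    At each step, among the clones that begin at or before the current
    covered position, take the one that reaches furthest. Raises
    ValueError if the clones leave part of [start, end] uncovered.
    """
    ordered = sorted(clones, key=lambda c: c[1])
    path: List[Clone] = []
    covered = start
    i = 0
    while covered < end:
        best: Optional[Clone] = None
        # Consider every clone that starts within the region covered so far.
        while i < len(ordered) and ordered[i][1] <= covered:
            if best is None or ordered[i][2] > best[2]:
                best = ordered[i]
            i += 1
        if best is None or best[2] <= covered:
            raise ValueError(f"gap in contig at {covered} kb")
        path.append(best)
        covered = best[2]
    return path


# Hypothetical clones and coordinates, for illustration only; cC is redundant.
clones = [("cA", 0, 40), ("cB", 30, 70), ("cC", 35, 60), ("cD", 65, 110), ("cE", 100, 140)]
print([name for name, _, _ in minimum_tiling_path(clones, 0, 140)])  # ['cA', 'cB', 'cD', 'cE']
```

Always taking the clone that reaches furthest beyond the covered region is optimal for interval cover; here cC is dropped because cA and cB already span it.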
EMBL IDs (with accession numbers in parentheses) for the minimum tiling path are as follows, in order from telomere to centromere: HSPTEL (Z84812), HSLAW2 (Z84723), HSNFG9 (Z69719), HSRA36 (Z69720), HSGG4 (Z84722), HSX94 (Z84813), HS24F8 (Z69666), HSGG1 (Z84721), HScos12 (Z69706), HSRJ14 (Z69890), HS310H5 (Z69705), HS314G4 (Z69667), HS419C1 (Z99754), HS333B10 (Z81450), HS415C1 (Z98272), HS367G8 (Z97634), HS359F1 (AL023881), HSC196A12 (AL049542), HS356B8 (Z98882), HS366D1 (Z97986), HS407A10 (Z98883), HS338H10 (Z98881), HS398G5 (Z84479), HS349E10 (AL022341), HS313D11 (Z92544), HS380A1 (Z97653), HS444G9 (Z98258), HS335H7 (AL031258), HS321D2 (AL031033), HS398F6 (AL023882), HS360A4 (AL031008), HS360B4 (AL031716), HS306A4 (AL008727), HS366D3 (Z93041), HS443D9 (Z92845), HS394H11 (Z99757), HS422E10 (AL024496), HS313F9 (AL031707), HS305F3 (AL031706), HS349E11 (AL031713), HS381G6 (AL031598), HS344F5 (AL031712), HS302G6 (AL031703), HS357D8 (AL031715), HS303A1 (AL031704), HS333E1 (AL031711), HS358B7 (AL031714), HS316G12 (AL031709), HS399E4 (AL031721), HS312E8 (AL032819), LA16-438F12 (AL137252), HS390E6 (AL031600), HS305C8 (AL031705), HS385E7 (AL031720), HS380F5 (AL031719), HS313F4 (Z97633), HS425C2 (AL133297), HS395F10 (Z97652), HS315G5 (AL031708), HS431H6 (AL031009), HS329F2 (AL031710), HS361A3 (AL031717), HS371H6 (AL031718) and HSAC76P10 (AL132867). The GenBank accession number for the complete 1949 kb sequence is AE005175.

End-clone production
Terminal fragments of selected clones were obtained as follows. Clone DNA was cleaved with SacI (or XhoI or ApaI) for SuperCos1 cosmids, or with XhoI (or ApaI) for the CyPAC2n clones; these enzymes were chosen because they did not cut within the vector and could be heat-inactivated. The digests were heat-inactivated, and ligase and ligase buffer (Promega) were added according to the manufacturer's instructions. Ligations and transformations were performed using standard protocols. DNA was extracted using Hybaid's Miniprep kit, and a test aliquot was digested with the original enzyme to confirm that each new clone produced a single linear fragment. The correct subclones were then digested with the original enzyme and NotI to release the vector from the two terminal fragments. This digest was electrophoresed on a 0.8% low-melting-point agarose preparative gel and the terminal fragments were excised. No further purification was required before labelling, either radioactively or non-radioactively as described above, apart from incubation at 65°C for 5 min to melt the agarose slice.
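The enzyme choice described above depends on a simple sequence check: the enzyme must have no recognition site in the vector. The sketch below scans a sequence for the recognition sites of the enzymes named in this section (SacI, XhoI, ApaI and NotI, plus EcoRI and HindIII from the mapping digests). It assumes the vector sequence is available as a plain string; the 25 bp test sequence is a made-up toy, not SuperCos1 or CyPAC2n, the second criterion (heat inactivation) is not modelled, and the function names are hypothetical.

```python
import re
from typing import Dict, List

# Recognition sequences (5'->3') of the enzymes named in this section.
# All are palindromic, so scanning one strand finds every site.
ENZYMES: Dict[str, str] = {
    "SacI": "GAGCTC",
    "XhoI": "CTCGAG",
    "ApaI": "GGGCCC",
    "NotI": "GCGGCCGC",
    "EcoRI": "GAATTC",
    "HindIII": "AAGCTT",
}


def site_positions(seq: str, site: str, circular: bool = True) -> List[int]:
    """0-based start positions of every occurrence of `site` in `seq`.

    For a circular molecule the first len(site) - 1 bases are appended,
    so that a site spanning the origin is also found.
    """
    seq = seq.upper()
    scan = seq + seq[: len(site) - 1] if circular else seq
    # The zero-width lookahead also reports overlapping occurrences.
    return [m.start() for m in re.finditer(f"(?={site})", scan)]


def non_cutters(vector_seq: str, enzymes: Dict[str, str] = ENZYMES) -> List[str]:
    """Names of enzymes with no recognition site anywhere in the vector."""
    return [name for name, site in enzymes.items() if not site_positions(vector_seq, site)]


# Toy 25 bp "vector" (NOT a real vector sequence) with EcoRI, HindIII and XhoI sites.
toy = "ATGAATTCGGAAGCTTCCTCGAGAT"
print(non_cutters(toy))  # ['SacI', 'ApaI', 'NotI']
```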

Sequencing
The cosmids and PACs were sequenced using a standard shotgun approach (73). In brief, clone DNA was sonicated, and fragments of 1.4–2.0 kb were ligated into M13 or pUC vectors and transformed. Restriction digest data were used to estimate the size of each clone, and approximately 200 sequence reads were generated per 10 kb using fluorescent dye-labelled terminators and primers on ABI 373A and ABI 377 sequencing machines (PE Applied Biosystems). M13 subclones were sequenced with forward primers only, whereas pUC subclones were sequenced with both forward and reverse primers.
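The read density quoted above can be translated into an expected coverage using the Lander-Waterman model of random shotgun sequencing. The text gives ~200 reads per 10 kb but not the usable read length; the sketch below assumes 500 bp purely for illustration, which gives c = N·L/G = 200 × 500 / 10 000 = 10-fold coverage. Under the model's idealized assumptions (uniformly random read starts, any overlap detectable), the fraction of bases left uncovered is e^-c and the expected number of gaps is N·e^-c. The function name is hypothetical.

```python
import math
from typing import Dict


def shotgun_stats(n_reads: int, read_len_bp: float, target_bp: float) -> Dict[str, float]:
    """Idealized Lander-Waterman estimates for random shotgun sequencing.

    Assumes read start points are uniformly random and that any overlap,
    however short, can be detected (minimum-overlap term taken as zero).
    """
    c = n_reads * read_len_bp / target_bp  # mean fold coverage
    return {
        "coverage": c,
        "fraction_uncovered": math.exp(-c),       # P(a base is hit by no read)
        "expected_gaps": n_reads * math.exp(-c),  # island ends, i.e. expected gaps
    }


# ~200 reads per 10 kb (from the text); 500 bp usable read length is an ASSUMPTION.
stats = shotgun_stats(200, 500, 10_000)
print(stats["coverage"])  # 10.0
```

On these idealized numbers fewer than 0.01 gaps would be expected per 10 kb, so the gaps actually encountered largely reflect sequence that clones or sequences poorly rather than random under-sampling, which is why targeted gap-closure methods were needed.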

Sequence reads were base-called using phred (74) and assembled using phrap (http://www.phrap.org) into a GAP database (75) for editing. Standard finishing methods were used to close gaps and resolve sequence ambiguities. Various software tools were used to check sequence quality, and restriction digests were used to confirm the assembly of each clone.
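phred reports each base call with a quality value Q defined from the estimated error probability P as Q = -10 log10 P (74), so Q20 corresponds to a 1 in 100 chance that the call is wrong and Q30 to 1 in 1000. The small helpers below, whose names are hypothetical, convert between the two scales and sum per-base error probabilities into an expected error count for a read; they illustrate the definition only and are not part of phred or phrap.

```python
import math
from typing import Sequence


def phred_to_error(q: float) -> float:
    """Base-call error probability for phred quality q: P = 10^(-q/10)."""
    return 10 ** (-q / 10)


def error_to_phred(p: float) -> float:
    """Phred quality for error probability p: Q = -10 * log10(p)."""
    return -10 * math.log10(p)


def expected_errors(quals: Sequence[float]) -> float:
    """Expected number of miscalled bases in a read (sum of per-base P)."""
    return sum(phred_to_error(q) for q in quals)


print(round(phred_to_error(20), 6))                         # 0.01
print(round(error_to_phred(0.001), 6))                      # 30.0
print(round(expected_errors([20] * 100 + [30] * 100), 6))   # 1.1
```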

Sequencing gaps that could not be resolved by standard shotgun and finishing approaches were tackled by several techniques: (i) an oligo-screening strategy to identify further M13 clones that might extend into the gap or close it altogether (76); (ii) sequencing of subclones from a short-insert library generated either from a pUC subclone spanning the gap or from a subcloned restriction fragment (77); (iii) PCR across the gap and direct sequencing of the PCR product using the original and internal primers; (iv) direct sequencing of the cosmid DNA with primers flanking the gap [using 3 µg template DNA, 16 µl of standard ABI BigDye (PE Applied Biosystems) sequencing mix and 45 cycles]; (v) methods (iii) and (iv) repeated with ABI BigDye dGTP substituted, the PCR and sequencing denaturing temperature raised to 98°C and/or 1 M betaine added to the PCR and sequencing reactions; and (vi) standard manual sequencing techniques (Amersham).

Phenotypes of patients with 16p monosomy
The clinical features of these patients are summarized briefly here; they have been, or will be, presented in detail elsewhere.

Patient BA.
A preliminary report of this patient (78) described her as a phenotypically normal 14-year-old girl with a marked discrepancy between verbal and performance IQs (89 and 75, respectively). The chromosomal breakpoint in this patient lies in c335H7, ~757 kb from the 16p telomere.

Patients TN.
Two brothers have delayed speech and walking; one also has a left iris coloboma. Their mother also carries this deletion and is clearly intellectually different from her siblings (unpublished data). The TN breakpoint lies in c443D9, ~951 kb from the 16p telomere.

Patient GS.
A 3-year-old boy with moderate delay in receptive language and severe delay in expressive language (unpublished data). The breakpoint lies between c313F4 and c395F10, ~1595 kb from the 16p telomere.

Patient BO.
This patient was described in detail by Wilkie et al. (6) and in the references therein. At 15 years of age, he was moderately to severely retarded (IQ 53), with mild facial dysmorphism and minor congenital abnormalities. The breakpoint lies in PAC76P10, ~1900 kb from the 16p telomere.

Patient IM.
This patient was described as having developmental delay and at 8 years of age had the mental ability of a 5-year-old (8). The breakpoint lies telomeric to the PKD1 region, ~2000 kb from the 16p telomere.

Patient LIN.
This patient was described as having developmental delay, with sign language developing at 2 years of age and walking by 2 years (9). She also has a variety of mild dysmorphic features. The breakpoint lies telomeric to the PKD1 region, ~2000 kb from the 16p telomere.

Breakpoint analysis
Patients with 16p deletions were analysed by standard FISH techniques as described previously (72), using selected clones from the minimum tiling path to localize each chromosome 16 breakpoint to one or two clones. Multiprobe FISH analyses were performed as described previously (52).
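For a terminal deletion, probes telomeric to the breakpoint give no signal on the deleted homologue, probes centromeric to it give a normal signal, and a probe spanning the breakpoint typically gives a reduced signal. The sketch below turns an ordered series of such calls into the one- or two-clone localization described above. The signal labels, function name and clone names are hypothetical, and the code assumes a simple terminal deletion (other patterns are rejected rather than interpreted).

```python
from typing import List, Tuple

# Hypothetical signal labels for one FISH probe on the deleted homologue.
ABSENT, REDUCED, PRESENT = "absent", "reduced", "present"


def localize_breakpoint(results: List[Tuple[str, str]]) -> List[str]:
    """Localize a terminal-deletion breakpoint from ordered FISH calls.

    `results` lists (clone, signal) pairs in telomere-to-centromere order.
    Returns [clone] if one probe spans the breakpoint (reduced signal),
    otherwise [last deleted clone, first retained clone].
    """
    signals = [signal for _, signal in results]
    if REDUCED in signals:
        return [results[signals.index(REDUCED)][0]]
    first_present = signals.index(PRESENT)  # ValueError if no probe is retained
    # A single terminal deletion means: absent ... absent, then present ... present.
    if first_present == 0 or ABSENT in signals[first_present:]:
        raise ValueError("calls do not fit a single terminal deletion")
    return [results[first_present - 1][0], results[first_present][0]]


# Hypothetical probe calls, for illustration only.
print(localize_breakpoint([("cA", ABSENT), ("cB", REDUCED), ("cC", PRESENT)]))  # ['cB']
print(localize_breakpoint([("cA", ABSENT), ("cB", ABSENT), ("cC", PRESENT)]))   # ['cB', 'cC']
```

The GS result, with the breakpoint placed between c313F4 and c395F10, corresponds to the two-clone outcome.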


    ACKNOWLEDGEMENTS
 
Members of the Sanger Centre team are: Rachael Ainscough, Claire Bagguley, Karen Barlow, Caroline Baynes, Lisa Beard, Victoria Cobley, Gerard Coville, Sancha Donnelly, Andrew Ellington, Kerry Fleming, Debbie Frame, John Frankland, Audrey Fraser, Lisa Gilby, Rebekah Hall, Gretta Hall-Tamlyn, Sarah Holmes, Bijay Jassal, Matthew Jones, Jo Kershaw, Andrew Kimberley, Andrew King, Julia Lightning, Madeleine Moore, Chantal Percy, Adelaide Pettett, Ratna Shownkeen, Matthew Sims, Charlie Steward, Daniel Thomas, Karen Thomas, Justine Wallis, David Willey, Laurens Wilming and John Woodward. The authors would also like to thank: Richard Gibbons, Jane Rogers (Sanger Centre), M. Gardner, M. Descartes, Helen Brown, Ahmed Daghir, Hadley Wood, Christopher Ward, Peter Harris, the HUGO Nomenclature Committee (H. Wain, M. Lush, M. Wright, R. Lovering, E. Bruford and S. Povey), the Medical Research Council, the Wellcome Trust (J.F.) and the UK HGMP Resource Centre.


    FOOTNOTES
 
+ Christine Lloyd headed the production of the sequence at the Sanger Centre. A full list of past and present members of staff who contributed to generating this sequence is given in the Acknowledgements.

§ To whom correspondence should be addressed. Tel: +44 1865 222393; Fax: +44 1865 222500; Email: drhiggs@molbiol.ox.ac.uk


    REFERENCES
1 Craig, J.M. and Bickmore, W.A. (1993) Chromosome bands—flavours to savour. Bioessays, 15, 349–354.

2 Bernardi, G. (2000) Isochores and the evolutionary genomics of vertebrates. Gene, 241, 3–17.

3 Flint, J., Thomas, K., Micklem, G., Raynham, H., Clark, K., Doggett, N.A., King, A. and Higgs, D.R. (1997) The relationship between chromosome structure and function at a human telomeric region. Nature Genet., 15, 252–257.

4 European Chromosome 16 Tuberous Sclerosis Consortium (1993) Identification and characterization of the tuberous sclerosis gene on chromosome 16. Cell, 75, 1305–1315.

5 European Polycystic Kidney Disease Consortium (1994) The polycystic kidney disease 1 gene encodes a 14 kb transcript and lies within a duplicated region on chromosome 16. Cell, 77, 881–894.

6 Wilkie, A.O.M., Buckle, V.J., Harris, P.C., Lamb, J., Barton, N.J., Reeders, S.T., Lindenbaum, R.H., Nicholls, R.D., Barrow, M., Bethlenfalvay, N.C. et al. (1990) Clinical features and molecular analysis of the α thalassaemia/mental retardation syndromes. I. Cases due to deletions involving chromosome band 16p13.3. Am. J. Hum. Genet., 46, 1112–1126.

7 Lamb, J., Harris, P.C., Wilkie, A.O.M., Wood, W.G., Dauwerse, J.G. and Higgs, D.R. (1993) De novo truncation of chromosome 16p and healing with (TTAGGG)n in the α-thalassemia/mental retardation syndrome (ATR-16). Am. J. Hum. Genet., 52, 668–676.

8 Fei, Y.J., Liu, J.C., McKie, V.C. and Huisman, T.H. (1992) Hb H disease and mild mental retardation in a black girl with a Hb S heterozygosity. Hemoglobin, 16, 431–434.

9 Lindor, N.M., Valdes, M.G., Wick, M., Thibodeau, S.N. and Jalal, S. (1997) De novo 16p deletion: ATR-16 syndrome. Am. J. Med. Genet., 72, 451–454.

10 Lamb, J., Wilkie, A.O.M., Harris, P.C., Buckle, V.J., Lindenbaum, R.H., Barton, N.J., Reeders, S.T., Weatherall, D.J. and Higgs, D.R. (1989) Detection of breakpoints in submicroscopic chromosomal translocation, illustrating an important mechanism for genetic disease. Lancet, 2, 819–824.

11 Horsley, S.W., Daniels, R.J., Anguita, E., Raynham, H.A., Peden, J.F., Villegas, A., Vickers, M.A., Green, S., Chui, D.H.K., Ayyub, H., et al. (2001) Monosomy for the most telomeric, gene-rich region of human chromosome 16p causes minimal phenotypic effects. Eur. J. Hum. Genet., in press.

12 Higgs, D.R., Vickers, M.A., Wilkie, A.O.M., Pretorius, I.-M., Jarman, A.P. and Weatherall, D.J. (1989) A review of the molecular genetics of the human α-globin gene cluster. Blood, 73, 1081–1104.

13 Pallaoro, M., Fejzo, M.S., Shayesteh, L., Blount, J.L. and Caughey, G.H. (1999) Characterization of genes encoding known and novel human mast cell tryptases on chromosome 16p13.3. J. Biol. Chem., 274, 3355–3362.

14 Yokoyama, Y., Narahara, K., Tsuji, K., Ninomiya, S. and Seino, Y. (1992) Autosomal dominant congenital cataract and microphthalmia associated with a familial t(2;16) translocation. Hum. Genet., 90, 177–178.

15 Detera-Wadleigh, S.D., Barden, N., Craddock, N., Ewald, H., Foroud, T., Kelsoe, J. and McQuillin, A. (1999) Chromosomes 12 and 16 Workshop. Am. J. Med. Genet., 88, 255–259.

16 Zara, F., Gennaro, E., Stabile, M., Carbone, I., Malacarne, M., Majello, L., Santangelo, R., Antonio de Falco, F. and Bricarelli, F.D. (2000) Mapping of a locus for a familial autosomal recessive idiopathic myoclonic epilepsy of infancy to chromosome 16p13. Am. J. Hum. Genet., 66, 1552–1557.

17 Hebebrand, J., Martin, M., Körner, J., Roitzheim, B., de Braganca, K., Werner, W. and Remschmidt, H. (1994) Partial trisomy 16p in an adolescent with autistic disorder and Tourette’s syndrome. Am. J. Med. Genet., 54, 268–270.

18 International Molecular Genetic Study of Autism Consortium (1998) A full genome screen for autism with evidence for linkage to a region on chromosome 7q. Hum. Mol. Genet., 7, 571–578.

19 Philippe, A., Martinez, M., Guilloud-Bataille, M., Gillberg, C., Råstam, M., Sponheim, E., Coleman, M., Zappella, M., Aschauer, H., van Malldergerme, L. et al. (1999) Genome-wide scan for autism susceptibility genes. Hum. Mol. Genet., 8, 805–812.

20 Stallings, R.L., Torney, D.C., Hildebrand, C.E., Longmire, J.L., Deaven, L.L., Jett, J.H., Doggett, N.A. and Moyzis, R.K. (1990) Physical mapping of human chromosomes by repetitive sequence fingerprinting. Proc. Natl Acad. Sci. USA, 87, 6218–6222.

21 Ioannou, P.A., Amemiya, C.T., Garnes, J., Kroisel, P.M., Shizuya, H., Chen, C., Batzer, M.A. and de Jong, P.J. (1994) A new bacteriophage P1-derived vector for the propagation of large human DNA fragments. Nature Genet., 6, 84–89.

22 Dunham, I., Shimizu, N., Roe, B.A., Chissoe, S., Hunt, A.R., Collins, J.E., Bruskiewich, R., Beare, D.M., Clamp, M., Smink, L.J. et al. (1999) The DNA sequence of human chromosome 22. Nature, 402, 489–495.

23 The Chromosome 21 Mapping and Sequencing Consortium (2000) The DNA sequence of human chromosome 21. Nature, 405, 311–319.

24 Wong, G.W., Tang, Y., Feyfant, E., Sali, A., Li, L., Li, Y., Huang, C., Friend, D.S., Krilis, S.A. and Stevens, R.L. (1999) Identification of a new member of the tryptase family of mouse and human mast cell proteases which possesses a novel COOH-terminal hydrophobic extension. J. Biol. Chem., 274, 30784–30793.

25 Miller, J.S., Moxley, G. and Schwartz, L.B. (1990) Cloning and characterization of a second complementary DNA for human tryptase. J. Clin. Invest., 86, 864–870.

26 Vanderslice, P., Ballinger, S.M., Tam, E.K., Goldstein, S.M., Craik, C.S. and Caughey, G.H. (1990) Human mast cell tryptase: multiple cDNAs and genes reveal a multigene serine protease family. Proc. Natl Acad. Sci. USA, 87, 3811–3815.

27 Flint, J., Tufarelli, C., Peden, J., Clark, K., Daniels, R.J., Hardison, R., Miller, W., Philipsen, S., Tan-Un, K.C., McMorrow, T. et al. (2001) Comparative genome analysis delimits a chromosomal domain and identifies key regulatory elements in the α globin cluster. Hum. Mol. Genet., 10, 371–382.

28 Tufarelli, C., Frischauf, A.-M., Hardison, R., Flint, J. and Higgs, D.R. (2001) Characterisation of a widely expressed gene (LUC7-LIKE) defining the centromeric boundary of the human α globin domain. Genomics, 71, in press.

29 Olsson, P.G., Sutherland, H.F., Nowicka, U., Korn, B., Poustka, A. and Frischauf, A.M. (1995) The mouse homologue of the tuberin gene (TSC2) maps to a conserved synteny group between mouse chromosome 17 and human 16p13.3. Genomics, 25, 339–340.

30 Olsson, P.G., Lohning, C., Horsley, S., Kearney, L., Harris, P.C. and Frischauf, A. (1996) The mouse homologue of the polycystic kidney disease gene (Pkd1) is a single-copy gene. Genomics, 34, 233–235.

31 Altschul, S.F., Gish, W., Miller, W., Myers, E.W. and Lipman, D.J. (1990) Basic local alignment search tool. J. Mol. Biol., 215, 403–410.

32 Altschul, S.F., Madden, T.L., Schäffer, A.A., Zhang, J., Zhang, Z., Miller, W. and Lipman, D.J. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res., 25, 3389–3402.

33 Boguski, M.S., Lowe, T.M. and Tolstoshev, C.M. (1993) dbEST—database for ‘expressed sequence tags’. Nature Genet., 4, 332–333.

34 Baker, W., van den Broek, A., Camon, E., Hingamp, P., Sterk, P., Stoesser, G. and Tuli, M.A. (2000) The EMBL nucleotide sequence database. Nucleic Acids Res., 28, 19–23.

35 Bairoch, A. and Apweiler, R. (2000) The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000. Nucleic Acids Res., 28, 45–48.

36 Xu, Y., Mural, R., Shah, M. and Uberbacher, E. (1994) Recognizing exons in genomic sequence using GRAIL II. Genet. Eng., 16, 241–253.

37 Solovyev, V.V., Salamov, A.A. and Lawrence, C.B. (1995) Identification of human gene structure using linear discriminant functions and dynamic programming. Ismb, 3, 367–375.

38 Zhang, M.Q. (1997) Identification of protein coding regions in the human genome by quadratic discriminant analysis. Proc. Natl Acad. Sci. USA, 94, 565–568.

39 Thomas, A. and Skolnick, M.H. (1994) A probabilistic model for detecting coding regions in DNA sequences. IMA J. Math. Appl. Med. Biol., 11, 149–160.

40 Uberbacher, E.C. and Mural, R.J. (1991) Locating protein coding regions in human DNA sequences using a multiple sensor-neural network approach. Proc. Natl Acad. Sci. USA, 88, 11261–11265.

41 Burge, C. and Karlin, S. (1997) Prediction of complete gene structures in human genomic DNA. J. Mol. Biol., 268, 78–94.

42 Saccone, S., De Sario, A., Della Valle, G. and Bernardi, G. (1992) The highest gene concentrations in the human genome are in telomeric bands of metaphase chromosomes. Proc. Natl Acad. Sci. USA, 89, 4913–4917.

43 Rice, P., Longden, I. and Bleasby, A. (2000) EMBOSS: The European Molecular Biology Open Software Suite. Trends Genet., 16, 276–277.

44 Cross, S.H. and Bird, A.P. (1995) CpG islands and genes. Curr. Opin. Genet. Dev., 5, 309–314.

45 Larsen, F., Gundersen, G., Lopez, R. and Prydz, H. (1992) CpG islands as gene markers in the human genome. Genomics, 13, 1095–1107.

46 Smith, Z.E. and Higgs, D.R. (1999) The pattern of replication at a human telomeric region (16p13.3): its relationship to chromosome structure and gene expression. Hum. Mol. Genet., 8, 1373–1386.

47 Kozman, H.M., Keith, T.P., Donis-Keller, H., White, R.L., Weissenbach, J., Dean, M., Vergnaud, G., Kidd, K., Gusella, J., Royle, N.J. et al. (1995) The CEPH Consortium linkage map of human chromosome 16. Genomics, 25, 44–58.

48 Flint, J., Craddock, C.F., Villegas, A., Bentley, D.P., Williams, H.J., Galanello, R., Cao, A., Wood, W.G., Ayyub, H. and Higgs, D.R. (1994) Healing of broken human chromosomes by the addition of telomeric repeats. Am. J. Hum. Genet., 55, 505–512.

49 Hatton, C., Wilkie, A.O.M., Drysdale, H.C., Wood, W.G., Vickers, M.A., Sharpe, J., Ayyub, H., Pretorius, I.-M., Buckle, V.J. and Higgs, D.R. (1990) Alpha thalassemia caused by a large (62 kb) deletion upstream of the human α globin gene cluster. Blood, 76, 221–227.

50 Weatherall, D.J., Higgs, D.R., Bunch, C., Old, J.M., Hunt, D.M., Pressley, L., Clegg, J.B., Bethlenfalvay, N.C., Sjolin, S., Koler, R.D. et al. (1981) Hemoglobin H disease and mental retardation. A new syndrome or a remarkable coincidence? N. Engl. J. Med., 305, 607–612.

51 Gibbons, R.J. and Higgs, D.R. (2001) The alpha thalassemia/mental retardation syndromes. In Steinberg, M.H., Forget, B.G., Higgs, D.R. and Nagel, R.L. (eds), Disorders of Hemoglobin. Cambridge University Press, Cambridge, UK.

52 Knight, S.J., Horsley, S.W., Regan, R., Lawrie, N.M., Maher, E.J., Cardy, D.L., Flint, J. and Kearney, L. (1997) Development and clinical application of an innovative fluorescence in situ hybridization technique which detects submicroscopic rearrangements involving telomeres. Eur. J. Hum. Genet., 5, 1–8.

53 Harris, P.C. (1997) The TSC2/PKD1 contiguous gene syndrome. Contrib. Nephrol., 122, 76–82.

54 Cheadle, J.P., Reeve, M.P., Sampson, J.R. and Kwiatkowski, D.J. (2000) Molecular genetic advances in tuberous sclerosis. Hum. Genet., 107, 97–114.

55 De Sanctis, G.T., Merchant, M., Beier, D.R., Dredge, R.G., Grobholz, J.K., Martin, T.R., Lander, E.S. and Drazen, J.M. (1995) Quantitative locus analysis of airway hyperresponsiveness in A/J and C57BL/6J mice. Nature Genet., 11, 150–154.

56 Caughey, G.H. (1997) Of mites and men: trypsin-like proteases in the lungs. Am. J. Respir. Cell Mol. Biol., 16, 621–628.

57 Hunt, J.E., Friend, D.S., Gurish, M.F., Feyfant, E., Sali, A., Huang, C., Ghildyal, N., Stechschulte, S., Austen, K.F. and Stevens, R.L. (1997) Mouse mast cell protease 9, a novel member of the chromosome 14 family of serine proteases that is selectively expressed in uterine mast cells. J. Biol. Chem., 272, 29158–29166.

58 Johnson, P.R., Ammit, A.J., Carlin, S.M., Armour, C.L., Caughey, G.H. and Black, J.L. (1997) Mast cell tryptase potentiates histamine-induced contraction in human sensitized bronchus. Eur. Respir. J., 10, 38–43.

59 Rice, K.D., Tanaka, R.D., Katz, B.A., Numerof, R.P. and Moore, W.R. (1998) Inhibitors of tryptase for the treatment of mast cell-mediated diseases. Curr. Pharm. Des., 4, 381–396.

60 McInnes, L.A., Escamilla, M.A., Service, S.K., Reus, V.I., Leon, P., Silva, S., Rojas, E., Spesny, M., Baharloo, S., Blakenship, K. et al. (1996) A complete genome screen for genes predisposing to severe bipolar disorder in two Costa Rican pedigrees. Proc. Natl Acad. Sci. USA, 93, 13060–13065.

61 Ewald, H., Mors, P., Flint, T., Koed, K., Eiberg, H. and Kruse, T.A. (1995) A possible locus for manic depressive illness on chromosome 16p13. Psychiatr. Genet., 5, 71–81.

62 Edenberg, H.J., Foroud, T., Conneally, P.M., Sorbel, J.J., Carr, K., Crose, C., Willig, C., Zhao, J., Miller, M., Bowman, E. et al. (1997) Initial genomic scan of the NIMH genetics initiative bipolar pedigrees: chromosomes 3, 5, 15, 16, 17 and 22. Am. J. Med. Genet., 74, 238–246.

63 Kamei, M., Webb, G.C., Young, I.G. and Campbell, H.D. (1998) SOLH, a human homologue of the Drosophila melanogaster small optic lobes gene is a member of the calpain and zinc-finger gene families and maps to human chromosome 16p13.3 near CATM (cataract with microphthalmia). Genomics, 51, 197–206.

64 Kamei, M., Webb, G.C., Heydon, K., Hendry, I.A., Young, I.G. and Campbell, H.D. (2000) Solh, the mouse homologue of the Drosophila melanogaster small optic lobes gene: organization, chromosomal mapping and localization of gene product to the olfactory bulb. Genomics, 64, 82–89.

65 Bernardi, G., Olofsson, B., Filipski, J., Zerial, M., Salinas, J., Cuny, G., Meunier-Rotival, M. and Rodier, F. (1985) The mosaic genome of warm-blooded vertebrates. Science, 228, 953–957.

66 Holmquist, G.P. (1992) Review article: Chromosomal bands, their chromatin flavors and their functional features. Am. J. Hum. Genet., 51, 17–37.

67 Kitzberg, D., Selig, S. and Cedar, H. (1991) Chromosome structure and eukaryotic gene organization. Curr. Opin. Genet. Dev., 1, 534–537.

68 Kellum, R. and Elgin, S.C. (1998) Chromatin boundaries: punctuating the genome. Curr. Biol., 8, R521–R524.

69 Prioleau, M.N., Nony, P., Simpson, M. and Felsenfeld, G. (1999) An insulator element and condensed chromatin region separate the chicken beta-globin locus from an independently regulated erythroid-specific folate receptor gene. EMBO J., 18, 4035–4048.

70 Bell, A.C. and Felsenfeld, G. (1999) Stopped at the border: boundaries and insulators. Curr. Opin. Genet. Dev., 9, 191–198.

71 Sun, F.L. and Elgin, S.C. (1999) Putting boundaries on silence. Cell, 99, 459–462.

72 Buckle, V.J. and Rack, K. (1993) Fluorescent in situ hybridisation. In Davies, K.E. (ed.), Human Genetic Diseases. IRL Press, Oxford, UK, pp. 59–80.

73 Bankier, A.T., Weston, K.M. and Barrell, B.G. (1987) Random cloning and sequencing by the M13/dideoxynucleotide chain termination method. Methods Enzymol., 155, 51–93.

74 Ewing, B. and Green, P. (1998) Base-calling of automated sequencer traces using phred. II. Error probabilities. Genome Res., 8, 186–194.

75 Bonfield, J.K., Smith, K.F. and Staden, R. (1995) A new DNA sequence assembly program. Nucleic Acids Res., 23, 4992–4999.

76 Flint, J., Sims, M., Clark, K., Staden, R. and Thomas, K. (1998) An oligo-screening strategy to fill gaps found during shotgun sequencing projects. DNA Seq., 8, 241–245.

77 McMurray, A.A., Sulston, J.E. and Quail, M.A. (1998) Short-insert libraries as a method of problem solving in genome sequencing. Genome Res., 8, 562–566.

78 Waye, J.S., Chui, D.H.K., Higgs, D.R., Hetherington, R. and Olivieri, N.F. (1995) De novo deletion of the entire ζα globin gene cluster in a girl with Hb H disease (Abstract). Blood, 86, 8a.

79 Brook-Carter, P.T., Peral, B., Ward, C.J., Thompson, P., Hughes, J., Maheshwar, M.M., Nellist, M., Gamble, V., Harris, P.C. and Sampson, J.R. (1994) Deletion of the TSC2 and PKD1 genes associated with severe infantile polycystic kidney disease—a contiguous gene syndrome. Nature Genet., 8, 328–332.

80 Burn, T.C., Connors, T.D., Van Raay, T.J., Dackowski, W.R., Millholland, J.M., Klinger, K.W. and Landes, G.M. (1996) Generation of a transcriptional map for a 700-kb region surrounding the polycystic kidney disease type 1 (PKD1) and tuberous sclerosis type 2 (TSC2) disease genes on human chromosome 16p13.3. Genome Res., 6, 525–537.

81 Aspinwall, R., Rothwell, D.G., Roldan-Arjona, T., Anselmino, C., Ward, C.J., Cheadle, J.P., Sampson, J.R., Lindahl, T., Harris, P.C. and Hickson, I.D. (1997) Cloning and characterization of a functional human homolog of E.coli endonuclease III. Proc. Natl Acad. Sci. USA, 94, 109–114.

