Sanfilippo B syndrome is caused by a deficiency of α-N-acetylglucosaminidase, a lysosomal enzyme involved in the degradation of heparan sulphate. Accumulation of the substrate in lysosomes results in degeneration of the central nervous system, with progressive dementia often combined with hyperactivity and aggressive behaviour. In order to clone the deficient gene, we purified the enzyme from human placenta and obtained amino acid sequence information. Alignment of one of the CNBr-generated internal peptides to sequence in the database revealed the chromosomal location of the gene in the 5' upstream flanking region of the gene for 17β-hydroxysteroid dehydrogenase at 17q21.1. The available DNA sequence was used to clone the cDNA coding for α-N-acetylglucosaminidase and to analyse its gene structure. The gene is fully contained in the 5' upstream flanking region of the 17β-hydroxysteroid dehydrogenase gene and is interrupted by five introns. The cDNA clone has a length of 2575 bp and encodes a protein of 743 amino acids. Chinese hamster ovary cells transfected with the cDNA construct show α-N-acetylglucosaminidase activity about 17-fold over background. This will allow correction studies with NAG-deficient Sanfilippo B cell lines and facilitate the development of enzyme replacement therapy for these patients.
Sanfilippo B syndrome, also known as mucopolysaccharidosis IIIB (MPS IIIB), is one of the four recognised biochemical subtypes of Sanfilippo syndrome (MPS III). Each of the MPS III types is inherited as an autosomal recessive disorder with considerable variation in clinical severity (1). The incidence of Sanfilippo syndrome has been estimated at 1:24 000 (2). MPS IIIB is the most common of the four Sanfilippo subtypes in Greece (3,4), whereas in Northern Europe MPS IIIA seems to predominate (5). MPS IIIC is less frequent than subtypes A and B, and to date only seven patients with MPS IIID have been described (1).
MPS IIIB patients suffer from severe central nervous system degeneration resulting in progressive dementia, often combined with hyperactivity and aggressive behaviour (1). Type A is reported to be the most severe of the four Sanfilippo subtypes, with earlier onset of symptoms and more rapid progression (2), whereas type B shows wider clinical heterogeneity, with mild and severe cases reported even within the same sibship (6).
In contrast to other MPS disorders, Sanfilippo patients show only mild somatic disease, and the coarse facial features that are prominent in the other MPS are often absent in adults. Together with the high incidence of false-negative results in the urinary screening test for mucopolysaccharides, this makes the diagnosis of mild cases difficult. Hence MPS III may be underdiagnosed in patients with mild mental retardation (1).
MPS III results from a deficiency in one of the enzymes involved in the degradation of heparan sulphate, with α-N-acetylglucosaminidase (NAG, EC 3.2.1.50) being deficient in MPS IIIB. There is evidence for polymorphism of NAG, with apparently three different genotypes (7,8) segregating in black and white affected families. Vance et al. pointed out that alleles with high enzymatic activity might interfere with the identification of heterozygotes (9), which may have occurred in the case described by Pande et al. (10). In this family some of the heterozygotes had normal levels of NAG activity, presumably due to a 'hyperactive' allele. This MPS IIIB patient also had Glanzmann disease, which is caused by defects in either of two genes located on chromosome 17. Although no linkage could be found at the time, the NAG gene was subsequently shown to be located on chromosome 17 as well, using a partial cDNA clone (11) and the full-length cDNA that we report here. The genes deficient in MPS IIIA and MPS IIID have previously been cloned (12,13) and mapped to chromosome 17q25.3 for MPS IIIA (12) and 12q14 for MPS IIID (14). The isolation and characterization of the gene coding for NAG will enable the genotype of MPS IIIB patients to be defined and should ultimately contribute to the development of therapies.
Here we report the purification of α-N-acetylglucosaminidase (NAG) to apparent homogeneity from human placenta, the isolation of a full-length cDNA encoding NAG, the structure of the corresponding gene on chromosome 17q21.1 and the expression of recombinant enzyme.
Table 1
NAG was purified to apparent homogeneity from human placenta by a series of chromatography steps as described in Materials and Methods. Purification was about 18 000-fold with a yield of 4-5% (Table 1). The enzyme activity bound to the matrix of every column except Basilen-Blue agarose: NAG was found in the flow-through from the Basilen-Blue agarose column, and a major contaminant of the preparation, β-glucuronidase, was removed in this step. In the last chromatographic step, on a Blue 172-agarose column, NAG was eluted with a stepwise salt gradient. All fractions contained activity, but the 200 and 400 mM NaCl fractions appeared to be homogeneous, showing two bands of approximately 82 and 77 kDa in SDS-PAGE (Fig. 1).
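The purification factors in Table 1 are each step's specific activity divided by the specific activity of the homogenate (0.027 nmol/min/mg). The minimal sketch below reproduces the last column of the table from its specific-activity column; the variable and function names are illustrative. Recovery cannot be recalculated from specific activities alone, because it depends on total activity, so it is simply carried over from the table.

```python
# Table 1 values: (step, NAG specific activity in nmol/min/mg, recovery in %)
TABLE_1 = [
    ("Homogenate", 0.027, 100),
    ("ConA/BlueA recycling", 1.7, 44),
    ("DEAE-Sepharose", 5.2, 30),
    ("Heparin-agarose", 20, 28),
    ("Phenyl-Sepharose", 130, 11),
    ("Basilen Blue-agarose", 300, 11),
    ("Blue 172-agarose", 500, 5),
]

def fold_purification(specific_activity, starting_specific_activity):
    """Purification factor relative to the starting material (the homogenate)."""
    return specific_activity / starting_specific_activity

homogenate_activity = TABLE_1[0][1]
for step, activity, recovery in TABLE_1:
    fold = fold_purification(activity, homogenate_activity)
    # Recovery is taken from the table; it needs total activities to recompute.
    print(f"{step:<22} {fold:>8.0f}-fold  {recovery:>3}% recovery")
```

Rounded to whole numbers, the computed factors match the published column (1, 63, 193, 741, 4815, 11 111 and 18 519).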
Sequence data from the 17β-hydroxysteroid dehydrogenase (17β-HSD) flanking region were used to generate a NAG-specific probe to screen several cDNA libraries. Approximately 600 000 clones from a human kidney and 350 000 from a human testis 5'-stretch plus cDNA library were screened with this probe. One kidney clone and 15 testis clones were confirmed as authentic by direct sequencing of PCR products generated with primers designed to the open reading frame in the 17β-HSD flanking region and forward or reverse primers designed to the vector arms. None of the 15 testis clones contained sequence 5' of the open reading frame (exon 6). The clone isolated from the kidney library (K19) was colinear with the two N-terminal peptides but lacked the start codon and exon 5. Screening of a further 500 000 clones of the testis library with various probes homologous to the 5' end of K19 failed to identify a clone containing the missing 5' end. An additional 50 bp were obtained by 5' RACE.
Northern blot analysis showed a single mRNA species of about 2.7 kb, with high expression levels in liver, ovary and peripheral blood leucocytes and measurable amounts of transcript in testis, prostate, spleen, colon, small intestine, lung, placenta and kidney (Fig. 2).
A construct containing a full-length cDNA was assembled by subcloning the insert of λ clone 133, containing bases 179 to 2575 (Fig. 3), into pBluescript II SK- together with a 178 bp fragment (bases 1 to 178 in Fig. 3) derived from the cosmid subclone containing the start codon. The resulting construct contains the full-length cDNA coding for the 743 amino acids of NAG, 101 bp of 5'-non-translated sequence and 245 bp of 3'-non-translated region, including a polyadenylation signal and a potential polyA-tail.
A BLAST database search (18) at the NCBI server with the cDNA sequence revealed that the NAG gene is fully contained in the 5'-flanking region of the 17β-HSD gene (GenBank U34879) and is interrupted by five introns, which allowed prediction of the intron/exon structure of the NAG gene (Fig. 4). The transcription start has not been defined, but the coding sequence starts at position 10 950 and the first exon ends at position 11 332 of the HSD flanking region; bp 11 333 to 12 074 are intronic and are followed by exon 2 (bp 12 075 to 12 222), corresponding to positions 485 to 632 of the NAG cDNA. Exon 3 (positions 633 to 779, Fig. 3) corresponds to bp 13 014 to 13 162 of the HSD flanking region, exon 4 (positions 780 to 865, Fig. 3) to bp 13 347 to 13 432, exon 5 (positions 866 to 1122, Fig. 3) to bp 15 627 to 15 883 and exon 6 (positions 1123 to 2544, Fig. 3) to bp 17 705 to 19 126. Exon 6 contains a potential polyadenylation site and is followed by a stretch of As that is not present in the genomic sequence and therefore likely represents the polyA-tail.
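Because every exon is given both as a cDNA interval and as an interval of the HSD flanking region, the two lengths can be cross-checked. The sketch below assumes all coordinates are 1-based and inclusive; the names are illustrative. Run as written, it finds identical lengths for exons 2, 4, 5 and 6 but a 2 bp difference for exon 3 (147 bp in the cDNA versus 149 bp in the flanking region), so one of the printed coordinates for that exon may contain a typographical error. It also places the first base of the start codon at cDNA position 102, consistent with the 101 bp of 5'-non-translated sequence in the full-length construct.

```python
# Exon coordinates from the text, 1-based and inclusive:
# exon: (cDNA start, cDNA end, flanking-region start, flanking-region end)
EXONS = {
    2: (485, 632, 12075, 12222),
    3: (633, 779, 13014, 13162),
    4: (780, 865, 13347, 13432),
    5: (866, 1122, 15627, 15883),
    6: (1123, 2544, 17705, 19126),
}

def span(start, end):
    """Length of an inclusive, 1-based interval."""
    return end - start + 1

def length_mismatches(exons):
    """Exons whose cDNA and genomic intervals differ in length: {exon: (cDNA bp, genomic bp)}."""
    return {
        n: (span(c0, c1), span(g0, g1))
        for n, (c0, c1, g0, g1) in exons.items()
        if span(c0, c1) != span(g0, g1)
    }

# Exon 1: coding sequence from flanking-region bp 10 950 to 11 332. Exon 2 starts at
# cDNA 485, so exon 1 ends at cDNA 484 and the start codon begins 383 bp earlier.
exon1_coding_bp = span(10950, 11332)
start_codon_cdna = 484 - exon1_coding_bp + 1
intron1_bp = span(11333, 12074)

print(length_mismatches(EXONS))
print(exon1_coding_bp, start_codon_cdna, intron1_bp)
```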
In this study we report the cloning of the gene involved in Sanfilippo B syndrome by the classical approach of purifying the deficient enzyme and obtaining amino acid sequence information to produce probes suitable for screening cDNA libraries.
NAG has previously been purified from several tissues and species. A molecular mass of 82 kDa was reported for NAG in human fibroblasts, with precursor and intermediate or mature forms ranging from 86 kDa to 77 and 73 kDa (19). In human kidney carcinoma cells the enzyme had a molecular mass of 80 kDa (20); a secreted 86 kDa form was observed in the medium of these cells (21) and isolated from urine (22), whereas Sasaki et al. reported a molecular mass of 80 kDa for NAG purified from human liver (23). The two forms of NAG we purified from human placenta, with apparent molecular masses of 77 and 82 kDa, are well within the published size range.
Cloning was made easier by the fact that the NAG gene is located next to an unrelated gene mapped to chromosome 17q12-21 (16), a region under intense investigation by several groups because of its association with the gene for familial early-onset breast and ovarian cancer (BRCA1). BRCA1 was initially mapped to chromosome 17q12-21 by linkage analysis and was subsequently narrowed to 17q21 (24,25). In the search for the cancer gene, the genomic region has been intensively cloned and sequenced (26,27). The alignment of one of the CNBr-generated peptides to a known sequence within this region, 5' of the 17β-HSD gene, allowed the production of a highly specific, long C-terminal probe for screening. The subsequent release of a further 25 000 bp of sequence from this region to the database made it possible to predict the intron/exon structure without sequencing the isolated cosmid containing the NAG gene. Furthermore, the chromosomal localisation had been narrowed to 17q21.1 (28). We are currently confirming the results obtained by database comparison by sequencing the intron/exon junctions in the cosmid.
Although we screened 5'-stretch plus libraries designed to overcome the common underrepresentation of 5' ends of cDNAs in libraries, we were not able to isolate a clone containing the start codon. This might be due to the high GC content of the region encoding the N-terminus, which can form secondary structures; these also made sequencing of this region very difficult.
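Where a GC-rich region is suspected to hinder cloning or sequencing, a sliding-window GC profile is a simple way to locate it. The sketch below is illustrative only: the NAG 5' sequence (Fig. 3) is not reproduced here, so the example uses a hypothetical sequence, and the window size, step and 0.7 threshold are assumed values. A high GC fraction alone does not prove that secondary structure forms.

```python
def gc_content(seq):
    """Fraction of G plus C in a DNA sequence (case-insensitive); 0.0 for an empty string."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)

def gc_windows(seq, window=50, step=10):
    """Yield (0-based start, GC fraction) for each sliding window along seq."""
    last_start = max(len(seq) - window, 0)
    for start in range(0, last_start + 1, step):
        yield start, gc_content(seq[start:start + window])

def gc_rich_regions(seq, threshold=0.7, window=50, step=10):
    """Start positions of windows whose GC fraction exceeds a (hypothetical) threshold."""
    return [start for start, gc in gc_windows(seq, window, step) if gc > threshold]

# Hypothetical sequence: a GC-rich stretch followed by an AT-rich one.
example = "GCGGCCGCGG" * 5 + "ATATTAATAT" * 5
print(gc_rich_regions(example, window=20, step=10))
```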
We also observed possible alternative splicing of exon 5, which is absent from the clone isolated from a kidney cDNA library (K19). PCR amplification of this region from testis cDNA consistently yielded two products differing in size by about 250 bp, in line with the 257 bp length of exon 5 (positions 866 to 1122, Fig. 3), whereas only the smaller product could be amplified from placenta cDNA (data not shown).
Northern blot analysis showed a single RNA species of about 2.7 kb, whereas Zhao et al. reported an mRNA size of 3.0 kb (11). Given the 2431 bp of NAG cDNA from the start codon to the beginning of the polyA-tail, this leaves about 300 to 600 bp for the polyA-tail and the 5'-non-translated region, whose lengths are unknown. NAG transcript was detected in all examined tissues except the thymus, which is not surprising since lysosomal enzymes are considered products of housekeeping genes.
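The estimate of about 300 to 600 bp for the polyA-tail and 5'-non-translated region together comes from subtracting the 2431 bp between the start codon and the polyA-tail from each reported transcript size:

```latex
L_{\mathrm{5'UTR}} + L_{\mathrm{poly(A)}} \approx L_{\mathrm{mRNA}} - 2431\ \mathrm{bp} =
\begin{cases}
2700 - 2431 = 269\ \mathrm{bp} & \text{(2.7 kb, this study)}\\
3000 - 2431 = 569\ \mathrm{bp} & \text{(3.0 kb, Zhao et al. (11))}
\end{cases}
```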
The full-length construct, assembled from genomic DNA and cDNA, encodes a protein with a predicted molecular mass of 82 166 Da for the precursor. The signal-peptidase cleavage consensus site (29) at position 23 lies immediately before the N-terminal sequence of the 82 kDa form of NAG, which has a predicted molecular mass of 80 245 Da. The amino acid sequence of the 77 kDa form, starting at position 59 (Fig. 3), accounts for 76 742 Da. Because the apparent sizes in SDS-PAGE exceed these predicted polypeptide masses, we concluded that at least one of the seven potential N-glycosylation sites is glycosylated. This probably includes the site at residue 272 (Fig. 3), since N-terminal sequencing of the CNBr peptide was blocked at this position. Three of the five internal peptides that were sequenced are not present in the NAG sequence and may derive from minor, low-molecular-weight contaminants of the NAG preparation (Fig. 1).
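Predicted polypeptide masses of this kind are obtained by summing average residue masses plus one water, and potential N-glycosylation sites are the Asn residues in N-X-S/T motifs where X is not proline. The sketch below illustrates both calculations under stated assumptions: the residue masses are standard average values computed from common atomic weights, glycans and other modifications are ignored, and because the NAG sequence (Fig. 3) is not reproduced here the example uses a short hypothetical peptide. The function names are illustrative.

```python
import re

# Average residue masses in Da (free amino-acid mass minus one water).
AVERAGE_RESIDUE_MASS = {
    "G": 57.0513, "A": 71.0779, "S": 87.0773, "P": 97.1152, "V": 99.1311,
    "T": 101.1039, "C": 103.1429, "L": 113.1576, "I": 113.1576, "N": 114.1026,
    "D": 115.0874, "Q": 128.1292, "K": 128.1723, "E": 129.1140, "M": 131.1961,
    "H": 137.1393, "F": 147.1739, "R": 156.1857, "Y": 163.1733, "W": 186.2099,
}
WATER = 18.0153  # added once for the free N- and C-termini

def average_mass(protein):
    """Average mass (Da) of an unmodified polypeptide; glycosylation is not included."""
    return sum(AVERAGE_RESIDUE_MASS[aa] for aa in protein.upper()) + WATER

def chain_from(precursor, first_residue):
    """Polypeptide starting at 1-based position first_residue (e.g. after N-terminal processing)."""
    return precursor[first_residue - 1:]

def n_glycosylation_sequons(protein):
    """1-based positions of Asn in N-X-S/T motifs with X not Pro."""
    return [m.start() + 1 for m in re.finditer(r"N(?=[^P][ST])", protein.upper())]

# Hypothetical peptide for illustration only.
toy = "MNGSANPTLNKT"
print(n_glycosylation_sequons(toy))
print(round(average_mass(toy), 2), round(average_mass(chain_from(toy, 5)), 2))
```

An observed band heavier than the mass computed this way is consistent with attached carbohydrate, which is the reasoning applied to the 82 and 77 kDa forms above.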
NAG activity was measured using the fluorogenic substrate 4-methylumbelliferyl-2-acetamido-2-deoxy-α-D-glucopyranoside (Calbiochem) as described (32) and normalised for protein concentration as determined by the method of Lowry (33).
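Specific activities such as those in Table 1 (nmol/min/mg) follow from converting fluorescence to nmol of released 4-methylumbelliferone (4-MU) and dividing by incubation time and protein amount. The assay details are in reference 32 and are not reproduced here; the sketch below assumes a linear 4-MU standard curve and subtraction of a no-enzyme blank, and all numbers and names are hypothetical.

```python
def linear_fit(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def specific_activity(f_sample, f_blank, slope, minutes, mg_protein):
    """nmol 4-MU released per minute per mg protein.

    f_sample: fluorescence of the incubated sample
    f_blank:  fluorescence of a no-enzyme blank (substrate background)
    slope:    fluorescence units per nmol 4-MU from the standard curve
    """
    nmol_released = (f_sample - f_blank) / slope
    return nmol_released / (minutes * mg_protein)

# Illustrative numbers only, not data from this study.
standards_nmol = [0, 1, 2, 4]
standards_fluorescence = [10, 110, 210, 410]
slope, intercept = linear_fit(standards_nmol, standards_fluorescence)
activity = specific_activity(610, 10, slope, minutes=60, mg_protein=0.2)
print(f"slope = {slope:.1f} units/nmol, activity = {activity:.2f} nmol/min/mg")
```

Subtracting the blank reading before dividing by the slope cancels the standard-curve intercept, so only the slope is needed for the conversion.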
SDS-PAGE was performed according to the method of Laemmli (34) in a 7.5% gel. Samples were treated with 2% SDS, boiled for 2 min and subjected to electrophoresis. Gels were stained with Coomassie Blue R-250 or with silver stain (BioRad).
Frozen human placenta was homogenized and recycled over concanavalin A-Sepharose and Blue-Agarose columns as described previously (35,36).

DEAE-Sepharose chromatography. The conA/BlueA eluate was concentrated, dialyzed against 20 mM Tris-HCl buffer pH 7.5 and applied to a 100 ml DEAE-Sepharose (Pharmacia) column previously equilibrated with 20 mM Tris-HCl buffer pH 7.5 at 4°C. The column was washed with 500 ml Tris-HCl buffer and eluted with a linear gradient of 0-0.5 M NaCl in the same buffer. Fractions containing NAG activity were pooled and dialyzed against 50 mM NaAc pH 5.5.

Heparin-agarose chromatography. The dialyzed sample was applied to 40 ml heparin-agarose (Sigma) previously equilibrated with 50 mM NaAc buffer pH 5.5, washed with 200 ml of the same buffer and eluted with a linear gradient of 0-0.5 M NaCl in the same buffer (4°C). Fractions containing high NAG activity were pooled. All subsequent steps were done at room temperature (20-24°C).

Phenyl-Sepharose chromatography. The pooled fractions of the previous step were applied directly to a 10 ml phenyl-Sepharose (Pharmacia) column equilibrated with 50 mM NaAc buffer pH 5.5. The column was washed with 50 ml 10% (v/v) ethylene glycol in 20 mM Tris-HCl buffer pH 7.5, and NAG activity was eluted with 50% (v/v) ethylene glycol in the same buffer. All fractions were dialyzed immediately against 20 mM Tris-HCl buffer pH 7.5.

Basilen-Blue-agarose chromatography. Dialyzed fractions containing high NAG activity were pooled and applied to 7 ml Basilen-Blue-agarose (Sigma) previously equilibrated with 20 mM Tris-HCl buffer pH 7.5. The flow-through fraction was collected.

Blue-172-agarose chromatography. The flow-through fraction was applied to 5 ml of Blue 172-agarose (Centre for Protein and Enzyme Technology) equilibrated with 20 mM Tris-HCl buffer pH 7.5, washed with 25 ml of the same buffer and eluted stepwise with 100, 200, 400 and 500 mM NaCl in Tris-HCl buffer.
Purified NAG mature form (77 kDa) and precursor (82 kDa) were separated by SDS-PAGE, blotted onto PVDF membrane and excised separately (37). N-terminal sequences were obtained from both bands, and internal peptide sequences from a bulk preparation digested with CNBr. The open reading frame surrounding the site of alignment of CNBr peptide 3 was amplified from genomic DNA by PCR with primers designed to nt 5077-5100 (forward) and 6009-5983 (reverse complement) of the HSD gene under the following conditions: 40 cycles of denaturation at 94°C for 1 min, annealing at 60°C for 1 min and extension at 72°C for 2 min in Tth Plus reaction buffer (Biotech), 25 mM MgCl2, 400 μM dNTPs, 200 ng of each primer and 1 μg template DNA. The PCR product was purified using a Centricon 30 spin column (Amicon) and labelled with [α-32P]dCTP by random priming (Megaprime labelling system, Amersham).
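The expected size of this genomic probe follows from the primer coordinates, which are top-strand positions in the HSD database sequence. Assuming the genomic DNA matches the database over this interval, the product runs from the 5' end of the forward primer (nt 5077) to the 5' end of the reverse primer (nt 6009), i.e. 933 bp, with primers of 24 and 27 nt. The helper names below are illustrative.

```python
# Coordinates are 1-based positions on the top strand of the database sequence.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]

def primer_length(start, end):
    """Primer length from its two end coordinates, given in either order."""
    return abs(end - start) + 1

def amplicon_length(forward_5prime, reverse_5prime):
    """Product size between the forward primer's 5' end and the reverse primer's 5' end,
    both as top-strand coordinates (reverse_5prime > forward_5prime)."""
    return reverse_5prime - forward_5prime + 1

print(primer_length(5077, 5100), primer_length(6009, 5983), amplicon_length(5077, 6009))
```

The same arithmetic applies to the 5' RACE primers below (nt 1113-1132 and 386-405, both 20 nt); the reverse primers are written as the reverse complement of the top-strand interval.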
Approximately 600 000 clones from a human kidney and 350 000 from a human testis 5'-stretch plus cDNA library (Clontech # HL3001a and # HL3024a) in λgt10 were screened with the probe described above. Hybridisation was performed using standard methods (38). Colinearity with the open reading frame found in the upstream flanking region of the 17β-HSD gene was determined by direct sequencing of PCR products generated with primers designed to nt 5690-5731 of the 17β-HSD gene and forward or reverse primers designed to the arms of λgt10, using the fmol system (Promega). Positive clones were subjected to further rounds of purification, and λ DNA, prepared using the Wizard λ purification system (Promega), was subcloned into pBluescript II SK- (Stratagene) and sequenced.
A chromosome 17 cosmid library on gridded filters from the Reference Data Library, Berlin (17) was screened with the insert of K19. Twelve cosmids positive for NAG sequence were identified and obtained from the Reference Data Library. Hybridisation with NAG-specific oligonucleotides suggested that cos6 contains the whole gene for NAG. DNA of cos6 was purified using Qiagen 100 columns (Diagen), digested with EcoRI and subcloned into pBluescript SK-. A subclone containing the N-terminus of the NAG gene was identified by hybridisation with an oligonucleotide corresponding to nt 135-152 (Fig. 3).
5' RACE was performed using the Marathon cDNA amplification kit (Clontech) with a NAG-specific primer designed to nt 1113-1132 (reverse complement) for the initial PCR and a primer designed to nt 386-405 (reverse complement) for the nested PCR. The initial PCR was done on cDNA reverse transcribed from testis mRNA (Clontech) according to the instructions provided. The cycling conditions were as follows: denaturation for 20 s at 94°C, annealing for 45 s at 55°C and elongation for 3 min at 72°C, for 30 cycles in the presence of 10% DMSO. The nested PCR was performed under the same conditions except that the annealing temperature was 60°C. The 5' RACE product was purified and labelled as described above and used to screen a peripheral blood leucocyte 5'-stretch plus cDNA library (Clontech # HL5007a).
Inserts of purified λ and cosmid clones were excised with EcoRI and subcloned into pBluescript II SK- (Stratagene). Plasmid DNA was prepared using Qiagen 100 columns according to the manufacturer's instructions. Sequencing was performed in microtiter trays on a Hybaid OmniGene thermocycler using the Sequenase system (USB, Amersham) with M13 sequencing primers as well as gene-specific primers, which were designed every 200 bp as sequence was generated. GC-rich regions were additionally sequenced with the Bst DNA Sequencing kit (Bio-Rad) and the fmol Sequencing system (Promega).
The insert of λ clone l33, containing bases 107 to 2575 (Fig. 3), was excised with EcoRI and subcloned into pBluescript II SK- (Stratagene). A 178 bp XmaI fragment (bases 1 to 178 in Fig. 3) from cosmid subclone 6.3 containing the start codon was cloned into the pBluescript subclone. The resulting construct, in the correct orientation, contains the full-length coding sequence for NAG, 5'- and 3'-non-translated regions, a polyadenylation signal, a potential polyA-tail and linker DNA.
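Subcloning strategies like this one depend on where EcoRI (G^AATTC) and XmaI (C^CCGGG) sites fall in the sequence. The sketch below predicts top-strand fragment lengths after complete digestion of a linear molecule; it uses a hypothetical sequence rather than the NAG sequence, ignores methylation sensitivity and partial digestion, and its names are illustrative. Both recognition sites are palindromic, so scanning one strand finds every site.

```python
# Recognition sequence and top-strand cut offset for each enzyme.
ENZYMES = {
    "EcoRI": ("GAATTC", 1),  # G^AATTC
    "XmaI": ("CCCGGG", 1),   # C^CCGGG
}

def cut_positions(seq, enzyme):
    """0-based indices at which the top strand is cut (the first base of each new fragment)."""
    site, offset = ENZYMES[enzyme]
    seq = seq.upper()
    cuts = []
    i = seq.find(site)
    while i != -1:
        cuts.append(i + offset)
        i = seq.find(site, i + 1)
    return cuts

def fragment_lengths(seq, enzyme):
    """Top-strand fragment lengths after complete digestion of a linear molecule."""
    bounds = [0] + cut_positions(seq, enzyme) + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]

# Hypothetical sequence with one XmaI and one EcoRI site.
example = "TTCCCGGGAAAAGAATTCAA"
print(cut_positions(example, "XmaI"), cut_positions(example, "EcoRI"))
print(fragment_lengths(example, "XmaI"), fragment_lengths(example, "EcoRI"))
```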
The construct was directionally cloned into the pcDNA3 expression vector (Invitrogen) via the EcoRI and BamHI sites. CHO cells were transfected with either the empty expression vector or the expression construct using the DOTAP transfection reagent (Boehringer Mannheim). Cells were grown and selected as described (39), and cell extracts were assayed for protein and NAG activity.
Multiple tissue northern blots (Clontech MTN blots I and II, #7760-1 and #7759-1) were probed with the insert of clone K19 labelled with [α-32P]dCTP using the Megaprime system (Amersham).
We thank Drs Don Anson and Phillip Morris for helpful discussions and Xiao-Hui Guo for technical assistance during the purification of NAG. This work was supported by grants from the National Health and Medical Research Council of Australia, the Adelaide Women's and Children's Hospital Research Foundation and a Raymond A. Bryan IV Fellowship from the American MPS Society Inc. BW is supported by a long-term fellowship of the Human Frontier Science Program, Strasbourg.
NAG, α-N-acetylglucosaminidase; MPS IIIB, mucopolysaccharidosis type IIIB or Sanfilippo B syndrome; BRCA1, familial early-onset breast and ovarian cancer syndrome gene 1; 17β-HSD, 17β-hydroxysteroid dehydrogenase; CHO cells, Chinese hamster ovary cells; 5' RACE, rapid amplification of 5' cDNA ends; PAGE, polyacrylamide gel electrophoresis.
Human Molecular Genetics
Introduction
Results
Purification of NAG
Cloning of a full-length cDNA
Expression of recombinant enzyme
Genomic structure
Discussion
Materials And Methods
Enzyme assay
SDS-polyacrylamide gel electrophoresis
Purification of NAG
Peptide sequence and probe production
Library screening and clone characterisation
5' RACE
Sequencing
Expression in CHO cells
Northern blot analysis
Acknowledgements
Abbreviations
References
Purification step          NAG specific activity   Recovery   Purification
                           (nmol/min/mg)           (%)        (-fold)
1. Homogenate              0.027                   100        1
2. ConA/BlueA recycling    1.7                     44         63
3. DEAE-Sepharose          5.2                     30         193
4. Heparin-agarose         20                      28         741
5. Phenyl-Sepharose        130                     11         4815
6. Basilen Blue-agarose    300                     11         11 111
7. Blue 172-agarose        500                     5          18 519
REFERENCES
Copyright Oxford University Press, 1996



