Familial juvenile nephronophthisis (NPH) is an autosomal recessive, genetically heterogeneous disorder, representing the most frequent inherited cause of chronic renal failure in children. One of the responsible loci, NPH1, has been mapped to 2q13. The presence of large homozygous deletions of ~250 kb in the majority of affected patients allowed us to define a minimal deletion interval for NPH1. A BAC contig covering this interval was established. Combination of large scale genomic sequencing, cDNA selection and computer-aided analysis led to the characterization of two transcriptional units. One encodes the already known BENE protein, and the other encodes a novel protein of at least 732 amino acids containing a putative src homology 3 domain. In two patients carrying the large deletion of the NPH1 region on only one allele, two mutations were detected in two independent exons of the novel gene. One consists of a single base deletion, causing a frameshift, and the other is a G -> A substitution in the consensus 5' splice donor site. Both mutations thus potentially generate null mutants. One of these mutations was found to segregate with the disease in the family, and the second appeared to be a de novo mutation. We therefore conclude that this novel gene is a strong candidate for NPH.
Familial juvenile nephronophthisis (NPH) or recessive medullary cystic kidney disease is an autosomal recessive kidney disorder representing the most frequent inherited cause of chronic renal failure in children (1). The first symptom is a reduced urinary concentrating ability, followed by decline of renal function leading to end-stage renal disease, generally during adolescence. The underlying pathology is a chronic tubulo-interstitial nephropathy with characteristic tubular basement membrane (TBM) thickening and medullary cyst formation (2). Associations with various extrarenal symptoms, especially ocular lesions, are frequently observed (reviewed in 3). By linkage analysis, a gene (NPH1) responsible for the vast majority, ~85%, of the purely renal form of NPH, has been mapped to chromosome 2q13 (4-6). We have further narrowed the NPH1 interval to a region between the loci D2S1890 and D2S1888, and cloned the region into a yeast artificial chromosome (YAC) contig (7). The region turned out to be partially duplicated on chromosome 2p12. Furthermore, several markers revealed more than one locus within the NPH1 region on 2q13, suggesting the presence of low copy repeats. Consistent with the role of repeated elements promoting large rearrangements, we detected large-scale deletions in 80% of the patients belonging to inbred or multiplex NPH1 families and in 65% of the sporadic cases with purely renal symptoms (8). These rearrangements appeared to be large homozygous deletions of ~250 kb involving a 100 kb inverted duplication. Since no common haplotype was detected, these rearrangements are likely to occur independently, but probably due to a common mechanism (8). Assuming that at least part of the NPH1 gene should lie in this deleted region, a bacterial artificial chromosome (BAC) contig covering the complete NPH1 deletion was established and three BAC clones were entirely sequenced.
A combination of cDNA selection and computer-aided analysis of the sequences led us to identify two genes in the region. One is BENE (9), predicted to encode a protein related to the T-cell differentiation protein MAL (10). The second is a previously unknown gene, encoding a protein with a putative src homology 3 (SH3) domain. Two mutations in different exons, a 1 bp deletion and a substitution in a consensus splice site, were identified in this gene in two patients with heterozygous deletions of the NPH1 region. The combination of one large deletion on one allele and a potential null point mutation on the other allele in two patients with NPH strongly suggests that this new gene is the NPH1 gene.
Using the previously established long-range restriction map (8) and new markers derived from the BAC-end sequences (96G18BD and 183K24BD, see below) as probes, we were able to refine the mapping of the breakpoint sites in 24 patients with homozygous NPH1 deletions. As previously shown for some of these, hybridization of SfiI-digested patients' DNA with the 921H4R probe, which maps to the inverted duplication (8), detected the normal 95 kb centromeric fragment, but not the normal 65 kb telomeric fragment. In contrast, when the same blots were hybridized with the 183K24BD probe, derived from the end of BAC 183K24 located in the telomeric copy of the duplication, all patients had the normal 50 kb centromeric fragment, but only eight of them had the normal 75 kb telomeric fragment (Fig. 1). This indicated that a common deletion was not present in all patients and allowed us to reduce the minimal NPH1 deletion region to a 205 kb region between two SfiI sites (Fig. 2A, B).
Two markers (804/6 and 765F2L) mapping within the deletion interval (8) were used to screen a human BAC library (11). Five clones were isolated. By sequence-tagged site (STS) content screening and BAC-end cloning, all clones except two (183K24 and 187E16) were shown to overlap. However, one end of each of these two non-overlapping clones mapped to the same EcoRI fragment, which was also detected by one STS (765F2L). Alignment of the sequences of the BAC ends and 765F2L clearly showed that the two BACs were immediately adjacent. These two clones, together with 96G18, completely covered the minimum NPH1 deletion interval (Fig. 2B), and were thus chosen for large scale sequencing.
A total of 325 kb was assembled into 12 contigs separated by 10 sequencing gaps (average 1 kb). The restriction map deduced from the sequence was in agreement with the data obtained by long range restriction mapping, and the 13 STS previously ordered on the physical map of the region (8) were found to align perfectly with the DNA sequence. The sequence contains portions of the expected duplicated inverted regions, which are >97% homologous in the 55 kb sequenced and are separated by 180 kb. The genomic sequence was compared with public sequence databases with the BLAST algorithm (12) and analyzed with two different exon-prediction programs, GRAIL (13) and FEXH (14). Comparison with non-redundant nucleic acid and protein databases revealed two regions homologous to already identified genes. One region is highly similar to the α-centractin cDNA, but is probably a pseudogene, since it has no intron and no significant open-reading frame (ORF). The second region is identical to the human BENE cDNA sequence (9). By genomic sequence analysis, two copies of the 3' end of the BENE gene, corresponding to the last three exons, were found in each inverted region. An additional 5' exon of the BENE cDNA was found proximal to the telomeric copy of the duplication in the deletion interval (Fig. 2C). The absence of this exon in the centromeric copy argues for the distal copy of BENE as being the active gene. Comparison with the expressed sequence tag (EST) databases (15,16) revealed one cluster of seven ESTs (Unigene Hs.75474), ESTs similar to BENE and to the α-centractin pseudogene, and three independent and distally spread ESTs (Fig. 2C). The longest IMAGE clone (ID321596) of the Unigene cluster and the IMAGE clones (ID179152, ID278250 and ID70593) containing the other ESTs were obtained from the IMAGE consortium and sequenced.
PCR amplification of sequences derived from these clones and from BENE with a foetal kidney cDNA library showed that these sequences are transcribed in the kidney.
The sequence of one clone isolated by direct cDNA selection revealed six sequences encoding potential exons, as determined by perfect matches with the consensus splice sequences. This cDNA clone and the IMAGE clone ID278250 were assigned to the same transcriptional unit by human foetal kidney cDNA library PCR amplification using primers located in each sequence, and sequencing of the PCR products. Furthermore, screening the human foetal cDNA library with probes derived from ID278250 and ID321596 detected the same clones. Two different clones (named 4A and 6A) were sequenced (Fig. 3). Both contain a single ORF starting at the first nucleotide and a methionine codon located 6-34 nucleotides downstream of the first nucleotide. The sequence around this ATG fits poorly with the Kozak translation initiation consensus sequence, but contains the required purine in position -3 (17). The genomic sequence upstream of this ATG does not contain a potential 3' splice site between the first in-frame stop codon and the ATG. These different features suggest that this ATG is the translation initiation methionine codon.
The 4A clone contains the longest ORF of 732 amino acids, extending from the first ATG to a stop codon at position 2230, followed by several in-frame stop codons. The 6A cDNA has an in-frame deletion (nucleotides 805-969), leading to a shorter putative protein product missing the 55 amino acids between positions 258 and 312. Whether this transcript results from alternative splicing or from a cloning artifact remains to be clarified. Alignment of the sequences of these clones with the different ESTs of the Hs.75474 Unigene cluster showed that the 3' untranslated region is 453 bp long and contains three consensus polyadenylation signals (Fig. 3).
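The in-frame nature of the 6A deletion can be checked with simple arithmetic. The Python sketch below uses only the coordinates quoted above (nucleotides 805-969 and a 732-residue ORF); the function name `deletion_summary` is a hypothetical helper introduced here for illustration, not part of the original analysis.

```python
def deletion_summary(first_nt: int, last_nt: int) -> dict:
    """Summarise a deletion given inclusive nucleotide coordinates."""
    length = last_nt - first_nt + 1          # inclusive interval length
    return {
        "length_nt": length,
        "in_frame": length % 3 == 0,         # multiple of 3 keeps the reading frame
        "codons_removed": length // 3,       # only meaningful when in_frame is True
    }


summary = deletion_summary(805, 969)
print(summary)
# {'length_nt': 165, 'in_frame': True, 'codons_removed': 55}

# Predicted length of the 6A product, starting from the 732-residue 4A ORF
print(732 - summary["codons_removed"])  # 677
```

The 165 nucleotides removed correspond to exactly 55 codons, in agreement with the 55 amino acids reported as missing from the 6A product.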
Table 1
Amino acids 156-210 (numbered from the first ATG) exhibit significant homologies with SH3 domains. Comparison with the ProDom (18) and Prosite (19) protein databases for protein domain and motif homologies showed that the closest homology, 64%, is found with the second SH3 domain of the human CRK protein (20). No other homology or known motifs were found.
Comparison of the different cDNA sequences with the genomic sequence led to the identification of 20 exons ranging from 43 to 211 bp. The corresponding genomic sequence covers 83 kb, entirely located in the non-repeated part of the deletion, with introns ranging up to 21.7 kb. The two transcripts (4A and 6A) differ in the sequences of exons 8 and 9 (Table 1 and Fig. 3). The intron-exon boundaries exhibit close adherence to 5' and 3' splice site consensus sequences (21), except for exon 8-6A. Nine exons were recognized by the GRAIL2 program (13) (with scores of good or excellent), 13 by the FEXH program (14) (scores 5.11-12.94) and seven by both of them.
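The statement about splice-site adherence can be illustrated with the boundary sequences listed in Table 1. The Python sketch below applies only the minimal GT-AG rule (intron begins with gt and ends with ag), which is a simplification of the full consensus cited above (21); the variable and function names are hypothetical and introduced here for illustration.

```python
# (exon, 3' splice site, 5' splice site) as listed in Table 1.
# Lowercase = intron, uppercase = exon; None = not available in the table.
BOUNDARIES = [
    ("1", None, "CAACAGgtatggtcg"),
    ("2", "tctccatagGTTGAT", "TCAAAGgtaaagtat"),
    ("3", "ttaatttagATGTAT", "AGCAAAgtaagtatt"),
    ("4", "ctctcatagGCTGAT", "AACTGAgtatgcttc"),
    ("5", "tttttgtagAGTTGG", "TTTAAGgtaggtaga"),
    ("6", "tgctttaagAAAGGG", "CTAGAGgttagtctt"),
    ("7", "gttttttagCCTTAT", "GCAAAGgtacaaaag"),
    ("8-4A", "tctccccagAACTGA", "CTCCAAgtaacattt"),
    ("8-6A", "tctccccagAACTGA", "TCAGAGgcgggcatc"),
    ("9-4A", "cttcagcagATAAAC", "AGGAAGgtaatgcgt"),
    ("9-6A", "tgtcttcagCAGATA", "AGGAAGgtaatgcgt"),
    ("10", "ttttttcagGGAATC", "GGCACTgtaagtata"),
    ("11", "ttctttcagATTAGG", "AATAAGgtaaggtta"),
    ("12", "tcatttcagGTTCTG", "CCCCAGgtaagtaat"),
    ("13", "tcatttcagGTTACT", "CGCAATgtatgtccg"),
    ("14", "atttcctagTCAACT", "AGCAAAgtaagttggct"),
    ("15", "cttttatagAACTTA", "GAAGAGgtatggctctc"),
    ("16", "tcttaaaagCACACG", "ACTAAGgtgggtacc"),
    ("17", "tctttaatagTCTACT", "GTACTGgtaagtggg"),
    ("18", "tgcatttcagATTTAA", "CTCAGGgtgggtagc"),
    ("19", "gctctttagAGTTCG", "GAGAAGgtaaaatat"),
    ("20", "ttttcccagAGAGAC", None),
]


def intron_part(site: str) -> str:
    """Return the intronic (lowercase) portion of a boundary sequence."""
    return "".join(ch for ch in site if ch.islower())


def non_canonical(boundaries):
    """List boundaries that violate the minimal GT-AG rule."""
    flagged = []
    for exon, acceptor, donor in boundaries:
        if acceptor is not None and not intron_part(acceptor).endswith("ag"):
            flagged.append((exon, "3' splice site"))
        if donor is not None and not intron_part(donor).startswith("gt"):
            flagged.append((exon, "5' splice site"))
    return flagged


print(non_canonical(BOUNDARIES))
# [('8-6A', "5' splice site")]
```

Under this simplified rule, the only boundary flagged is the 5' splice site of exon 8-6A, which begins with gc rather than gt.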
The high percentage of large homozygous deletions and the genetic heterogeneity in NPH patients seriously hindered the identification of the responsible gene. Mutation screening by direct sequencing of the coding exons was therefore performed only in the five patients in whom heterozygous deletions of the NPH1 region had been detected by PFGE (8). Fourteen exons were analysed. Two different sequence alterations were identified in two unrelated patients. One change is a 1 bp deletion at position 39 of exon 9, causing a frameshift and the generation of a stop codon 98 nucleotides downstream. The affected sister of the proband was shown to carry the mutation whereas the unaffected sister did not. This is in agreement with the haplotype analysis with the chromosome 2 microsatellite markers flanking the NPH1 deletion, since the unaffected sister did not share any common allele with her affected sisters. The other change is a G -> A substitution in the first base of intron 10, in the 5' consensus splice site. This mutation at a position of a normally invariable G is expected to affect the splicing process, probably leading to abnormally-sized transcripts. This mutation was not found in the parents or in the unaffected sister. Haplotype analysis showed that the two sisters share the same haplotypes. This means that the point mutation observed is a de novo mutation which arose on one parental allele (Fig. 4). Single-strand conformation polymorphism (SSCP) analysis detected each mutation, but did not reveal either mutation in 140 and 80 control chromosomes, respectively.
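The two classes of mutation described above can be sketched in a few lines of Python. The frameshift part uses a short toy sequence that is purely hypothetical (it is not the exon 9 sequence, which is not given here); the splice-site part uses the exon 10 5' splice site as listed in Table 1. All function names are assumptions introduced for illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_stop(seq: str, start: int = 0):
    """Return the index of the first in-frame stop codon, or None."""
    for i in range(start, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return i
    return None


def delete_base(seq: str, pos: int) -> str:
    """Remove the single base at 0-based index pos."""
    return seq[:pos] + seq[pos + 1:]


# Hypothetical toy ORF, NOT the real gene: ATG GAT GAC CTG AAA GGC TAA
toy = "ATGGATGACCTGAAAGGCTAA"
mutant = delete_base(toy, 3)   # 1 bp deletion shifts every downstream codon

print(first_stop(toy))      # 18 (normal stop at the end of the toy ORF)
print(first_stop(mutant))   # 9  (premature stop created by the frameshift)


def mutate_first_intron_base(site: str, new_base: str) -> str:
    """Replace the first intronic (lowercase) base of a 5' splice site."""
    idx = next(i for i, ch in enumerate(site) if ch.islower())
    return site[:idx] + new_base.lower() + site[idx + 1:]


donor_exon10 = "GGCACTgtaagtata"           # 5' splice site of exon 10 (Table 1)
mutated = mutate_first_intron_base(donor_exon10, "a")
print(mutated)                              # GGCACTataagtata
print(mutated[6:8] == "gt")                 # False: the invariant gt is lost
```

In the toy example the single-base deletion brings a stop codon into frame well before the normal end of the ORF, which is the same kind of consequence reported for the exon 9 deletion; the second part shows how the G -> A change destroys the gt dinucleotide at the start of intron 10.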
To identify the NPH1 gene, responsible for the vast majority of the purely renal form of NPH, we used a large scale genomic sequencing strategy. For that purpose, we first reduced the NPH1 candidate region by refining the breakpoints of the large deletions commonly found in NPH1 patients (8). This minimal deletion interval includes part of the telomeric copy of a large inverted repeat flanking the deletion. Three BAC clones, which cover the entire NPH1 candidate region, were entirely sequenced. A combination of cDNA selection and computer-aided sequence analysis identified at least two genes expressed in the foetal kidney and located in the candidate region. One gene encodes the BENE protein (9) and is mostly included in the duplication. The other gene, which encodes a previously unidentified protein, contains at least 20 exons spanning >80 kb and is entirely located in the non-duplicated region common to all the NPH1 deletions. The genetic heterogeneity and the high frequency of NPH1 patients with two null alleles severely hindered the identification of small intragenic mutations which can pinpoint the responsible gene. Thus, to identify the morbid locus, we decided to characterize the NPH1 compound heterozygous patients harbouring a deletion on one allele, and presumably a point mutation on the second one. Therefore, the newly discovered gene was scanned for point mutations only in the five patients for whom a large deletion of one allele had been demonstrated by PFGE (8). In two of them, we found deleterious mutations, which are expected to lead to the absence or a dramatic alteration of the protein product. These mutations were not found in a large number of control chromosomes.
Although the BENE gene has not been screened for the presence of mutations, and other potential genes might be present in the NPH1 interval, as indicated by the presence of two additional ESTs and a large number of sequences predicted to be exonic by exon prediction programs, our findings strongly suggest that the novel gene we have identified in the NPH1 region is the gene responsible for NPH. The encoded protein contains a putative SH3 domain. This domain, found in a great variety of membrane-associated or intracellular proteins, may mediate assembly of specific protein complexes via binding to proline-rich peptides (22). No other known motif and no other significant homology was found in any other part of the protein. It is thus difficult to speculate about the function of such a gene and its potential role in the disease. NPH is a tubulointerstitial disorder characterized by early occurrence of irregular thickening of the TBM and focal interstitial fibrosis (2). Because of these abnormalities, together with the failure of some anti-TBM antibodies to recognize the TBM, it has been suggested that NPH might be related to a defective TBM component (23,24). The absence of the protein product encoded by the gene identified here might interrupt a signalling pathway that normally regulates expression of specific TBM components. Characterisation of the spatiotemporal pattern of expression of this protein, as well as biochemical studies, should provide some insights into the pathogenesis of the disease.
The refined analysis of the deletions revealed at least two types of deletions, in contrast to previous analyses of large NotI-digested DNA fragments (8). The sequencing of the complete NPH1 deletion regions will facilitate the identification of all deletion breakpoints so that the rearrangement mechanism might be understood.
Agarose-embedded DNA from 24 patients was digested with SfiI. Electrophoresis was carried out in 1% agarose gels in 0.5× TBE buffer using a CHEF DRII system 5 (Biorad) for 24 h at 200 V with a 2-15 s switch time. Gel blotting and hybridizations were performed as described (25).
cDNA selection was performed as described (26,27). In brief, BAC 187E16 DNA (400 ng) was heat denatured, loaded onto 25 mm² nylon discs (Hybond N+, Amersham France), and incubated in a quench solution containing pBeloBAC11 (0.5 µg/µl), bacterial host strain DH10B/r (0.5 µg/µl), mitochondrial human (0.5 µg/µl), ribosomal human pR5.8 and pR7.3 (0.8 µg/µl) and human repetitive (0.5 µg/µl) DNA. cDNA inserts from a foetal kidney library (Clontech) were PCR amplified, gel purified, preblocked using the same quench solution and hybridized to the immobilized BAC DNA. Membranes were washed and the trapped cDNA was eluted, amplified and subjected to a second round of selection. Reamplified cDNA was then cloned into pCR™2.1 (TA cloning Kit, Invitrogen) and sequenced.
IMAGE clones were obtained from the I.M.A.G.E. Consortium (LLNL) (28). The sequences of the clones were used to select primers for PCR with a foetal kidney cDNA library (Clontech). Clone inserts were recovered by double digestion with EcoRI and NotI and used as probes after radioactive labelling (Rediprime Kit, Amersham). IMAGE clone ID 321596 was shown to be chimeric. Using selected primers, an amplified product outside the chimeric region was cloned into pGEM®-5Zf(+) (Promega) and used for screening.
Recombinant clones (5 × 10^5) of a λgt10 human foetal kidney cDNA library were plated, transferred and hybridized according to the manufacturer's instructions (Clontech). Duplicate positive clones were purified and subsequently amplified by the Expand™ Long Template PCR System (Boehringer Mannheim). The PCR product was then directly sequenced.
BAC library filters from Research Genetics (Huntsville, AL) were hybridized with probes derived from YAC-end cloning and Alu-PCR (765F2L and 804/6, respectively) (8). BAC-end cloning was performed as described for YAC clones (29), with primers chosen in the pBeloBAC11 sequence.
Three BAC clones were partially digested with CviJI, and DNA ranging in size from 7 to 10 kb was subcloned into plasmid pBC-SK+ (Stratagene). Approximately 2500 clones were sequenced at both ends, and assembled using Phred and Phrap software (P. Green, unpublished). With the orientation and distance constraints of end sequences from subclone inserts of a determined size, a scaffold of subclones was obtained for each BAC. The information on the size and position of intercontig gaps permitted selection of individual clones and extension of the sequence by primer walking.
Similarities with known genes and ESTs were identified with BLAST programs (12) using a non-redundant compilation of the EMBL and GenBank databases. ESTs sharing homologies with the genomic sequence were retrieved from UniGene (http://www.ncbi.nlm.nih.gov:80/Schuler/UniGene/). Amino acid comparisons were performed against protein sequences in the non-redundant Swiss-Prot, GenPept and PIR databases using BLASTX and BLASTP programs (12). Protein domain homologies and motifs were searched by screening the ProDom (18) and Prosite (19) databases. Exon predictions were performed using GRAIL (13) and BCMGeneFinder [option FEXH (14)] through their e-mail servers grail@ornl.gov and service@theory.bchs.uh.edu, respectively.
Exons were amplified by PCR using flanking intronic primers. Primers and conditions were selected with the OLIGO 5.0 program (NBI). The primers were as follows: exon 2: 5'-CTAAGGCGATATGGTATTTA-3' and 5'-ATGTAAGTGCGGTTCCTGTA-3'; exon 3: 5'-TTTCTGGTTCTGATAATAGA-3' and 5'-GTATAGGAAGAGATGTTTTA-3'; exon 4: 5'-AAATTAGGAAACTGAAATTA-3' and 5'-CATTAAAGCTATTGGTGATA-3'; exon 5: 5'-TGTATATTATTTCAAGTAGT-3' and 5'-GTATGGACATCGACCCTTAG-3'; exon 6: 5'-GATTATTGAATTTTATTTAA-3' and 5'-TGTTTTATTAAAAGCGAAAA-3'; exon 7: 5'-GCATTAGTTAAAAAGCACTT-3' and 5'-AAATGTTTCCTAAACCTACT-3'; exon 8: 5'-CATAACCTGACCTGACTCAC-3' and 5'-CAATGAGAATGTTTCCAAGT-3'; exon 9: 5'-TATAGAGATGCAGAAAC-3' and 5'-GAAAAATTAGACGTGGATCT-3'; exon 10: 5'-GATTTGGAGTTTCTTTCTTT-3' and 5'-CTATGACAAAATCTGGAAGA-3'; exon 12: 5'-GGTGACATTTCAAAGAGCTT-3' and 5'-GAAATTCACTCACTCCACTC-3'; exon 14: 5'-GGACTTGGTATGTGCTTATA-3' and 5'-CCTGAGGTATCAAGAGTCTA-3'; exon 15: 5'-TACATGCCCACAGCTTATAT-3' and 5'-ACCTCTCAGATGCTTCTATT-3'; exon 18: 5'-AAATCATTTGGCACAATAAT-3' and 5'-ATAAGCCAGCAGGTTTCCAT-3'; exon 19: 5'-AAGGACTTGTTACTACTTGG-3' and 5'-ACTGCAAATATGGAGTTCAG-3'. Following denaturation at 95°C for 3 min, PCR was performed with 30 cycles of 1 min at 94°C, 1 min at the annealing temperature (40°C for exon 3, 44°C for exons 4-6 and 18, 46°C for exons 2, 9 and 10, and 50°C for exons 7, 8, 12, 14 and 15) and 1 min at 72°C, followed by a final extension at 72°C for 10 min. PCR products were purified with the Wizard PCR Preps DNA Purification System (Promega) and sequenced.
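As a rough illustration of why such low annealing temperatures are plausible for these AT-rich intronic primers, the Python sketch below computes GC content and a melting-temperature estimate for two of the primers listed above. It uses the Wallace rule of thumb, Tm = 2(A+T) + 4(G+C), which is an assumption made here for illustration; the authors selected primers and conditions with OLIGO 5.0, whose calculation is not reproduced. The function names are hypothetical.

```python
def gc_fraction(primer: str) -> float:
    """Fraction of G and C bases in a primer."""
    primer = primer.upper()
    return (primer.count("G") + primer.count("C")) / len(primer)


def wallace_tm(primer: str) -> int:
    """Rule-of-thumb melting temperature (deg C) for short oligonucleotides."""
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


# Forward primers for exons 2 and 3, copied from the list above
primers = {
    "exon 2 forward": "CTAAGGCGATATGGTATTTA",
    "exon 3 forward": "TTTCTGGTTCTGATAATAGA",
}

for name, seq in primers.items():
    print(f"{name}: length={len(seq)}, GC={gc_fraction(seq):.0%}, "
          f"Wallace Tm={wallace_tm(seq)} C")
```

Estimates of this kind are only a starting point; annealing temperatures are usually set some degrees below the estimated Tm and then adjusted empirically.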
Volumes of 8 µl of the PCR reactions were mixed with 1 µl of sterile water and 3.5 µl of 95% formamide, 0.025% xylene cyanol. Samples were heated at 98°C for 5 min, then placed immediately on ice. Electrophoresis was run on the GenePhor Electrophoresis Unit using the GeneGel Excel 12.5/24 Kit (Pharmacia) for 3 h at 200 V, 10 mA and 3 W. Staining was performed in a GeneStain Automated Gel Stainer using the PlusOne Silver Staining Kit (Pharmacia).
We thank the patients, the families and the physicians who have contributed to this project. We gratefully acknowledge C. Petit, J.S. Beckmann and P. Brooks for helpful discussion and critical review of the manuscript, A. Munnich for his invaluable support, D. Samson for aid with computer analysis, I. Bordelais and F. Gary for technical assistance. This study was supported by the Association Française contre les Myopathies (AFM), the MENESR (ACC-SV95), the Assistance Publique-Hôpitaux de Paris, the Fondation pour la Recherche Médicale, and the Association pour l'Utilisation de Rein Artificiels.
*To whom correspondence should be addressed. Tel: +33 1 44 49 50 98; Fax: +33 1 44 49 02 90; Email: antignac@necker.fr
+The first two authors contributed equally to this work.
Human Molecular Genetics
A novel gene that encodes a protein with a putative src homology 3 domain is a candidate gene for familial juvenile nephronophthisis
Introduction
Results
Characterization of deletions by pulsed-field gel electrophoresis (PFGE)
Construction of a BAC contig
Sequencing the NPH1 candidate region
Identification of a novel gene in the NPH1 interval
Mutations in patients affected with NPH
Discussion
Materials And Methods
Pulsed field gel electrophoresis
Direct cDNA selection
Probe design and cDNA screening
BAC library screening and BAC-end probe generation.
Pairwise end sequencing
Sequence analysis and gene prediction
Mutation detection
SSCP analysis
Acknowledgements
References
Exon number   Exon size (bp)   3' splice site      5' splice site       Intron size (kb)
1             ND               -                   CAACAGgtatggtcg      3.7
2             74               tctccatagGTTGAT     TCAAAGgtaaagtat      21.7
3             61               ttaatttagATGTAT     AGCAAAgtaagtatt      1.1
4             125              ctctcatagGCTGAT     AACTGAgtatgcttc      8.4
5             193              tttttgtagAGTTGG     TTTAAGgtaggtaga      1.2
6             102              tgctttaagAAAGGG     CTAGAGgttagtctt      3.3
7             104              gttttttagCCTTAT     GCAAAGgtacaaaag      0.3
8-4A          211              tctccccagAACTGA     CTCCAAgtaacattt      1.4
8-6A          43               tctccccagAACTGA     TCAGAGgcgggcatc      1.6
9-4A          85               cttcagcagATAAAC     AGGAAGgtaatgcgt      1.3
9-6A          88               tgtcttcagCAGATA     AGGAAGgtaatgcgt      1.3
10            95               ttttttcagGGAATC     GGCACTgtaagtata      1.3
11            129              ttctttcagATTAGG     AATAAGgtaaggtta      9.9
12            75               tcatttcagGTTCTG     CCCCAGgtaagtaat      2.1
13            111              tcatttcagGTTACT     CGCAATgtatgtccg      1.0
14            83               atttcctagTCAACT     AGCAAAgtaagttggct    2.1
15            77               cttttatagAACTTA     GAAGAGgtatggctctc    0.8
16            100              tcttaaaagCACACG     ACTAAGgtgggtacc      11.7
17            113              tctttaatagTCTACT    GTACTGgtaagtggg      2.4
18            74               tgcatttcagATTTAA    CTCAGGgtgggtagc      3.5
19            45               gctctttagAGTTCG     GAGAAGgtaaaatat      0.8
20            >273             ttttcccagAGAGAC
Present address: Department of Pediatrics, Philipps University, 35033 Marburg, Germany
Copyright
Oxford University Press, 1997
