Comparative analysis of the polycystic kidney disease 1 (PKD1) gene reveals an integral membrane glycoprotein with multiple evolutionary conserved domains
Richard Sandford*, Barbara Sgotto, Sam Aparicio, Sydney Brenner, Mark Vaudin1,+, Richard K. Wilson1, Stephanie Chissoe1, Kym Pepin1, Alex Bateman2, Cyrus Chothia2, Jim Hughes3 and Peter Harris3
Department of Medicine, Addenbrooke's Hospital, Cambridge, CB2 2QQ, UK, 1Genome Sequencing Centre, Washington University School of Medicine, St. Louis, MO 63108, USA, 2MRC Laboratory of Molecular Biology, Hills Road, Cambridge, CB2 2QH, UK and 3MRC Molecular Haematology Unit, Institute of Molecular Medicine, John Radcliffe Hospital, Headington, Oxford, OX3 9DU, UK
Received May 6, 1997; Revised and Accepted July 4, 1997
PKD1 is the major locus for the common genetic disorder autosomal dominant polycystic kidney disease (ADPKD). The predicted product of the human PKD1 gene, polycystin, is a large protein with a unique arrangement of extracellular domains and multiple putative transmembrane regions. The precise function of polycystin remains unclear, and too few mutations are known to define its key structural and functional domains. To refine the structure of this protein we have cloned the genomic region encoding the Fugu PKD1 gene. Fugu PKD1 spans 36 kb of genomic DNA and is more complex than the human gene, with 54 exons compared with 46. The predicted Fugu and human proteins share less homology than is seen in similar comparative studies, with 40% identity and 59% similarity. However, key structural motifs are maintained, including the leucine-rich repeats (LRR), C-type lectin and LDL-A-like domains, and 16 PKD repeats. A region of homology with the sea urchin REJ protein was also confirmed in Fugu and found to extend over 1000 amino acids. Several highly conserved intra- and extracellular regions that have no known sequence homologies, but are likely to be functionally important, were detected. The likely structure of the membrane-associated region has been refined, and over part of this area it shows similarity to the PKD2 protein and to voltage-gated Ca2+ and Na+ channels. The overall protein structure has therefore been clarified, and this structure, derived by comparative analysis, will form the basis for functional studies of polycystin and its individual domains.
Mutations at the PKD1 locus account for 85% of cases of the common genetic disorder autosomal dominant polycystic kidney disease (ADPKD) (1). The pathogenesis of ADPKD, a systemic condition characterised by progressive renal cystic disease and other extrarenal cystic and noncystic manifestations, remains poorly understood. A wide variety of abnormalities in epithelial cell growth and differentiation have been described in ADPKD and in other human and animal models of cystic kidney disease, but the precise molecular mechanisms of cyst development remain unidentified (2). The cloning of PKD1 (3) provides an important opportunity to determine the primary molecular events in cystogenesis and the pathways that maintain normal epithelial cell structure and differentiation.
The PKD1 gene encodes a novel protein, polycystin, whose primary structure predicts a large membrane-spanning glycoprotein with multiple domains that may be involved in cell-cell or cell-matrix interactions (4). These domains include a leucine-rich repeat (LRR), a carbohydrate-binding domain, multiple immunoglobulin-like (PKD) domains and a region with similarity both to fibronectin type III domains and to a newly described sea urchin protein, the Receptor for Egg Jelly (REJ), which is involved in sperm-egg interactions (4,5). Much of the 14.1 kb PKD1 transcript is reiterated several times elsewhere on chromosome 16, which has made full characterisation of the gene and of pathogenic mutations very difficult (3). Immunolocalisation studies with anti-polycystin antibodies favour expression in renal tubular and cystic epithelia, although published reports differ (6,7). The precise function of polycystin therefore remains unclear: there is no consensus on either the protein's structure or its subcellular localisation, and too few mutations are known to define its key structural and functional domains.
To refine structural predictions and identify potential functional regions of polycystin, we have examined the conservation of the PKD1 protein sequence across vertebrate evolution using the Fugu genome (8). The small size and simplicity of the Fugu genome allow homologues and other conserved sequences to be identified rapidly, permitting the analysis of genes across 400 million years of vertebrate evolution (9). Although many Fugu genes have highly conserved coding sequences with minimal or absent noncoding similarity, the evolutionary distance between teleosts and man can be exploited: the accumulated sequence divergence makes regions that remain highly conserved stand out as potentially functionally important (10). The complete sequence of the Fugu homologue of the PKD1 gene has been determined by analysis of two previously reported cosmid clones (11), which were shown to contain the entire PKD1 gene together with homologues of TSC2 and SSTR5, defining a conserved synteny group. Both cosmids were sequenced to generate a single contig with homology to the entire human PKD1 gene. The low overall sequence identity has allowed highly conserved domains to be clearly defined. The protein structure has been clarified, including confirmation and refinement of the domain structure, the membrane-associated region and a large region with homology to the sea urchin REJ protein (5). In addition, we report potential novel functional domains identified by sequence homology, which will permit the rational design of domain-specific reagents for their in vivo and in vitro functional analysis.
The sequence of the Fugu PKD1 gene was determined by analysis of 65 kb of contiguous genomic sequence derived from cosmid clones 295C6 and 48D10 (11). Sequence similarity to human PKD1 was initially identified using BLAST (12) and FASTA (13), and regions of the contig were annotated with the equivalent human exon. Homology to human PKD1 exons 1 and 46 confirmed that the entire Fugu homologue was present on this contig. The human polycystin sequence was also used to search for regions of similarity in a three-frame translation of the Fugu genomic sequence using DOTTER (14) and MacVector™. The Fugu gene structure was completed using Xgrail (15) to predict further candidate exons, and by RT-PCR with primers from regions of significant homology to confirm intron-exon boundaries and exons with low homology. RT-PCR did not reveal any significant alternative splicing in RNA isolated from multiple tissues, although brain tissue was unavailable (unpublished data).
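The three-frame search described above can be illustrated with a minimal Python sketch. It translates a genomic sequence in the three forward frames using the standard genetic code, then lists exact k-residue word matches between a query protein and a translation; runs of consecutive hits along one diagonal are the lines a dot-plot program displays. This is a simplification under stated assumptions: DOTTER scores sliding windows with a substitution matrix rather than requiring exact words, and the function names and word length `k` are hypothetical choices for illustration.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, TTG, TCT, ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str, frame: int) -> str:
    """Translate one forward frame (0, 1 or 2); unknown codons become 'X', stops '*'."""
    seq = dna.upper()[frame:]
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def three_frame_translation(dna: str) -> list[str]:
    """Translations of the three forward reading frames."""
    return [translate(dna, frame) for frame in range(3)]


def word_hits(query: str, target: str, k: int = 3) -> list[tuple[int, int]]:
    """Return (query_pos, target_pos) pairs where a k-residue word matches exactly.

    Plotted as points, consecutive hits on one diagonal form the lines of a dot plot.
    """
    index: dict[str, list[int]] = {}
    for q in range(len(query) - k + 1):
        index.setdefault(query[q:q + k], []).append(q)
    hits = []
    for t in range(len(target) - k + 1):
        for q in index.get(target[t:t + k], []):
            hits.append((q, t))
    return sorted(hits)


if __name__ == "__main__":
    genomic = "ATGGCCAAAGGG"  # toy sequence, not Fugu data
    print(three_frame_translation(genomic))  # ['MAKG', 'WPK', 'GQR']
    print(word_hits("MAKGW", "PPMAKGWQ"))    # [(0, 2), (1, 3), (2, 4)]
```

In practice each of the three translations would be passed as `target` with the human protein as `query`, and the reverse strand would be handled the same way after reverse-complementing.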
Unlike many other reported Fugu genes (9), Fugu PKD1 showed only a minimal reduction in genomic size (36 kb compared with 52 kb for human) and was more complex, with 54 exons compared with 46 in the human gene (Fig. 1). The reduction in genomic size was accounted for solely by the difference in the size of intron 1 (6.7 kb versus 17 kb). Additional introns were found in exons 5, 11, 15 and 23; the 3.6 kb human exon 15 is represented by five Fugu exons (15a-15e), the largest being exon 15a (2.3 kb). Apart from these novel introns, all intron splice sites were in the same position and phase as in the human gene. Pairwise alignment and GC-content analysis of intron sequences found no evidence for other alternatively spliced exons analogous to exon 31 of the TSC2 gene (16). However, a region of 90% identity over 31 nucleotides was identified in intron 1 of both PKD1 sequences (Fig. 1), although RT-PCR provided no evidence that it forms part of an additional transcribed exon. Apart from a complex repeat lying 8 kb 5' of Fugu PKD1, no repetitive elements were identified in the 65 kb of Fugu genomic sequence. No region similar to the long polypyrimidine tract in intron 21 of the human gene was found (17).
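The intron analyses above (GC content, and the 31-nucleotide stretch of 90% identity in intron 1) rest on simple per-window calculations. The sketch below is a hypothetical illustration, not the authors' procedure: it assumes the two intron sequences have already been aligned, with '-' marking gaps, and reports every window whose identity meets a threshold. For a 31-nt window, 28 matches give 28/31 ≈ 90.3%, whereas 27 matches give only ≈ 87.1%.

```python
def gc_content(seq: str) -> float:
    """Fraction of G+C among unambiguous bases (A, C, G, T); gaps and Ns are ignored."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        return 0.0
    return sum(b in "GC" for b in bases) / len(bases)


def conserved_windows(aln_a: str, aln_b: str, window: int = 31,
                      min_identity: float = 0.9) -> list[tuple[int, float]]:
    """Slide a fixed-width window along two pre-aligned sequences.

    Returns (start_column, identity) for each window in which the fraction of
    columns carrying the same non-gap base in both sequences is >= min_identity.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    aln_a, aln_b = aln_a.upper(), aln_b.upper()
    hits = []
    for start in range(len(aln_a) - window + 1):
        cols = zip(aln_a[start:start + window], aln_b[start:start + window])
        matches = sum(1 for x, y in cols if x == y and x != "-")
        identity = matches / window
        if identity >= min_identity:
            hits.append((start, identity))
    return hits


if __name__ == "__main__":
    human = "A" * 31          # toy aligned sequences, not real intron data
    fugu = "CCC" + "A" * 28   # 28 of 31 positions identical
    print(round(gc_content("GGCCAATT"), 2))  # 0.5
    print(conserved_windows(human, fugu))    # one window at column 0, identity ~0.903
```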
The 13.7 kb Fugu PKD1 open reading frame predicts a protein of 4572 aa, compared with 4302 aa for the human sequence. The predicted molecular weight of 500 kDa is likely to be a considerable underestimate, as many of the predicted domains are heavily glycosylated in other species (5,18) and >50% of the potential N-glycosylation sites predicted in the Fugu sequence correspond to sites identified in the human sequence (4). The low overall sequence identity with human polycystin (40% identity, 59% similarity) readily allowed regions of particular homology to be identified (Fig. 2). These included the LRR, PKD domains III and X, the REJ domain, several cytoplasmic loops in the transmembrane (TM) region and the juxtamembrane region of the cytoplasmic tail.
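The figures quoted above can be cross-checked with simple arithmetic, and potential N-glycosylation sites are conventionally predicted from the N-X-S/T sequon (X not proline). The sketch below assumes an average residue mass of about 110 Da, a common rule of thumb rather than a value from this study, and an ORF length that includes the stop codon. Under those assumptions 4572 residues correspond to 4572 × 3 + 3 = 13,719 nt, consistent with the 13.7 kb ORF, and to roughly 4572 × 110 Da ≈ 503 kDa of unglycosylated polypeptide, in line with the predicted 500 kDa. The helper names are hypothetical, and the sequon scan ignores finer rules (for example, the reduced use of N-X-S/T-P sites).

```python
import re

MEAN_RESIDUE_MASS_DA = 110.0  # rule-of-thumb average residue mass; an assumption

# Zero-width lookahead so that overlapping sequons (e.g. "NNST") are all reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def protein_length_from_orf(orf_nt: int, includes_stop: bool = True) -> int:
    """Number of amino acids encoded by an ORF of orf_nt nucleotides."""
    codons = orf_nt // 3
    return codons - 1 if includes_stop else codons


def approx_mass_kda(n_residues: int) -> float:
    """Crude unglycosylated mass estimate in kDa."""
    return n_residues * MEAN_RESIDUE_MASS_DA / 1000.0


def n_glycosylation_sequons(protein: str) -> list[int]:
    """0-based positions of Asn residues in N-X-S/T motifs with X != Pro."""
    return [m.start() for m in SEQUON.finditer(protein.upper())]


if __name__ == "__main__":
    print(protein_length_from_orf(13719))         # 4572
    print(round(approx_mass_kda(4572), 1))        # 502.9
    print(n_glycosylation_sequons("MNASNPSNVT"))  # [1, 7]
```

Comparing the sequon positions of two aligned sequences column by column would then give the fraction of sites shared between species.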
Comparative analysis of the PKD1 gene has demonstrated an evolutionarily conserved sequence that forms part of a conserved synteny group (11). Its conservation across 400 million years of vertebrate evolution, together with the conservation of its domain structure, suggests that it has an important role in normal cellular physiology. Whether this domain structure is present in invertebrate homologues remains to be determined. The low overall sequence identity with human polycystin (40% identity, 59% similarity) suggests that tertiary structure, rather than absolute sequence identity, is necessary for the function of much of the protein. The low identity also readily identifies highly conserved but unique areas of the protein that may interact with other proteins or define critical functional and structural domains. Compared with the previously described structure of polycystin (4,5,17,18), several new domains are clearly apparent. The fibronectin type III domains and the first two TM domains of the original model of polycystin have been replaced by a single REJ domain of ~1000 aa, based on the data of Moy et al. (5) and homology with the Fugu sequence. Several cytoplasmic loops of the membrane-associated region of polycystin and the juxtamembrane portion of the cytoplasmic tail are also identified by their level of similarity (Table 1).
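Overall figures such as 40% identity and 59% similarity, and the regions that stand out against that background, are derived from a pairwise alignment. As a minimal sketch, assuming the two polycystin sequences are already aligned with '-' for gaps, the code below computes identity and similarity over ungapped columns and a sliding-window identity profile whose peaks mark candidate conserved regions. The grouping of "similar" residues is a hypothetical simplification; alignment programs normally score similarity with a substitution matrix such as BLOSUM62, and the denominator used for percentage identity differs between tools.

```python
# Hypothetical groups of physico-chemically similar residues (illustration only).
SIMILAR_GROUPS = ["ILMV", "FWY", "KRH", "DE", "ST", "NQ", "AG"]
GROUP_OF = {aa: group for group in SIMILAR_GROUPS for aa in group}


def is_similar(a: str, b: str) -> bool:
    """Identical residues, or residues from the same hypothetical group."""
    return a == b or (a in GROUP_OF and GROUP_OF[a] == GROUP_OF.get(b))


def identity_similarity(aln_a: str, aln_b: str) -> tuple[float, float]:
    """Fractions of identical and similar residues over columns with no gap."""
    pairs = [(x, y) for x, y in zip(aln_a, aln_b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0, 0.0
    identical = sum(x == y for x, y in pairs)
    similar = sum(is_similar(x, y) for x, y in pairs)
    return identical / len(pairs), similar / len(pairs)


def identity_profile(aln_a: str, aln_b: str, window: int) -> list[float]:
    """Identity within each window of alignment columns; gap columns count as mismatches.

    Windows scoring well above the overall average mark candidate conserved regions.
    """
    profile = []
    for start in range(len(aln_a) - window + 1):
        cols = zip(aln_a[start:start + window], aln_b[start:start + window])
        profile.append(sum(x == y and x != "-" for x, y in cols) / window)
    return profile


if __name__ == "__main__":
    a, b = "MKVLA", "MRILA"             # toy alignment, not polycystin
    print(identity_similarity(a, b))    # (0.6, 1.0)
    print(identity_profile(a, b, 3))
```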
The presence of leucine-rich repeats, a carbohydrate-binding domain and a region with similarity to an LDL-A domain all suggest that extracellular protein-protein interactions form the basis of polycystin function. The multiple Ig-like (PKD) domains may point the same way, but their weak similarity to known Ig sets, and the lack of a known function for any other similar structure, prevent precise functional predictions from being made on the basis of their sequence alone. The weak similarity of the C-terminal region to the Caenorhabditis elegans predicted protein ZK945.9 likewise provides no clue to the function of polycystin, but homology to the PKD2 predicted protein and to the sea urchin REJ protein suggests that polycystin may regulate ion transport in epithelial cells (5,21). The REJ protein is a membrane glycoprotein, also containing two C-type lectin domains, that interacts with egg extracellular glycoproteins to regulate the sperm acrosome reaction, a process characterised by alterations in Na+ and Ca2+ channel function (5). Because the PKD2 protein is homologous both to polycystin and to a family of voltage-gated Ca2+ channels, a physiological role for polycystin in regulating transmembrane Ca2+ fluxes is suggested.
Table 1. Positions of the PKD1 domains based on comparison between the Fugu and human (GenBank accession number L33243) sequences

Positions of putative polycystin domains (amino acid residues)

| Domain | Human (aa) | Fugu (aa) |
|---|---|---|
| Amino flank | 33-71 | 48-88 |
| LRR (leucine-rich repeat) | 72-125 | 89-141 |
| Carboxy flank | 126-180 | 142-195 |
| PKD repeat I | 273-356 | 279-361 |
| C-type lectin | 403-532 | 408-535 |
| LDL-A-related motif | 639-671 | 646-677 |
| PKD repeats II-XVI | 851-2145 | 887-2202 |
| REJ module | 2146-3109 | 2203-3234 |
| TM1 | 3075-3095 | 3200-3220 |
| TM2 | 3281-3301 | 3406-3426 |
| TM3 | 3323-3343 | 3448-3468 |
| TM4 | 3559-3579 | 3693-3713 |
| TM5 | 3582-3602 | 3716-3736 |
| TM6 | 3669-3689 | 3803-3823 |
| TM7 | 3895-3915 | 4015-4035 |
| TM8 | 3934-3953 | 4055-4073 |
| TM9 | 3994-4014 | 4114-4134 |
| TM10 | 4027-4045 | 4147-4165 |
| TM11 | 4084-4104 | 4211-4231 |
| Coiled-coil | 4193-4248 | 4374-4411 |

Residue numbers for the domains predicted in Fugu polycystin are given alongside the revised predictions for the human sequence.
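The membrane topology implied by Table 1 can be laid out programmatically. The sketch assumes, as in the model of an extracellular N-terminus and a short cytoplasmic C-terminus, that each listed TM segment crosses the membrane exactly once, so the intervening loops alternate sides; with 11 segments the C-terminal tail ends up cytoplasmic. The coordinates are the human values from Table 1 and 4302 aa is the human polycystin length given in the text; the function name is hypothetical and this is not the prediction method used in the paper.

```python
# Human TM1-TM11 coordinates from Table 1 (1-based, inclusive).
HUMAN_TM = [
    (3075, 3095), (3281, 3301), (3323, 3343), (3559, 3579), (3582, 3602),
    (3669, 3689), (3895, 3915), (3934, 3953), (3994, 4014), (4027, 4045),
    (4084, 4104),
]
HUMAN_LENGTH = 4302

FLIP = {"extracellular": "cytoplasmic", "cytoplasmic": "extracellular"}


def topology(tm_segments, length, n_terminus="extracellular"):
    """Split a protein into loops and TM segments, alternating loop sides.

    Assumes segments are sorted, non-overlapping and each spans the membrane once.
    Returns (start, end, label) tuples with 1-based inclusive coordinates.
    """
    regions = []
    side, pos = n_terminus, 1
    for i, (start, end) in enumerate(tm_segments, 1):
        if start > pos:
            regions.append((pos, start - 1, side))
        regions.append((start, end, f"TM{i}"))
        side, pos = FLIP[side], end + 1
    if pos <= length:
        regions.append((pos, length, side))
    return regions


if __name__ == "__main__":
    for start, end, label in topology(HUMAN_TM, HUMAN_LENGTH):
        print(f"{label:>13} {start:>5}-{end:<5} ({end - start + 1} aa)")
```

Passing the Fugu column of Table 1 with a length of 4572 gives the corresponding Fugu layout.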
Previous descriptions of human polycystin identified a C-terminal region with multiple hydrophobic domains (17,18), with a suggested structure of 11 transmembrane (TM) domains (3). This predicted an extracellular location for the N-terminal region, supported by the presence of a signal peptide, the location of known extracellular domains and possible N-glycosylation sites, and a short cytoplasmic COOH-terminus. The hypothesis that polycystin is a multipass membrane protein is greatly strengthened by comparative analysis with the Fugu homologue across vertebrate evolution. The revised model of polycystin also includes 11 TM domains (TM I-XI) but redefines the extent of the extracellular and cytoplasmic regions (Table 1). Interestingly, the homologous PKD1 TM9 and PKD2 TM4 regions show similarity to the voltage-sensing α-helices (TM4) of voltage-gated channel subunits (Fig. 3b), with a positively charged amino acid at every third residue. This similarity strengthens the predicted TM topology and suggests that these regions of polycystin and PKD2 could be subunits of a voltage-gated channel. It should be noted, however, that the pattern of positively charged residues is incomplete in the human PKD molecules, and that a similar structure is found in some cyclic nucleotide-gated channels (Fig. 3b), which are only weakly voltage dependent, suggesting an ancestral rather than a functional relationship.
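The voltage-sensor feature described above, a positively charged residue at every third position, can be scanned for with a few lines of code. This is a hedged sketch: it treats only Lys and Arg as positive (His is ignored), requires an unbroken in-phase run, and uses a hypothetical minimum of four charges. Because a single missing charge ends a run, it also shows how an incomplete pattern, as in the human PKD molecules, weakens the match. The example sequences are synthetic, not polycystin or channel sequences.

```python
POSITIVE = set("KR")  # His deliberately excluded in this sketch


def periodic_positive_runs(seq: str, period: int = 3, min_charges: int = 4):
    """Find runs of K/R residues spaced exactly `period` apart.

    Returns (start_index, n_charges) for each maximal in-phase run with at
    least `min_charges` positive residues (0-based start index).
    """
    seq = seq.upper()
    runs = []
    for start, aa in enumerate(seq):
        if aa not in POSITIVE:
            continue
        # Skip positions that continue a run already counted from an earlier start.
        if start >= period and seq[start - period] in POSITIVE:
            continue
        n, pos = 0, start
        while pos < len(seq) and seq[pos] in POSITIVE:
            n += 1
            pos += period
        if n >= min_charges:
            runs.append((start, n))
    return runs


if __name__ == "__main__":
    synthetic = "ARLLRLLKLLRA"                # K/R at positions 1, 4, 7, 10
    print(periodic_positive_runs(synthetic))  # [(1, 4)]
    broken = "ARLLRLLALLRA"                   # charge at position 7 missing
    print(periodic_positive_runs(broken))     # []
```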
Other conserved sequences found in polycystin include a possible coiled-coil domain and several phosphorylation sites in the cytoplasmic tail. Analysis of Fugu polycystin predicts a possible coiled-coil in the corresponding region, although with a lower level of certainty (Fig. 4). Closer inspection of this predicted motif shows a limited number of heptad repeats interrupted by a partial unit, which may disrupt the amphipathic structure. Furthermore, the first heptad contains a proline residue, which is also likely to disrupt the α-helical structure. Preliminary evidence does, however, suggest that the coiled-coil domain is functional and mediates interaction with the PKD2 protein (25). If this interaction with PKD2 is physiological, it further strengthens the prediction that polycystin has a role in ion channel function. The functional significance of the Fugu coiled-coil is therefore unclear, and it would be interesting to test its ability to interact with PKD2. It is unknown whether polycystin requires phosphorylation for activation. The role of the highly conserved juxtamembrane region of the cytoplasmic tail is also unknown, but its location suggests an important function in signalling, binding to cytoskeletal proteins or membrane targeting (26).
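The features discussed above (hydrophobic residues at the core positions of each heptad, and helix-breaking prolines) can be checked by hand once a heptad register is assigned. The sketch below is not a coiled-coil predictor such as COILS, which scores sequence windows against a position-specific profile; it is a hypothetical illustration that labels positions a-g from a user-supplied register offset, reports the fraction of a and d positions occupied by hydrophobic residues, and lists any prolines. The hydrophobic residue set and the example sequences are assumptions for illustration.

```python
HEPTAD = "abcdefg"
CORE_HYDROPHOBIC = set("LIVMAF")  # assumed set of typical a/d residues


def heptad_report(segment: str, offset: int = 0):
    """Summarise a putative coiled-coil segment under a fixed heptad register.

    offset is the heptad position of segment[0] (0 = 'a', 3 = 'd', ...).
    Returns (fraction of a/d positions that are hydrophobic, proline indices).
    """
    segment = segment.upper()
    core = [aa for i, aa in enumerate(segment) if HEPTAD[(i + offset) % 7] in "ad"]
    fraction = sum(aa in CORE_HYDROPHOBIC for aa in core) / len(core) if core else 0.0
    prolines = [i for i, aa in enumerate(segment) if aa == "P"]
    return fraction, prolines


if __name__ == "__main__":
    ideal = "LKELEKKLEELEKK"                # synthetic two-heptad segment
    print(heptad_report(ideal))             # (1.0, [])
    with_pro = "LPELEKKLEELEKK"             # proline placed in the first heptad
    print(heptad_report(with_pro))          # (1.0, [1])
    print(heptad_report(ideal, offset=1))   # (0.0, []): wrong register
```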
An analysis of the likely 5' and 3' untranslated regions of the PKD1 gene and of the intergenic region between PKD1 and TSC2 identified no regions of conserved nucleotide sequence. However, a region of 90% identity over 31 nucleotides was identified in intron 1 (Fig. 1). Its function is unknown, and RT-PCR provided no evidence that it forms part of an additional transcribed exon. Whether it has a regulatory role in PKD1 gene expression remains to be determined; such regulatory sequences can be highly conserved and have been identified successfully by sequence comparison between Fugu and human (10).
Comparative analysis of the PKD1 gene and its protein product, polycystin, has been successfully employed to confirm and modify previous structural predictions and also to identify highly conserved regions that may function as regulatory sequences or highlight new, unique domains. It is anticipated that this study will allow the rational design of further reagents to study the in vivo function of polycystin.
Fugu cosmid clones were identified as previously described (11). Fugu RNA samples were prepared by guanidinium thiocyanate-phenol-chloroform extraction (27), and first-strand synthesis and RT-PCR reactions were carried out according to standard protocols (28) using Fugu PKD1-specific primers.
To prepare random subclone libraries, cosmid DNA was purified and sonicated, and the resulting DNA fragments were end-repaired, size-selected and ligated into a blunt-ended M13 cloning vector as described (29). Electrocompetent Escherichia coli were transformed with the ligation and plated onto agar plates. Single-stranded M13 templates were purified from the clones using the ThermoMAX modified PEG/Triton protocol (30). DNA templates were sequenced by fluorescent dye-primer cycle sequencing with Sequitherm DNA polymerase (31). Fluorescent sequencing reactions were electrophoresed on ABI 373A sequencers equipped with the Stretch upgrade, and the sequence data were automatically collected and analysed (29). DNA sequence data were automatically processed using the OTTO script (L. Hillier, unpublished), which performs quality evaluation, vector excision and initial assembly. Base-calling and sequence assembly were also performed using PHRED and PHRAP (P. Green, unpublished). The sequence was manually edited using the XBAP interface (32). Sequence gaps were closed, the sequence was completed on both strands and ambiguities were resolved.
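The quality evaluation step described above relies on per-base quality values of the kind produced by PHRED, where a quality Q corresponds to an estimated error probability P = 10^(-Q/10) (Q20 ≈ 1 error in 100 bases, Q30 ≈ 1 in 1000). The sketch below converts qualities to error probabilities, sums them into an expected error count for a read, and trims low-quality read ends. The threshold and the simple end-trimming rule are hypothetical assumptions; this is not the OTTO script or PHRED's own trimming procedure.

```python
import math


def phred_to_error(q: float) -> float:
    """Error probability for a PHRED quality value: P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def error_to_phred(p: float) -> float:
    """PHRED quality for an error probability: Q = -10 log10(P)."""
    return -10 * math.log10(p)


def expected_errors(quals: list[int]) -> float:
    """Expected number of miscalled bases in a read."""
    return sum(phred_to_error(q) for q in quals)


def trim_ends(quals: list[int], min_q: int = 20) -> tuple[int, int]:
    """Return [start, end) spanning the first to the last base with Q >= min_q.

    Returns (0, 0) if no base passes; low-quality bases inside the span are kept.
    """
    good = [i for i, q in enumerate(quals) if q >= min_q]
    if not good:
        return 0, 0
    return good[0], good[-1] + 1


if __name__ == "__main__":
    quals = [5, 8, 25, 30, 30, 12, 4]
    print(trim_ends(quals))                      # (2, 5)
    print(round(expected_errors([20, 30]), 4))   # 0.011
```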
Sequence, structural and domain homology searches were carried out using BLAST, FASTA and Prosearch. Other analyses were performed as described in the text.
The Fugu PKD1 sequence has been submitted to GenBank under accession numbers AF013613 and AF013614.
The authors wish to thank Jim Hawkins and other members of the Genome Sequencing Center for technical assistance. This work was supported by a Medical Research Council Clinician Scientist Fellowship awarded to RS.
1 Gabow, P. A. (1993) Autosomal dominant polycystic kidney disease. N. Engl. J. Med., 329, 332-342.
2 Calvet, J. P. (1994) Injury and development in polycystic kidney disease. Curr. Opin. Nephrol. Hypertension, 3, 340-348.
3 The European Polycystic Kidney Disease Consortium (1994) The polycystic kidney disease 1 gene encodes a 14 kb transcript and lies within a duplicated region on chromosome 16. Cell, 77, 881-894.
4 Hughes, J., Ward, C. J., Peral, B., Aspinwall, R., Clark, K., San Millan, J. L., Gamble, V., and Harris, P. C. (1995) The polycystic kidney disease 1 (PKD1) gene encodes a novel protein with multiple cell recognition domains. Nature Genet., 10, 151-159.
5 Moy, G. W., Mendoza, L. M., Schulz, J. R., Swanson, W. J., Glabe, C. G., and Vacquier, V. D. (1996) The sea urchin sperm receptor for egg jelly is a modular protein with extensive homology to the human polycystic kidney disease protein, PKD1. J. Cell Biol., 133, 809-817.
6 Ward, C. J., Turley, H., Ong, A. C. M., Comley, M., Biddolph, S., Chetty, R., Ratcliffe, P. J., Gatter, K., and Harris, P. C. (1996) Polycystin, the polycystic kidney disease 1 protein, is expressed by epithelial cells in fetal, adult, and polycystic kidney. Proc. Natl. Acad. Sci. USA, 93, 1524-1528.
7 Griffin, M. D., Torres, V. E., Grande, J. P., and Kumar, R. (1996) Immunolocalization of polycystin in human tissues and cultured cells. Proc. Assoc. Am. Phys., 108, 185-197.
8 Brenner, S., Elgar, G., Sandford, R., Macrae, A., Venkatesh, B., and Aparicio, S. (1993) Characterisation of the pufferfish (Fugu) genome as a compact model vertebrate genome. Nature, 366, 265-268.
9 Elgar, G., Sandford, R., Aparicio, S., Macrae, A., Venkatesh, B., and Brenner, S. (1996) Small is beautiful: comparative genomics with the pufferfish (Fugu rubripes). Trends Genet., 12, 145-150.
10 Aparicio, S., Morrison, A., Gould, A., Gilthorpe, J., Chaudhuri, C., Rigby, P., Krumlauf, R., and Brenner, S. (1995) Detecting conserved regulatory elements with the model genome of the Japanese puffer fish, Fugu rubripes. Proc. Natl. Acad. Sci. USA, 92, 1684-1688.
11 Sandford, R., Sgotto, B., Burn, T., and Brenner, S. (1996) The tuberin (TSC2), autosomal dominant polycystic kidney disease (PKD1) and somatostatin type V receptor (SSTR5) genes form a synteny group in the Fugu genome. Genomics, 38, 84-86.
12 Altschul, S. F., Gish, W., Miller, W., Myers, E. W., and Lipman, D. J. (1990) Basic local alignment search tool. J. Mol. Biol., 215, 403-410.
13 Pearson, W. R., and Lipman, D. J. (1988) Improved tools for biological sequence analysis. Proc. Natl. Acad. Sci. USA, 85, 2444-2448.
14 Sonnhammer, E. Dotter. Sanger Centre, UK. Unpublished.
15 Xu, Y., Mural, R. J., Shah, M. B., and Uberbacher, E. C. Recognising exons in genomic sequence using GRAIL II. In Setlow, J. (ed.), Genetic Engineering Principles and Methods, Vol. 15. Plenum Press.
16 Maheshwar, M. M., Sandford, R., Nellist, M., Cheadle, J. P., Sgotto, B., Vaudin, M., and Sampson, J. R. (1996) Comparative analysis and genomic structure of the tuberous sclerosis 2 (TSC2) gene in human and pufferfish. Hum. Mol. Genet., 5, 131-137.
17 The American PKD1 Consortium (1995) Analysis of the genomic sequence for the autosomal dominant polycystic kidney disease (PKD1) gene predicts the presence of a leucine-rich repeat. Hum. Mol. Genet., 4, 575-582.
18 The International Polycystic Kidney Disease Consortium (1995) Polycystic kidney disease: the complete structure of the PKD1 gene and its protein. Cell, 81, 289-298.
19 Harpaz, Y., and Chothia, C. (1994) Many of the immunoglobulin superfamily domains in cell adhesion molecules and surface receptors belong to a new structural set which is close to that containing variable domains. J. Mol. Biol., 238, 528-539.
20 Bateman, A., and Chothia, C. (1996) Fibronectin type III domains in yeast detected by a hidden Markov model. Current Biol., 6, 1544-1546.
21 Mochizuki, T., Wu, G., Hayashi, T., Xenophontos, S. L., Veldhuisen, B., Saris, J. J., Reynolds, D. M., Cai, Y., Gabow, P. A., Pierides, A., Kimberling, W. J., Breuning, M. H., Deltas, C. C., Peters, D. J. M., and Somlo, S. (1996) PKD2, a gene for polycystic kidney disease that encodes an integral membrane protein. Science, 272, 1339-1342.
22 Rost, B., Casadio, R., Fariselli, P., and Sander, C. (1995) Prediction of helical transmembrane segments at 95% accuracy. Protein Sci., 4, 521-533.
23 Lupas, A. (1996) Prediction and analysis of coiled-coil structures. Meth. Enzymol., 266, 513-525.
24 Qian, F., Watnick, T. J., Onuchic, L. F., and Germino, G. G. (1996) The molecular basis of focal cyst formation in human autosomal dominant polycystic kidney disease type I. Cell, 87, 979-987.
25 Qian, F., Germino, F., Cai, Y., Zhang, X., Somlo, S., and Germino, G. (1997) PKD1 interacts with PKD2 through a probable coiled-coil domain. Nature Genet., 16, 179-183.
26 Neame, S. J., and Isacke, C. M. (1993) The cytoplasmic tail of CD44 is required for basolateral localisation in epithelial MDCK cells but does not mediate association with the detergent-insoluble cytoskeleton of fibroblasts. J. Cell Biol., 121, 1299-1310.
27 Chomczynski, P., and Sacchi, N. (1987) Single-step method of RNA isolation by acid guanidinium thiocyanate-phenol-chloroform extraction. Anal. Biochem., 162, 156-159.
28 Sambrook, J., Fritsch, E. F., and Maniatis, T. (1989) Molecular Cloning: A Laboratory Manual, 2nd edn. Cold Spring Harbor Laboratory Press, New York, USA.
29 Wilson, R. K., and Mardis, E. R. (1996) In Genome Analysis: A Laboratory Manual. Cold Spring Harbor Laboratory Press, New York. In press.
30 Mardis, E. R. (1994) High-throughput detergent extraction of M13 subclones for fluorescent DNA sequencing. Nucleic Acids Res., 22, 2173-2175.
31 Fulton, L. L., and Wilson, R. K. (1994) Variations on cycle sequencing. BioTechniques, 17, 298-301.
32 Dear, S., and Staden, R. (1991) A sequence assembly and editing program for efficient management of large projects. Nucleic Acids Res., 19, 3907-3911.
33 Program Manual for the Wisconsin Package, Version 8, August 1994, Genetics Computer Group, 575 Science Drive, Madison, Wisconsin, USA 53711.
*To whom correspondence should be addressed. Tel: +44 1223 331755; Fax: +44 1223 336846; Email: rsandfor@med.cam.ac.uk
+Present address: The Sanger Centre, Hinxton Hall, Hinxton, Cambridge, CB10 1RQ, UK
Copyright Oxford University Press, 1996