Human Molecular Genetics, 2000, Vol. 9, No. 6 967-977
© 2000 Oxford University Press
Structural features of normal and mutant human lysosomal glycoside hydrolases deduced from bioinformatics analysis
1Systèmes Moléculaires et Biologie Structurale, LMCP, CNRS UMR 7590, Universités Paris VI-Paris VII, T16, Case 115, 4 Place Jussieu, 75252 Paris Cédex 5, France, 2Hôpital Robert Debré, INSERM U458, 48 Boulevard Sérurier, 75019 Paris, France and 3Architecture et Fonctions des Macromolécules Biologiques (AFMB), CNRS UPR9039, 31 chemin Joseph Aiguier, 13402 Marseille Cédex 20, France
Received 4 January 2000; Accepted 10 February 2000.
| ABSTRACT |
|---|
Lysosomal storage diseases are due to inherited deficiencies in various enzymes involved in basic metabolic processes. As with other genetic diseases, accurate structure data for these enzymatic proteins should help in better understanding the molecular effects of mutations identified in patients with the corresponding lysosomal diseases; however, no such three-dimensional (3D) structure data are available for many lysosomal enzymes. Thus, we herein intend to illustrate for an audience of molecular geneticists how structure information can nonetheless be obtained via a bioinformatics approach in the case of five human lysosomal glycoside hydrolases. Indeed, using the two-dimensional hydrophobic cluster analysis method to decipher the sequence information available in data banks for the large group of glycoside hydrolases (clan GH-A) to which these human lysosomal enzymes belong, we could deduce structure predictions for their catalytic domains and propose explanations for the molecular effects of mutations described in patients. In addition, in the case of human ß-glucuronidase for which experimental 3D data have been reported, we also show here that bioinformatics methods relying on the available 3D structure information can be used to obtain further insights into the effects of various mutations described in patients with Sly disease. In a broader perspective, our work stresses that, in the context of a rapid increase in protein sequence information through genome sequencing, bioinformatics approaches might be highly useful for generating structure–function predictions based on sequence–structure interrelationships.
| INTRODUCTION |
|---|
Lysosomal storage diseases comprise a large group of hereditary genetic diseases caused by deficiencies in enzymes involved in basic metabolic processes in all cell types except red blood cells. The resultant accumulation of non-degraded substrates within the lysosomes leads to dysfunction of the affected organs. In recent years, advances in molecular genetics have enabled the cloning and molecular characterization of a large number of genes encoding lysosomal enzymes from various species, including man. Moreover, as with many other genetic diseases, the causal mutations in affected patients have been identified. Interestingly, gene mutations most frequently found in patients with lysosomal storage diseases are missense point mutations, which substitute one amino acid with another residue that often has different physicochemical properties (1,2).
Although it is easy to understand why a truncated protein would not be functional, it is much more difficult to explain the effects of such missense point mutations on the enzyme, as these may be variously due to incorrect three-dimensional (3D) folding, insolubility, incorrect glycosylation, a defect in lysosomal targeting, instability within the lysosome, inability to form a homo- or heteromultimer, or a direct effect on enzyme activity. Obviously, as the answer cannot come solely from molecular analysis of the normal and mutant genes, complex and varied biochemical techniques are required to characterize the mutant enzymes.
The availability of accurate structural data for these enzymes might nonetheless allow a direct insight, at the molecular level, into the impact of a specific point mutation on enzyme structure/function. Unfortunately, no such structural information is available for many lysosomal enzymes (Table 1). Thus, the aim of this article is to review our own recent work on human lysosomal glycoside hydrolases in order to clearly outline for an audience of molecular geneticists how structure predictions can, however, be made by use of a bioinformatics method named hydrophobic cluster analysis (HCA) (3,4). In addition, in the case of human ß-glucuronidase whose crystal structure has been reported by Jain et al. (5), we also illustrate here how other bioinformatics methods can be used to refine crude 3D structure data. Most importantly, we also show how the structural information gained in both situations can further our understanding of the molecular effects of gene mutations described in patients with the corresponding lysosomal storage diseases; on the other hand, such structure information may also help in the rational design of site-directed mutagenesis studies when the corresponding mutations have not been described in patients, as shown by the recent work of Islam et al. in the case of human ß-glucuronidase (6).
| STRUCTURE PREDICTIONS VIA BIOINFORMATICS METHODS IN THE ABSENCE OF EXPERIMENTAL 3D DATA |
|---|
Despite a lack of structural data for many lysosomal enzymes, their structure and/or function can nonetheless be investigated using the data found in protein banks, in particular sequence and 3D structure banks. Indeed, sequence comparisons of related proteins make it possible to identify the most highly conserved structural features (α helices and ß strands) and functional elements such as catalytic amino acids, which play an important role in the structure and/or function of these enzymatic proteins. When the protein under study is related to another protein whose 3D structure is known, molecular modelling techniques can be used. Provided that there is at least 25% sequence identity between the two proteins and that they are roughly the same size, an accurate 3D model of the protein under study can even be built using homology modelling techniques. If this is not the case, one can still use a bioinformatics method such as the 2D HCA method (see ref. 7 for a review) to make structure–function predictions whose accuracy will depend mainly on the wealth of information available in the data banks.
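The identity criterion mentioned above can be illustrated with a minimal Python sketch. It assumes that a pairwise alignment is already available as two equal-length strings with "-" for gaps; the function names, the toy sequences and the way the 25% threshold is applied are illustrative assumptions only, and the size criterion is not encoded.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over the columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # gapped columns are ignored in this simple definition
        compared += 1
        if a == b:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


def suggest_strategy(aligned_a: str, aligned_b: str, threshold: float = 25.0) -> str:
    """Hypothetical helper: choose an approach from the identity level alone."""
    pid = percent_identity(aligned_a, aligned_b)
    if pid >= threshold:
        return "homology modelling"
    return "2D analysis (e.g. HCA)"


if __name__ == "__main__":
    # Toy alignment invented for illustration; not taken from any real enzyme.
    print(percent_identity("ACDE-G", "ACDFKG"))
    print(suggest_strategy("AAAAA", "AGGGG"))
```

In practice the identity value depends on how the alignment was produced and on its length, so such a threshold is only a rough guide.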
Contributions of homology modelling
As regards lysosomal enzymes, it was shown for example that crystallographic structure data from a bacterial chitobiase could be used to build an accurate 3D model of the α chain of human lysosomal hexosaminidase A (8). This approach relied on the fact that the two enzymes, which both belong to family 20 of glycoside hydrolases, share 26% amino acid identity. The 3D structure model of the α chain of hexosaminidase A allowed an explanation of the effects of various mutations described in Tay–Sachs patients on the catalytic domain of this enzyme. In particular, it was found that most of the mutations that cause a severe infantile form of Tay–Sachs disease are located in the core of the 3D structure of the enzyme.
Contributions of the HCA method: lysosomal glycoside hydrolases as an example
Background considerations.
Over the last several years, we have used the 2D HCA method to gain insights into the structure–function characteristics of the catalytic domains of lysosomal glycoside hydrolases for which no 3D structure data were available (at the time of our study). Five human enzymes implicated in lysosomal storage diseases were included in our study: ß-glucuronidase (hBGLU, Sly disease) (9), ß-glucocerebrosidase (hBGC, Gaucher disease) (10), α-L-iduronidase (hIDUA, Hurler-Scheie disease) (11), ß-galactosidase (hBGAL, Landing disease and Morquio type B disease) (12) and ß-mannosidase (hBMAN, mannosidosis) (13).
The choice of these enzymes was justified by the existence of extensive structural and functional data concerning the large group of which they are members: the clan GH-A of glycoside hydrolases. Indeed, glycoside hydrolases are a widespread group of enzymes involved in many critical pathways of life. Sequence alignment strategies have permitted their classification into families on the basis of amino acid sequence similarities and mechanistic considerations (14,15). In a recent update, a group of families has been named clan GH-A (16); this group is currently composed of glycoside hydrolase families 1, 2, 5, 10, 17, 26, 30, 35, 39, 42, 51 and 53. Enzymes belonging to clan GH-A hydrolyze the glycosidic bond in a general acid catalysis mechanism with retention of the anomeric configuration. Such retaining enzymes function through a two-step mechanism involving a covalent glycosyl-enzyme intermediate (17,18). In this double displacement reaction, a critical active site residue functions as a nucleophile to form the glycosyl-enzyme intermediate, whereas the other (the acid/base catalyst also named proton donor) acts as a general acid catalyst during glycosylation and then as a general base during deglycosylation. In clan GH-A glycoside hydrolases, the critical amino acids are two glutamic acid residues (with an asparagine or a histidine preceding the acid/base catalyst), which are situated on opposite sides of the glycosidic bond and are separated by a distance of ~5.5 Å. Of note, Withers and Aebersold (17) were able to conclusively identify the catalytic nucleophile in several glycosidases by use of active-site labeling with mechanism-based inhibitors such as 2-deoxy-2-fluoro-glycosides; for example, in the case of human glucocerebrosidase, the nucleophile was identified as Glu340, not as Asp443 as suggested previously by affinity labeling with conduritol B epoxide (19–21).
Using the HCA method to analyze the protein sequences of clan GH-A available in sequence data banks, we were able to localize the catalytic domains of all five aforementioned lysosomal enzymes, to predict the function of some of the residues and to explain the impact of specific point mutations on catalytic activity (3,4). Of note, we previously reported results concerning bovine lysosomal ß-mannosidase; since then, we have obtained similar results for hBMAN, whose sequence was only recently added to the SwissProt data bank (13,22), the two sequences sharing 75% identity.
Methodological aspects.
Our study comprised three different steps. First, we compared the known 3D structures of enzymes belonging to clan GH-A to detect any shared features, with particular emphasis on conserved elements of their catalytic domains (secondary structures and functional residues). This comparative structural analysis of several 3D structures from families 1, 2, 5, 10 and 17 of clan GH-A revealed a remarkable conservation of the 3D structures of the active sites despite large differences in size (250–450 amino acids) and sequence (<20% identity) of the catalytic domains. All these enzymes share a similar catalytic domain consisting of a 3D (α/ß)8 barrel. Figure 1 depicts the catalytic domain of the cyanogenic ß-glucosidase (CBG) of Trifolium repens as an example of an (α/ß)8 barrel. Interestingly, the C-terminal ends of the ß2, ß3, ß4, ß6, ß7 and ß8 strands of the catalytic ß barrel bear functional residues that are conserved in the various 3D structures studied. In particular, there is strict conservation of the two glutamic acid residues located at the C-terminal end of strands ß4 and ß7 and acting as the acid/base catalyst and the nucleophile, respectively (4).
In a second step, we compared the sequences of enzymes from families 1, 2, 5, 10 and 17 of clan GH-A to see whether the aforementioned conserved features, which were observed when analyzing known 3D structures, were also conserved for all members of these families. This study was performed using the 2D HCA method described in Figure 2 (7,23). Indeed, the HCA method, starting from a 2D helical representation of protein sequences, can help to overcome the limitations of 1D linear methods (such as BLAST and FASTA) whose automatic use is generally limited to the case of fair similarities, i.e. above a threshold estimated at ~25–30% sequence identity over a sufficient length. In the twilight zone below this threshold, the HCA method remains efficient as it places the observed sequence similarities in the 2D context of the protein; it does indeed intimately combine the comparison of sequences to that of the protein secondary structures statistically centered on hydrophobic clusters. Moreover, the HCA method is not always dependent on the prior detection of 1D similarities, thereby revealing structural relationships even when no sequence conservation has been highlighted. Numerous applications as well as theoretical studies have now established the efficiency of this approach (for a review see ref. 7). As far as the efficiency of the HCA method is concerned, it should be stressed here that, following recognition, delineation and alignment of related sequence domains by HCA, it is often possible to use statistical tools proposed by classical lexical 1D software to estimate the value of the considered alignment; for example, extensive statistical estimation has recently allowed us to show that the N-termini of focal adhesion kinases and Janus kinases contain divergent band 4.1 domains (24).
Otherwise, statistical Z-score indexes relative to sequence identity, sequence similarity (using an appropriate matrix) and hydrophobic matching may be calculated with respect to random alignment (conserving the overall amino acid content) (7). The ratio (named sequence reliability index) between the product of the three aforementioned Z-scores and the equivalent best random product does usually show fair values (e.g. 5–10) for distantly related proteins typically exhibiting only ~15% sequence identity.
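The idea of a Z-score computed against random alignments that conserve amino acid content can be sketched as follows. This is an illustration for the identity score only: it assumes an ungapped alignment of equal-length strings, uses simple shuffling of one sequence as the random model, and does not reproduce the similarity or hydrophobic-matching scores, nor the reliability index itself, as implemented in ref. 7. The function names, number of shuffles and toy sequence are assumptions.

```python
import random
import statistics


def identity_count(seq_a, seq_b) -> int:
    """Number of aligned positions carrying the same residue."""
    return sum(1 for x, y in zip(seq_a, seq_b) if x == y)


def identity_z_score(seq_a: str, seq_b: str, n_shuffles: int = 1000, seed: int = 0) -> float:
    """Z-score of the observed identity against shuffled versions of seq_b.

    Shuffling keeps the amino acid composition of seq_b unchanged, which is
    one simple way of 'conserving the overall amino acid content'.
    """
    rng = random.Random(seed)  # fixed seed so that the sketch is reproducible
    observed = identity_count(seq_a, seq_b)
    residues = list(seq_b)
    random_scores = []
    for _ in range(n_shuffles):
        rng.shuffle(residues)
        random_scores.append(identity_count(seq_a, residues))
    mean = statistics.mean(random_scores)
    sd = statistics.pstdev(random_scores)
    if sd == 0:
        return 0.0  # degenerate case, e.g. a sequence made of a single residue type
    return (observed - mean) / sd


if __name__ == "__main__":
    toy = "MKVLAWERTYHGFDSPNCQI"  # invented sequence, 20 different residues
    print(identity_z_score(toy, toy))
```

A high Z-score indicates that the observed identity is unlikely to arise from composition alone; the absolute values depend on the random model chosen.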
In the present work, use of the HCA method was justified for two reasons. First, it allows the alignment of protein sequences having very low identity (<20–25%), as discussed above. Secondly, the aforementioned 3D structural information derived from data banks could be combined with the HCA method for prediction of protein secondary structure. Thus, a vast comparative analysis of all sequences from families 1, 2, 5, 10 and 17 showed that the elements conserved in the known 3D structures belonging to these families (ß strands bearing the catalytic site) were also likely to be conserved in all proteins of these families. It should be emphasized here that hBGLU and hBMAN belong to family 2 of clan GH-A.
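To give a feel for what a hydrophobic cluster is, the following Python sketch delineates clusters along a linear sequence. It is a deliberately simplified 1D approximation and not the HCA method itself, which relies on a 2D helical net. The hydrophobic alphabet (V, I, L, F, M, Y, W), the treatment of proline as a cluster breaker and the connectivity distance of four residues are conventions taken from general descriptions of HCA (see ref. 7) and are stated here as assumptions; the function name and toy sequence are invented.

```python
HYDROPHOBIC = set("VILFMYW")  # assumed hydrophobic alphabet


def hydrophobic_clusters(sequence: str, connectivity: int = 4):
    """Return (start, end, segment) tuples, 1-based, for hydrophobic clusters.

    Two hydrophobic residues fall in the same cluster when they are at most
    `connectivity` positions apart and no proline lies between them.
    """
    seq = sequence.upper()
    spans = []
    start = None   # index of the first hydrophobic residue of the open cluster
    last_h = None  # index of the most recent hydrophobic residue
    for i, aa in enumerate(seq):
        if aa == "P":
            # proline closes any open cluster
            if start is not None:
                spans.append((start, last_h))
                start = last_h = None
            continue
        if aa in HYDROPHOBIC:
            if start is None:
                start = i
            elif i - last_h > connectivity:
                # too far from the previous hydrophobic residue: new cluster
                spans.append((start, last_h))
                start = i
            last_h = i
    if start is not None:
        spans.append((start, last_h))
    return [(s + 1, e + 1, seq[s:e + 1]) for s, e in spans]


if __name__ == "__main__":
    # Toy sequence invented for illustration.
    for cluster in hydrophobic_clusters("VLKDGSEEIFAPLLKKW"):
        print(cluster)
```

In the actual method, the shape of each cluster on the 2D plot is what carries information on the likely secondary structure; the 1D segments produced here only indicate where such clusters would lie.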
Finally, in step 3, the comparative study using the HCA method was extended to families 30, 35 and 39 of clan GH-A for which no 3D structures have been reported. Of note, these three families include lysosomal hBGC, hBGAL and hIDUA, respectively. As a consequence, we could also identify the structural motifs of the catalytic domains of these three additional human lysosomal glycoside hydrolases (Fig. 3).
Active-site motifs of human lysosomal glycoside hydrolases.
Taken together, our results revealed that, despite a low level of sequence identity, all proteins of clan GH-A (including the five aforecited human lysosomal enzymes) are likely to share a similar catalytic domain consisting of an (α/ß)8 barrel (3,4). Interestingly, the C-terminal ends of strands ß2, ß3, ß4, ß6, ß7 and ß8 of the catalytic barrel harbor functional residues that are conserved in families 1, 2, 5, 10, 17, 30, 35 and 39 of the GH-A clan. In particular, the Asn/His–Glu dipeptide motif characterizing the acid/base catalyst and the Glu residue acting as the nucleophile, which are located at the C-terminal ends of strands ß4 and ß7, respectively, appear to be strictly conserved for all members of clan GH-A (3,4). In addition, two other residues are also well conserved (Fig. 3): His/Tyr and Trp/Phe located at the C-terminal ends of strands ß6 and ß8, respectively. The functional residues on strands ß2 and ß3 were not found to be strictly conserved. These variations may be explained by the adaptation of certain enzymes of clan GH-A to a specific substrate, the absence of a functional residue that is usually well conserved being most likely accompanied by a compensatory mutation; this was indeed observed in family 10 for the arginine residue on strand ß2. It is nonetheless remarkable that >550 enzymes from the eight different families forming clan GH-A (25) share three major characteristics: (i) similar 3D structures; (ii) the same catalytic mechanism leading to overall retention of the anomeric configuration; and (iii) a similar catalytic machinery involving a very limited number of conserved amino acids located on equivalent secondary structure elements (26). Of note, results obtained with mutant enzymes (from bacteria) belonging to different families of clan GH-A have experimentally confirmed the identification of the key catalytic glutamic acid residues by the HCA method (27,28).
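The Asn/His–Glu dipeptide described above can be searched for in a sequence with a simple pattern, as in the Python sketch below. Such a short motif occurs frequently by chance, so each hit is only a candidate; in the approach described here it is the structural context (the C-terminal end of strand ß4) that gives the motif its meaning. The function name and the toy sequence are assumptions made for illustration.

```python
import re

# Asn or His immediately followed by Glu (one-letter codes N/H then E)
ACID_BASE_MOTIF = re.compile(r"[NH]E")


def candidate_acid_base_sites(sequence: str):
    """Return (glu_position, dipeptide) pairs, with 1-based Glu positions."""
    hits = []
    for match in ACID_BASE_MOTIF.finditer(sequence.upper()):
        # match.start() is the 0-based index of N/H; the Glu follows it
        hits.append((match.start() + 2, match.group(0)))
    return hits


if __name__ == "__main__":
    # Toy sequence invented for illustration; not a real enzyme sequence.
    print(candidate_acid_base_sites("AANEPGHEW"))
```

A practical use would be to intersect these candidate positions with the predicted ends of ß strands rather than to interpret them in isolation.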
Mutations in the active site of human lysosomal glycoside hydrolases.
Having been able to propose models for the catalytic domains of hBGLU, hBMAN, hBGC, hBGAL and hIDUA, we attempted to explain the effects of point mutations identified in patients with the corresponding lysosomal storage diseases. It appeared that many of these mutations involve amino acids located within the various ß strands; some mutations are of particular interest as they affect the conserved critical amino acids located at the C-terminal ends of the strands constituting the catalytic ß barrel (Fig. 3). For example, a mutation common to hBGLU, hBGC and hIDUA affects the arginine residue of strand ß2, which could play a role in the activation of the nucleophile (4). Such a mutation has indeed been described in patients with Sly disease, type I Gaucher disease and Scheie disease, respectively. For hBGAL, a mutation concerning the tyrosine residue at the C-terminal end of strand ß3 was observed in a patient with Morquio type B disease (29). For hBGC, a mutation involving the tryptophan residue at the end of strand ß8 was described in a patient with type I Gaucher disease (30). Interestingly, in the case of hBGLU, the experimentally determined native 3D structure confirmed our predictions for the catalytic domain of this enzyme. Indeed, by comparing the X-ray crystal structure of hBGLU with those of lysozyme and Escherichia coli ß-galactosidase, Jain et al. (5) proposed that Glu451 and Glu540 were critical for catalysis, Glu451 being the acid/base catalyst. However, these authors also suggested that Asp207 might be the nucleophilic residue, whereas the HCA method had led us to clearly predict that Glu540 was the nucleophile in the active site. In such a context, it should be emphasized that Islam et al. (6) have reported very recently the results of site-directed mutagenesis studies which confirm our HCA-based predictions. 
Indeed, from enzymatic activity and kinetic analyses of active site mutants, they concluded that Glu451 and Glu540 do form the acid/base catalyst–nucleophile pair. In addition, they demonstrated that Tyr504 (located at the C-terminal end of strand ß6) plays an important role in catalysis, a finding also in good agreement with our HCA-based prediction (4,6). Finally, from similar studies involving heterologous expression of active site mutants created by site-directed mutagenesis (as no such mutants have so far been described in patients), we recently obtained experimental evidence supporting our prediction that Glu235 and Glu340 (numbering as in the mature protein) play an essential role in the active site of hBGC (3,4, and unpublished data). As far as the identification of the nucleophilic residue in hBGLU and hBGC is concerned, it is worth noting that our predictions and the aforementioned mutagenesis results are in complete agreement with the elegant work of Wong et al. (31) and also of Miao et al. (19) who identified Glu540 and Glu340 as the catalytic nucleophile in hBGLU and hBGC, respectively, by use of active site labeling with 2-deoxy-2-fluoro-glycosides.
| BIOINFORMATICS METHODS ALSO ALLOW FULL EXPLOITATION OF 3D STRUCTURE DATA |
|---|
Experimental 3D structures provide essential information
It is widely recognized that experimental determination of the 3D structure of an enzymatic protein contributes essential information. Indeed, knowledge of the 3D structure of an enzyme generally leads to a better understanding of its catalytic machinery. On the other hand, 3D structures are also helpful in localizing and elucidating the effects of point mutations on structure and catalytic activity of these enzymatic proteins.
In the particular case of human ß-glucuronidase, we also intend to show here how 3D structure information can be used to better understand the molecular effects of point mutations identified in patients with the corresponding lysosomal disease, i.e. type VII mucopolysaccharidosis (MPS-VII) or Sly disease (32). Indeed, the experimental determination of the native 3D structure of hBGLU (5) provided us with a critical background to more fully analyze (using various bioinformatics programs) the impact of the mutations on the catalytic function, stability, flexibility and homotetramer formation characteristic of this enzyme. Of note, genetic analysis of patients has heretofore identified >20 point mutations that result in loss of hBGLU activity (according to the Human Gene Mutation Database, Institute of Medical Genetics, Cardiff, UK: http://www.uwcm.ac.uk/uwcm/mg/hgmd0.html ).
Schematic outline of the 3D structure of hBGLU
The enzyme hBGLU is a homotetrameric protein and it is in this form that it is functional (Fig. 4). Each hBGLU monomer comprises 612 residues organized into three domains (5). The first domain contains residues 22–223 and forms a highly deformed ß barrel, residues 1–21 constituting the signal peptide that is cleaved in the endoplasmic reticulum. The second domain stretches from residue 224 to 342 and its structure resembles that of the immunoglobulin constant domain, i.e. an all-ß structure with two facing ß sheets (one with three ß strands and the other with four). The third domain, comprising residues 343–632, is the catalytic (α/ß)8 barrel. The 3D structure of hBGLU does not contain any disulfide bridges. The determination of this structure, together with those of cathepsin D (33) and arylsulfatase B (34), has enabled the identification of a large part of the recognition site of hBGLU by phosphotransferase. Like cathepsin D and arylsulfatase B, the hBGLU site has a ß hairpin surface structure, suggesting that the recognition site of a lysosomal enzyme by phosphotransferase is based more on this ß hairpin structure than on its sequence (5,33,34). However, experimental evidence supporting this hypothesis is still lacking. Of note, in the case of human lysosomal aspartylglucosaminidase, mutagenesis studies suggest that phosphotransferase recognition may not involve a universal single structural determinant but may be based on small contact points and their mutual position, the key elements in the contact being one or more lysine residues on the surface of the lysosomal enzyme (35).
Beyond experimental 3D structure
We have analyzed hBGLU mutations in two ways. First, we have localized known mutations in the structure of the protein using molecular visualization software (Insight II, MSI, San Diego, CA). Next, we studied mutations located in the core of the enzyme, in particular those in proximity to internal cavities, using both the VOIDOO (36) and Insight II programs. We indicate here some typical results of such analyses in order to illustrate how such bioinformatics approaches can help to better understand the impact of various point mutations in hBGLU.
Location of mutations in the global structure of hBGLU.
The different point mutations identified in patients with Sly disease are listed in Table 2; in addition, for each mutation, the location of the side chain of the wild-type amino acid on the hBGLU monomer is shown in Figure 5.
It appears that mutations H351Y, R382H, R382C and Y508C all affect amino acids located in the catalytic site, more specifically at the C-terminal end of one of the ß strands; these data are in good agreement with our HCA-based prediction of the limits of the ß strands forming the (α/ß)8 catalytic barrel. In addition, it should also be noted that mutation Y626H may modify the conformation of the catalytic site as it affects residue Y626 which is located on helix α8 between strands ß1 and ß8.
Other mutations may interfere with the formation or the stability of the enzymatic tetramer. Mutations K606N and R611W concern residues located on a large loop (in the catalytic domain) involved in contact between neighboring monomers. Mutation G136R may have an effect on the flexibility of a loop located in domain I. Interestingly, mutation Y495C results in the appearance of a Cys residue at the surface of each monomer. Consequently, in the 3D tetrameric structure, four cysteines are then present at the protein surface, possibly leading to the formation of abnormal disulfide bridges between tetramers and subsequent enzyme aggregation and inactivation. Mutation W627C also results in appearance of a Cys residue at the monomer surface with a similar potential effect as mutation Y495C.
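As a first-pass illustration of how mutations can be placed on the domain organization of hBGLU, the Python sketch below parses mutation labels such as H351Y and assigns each one to a domain using the residue ranges given above (signal peptide 1–21, domain I 22–223, domain II 224–342, domain III 343–632). It only reproduces the linear bookkeeping; the spatial analysis described in this section requires the 3D coordinates. The function and type names are hypothetical.

```python
import re
from typing import NamedTuple, Optional

# Residue ranges of the hBGLU monomer as stated in the text (inclusive)
DOMAINS = [
    ("signal peptide", 1, 21),
    ("I", 22, 223),      # deformed ß barrel
    ("II", 224, 342),    # immunoglobulin constant-like domain
    ("III", 343, 632),   # catalytic (α/ß)8 barrel
]


class Mutation(NamedTuple):
    wild_type: str
    position: int
    mutant: str


_MUTATION_PATTERN = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_mutation(label: str) -> Mutation:
    """Parse a one-letter missense label such as 'H351Y'."""
    match = _MUTATION_PATTERN.match(label.strip().upper())
    if match is None:
        raise ValueError(f"not a missense mutation label: {label!r}")
    return Mutation(match.group(1), int(match.group(2)), match.group(3))


def domain_of(position: int) -> Optional[str]:
    """Return the domain name containing the residue, or None if outside all ranges."""
    for name, first, last in DOMAINS:
        if first <= position <= last:
            return name
    return None


if __name__ == "__main__":
    for label in ["G136R", "Y320S", "H351Y", "R382C", "K606N", "R611W"]:
        mutation = parse_mutation(label)
        print(label, "-> domain", domain_of(mutation.position))
```

Such a table is a convenient starting point before inspecting each residue in a molecular viewer.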
Internal cavities and mutations affecting the core of hBGLU: the core of globular proteins (general considerations).
The core of globular proteins is mainly hydrophobic and forms the backbone of their 3D structure. Although the hydrophobic core plays an essential role in enzyme stability, its 3D structure is not totally rigid. In fact, many enzymes undergo varying degrees of conformational change when they carry out their natural functions, especially during interaction with their ligands (37–39). Such conformational changes are allowed by, for example, the presence of internal cavities.
It has indeed been shown that virtually all proteins of >100 amino acid residues contain one or more internal cavities, which generally account for <2.3% of total protein volume, regardless of size or folding (40,41). The internal cavities found in proteins can be empty or, conversely, filled with one or more water molecules. In the first case, the surface of the cavity is layered with amino acid residues characterized by hydrophobic side chains, whereas in the second case it is formed by side chains of charged and/or polar residues. In solvated cavities, hydrogen bonds form between the side chains of the charged and/or polar residues, and between these side chains and the water molecules within the cavity.
It is now widely agreed that internal cavities play an important role in protein stability, flexibility and function (40–43). In monomeric proteins composed of two or more domains, cavities at the interface of these domains permit varying degrees of movement between the domains; in multimeric complexes, the cavities formed at monomer–monomer interfaces are also important for protein structure and function. Internal cavities also enable structural rearrangements that facilitate catalytic activity (42).
Molecular effects of mutations located in the core of hBGLU.
Relying on the 3D structure published by Jain et al. (5), we analyzed the core of hBGLU using the VOIDOO (36) and Insight II programs, the former to count the internal cavities in the protein and the latter to visualize these cavities and study nearby mutations. The VOIDOO analysis showed that hBGLU likely contains 25 internal cavities occupying a total volume of 1732 Å³, i.e. 1.46% of total monomer volume (118 600 Å³).
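The arithmetic behind these figures can be checked with a few lines of Python. The sketch only recomputes the quantities quoted above (25 cavities, 1732 Å³ total cavity volume, 118 600 Å³ monomer volume); the mean cavity volume is a derived value added for illustration and is not reported in the text, and the function name is hypothetical.

```python
def cavity_volume_fraction(cavity_volume: float, monomer_volume: float) -> float:
    """Percentage of the monomer volume occupied by internal cavities."""
    return 100.0 * cavity_volume / monomer_volume


N_CAVITIES = 25
TOTAL_CAVITY_VOLUME = 1732.0   # cubic angstroms, from the VOIDOO analysis
MONOMER_VOLUME = 118600.0      # cubic angstroms

fraction = cavity_volume_fraction(TOTAL_CAVITY_VOLUME, MONOMER_VOLUME)
mean_cavity_volume = TOTAL_CAVITY_VOLUME / N_CAVITIES

if __name__ == "__main__":
    print(f"cavity fraction: {fraction:.2f}% of monomer volume")
    print(f"mean cavity volume: {mean_cavity_volume:.2f} cubic angstroms")
```

The resulting fraction lies below the upper bound of 2.3% quoted earlier for proteins in general.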
By visually analyzing the position of these different cavities with Insight II, we found that some of them are actually located at the junction of domains I, II and III (Fig. 6). Although the resolution (2.6 Å) at which the 3D structure of hBGLU was determined does not enable positioning of water molecules, it appears that some of these cavities, in particular those near the catalytic site, can be solvated since their surfaces are composed mainly of polar and/or charged residues. Mutations of such residues in solvated cavities may lead to varying degrees of alterations in their internal hydrogen-bonding network.
It is noteworthy that several mutations described in patients with Sly disease do affect amino acids that are located in the core of hBGLU and have side chains oriented towards the internal cavities of the enzyme. Mutations P148S, E150K, R216W and R435P involve residues in the walls of cavities located at the interface between domains I and III. Mutations Y320S/Y320C modify a cavity located at the junction of domains I, II and III. Mutations A354V and P408S concern residues in cavities close to the catalytic site. Thus, all these mutations may alter the flexibility, stability and function of the enzyme by affecting the internal cavities of the hBGLU protein and modifying local interactions.
| CONCLUSIONS |
|---|
Use of the HCA method allowed us to deduce precise structure predictions for the catalytic domains of hBGLU, hBMAN, hBGC, hBGAL and hIDUA. It also enabled us to propose hypothetical explanations for the molecular impact of point mutations identified in patients with the corresponding lysosomal storage diseases. Indeed, we found that a good many of these mutations affect amino acid residues located in the catalytic site and possibly playing an essential role in the enzymatic activity of these enzymes. Most importantly, specific predictions concerning the key catalytic residues have already been experimentally confirmed by site-directed mutagenesis studies or by approaches involving active-site labeling with mechanism-based inhibitors. In addition, experimental determination of the native 3D structure of hBGLU by Jain et al. (5) provided us with a very reliable support to analyze in further detail (via other bioinformatics methods) the impact of point mutations identified in patients with Sly disease not only on the catalytic activity but also on the stability, flexibility and formation of the active tetrameric structure of this enzyme. Taken together, our results should be very helpful for a better understanding of the molecular bases of the corresponding lysosomal diseases. In addition, our work may also make a contribution towards paving the way for protein engineering approaches for therapeutic purposes.
In a broader perspective, it is noteworthy that the sequences of the countless proteins of the biosphere give rise to a much more limited number of 3D folds. For example, recent studies estimated that ~650–1000 independent folds are found in nature (44–46). Furthermore, among the >10 000 experimentally determined 3D protein structures, at least 400 different types of fold are already known and characterized, which represent nearly half of all possibilities. As a result, it should be possible to rapidly increase the number of proteins for which structure–function information is available if, relying on crude sequence data, we succeed in detecting these sequence–structure relationships in spite of high evolutionary divergence, i.e. low sequence identity (7). In such an increasingly favorable context, it will no doubt become possible to generalize the type of study outlined in this article which aims to bridge the gap between basic science and medicine.
| ACKNOWLEDGEMENTS |
|---|
We are indebted to Isabelle Callebaut for the completion of this manuscript. This work was supported by fellowships (P.D. and S.F.) and grants (P.L.) from the Association Vaincre les Maladies Lysosomales (VML, Evry, France). We also acknowledge the support of Universities Paris 6 and Paris 7, CNRS and INSERM.
| FOOTNOTES |
|---|
+ Present address: Information Engineering Branch, NCBI, NLM, NIH, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
§ To whom correspondence should be addressed. Tel: +33 1 40 03 19 32; Fax: +33 1 40 03 19 03; Email: plehn@infobiogen.fr
| REFERENCES |
|---|
1 Neufeld, E.F. (1991) Lysosomal storage diseases. Annu. Rev. Biochem., 60, 257–280.
2 Gieselmann, V. (1995) Lysosomal storage diseases. Biochim. Biophys. Acta, 1270, 103–136.
3 Henrissat, B., Callebaut, I., Fabrega, S., Lehn, P., Mornon, J.P. and Davies, G. (1995) Conserved catalytic machinery and the prediction of a common fold for several families of glycosyl hydrolases. Proc. Natl Acad. Sci. USA, 92, 7090–7094. [Erratum (1996) Proc. Natl Acad. Sci. USA, 93, 5674.]
4 Durand, P., Lehn, P., Callebaut, I., Fabrega, S., Henrissat, B. and Mornon, J.P. (1997) Active-site motifs of lysosomal acid hydrolases: invariant features of clan GH-A glycosyl hydrolases deduced from hydrophobic cluster analysis. Glycobiology, 7, 277–284.
5 Jain, S., Drendel, W.B., Chen, Z.W., Mathews, F.S., Sly, W.S. and Grubb, J.H. (1996) Structure of human β-glucuronidase reveals candidate lysosomal targeting and active-site motifs. Nature Struct. Biol., 3, 375–381.
6 Islam, M.R., Tomatsu, S., Shah, G.N., Grubb, J.H., Jain, S. and Sly, W.S. (1999) Active site residues of human β-glucuronidase. Evidence for Glu(540) as the nucleophile and Glu(451) as the acid–base residue. J. Biol. Chem., 274, 23451–23455.
7 Callebaut, I., Labesse, G., Durand, P., Poupon, A., Canard, L., Chomilier, J., Henrissat, B. and Mornon, J.P. (1997) Deciphering protein sequence information through hydrophobic cluster analysis (HCA): current status and perspectives. Cell Mol. Life Sci., 53, 621–645.
8 Tews, I., Perrakis, A., Oppenheim, A., Dauter, Z., Wilson, K.S. and Vorgias, C.E. (1996) Bacterial chitobiase structure provides insight into catalytic mechanism and the basis of Tay–Sachs disease. Nature Struct. Biol., 3, 638–648.
9 Oshima, A., Kyle, J.W., Miller, R.D., Hoffmann, J.W., Powell, P.P., Grubb, J.H., Sly, W.S., Tropak, M., Guise, K.S. and Gravel, R.A. (1987) Cloning, sequencing, and expression of cDNA for human β-glucuronidase. Proc. Natl Acad. Sci. USA, 84, 685–689.
10 Ginns, E.I., Choudary, P.V., Martin, B.M., Winfield, S., Stubblefield, B., Mayor, J., Merkle-Lehman, D., Murray, G.J., Bowers, L.A. and Barranger, J.A. (1984) Isolation of cDNA clones for human β-glucocerebrosidase using the λgt11 expression system. Biochem. Biophys. Res. Commun., 123, 574–580.
11 Scott, H.S., Anson, D.S., Orsborn, A.M., Nelson, P.V., Clements, P.R., Morris, C.P. and Hopwood, J.J. (1991) Human α-L-iduronidase: cDNA isolation and expression. Proc. Natl Acad. Sci. USA, 88, 9695–9699.
12 Morreau, H., Galjart, N.J., Gillemans, N., Willemsen, R., van der Horst, G.T. and d'Azzo, A. (1989) Alternative splicing of β-galactosidase mRNA generates the classic lysosomal enzyme and a β-galactosidase-related protein. J. Biol. Chem., 264, 20655–20663.
13 Alkhayat, A.H., Kraemer, S.A., Leipprandt, J.R., Macek, M., Kleijer, W.J. and Friderici, K.H. (1998) Human β-mannosidase cDNA characterization and first identification of a mutation associated with human β-mannosidosis. Hum. Mol. Genet., 7, 75–83.
14 Henrissat, B. (1991) A classification of glycosyl hydrolases based on amino acid sequence similarities. Biochem. J., 280, 309–316.
15 Henrissat, B. and Bairoch, A. (1993) New families in the classification of glycosyl hydrolases based on amino acid sequence similarities. Biochem. J., 293, 781–788.
16 Henrissat, B. and Bairoch, A. (1996) Updating the sequence-based classification of glycosyl hydrolases. Biochem. J., 316, 695–706.
17 Withers, S.G. and Aebersold, R. (1995) Approaches to labeling and identification of active site residues in glycosidases. Protein Sci., 4, 361–372.
18 Davies, G. and Henrissat, B. (1995) Structures and mechanisms of glycosyl hydrolases. Structure, 3, 853–859.
19 Miao, S., McCarter, J.D., Grace, M.E., Grabowski, G.A., Aebersold, R. and Withers, S.G. (1994) Identification of Glu340 as the active-site nucleophile in human glucocerebrosidase by use of electrospray tandem mass spectrometry. J. Biol. Chem., 269, 10975–10978.
20 Dinur, T., Osiecki, K.M., Legler, G., Gatt, S., Desnick, R.J. and Grabowski, G.A. (1986) Human acid β-glucosidase: isolation and amino acid sequence of a peptide containing the catalytic site. Proc. Natl Acad. Sci. USA, 83, 1660–1664.
21 Grace, M.E., Newman, K.M., Scheinker, V., Berg-Fussman, A. and Grabowski, G.A. (1994) Analysis of human acid β-glucosidase by site-directed mutagenesis and heterologous expression. J. Biol. Chem., 269, 2283–2291.
22 Bairoch, A. and Apweiler, R. (1998) The SWISS-PROT protein sequence data bank and its supplement TrEMBL in 1998. Nucleic Acids Res., 26, 38–42.
23 Gaboriaud, C., Bissery, V., Benchetrit, T. and Mornon, J.P. (1987) Hydrophobic cluster analysis: an efficient new way to compare and analyse amino acid sequences. FEBS Lett., 224, 149–155.
24 Girault, J.A., Labesse, G., Mornon, J.P. and Callebaut, I. (1999) The N-termini of FAK and JAKs contain divergent band 4.1 domains. Trends Biochem. Sci., 24, 54–57.
25 Henrissat, B. and Coutinho, P.M. (1998) Carbohydrate-active enzymes server (http://afmb.cnrs-mrs.fr/~pedro/CAZY/db.html).
26 Henrissat, B. and Davies, G.J. (1997) Structural and sequence-based classification of glycoside hydrolases. Curr. Opin. Struct. Biol., 7, 637–644.
27 Bolam, D.N., Hughes, N., Virden, R., Lakey, J.H., Hazlewood, G.P., Henrissat, B., Braithwaite, K.L. and Gilbert, H.J. (1996) Mannanase A from Pseudomonas fluorescens ssp. cellulosa is a retaining glycosyl hydrolase in which E212 and E320 are the putative catalytic residues. Biochemistry, 35, 16195–16204.
28 Braithwaite, K.L., Barna, T., Spurway, T.D., Charnock, S.J., Black, G.W., Hughes, N., Lakey, J.H., Virden, R., Hazlewood, G.P., Henrissat, B. and Gilbert, H.J. (1997) Evidence that galactanase from Pseudomonas fluorescens subspecies cellulosa is a retaining family 53 glycosyl hydrolase in which E161 and E270 are the catalytic residues. Biochemistry, 36, 15489–15500.
29 Ishii, N., Oohira, T., Oshima, A., Sakuraba, H., Endo, F., Matsuda, I., Sukegawa, K., Orii, T. and Suzuki, Y. (1995) Clinical and molecular analysis of a Japanese boy with Morquio B disease. Clin. Genet., 48, 103–108.
30 Beutler, E., Demina, A. and Gelbart, T. (1994) Glucocerebrosidase mutations in Gaucher disease. Mol. Med., 1, 82–92.
31 Wong, A.W., He, S., Grubb, J.H., Sly, W.S. and Withers, S.G. (1998) Identification of Glu 540 as the catalytic nucleophile of human β-glucuronidase using electrospray mass spectrometry. J. Biol. Chem., 273, 34057–34062.
32 Sly, W.S., Quinton, B.A., McAlister, W.H. and Rimoin, D.L. (1973) β-glucuronidase deficiency: report of clinical, radiologic, and biochemical features of a new mucopolysaccharidosis. J. Pediatr., 82, 249–257.
33 Baldwin, E.T., Bhat, T.N., Gulnik, S., Hosur, M.V., Sowder, R.C., Cachau, R.E., Collins, J., Silva, A.M. and Erickson, J.W. (1993) Crystal structures of native and inhibited forms of human cathepsin D: implications for lysosomal targeting and drug design. Proc. Natl Acad. Sci. USA, 90, 6796–6800.
34 Bond, C.S., Clements, P.R., Ashby, S.J., Collyer, C.A., Harrop, S.J., Hopwood, J.J. and Guss, J.M. (1997) Structure of a human lysosomal sulfatase. Structure, 5, 277–289.
35 Tikkanen, R., Peltola, M., Oinonen, C., Rouvinen, J. and Peltonen, L. (1997) Several cooperating binding sites mediate the interaction of a lysosomal enzyme with phosphotransferase. EMBO J., 16, 6684–6693.
36 Kleywegt, G.J. and Jones, T.A. (1994) Detection, delineation, measurement and display of cavities in macromolecular structures. Acta Crystallogr. D, 50, 178–185.
37 Creighton, T.E. (1993) Proteins: Structures and Molecular Properties, 2nd edn. W.H. Freeman, New York, NY.
38 Perutz, M. (1996) Structure des protéines–pathologie et approches thérapeutiques. John Libbey Eurotext, Montrouge, France.
39 Dominguez, R., Souchon, H., Lascombe, M. and Alzari, P.M. (1996) The crystal structure of a family 5 endoglucanase mutant in complexed and uncomplexed forms reveals an induced fit activation mechanism. J. Mol. Biol., 257, 1042–1051.
40 Hubbard, S.J., Gross, K.H. and Argos, P. (1994) Intramolecular cavities in globular proteins. Protein Eng., 7, 613–626.
41 Williams, M.A., Goodfellow, J.M. and Thornton, J.M. (1994) Buried waters and internal cavities in monomeric proteins. Protein Sci., 3, 1224–1235.
42 Hubbard, S.J. and Argos, P. (1994) Cavities and packing at protein interfaces. Protein Sci., 3, 2194–2206.
43 Merritt, E.A., Sarfaty, S., Pizza, M., Domenighini, M., Rappuoli, R. and Hol, W.G. (1995) Mutation of a buried residue causes loss of activity but no conformational change in the heat-labile enterotoxin of Escherichia coli. Nature Struct. Biol., 2, 269–272.
44 Wang, Z.X. (1998) A re-estimation for the total numbers of protein folds and superfamilies. Protein Eng., 11, 621–626.
45 Zhang, C. and DeLisi, C. (1998) Estimating the number of protein folds. J. Mol. Biol., 284, 1301–1305.
46 Govindarajan, S., Recabarren, R. and Goldstein, R.A. (1999) Estimating the total number of protein folds. Proteins, 35, 408–414.
47 Rudenko, G., Bonten, E., d'Azzo, A. and Hol, W.G. (1995) Three-dimensional structure of the human protective protein: structure of the precursor form suggests a complex activation mechanism. Structure, 3, 1249–1259.
48 Rudenko, G., Bonten, E., Hol, W.G. and d'Azzo, A. (1998) The atomic model of the human protective protein/cathepsin A suggests a structural basis for galactosialidosis. Proc. Natl Acad. Sci. USA, 95, 621–625.
49 Musil, D., Zucic, D., Turk, D., Engh, R.A., Mayr, I., Huber, R., Popovic, T., Turk, V., Towatari, T., Katunuma, N. et al. (1991) The refined 2.15 Å X-ray crystal structure of human liver cathepsin B: the structural basis for its specificity. EMBO J., 10, 2321–2330.
50 Turk, D., Podobnik, M., Popovic, T., Katunuma, N., Bode, W., Huber, R. and Turk, V. (1995) Crystal structure of cathepsin B inhibited with CA030 at 2.0 Å resolution: a basis for the design of specific epoxysuccinyl inhibitors. Biochemistry, 34, 4791–4797.
51 Podobnik, M., Kuhelj, R., Turk, V. and Turk, D. (1997) Crystal structure of the wild-type human procathepsin B at 2.5 Å resolution reveals the native active site of a papain-like cysteine protease zymogen. J. Mol. Biol., 271, 774–788.
52 Hof, P., Mayr, I., Huber, R., Korzus, E., Potempa, J., Travis, J., Powers, J.C. and Bode, W. (1996) The 1.8 Å crystal structure of human cathepsin G in complex with Suc-Val-Pro-PheP-(OPh)2: a Janus-faced proteinase with two opposite specificities. EMBO J., 15, 5481–5491.
53 Guncar, G., Podobnik, M., Pungercar, J., Strukelj, B., Turk, V. and Turk, D. (1998) Crystal structure of porcine cathepsin H determined at 2.1 Å resolution: location of the mini-chain C-terminal carboxyl group defines cathepsin H aminopeptidase function. Structure, 6, 51–61.
54 Coulombe, R., Grochulski, P., Sivaraman, J., Menard, R., Mort, J.S. and Cygler, M. (1996) Structure of human procathepsin L reveals the molecular basis of inhibition by the prosegment. EMBO J., 15, 5492–5503.
55 Oinonen, C., Tikkanen, R., Rouvinen, J. and Peltonen, L. (1995) Three-dimensional structure of human lysosomal aspartylglucosaminidase. Nature Struct. Biol., 2, 1102–1108.
56 Lukatela, G., Krauss, N., Theis, K., Selmer, T., Gieselmann, V., von Figura, K. and Saenger, W. (1998) Crystal structure of human arylsulfatase A: the aldehyde function and the metal ion at the active site suggest a novel mechanism for sulfate ester hydrolysis. Biochemistry, 37, 3654–3664.
57 Vervoort, R., Gitzelmann, R., Bosshard, N., Maire, I., Liebaers, I. and Lissens, W. (1998) Low β-glucuronidase enzyme activity and mutations in the human β-glucuronidase gene in mild mucopolysaccharidosis type VII, pseudodeficiency and a heterozygote. Hum. Genet., 102, 69–78.
58 Vervoort, R., Buist, N.R., Kleijer, W.J., Wevers, R., Fryns, J.P., Liebaers, I. and Lissens, W. (1997) Molecular analysis of the β-glucuronidase gene: novel mutations in mucopolysaccharidosis type VII and heterogeneity of the polyadenylation region. Hum. Genet., 99, 462–468.
59 Vervoort, R., Islam, M.R., Sly, W.S., Zabot, M.T., Kleijer, W.J., Chabas, A., Fensom, A., Young, E.P., Liebaers, I. and Lissens, W. (1996) Molecular analysis of patients with β-glucuronidase deficiency presenting as hydrops fetalis or as early mucopolysaccharidosis VII. Am. J. Hum. Genet., 58, 457–471.
60 Yamada, S., Tomatsu, S., Sly, W.S., Islam, R., Wenger, D.A., Fukuda, S., Sukegawa, K. and Orii, T. (1995) Four novel mutations in mucopolysaccharidosis type VII including a unique base substitution in exon 10 of the β-glucuronidase gene that creates a novel 5'-splice site. Hum. Mol. Genet., 4, 651–655.
61 Wu, B.M., Tomatsu, S., Fukuda, S., Sukegawa, K., Orii, T. and Sly, W.S. (1994) Overexpression rescues the mutant phenotype of L176F mutation causing β-glucuronidase deficiency mucopolysaccharidosis in two Mennonite siblings. J. Biol. Chem., 269, 23681–23688.
62 Wu, B.M. and Sly, W.S. (1993) Mutational studies in a patient with the hydrops fetalis form of mucopolysaccharidosis type VII. Hum. Mutat., 2, 446–457.
63 Tomatsu, S., Fukuda, S., Sukegawa, K., Ikedo, Y., Yamada, S., Yamada, Y., Sasaki, T., Okamoto, H., Kuwahara, T., Yamaguchi, S. et al. (1991) Mucopolysaccharidosis type VII: characterization of mutations and molecular heterogeneity. Am. J. Hum. Genet., 48, 89–96.
64 Islam, M.R., Vervoort, R., Lissens, W., Hoo, J.J., Valentino, L.A. and Sly, W.S. (1996) β-glucuronidase P408S, P415L mutations: evidence that both mutations combine to produce an MPS VII allele in certain Mexican patients. Hum. Genet., 98, 281–284.
65 Shipley, J.M., Klinkenberg, M., Wu, B.M., Bachinsky, D.R., Grubb, J.H. and Sly, W.S. (1993) Mutational analysis of a patient with mucopolysaccharidosis type VII, and identification of pseudogenes. Am. J. Hum. Genet., 52, 517–526.
66 Barrett, T., Suresh, C.G., Tolley, S.P., Dodson, E.J. and Hughes, M.A. (1995) The crystal structure of a cyanogenic β-glucosidase from white clover, a family 1 glycosyl hydrolase. Structure, 3, 951–960.
67 Woodcock, S., Mornon, J.P. and Henrissat, B. (1992) Detection of secondary structure elements in proteins by hydrophobic cluster analysis. Protein Eng., 5, 629–635.
68 Kraulis, P.J. (1991) MOLSCRIPT: a program to produce both detailed and schematic plots of protein structures. J. Appl. Crystallogr., 24, 946–950.
69 Connolly, M.L. (1993) The molecular surface package. J. Mol. Graph., 11, 139–141.