Human Molecular Genetics Advance Access originally published online on September 8, 2005
Human Molecular Genetics 2005 14(Review Issue 2):R225-R234; doi:10.1093/hmg/ddi330
Synapse proteomics of multiprotein complexes: en route from genes to nervous system diseases
1Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire CB10 1SA, UK and 2School of Informatics, Edinburgh University, UK
* To whom correspondence should be addressed. Tel: +44 1223494908; Email: sg3{at}sanger.ac.uk
Received July 5, 2005; Accepted August 25, 2005
| ABSTRACT |
|---|
Proteomic experiments have produced a draft profile of the overall molecular composition of the mammalian neuronal synapse. It appears that synapses have over 1000 protein components and the mapping of their interactions, organization and functions will lead to a global view of the role of synapses in physiology and disease. A major functional subcomponent of the synaptic machinery is a multiprotein complex of glutamate receptors and adhesion proteins with associated adaptor and signalling enzymes, totalling 185 proteins, known as the N-methyl-D-aspartate receptor complex/MAGUK associated signalling complex (NRC/MASC). Here, we review the proteomic studies and functions of NRC/MASC and specifically report on the role of its component genes in human diseases. Using a systematic literature search protocol, we identified reports of mutations or polymorphisms in 47 genes associated with 183 disorders, of which 54 were nervous system disorders. A similar number of genes are important in mouse synaptic plasticity and behaviour, where the NRC/MASC acts as a signalling complex with multiple functions provided by its individual protein components and their interactions. The individual gene mutations suggest not only an important role for the NRC/MASC in human diseases but also that these diseases may be functionally connected by their common link to the NRC/MASC. The NRC/MASC is a rich source of genetic variation and provides a platform for understanding relationships of disease phenotype amenable to systematic studies such as the Genes to Cognition research consortium (www.genes2cognition.org) that links human and mouse genetics with proteomic studies.
| INTRODUCTION |
|---|
Finding the genetic basis of human nervous system diseases that are not readily explained by major single gene effects poses serious logistical and theoretical questions for clinical genetics. For example, we may ask how can many genes (perhaps dozens) contribute to a disease yet maintain a similar phenotype that is clinically identifiable? Implicit is the need for approaches that utilize sets of genes that have some functional and phenotypic link. Moreover, a guiding theme emerging from reductionist single-gene studies in basic biology is that sets of genes, rather than single genes, are responsible for physiological functions. This principle is of practical significance as many new technical approaches are well suited to identify multiple genes or proteins in parallel, such as gene expression arrays or proteomic profiling experiments (1
These methods generate lists or sets of proteins, with some common functional attributes, and these sets can be used in human genetic association studies. Although this approach of linking proteomic with human genetic studies is a new strategy, and requires prospective study, we can evaluate its potential using existing data from single gene studies. Below, we describe a systematic search of the literature for human mutations in components of the multiprotein complex associated with glutamate receptors.
| SYNAPSE PROTEOME AND GLUTAMATE RECEPTOR COMPLEXES |
|---|
The list of all proteins that comprise the synapse is referred to as the synapse proteome. Classical ultrastructural studies of the synapse show the pre- and post-synaptic terminals containing synaptic vesicles and post-synaptic density (PSD), respectively. Biochemical fractionation of isolated synapses has been used to separate these visible components and further separation techniques have defined greater detail, such as neurotransmitter receptor complexes (Fig. 1A) (3
The glutamate neurotransmitter receptor families found at excitatory synapses are themselves components of multiprotein complexes (5
α-amino-3-hydroxy-5-methylisoxazole-4-propionate (AMPA) receptor is in separate complexes. The NMDA subtype of glutamate receptor (6
| FUNCTIONAL CLASSIFICATION OF SYNAPSE COMPLEX PROTEINS |
|---|
We have taken the view that before understanding the organization of the overall synapse proteome, we will need to develop tools that are useful and can be tested on smaller data sets. We have focussed on the NRC/MASC set of 185 proteins because there is a substantial volume of data indicating their physiological importance, and the remainder of this article will address this set (NRC/MASC proteins are listed in Supplementary Material, Table S1).
A cornerstone of this process is to annotate structural or functional information to each protein or gene. Genome- and proteome-wide databases provide general information such as the classification of protein family (for classification of NRC/MASC and PSD proteins, see Table 1), protein domains and other sequence-derived information. More physiological data have been obtained by examining the effect of knocking out individual genes in mice or interfering with the proteins with pharmacological tools. For example, knockout and knockin mutations that disrupt the interactions between NMDA receptors (8) and the MAGUK scaffold protein PSD-95 (9) demonstrated that the NRC/MASC was involved in learning and synaptic plasticity. These observations have been extended, and of 185 proteins found in the NRC/MASC complexes, mutations or drugs that interfere with the function of 43 proteins were reported to be important in synaptic plasticity and 40 have been associated with rodent behaviour (10) (Pocklington et al., in preparation).
The fact that in excess of 40 of the proteins are important in a highly specific function of synaptic physiology, namely, NMDA receptor dependent synaptic plasticity, strongly supports the conclusion that the proteins in the complex work together in mediating this physiological process. It is important to note that the effects of individual mutations differ in subtle ways. For example, long-term potentiation (LTP) of synaptic transmission can be induced with brief trains of action potentials varying in frequency (e.g. 5, 50, 100 Hz); some mutations disrupt LTP at all of these frequencies and others only at a specific frequency (11
In the original proteomic experiments of the NRC/MASC, it was recognized that several of these proteins were encoded by genes that were mutated in humans with mental retardation (3). These data from mice suggest that more detailed scrutiny of the NRC/MASC genes may uncover further roles in human brain function and disease. Given the results from the mice, the expectation is not that all the genes would have identical phenotypes, but that they might have some overlapping or common functions, with specific and variable aspects distinguishing the genes. In other words, some diseases might show evidence of involvement of multiple NRC/MASC genes, which would be relevant to models of multigene disease.
Below, we describe a review of the published literature on NRC/MASC genes in human disease, focussing on reports of mutations or polymorphisms. We will specifically describe the search and curation methods involved, as these approaches are generally useful for similar studies where a list of genes from proteomic or microarray data is the starting point.
| NEED FOR LITERATURE MINING IN MOVING FROM PROTEOMIC DATA TO HUMAN GENETIC EXPERIMENTS |
|---|
Generating a list of genes or proteins typifies a contemporary output from experimental molecular biology and raises the difficult problem of analysing the list. The first step is to ask what is known in the literature about these proteins? Mining the literature for information already known about sets of proteins is a valuable procedure through which existing data can guide the direction of future research. Accumulating and sorting already available data in a meaningful way can reveal large-scale relationships, which would otherwise go unidentified. Our experience suggests that searching the literature for a list of molecules requires the methods of automated text mining (14
| LITERATURE SEARCH AND CURATION METHODS |
|---|
Our purpose in this search was to survey the literature on 185 genes and identify mutations that were believed to be linked to diseases or disorders in humans. We utilized a two-step approach: high-throughput text mining to identify relevant abstracts followed by manual curation of data into spreadsheets. For each gene, synonyms and name variants were generated and were batch-loaded into the search tool. Generic search terms were used for each protein in order to home in on the subjects of human diseases and mutations. The generic search terms were mutation, polymorphism, single nucleotide polymorphism (SNP), duplication, deletion, inversion, translocation, overexpression, splice, splicing, chromosome, linkage, cytogenetics and human, Homo sapiens. These gene names and generic terms were searched on PubMed's database, chosen for its size and breadth of content. When completed, this program returned a webpage which displayed a list of proteins. Clicking on each name brought up a list of abstracts (with links to the page on PubMed) for that protein, where each search term was colour highlighted. It was therefore possible for us to scroll down through the lists of abstracts; when we found one that seemed to be relevant, we could then immediately link to the correct page on PubMed and download the full text of the paper.
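As an illustration of how gene synonyms and the generic terms listed above might be combined into one search string per gene, here is a minimal Python sketch. The text does not state how the search tool combined its terms, so the boolean AND/OR grouping, the phrase quoting and the function names (`quote`, `or_group`, `build_query`) are assumptions made for this example and are not the authors' tool.

```python
# Generic terms as listed in the text. How the original tool combined them
# is not described; the AND/OR structure below is an assumption.
MUTATION_TERMS = [
    "mutation", "polymorphism", "single nucleotide polymorphism", "SNP",
    "duplication", "deletion", "inversion", "translocation",
    "overexpression", "splice", "splicing", "chromosome", "linkage",
    "cytogenetics",
]
SPECIES_TERMS = ["human", "Homo sapiens"]


def quote(term: str) -> str:
    """Wrap multi-word terms in double quotes so they are kept as phrases."""
    return f'"{term}"' if " " in term else term


def or_group(terms: list[str]) -> str:
    """Join alternative terms into one parenthesised OR group."""
    return "(" + " OR ".join(quote(t) for t in terms) + ")"


def build_query(synonyms: list[str]) -> str:
    """Require one gene synonym, one mutation term and one species term."""
    groups = (synonyms, MUTATION_TERMS, SPECIES_TERMS)
    return " AND ".join(or_group(g) for g in groups)


if __name__ == "__main__":
    # Dlg4 and PSD-95 are names used in this article for the same gene product.
    print(build_query(["DLG4", "PSD-95", "postsynaptic density-95"]))
```

Grouping synonyms with OR keeps the search broad for each gene, while the AND between groups narrows results towards human mutation reports.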
Our selection criteria for papers to be included in the spreadsheet were broad. Essentially, any paper demonstrating that a mutation in the gene in question was associated with a human disease would be included; papers showing a clear lack of an association were also included as having returned a null result. In the majority of cases, we obtained our information from the abstract and only in a small percentage of cases was it necessary to study the full text. Where possible we included only original papers. A small number of reviews have also been included, where it was not possible to locate the original publications on PubMed. The results were accumulated into a master spreadsheet (Supplementary Material, Table S2), which, owing to its size (seven columns, 506 rows), was condensed into a simplified version (Table 2). This simplification involved (i) removing genes for which there were no reported mutations, (ii) removing mutations that had not been shown to be associated with a disorder and (iii) amalgamating into a single entry multiple reports showing that a gene was associated with a particular disease. Further details are provided in Table 2.
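The three simplification steps (i) to (iii) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record fields (`gene`, `mutation`, `disorder`, `associated`, `reference`) are hypothetical and do not reproduce the seven columns of the actual spreadsheet.

```python
from collections import defaultdict


def simplify(records):
    """Condense curated records into one entry per (gene, disorder) pair.

    Each record is a dict with the hypothetical keys
    gene, mutation, disorder, associated and reference.
    """
    merged = defaultdict(list)
    for r in records:
        # (i) drop genes for which no mutation was reported
        if not r["mutation"]:
            continue
        # (ii) drop mutations not shown to be associated with a disorder
        if not r["associated"] or not r["disorder"]:
            continue
        # (iii) amalgamate multiple reports for the same gene and disorder
        merged[(r["gene"], r["disorder"])].append(r["reference"])
    return [
        {"gene": gene, "disorder": disorder, "references": refs}
        for (gene, disorder), refs in sorted(merged.items())
    ]
```

Keeping the references as a list per entry means that amalgamation in step (iii) does not discard the link back to the individual reports.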
We found that text mining was highly effective for obtaining a large volume of relevant papers because it is designed to be all-encompassing. The cost of this breadth is that only a small percentage of the abstracts returned were relevant, which made the task of studying them time-consuming. However, any attempt to narrow the search would be likely to overlook important papers. The major difficulty inherent to this method lies in the protein names. Some protein names contain a large number of generic words; for example, a search for guanine nucleotide-binding protein will select abstracts that contain the word protein even if they have no other relevance. Consequently, we are likely to have missed important results on proteins whose names contain generic words. Whenever possible, we downloaded the full text of the paper, typically in PDF format. In some cases, papers had not been archived on-line. In others, we did not have access to the journal in question; this was particularly problematic when papers had been published in specialist or foreign journals. Of the 395 abstracts we examined on-line, we were able to download 243 papers (62%). More complete access to journals, perhaps through new public access policies, will increase this percentage in future. The complete list of references is presented in Supplementary Material, Table S3.
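The generic-word problem described above can be made concrete with a small Python sketch contrasting word-level matching with whole-phrase matching. The stop-list `GENERIC_WORDS` and all function names are hypothetical, and this is not a description of how the search tool used in this study worked internally.

```python
import re

# Hypothetical stop-list of words too common to identify a protein on their own.
GENERIC_WORDS = {"protein", "receptor", "kinase", "binding", "subunit"}


def tokens(text):
    """Lower-case a string and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def matches_any_word(name, abstract):
    """Word-level match: true if any word of the name occurs in the abstract."""
    return bool(set(tokens(name)) & set(tokens(abstract)))


def matches_phrase(name, abstract):
    """Phrase-level match: true only if the name's words occur consecutively."""
    n, a = tokens(name), tokens(abstract)
    if not n:
        return False
    return any(a[i:i + len(n)] == n for i in range(len(a) - len(n) + 1))


def generic_fraction(name):
    """Fraction of the words in a name that are on the generic stop-list."""
    words = tokens(name)
    return sum(w in GENERIC_WORDS for w in words) / len(words) if words else 0.0
```

A name with a high `generic_fraction` could be flagged for phrase-only matching or manual review, at the cost of missing abstracts that use an abbreviated form of the name.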
| HUMAN MUTATIONS AND DISEASES IN NRC GENES |
|---|
Of the 185 proteins in the NRC/MASC, our search returned abstracts for 135 of them (73%). Of these 135, there were 47 (35% of the 135, or 25% of all 185) in which a mutation had been identified in humans. In total, we found 395 reports of mutations. Although some of these reports duplicate one another, it is nevertheless clear that many of these 47 genes exhibit a range of different mutations. However, this should not be taken as strong evidence that these genes are more prone to mutations than the others; it is equally plausible that the other genes have not been studied as intensely, a possibility supported by our search's failure to identify any abstracts for 50 (27%) of the NRC/MASC proteins. The nature of the mutation was typically taken down verbatim from the text of the abstract, with little or no attempt to re-classify. Consequently, the range of different mutation types described is wide, and a full key has been provided in the table legends. It is very likely that a more in-depth examination of the genetic information would enable a simplification; for example, many nonsense mutations are likely to be SNPs or insertions.
We constructed a list of genes reported to exhibit a pathogenic mutation. In tandem, a list was assembled of genes reported to exhibit a non-pathogenic mutation (for instance, several SNPs in NR1 were suggested to be associated with schizophrenia, but an association was not found and thus they were not considered pathogenic). These lists are shown in Table 3. In total, 40 genes were reported to exhibit a pathogenic mutation, whereas 27 exhibited non-pathogenic mutations; 20 genes showed both. Of the genes with non-pathogenic mutations, 74% also exhibit pathogenic mutations. This result indicates that we should be cautious about any suggestion that the 40 genes exhibiting pathogenic mutations are more critical than the other 145 members of the NRC/MASC. It is very probable that our finding of a small group of proteins within the NRC/MASC exhibiting these mutations reflects previous research emphases rather than a true property of the NRC/MASC.
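The gene counts quoted above are mutually consistent, which can be checked with simple inclusion-exclusion arithmetic. The short Python sketch below uses only the numbers given in the text; the variable names are chosen for this example.

```python
# Counts taken from the text.
total_genes = 185
pathogenic = 40      # genes with at least one pathogenic mutation
non_pathogenic = 27  # genes with at least one non-pathogenic mutation
both = 20            # genes appearing in both lists

# Inclusion-exclusion: genes with any reported mutation.
with_any_mutation = pathogenic + non_pathogenic - both

# Share of the non-pathogenic list that also has a pathogenic mutation.
share_also_pathogenic = both / non_pathogenic

# NRC/MASC members with no reported pathogenic mutation.
without_pathogenic = total_genes - pathogenic

print(with_any_mutation)                # 47
print(f"{share_also_pathogenic:.0%}")   # 74%
print(without_pathogenic)               # 145
```

The result of 47 genes matches the number of genes with an identified human mutation reported in the search results.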
To examine the diversity of diseases involving NRC/MASC genes, we have tabulated the disorders (Supplementary Material, Table S4). One hundred and eighty-three disorders were reported, of which 54 were classed as nervous system disorders (30%); these are listed in Table 4. The remainder are highly varied, affecting a wide range of physiological systems and anatomical regions. A significant proportion (37%) of the disorders were tumours and cancers. Any human disorder was eligible for inclusion and if multiple papers showed a link between a protein and the same disease, we included all of them, even if some of them were apparently considering the same mutation. We included papers which claimed that a particular gene/mutation was definitely not linked to a disease; an extra column was introduced to state whether the mutation was thought to be involved in the disease. This was a crucial decision, as in some cases, a variety of studies had been carried out into the possibility of a particular gene being involved in a particular disorder, with contradictory results; had we only included the studies which claimed a link, the spreadsheet would have presented a misleading picture of the literature. We also included papers which looked for a mutation in a gene that could be involved in a disorder but could not find one, for similar reasons. Supplementary Material, Table S2 lists all the disorders included in the spreadsheet, and further details on the curation process are included in the legend.
| OVERVIEW OF NRC IN HUMAN DISEASE |
|---|
Proteomic studies show that the NRC/MASC complex has 185 proteins and many of these are important in the physiology of learning and memory and other forms of plasticity in rodents. Here, we describe the systematic text mining and curation of a set of NRC/MASC genes encoding proteins found in synaptic signalling complexes in mammalian nervous system. Mutations were reported in 47 genes and associated with 183 disorders including 54 affecting the nervous system. Proteomic data from functionally important entities in nerve cells, such as other complexes, can also be mined in a similar manner and provide the basis for linking physiology and disease with proteomics and genetics.
The finding that over one-third of the NRC/MASC genes for which abstracts were returned (47 of 135; 25% of all 185) are implicated in human disease is more likely to be an underestimate than an overestimate, for several reasons. First, the rate of discovery of mutations and their associations has not reached a plateau or decreased (data not shown). Secondly, although the figure of 47 genes implicated in humans appears high, data from rodent studies show that interference with 43 genes by mutation or drugs impairs synaptic plasticity (10). Moreover, there are many genes that have not been tested using mouse knockouts, and it seems very likely that the number of mutations with phenotypes will increase. In contrast, the over-reporting of associations which do not hold up to replication may reduce this number. Systematic testing of these genes in multiple clinical centres and cohorts will be required to refine these figures. The Genes to Cognition program (www.genes2cognition.org) aims to facilitate these activities by providing information and tools to collaborators and a central repository of data from human, mouse and other studies on these NRC/MASC proteins.
The wide range of medical disorders involving NRC/MASC genes raises several interesting issues. First, the fact that 129 of 183 disorders are not primarily classified as nervous system disorders could be most easily explained by knowledge from gene expression studies that at least 40% of NRC/MASC genes are expressed in non-neural cells (data not shown). Secondly, the disorders vary in aspects of their cellular pathology; for example, some genes are involved in cancers and others in degenerative disorders and this may be because of common signalling pathways. Thirdly, the complexes contain proteins that are responsible for regulating multiple cell biological processes such as receptor trafficking, nuclear signalling and cytoskeletal rearrangement (3,15). Together, this provides an explanation for the pleiotropic role of mutations affecting the NRC/MASC.
The NRC/MASC appears to be involved with both psychiatric and neurological conditions (Supplementary Material, Table S4). A considerable number of these disorders have cognitive components (autism, schizophrenia, mental retardation), consistent with mouse genetic studies showing specific impairments in cognitive function. It is also clear that not all mouse mutations in the NRC/MASC produce similar cognitive impairments. For example, Dlg4 (PSD-95) and Dlg3 (SAP-102) are homologues in the MAGUK family and bind directly to NMDA receptor subunits, yet mouse knockouts for these two genes have clearly distinct phenotypes in assays of working memory (unpublished data). In addition, several knockouts of NRC proteins, including NR2B and SynGAP, are perinatal lethals (11), whereas knockouts of NR2A (16), PSD-95 (9) and SAP-102 are viable. Again, this illustrates similar general patterns of pleiotropic function in mice and humans. The NRC/MASC set will likely be a rich set of genes to investigate in future human studies.
Systematic studies of NRC/MASC genes, integrating mouse and human genetics, are underway. The G2C programme was recently established in the UK to bring together an integrated research programme linking basic and clinical neuroscience around the study of the NRC (www.genes2cognition.org). In addition to systematically analysing human mutations in disease cohorts, and creating and characterizing mouse mutants, tools are under development to understand the diversity of molecules and their roles in different phenotypes. Large-scale integrative approaches (human, mouse, physiology, behaviour, etc.) using scalable methods for analysing hundreds and thousands of genes will be essential for dissecting the subtlety of functions and their mapping onto human diseases. These strategies are not only essential for studying the basic biology of disease, but will also provide new approaches to the identification of drug targets, namely other molecules in the complexes and networks. This will be important because it will extend the number of druggable targets beyond the mutant genes directly involved in the aetiology of disease.
| SUPPLEMENTARY MATERIAL |
|---|
Supplementary Material is available at HMG Online.
| ACKNOWLEDGEMENTS |
|---|
We thank Jane Turner for administrative support and Peter Visscher for comments and Mark Collins for Figure 1. S.G.N.G., M.C.M., K.-L.P., M.A.C. and J.D.A. were supported by the Genes to Cognition project funded by the Wellcome Trust. See www.genes2cognition.org for details of authors' contributions.
Conflict of Interest statement: No authors have declared any conflict of interest.
| REFERENCES |
|---|
1. Choudhary, J. and Grant, S.G. (2004) Proteomics in postgenomic neuroscience: the end of the beginning. Nat. Neurosci., 7, 440–445.
2. Dougherty, J.D. and Geschwind, D.H. (2005) Progress in realizing the promise of microarrays in systems neurobiology. Neuron, 45, 183–185.
3. Husi, H., Ward, M.A., Choudhary, J.S., Blackstock, W.P. and Grant, S.G. (2000) Proteomic analysis of NMDA receptor-adhesion protein signaling complexes. Nat. Neurosci., 3, 661–669.
4. Collins, M.O., Husi, H., Yu, L., Brandon, J.M., Anderson, C.N.G., Blackstock, W.P., Choudhary, J.S. and Grant, S.G.N. (2005) Molecular characterization and comparison of the components and multi-protein complexes in the postsynaptic proteome. J. Neurochem., in press.
5. Kim, E. and Sheng, M. (2004) PDZ domain proteins of synapses. Nat. Rev. Neurosci., 5, 771–781.
6. Mayer, M.L. and Armstrong, N. (2004) Structure and function of glutamate receptor ion channels. Annu. Rev. Physiol., 66, 161–181.
7. Farr, C.D., Gafken, P.R., Norbeck, A.D., Doneanu, C.E., Stapels, M.D., Barofsky, D.F., Minami, M. and Saugstad, J.A. (2004) Proteomic analysis of native metabotropic glutamate receptor 5 protein complexes reveals novel molecular constituents. J. Neurochem., 91, 438–450.
8. Sprengel, R., Suchanek, B., Amico, C., Brusa, R., Burnashev, N., Rozov, A., Hvalby, Ø., Jenson, V., Paulsen, O., Andersen, P., Kim, J.J. et al. (1998) Importance of the intracellular domain of NR2 subunits for NMDA receptor function in vivo. Cell, 92, 279–289.
9. Migaud, M., Charlesworth, P., Dempster, M., Webster, L.C., Watabe, A.M., Makhinson, M., He, Y., Ramsay, M.F., Morris, R.G.M., Morrison, J.H., O'Dell, T.J. and Grant, S.G.N. (1998) Enhanced long-term potentiation and impaired learning in mice with mutant postsynaptic density-95 protein. Nature, 396, 433–439.
10. Grant, S.G.N., Husi, H., Choudhary, J., Cumiskey, M., Blackstock, W. and Armstrong, J.D. (2003) The organisation and integrative function of the postsynaptic proteome. In Hensch, T.K. and Fagiolini, M. (eds), Excitatory-Inhibitory Balance: Synapses, Circuits, Systems. Kluwer Academic Publishers/Plenum Publishers, New York, pp. 13–44.
11. Komiyama, N.H., Watabe, A.M., Carlisle, H.J., Porter, K., Charlesworth, P., Monti, J., Strathdee, D.J.C., O'Carroll, C.M., Martin, S.J., Morris, R.G.M., O'Dell, T.J. and Grant, S.G.N. (2002) SynGAP regulates ERK/MAPK signaling, synaptic plasticity, and learning in the complex with postsynaptic density 95 and NMDA receptor. J. Neurosci., 22, 9721–9732.
12. Porter, K., Komiyama, N.H., Vitalis, T., Kind, P.C. and Grant, S.G.N. (2005) Differential expression of two NMDA receptor interacting proteins, PSD-95 and SynGAP, during mouse development. Eur. J. Neurosci., 21, 351–362.
13. Garry, E.M., Moss, A., Delaney, A., O'Neill, F., Blakemore, J., Bowen, J., Husi, H., Mitchell, R., Grant, S.G.N. and Fleetwood-Walker, S.M. (2003) Neuropathic sensitization of behavioral reflexes and spinal NMDA receptor/CaM kinase II interactions are disrupted in PSD-95 mutant mice. Curr. Biol., 13, 321–328.
14. Hoffmann, R., Krallinger, M., Andres, E., Tamames, J., Blaschke, C. and Valencia, A. (2005) Text mining for metabolic pathways, signaling cascades, and protein networks. Sci. STKE, 2005, pe21.
15. Husi, H. and Grant, S.G. (2001) Proteomics of the nervous system. Trends Neurosci., 24, 259–266.
16. Sakimura, K., Kutsuwada, T., Ito, I., Manabe, T., Takayama, C., Kushiya, E., Yagi, T., Aizawa, S., Inoue, Y., Sugiyama, H. and Mishina, M. (1995) Reduced hippocampal LTP and spatial learning in mice lacking NMDA receptor ε1 subunit. Nature, 373, 151–155.