
Human Molecular Genetics 2005 14(Review Issue 1):R121-R132; doi:10.1093/hmg/ddi101

© The Author 2005. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oupjournals.org

Small regulatory RNAs in mammals

John S. Mattick* and Igor V. Makunin

ARC Special Research Centre for Functional and Applied Genomics, Institute for Molecular Bioscience, University of Queensland, Brisbane QLD 4072, Queensland, Australia

* To whom correspondence should be addressed. Tel: +61 7 3346 2110; Fax: +61 7 3346 2111; Email: j.mattick{at}imb.uq.edu.au

Received January 3, 2005; Accepted February 23, 2005


    ABSTRACT
Mammalian cells harbor numerous small non-protein-coding RNAs, including small nucleolar RNAs (snoRNAs), microRNAs (miRNAs), short interfering RNAs (siRNAs) and small double-stranded RNAs, which regulate gene expression at many levels, including chromatin architecture, RNA editing, RNA stability, translation and quite possibly transcription and splicing. These RNAs are processed by multistep pathways from the introns and exons of longer primary transcripts, including protein-coding transcripts. Most show distinctive temporal- and tissue-specific expression patterns, including in embryonal stem cells and the brain, and some are imprinted. Small RNAs control a wide range of developmental and physiological pathways in animals, including hematopoietic differentiation, adipocyte differentiation and insulin secretion in mammals, and have been shown to be perturbed in cancer and other diseases. The extent of transcription of non-coding sequences and the abundance of small RNAs suggest the existence of an extensive regulatory network based on RNA signaling, which may underpin development and much of the phenotypic variation in mammals and other complex organisms, and which may have different genetic signatures from sequences encoding proteins.


    INTRODUCTION
Although only 1.2% of the human genome encodes protein, a large fraction of it is transcribed. Indeed, ~98% of the transcriptional output in humans and other mammals consists of non-protein-coding RNAs (ncRNAs) from the introns of protein-coding genes and the exons and introns of non-protein-coding genes (1,2), including many that are antisense to or overlap protein-coding genes (3–5).

Until recently, the non-coding RNA fraction was considered largely non-functional, with the exception of the common infrastructural RNAs involved in protein synthesis, transport and splicing. Introns have long been regarded as evolutionary debris, with intronic RNA assumed simply to be degraded after excision by splicing, and the increasing number of non-protein-coding transcripts being detected in mammalian cells has been suggested, at least by some, to be largely ‘transcriptional noise’ (6). However, a significant proportion of ncRNAs appears to be stable in eukaryotic cells. For example, some excised introns have half-lives comparable with those of mRNAs and are even exported from the nucleus to the cytoplasm (7,8). Whole-chromosome tiling arrays have shown that the range of detectable ncRNAs in human cells is much greater than can be accounted for by mRNAs (9), and that there appear to be roughly equal numbers of protein-coding and non-coding transcripts regulated by common transcription factors in the human genome (4,10). Similar data have been reported in Drosophila (11).

All intensively studied gene loci, including imprinted loci and conventional loci such as beta-globin, have been shown to contain a majority of non-coding transcripts (12–15). The number of known functional ncRNA genes has risen dramatically in recent years: over 800 ncRNAs [excluding transfer RNAs (tRNAs), ribosomal RNAs (rRNAs) and small nuclear RNAs (snRNAs)] have been catalogued in mammals, at least some of which are alternatively spliced (16,17), along with almost 20 000 putative ncRNA transcripts identified in cDNA libraries (16,18). ncRNAs have been implicated in diseases including various cancers and neurological diseases (2,16), and at least some are processed into smaller functional molecules (19,20).

Apart from tRNAs and spliceosomal snRNAs, which are housekeeping RNAs involved in translation and mRNA splicing, there are several functionally and structurally distinct classes of short RNAs in eukaryotic cells. In most, if not all, cases, their function is based on recognition of RNA or DNA target sequences by specific base pairing, analogous to digital signaling (21). Because of this feature, even short RNAs contain sufficient information to specify individual targets in the genome and the transcriptome, in a much more compact and energy-efficient manner than proteins. This may have been a necessary adaptation to meet the accelerating regulatory requirements of more complex organisms (21,22) and may have been crucial to their evolution and development (23,24). These small RNAs and their roles in mammalian cell and developmental biology are the subject of the current review.
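The claim that short RNAs carry enough information to address individual targets can be checked with a back-of-the-envelope calculation. The minimal sketch below assumes, purely for illustration, a random genome with equal base frequencies and a haploid human genome size of about 3.1 × 10^9 bp searched on both strands; real genomes are neither random nor uniform in composition, so the numbers are indicative only.

```python
def expected_random_matches(length: int, genome_size: float, strands: int = 2) -> float:
    """Expected number of exact chance occurrences of one specific sequence of
    `length` nt, assuming a random genome with equal base frequencies."""
    return strands * genome_size / 4 ** length


def min_specific_length(genome_size: float, strands: int = 2) -> int:
    """Shortest length at which fewer than one chance match is expected."""
    length = 1
    while expected_random_matches(length, genome_size, strands) >= 1:
        length += 1
    return length


if __name__ == "__main__":
    GENOME_SIZE = 3.1e9  # assumed haploid human genome size, in bp
    for n in (8, 12, 16, 20, 22):
        print(f"{n:2d} nt: ~{expected_random_matches(n, GENOME_SIZE):.3g} chance matches")
    print("minimum specific length:", min_specific_length(GENOME_SIZE))
```

Under these assumptions, fewer than one chance match is expected once a sequence reaches about 17 nt, so RNAs in the 21–25 nt range have, in principle, more than enough information to specify a unique site.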


    SMALL NUCLEOLAR RNAs
Small nucleolar RNAs (snoRNAs) guide the site-specific modification of nucleotides in target RNAs. Two types of modification occur, 2'-O-ribose methylation and pseudouridylation, directed by two large families of snoRNAs termed box C/D and box H/ACA snoRNAs, respectively (reviewed in 25,26). snoRNAs recognize target sequences by forming a canonical guide RNA duplex and recruit associated proteins to perform the corresponding modification at the target site. snoRNAs generally range between 60 and 300 nucleotides in length, but only short sequences participate in target recognition via antisense interactions. It was initially thought that the role of snoRNAs was restricted to rRNA modification in ribosome biogenesis, but it is now evident that they can target other RNAs, including snRNAs and possibly mRNAs (25,27). Interestingly, a pseudouridine synthase and pseudouridylation of the steroid receptor RNA activator SRA have recently been shown to be involved in retinoic acid and other nuclear receptor-dependent transactivation (28).
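Antisense target recognition by box C/D snoRNAs can be illustrated with a small sketch. It assumes the commonly cited "D+5" rule (the target nucleotide paired with the guide nucleotide five positions upstream of box D is the one that is 2'-O-methylated), a perfectly complementary guide of fixed length immediately upstream of a consensus CUGA box D, and Watson-Crick pairs only. Real guides vary in length and context, and the function and parameter names are hypothetical.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Reverse complement of an RNA sequence (Watson-Crick pairs only)."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def predict_methylation_sites(snorna: str, target: str,
                              guide_len: int = 10, box_d: str = "CUGA") -> list[int]:
    """Return 0-based target positions predicted to be 2'-O-methylated.

    For each box D in the snoRNA, the `guide_len` nucleotides immediately
    upstream are treated as the antisense guide. Every perfectly
    complementary site in the target is reported; the modified nucleotide is
    the one paired with the guide nucleotide 5 nt upstream of box D. Because
    pairing is antiparallel, that is the 5th nucleotide (index 4) of the site.
    """
    if guide_len < 5:
        raise ValueError("guide must span the D+5 position")
    sites = []
    d = snorna.find(box_d)
    while d != -1:
        if d >= guide_len:
            guide = snorna[d - guide_len:d]
            site = reverse_complement(guide)
            t = target.find(site)
            while t != -1:
                sites.append(t + 4)
                t = target.find(site, t + 1)
        d = snorna.find(box_d, d + 1)
    return sites
```

For instance, with the site ACGGAUCCUA embedded in a target as UUU + site + UUU, and a snoRNA built as AUGAUGA + reverse_complement(site) + CUGA + G, the function returns [7]: the adenosine paired with the guide nucleotide five positions upstream of box D.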

Over 300 different snoRNAs are known in humans and almost 200 in mouse (see the databases at http://noncode.bioinfo.org.cn and http://www.sanger.ac.uk/Software/Rfam) (17,29). Many of these occur in polycistronic clusters, and at least some show tissue-specific expression, suggesting that they are specifically regulated and in turn have specific regulatory roles in the differential modification of selected target RNAs in different tissues, including the brain. Interestingly, a number of brain-specific snoRNAs come from imprinted regions, and some of them are orphan snoRNAs with unknown targets (30,31). One of these snoRNAs has an 18-nucleotide, phylogenetically conserved sequence complementary to a critical alternative splice site and adenosine-to-inosine (A–I) RNA editing site in the serotonin 2C receptor mRNA, whose gene also encodes another snoRNA with an unknown target (30). Moreover, at least some brain-specific snoRNAs appear to have evolved recently and to be restricted in their phylogenetic distribution (32), suggesting that they may be important in the epigenetic control of behavior.

Mammalian snoRNAs are derived from introns of pre-mRNA transcripts and are produced both by processing of the excised intron (debranched lariat) and by endonucleolytic cleavage of unspliced primary transcripts, through a complex pathway involving endonucleases, exonucleases and helicases (25,26). In many cases, the snoRNA-containing introns occur within protein-coding transcripts, such as those encoding ribosomal proteins and other proteins involved in ribosome biogenesis/nucleolar function, a clear example of a parallel genetic output. However, in many other cases, snoRNAs are derived from the introns of transcripts that have no protein-coding capacity (25–27,33), and the function of their exons (if any) is unknown.

No mutations with phenotypic consequences have been recorded in snoRNA sequences, which suggests that such mutations are lethal, that snoRNAs are functionally redundant or (most likely) that their loss causes subtle effects, for example on growth or brain function. It has been reported that certain multicopy snoRNAs (designated HBII-52 and HBII-85) are absent from the cortex of a patient with Prader–Willi syndrome (PWS) and from a PWS mouse model, indicating that they are imprinted and expressed only from the paternal allele, and pointing to a potential role in the etiology of PWS (30). However, a recent study of a genomic deletion of the HBII-52 snoRNA gene cluster in humans indicates that these snoRNAs do not play a major role in PWS on their own, or do so only in combination with the HBII-85 snoRNA cluster (34).

In relation to RNA regulation, it should be noted that another form of RNA editing, A–I conversion, catalyzed by adenosine deaminases that act on RNA (ADARs), is also common in the brain, and aberrant editing has been associated with certain cancers and a range of abnormal behaviors, including epilepsy and depression (35). In humans, A–I editing has recently been shown to be much more widespread than previously thought and to occur primarily in Alu elements (which are primate-specific) within non-coding sequences of both protein-coding and non-coding transcripts (36–38). A–I editing also appears to modulate RNA interference (RNAi) (described subsequently) by altering the sequence of introduced (39,40) and naturally occurring (41) double-stranded RNAs.
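Because inosine is read as guanosine by reverse transcriptase (and by the translation machinery), A–I editing sites appear as A-to-G differences between a transcript sequence and the genomic sequence from which it was transcribed. The minimal sketch below assumes two already-aligned, gap-free sequences of equal length on the same strand, and flags a candidate editing cluster using a hypothetical threshold; genome-wide screens must additionally handle alignment, sequencing errors and genomic polymorphisms.

```python
def candidate_editing_sites(genomic: str, transcript: str) -> list[int]:
    """0-based positions where the genome has A but the transcript reads G
    (inosine is read as G), on pre-aligned, equal-length sequences."""
    if len(genomic) != len(transcript):
        raise ValueError("sequences must be pre-aligned and of equal length")
    return [i for i, (g, t) in enumerate(zip(genomic.upper(), transcript.upper()))
            if g == "A" and t == "G"]


def is_editing_cluster(genomic: str, transcript: str, min_sites: int = 3) -> bool:
    """Flag a region as a candidate edited cluster (the threshold is arbitrary)."""
    return len(candidate_editing_sites(genomic, transcript)) >= min_sites
```

Requiring several A-to-G differences in the same region, rather than one, is a simple way to favor genuine editing over isolated sequencing errors or polymorphisms.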


    miRNAs AND siRNAs
The other broad class of small regulatory RNAs in mammals (and indeed in animals and plants generally) comprises the microRNAs (miRNAs) and short interfering RNAs (siRNAs), tiny molecules generally 21–25 nt long that have been the subject of intense recent interest (42–48). miRNAs come from endogenous short hairpin precursor structures (described subsequently) and usually target other loci with similar but not identical sequences for translational repression. siRNAs are produced from longer double-stranded (bimolecular) RNAs or long hairpins, often of exogenous origin, and usually target homologous sequences at the same locus or elsewhere in the genome for destruction (gene silencing) (43–45), the phenomenon termed RNAi (49). However, the distinction between miRNAs and siRNAs is becoming blurred, as both are produced by similar pathways and have similar mechanisms of action (43–45). Recent observations have shown that both miRNAs and siRNAs can suppress translation of mRNAs (in the case of an imperfect match) and can cleave target RNAs (in the case of a perfect match) (44,50,51). miRNAs have been shown to be involved in a variety of developmental processes in animals and plants (43,46) (described subsequently). In contrast, siRNAs were originally proposed to act mainly as an antiviral defense and transposon repression system via RNAi (52), but recent findings indicate that such RNAs may play a much broader role in gene and genome regulation (described subsequently).
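The rule of thumb that a perfect match leads to cleavage and an imperfect match to translational repression can be written as a toy classifier. This is a deliberate oversimplification: it assumes an ungapped alignment of the small RNA against a candidate site of equal length and ignores G:U wobble pairs, bulges, the position of mismatches and site context, all of which matter in reality. The mismatch cutoff is a hypothetical parameter.

```python
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def mismatches(small_rna: str, site: str) -> int:
    """Count non-Watson-Crick positions when small_rna (5'->3') pairs
    antiparallel with site (5'->3'); both must have the same length."""
    if len(small_rna) != len(site):
        raise ValueError("ungapped, equal-length alignment assumed")
    return sum((a, b) not in PAIRS for a, b in zip(small_rna, reversed(site)))


def predicted_outcome(small_rna: str, site: str, max_mismatches: int = 6) -> str:
    """Toy mapping from complementarity to outcome (cutoff is hypothetical)."""
    n = mismatches(small_rna, site)
    if n == 0:
        return "cleavage"
    if n <= max_mismatches:
        return "translational repression"
    return "no interaction"
```

Applied to a 22-nt small RNA and its exact reverse complement, the classifier returns "cleavage"; introducing a single mismatch into the site changes the prediction to "translational repression".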

A database of known and predicted endogenous miRNAs in various animal and plant species is available at http://www.sanger.ac.uk/Software/Rfam/mirna (29,53). The database (release 6.1) currently lists 222 human and 224 mouse miRNAs (as well as 186 rat miRNAs), many of which are orthologous. Roughly half are conserved in fishes and a quarter in invertebrates (54), and these presumably have evolutionarily conserved functions in vertebrate and metazoan development. However, new human and rodent miRNAs are constantly being identified (55). Very recently, 976 candidate miRNAs were identified by scanning whole-genome human/mouse and human/rat alignments; most of these are also conserved in other vertebrates, and around 20% have experimental support (56).

Two databases of siRNAs directed against human genes have also recently been published (see http://www.human-siRNA-database.net and http://siRNA.cgb.ki.se) (57,58). These databases contain several hundred siRNAs that have been experimentally verified to be active against human genes (57,58) and thousands of siRNA sequences designed computationally to be active against the RefSeq curated human gene set (58).
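Computational siRNA design typically begins by enumerating short windows of the target mRNA and filtering them. The minimal sketch below enumerates 19-nt target windows and keeps those within a G+C content range. The 19-nt length and the 30–52% GC range are assumptions taken from commonly used design heuristics, not the criteria of the databases cited above, and practical design tools apply many further rules (for example, avoiding runs of a single base and screening for off-target matches).

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G and C in a sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def candidate_sirna_targets(mrna: str, length: int = 19,
                            gc_min: float = 0.30,
                            gc_max: float = 0.52) -> list[tuple[int, str]]:
    """Return (0-based start, window) for every window passing the GC filter.

    DNA input (T) is converted to RNA (U). The thresholds are assumed
    heuristics, not a published specification.
    """
    mrna = mrna.upper().replace("T", "U")
    hits = []
    for start in range(len(mrna) - length + 1):
        window = mrna[start:start + length]
        if gc_min <= gc_fraction(window) <= gc_max:
            hits.append((start, window))
    return hits
```

GC content is only a first-pass filter; it removes windows likely to form duplexes that are too weak or too stable, while leaving specificity to later steps.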

Biogenesis of miRNAs and siRNAs
miRNAs are processed by the RNAi machinery in a two-step cleavage process (59) from longer primary transcripts that have been termed ‘pri-miRNAs’ (60) but that in reality appear to be conventional pre-mRNAs and ncRNAs (15,20,55,61–66), including antisense transcripts (47,55), at least some of which are polycistronic (15,60,61,66–68). miRNA-containing primary transcripts have been shown to be synthesized by RNA polymerase II and to be capped and polyadenylated (62,65). Many human and mouse miRNAs are derived from the introns of protein-coding genes, and the remainder from the introns and exons of mRNA-like ncRNA genes (63,66).

These transcripts, presumably after splicing (64), are processed by the RNase III endonuclease Drosha (69). Drosha cleaves RNA hairpins that contain a large (≥10 nt) terminal loop approximately two helical turns into the stem, to excise 65–75 nt precursors called ‘pre-miRNAs’ (69,70). Drosha appears to occur in two complexes in the nucleus. The larger complex includes a variety of RNA-associated proteins, including RNA helicases, double-stranded RNA-binding proteins, novel heterogeneous nuclear ribonucleoproteins and the Ewing's sarcoma family of proteins (71), whereas the smaller complex is composed of Drosha and the double-stranded RNA-binding protein DGCR8 (also called Pasha), the product of a gene deleted in DiGeorge syndrome (71–74). The pre-miRNAs are then exported from the nucleus by Exportin 5 (75,76) and processed by the cytoplasmic RNase III endonuclease Dicer (77) into ~22 bp (imperfect) duplexes with 2 nt overhangs at their 3' ends (43,46,59).
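The geometry of a Dicer product, two strands paired over most of their length with a 2-nt single-stranded 3' overhang at each end, can be sketched by constructing the partner for a given 22-nt strand. The sketch assumes a perfectly paired duplex (natural miRNA duplexes are often imperfect) and uses UU as placeholder overhang nucleotides, a hypothetical choice; the function names are illustrative only.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    return "".join(COMPLEMENT[b] for b in reversed(rna))


def dicer_like_duplex(strand: str, overhang: int = 2,
                      filler: str = "UU") -> tuple[str, str]:
    """Return (strand, partner) forming a duplex with `overhang`-nt 3' overhangs.

    The last `overhang` nt of `strand` stay unpaired; the partner pairs
    antiparallel with the rest and ends in its own unpaired 3' tail (`filler`).
    """
    if len(filler) != overhang:
        raise ValueError("filler length must equal the overhang")
    paired = strand[:len(strand) - overhang]
    partner = reverse_complement(paired) + filler
    return strand, partner


def render(strand: str, partner: str, overhang: int = 2) -> str:
    """Draw the strand 5'->3' above its partner 3'->5', with paired bases aligned."""
    top = " " * overhang + "5' " + strand + " 3'"
    bottom = "3' " + partner[::-1] + " 5'"
    return top + "\n" + bottom


if __name__ == "__main__":
    print(render(*dicer_like_duplex("UGAGGUAGUAGGUUGUAUAGUU")))
```

For two 22-nt strands, 2-nt overhangs at both 3' ends leave a 20-bp paired core, which is the arithmetic behind the "~22 bp duplex with 2 nt 3' overhangs" description.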

siRNAs are also processed by Dicer from double-stranded RNA precursors but do not require Drosha (reviewed in 43,46). These precursors may be produced endogenously, for example from sense–antisense transcript pairs. They may also be supplied exogenously, as in the initial discovery of RNAi, in which such RNAs were found to act catalytically to destroy endogenous RNAs with matching sequences (78); this is now a widely used tool for probing gene function by siRNA-induced target knockdown (57,58).

The Dicer-processed short duplex RNAs are incorporated into the RNA-induced silencing complex (RISC), a ribonucleoprotein complex that contains a member of the Argonaute family. There are many Argonaute homologs in animals, plants and fungi, implying that there may be many forms of such complexes that recognize different RNA substrates (reviewed in 79). Recent evidence suggests that different RISC complexes containing different Argonaute proteins may be involved in miRNA- and siRNA-mediated RNAi (80), although there are conflicting reports (81). Argonaute proteins intersect with the Wingless/Wnt and Hedgehog pathways that control cell fate and developmental patterning (82,83). Mutations in Argonaute family members affect a variety of developmental processes, including germ cell development and stem cell fate, and have been implicated in various human cancers and developmental abnormalities (79). The fragile X mental retardation protein (FMRP), an RNA-binding protein that associates with hundreds of mRNAs in neurons via a G-quartet structure and/or U-rich sequences (84–88), is also associated with the RISC complex, as well as with Dicer itself (89–92), and is involved in the control of behavior via a process involving Argonaute2 (93). There is a strong enrichment of predicted miRNA targets among mRNAs associated with FMRP in mammals (94), and it has also recently been reported that FMRP is phosphorylated by casein kinase II (95), hinting at the enormous complexity of these RNA processing and signaling pathways and their regulation.

In general, only one strand of the processed duplex is retained in the RISC complex; which strand is retained appears to be determined by the relative stability of the two ends of the duplex, favoring the strand whose 5' end is less tightly paired (96,97). The RISC complex then binds a target RNA and either represses its translation by an as yet unknown mechanism, which may involve interaction with polyribosomes (98), or, in the case of (near-)perfect complementarity, cleaves the target RNA approximately in the middle of the paired region. Recent evidence suggests that the endonuclease activity within the RISC complex, dubbed ‘slicer’, is in fact mediated by an RNase H-like domain (PIWI) of Argonaute (99–102).
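The strand-selection rule can be approximated crudely in code. Published analyses compare nearest-neighbour free energies of the two duplex ends; the sketch below substitutes a much simpler proxy, the number of G:C pairs among the four 5'-terminal nucleotides of each strand. It assumes a perfectly paired duplex with 2-nt 3' overhangs, so that each strand's 5'-terminal nucleotides are paired. Both the proxy and the window size are illustrative assumptions.

```python
from typing import Optional


def five_prime_gc(strand: str, window: int = 4) -> int:
    """G/C count among the 5'-terminal `window` nucleotides; a crude
    stand-in for how tightly that end of the duplex is paired."""
    return sum(base in "GC" for base in strand[:window].upper())


def predict_retained_strand(strand_a: str, strand_b: str,
                            window: int = 4) -> Optional[str]:
    """Return 'a' or 'b' for the strand whose 5' end is less tightly paired
    (predicted to be retained in RISC), or None if the proxy cannot decide."""
    gc_a = five_prime_gc(strand_a, window)
    gc_b = five_prime_gc(strand_b, window)
    if gc_a < gc_b:
        return "a"
    if gc_b < gc_a:
        return "b"
    return None
```

A duplex in which one strand begins with an A/U-rich 5' end and the other with a G/C-rich 5' end is the clearest case; a tie signals that the proxy is too coarse and a proper free-energy comparison would be needed.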

The roles of miRNAs and siRNAs in development and disease
Some mammalian miRNAs appear to be ubiquitously expressed, but most have been found to exhibit developmentally regulated expression patterns in a variety of cells and tissues, including brain, lung, liver, spleen, heart and skeletal muscle, using northern blots, PCR, microarrays and sensor transgenes (15,46,68,103–112). Many miRNAs are specifically expressed during embryonal stem cell differentiation (68,105) and embryogenesis (112), as well as during brain development (104,108,113,114), neuronal differentiation (103,115) and hematopoietic lineage differentiation (106,111).

Studies in model organisms have shown that miRNAs are involved in the control of developmental timing, cell proliferation, left–right patterning, neuronal cell fate, apoptosis and fat metabolism in invertebrates (as well as in a variety of developmental processes in plants) (reviewed in 43,45–47,116), and there is every reason to expect a similar range of functions in vertebrates. Indeed, the archetypal miRNAs lin-4 and let-7, which were first identified in genetic screens as regulators of developmental timing in Caenorhabditis elegans, have close homologs in other species, including mammals (117–119), as do many other miRNAs (54,67,118–120). The target of lin-4 and let-7, lin-28, is also conserved in mammals (121). siRNAs targeted against let-7 cause developmental abnormalities in fish and frogs (122), and human let-7 paralogs are able to suppress a sensor transgene containing the human homolog of lin-28 (123). It has also been shown that let-7b miRNA associates with lin-28 mRNA in polyribosomes in human cells, indicating a possible physical interaction between this miRNA and its target mRNA (98). Reduced expression of let-7 is observed in certain human lung cancers in association with shortened postoperative survival, and over-expression of let-7 in a lung adenocarcinoma cell line inhibited lung cancer cell growth in vitro (124).

Knockout of the miRNA-processing enzyme Dicer1 in mice leads to lethality early in development, with Dicer1-null embryos depleted of stem cells (125). These observations, together with the apparent inability to generate viable Dicer1-null embryonic stem cells in vitro, suggest a role for Dicer, and by implication miRNAs, in maintaining stem cell populations during early mouse development (125). Dicer-defective ES cells also exhibit severe defects in differentiation in vitro, as well as in centromeric silencing (126). Inactivation of Dicer also causes developmental arrest in zebrafish embryos (127).

In mammals, miRNAs have been shown to regulate B-cell differentiation (106), adipocyte differentiation (128) and insulin secretion (129). Chen et al. have shown that three miRNAs are differentially expressed during mouse hematopoiesis and that ectopic over-expression of one of them (miR-181) in hematopoietic stem/progenitor cells increases the fraction of B-lineage cells both in vitro and in vivo (106). Reduction in the level of miR-143, one of whose predicted targets is the MAP kinase BMK1/ERK5 mRNA (130), resulted in an increase in the level of BMK1/ERK5 and inhibited adipocyte differentiation in culture (128). In pancreatic endocrine cells, inhibition of miR-375 enhanced glucose-induced insulin secretion, and conversely, over-expression of miR-375 suppressed insulin secretion, an effect that could be mimicked by siRNAs directed against Myotrophin, the putative target of miR-375 (129). The authors suggest that many of the 67 miRNA sequences cloned from pancreatic cells (11 of which had not been previously identified) may regulate endocrine pancreas development (129).

Some miRNAs are embedded in Hox clusters and exhibit expression patterns reminiscent of those of Hox genes (51,112). At least one of these miRNAs (miR-196) has extensive, evolutionarily conserved complementarity to HoxB8, HoxC8 and HoxD8 sequences and has been shown to negatively regulate HoxB8 and other Hox genes, suggesting a miRNA-mediated mechanism for the post-transcriptional restriction of Hox gene expression during vertebrate development (51,112).

It has also been shown that some mammalian miRNAs are imprinted (15,61). The imprinting process clearly involves transactions with non-coding RNAs (131), although the mechanisms remain unknown. Mouse miR-127 and miR-136 are transcribed antisense to a retrotransposon-like gene with which they are reciprocally imprinted: the miRNAs are expressed from the maternal chromosome and the retrotransposon-like gene from the paternal chromosome. In addition, the neighboring region contains maternally expressed clusters of snoRNAs and miRNAs, and it has been suggested that these miRNAs may play a role in the imprinting process, either by directing allele-specific chromatin modification or by targeting particular transcripts (15,61).

These observations all indicate that miRNAs are part of the molecular circuitry and complex regulatory networks that control cell fate during mammalian development, a conclusion supported by a variety of studies in other animals. For example, miR-7 and miR-2a/b have been shown to regulate the Notch pathway and the proapoptotic genes reaper, grim and sickle in Drosophila (132).

Consistent with their role in developmental processes, perturbations of miRNA expression are observed in aberrant developmental states such as oncogenesis (133), including human B-cell chronic lymphocytic leukemia (109,134), Burkitt lymphoma (20), colorectal cancer (135) and lung cancer (124), as well as in a number of cancer cell lines (110). A high proportion of known miRNAs are located at fragile sites or in cancer-associated genomic regions (minimal regions of loss of heterozygosity, minimal regions of amplification or common breakpoint regions) (136). Interestingly, some miRNAs from the same genomic cluster show different expression patterns in cancer cells, indicating that some miRNAs may be regulated post-transcriptionally (109), perhaps by other miRNAs as part of more complex regulatory networks (described subsequently).

It has been reported that some siRNAs can induce sequence-dependent off-target effects on proteins such as p53 and p21, which are sensitive markers of cell state (137); such experiments therefore need to be interpreted with care. These effects may occur either because of partial sequence complementarity to other genes (137) or because perturbation of miRNA-regulated pathways can have pleiotropic effects on wider networks of gene expression.

Target prediction in mammals
Thus far, only a few miRNA targets have been identified in mammals, and the rules of interaction are largely unknown. Most identified mammalian miRNAs match their target mRNAs imperfectly, and the pairing rules appear to be complex (138). Because target sequences are very short and contain mismatches, bioinformatic prediction of miRNA targets, especially in the complex genomes of mammals, is very challenging. Most studies predict miRNA targets on the basis of evolutionarily conserved sequence complementarity and low free energy of interaction, usually focusing on the 3'-UTRs of known genes (94,123,130,132), on the presumption, extrapolated from the well-studied examples of lin-4 and let-7, that most miRNAs act through translational repression via UTRs.

Such approaches potentially minimize the number of false positives but may well seriously underestimate the actual number of targets (described subsequently). On the basis of minimal binding energies, Kiriakidou et al. (123) predicted 5031 target sequences for 94 miRNAs. Lewis et al. (130) predicted more than 400 targets, 11 of 15 of which were confirmed experimentally. Predictions from a larger miRNA dataset (218 known mammalian miRNAs) identified 2273 target genes with one or more target sites showing 90% sequence conservation between human, mouse and rat in aligned UTRs (94). The predicted target genes had diverse functions but were enriched for mRNAs encoding transcription factors, components of the miRNA machinery, other proteins involved in translational regulation and components of the ubiquitin protein-degradation machinery (94,130), many of which are known to play important roles in developmental regulation and some of which are involved in the molecular etiology of cancer. Very recently, sophisticated bioinformatic analyses based on the overabundance of conserved adenosines flanking the complementary sites in mRNAs have implicated more than 5300 human genes (>20% of all known or predicted human protein-coding genes) as potential miRNA targets; most of these sites occur in 3'-UTRs, but some occur in coding sequences (139). The fact that many miRNAs are predicted to have multiple cognate mRNAs, and vice versa, suggests that the regulatory networks in which they participate are very complex indeed.
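Many of the conserved-complementarity approaches described above reduce to scanning aligned 3'-UTRs for short sites complementary to the 5' region of a miRNA. The sketch below assumes the "seed" heuristic (miRNA nucleotides 2–8) associated with analyses such as that of Lewis et al. (130), treats a site as conserved only if it occurs at the same column of a gap-free alignment in every species, and reports whether the site is followed by an adenosine opposite miRNA position 1, echoing the conserved-adenosine signal mentioned above. Gapped alignments, site context, G:U pairs and free-energy scoring are ignored, and the function names are hypothetical.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    return "".join(COMPLEMENT[b] for b in reversed(rna))


def seed_site(mirna: str) -> str:
    """mRNA sequence (5'->3') complementary to miRNA nucleotides 2-8."""
    return reverse_complement(mirna[1:8])


def conserved_seed_sites(mirna: str,
                         aligned_utrs: dict[str, str]) -> list[tuple[int, bool]]:
    """Scan gap-free, equal-length aligned 3'-UTRs (species -> sequence).

    Returns (0-based column, flanking_a) for every column where the seed
    match occurs in all species; flanking_a is True when the nucleotide just
    3' of the match (opposite miRNA position 1) is A in all species.
    """
    site = seed_site(mirna)
    utrs = [u.upper().replace("T", "U") for u in aligned_utrs.values()]
    if len({len(u) for u in utrs}) != 1:
        raise ValueError("UTRs must be aligned to a common length")
    reference = utrs[0]
    hits = []
    start = reference.find(site)
    while start != -1:
        if all(u[start:start + len(site)] == site for u in utrs):
            end = start + len(site)
            flanking_a = end < len(reference) and all(u[end] == "A" for u in utrs)
            hits.append((start, flanking_a))
        start = reference.find(site, start + 1)
    return hits
```

Requiring conservation across every aligned species is what keeps false positives down in this kind of scan, at the cost of missing lineage-specific sites, which mirrors the trade-off noted above.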

Small RNAs regulate chromosome dynamics and chromatin structure
As described previously, miRNAs and siRNAs target mRNAs for either translational inhibition or destruction by RISC-mediated cleavage (43,45,46). However, there is also considerable evidence that small RNAs regulate chromosome dynamics, chromatin modification and epigenetic memory, including imprinting, DNA methylation and transcriptional gene silencing (2).

The RNAi pathway and non-coding RNAs have been shown to be central to the formation of silenced chromatin and to chromosomal dynamics in animals, plants, fungi and protozoa (reviewed in 140–142). In the fission yeast Schizosaccharomyces pombe, the RNAi pathway is involved in heterochromatin formation, as well as in centromere function in meiosis and mitosis, via the methylation of histone H3 on lysine 9 and the RITS (RNA-induced initiation of transcriptional gene silencing) complex, which contains Argonaute, the chromodomain protein Chp1 (among others) and Dicer-produced small RNAs homologous to the target DNA in heterochromatic regions (143–148). Meiosis in S. pombe has also been reported to require a number of specific non-coding RNAs (149). Similar observations have been made in Drosophila, where mutations in components of the RNAi machinery affect silencing and heterochromatin formation, accompanied by reduced histone H3 lysine-9 methylation and delocalization of the heterochromatin proteins HP1 and HP2 (150). Short 25–27 nt RNAs derived from double-stranded RNA of the Drosophila Su(Ste) repeats on the Y chromosome suppress the Stellate gene on the X chromosome, and complementarity between the Stellate transcript and the Su(Ste) repeats is essential for silencing (151). Knockout of Dicer has recently been reported to affect centromeric heterochromatin formation in mouse (126).

These observations suggest that small RNAs may be central to chromatin regulation in all eukaryotes, including mammals. Indeed, it has been shown that the localization of mammalian HP1 to heterochromatin involves its coordinate binding to methylated histone H3 and RNA, via interactions in the hinge region between its chromodomains (152,153). This is consistent with previous reports that chromodomains (which are present in many different types of chromatin-binding and chromatin-remodeling proteins, including the polycomb family, the histone methyltransferase and histone acetyltransferase families, the retinoblastoma binding protein 1 family, the CHD family and the SWI3 family) bind RNA as well as modified histones (154–156). RNA-interacting proteins are also components of the mammalian DNA methylation system (157).

Moreover, it has recently been shown that synthetic siRNAs targeted to CpG islands in the E-cadherin promoter reduced expression of the gene and induced significant DNA methylation and histone H3 lysine-9 methylation in cultured human cells (158). Similar results were obtained with siRNAs directed against the erbB2/HER2 promoter (158) and the elongation factor 1alpha promoter (159), providing strong support for the notion that endogenous small RNAs may perform similar functions in vivo. However, these results remain to be confirmed. There is also a recent report that over-expression of fragments of a non-coding antisense RNA, Khps1, results in demethylation of CG sites and methylation of CC(A/T)GG sites in a region of the Sphk1 gene promoter that is subject to tissue-specific differential methylation (160), suggesting that RNA signaling may control epigenetic modifications in more than one way. RNA signaling may also play a key role in the control of transcription and splicing (see below), although the RNAs involved have yet to be identified.
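Targeting siRNAs to CpG islands presupposes a way of recognizing them. The sketch below applies the classic Gardiner-Garden and Frommer criteria (a stretch of at least 200 bp with G+C content above 50% and an observed/expected CpG ratio above 0.6) to fixed, non-overlapping windows. These criteria are a widely used convention rather than anything specified in the studies cited above, and the window scheme and function names are illustrative assumptions.

```python
def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio: (CpG count x length) / (C count x G count)."""
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    return seq.count("CG") * len(seq) / (c * g)


def looks_like_cpg_island(seq: str, min_len: int = 200,
                          min_gc: float = 0.5, min_ratio: float = 0.6) -> bool:
    """Apply length, G+C content and CpG obs/exp thresholds to one stretch."""
    seq = seq.upper()
    if len(seq) < min_len:
        return False
    gc = (seq.count("G") + seq.count("C")) / len(seq)
    return gc > min_gc and cpg_obs_exp(seq) > min_ratio


def scan_windows(seq: str, window: int = 200) -> list[int]:
    """Start positions of non-overlapping windows that pass the criteria."""
    return [s for s in range(0, len(seq) - window + 1, window)
            if looks_like_cpg_island(seq[s:s + window], min_len=window)]
```

The observed/expected ratio is the key discriminator: bulk mammalian DNA is depleted of CpG dinucleotides, so GC-rich sequence alone does not qualify as an island.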


    OTHER PLAYERS
Many other small RNAs appear to exist in animal and plant cells. Ambros and colleagues (116) recently described 33 small non-coding RNAs in C. elegans that are similar in size to miRNAs and are developmentally regulated, but are not derived from hairpin precursors and are not evolutionarily conserved. They also described over 700 small RNAs that are antisense to known protein-coding sequences (compared with only 49 from sense strands), some of which are detectable as ~22 nt species on northern blots and are potential endogenous siRNAs (116). Endogenous siRNAs derived from the sense and antisense strands of non-coding RNA have been described in Arabidopsis (161). A number of small RNAs, including new snoRNAs and 22 others ranging from 70 to 450 nt and derived from intergenic and genic regions (including splice junctions), were detected and confirmed by northern blot analyses in Drosophila (162).

Other small RNAs also appear to exist in mammals. Only 30% of 179 small RNA sequences cloned from mouse ES cells appear to come from hairpin precursors (68). Approximately 20% show similarity to tRNA or rRNA, which leaves close to half that may act as small regulatory RNAs in some other capacity. Of 733 non-redundant sequences isolated from human ES cells, only 36 could have been derived from hairpin precursors (105). More than 50 unknown short non-coding RNAs were cloned from neural stem cells (163). One of these sequences is present in more than 60 copies in the mouse genome and is similar to the NRSE/RE1 sequence, which is preferentially located in the promoter regions of neuron-specific genes. This RNA, which occurs in the nucleus as a small ~20 nt dsRNA, controls the differentiation of adult neural stem cells and activates the transcription of genes containing the NRSE/RE1 sequence, apparently through dsRNA–protein interactions rather than by acting as an siRNA or miRNA (163).


    PARALLEL OUTPUTS AND REGULATORY NETWORKS
Approximately two-thirds of annotated mammalian miRNAs are encoded within known genes and (like snoRNAs) mainly occur within introns of protein-coding and non-coding genes, with some residing within exons of non-coding genes (63). Some of these transcripts may be very long and contain multiple miRNAs (61,67). The situation appears to be even more complicated, as some intronic miRNAs are derived from antisense transcripts (54). The most common molecular functions of mammalian protein-coding host genes are those annotated as ‘purine nucleotide binding’ and ‘DNA binding’ (63) and those containing homeobox and RNA-binding domains (94), all of which point to the parallel output of proteins and regulatory RNAs as part of complex networks that underpin mammalian biology (1,24) (Fig. 1). It has also been suggested, on the basis of sequence homologies, that some miRNAs may regulate other miRNAs rather than mRNAs, forming a network of regulatory interactions at the RNA level (164).



Figure 1. Regulatory networks involving small non-coding RNAs. Small non-coding RNAs regulate genome structure and gene expression at many levels. miRNAs, siRNAs, snoRNAs and other small RNAs are involved in the regulation of translation, mRNA stability and chromatin structure, as well as self-regulation (dashed lines) and possibly also the control of transcription and splicing (question marks).

 

    THE TIP OF THE ICEBERG?
Only a limited number of miRNAs are known in mammals. The fact that some of these miRNAs appear repeatedly in cloning experiments has led some to suggest that the set of miRNAs in mammals and other organisms is small. However, there are good reasons to think that this is not the case and that there are in fact tens or even hundreds of thousands of RNA signals that constitute a hitherto hidden control network regulating chromatin architecture and gene expression during mammalian ontogeny (24). Some miRNAs are present in large amounts (54), but it is clear that at least some miRNAs and other small regulatory RNAs are present at very low levels (45,165), and it is possible, indeed likely, that most will exhibit very restricted expression patterns in specific cell types, as observed in hematopoietic and pancreatic cells (106,129). Some miRNAs, such as that encoded by the lsy-6 locus, which controls the asymmetry of chemosensory neurons in C. elegans (166), and that encoded by the bantam locus in Drosophila (167), were discovered only through sensitive genetic screens, which are difficult if not impossible to carry out in mammals. The lsy-6 miRNA is very scarce and cannot be detected by standard biochemical procedures, and loss-of-function mutations in it result in a very subtle phenotype (166).

The lack of known mutations in miRNAs (and no doubt other types of regulatory RNAs) in mammals is likely to be due to a combination of ascertainment bias focused on exons of protein-coding genes and the difficulty of mutation screening across large tracts of non-coding sequences in regions identified by genome scanning for quantitative trait or disease associations. In this context, it is worth noting that the mutations underlying the callipyge (‘beautiful bottom’) phenotype in sheep and the enhanced muscling of domestic pigs are single base substitutions within non-coding sequences (a long intergenic sequence of unknown transcriptional status in the DLK1-GTL2 imprinted region and the third intron of the IGF2 gene, respectively), the identification of which involved tour-de-force analyses in well-structured pedigrees (168–170).

The problems of cloning small RNAs (171) and the contamination of cDNA libraries with rRNAs and other common RNA sequences have led to the conclusion that not many more miRNAs will be identified by this approach (54). Bioinformatic predictions (at least to date) have been limited by the tight constraints on the search parameters, including their focus on hairpin precursors, mRNA/UTR targets and strong evolutionary conservation, although improved filters based on the different patterns of miRNA and flanking sequence conservation in different species have recently identified almost 1000 candidates (56). Moreover, new algorithms based on secondary structural parameters are being developed, which appear to have the potential to identify other types of non-coding RNAs (172) that presumably also have regulatory functions.

There is strong evidence that chromatin dynamics and heterochromatin formation are controlled by small regulatory RNAs and that local chromatin architecture (in promoter regions) can also be directed by small RNAs. Indeed, this would make a great deal of sense. It is well established that chromatin modification occurs at many different loci in different cells and that this is central to developmental ontogeny. Either there is an army of sequence-specific DNA-binding proteins that carry out these modifications, which is not the case (there are only a limited number of DNA- and histone-modifying enzymes, such as methylases, acetylases and deacetylases) (173), or these enzymes must be directed to their sites of action by some other signal, most logically sequence-specific RNAs. Such signals would also potentially solve the conundrum of how the appropriate sites are selected from the huge number of transcription factor binding sites that exist in the genome. In this context, it is interesting to note that triplexes, which may contain RNA, are very common in human chromosomes (174) and that many transcription factors have high affinity for RNA (175,176).

Trans-acting guide RNAs may also regulate alternative splicing (177), which is currently thought to be controlled mainly by the combinatorial effects of protein ‘splicing factors’ but is not at all well understood in these terms (2,178,179). Consistent with the possibility that site-specific trans-acting RNAs are involved, the nucleotide sequences around alternative splice sites are often highly conserved between species (180,181), and many studies have shown that splicing patterns can be readily altered in cultured cells and in whole animals by introducing small antisense RNAs, an approach that is showing considerable promise for gene therapy of splice-site mutations in muscular dystrophy and other human genetic diseases (182–186). It is not a big leap of faith to conclude that RNA control of splice-site selection is also likely to occur naturally, and that it has not yet been demonstrated because of the sheer number and variety of such signals in the regulatory networks of different cells. If cells are awash with small RNA signals processed from longer precursors, which (as such) have short half-lives, identifying these signals will be difficult, although bioinformatics using appropriate search algorithms may provide a means to do so.

The known miRNAs tend to be highly conserved (54,56), presumably because their sequence is constrained by functional interactions with multiple targets (94,187), which is possibly also the case for the ultra-conserved elements that are far more conserved than protein-coding sequences in the mammalian genome (188). In contrast, endogenous siRNAs are less conserved, presumably because these RNAs and their homologous targets can easily co-vary and still maintain specificity (43), which would make them difficult to identify bioinformatically, at least on the basis of evolutionary conservation. The level of selection pressure on such sequences (as signaling molecules largely dependent on primary sequence recognition and secondary structure) will be a function of the number of interactions that must be maintained, rather than of the precise sequence itself. Those with one or a few interacting partners will be able to evolve relatively freely and also explore new connections in regulatory networks, which themselves can evolve to explore new developmental space and which (given a relatively stable proteome) may be the major route to higher complexity and phenotypic variation.
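The co-variation argument can be illustrated with a toy model. The Python sketch below is a simplification, assuming that a guide RNA recognizes its target only through perfect Watson–Crick complementarity (real small-RNA targeting can tolerate mismatches and G:U pairs, which are not modeled here); the sequences and function names are hypothetical. It shows that a mutation in the target alone abolishes pairing, whereas a compensatory change in the guide restores it, so both sequences can drift while the interaction is preserved.

```python
# Toy model only: guide-target recognition is treated as perfect
# Watson-Crick complementarity (no G:U wobble, no tolerated mismatches).
# Sequences and function names are hypothetical illustrations.

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence (read 5'->3')."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def pairs_perfectly(guide: str, target: str) -> bool:
    """True if the guide is exactly antiparallel-complementary to the target."""
    return guide == reverse_complement(target)


def mutate(seq: str, pos: int, base: str) -> str:
    """Substitute a single base at a 0-based position."""
    return seq[:pos] + base + seq[pos + 1:]


def compensatory_change(guide: str, target: str, pos: int, base: str) -> tuple[str, str]:
    """Mutate the target at `pos` and apply the matching change to the guide.

    Because the duplex is antiparallel, target position `pos` pairs with
    guide position len(guide) - 1 - pos.
    """
    new_target = mutate(target, pos, base)
    new_guide = mutate(guide, len(guide) - 1 - pos, COMPLEMENT[base])
    return new_guide, new_target


if __name__ == "__main__":
    target = "GAUCCGUAAGCUUAGCAUGCA"         # hypothetical 21-nt target site
    guide = reverse_complement(target)       # perfectly matched guide strand
    print(pairs_perfectly(guide, target))    # True

    drifted_target = mutate(target, 5, "A")  # target changes on its own
    print(pairs_perfectly(guide, drifted_target))  # False

    g2, t2 = compensatory_change(guide, target, 5, "A")  # both co-vary
    print(pairs_perfectly(g2, t2))           # True
    print(g2 == guide)                       # False: the guide itself has changed
```

In this toy model the co-varied guide no longer matches the original guide, so a search that relied only on conservation of the guide sequence would miss an interaction that has in fact been preserved.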

Thus, many small regulatory RNAs, including possibly the majority of miRNAs, may not show strong evidence of sequence conservation over significant evolutionary distances. In this context, it is worth pointing out that known non-coding RNAs with conserved functions, such as Xist, are not highly conserved at the primary sequence level among mammals (189,190). This will, of course, also contribute to the perception that genomic sequences encoding such ncRNAs are (in general) drifting neutrally and that the majority of the genome which is transcribed is non-functional, which may not be the case at all (21). Indeed, a recent analysis suggests that the proportion of the human genome which is under purifying selection for functions held in common with other mammals is (at least) an order of magnitude higher in non-coding than in protein-coding sequences (191), an observation which is hard to reconcile with protein-based models of regulation of gene expression.

We have argued elsewhere that the majority of the genome of humans and other complex organisms is in fact devoted to extensive, but hitherto largely hidden, regulatory networks that are trans-acted by non-coding RNAs and that were essential to the evolution of complex organisms (1,2,21). We predicted that regulatory ncRNAs would be derived both from the vast tracts of transcribed introns (23,24), which has now been confirmed in principle, and from non-protein-coding transcripts, which also appears to be the case. Indeed, the evidence is accumulating that the major advance in the evolution of complex organisms was the co-option of RNA as a digital signaling network, which was required to overcome the limitations of an analog (protein) based regulatory system (21,22,24). These RNA networks are likely to be intrinsically robust, and their perturbation, particularly by single nucleotide polymorphisms, may lead to a range of subtle phenotypes (in contrast to the generally severe effects of non-synonymous mutations in protein-coding sequences) (2). These may underpin much of the variation observed between species and individuals, including differences in quantitative and behavioral traits, and variation in susceptibility to complex diseases. If this is correct, most of our conceptions of gene regulation and approaches to molecular genetic analysis will have to be revised.


    CONCLUSIONS
We have barely begun to investigate RNA regulatory networks in mammals. The majority of the mammalian genome is transcribed into non-protein-coding RNA. Numerous short RNAs are processed from longer transcripts and show diverse expression patterns, but the biological functions and targets are known for very few of them. The mechanism of RNAi is relatively well studied, but our knowledge of chromatin modification by siRNAs is incomplete, to say the least. It also seems likely that other short non-coding RNAs exist in the cell that may regulate many other processes and use other mechanisms, which remain to be discovered. Our understanding of the mammalian genome is undergoing major change. We used to consider most non-coding regions to be junk, but the extent of non-coding RNA transcription, the rapidly emerging evidence of regulatory networks controlled by RNA and a new logic about the genetic structure of complex organisms suggest that most of the mammalian genome may in fact be functional, or at least that this possibility should be more seriously considered.


    ACKNOWLEDGEMENTS
 
The authors thank the Australian Research Council and the Queensland State Government for financial support and their colleagues for many stimulating discussions. They also thank Alex Hüttenhofer and two anonymous reviewers for alerting us to some important omissions and for some very helpful suggestions.


    REFERENCES

  1. Mattick, J.S. (2001) Non-coding RNAs: the architects of eukaryotic complexity. EMBO Rep., 2, 986–991.

  2. Mattick, J.S. (2003) Challenging the dogma: the hidden layer of non-protein-coding RNAs in complex organisms. Bioessays, 25, 930–939.

  3. Yelin, R., Dahary, D., Sorek, R., Levanon, E.Y., Goldstein, O., Shoshan, A., Diber, A., Biton, S., Tamir, Y., Khosravi, R. et al. (2003) Widespread occurrence of antisense transcription in the human genome. Nat. Biotechnol., 21, 379–386.

  4. Cawley, S., Bekiranov, S., Ng, H.H., Kapranov, P., Sekinger, E.A., Kampa, D., Piccolboni, A., Sementchenko, V., Cheng, J., Williams, A.J. et al. (2004) Unbiased mapping of transcription factor binding sites along human chromosomes 21 and 22 points to widespread regulation of noncoding RNAs. Cell, 116, 499–509.

  5. Lavorgna, G., Dahary, D., Lehner, B., Sorek, R., Sanderson, C.M. and Casari, G. (2004) In search of antisense. Trends Biochem. Sci., 29, 88–94.

  6. Dennis, C. (2002) The brave new world of RNA. Nature, 418, 122–124.

  7. Clement, J.Q., Qian, L., Kaplinsky, N. and Wilkinson, M.F. (1999) The stability and fate of a spliced intron from vertebrate cells. RNA, 5, 206–220.

  8. Clement, J.Q., Maiti, S. and Wilkinson, M.F. (2001) Localization and stability of introns spliced from the Pem homeobox gene. J. Biol. Chem., 276, 16919–16930.

  9. Kapranov, P., Cawley, S.E., Drenkow, J., Bekiranov, S., Strausberg, R.L., Fodor, S.P. and Gingeras, T.R. (2002) Large-scale transcriptional activity in chromosomes 21 and 22. Science, 296, 916–919.

  10. Kampa, D., Cheng, J., Kapranov, P., Yamanaka, M., Brubaker, S., Cawley, S., Drenkow, J., Piccolboni, A., Bekiranov, S., Helt, G. et al. (2004) Novel RNAs identified from an in-depth analysis of the transcriptome of human chromosomes 21 and 22. Genome Res., 14, 331–342.

  11. Stolc, V., Gauhar, Z., Mason, C., Halasz, G., van Batenburg, M.F., Rifkin, S.A., Hua, S., Herreman, T., Tongprasit, W., Barbano, P.E. et al. (2004) A gene expression map for the euchromatic genome of Drosophila melanogaster. Science, 306, 655–660.

  12. Ashe, H.L., Monks, J., Wijgerde, M., Fraser, P. and Proudfoot, N.J. (1997) Intergenic transcription and transinduction of the human beta-globin locus. Genes Dev., 11, 2494–2509.

  13. Charlier, C., Segers, K., Wagenaar, D., Karim, L., Berghmans, S., Jaillon, O., Shay, T., Weissenbach, J., Cockett, N., Gyapay, G. et al. (2001) Human–ovine comparative sequencing of a 250 kb imprinted domain encompassing the callipyge (clpg) locus and identification of six imprinted transcripts: DLK1, DAT, GTL2, PEG11, antiPEG11, and MEG8. Genome Res., 11, 850–862.

  14. Holmes, R., Williamson, C., Peters, J., Denny, P. and Wells, C. (2003) A comprehensive transcript map of the mouse Gnas imprinted complex. Genome Res., 13, 1410–1415.

  15. Seitz, H., Youngson, N., Lin, S.P., Dalbert, S., Paulsen, M., Bachellerie, J.P., Ferguson-Smith, A.C. and Cavaille, J. (2003) Imprinted microRNA genes transcribed antisense to a reciprocally imprinted retrotransposon-like gene. Nat. Genet., 34, 261–262.

  16. Pang, K.C., Stephen, S., Engström, P.G., Tajul-Arifin, K., Chen, W., Wahlestedt, C., Lenhard, B., Hayashizaki, Y. and Mattick, J.S. (2005) RNAdb—a comprehensive mammalian noncoding RNA database. Nucleic Acids Res., 33(Database issue), D125–D130.

  17. Liu, C., Bai, B., Skogerbo, G., Cai, L., Deng, W., Zhang, Y., Bu, D., Zhao, Y. and Chen, R. (2005) NONCODE: an integrated knowledge database of non-coding RNAs. Nucleic Acids Res., 33(Database issue), D112–D115.

  18. Okazaki, Y., Furuno, M., Kasukawa, T., Adachi, J., Bono, H., Kondo, S., Nikaido, I., Osato, N., Saito, R., Suzuki, H. et al. (2002) Analysis of the mouse transcriptome based on functional annotation of 60,770 full-length cDNAs. Nature, 420, 563–573.

  19. van den Berg, A., Kroesen, B.J., Kooistra, K., de Jong, D., Briggs, J., Blokzijl, T., Jacobs, S., Kluiver, J., Diepstra, A., Maggio, E. et al. (2003) High expression of B-cell receptor inducible gene BIC in all subtypes of Hodgkin lymphoma. Genes Chromosomes Cancer, 37, 20–28.

  20. Metzler, M., Wilda, M., Busch, K., Viehmann, S. and Borkhardt, A. (2004) High expression of precursor microRNA-155/BIC RNA in children with Burkitt lymphoma. Genes Chromosomes Cancer, 39, 167–169.

  21. Mattick, J.S. (2004) RNA regulation: a new genetics? Nat. Rev. Genet., 5, 316–323.

  22. Mattick, J.S. and Gagen, M.J. (2005) Accelerating networks. Science, 307, 856–858.

  23. Mattick, J.S. (1994) Introns: evolution and function. Curr. Opin. Genet. Dev., 4, 823–831.

  24. Mattick, J.S. and Gagen, M.J. (2001) The evolution of controlled multitasked gene networks: the role of introns and other noncoding RNAs in the development of complex organisms. Mol. Biol. Evol., 18, 1611–1630.

  25. Bachellerie, J.P., Cavaille, J. and Huttenhofer, A. (