Non-coding RNAs: regulators of disease†
No conflicts of interest were declared.
Abstract
For 50 years the term ‘gene’ has been synonymous with regions of the genome encoding mRNAs that are translated into protein. However, recent genome-wide studies have shown that the human genome is pervasively transcribed and produces many thousands of regulatory non-protein-coding RNAs (ncRNAs), including microRNAs, small interfering RNAs, PIWI-interacting RNAs and various classes of long ncRNAs. It is now clear that these RNAs fulfil critical roles as transcriptional and post-transcriptional regulators and as guides of chromatin-modifying complexes. Here we review the biology of ncRNAs, focusing on the fundamental mechanisms by which ncRNAs facilitate normal development and physiology and, when dysfunctional, underpin disease. We also discuss evidence that intergenic regions associated with complex diseases express ncRNAs, as well as the potential use of ncRNAs as diagnostic markers and therapeutic targets. Taken together, these observations emphasize the need to move beyond the confines of protein-coding genes and highlight the fact that continued investigation of ncRNA biogenesis and function will be necessary for a comprehensive understanding of human disease. Copyright © 2009 Pathological Society of Great Britain and Ireland. Published by John Wiley & Sons, Ltd.
Introduction
Over the past decade there has been an explosion of large-scale genome sequencing, which has led to both great insights and unexpected conundrums. Contrary to the original expectation that more complex organisms would have a greater number of genes, it is now clear that humans and mice have approximately the same number of protein-coding genes as the microscopic roundworm Caenorhabditis elegans, most of which are orthologous, and that all multicellular organisms sequenced to date have fewer protein-coding genes than some simple unicellular eukaryotes 1. An explanation for this apparent paradox comes from two unexpected findings: (a) that biological complexity generally correlates with the proportion of the genome that is non-protein-coding 1; and (b) that, while only 2% of the mammalian genome encodes mRNAs, the vast majority is transcribed, largely as long and short non-protein-coding RNAs (ncRNAs) 2-10. These findings have directly challenged the traditional view of RNA as simply an intermediary between DNA and protein, and imply that the vast majority of the genome (long regarded as ‘junk’) encodes functional RNA species that orchestrate the development of complex organisms 11, 12. Indeed it appears that RNA signalling is central to all complex genetic phenomena in the eukaryotes, including transcriptional and post-transcriptional gene silencing 13-22, hybrid dysgenesis 15, 23, X-chromosome dosage compensation and allelic exclusion 24, 25, germ cell reprogramming 26 and paramutation 27, 28, all of which involve epigenetic processes (see below).
The expanding RNA world
Small regulatory RNAs
A fundamental and general role for regulatory RNA in eukaryotic biology was dramatically demonstrated in the late 1990s by the finding that double-stranded RNA introduced into C. elegans is cleaved by the bidentate ribonuclease Dicer into ∼21 nucleotide (nt) small RNAs that induce widespread and heritable gene silencing 29-31. Although this phenomenon, termed RNA interference (RNAi), was originally thought to be restricted to exogenous dsRNAs, it soon became clear that plants and animals produce a dazzling array of endogenous small interfering RNAs (siRNAs), microRNAs (miRNAs) and PIWI-interacting RNAs (piRNAs) 14-17, 32-37. Recent work has seen this repertoire increase even further to include a host of short transcripts that sit adjacent to transcription start sites 38, 39, including promoter-associated small RNAs (PASRs) 9, 40 and transcription initiation RNAs (tiRNAs) 41, species that are derived from centromeres and telomeres 42, 43, and tiny species processed from other short RNAs 44 (Figure 1, Table 1). Indeed, over the last decade we have witnessed a near-exponential growth of manuscripts devoted to regulatory RNAs (Figure 2).

Simplified representation of regulatory ncRNAs and their functions. Generalized gene models are presented in dark grey and light orange and overlap the double-strand DNA structure (light grey), representative of the genome. Each class of regulatory RNA is defined by a colour. Functions are ascribed to each class of ncRNA by colour below the text at the top of the figure. Bars and arrows indicate the direction of transcription. For more detail please see the text and references therein. PARs, promoter-associated RNAs; lncRNAs, long non-coding RNAs; miRNAs, microRNAs; snoRNAs, small nucleolar RNAs; sdRNAs, sno-derived RNAs; endo-siRNAs, endogenous siRNAs; piRNAs, PIWI-interacting RNAs; tiRNAs, transcription initiation RNAs

The rise in the number of non-coding RNA publications per year. The number of PubMed indexed publications with title, abstract or keywords matching each class of ncRNA is plotted. Records were retrieved using an in-house Perl script and the PubMed eFetch utility. The search terms used were: piRNA, PIWI RNA, PIWI-associated RNA, PIWI-interacting RNA, siRNA, small interfering RNA, endo-siRNA, endogenous siRNA, miRNA, microRNA, non-coding RNA, non-coding RNA, non-protein-coding RNA, ncRNA and lincRNA. Note that 2009 data were only gathered through August 2009
NcRNA class | Characteristics | References |
---|---|---|
Established ncRNA classes | ||
Long (regulatory) non-coding RNAs (lncRNAs) | The broadest class, lncRNAs, encompass all non-protein-coding RNA species > ∼200 nt, including mRNA-like ncRNAs. Their functions include epigenetic regulation, acting as sequence-specific tethers for protein complexes and specifying subcellular compartments or localization | Reviewed in 97, 98 |
Small interfering RNAs (siRNAs) | Small RNAs ∼21–22 nt long, produced by Dicer cleavage of complementary dsRNA duplexes. siRNAs form complexes with Argonaute proteins and are involved in gene regulation, transposon control and viral defence | Reviewed in 15-17, 227 |
microRNAs (miRNAs) | Small RNAs ∼22 nt long, produced by Dicer cleavage of imperfect RNA hairpins encoded in long primary transcripts or short introns. They associate with Argonaute proteins and are primarily involved in post-transcriptional gene regulation | Reviewed in 16, 17, 70 |
PIWI-interacting RNAs (piRNAs) | Dicer-independent small RNAs ∼26–30 nt long, principally restricted to the germline and somatic cells bordering the germline. They associate with PIWI-clade Argonaute proteins and regulate transposon activity and chromatin state | Reviewed in 15, 17 |
Promoter-associated RNAs (PARs) | A general term encompassing a suite of long and short RNAs, including promoter-associated small RNAs (PASRs) and transcription initiation RNAs (tiRNAs), that overlap promoters and TSSs. These transcripts may regulate gene expression | Reviewed in 38, 39 |
Small nucleolar RNAs (snoRNAs) | Traditionally viewed as guides of rRNA methylation and pseudouridylation. However, there is emerging evidence that they also have gene-regulatory roles | Reviewed in 228 |
Other recently described classes | ||
X-inactivation RNAs (xiRNAs) | Dicer-dependent small RNAs processed from duplexes of two lncRNAs, Xist and Tsix, which are responsible for X-chromosome inactivation in placental mammals | 131 |
Sno-derived RNAs (sdRNAs) | Small RNAs, some of which are Dicer-dependent, which are processed from small nucleolar RNAs (snoRNAs). Some sdRNAs have been shown to function as miRNA-like regulators of translation | 229-231 |
microRNA-offset RNAs (moRNAs) | Small RNAs ∼20 nt long, derived from the regions adjacent to pre-miRNAs. Their function is unknown | 232 |
tRNA-derived RNAs | tRNAs can be processed into small RNA species by a conserved RNase (angiogenin). They are able to induce translational repression | Reviewed in 44 |
MSY2-associated RNAs (MSY-RNAs) | MSY-RNAs are associated with the germ cell-specific DNA/RNA binding protein MSY2. Like piRNAs, they are largely restricted to the germline and are ∼26–30 nt long. Their function is unknown | 234 |
Telomere small RNAs (tel-sRNAs) | Dicer-independent ∼24 nt RNAs principally derived from the G-rich strand of telomeric repeats. May have a role in telomere maintenance | 43 |
Centromere repeat-associated small interacting RNAs (crasiRNAs) | A class of ∼34–42 nt small RNAs, derived from centromeres, that show evidence of guiding local chromatin modifications | 42 |
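The approximate size ranges listed in Table 1 can be treated as a simple lookup when triaging small RNA sequencing reads. The following is a minimal sketch only: the dictionary is transcribed from the table, while the function name `candidate_classes` and the `tolerance` parameter are hypothetical names introduced here for illustration. Length alone cannot assign a read to a class, since several ranges overlap and class membership also depends on biogenesis pathway, protein partners and genomic origin.

```python
# Approximate size ranges (nt) transcribed from Table 1.
# Single-value entries correspond to the "~N nt" descriptions in the table.
SIZE_RANGES_NT = {
    "siRNA": (21, 22),
    "miRNA": (22, 22),
    "piRNA": (26, 30),
    "moRNA": (20, 20),
    "MSY-RNA": (26, 30),
    "tel-sRNA": (24, 24),
    "crasiRNA": (34, 42),
}


def candidate_classes(length_nt, tolerance=1):
    """Return the classes whose reported size range, widened by `tolerance`
    nucleotides on each side, contains the read length.

    `tolerance` is an illustrative assumption reflecting the approximate
    ("~") nature of the published sizes, not a published parameter.
    """
    return sorted(
        name
        for name, (low, high) in SIZE_RANGES_NT.items()
        if low - tolerance <= length_nt <= high + tolerance
    )


if __name__ == "__main__":
    for length in (22, 28, 38, 50):
        print(length, candidate_classes(length))
```

A 28 nt read, for example, is returned as a candidate for both piRNAs and MSY-RNAs, which illustrates why size is at best a first filter before class-specific criteria are applied.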
Of the classes identified to date, miRNAs, siRNAs and piRNAs, which guide effector Argonaute proteins to genomic loci or target RNAs in a sequence-specific manner, have been most thoroughly investigated. In humans there are at least 700 miRNAs, hundreds of siRNAs and millions of unique piRNA sequences 15-17, 45, suggesting that small RNAs are a substantial portion of the RNA output of cells and that they comprise a diverse, widespread and basal regulatory system. Indeed, several recent studies have shown that miRNAs and piRNAs are detectable in the most primitive multicellular organisms 46 and that, once acquired, they are seldom, if ever, lost 47-49. Although exogenous siRNAs were discovered a decade ago, endogenous siRNAs (endo-siRNAs) have only recently been identified in fruitflies and mammals 50-58, where their biogenesis is dependent on Dicer processing of duplexes formed by overlapping transcripts or long perfect hairpin structures. Work in other organisms, including nematodes, fruitflies, yeast and plants, indicates that these endo-siRNAs are involved in anti-viral defence, transposon silencing, chromatin remodelling and post-transcriptional gene regulation through Argonaute-mediated cleavage of target transcripts (reviewed in 16, 17). The longest small RNA class, piRNAs, which are ∼25–30 nt in length, are also largely derived from transposons and involved in defence against them (reviewed in 15, 19), but are largely restricted to the germline, where active transposons could severely disrupt embryogenesis. Intriguingly, in a departure from the Dicer biogenesis pathway that defines siRNAs and miRNAs, piRNAs are produced by successive waves of Argonaute-cleavage of long non-coding transcripts (reviewed in 15, 59), which may suggest that other Dicer-independent small RNA species are still to be discovered.
A detailed examination of miRNA biogenesis and function is beyond the scope of this work and has recently been covered in several excellent reviews 16, 17, 60. However, miRNA mechanisms of action, and the autoregulatory feedback loops that increasingly characterize small RNA biogenesis, are well illustrated by one of the first miRNAs discovered, let-7 (Figure 3). The let-7 family of miRNAs is highly conserved throughout the Metazoa and functions as a master temporal regulator of development and differentiation, both in early embryos and complex adult tissues such as brain, in nematode, fruitfly, zebrafish and mouse 61-67. Indeed, let-7 targets well-established cell-cycle regulators, including Cdk6 and Ras 68, 69. Like the majority of miRNAs, the let-7 precursor hairpin, or pre-miRNA, is processed from a long RNA polymerase II transcript by the nuclear RNase Drosha; the pre-miRNA is then exported to the cytosol and processed to a ∼22 nt mature miRNA by Dicer 70.

let-7 provides a window into miRNA biogenesis and function. (a) let-7 biogenesis and gene regulation is characterized by a series of autoregulatory feedback loops. Lines ending in bars indicate inhibitory interactions, while those terminating in arrows indicate activating interactions. For simplicity, all let-7 family members (of which there are 11 in vertebrates) are considered as a group. Likewise, mammalian LIN28 homologues (LIN28 and LIN28B) and TRIM–NHL family members (TRIM71 and TRIM32) are depicted as single elements within the schematic. Mature let-7 mediates its effects through a complex composed of an Argonaute protein (grey) and GW182 (brown), which is also depicted in simplified form in the lower panel. Consistent with its expression in late embryogenesis, the principal targets of let-7 are cell cycle regulators, oncofetal genes, pluripotency factors and components of the miRNA biogenesis pathway. Please see the text for more detail and references for each depicted interaction. (b) A general schematic of mRNA transcription and miRNA targeting. Canonical miRNA targets (blue) are dependent on base pairing between nucleotides 2–8, the seed sequence, and the mRNA 3′ UTR. Due to the short length of the seed sequence, legitimate interactions can be abolished and illegitimate targets created by single base changes. Non-canonical targets (orange), e.g. those in coding sequences (CDSs) or 5′ UTRs, are not reliant on the ‘seed’ sequence and generally show more extensive base pairing. Canonical and non-canonical targets are depicted for let-7a:HMGA2 235 and let-7b:Dicer 87, respectively
Each of these steps of let-7 biogenesis is tightly regulated (Figure 3a). For example, while differentiation factors such as Notch 71 induce transcription, pluripotency factors (i.e. those that support an undifferentiated cellular state), such as c-Myc, repress transcriptional activation 72-74. Likewise, the pluripotency factor LIN28 can bind to the conserved loop of the primary let-7 transcript (pri-let-7) to directly inhibit the Drosha cleavage steps 75-77 and can inhibit Dicer cleavage directly or by facilitating pre-miRNA degradation 78, 79. Completing the feedback loop, let-7 targets LIN28 75, 80, 81, c-Myc 82, 83 and the c-Myc-activating gene IMP-1 80. let-7 also forms a separate overlapping loop with the TRIM–NHL family of proteins that negatively regulate c-Myc and enhance let-7 activity 66, 84-86. These let-7 targets are ‘canonical’, i.e. the miRNA ‘seed’ sequence (nucleotides 2–8) binds to a target mRNA 3′ UTR and (generally) represses translation (Figure 3b).
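The canonical targeting rule described above, in which nucleotides 2–8 of the miRNA pair with the 3′ UTR, can be sketched in a few lines of code. This is a minimal illustration rather than a target-prediction method: it considers only perfect Watson–Crick pairing across the seed and ignores site context, conservation and G:U wobble pairs. The mature let-7a sequence is the commonly annotated one; the function names and the short UTR fragment `toy_utr` are hypothetical and invented purely for demonstration.

```python
# Mature let-7a sequence (5'->3'), as commonly annotated.
LET7A = "UGAGGUAGUAGGUUGUAUAGUU"

# Watson-Crick complement table for RNA.
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def seed(mirna, start=2, end=8):
    """Return the seed: nucleotides start..end (1-based, inclusive)."""
    return mirna[start - 1:end]


def seed_site(mirna):
    """Return the mRNA sequence (5'->3') that pairs perfectly with the seed,
    i.e. the reverse complement of the seed."""
    return seed(mirna).translate(COMPLEMENT)[::-1]


def find_seed_sites(mirna, utr):
    """Return 0-based start positions of perfect seed matches in a 3' UTR.
    DNA-style input (T) is converted to RNA (U) first."""
    site = seed_site(mirna)
    utr = utr.upper().replace("T", "U")
    positions = []
    i = utr.find(site)
    while i != -1:
        positions.append(i)
        i = utr.find(site, i + 1)
    return positions


if __name__ == "__main__":
    toy_utr = "AAGCUACCUCAUUUGGCUACCUCAA"  # invented fragment, not a real UTR
    print(seed(LET7A))                      # GAGGUAG
    print(seed_site(LET7A))                 # CUACCUC
    print(find_seed_sites(LET7A, toy_utr))  # [3, 16]
```

Because the match is only seven nucleotides long, such sites occur frequently by chance, which is one reason why seed complementarity alone is generally treated as necessary but not sufficient evidence of a functional target.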
However, let-7 also targets the Dicer coding sequence (CDS) 87 (Figure 3b), consistent with what appears to be an emerging theme of non-canonical miRNA targets in developmentally regulated genes 88-92. In addition, let-7 was recently shown to regulate HMGA2, an oncofetal gene and pluripotency factor, in a cell cycle-dependent manner: HMGA2 translation is up-regulated upon cell cycle arrest but inhibited in proliferating cells 93. Taken as an example, let-7 provides a compelling illustration of the complexity of small RNA biogenesis and function, and points more generally towards small RNAs having a wide range of regulatory functions facilitated by sequence-specific interactions, any of which may malfunction to cause disease.
Long non-coding RNAs
Genome-wide transcriptomic studies have now shown that the mammalian genome is abundantly transcribed 2-10 and that at least 80% of this transcription is exclusively associated with long non-coding RNAs (lncRNAs) 9. Although lncRNAs have frequently been disregarded as artifacts of chromatin remodelling or transcriptional ‘noise’ 94, 95, there is substantial evidence to suggest that they mirror protein-coding genes. Indeed, they are frequently long (generally > 2 and some > 100 kb) 96, spliced and contain canonical polyadenylation signals 97, 98. Additionally, lncRNA promoters are bound and regulated by transcription factors, including Oct3/4, Nanog, CREB, Sp1, c-Myc, Sox2, NF-κB and p53 99-102, and are epigenetically marked with specific histone modifications 102, 103. Overall, there are at least tens of thousands of lncRNAs that show signatures of selection, many of which, like small RNAs, are tissue- and developmental stage-specific 97, 104-108.
Long ncRNAs have a variety of functions, but one of their primary roles appears to be as epigenetic regulators of protein-coding gene expression 109. For example, Hox genes are associated with hundreds of lncRNAs that define domains of differential histone methylation and RNA polymerase accessibility along the spatial and temporal axes of human development 110. A lncRNA transcribed from the HOXC locus, termed HOTAIR, regulates the chromatin methylation state of the HOXD locus in trans through the polycomb repressive complex PRC2 110. Additionally, a recent study has shown that more than 20% of long intergenic ncRNAs associate with chromatin-modifying complexes 111, with evidence that other lncRNAs operate to activate gene expression through Trithorax-group complexes 108. Long ncRNAs are also frequently associated with the phenomenon of genomic imprinting, which ensures that one of the two parental alleles of certain autosomal genes is epigenetically silenced 112-116.
Long ncRNAs can also directly modulate gene transcription and protein degradation. For example, the lncRNA Evf2 activates the homeobox transcription factor Dlx2 and recruits it to an ultraconserved genomic element, which induces transcription of Dlx5 117. Mutant mice lacking Evf2 show reduced numbers of GABAergic interneurons early in development and reduced synaptic inhibition in adulthood 118, suggesting that lncRNA-dependent processes are fundamental to the central nervous system. Indeed, a large fraction of lncRNAs is expressed in very precise patterns in the brain 106. Similarly, the archetypal tumour suppressor p53 mRNA, in addition to being translated, also functions as an RNA to inhibit the ubiquitin ligase activity of Mdm2 119. Indeed, there is increasing evidence to suggest that mRNAs contain extensive RNA structural features 120, 121, raising the possibility that, in addition to their protein-coding functions, mRNAs may intrinsically act as regulatory RNAs 122.
Long ncRNAs have also been implicated in organelle biogenesis and subcellular trafficking. For instance, the NRON lncRNA is involved in the cytoplasmic-to-nuclear shuttling of the NFAT transcription factor 123, and the lncRNA NEAT1/MEN ε/β is required for the formation of the nuclear subcompartment paraspeckles in differentiated cells and regulates the movement of primate-specific mRNAs containing inverted Alu repeats 124-127. Another lncRNA, Gomafu, is specifically localized in a novel nuclear domain in a subset of neurons 128.
Although long and short regulatory RNAs are typically studied and classified separately, it is important to note that they frequently overlap both physically and functionally. Indeed, the dynamic interplay between lncRNAs and small RNAs is evident during the process of X-chromosome inactivation (XCI), which occurs in female mammals to ensure dosage compensation for X-linked genes between the sexes. The antisense lncRNAs Xist and Tsix are not only responsible for the chromatin modifications that maintain XCI 24, 129, 130, but also form dsRNA duplexes in vivo that are processed by Dicer into ∼25–42 nt X-inactivation RNAs (xiRNAs) 131. Given the abundance of mammalian antisense transcripts with the potential to form dsRNA structures 132, 133, such relationships between lncRNAs and small RNAs may be commonplace.
Non-coding RNAs in disease
Small regulatory RNAs
Small RNAs have roles in virtually all developmental processes, including stem cell and germline maintenance, development and differentiation, transcriptional and post-transcriptional gene silencing and subcellular localization (see above, and reviewed in 14-17, 20, 21). Not surprisingly, therefore, their disruption has been linked to human disease. For example, miRNAs are aberrantly expressed in: liver, pancreatic, oesophageal, stomach, colon, haematopoietic, ovarian, breast, pituitary, prostate, thyroid, testicular and brain cancers 134-138; central nervous system disorders (e.g. schizophrenia and Alzheimer's disease) 139; and cardiovascular disease 140, 141. MiRNAs are also enriched at fragile sites in the human genome and are associated with oncogenic viral integration sites 142, 143.
Similarly, loss of specific small RNA loci is associated with Prader–Willi syndrome (PWS), a disorder caused by the loss of imprinting on chromosome 15q11-q13 and characterized by hyperphagia, hypogonadism and cognitive impairment. A recent study has shown that a single microdeletion involving several small nucleolar RNA clusters (HBII-85 and HBII-52) results in PWS, suggesting that loss of small RNAs is a causal determinant of the disease 144. Consistent with this hypothesis, knockout mice lacking the relevant snoRNAs largely recapitulate the PWS phenotype 145. Interestingly, HBII-52 forms an antisense duplex with the serotonin receptor 2C (5HT2C) mRNA and negatively regulates its post-transcriptional editing 146, 147, strongly implicating it in PWS-associated and autistic neurological defects 148. Taken together, these studies suggest that the loss of small RNA loci plays an important role in human illness.
Like protein-coding genes, small RNAs can function either as activators or inhibitors of disease. Consistent with its role as a differentiation factor, let-7 is a well-established tumour suppressor 61, 149-151 whose reduced expression is associated with poor survival in human lung cancers 152. Likewise, miR-29b expression is associated with disease-free survival in patients with ovarian serous carcinoma 153, potentially due to regulation of the de novo methyltransferases Dnmt3a and Dnmt3b 154. Indeed, altered expression of a broad suite of miRNAs that, depending on their targets, can act either as tumour suppressors or as oncogenes (so-called oncomiRs) has been detected in virtually all cancer types examined (for reviews and tables of cancer-associated miRNAs and their targets, see 134, 151, 155, 156). Similar relationships are apparent in cardiovascular illnesses 140, 141. For example, miR-92a controls functional recovery of ischaemic tissues in mice 157, and miR-145 and miR-143, which have recently been implicated in differentiation of progenitors into cardiac myocytes, are down-regulated in injured and atherosclerotic vessels 158. MicroRNAs may even play a direct role in viral defence. A study of human T lymphocytes has shown that miR-29a targets the HIV-1 3′ UTR and directs it to P bodies, where it is suppressed by the RNA-induced silencing complex (RISC) 159.
Small RNA dysregulation occurs for multiple reasons and reflects the processes involved in their biogenesis, regulation and targeting. MicroRNA loci and individual components of the miRNA biogenesis pathway are frequently lost or amplified in a wide range of cancers 160, 161, and there is now widespread evidence that miRNAs that act as differentiation factors (e.g. let-7, above) are globally down-regulated in cancers 134, 150, 151. Indeed, decreased expression of Dicer and Drosha, the RNases involved in miRNA production (see above), in ovarian cancer patients is associated with poor prognosis, suboptimal surgical cytoreduction and advanced tumour stage 162. Likewise, a mutation resulting in premature termination of DICER1 results in pleuropulmonary blastoma, a rare paediatric lung tumour 163. Consistent with these findings, studies in mice have shown that mammalian systems are highly sensitive to Dicer activity. Complete loss of Dicer results in the disruption of the developmental programme and early embryonic lethality 164.
Elements associated with the regulation of miRNA processing can also be associated with various pathologies. A recent study has shown that over-expression of LIN28 and LIN28B, negative regulators of let-7 biogenesis, correlates with repression of let-7, occurs in at least 15% of human malignancies and is associated with more advanced disease states 165. Similarly, and consistent with let-7's role in developmental regulation, genetic variants of the LIN28 locus have recently been associated with altered timing of human pubertal growth and development 166.
Single nucleotide polymorphisms (SNPs) in mature and precursor miRNAs have been robustly associated with schizophrenia and autism 167, 168, and a pathogenic SNP in the seed sequence of miR-96 is responsible for progressive hearing loss 169 (Figure 3b). A SNP in the 3′ UTR of K-Ras, a well-characterized GTPase-regulated oncogene and target of let-7, inhibits let-7 translational suppression and results in reduced survival in oral cancers 170. Indeed, SNPs in the 3′ UTRs of mRNAs that abolish or create target sites may be common in miRNA-associated diseases (reviewed in 171). As examples, SNP-induced illegitimate miRNA binding sites are associated with muscular hypertrophy in sheep, Tourette's syndrome and cardiovascular disease 171-173. More generally, allele-specific polymorphisms in miRNA target sites have been shown to play a role in the tissue-specific miRNA regulation of hundreds of genes, suggesting that such genetic subtleties may be a widespread underlying cause of individual phenotypic variability 174.
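The way in which a single base change can abolish a legitimate site or create an illegitimate one follows directly from the short length of the seed match, and can be made concrete with a small sketch. The example below assumes perfect Watson–Crick pairing across nucleotides 2–8 as the only criterion for a site; the function names and the toy UTR fragments are hypothetical and invented for illustration, and do not correspond to the K-Ras, miR-96 or other variants cited above.

```python
# Mature let-7a sequence (5'->3'), as commonly annotated.
LET7A = "UGAGGUAGUAGGUUGUAUAGUU"

# Watson-Crick complement table for RNA.
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def seed_site(mirna):
    """mRNA sequence (5'->3') pairing perfectly with miRNA nucleotides 2-8."""
    return mirna[1:8].translate(COMPLEMENT)[::-1]


def count_sites(mirna, utr):
    """Count perfect seed matches in the UTR, including overlapping ones."""
    site = seed_site(mirna)
    return sum(
        1
        for i in range(len(utr) - len(site) + 1)
        if utr[i:i + len(site)] == site
    )


def snp_effect(mirna, utr, pos, alt):
    """Classify a single-base substitution at 0-based `pos` as having
    'abolished', 'created' or left 'unchanged' the number of seed sites."""
    variant = utr[:pos] + alt + utr[pos + 1:]
    before = count_sites(mirna, utr)
    after = count_sites(mirna, variant)
    if after < before:
        return "abolished"
    if after > before:
        return "created"
    return "unchanged"


if __name__ == "__main__":
    # Invented fragments: the first contains one let-7 seed match (CUACCUC).
    print(snp_effect(LET7A, "AAGCUACCUCAUU", 5, "G"))  # abolished
    print(snp_effect(LET7A, "AAGCUGCCUCAUU", 5, "A"))  # created
    print(snp_effect(LET7A, "AAGCUACCUCAUU", 0, "C"))  # unchanged
```

Any of the seven positions in the site is equally sensitive under this simplified model, which is consistent with the observation that target-site polymorphisms may be a common source of regulatory variation.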
Long non-coding RNAs
The data gathered to date strongly implicate lncRNAs in the basal regulation of protein-coding genes, including those central to normal development and oncogenesis, at both the transcriptional (e.g. epigenetic) and post-transcriptional (e.g. subcellular dynamics) levels, and an increasing number have been functionally validated to affect different cellular and developmental pathways (see 107). It is not surprising, then, that the dysregulation of lncRNAs appears to be a primary feature of many complex human diseases, including leukaemia 175, colon cancer 176, prostate cancer 177, breast cancer 178, hepatocellular carcinoma 175, 179, psoriasis 180, ischaemic heart disease 181, 182, Alzheimer's disease 183 and spinocerebellar ataxia type 8 184.
In some cases, the mechanisms by which lncRNAs contribute to disease have been carefully dissected. For example, the dsDNA-binding protein PSF constitutively silences the proto-oncogene GAGE6. However, at least five lncRNAs can bind to PSF, which results in deactivation of PSF-induced silencing, expression of GAGE6 and enhanced tumorigenicity 185. Long ncRNAs overlapping or antisense to protein-coding gene promoters may also contribute to oncogenesis. A transcript antisense to the p15 tumour suppressor gene, first identified in a human leukaemia, regulates the chromatin and DNA methylation status of the p15 locus 186. A lncRNA antisense to p21 was also recently shown to behave similarly 187. These results, combined with the observation that antisense transcripts are present at thousands of protein-coding genes, have led to speculation that antisense lncRNAs generally control the expression of their cognate protein-coding genes through epigenetic modifications 188, 189. This model has profound ramifications for our understanding of disease, particularly cancer—dysregulation of a lncRNA regulating the expression of a tumour suppressor or oncogene, and not the protein-coding sequence itself, may be one of the ‘hits’ that leads to oncogenesis.
The hidden layer of non-coding variation
These examples likely represent the tip of a very big iceberg. The same technologies that have revealed a breadth of ncRNA expression are also driving a revolution in genome sequencing that will ultimately identify variations in the human genome that underpin disease susceptibility and aetiology. However, given the focus on mutations in protein-coding exons that cause most of the high-penetrance simple genetic disorders, the variation that occurs in non-protein-coding regions of the genome has, to date, largely been ignored or at least not been considered 107, 190. This is changing: the emergence of genome-wide association studies to identify variant loci affecting complex diseases and traits and an increased awareness of ncRNA biology have prompted a reconsideration of the underlying protein-centric assumptions and provided a number of novel insights into disease-causing mechanisms. For example, many pathological mechanisms are now known to involve aberrant regulation (and in many cases ncRNAs) rather than alterations to the protein-coding sequences themselves. This is perhaps not surprising, given that the primary engine of phenotypic radiation and higher complexity has been the expansion and divergence of the regulatory architecture that controls the deployment of protein components during differentiation and development 191, much of which may be embedded within ncRNAs 12, 105. Indeed, the same forces that drive evolutionary innovations can result in deleterious variations.
Genome-wide association studies are beginning to identify novel ncRNAs as candidate disease-associated genes. For example, the lncRNA MIAT is associated with myocardial infarction 181, and a novel lncRNA induced by a chromosomal deletion that truncates the polyadenylation site of the LUC7L gene 192 results in aberrant methylation and silencing of the neighbouring HBA2 gene, leading to the onset of α-thalassemia. Indeed, many disease variants map far from protein-coding genes and, given the level of genome-wide transcription, are therefore likely to interrupt lncRNAs. For example, a disease-causing 7.4 kb deletion associated with blepharophimosis syndrome occurs over 250 kb upstream from the nearest gene, FOXL2 193, and this mutation interrupts a lncRNA of unknown function, PISRT1, which has also been identified as a candidate gene in a goat model of this disease 193.
The potential role of lncRNAs in long-range enhancer function, and therefore dysfunction, is illustrated by the Evf2 lncRNA (see above), which may contribute to split-hand/split-foot malformation 1 (SHFM1). Although the region associated with this developmental disorder encompasses three genes, DLX5, DLX6 and DSS1, none are directly mutated in patients 194. Instead, exhibition of the limb phenotype requires the expression of both the Dlx5 and Dlx6 genes to be disrupted, suggesting that SHFM1 results from the ablation of a shared regulatory element 195. Since it is now known that Evf2 regulates the expression of both these genes, it warrants investigation as a candidate SHFM1 disease locus.
Similarly, two lncRNAs, SOX2OT and SOX2DOT, exhibit enriched expression in the lens of the eye and overlap a known myopia susceptibility locus (Figure 4) 196, 197. These transcripts, one of which is transcribed from a distal ultraconserved enhancer, also overlap the SOX2 gene, itself an important regulator of ocular development. Given that developmental genes are significantly enriched for a proximal association with lncRNAs, the understanding of developmental disorders is likely to benefit from an appreciation of lncRNA biology. The future convergence of lncRNA identification by deep RNA sequencing with the increased resolution of disease variants afforded by genomic sequencing will, we suggest, prove a potent combination in elucidating the functional contribution of ncRNAs to disease.

Long non-coding RNAs, SOX2 distal overlapping transcript (SOX2DOT) and SOX2 overlapping transcript (SOX2OT) map to the myopia susceptibility locus. A representation of these features, the SOX2 gene, and a highly conserved SOX2DOT enhancer 236 are shown in the inset schematic representative of a region of the human genome (chr3:182,255,415–182,945,055; UCSC Genome Browser hg18). The relative expression of spliced ESTs corresponding to SOX2DOT or SOX2OT is depicted in the lower half of the figure and shows that these transcripts are highly expressed in the lens of the eye. Adapted from Amaral et al. 196
Non-coding RNAs as diagnostics and therapeutics
The growing body of research showing that ncRNAs may be primary genetic regulators in complex animals has led to the corresponding realization that this may make them ideal diagnostic markers. For example, in some cases the expression profiles of miRNAs, in contrast to those of protein-coding mRNAs, are able to accurately identify the origin of poorly differentiated tumours and carcinomas 198, 199. Indeed, a signature of as few as 200 miRNAs may be sufficient for cancer classification 198, and it appears that some of the difficulties of early detection associated with colon and other occult cancers may be overcome by profiling miRNAs obtained from patient serum, plasma, saliva and tissues 200, 201. Likewise colon, lung and breast cancer prognosis is strongly associated with a small suite of miRNAs (reviewed in 202), suggesting that assays designed to query ncRNAs may eventually become core components of the pathologist's toolkit. This will undoubtedly be facilitated by recent advancements in massively parallel sequencing technologies 203, 204, which allow rapid and sensitive profiling of both long and short ncRNAs, and will almost certainly make personal genomics a reality in the next 5 years 205. The analysis and integration of this information with other datasets (e.g. protein interaction and genome-wide association studies) will pose a considerable, but tractable, challenge well into the future.
The link between endogenous ncRNAs and disease, and the perfection of RNAi-based techniques to silence genes in simple animals, have led to speculation that RNA molecules can be employed as therapeutic agents. Indeed, it may be both easier and more productive to adjust the regulatory software (i.e. ncRNAs) than to try to correct the hardware (i.e. protein-coding genes). Hopes for RNA-based and RNA-targeted therapies were bolstered by early successes using siRNAs in human in vitro culture systems 206 and in targeting HIV-1 and human BCL2 with siRNA-like molecules 207-209. Like gene therapy, however, RNA therapeutics face considerable hurdles, including development of reliable delivery systems, dosage regimes and techniques to ameliorate off-target effects 210, 211. Nonetheless, multiple modes of administration have been developed, including viral, liposome and nanoparticle delivery systems, and there are currently multiple ongoing clinical trials targeting age-related macular degeneration, respiratory syncytial virus, acute renal failure, hepatocellular carcinoma and congenital pachyonychia, among others (reviewed in 212).
There is also an increasing interest in RNA therapeutics that mimic or regulate miRNA activity in human cancers (reviewed in 213). This could be facilitated by exogenous expression of a repressed miRNA (using the same delivery systems as siRNA therapeutics), by the introduction of antagomirs 214 that are complementary and bind to miRNAs, or by the use of ‘sponges’ that contain multiple artificial miRNA-binding sites 215. Artificial expression of specific miRNAs in vivo may be a powerful therapeutic mechanism, particularly given recent reports that over-expression of a single miRNA, miR-302, is capable of inducing stemness 216, 217.
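The complementarity that underlies both antagomirs and sponges can be illustrated with a minimal Python sketch. It makes several simplifying assumptions: the 8-nucleotide sequence is a made-up toy, not a real miRNA; the function names and the spacer are hypothetical; and only perfect Watson-Crick pairing is modelled, whereas practical designs also involve chemical modification and imperfect pairing.

```python
# Watson-Crick pairing rules for RNA bases.
RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence, read 5' to 3'."""
    return "".join(RNA_COMPLEMENT[base] for base in reversed(rna.upper()))


def count_binding_sites(transcript: str, mirna: str) -> int:
    """Count sites in `transcript` that are perfectly complementary to `mirna`."""
    site = reverse_complement(mirna)
    width = len(site)
    return sum(
        1
        for start in range(len(transcript) - width + 1)
        if transcript[start:start + width] == site
    )


# Hypothetical toy sequence for illustration only (not a real miRNA).
toy_mirna = "UAGCUUAU"

# An antagomir-like oligonucleotide is simply the antisense strand.
toy_antisense = reverse_complement(toy_mirna)

# A sponge-like transcript carries several binding sites; here three
# sites are joined by an arbitrary, assumed "AAAA" spacer.
toy_sponge = "AAAA".join([toy_antisense] * 3)

print(toy_antisense)
print(count_binding_sites(toy_sponge, toy_mirna))
```

In this toy case the antisense strand is the single binding unit, and the sponge-like transcript simply repeats it so that one molecule could, in principle, sequester several copies of the miRNA.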
A series of recent studies has suggested that an equally fruitful target may be gene promoters. Indeed, there is a growing body of work showing that exogenous small RNAs can activate or suppress transcription by interfering with epigenetic marks and chromatin formation, and thereby disrupting transcription initiation 40, 187, 188, 218-225. Moreover, siRNAs have been shown to effectively modulate alternative splicing 226, suggesting that, once viable, RNA therapeutics may have a wide diversity of possible uses.
Conclusions
The absolute number of protein-coding genes encoded by a genome is essentially static across all animals from simple nematodes to humans 1, indicating that additional genetic elements must be involved in the development of the increasingly complex cellular, physiological and neurological systems. Non-coding RNAs are likely candidates, as they are adaptively plastic, capable of regulating processes both broadly and sequence-specifically, and are now known to be components of nearly all cellular and developmental systems. It is becoming clear that a comprehensive understanding of human biology must include both small and large non-coding RNAs, and that it is perhaps only through inclusion of these elements in the biomedical research agenda, including studies to determine the mechanistic basis of the causative variations identified by genome-wide association studies, that complex human diseases will be completely deciphered.
Acknowledgements
This study was supported by the Australian National Health and Medical Research Council, the Menzies Foundation, the Australian–American Fulbright Commission and the Royal Australasian College of Physicians (KCP), the Australian Research Council, the University of Queensland and the Queensland State Government (JSM).
Teaching materials
PowerPoint slides of the figures from this review are supplied as supporting information in the online version of this article.