NON-CODING RNA
Department of Biological Regulation, Weizmann Institute of Science, 234 Herzl Street, Rehovot 76100, Israel.
igor.ulitsky@weizmann.ac.il
doi:10.1038/nrg.2016.85
Published online 30 Aug 2016; corrected online 6 Sep 2016
Expressed sequence tag
(EST). Typically 3′‑biased Sanger-sequencing read of approximately 700 nucleotides.
Full-length cDNA
A cDNA that ideally captures a full-length mRNA transcript from the 5′ cap to the 3′ polyadenylated tail; sequenced by multiple Sanger sequencing runs.
Long non-coding RNAs (lncRNAs) are defined as RNAs of at least 200 nucleotides (nt) in length that are independently transcribed, and that molecularly resemble mRNAs, yet do not have recognizable potential to encode functional proteins. The 200 nt cutoff excludes most canonical ncRNAs, such as small nucleolar RNAs (snoRNAs), small nuclear RNAs (snRNAs) and tRNAs, and roughly corresponds to the retention threshold of protocols for the purification of long RNAs. Genomic studies based on expressed sequence tag (EST) and full-length cDNA sequencing, tiling microarrays and RNA sequencing (RNA-seq) identified thousands of lncRNAs in diverse animal and plant genomes. One recent study that combined RNA-seq data from multiple sources reported over 58,000 lncRNA loci in the human genome1. Future studies will plausibly increase this number, as lncRNAs are more tissue-specific and expressed at lower levels than mRNAs2–4, and many cell types (in particular those that are rare or found in early embryonic stages) have not yet been thoroughly interrogated by RNA-seq.
The fraction of annotated lncRNAs that are functional (that is, have any recordable impact on a molecular, cellular or organismal level) is still unknown. The questions of whether lncRNAs are functional and how they perform their functions are of particular interest considering the rapidly increasing number of lncRNAs that are implicated in changing expression or losing sequence integrity in different instances of human disease5,6.
Comparative analysis of genes across species can be a powerful tool for studying their functions and modes of action, as it has been for other non-coding RNAs and proteins. For instance, the discovery that the let‑7 microRNA (miRNA) is conserved from human to nematodes ignited major interest in miRNAs in 2000 (REF. 7), and subsequent comparative analysis has been instrumental in identifying miRNA genes, predicting miRNA targets in mRNAs and revealing features that are important for miRNA biogenesis8–10. Comparative approaches require two main ingredients: sets of genes or genomes that can be compared, and algorithms for matching and evaluating the similarity. Applying comparative sequence analysis to lncRNAs is challenging on both fronts. Until recently, only a few lncRNAs had been annotated in species other than human and mouse, and lncRNAs typically lack long regions with high constraint on sequence (which are needed by tools that have been developed for comparing protein-coding genes) or regions with strong constraint on secondary structure (which is a key ingredient used by tools that have been developed for studying shorter RNAs). In addition, as our understanding of the modes of action of lncRNAs is still very rudimentary and the ‘rules’ underlying their functions remain unknown, it is a major challenge to develop models that will accurately capture evolutionary constraints on lncRNA loci (similar to models that use the ratio of non-synonymous to synonymous changes (dN/dS) to study the constraints on preserving a particular protein-coding sequence11). Nevertheless, recent studies have begun to take the first steps towards mapping and comparing lncRNAs across mammals and other vertebrates2,12–14 (TABLE 1), and have uncovered constant turnover of lncRNA genes in evolution alongside extensive sequence changes in those lncRNAs that are conserved. In parallel, as detailed below, researchers have tested conservation of function among
homologues of specific lncRNAs. Although most lncRNA studies were conducted in vertebrate species, studies in other clades have so far reported a surprisingly similar picture, suggesting that although no common lncRNA genes have been found so far between species separated by more than 500 million years of evolution, the principles guiding lncRNA evolution across eukaryotes are similar.
In this Review, I survey the main methods that have been used to identify and compare lncRNAs across species, and summarize the shared conclusions of studies on lncRNA evolution in vertebrates, insects, sponges and plants. I then discuss the current understanding of the evolutionary origins of lncRNAs and the mechanisms through which the complexity of their loci has increased during evolution. Last, I discuss recent studies of the evolution of function in specific lncRNAs. Throughout this Review (and particularly in BOX 1), I provide practical guidelines for identifying and studying homologues of lncRNAs of interest.
Homologues
A pair of genes that descended from a common ancestral gene.
Identification of lncRNA genes
A typical lncRNA is biochemically identical to an mRNA: it harbours a 5′ cap and a 3′ polyadenylated (poly(A)) tail, and is thus easily sequenced by standard RNA-seq protocols15. In recent years, researchers have been using increasingly deep RNA-seq to map the transcriptomes of various tissues and conditions across eukaryotes, and have identified numerous new lncRNAs in each system2,12,13,16–18. These efforts built on earlier studies that were based on ESTs and full-length cDNA sequencing, which yielded fewer transcript models that were more accurately annotated owing to longer read length19. A rough scheme of the RNA-seq-based lncRNA identification is outlined in FIG. 1. Until recently, the most common sequencing methods used oligo(dT)-based enrichment for poly(A) RNAs, which include the vast majority of functionally characterized lncRNAs. More recently, protocols that deplete only the rRNAs and sequence the rest of the ‘total RNA’, including non-poly(A) transcripts, are being adopted increasingly20. In my experience, the use of total RNA does not add a substantial number of lncRNAs, and it is important to keep in mind the drawbacks of using total RNA; namely, a lower per cent of usable reads and a higher per cent of reads mapping to introns21, which are features that make transcript model assembly and expression-level quantification more challenging than from poly(A)-enriched data. For either protocol, the most popular tools for read mapping and transcript assembly are TopHat22 and Cufflinks23 from the Tuxedo suite (HISAT and StringTie are recent successors of those tools and have improved performance24,25). Recent benchmarking efforts showed that the Tuxedo tools are comparable in performance to others26,27, and that full-length transcript assembly using short-read data is a challenging task. Therefore, although the transcript models reconstructed from short-read RNA-seq are certainly useful, they are not necessarily accurate across all exons.
Once a transcriptome is assembled, a computational pipeline is needed for the filtering, annotation and discovery of those transcripts that meet the lncRNA
criteria. Some of the major differences between the computational approaches are whether they consider single-exon transcripts (that are notoriously enriched with artefacts), whether they allow some degree of overlap between lncRNAs and other known genes (for example, overlap with introns of protein-coding genes on the same strand), and how they distinguish between coding and non-coding genes28. These factors heavily influence the numbers of identified lncRNAs.
Databases of lncRNA annotations
Systematic curation efforts have enabled the development of several lncRNA databases (TABLE 1). Reference Sequence (RefSeq) and GENCODE (accessible through Ensembl) are widely used databases of transcript structures that are based mostly on curated EST and cDNA data; these databases contain few, but relatively accurate, isoforms. Primarily based on deep RNA-seq, other databases hold almost an order of magnitude more transcript isoforms than RefSeq; for example, approximately 60,000 lncRNA genes have been identified in the MiTranscriptome data set1. The complexity of alternative splicing, along with alternative promoters and polyadenylation sites (and to a lesser extent, algorithmic difficulties), contributes to the large number of isoforms that are reconstructed for individual lncRNA genes29. Importantly, even lncRNAs that are expressed at low levels are reproducibly detected across individuals13, indicating that their annotation is unlikely to be erroneous.
non-transcribed sequences in the other species2,12,13. Therefore, it is important to study lncRNAs by directly comparing lncRNA-producing loci, and such studies in multiple species have uncovered rapid turnover of lncRNA loci2,12,13. For example, my laboratory found that in 17 vertebrates, more than 70% of lncRNAs have appeared in the past 50 million years2. Splicing patterns also evolve rapidly, with only approximately 20% of splicing events in human lncRNAs conserved outside of primates13. lncRNA loci are thus commonly gained and lost in evolution, and those lncRNAs that are retained drastically change their exon–intron architecture and their sequences across species in which the lncRNA is present.
One potential caveat of studies comparing lncRNAs in bulk and using heterogeneous data is that the conservation of the lncRNAs expressed only in specific cell types might be underestimated if the compared tissues are not carefully matched. This does not seem to be a major concern, as studies that focused on a specific tissue in a few species reached similar conclusions. For example, one study identified and compared lncRNAs expressed in the liver in three rodents and found that only 60% (160 out of 268) of the lncRNAs expressed in mouse liver had homologues that are expressed in rat liver and only 27% (76 out of 273) had homologues that are expressed in human liver45. Similar results were seen when comparing human and mouse islets of Langerhans46, eye47 and pluripotent stem cells30; even in carefully matched systems, most human lncRNAs do not have recognizable homologues in mice and vice versa.
The rapid evolution of most lncRNAs is inconsistent with many having functions that depend on specific sequence throughout their loci. It is possible that many lncRNAs carry no function, or that lncRNA functions may rely on short elements for which the surrounding sequence context has limited importance. One possible type of such sites comprises binding sites for miRNAs or RNA-binding proteins, which may allow some lncRNAs to act as competing endogenous RNAs (ceRNAs)48, although the mechanisms that allow low-abundance lncRNAs to compete for binding with hundreds to thousands of more abundant mRNAs remain unclear in many cases (see REF. 49 for a review of the current understanding of the ceRNA hypothesis).
Triplex
An RNA structure formed by three strands of RNA, two that form a Watson–Crick duplex and a third that binds in the major groove of the duplex forming Hoogsteen and reverse Hoogsteen hydrogen bonds.
Syntenic
Preserving order and orientation of genes or other genomic elements between species.
Secondary structure and its conservation. An open and debated question is whether secondary structure plays an important part in lncRNA biology, as it does in other non-coding RNAs, which rely heavily on structured elements for their biogenesis and functions. Two main practical aspects of the importance of secondary structure are whether selection acting on structure rather than primary sequence explains the rapid rate of lncRNA sequence evolution, and whether focusing on regions with stable or conserved structures assists in homing in on functionally important regions.
As with any long RNA, lncRNAs fold into secondary structures, many of which are stable, but that fact alone does not imply that the secondary structure is important for function. On average, lncRNA transcripts are slightly less structured than mRNAs in vitro50, but significantly more structured than mRNAs in vivo51. Surprisingly, there is no correlation between the amount of secondary structure and overall sequence conservation44,50. The experimental evidence for lncRNAs broadly acting through specific structures is scarce. Notable exceptions are triplex elements that stabilize the 3′ ends of MALAT1 (metastasis-associated lung adenocarcinoma transcript 1) and NEAT1 (nuclear enriched abundant transcript 1) lncRNAs52, the roX-box stem-loop structures in the D. melanogaster roX (RNA on the X) lncRNAs53,54 and possibly the RepA repeat in the XIST (X-inactive specific transcript) RNA55–57.
For cases in which using only primary sequence conservation to define homology has not identified human homologues of mouse lncRNAs, can structure-only conservation lead us to these ‘missing’ homologues, as proposed by one study58? To the best of my knowledge, there are no examples of such cases that have been shown experimentally. Furthermore, sequence alignability between mammalian species does not require strong purifying selection over long stretches59, and pressure to preserve structured elements should in most cases be sufficient for maintaining alignability. Indeed, elements in which the structure but not the sequence is thought to be important, such as basal stems of miRNA hairpins, are easily alignable between mammals. Additional evidence suggesting that structure conservation without sequence alignability is rare among mammals comes from comparing the number of lncRNAs that are syntenic between humans and other mammals (after subtracting background expectation) to the number of lncRNAs that have sequence similarity. The gap between these two numbers is small (not more than a couple of dozen lncRNAs for humans and each other tested mammal)2,30, so it is unlikely that many lncRNAs have conserved structures between species as distant as human and mouse yet remain invisible in whole-genome alignments.
Short regions of sequence evolving under selection to preserve secondary structure can be predicted across the genome using methods based on scanning whole-genome alignments, such as EvoFold and RNAz (reviewed in REF. 60), and loci of some functional human lncRNAs, such as MALAT1, NEAT1 (REF. 61) and NORAD (non-coding RNA activated by DNA damage)62,63, overlap such regions. Surprisingly, the overlap between lncRNA exons and segments predicted to evolve under constraints on secondary structures is small in the human genome38 as well as in the genomes of other species64. A study using a different background model recently reported more than 4 million regions that are evolving under selection to preserve secondary structure65, but a vast number of those regions overlap regions that do not appear to be transcribed at appreciable levels. Another recent study has mapped the secondary structure of the HOTAIR (HOX transcript antisense intergenic RNA) lncRNA66 and highlighted some structures as evolutionarily conserved, but a recent preliminary statistical analysis of the levels of conservation suggested that in this and potentially other cases there is no evidence for selection on preservation of
specific structures67. Genome-wide analysis thus provides limited support for widespread pressure to preserve secondary structures in lncRNAs. This does not imply that structure-based homology searches cannot sometimes be very useful for lncRNA homology detection; for example, elegant and carefully tailored structure-based approaches were used to detect homologues of roX lncRNAs in Drosophila species68 and the viral PAN (polyadenylated nuclear non-coding) RNA69 in distant species in which primary-sequence homology approaches have failed (it is noteworthy that the species in these studies are separated by many more generations than humans and mice).
Overall, although there is still much remaining to be learned about the structure–function axis in lncRNA genes, most current evidence suggests that regions where specific secondary structures are important for conserved functions occupy a much smaller fraction of the lncRNA sequences compared with those of canonical ncRNAs, such as rRNAs, snoRNAs, tRNAs and snRNAs.
Positional conservation. It has been proposed that in many cases, transcription through a lncRNA locus (or part of it) is important, whereas the RNA product plays a secondary part, if any70. For example, transcription through the region of the AIRN (antisense of IGF2R non-protein-coding RNA) lncRNA locus that overlaps the promoter of insulin-like growth factor 2 receptor (IGF2R) is important for IGF2R silencing, whereas the rest of the 118 kb AIRN RNA is dispensable for this purpose71. In such lncRNAs, one can expect that the position of the region that is transcribed would be conserved, whereas the exon positions and the bulk of the mature lncRNA sequence would evolve neutrally, with the exception of elements that are required for continued transcriptional elongation, such as short splicing motifs. Indeed, splicing motifs are preferentially conserved in many lncRNAs72. Furthermore, it has been observed that when comparing distant species, a significant number of lncRNAs are ‘positionally conserved’ (that is, found in the same relative orientation to orthologous protein-coding genes and/or other conserved regions2,32,73,74), and many of those do not share detectable sequence conservation. Such pairs may correspond to lncRNAs that have conserved functional sequences that are too short or degenerate to be detected, or to lncRNAs in which only the act of transcription is under selective pressure. In many of the lncRNAs with deep positional conservation, such as PVT1 and DEANR1 (definitive endoderm-associated lncRNA 1; also known as linc‑FOXA2)2,75, the length of the transcribed locus and the exon–intron architecture also evolve rapidly, indicating that the second scenario (a role for transcription itself) may be more common.
Classes of lncRNA evolutionary trajectories
The analysis of lncRNA conservation at the different levels presented above30 gives rise to the classification system proposed here in which each class corresponds to a different level of conservation and distinct lncRNA features, and probably different mechanisms of action as well (FIG. 2).
Orthologous
Pertains to homologous genes in different species that have evolved from a common ancestral gene by speciation.
Trans-acting
Regulation that is not cis acting; for example, regulation by diffusible factors that can comparably regulate both homologous loci in a diploid organism.
Cis-acting
Acting from the same molecule, typically interpreted as regulation occurring on the same physical chromosome.
Figure 2 | Classes of lncRNA conservation. a | Proposed classes of sequence conservation among long non-coding RNAs (lncRNAs) and their correlation with genomic features. See the main text for a description of the individual features and references to the publications supporting the positive and negative correlations with the level of conservation. b | High conservation of exon–intron structure; for example, the MIAT (myocardial infarction associated transcript; also known as GOMAFU) lncRNA locus in human and mouse. The RNA sequencing (RNA-seq) track shows the coverage of reads from the human cortex from the Human Proteome Atlas (HPA) transcriptome database122 and the mouse cerebellar granular neurons123. Phylogenetic P value (PhyloP) scores124, which describe base-wise conservation during vertebrate evolution, were taken from the University of California, Santa Cruz (UCSC) Genome Browser. Whole-genome alignment (WGA) track shows alignable regions between human and mouse genomes. c | A lncRNA with conserved sequence, but divergent exon–intron structure; for example, a lncRNA found downstream of the ONECUT1 gene in human and mouse. Human adult liver RNA-seq is from the HPA and mouse adult liver RNA-seq is from the Encyclopedia of DNA Elements (ENCODE) project. d | A lncRNA with a conserved position and very limited sequence conservation: the forkhead box F1 (FOXF1) gene and the FOXF1 adjacent non-coding developmental regulatory RNA (FENDRR) lncRNA. RNA-seq from adult lung from the HPA and ENCODE projects. e | A mouse lncRNA with no evidence of expression in human, the Haunt (also known as Halr1 or linc‑Hoxa1) locus. RNA-seq from human125 and mouse126 embryonic stem (ES) cells. TEs, transposable elements.
‘Class I’ lncRNAs are conserved lncRNAs in which exon–intron structure and multiple sequences along the length of the lncRNA are conserved among species. A representative of this class is MIAT (myocardial infarction associated transcript; also known as GOMAFU)76, which contains 5–7 exons in both human and mouse, 4 of which are conserved (FIG. 2b). At present, we know that this class constitutes a minority of conserved lncRNAs but includes some of the better-studied ones, such as XIST, cyrano (also known as OIP5‑AS1), NEAT1, MALAT1 and NORAD. It is expected that many of the trans-acting lncRNAs will belong to this group and indeed some of the better-studied Class I lncRNAs are enriched in the cytoplasm, and therefore probably act independently of their sites of transcription.
‘Class II’ lncRNAs are those in which the act of transcription and some RNA elements (biased towards the 5′ end of the RNA) are conserved, whereas the majority of the locus experienced drastic changes in exon–intron structure and length. For example, such a conserved lncRNA is found downstream of the ONECUT1 gene in human, mouse and other vertebrates (FIG. 2c). In Class II lncRNAs, only a few splice sites, if any, are conserved, and transposable elements (TEs) contributed heavily to locus diversification across species (see below). These lncRNAs are more likely to be cis-acting and to regulate gene expression in regions surrounding their loci.
‘Class III’ lncRNAs are conserved lncRNAs in which, beyond conservation of promoter sequences and the act of transcription of the specific region, there are no
regions with recognizable sequence similarity and there is typically no conservation of gene structure. Some of these lncRNAs might be transcribed from conserved enhancer elements, with limited or no function of the RNA product or the act of transcription, such as in the case of the Lockd lncRNA77. In several lncRNAs, such as FENDRR (FOXF1 adjacent non-coding developmental regulatory RNA) (FIG. 2d), there is conservation of the promoter and of the first splice site, but not of the rest of the exons, suggesting that it is the act of transcriptional elongation (supported by productive splicing; see below) that is important.
Notably, lncRNAs that host conserved small RNAs, such as miRNAs and snoRNAs, evolve under a separate set of pressures and therefore can be defined as a separate class30. Importantly, most human or mouse lncRNAs are not found in the other species, such as Haunt (also known as Halr1 or linc‑Hoxa1)78 (FIG. 2e). It is possible that some human lncRNAs perform primate-specific functions, and that others independently evolved functions similar to those of lncRNAs in other species, but it is plausible that many of them are simply not functional.
As illustrated in FIG. 2a, various genomic and functional features are correlated with the degree of conservation. For example, lncRNAs with enhancer-like chromatin marks at their promoters (a high ratio of histone H3 lysine 4 monomethylation (H3K4me1) compared to trimethylation (H3K4me3)) are less conserved than those with more canonical promoters (enrichment for H3K4me3 relative to H3K4me1)79. Most conserved lncRNAs are also closer to protein-coding genes, less likely to overlap transposable elements, and are more broadly and highly expressed2,13.
Rapid turnover of lncRNAs in other phyla
The high prevalence of lncRNAs is not unique to vertebrate genomes. Thousands of lncRNAs have now been described in the much smaller D. melanogaster genome18, as well as in mosquito and bee genomes16,31 (see REF. 80 for a review on lncRNAs in insects), and over 10,000 lncRNAs may be present in some species of plants with larger genomes, such as maize81 and cotton82. More than 2,000 lncRNA loci were annotated in sponges, which are non-bilaterian animal species with simple morphology35,36.
The number of lncRNAs identified in individual species is strongly influenced by the breadth and depth of the available RNA-seq data, as well as by the annotation criteria and filters (for example, whether single-exon or intron-overlapping transcripts were considered), and so it remains difficult to correlate genomic characteristics with the propensity of the genome to give rise to lncRNAs. However, it is interesting to note that the main features associated with vertebrate lncRNAs (short length with few exons, and low and tissue-specific expression) also appear in these other species.
In many species it has been difficult to measure lncRNA conservation owing to a lack of sufficiently close species with sequenced genomes and/or transcriptomes; for example, the closest sequenced relatives of some sponges diverged more than 450 million years ago. In species in which the comparison of lncRNAs across a set of species with a reasonable gradient of evolutionary distances is possible, the emerging picture is of rapid turnover similar to the one observed in vertebrates. For example, only 20% of the lncRNAs in the mosquito Anopheles gambiae have alignable sequences in the genome of Anopheles minimus, whereas 90% of A. gambiae proteins are alignable between the two species16 (these species diverged less than 80 million years ago). In plants, a large excess of positionally conserved lncRNAs, compared with sequence-conserved lncRNAs, was found among the genomes of nine Brassicaceae and Cleomaceae plants32, and between rice and maize33.
The overall features of lncRNAs observed in vertebrates are thus probably applicable to lncRNAs from other clades and vice versa. However, to date, no clear homologues and no lncRNAs with clearly analogous mechanisms have been identified between vertebrate lncRNAs and those of other species. Therefore, it is not clear to what extent the mechanisms used by lncRNAs in other clades are also used in vertebrates and vice versa.
Paralogues
Homologous genes related by duplication within a genome.
Nonsense mutations
Mutations in which a codon encoding an amino acid is mutated into a stop codon.
Evolutionary origins of new lncRNAs
The observation that most lncRNAs in vertebrate genomes do not have homologues in species separated by more than 50 million years of evolution2 suggests a high frequency of new lncRNA origination. Several mechanisms for such events are described below and in FIG. 3Aa–Ae.
Duplication. Protein-coding genes evolve by duplication and subfunctionalization83, with few exceptions84. If this route were common in lncRNAs, we would expect to see some sequence similarity among lncRNAs within the same species (although lncRNA paralogues are expected to be less similar to each other than protein paralogues owing to the faster sequence evolution). In practice, such intra-species similarity among lncRNAs is rare2,73,85, and when it does occur, it can often be attributed to unannotated fragments of TEs2; therefore, whole-locus duplication only rarely contributes to the evolution of new lncRNAs. Still, specific lncRNA pairs (such as the two or three paralogues of megamind (also known as TUNA) found in most vertebrates73) have probably evolved by duplication, as have MALAT1 and NEAT1 (REF. 61), which have apparently unrelated functions but maintain certain common features, such as nuclear retention and stabilization by a triple-helical element at their 3′ end52.
Loss of coding potential of protein-coding genes. Mutations, TE insertions and genomic rearrangements in protein-coding loci can lead to nonsense mutations and loss of protein-coding function. If these events do not lead to a loss of transcription (or if transcription is later regained) and if nonsense-mediated decay is either not triggered or not very efficient, then a new lncRNA gene can be formed at the same locus. Three of the lncRNA loci in the eutherian X‑inactivation centre, namely XIST, JPX (also known as ENOX) and FTX, originated through this mechanism and were retained across mammals86,87.
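The first step in the loss of coding potential described above, a nonsense mutation that truncates an open reading frame, can be made concrete with a small Python sketch. It is a toy illustration under simplifying assumptions: the input is taken to be a coding sequence already in frame, only the three standard stop codons are considered, and the example sequences and function names are invented for this sketch rather than taken from any real locus or tool.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def first_in_frame_stop(cds: str):
    """Return the 0-based codon index of the first in-frame stop codon, or None."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    for start in range(0, usable, 3):
        if cds[start:start + 3] in STOP_CODONS:
            return start // 3
    return None


def has_premature_stop(cds: str) -> bool:
    """True if a stop codon occurs before the final codon of the sequence."""
    n_codons = len(cds) // 3
    idx = first_in_frame_stop(cds)
    return idx is not None and idx < n_codons - 1


# Invented toy sequences: the mutant differs by a single G-to-A change
# that converts the third codon TGG (Trp) into the stop codon TGA.
intact = "ATGGCATGGAAATAA"  # ATG GCA TGG AAA TAA
mutant = "ATGGCATGAAAATAA"  # ATG GCA TGA AAA TAA

print(has_premature_stop(intact), has_premature_stop(mutant))
```

A premature stop alone does not make a lncRNA. As the text notes, the locus must also remain transcribed and the transcript must escape efficient nonsense-mediated decay, neither of which this sketch models.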
Routes for increased complexity in lncRNA loci
Interestingly, most ‘young’ (that is, <50 million years old) lncRNAs across vertebrates and other species share common genomic features that sometimes set them apart from ‘old’ lncRNAs2,12,13, suggesting that rather than being associated with specific functionality, the features of young lncRNAs characterize new genes as they emerge in evolution from unstable transcripts or from non-transcribed regions. In vertebrates, such genes typically have two or three exons (compared with ~8 exons for protein-coding genes in vertebrates) and are ~1,000 nt long (compared with ~2,000 nt for mRNAs)2. Interestingly, some functionally characterized lncRNAs are much longer or much more highly expressed. For example, XIST is a 17 kb transcript that is spread across 8 exons, and one of the isoforms of ANRIL (antisense non-coding RNA in the INK4 locus; also known as CDKN2B‑AS1) is ~4 kb long and spread across 19 exons. There is a mild but significant correlation between the evolutionary age of a lncRNA and its length2. Longer lncRNAs are in many cases extensively alternatively spliced, producing multiple isoforms. The locus diversification does not appear to be completely random, with conserved elements exhibiting 5′ bias2 and TE insertions preferentially occurring closer to the 3′ end90.
Two main routes are known to contribute to an increase in the complexity of lncRNA loci during evolution: adoption of TEs and local duplications (FIG. 3Ba,Bb).
Adoption of TEs. TE insertions are typically heavily selected against within protein-coding exons, as they usually lead to disruption of the protein sequence. By contrast, depletion of TEs within lncRNA sequences is very weak2, and numerous TE insertions are found in relatively highly conserved lncRNAs such as PVT1 (REF. 2) and cyrano73,90. A reasonable expectation is that most of the TE‑derived sequences in lncRNAs are not functional, but are also not deleterious to lncRNA function, leading to weak selection against TE insertions. An intriguing question that will require extensive experimental studies is whether some TE‑derived exons also correspond to functional domains acquired following TE exonization98; to date, functional elements in several lncRNAs98, including XIST99, UCHL1‑AS1 (REF. 100) and ANRIL101, have been mapped to regions derived from TEs.
Local duplications. Local repeats have been reported to be enriched in lncRNA loci102, and several well-studied functional lncRNAs, such as XIST, FIRRE (functional intergenic repeating RNA element)102,103, NORAD62,63,
Figure 4 | Manifestations of conserved functionality in lncRNA genes. a | Loss of a homologous long non-coding RNA (lncRNA) in different species can result in the same phenotype. b | Homologous lncRNAs can act through a conserved mechanism. c | Target genes regulated by the lncRNAs can be the same. d | The loss of function of a lncRNA in one species can be rescued by the exogenous expression of the homologue from a different species. lncRNAs are shown as curved lines, with a 5′ cap (circle) and 3′ polyadenylated tail (A(n)). lncRNAs from different species are shown in blue versus yellow. Conserved function is indicated by the green bar and triangles; red dashed lines indicate experimental loss-of-function of a lncRNA; and the black hexagon represents an RNA-binding protein.
CDR1as104, and roX1 and roX2 (REFS 53,68), harbour short repeated sequences. These repeats span a range of sequence similarities; for example, the repeats in FIRRE and XIST are highly similar to each other, whereas those in NORAD are very diverged63. These differences might reflect functional constraints on preserving inter-repeat similarity (which may facilitate higher-order structures57). Functionally, sequence duplications can endow a lncRNA with multiple platforms for the binding of factors; for example, miRNAs in the case of CDR1as104, HNRNPU (heterogeneous nuclear ribonucleoprotein U) proteins by FIRRE102, and PUM (pumilio) proteins by NORAD62.

Conservation of lncRNA function
Do lncRNAs that have conserved sequences also act in similar ways across species? One easily quantifiable yet very crude proxy for the physiological function is the expression domain. When the spatial expression patterns of lncRNA homologues are compared across species, they are typically as conserved as those of mRNAs. Several studies have found that lncRNA tissue specificity, as well as specific expression patterns, are generally highly conserved2,13. Such conservation was also found when individual lncRNAs were compared with higher resolution of spatial expression using fluorescence in situ hybridization (FISH)105. Conserved lncRNAs are thus likely to act in similar contexts in different species.

Several lncRNAs have been tested for conservation of their functionality across species (TABLE 2). Although the number of tested cases remains too small to reach universal conclusions, or to understand when to expect conservation or divergence of function, the emerging picture is that relatively minor sequence conservation can be sufficient for maintaining conserved functions. Functional conservation can manifest itself in different ways (FIG. 4): the loss‑of‑function phenotype of the lncRNA homologues can be similar; the molecular functionality can be conserved; the target genes affected by the lncRNA can be similar; or the lncRNA in one species can functionally replace its orthologue in another species. The last scenario is particularly useful, as cases in which the homologues from different species and artificial constructs are capable of rescuing a genetic null for a lncRNA68 can be used to distil essential functional features of lncRNAs and validate predictions from comparative genomics.

The functional interrogation of lncRNAs in vivo is still in its infancy, and there are multiple methodological issues that need to be considered (see REF. 106 for a detailed discussion of the pros and cons of the available methods). Still, several lncRNAs were shown to have related loss‑of‑function phenotypes across species. For example, XIST is required for X inactivation in both human and mouse cells107, loss of NEAT1 causes loss of paraspeckles across species108,109, and CARMEN (cardiac mesoderm enhancer-associated non-coding RNA) is required for cardiomyogenesis in both humans and mice110. In the case of HOTAIR, a functional discrepancy between human and mouse lncRNAs has been reported: the human orthologue of HOTAIR was shown to regulate the expression of the HOXD (homeobox D) cluster in primary human fibroblasts111, whereas HoxD expression was unaffected in mice in which the entire HoxC cluster (part of which encodes HOTAIR) has been deleted112. Interpretation of cross-species differences in this case is hindered by the use of different cells in human and mouse, and by the fact that a subsequently published targeted deletion of mouse HOTAIR did lead to a specific phenotype and upregulation of several HoxD genes113.

When a lncRNA retains functionality across species, does it act through the same mechanism or targets? In most cases this remains unknown. In perhaps the most extensive study of conservation of lncRNA function, the genomic binding sites of the roX1 and roX2 lncRNAs were mapped in four Drosophila species, and it was found that although the functionality of the lncRNAs is conserved, their binding sites differ drastically across species, while maintaining some features such as proximity to genes68. When roX lncRNAs from other species were tested in roX-null D. melanogaster, they bound to the D. melanogaster binding sites, explaining the ability of those homologues to rescue the roX-null D. melanogaster mutants.

Notably, alongside these examples of lncRNAs with conserved functionality over large evolutionary distances, there are also numerous highly expressed and functional lncRNAs in mouse for which no clear human orthologue has been identified to date, including Braveheart114 and Haunt78,115, as well as functional primate-specific lncRNAs such as BDNFAS (BDNF antisense RNA)116 and HPAT5 (human pluripotency-associated transcript 5)117 that have no known mouse orthologues.
1. Iyer, M. K. et al. The landscape of long noncoding RNAs in the human transcriptome. Nat. Genet. 47, 199–208 (2015).
2. Hezroni, H. et al. Principles of long noncoding RNA evolution derived from direct comparison of transcriptomes in 17 species. Cell Rep. 11, 1110–1122 (2015).
    This study compares features and loci of lncRNAs across various vertebrates and shows rapid lncRNA turnover combined with conservation of expression patterns, and positional conservation without sequence conservation across large evolutionary distances.
3. Cabili, M. N. et al. Localization and abundance analysis of human lncRNAs at single-cell and single-molecule resolution. Genome Biol. 16, 20 (2015).
4. Cabili, M. N. et al. Integrative annotation of human large intergenic noncoding RNAs reveals global properties and specific subclasses. Genes Dev. 25, 1915–1927 (2011).
    This study provides the first comprehensive RNA-seq-based catalogue of human lncRNAs and characterizes their features.
5. Gong, J., Liu, W., Zhang, J., Miao, X. & Guo, A. Y. lncRNASNP: a database of SNPs in lncRNAs and their potential functions in human and mouse. Nucleic Acids Res. 43, D181–D186 (2015).
6. Wapinski, O. & Chang, H. Y. Long noncoding RNAs and human disease. Trends Cell Biol. 21, 354–361 (2011).
7. Pasquinelli, A. E. et al. Conservation of the sequence and temporal expression of let‑7 heterochronic regulatory RNA. Nature 408, 86–89 (2000).
8. Auyeung, V. C., Ulitsky, I., McGeary, S. E. & Bartel, D. P. Beyond secondary structure: primary-sequence determinants license pri-miRNA hairpins for processing. Cell 152, 844–858 (2013).
9. Bartel, D. P. MicroRNAs: target recognition and regulatory functions. Cell 136, 215–233 (2009).
10. Berezikov, E. Evolution of microRNA diversity and regulation in animals. Nat. Rev. Genet. 12, 846–860 (2011).
11. Yang, Z. Likelihood ratio tests for detecting positive selection and application to primate lysozyme evolution. Mol. Biol. Evol. 15, 568–573 (1998).
12. Necsulea, A. et al. The evolution of lncRNA repertoires and expression patterns in tetrapods. Nature 505, 635–640 (2014).
13. Washietl, S., Kellis, M. & Garber, M. Evolutionary dynamics and tissue specificity of human long noncoding RNAs in six mammals. Genome Res. 24, 616–628 (2014).
    References 12 and 13 are studies that comprehensively compare lncRNA sequence and expression evolution in various tetrapods.
14. Bu, D. et al. Evolutionary annotation of conserved long non-coding RNAs in major mammalian species. Sci. China Life Sci. 58, 787–798 (2015).
15. Ulitsky, I. & Bartel, D. P. lincRNAs: genomics, evolution, and mechanisms. Cell 154, 26–46 (2013).
16. Jenkins, A. M., Waterhouse, R. M. & Muskavitch, M. A. Long non-coding RNA discovery across the genus Anopheles reveals conserved secondary structures within and beyond the Gambiae complex. BMC Genomics 16, 337 (2015).
17. Liu, J. et al. Genome-wide analysis uncovers regulation of long intergenic noncoding RNAs in Arabidopsis. Plant Cell 24, 4333–4345 (2012).
18. Brown, J. B. et al. Diversity and dynamics of the Drosophila transcriptome. Nature 512, 393–399 (2014).
19. Ravasi, T. et al. Experimental validation of the regulated expression of large numbers of non-coding RNAs from the mouse genome. Genome Res. 16, 11–19 (2006).
20. Adiconis, X. et al. Comparative analysis of RNA sequencing methods for degraded or low-input samples. Nat. Methods 10, 623–629 (2013).
21. Zhao, W. et al. Comparison of RNA-seq by poly (A) capture, ribosomal RNA depletion, and DNA microarray for expression profiling. BMC Genomics 15, 419 (2014).
22. Trapnell, C., Pachter, L. & Salzberg, S. L. TopHat: discovering splice junctions with RNA-seq. Bioinformatics 25, 1105–1111 (2009).
23. Trapnell, C. et al. Transcript assembly and quantification by RNA-seq reveals unannotated transcripts and isoform switching during cell differentiation. Nat. Biotechnol. 28, 511–515 (2010).
24. Kim, D., Langmead, B. & Salzberg, S. L. HISAT: a fast spliced aligner with low memory requirements. Nat. Methods 12, 357–360 (2015).
25. Pertea, M. et al. StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Nat. Biotechnol. 33, 290–295 (2015).
26. Steijger, T. et al. Assessment of transcript reconstruction methods for RNA-seq. Nat. Methods 10, 1177–1184 (2013).
27. Engstrom, P. G. et al. Systematic evaluation of spliced alignment programs for RNA-seq data. Nat. Methods 10, 1185–1191 (2013).
28. Housman, G. & Ulitsky, I. Methods for distinguishing between protein-coding and long noncoding RNAs and the elusive biological purpose of translation of long noncoding RNAs. Biochim. Biophys. Acta 1859, 31–40 (2015).
29. Kanitz, A. et al. Comparative assessment of methods for the computational inference of transcript isoform abundance from RNA-seq data. Genome Biol. 16, 150 (2015).
30. Chen, J. et al. Evolutionary analysis across mammals reveals distinct classes of long non-coding RNAs. Genome Biol. 17, 19 (2016).
    This study demonstrates a new methodology for detailed comparison of lncRNAs expressed in pluripotent stem cells in several species and suggests a classification of lncRNAs into groups based on their evolutionary histories.
31. Jayakodi, M. et al. Genome-wide characterization of long intergenic non-coding RNAs (lincRNAs) provides new insight into viral diseases in honey bees Apis cerana and Apis mellifera. BMC Genomics 16, 680 (2015).
32. Mohammadin, S., Edger, P. P., Pires, J. C. & Schranz, M. E. Positionally-conserved but sequence-diverged: identification of long non-coding RNAs in the Brassicaceae and Cleomaceae. BMC Plant Biol. 15, 217 (2015).
33. Wang, H. et al. Analysis of non-coding transcriptome in rice and maize uncovers roles of conserved lncRNAs associated with agriculture traits. Plant J. 84, 404–416 (2015).
34. Paytuvi Gallart, A., Hermoso Pulido, A., Anzar Martinez de Lagran, I., Sanseverino, W. & Aiese Cigliano, R. GREENC: a Wiki-based database of plant lncRNAs. Nucleic Acids Res. 44, D1161–D1166 (2016).
35. Bråte, J., Adamski, M., Neumann, R. S., Shalchian-Tabrizi, K. & Adamska, M. Regulatory RNA at the root of animals: dynamic expression of developmental lincRNAs in the calcisponge Sycon ciliatum. Proc. Biol. Sci. 282, 20151746 (2015).
36. Gaiti, F. et al. Dynamic and widespread lncRNA expression in a sponge and the origin of animal complexity. Mol. Biol. Evol. 32, 2367–2382 (2015).
37. Guttman, M. et al. Chromatin signature reveals over a thousand highly conserved large non-coding RNAs in mammals. Nature 458, 223–227 (2009).
    This is the first study to use chromatin marks to improve the identification of lncRNAs in mouse and provides a detailed description of a set of lncRNAs that were better conserved than background.
38. Marques, A. C. & Ponting, C. P. Catalogues of mammalian long noncoding RNAs: modest conservation and incompleteness. Genome Biol. 10, R124 (2009).
39. Gardner, P. P. et al. Conservation and losses of non-coding RNAs in avian genomes. PLoS ONE 10, e0121797 (2015).
40. Haerty, W. & Ponting, C. P. Mutations within lncRNAs are effectively selected against in fruitfly but not in human. Genome Biol. 14, R49 (2013).
41. Zhang, Y. C. et al. Genome-wide screening and functional analysis identify a large number of long noncoding RNAs involved in the sexual reproduction of rice. Genome Biol. 15, 512 (2014).
42. Ponjavic, J., Ponting, C. P. & Lunter, G. Functionality or transcriptional noise? Evidence for selection within long noncoding RNAs. Genome Res. 17, 556–565 (2007).
43. Wang, J. et al. Mouse transcriptome: neutral evolution of ‘non-coding’ complementary DNAs. Nature http://dx.doi.org/10.1038/nature03016 (2004).
44. Managadze, D., Rogozin, I. B., Chernikova, D., Shabalina, S. A. & Koonin, E. V. Negative correlation between expression level and evolutionary rate of long intergenic noncoding RNAs. Genome Biol. Evol. 3, 1390–1404 (2011).
45. Kutter, C. et al. Rapid turnover of long noncoding RNAs and the evolution of gene expression. PLoS Genet. 8, e1002841 (2012).
    This study compares in detail lncRNAs that are expressed in the liver in three rodents and reports rapid evolutionary turnover of lncRNAs, even when the same tissue is compared across closely related species.
46. Morán, I. et al. Human β cell transcriptome analysis uncovers lncRNAs that are tissue-specific, dynamically regulated, and abnormally expressed in type 2 diabetes. Cell. Metab. 16, 435–448 (2012).
47. Mustafi, D. et al. Evolutionarily conserved long intergenic non-coding RNAs in the eye. Hum. Mol. Genet. 22, 2992–3002 (2013).
48. Tan, J. Y. et al. Extensive microRNA-mediated crosstalk between lncRNAs and mRNAs in mouse embryonic stem cells. Genome Res. 25, 655–666 (2015).
49. Thomson, D. W. & Dinger, M. E. Endogenous microRNA sponges: evidence and controversy. Nat. Rev. Genet. 17, 272–283 (2016).
50. Yang, J. R. & Zhang, J. Human long noncoding RNAs are substantially less folded than messenger RNAs. Mol. Biol. Evol. 32, 970–977 (2015).
51. Spitale, R. C. et al. Structural imprints in vivo decode RNA regulatory mechanisms. Nature 519, 486–490 (2015).
52. Wilusz, J. E. et al. A triple helix stabilizes the 3′ ends of long noncoding RNAs that lack poly(A) tails. Genes Dev. 26, 2392–2407 (2012).
53. Ilik, I. A. et al. Tandem stem-loops in roX RNAs act together to mediate X chromosome dosage compensation in Drosophila. Mol. Cell 51, 156–173 (2013).
54. Park, S. W., Kuroda, M. I. & Park, Y. Regulation of histone H4 Lys16 acetylation by predicted alternative secondary structures in roX noncoding RNAs. Mol. Cell. Biol. 28, 4952–4962 (2008).
55. Zhao, J., Sun, B. K., Erwin, J. A., Song, J. J. & Lee, J. T. Polycomb proteins targeted by a short repeat RNA to the mouse X chromosome. Science 322, 750–756 (2008).
56. Maenner, S. et al. 2D structure of the A region of Xist RNA and its implication for PRC2 association. PLoS Biol. 8, e1000276 (2010).
57. Lu, Z. et al. RNA duplex map in living cells reveals higher-order transcriptome structure. Cell 165, 1267–1279 (2016).
58. Torarinsson, E., Sawera, M., Havgaard, J. H., Fredholm, M. & Gorodkin, J. Thousands of corresponding human and mouse genomic regions unalignable in primary sequence contain common RNA structure. Genome Res. 16, 885–889 (2006).
59. Miller, W. et al. 28‑way vertebrate alignment and conservation track in the UCSC Genome Browser. Genome Res. 17, 1797–1808 (2007).
60. Gorodkin, J. et al. De novo prediction of structured RNAs from genomic sequences. Trends Biotechnol. 28, 9–19 (2010).
61. Stadler, P. F. in Advances in Bioinformatics and Computational Biology (eds Ferreira, C. E. et al.) 1–12 (Springer, 2010).
62. Lee, S. et al. Noncoding RNA NORAD regulates genomic stability by sequestering PUMILIO proteins. Cell 164, 69–80 (2016).
63. Tichon, A. et al. A conserved abundant cytoplasmic long noncoding RNA modulates repression by Pumilio proteins in human cells. Nat. Commun. 7, 12209 (2016).
64. Nam, J. W. & Bartel, D. P. Long noncoding RNAs in C. elegans. Genome Res. 22, 2529–2540 (2012).
65. Smith, M. A., Gesell, T., Stadler, P. F. & Mattick, J. S. Widespread purifying selection on RNA structure in mammals. Nucleic Acids Res. 41, 8220–8236 (2013).
66. Somarowthu, S. et al. HOTAIR forms an intricate and modular secondary structure. Mol. Cell 58, 353–361 (2015).
67. Rivas, E., Clements, J. & Eddy, S. R. Lack of evidence for conserved secondary structure in long noncoding RNAs. Preprint at http://eddylab.org/publications/RivasEddy16/RivasEddy16-preprint.pdf (2016).
68. Quinn, J. J. et al. Rapid evolutionary turnover underlies conserved lncRNA-genome interactions. Genes Dev. 30, 191–207 (2016).
    This study uses a novel computational approach for the sensitive detection of lncRNA homologues in insects and vertebrates based on a combination of synteny, sequence and structural information, and includes the first comparison of genomic binding sites of lncRNAs across species.
69. Tycowski, K. T., Shu, M. D., Borah, S., Shi, M. & Steitz, J. A. Conservation of a triple-helix-forming RNA stability element in noncoding and genomic RNAs of diverse viruses. Cell Rep. 2, 26–32 (2012).
    This study describes a sensitive approach for using a specific sequence-structure pattern to identify lncRNA homologues among extensively divergent viral genomes.
70. Kornienko, A. E., Guenzl, P. M., Barlow, D. P. & Pauler, F. M. Gene regulation by the act of long non-coding RNA transcription. BMC Biol. 11, 59 (2013).
71. Latos, P. A. et al. Airn transcriptional overlap, but not its lncRNA products, induces imprinted Igf2r silencing. Science 338, 1469–1472 (2012).
    This is the most comprehensive study to date of a lncRNA for which only the act of transcription, and not any particular part of the sequence, is important for function.
72. Haerty, W. & Ponting, C. P. Unexpected selection to retain high GC content and splicing enhancers within exons of multiexonic lncRNA loci. RNA 21, 333–346 (2015).
73. Ulitsky, I., Shkumatava, A., Jan, C. H., Sive, H. & Bartel, D. P. Conserved function of lincRNAs in vertebrate embryonic development despite rapid sequence evolution. Cell 147, 1537–1550 (2011).
74. He, Y. et al. The conservation and signatures of lincRNAs in Marek’s disease of chicken. Sci. Rep. 5, 15184 (2015).
75. Jiang, W., Liu, Y., Liu, R., Zhang, K. & Zhang, Y. The lncRNA DEANR1 facilitates human endoderm differentiation by activating FOXA2 expression. Cell Rep. 11, 137–148 (2015).
76. Sone, M. et al. The mRNA-like noncoding RNA Gomafu constitutes a novel nuclear domain in a subset of neurons. J. Cell Sci. 120, 2498–2506 (2007).
77. Paralkar, V. R. et al. Unlinking an lncRNA from its associated cis element. Mol. Cell 62, 104–110 (2016).
78. Yin, Y. et al. Opposing roles for the lncRNA Haunt and its genomic locus in regulating HOXA gene activation during embryonic stem cell differentiation. Cell Stem Cell 16, 504–516 (2015).
79. Marques, A. C. et al. Chromatin signatures at transcriptional start sites separate two equally populated yet distinct classes of intergenic long noncoding RNAs. Genome Biol. 14, R131 (2013).
    This paper describes a classification of currently annotated lncRNAs into two groups (promoter-associated and enhancer-associated) with different features based on the chromatin signatures at their transcription start sites.
80. Legeai, F. & Derrien, T. Identification of long non-coding RNAs in insects genomes. Curr. Opin. Insect Sci. 7, 37–44 (2015).
81. Li, L. et al. Genome-wide discovery and characterization of maize long non-coding RNAs. Genome Biol. 15, R40 (2014).
82. Wang, M. et al. Long noncoding RNAs and their proposed functions in fibre development of cotton (Gossypium spp.). New Phytol. 207, 1181–1197 (2015).
83. Long, M., VanKuren, N. W., Chen, S. & Vibranovski, M. D. New gene evolution: little did we know. Annu. Rev. Genet. 47, 307–333 (2013).
84. Kaessmann, H. Origins, evolution, and phenotypic impact of new genes. Genome Res. 20, 1313–1326 (2010).
85. Derrien, T. et al. The GENCODE v7 catalog of human long noncoding RNAs: analysis of their gene structure, evolution, and expression. Genome Res. 22, 1775–1789 (2012).
    This article provides a comprehensive description of lncRNA features and subcellular localization based on the Encyclopedia of DNA Elements (ENCODE) project data.
86. Duret, L., Chureau, C., Samain, S., Weissenbach, J. & Avner, P. The Xist RNA gene evolved in eutherians by pseudogenization of a protein-coding gene. Science 312, 1653–1655 (2006).
    The paper is the first example of a lncRNA that evolved from a loss of coding potential of an ancestral protein-coding gene.
87. Romito, A. & Rougeulle, C. Origin and evolution of the long non-coding genes in the X‑inactivation center. Biochimie 93, 1935–1942 (2011).
88. Cordaux, R. & Batzer, M. A. The impact of retrotransposons on human genome evolution. Nat. Rev. Genet. 10, 691–703 (2009).
89. Kelley, D. R. & Rinn, J. L. Transposable elements reveal a stem cell specific class of long noncoding RNAs. Genome Biol. 13, R107 (2012).
90. Kapusta, A. et al. Transposable elements are major contributors to the origin, diversification, and regulation of vertebrate long noncoding RNAs. PLoS Genet. 9, e1003470 (2013).
91. Young, J. M. et al. DUX4 binding to retroelements creates promoters that are active in FSHD muscle and testis. PLoS Genet. 9, e1003947 (2013).
92. Seila, A. C. et al. Divergent transcription from active promoters. Science 322, 1849–1851 (2008).
93. Jensen, T. H., Jacquier, A. & Libri, D. Dealing with pervasive transcription. Mol. Cell 52, 473–484 (2013).
94. Wu, X. & Sharp, P. A. Divergent transcription: a driving force for new gene origination? Cell 155, 990–996 (2013).
95. Gotea, V., Petrykowska, H. M. & Elnitski, L. Bidirectional promoters as important drivers for the emergence of species-specific transcripts. PLoS ONE 8, e57323 (2013).
96. Ruf, S. et al. Large-scale analysis of the regulatory architecture of the mouse genome with a transposon-associated sensor. Nat. Genet. 43, 379–386 (2011).
97. Soumillon, M. et al. Cellular source and mechanisms of high transcriptome complexity in the mammalian testis. Cell Rep. 3, 2179–2190 (2013).
98. Johnson, R. & Guigo, R. The RIDL hypothesis: transposable elements as functional domains of long noncoding RNAs. RNA 20, 959–976 (2014).
99. Elisaphenko, E. A. et al. A dual origin of the Xist gene from a protein-coding gene and a set of transposable elements. PLoS ONE 3, e2521 (2008).
100. Carrieri, C. et al. Long non-coding antisense RNA controls Uchl1 translation through an embedded SINEB2 repeat. Nature 491, 454–457 (2012).
101. Holdt, L. M. et al. Alu elements in ANRIL non-coding RNA at chromosome 9p21 modulate atherogenic cell functions through trans-regulation of gene networks. PLoS Genet. 9, e1003588 (2013).
102. Hacisuleyman, E., Shukla, C. J., Weiner, C. L. & Rinn, J. L. Function and evolution of local repeats in the Firre locus. Nat. Commun. 7, 11021 (2016).
103. Hacisuleyman, E. et al. Topological organization of multichromosomal regions by the long intergenic noncoding RNA Firre. Nat. Struct. Mol. Biol. 21, 198–206 (2014).
104. Memczak, S. et al. Circular RNAs are a large class of animal RNAs with regulatory potency. Nature 495, 333–338 (2013).
105. Chodroff, R. A. et al. Long noncoding RNA genes: conservation of sequence and brain expression among diverse amniotes. Genome Biol. 11, R72 (2010).
106. Bassett, A. R. et al. Considerations when investigating lncRNA function in vivo. eLife 3, e03058 (2014).
    This paper provides important practical guidelines for choosing methods for perturbing lncRNA functions and interpreting the results.
107. Goto, T. & Monk, M. Regulation of X‑chromosome inactivation in development in mice and humans. Microbiol. Mol. Biol. Rev. 62, 362–378 (1998).
108. Sasaki, Y. T., Ideue, T., Sano, M., Mituyama, T. & Hirose, T. MENε/β noncoding RNAs are essential for structural integrity of nuclear paraspeckles. Proc. Natl Acad. Sci. USA 106, 2525–2530 (2009).
109. Cornelis, G., Souquere, S., Vernochet, C., Heidmann, T. & Pierron, G. Functional conservation of the lncRNA NEAT1 in the ancestrally diverged marsupial lineage: evidence for NEAT1 expression and associated paraspeckle assembly during late gestation in the opossum Monodelphis domestica. RNA Biol. http://dx.doi.org/10.1080/15476286.2016.1197482 (2016).
110. Ounzain, S. et al. CARMEN, a human super enhancer-associated long noncoding RNA controlling cardiac specification, differentiation and homeostasis. J. Mol. Cell Cardiol. 89, 98–112 (2015).
111. Rinn, J. L. et al. Functional demarcation of active and silent chromatin domains in human HOX loci by noncoding RNAs. Cell 129, 1311–1323 (2007).
112. Schorderet, P. & Duboule, D. Structural and functional differences in the long non-coding RNA Hotair in mouse and human. PLoS Genet. 7, e1002071 (2011).
113. Li, L. et al. Targeted disruption of Hotair leads to homeotic transformation and gene derepression. Cell Rep. 5, 3–12 (2013).
114. Klattenhoff, C. A. et al. Braveheart, a long noncoding RNA required for cardiovascular lineage commitment. Cell 152, 570–583 (2013).
115. Maamar, H., Cabili, M. N., Rinn, J. & Raj, A. linc‑HOXA1 is a noncoding RNA that represses Hoxa1 transcription in cis. Genes Dev. 27, 1260–1271 (2013).
116. Lipovich, L. et al. Activity-dependent human brain coding/noncoding gene regulatory networks. Genetics 192, 1133–1148 (2012).
117. Durruthy-Durruthy, J. et al. The primate-specific noncoding RNA HPAT5 regulates pluripotency during human preimplantation development and nuclear reprogramming. Nat. Genet. 48, 44–52 (2016).
118. Altschul, S. F. et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25, 3389–3402 (1997).
119. Nawrocki, E. P. & Eddy, S. R. Infernal 1.1: 100‑fold faster RNA homology searches. Bioinformatics 29, 2933–2935 (2013).
120. Trapnell, C. et al. Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks. Nat. Protoc. 7, 562–578 (2012).
121. Grabherr, M. G. et al. Full-length transcriptome assembly from RNA-seq data without a reference genome. Nat. Biotechnol. 29, 644–652 (2011).
122. Fagerberg, L. et al. Analysis of the human tissue-specific expression by genome-wide integration of transcriptomics and antibody-based proteomics. Mol. Cell Proteom. 13, 397–406 (2014).
123. Lerch, J. K. et al. Isoform diversity and regulation in peripheral and central neurons revealed through RNA-seq. PLoS One 7, e30417 (2012).
124. Pollard, K. S., Hubisz, M. J., Rosenbloom, K. R. & Siepel, A. Detection of nonneutral substitution rates on mammalian phylogenies. Genome Res. 20, 110–121 (2010).
125. Schwartz, M. P. et al. Human pluripotent stem cell-derived neural constructs for predicting neural toxicity. Proc. Natl Acad. Sci. USA 112, 12516–12521 (2015).
126. Bergmann, J. H. et al. Regulation of the ESC transcriptome by nuclear long noncoding RNAs. Genome Res. 25, 1336–1346 (2015).
127. Migeon, B. R. et al. Human X inactivation center induces random X chromosome inactivation in male transgenic mice. Genomics 59, 113–121 (1999).
128. Heard, E. et al. Human XIST yeast artificial chromosome transgenes show partial X inactivation center function in mouse embryonic stem cells. Proc. Natl Acad. Sci. USA 96, 6841–6846 (1999).
129. Kurian, L. et al. Identification of novel long noncoding RNAs underlying vertebrate cardiovascular development. Circulation 131, 1278–1290 (2015).
130. Gong, C. et al. A long non-coding RNA, LncMyoD, regulates skeletal muscle differentiation by blocking IMP2‑mediated mRNA translation. Dev. Cell 34, 181–191 (2015).
131. Wang, Y. et al. Arabidopsis noncoding RNA mediates control of photomorphogenesis by red light. Proc. Natl Acad. Sci. USA 111, 10359–10364 (2014).
132. Grant, J. et al. Rsx is a metatherian RNA with Xist-like properties in X‑chromosome inactivation. Nature 487, 254–258 (2012).
133. Kok, F. O. et al. Reverse genetic screening reveals poor correlation between morpholino-induced and mutant phenotypes in zebrafish. Dev. Cell 32, 97–108 (2015).

Acknowledgements
The author thanks A. Shkumatava, A. Mallory, M. Garber, E. Hornstein, H. Hezroni and N. Gil for discussions and comments on the manuscript. I.U. is the Sygnet Career Development Chair for Bioinformatics and recipient of an Alon Fellowship from The Council for Higher Education of Israel. Work in the Ulitsky laboratory is supported by grants to I.U. from the European Research Council (Project lincSAFARI), the Israeli Science Foundation (1242/14 and 1984/14), the Israeli Centers of Research Excellence (I‑CORE) Program of the Planning and Budgeting Committee and The Israel Science Foundation (1796/12), the Minerva Foundation, the Fritz-Thyssen Foundation and by research grants from Lapon Raymond and the Abramson Family Center for Young Scientists.

Competing interests statement
The author declares no competing interests.

DATABASES
Ensembl Compara: http://ensembl.org/info/genome/compara/index.html
GreeNC: http://greenc.sciencedesigners.com
HMMER: http://hmmer.org
lncRNAdb: http://lncrnadb.org
NONCODE: http://www.noncode.org
phyloNONCODE: http://www.bioinfo.org/phyloNoncode
PLAR: http://webhome.weizmann.ac.il/home/igoru/PLAR
PLNlncRbase: http://bioinformatics.ahau.edu.cn/PLNlncRbase
RNAcentral: http://rnacentral.org
UCSC Genome Browser: https://genome.ucsc.edu
ERRATUM