Genome Res. 2011 Nov;21(11):1929-43.
doi: 10.1101/gr.112516.110. Epub 2011 Oct 12.

New families of human regulatory RNA structures identified by comparative analysis of vertebrate genomes

Brian J Parker et al. Genome Res. 2011 Nov.

Abstract

Regulatory RNA structures are often members of families with multiple paralogous instances across the genome. Family members share functional and structural properties, which allow them to be studied as a whole, facilitating both bioinformatic and experimental characterization. We have developed a comparative method, EvoFam, for genome-wide identification of families of regulatory RNA structures, based on primary sequence and secondary structure similarity. We apply EvoFam to a 41-way genomic vertebrate alignment. Genome-wide, we identify 220 human, high-confidence families outside protein-coding regions comprising 725 individual structures, including 48 families with known structural RNA elements. Known families identified include both noncoding RNAs, e.g., miRNAs and the recently identified MALAT1/MEN β lincRNA family; and cis-regulatory structures, e.g., iron-responsive elements. We also identify tens of new families supported by strong evolutionary evidence and other statistical evidence, such as GO term enrichments. For some of these, detailed analysis has led to the formulation of specific functional hypotheses. Examples include two hypothesized auto-regulatory feedback mechanisms: one involving six long hairpins in the 3'-UTR of MAT2A, a key metabolic gene that produces the primary human methyl donor S-adenosylmethionine; the other involving a tRNA-like structure in the intron of the tRNA maturation gene POP1. We experimentally validate the predicted MAT2A structures. Finally, we identify potential new regulatory networks, including large families of short hairpins enriched in immunity-related genes, e.g., TNF, FOS, and CTLA4, which include known transcript destabilizing elements. Our findings exemplify the diversity of post-transcriptional regulation and provide a resource for further characterization of new regulatory mechanisms and families of noncoding RNAs.

Figures

Figure 1.
EvoFam family identification pipeline. (A) Overview of EvoFam analysis and data flow. (B) Phylogenetic tree relating the 31 species of the alignment screened by EvoFold. (C) Each structure prediction is converted into a profile SCFG model. These models describe the nucleotide (or di-nucleotide) distributions at every base and base pair in the structure. (D) Small example of similarity graph between profile models. Maximal highly connected subgraphs (yellow) are extracted as putative families. (E) Distribution of family sizes in the three final prediction sets.
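The grouping step in panel D can be illustrated with a small sketch. It is not the EvoFam implementation: it swaps in a simpler technique, connected components of a thresholded similarity graph, in place of maximal highly connected subgraphs. The function names, the scores, and the threshold are all hypothetical. The second function is only a cheap density check: under the common definition of a highly connected graph (edge connectivity greater than half the number of vertices), every vertex must be linked to more than half of the vertices, so the check is necessary but not sufficient.

```python
from collections import defaultdict


def build_graph(similarity, threshold):
    """Keep an undirected edge for every pair scoring at or above threshold.

    similarity: dict mapping (a, b) pairs to a score (hypothetical values).
    """
    graph = defaultdict(set)
    for (a, b), score in similarity.items():
        if a != b and score >= threshold:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def candidate_families(similarity, threshold, min_size=2):
    """Return connected components of the thresholded graph as sorted lists.

    This is a simplified stand-in for highly connected subgraph extraction.
    """
    graph = build_graph(similarity, threshold)
    seen = set()
    families = []
    for start in sorted(graph):
        if start in seen:
            continue
        component = set()
        stack = [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        if len(component) >= min_size:
            families.append(sorted(component))
    return families


def passes_degree_check(members, similarity, threshold):
    """Necessary (not sufficient) condition for a highly connected group:
    every member is linked to more than half of the group's vertices."""
    graph = build_graph(similarity, threshold)
    group = set(members)
    return all(len(graph[m] & group) > len(group) / 2 for m in group)


# Hypothetical pairwise similarity scores between five structure models.
scores = {
    ("A", "B"): 0.90,
    ("B", "C"): 0.80,
    ("A", "C"): 0.85,
    ("C", "D"): 0.20,
    ("D", "E"): 0.70,
}
families = candidate_families(scores, threshold=0.5)
```

With these made-up scores, the weak C-D link falls below the threshold, so A, B, and C form one candidate group and D and E form another; only the first passes the degree check.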
Figure 2.
Family of hairpins in 3′-UTR of MAT2A. (A) Location of the six hairpins (named A–F) of the MAT2A 3′-UTR family. The initially predicted UTRP family consists of C, D (EvoFold predictions, purple), and B (paralog search hit, dark green). Hairpins A, E, and F were found by a dedicated, more lenient search for paralogs (light green). The well-conserved core part of the hairpins can be extended in some cases (black flanks). A putative 3′-UTR alternative intron is indicated by spliced EST and RNA-seq evidence. (B) Color-coded alignment of the human sequence from all family members. The alignment is referenced by D. Location and length of relative insertions in the other sequences are indicated (orange bars and numbers). The loop region reveals a motif of bases that are completely conserved among all six members (*). (C) Structures of all six extended hairpins showing the boundary of single sequence predictions (red bars) as well as the fully conserved motif (red nucleotides). Note that hairpin D can also form the two base pairs of the loop regions seen for the other hairpins, although not predicted by EvoFold. (D) In-line probing analysis of the 186-long MAT2A construct (including hairpin A). RNA cleavage products resulting from spontaneous transesterification during incubations in the absence (−) of any candidate ligand or in the presence of SAM, S-adenosylhomocysteine (SAH), and L-methionine (L-met), each tested at concentrations of 0.1 mM and 1 mM, were resolved by denaturing 10% PAGE. (NR) No reaction; (T1) partial digest with RNase T1; (OH) partial alkaline digest; (Pre) precursor RNA. Selected bands in the T1 lane are labeled with the positions of the respective 3′-terminal guanosyl residues, according to the numbering used for hairpin A in panel C. Filled bars correspond to positions within hairpin A that are predicted to be largely base-paired, while the open bar corresponds to positions within the putative loop sequence. (Arrowheads) Putative bulged nucleotides C50 and A55.
Figure 3.
Immune-related families. (A) Alignment of human sequences of members of three immune-related families. UTRP40 includes some additional members not found in the GW families. The families are enriched for macrophage-related genes and GO immunity term association (red). Substitutions are color-coded as in Figure 2. The stems are generally more conserved (black bars) than the loops (predictions without gene symbols are labeled by EvoFold id). (B) Family members (green) overlap with known stabilization/destabilization elements (red). All three genes also have known AREs (blue) (including the ARE-like stability and efficiency element, SEE) (Hel et al. 1998).
Figure 4.
tRNA-like structure in intron of POP1. (A) Intronic location of the structure. The ENCODE CSHL small RNA-seq track (The ENCODE Project Consortium 2007) for cell line K562 represents three uniquely mapped cytoplasmic reads with 5′-ends aligned with the predicted RNase P cleavage site (the cloning protocol generates directional libraries that are read from the 5′-ends of the inserts, which should largely correspond to the 5′-ends of the mature RNA). Spliced reads suggest splicing activity and possible cassette exon in the region of the structure; mapped RefSeqs (TransMap) show cassette exons from mouse and rat that overlap the structure position. (B) Alignment with a subset of species selected to show all observed substitutions (colors as in Fig. 2). (C) Alignment of human sequences of family. (D) Structures of family members with tRNA invariant (red) and semi-invariant (R or Y; orange) nucleotides (Brown 2007) (RNA structure images generated with VARNA [Darty et al. 2009]).
Figure 5.
Examples of novel structures from families discussed in the text. Labeled by gene symbol where available (EvoFold id in brackets).

References

    Alexa A, Rahnenfuhrer J, Lengauer T 2006. Improved scoring of functional groups from gene expression data by decorrelating GO graph structure. Bioinformatics 22: 1600–1607
    Anderson P 2008. Post-transcriptional control of cytokine production. Nat Immunol 9: 353–359
    Bocobza SE, Aharoni A 2008. Switching the light on plant riboswitches. Trends Plant Sci 13: 526–533
    Brown TA 2007. Genomes 3. Garland Science, New York
    Brown CY, Lagnado CA, Goodall GJ 1996. A cytokine mRNA-destabilizing element that is structurally and functionally distinct from A+U-rich elements. Proc Natl Acad Sci 93: 13721–13725
