Bioinformatics. 2015 Aug 1;31(15):2489-96.

doi: 10.1093/bioinformatics/btv185. Epub 2015 Apr 2.

SPARSE: quadratic time simultaneous alignment and folding of RNAs without sequence-based heuristics

Sebastian Will¹, Christina Otto², Milad Miladi², Mathias Möhl², Rolf Backofen³

Affiliations

¹ Bioinformatics, Department of Computer Science, University of Freiburg, Freiburg, Germany, Bioinformatics, Department of Computer Science, University of Leipzig, Leipzig, Germany.
² Bioinformatics, Department of Computer Science, University of Freiburg, Freiburg, Germany.
³ Bioinformatics, Department of Computer Science, University of Freiburg, Freiburg, Germany, Centre for Biological Systems Analysis (ZBSA), University of Freiburg, Freiburg, Germany, Centre for Non-coding RNA in Technology and Health, University of Copenhagen, Copenhagen, Denmark and Centre for Biological Signalling Studies (BIOSS), University of Freiburg, Freiburg, Germany.

PMID: 25838465
PMCID: PMC4514930
DOI: 10.1093/bioinformatics/btv185

Sebastian Will et al. Bioinformatics. 2015.

Abstract

Motivation: RNA-Seq experiments have revealed a multitude of novel ncRNAs. The gold standard for their analysis, based on simultaneous alignment and folding, suffers from extreme time complexity of $O(n^{6})$. Subsequently, numerous faster 'Sankoff-style' approaches have been suggested. Commonly, the performance of such methods relies on sequence-based heuristics that restrict the search space to optimal or near-optimal sequence alignments; however, the accuracy of sequence-based methods breaks down for RNAs with sequence identities below 60%. Alignment approaches like LocARNA that do not require sequence-based heuristics have been limited to high complexity ($\geq$ quartic time).
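
As a rough worked comparison of sextic, quartic and quadratic growth, assume two sequences of equal length $n = 200$ (a hypothetical value) and take the sextic bound usually quoted for Sankoff's pairwise algorithm as an assumption. Constant factors, which differ between methods, are ignored:

$$
n^{6} = 6.4 \times 10^{13}, \qquad n^{4} = 1.6 \times 10^{9}, \qquad n^{2} = 4 \times 10^{4} \qquad (n = 200).
$$

Under these assumptions, each step down in the exponent by two removes a factor of $n^{2} = 40\,000$ from the number of elementary operations.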

Results: Breaking this barrier, we introduce the novel Sankoff-style algorithm 'sparsified prediction and alignment of RNAs based on their structure ensembles (SPARSE)', which runs in quadratic time without sequence-based heuristics. To achieve this low complexity, on par with sequence alignment algorithms, SPARSE features strong sparsification based on structural properties of the RNA ensembles. Following PMcomp, SPARSE gains further speed-up from lightweight energy computation. Although all existing lightweight Sankoff-style methods restrict Sankoff's original model by disallowing loop deletions and insertions, SPARSE transfers the Sankoff algorithm to the lightweight energy model completely for the first time. Compared with LocARNA, SPARSE achieves similar alignment and better folding quality in significantly less time (speed-up: 3.7). At similar run-time, it aligns low sequence identity instances substantially more accurately than RAF, which uses sequence-based heuristics.
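
To make the idea of ensemble-based sparsification concrete, here is a minimal sketch of the simplest such filter: discarding base pairs whose ensemble probability falls below a cutoff. It is not the SPARSE implementation. The function name, the cutoff of 0.01 and the toy probabilities are all hypothetical, and the structural criteria SPARSE actually uses are richer than this single threshold.

```python
def significant_pairs(pair_probs, threshold=0.01):
    """Keep only base pairs whose ensemble probability reaches the threshold.

    pair_probs maps (i, j) positions to a base-pair probability, as one
    might obtain from a partition-function folding tool (assumed input).
    """
    return {pair for pair, prob in pair_probs.items() if prob >= threshold}


# Toy, made-up probabilities for a single short RNA.
toy_probs = {
    (1, 20): 0.92,
    (2, 19): 0.88,
    (3, 18): 0.004,
    (5, 12): 0.30,
    (6, 11): 0.0005,
}

kept = significant_pairs(toy_probs, threshold=0.01)
print(sorted(kept))
```

Only the surviving pairs would then need to be considered by an alignment recursion, which is where the reduction in work comes from.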

Figures

**Fig. 1.**
Example *alignment* $\mathcal{A}$ of two *RNA sequences* $A$ and $B$ together with (*non-crossing*) *structures* $S = \{a_{1}, a_{2}, a_{3}, a_{4}, a_{5}\}$ and $T = \{b_{1}, b_{2}, b_{3}, b_{4}, b_{5}\}$. We highlight the positions in the *loop closed by* $a_{1}$ and in the *loop closed by* $b_{2}$. The base pair $a_{1}$ is the *parent* of the highlighted positions in $A$ and of $a_{2}$. The base pair $a_{1}$ closes a 2-loop; $a_{2}$, a 1-loop; and $a_{3}$, a *multiloop*. The latter is a 3-loop, since $a_{3}$ has two *inner* base pairs ($a_{4}$ and $a_{5}$). Note that the *structure alignment triple* $(\mathcal{A}, S, T)$ covers the *external* base pairs $a_{1}$, $a_{3}$, $b_{1}$ and $b_{2}$, as well as the inner base pairs of the two multiloops. Finally, $(\mathcal{A}, S, T)$ *deletes the entire* 2-loop of $a_{1}$ and *inserts the entire* 2-loop of $b_{2}$.
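
The loop terminology in this caption can be illustrated with a short sketch that reads a non-crossing structure in dot-bracket notation and reports, for each base pair, the $k$ of the $k$-loop it closes (the closing pair plus its inner base pairs). The dot-bracket string below is a hypothetical structure chosen to mirror the shape described for $S$; it is not taken from the figure, and positions are 0-based.

```python
def loop_degrees(dot_bracket):
    """Map each base pair (i, j) to k, where the pair closes a k-loop.

    k counts the closing base pair plus its inner base pairs, i.e. the
    pairs whose parent is the closing pair.
    """
    stack = []    # positions of '(' still waiting for a partner
    inner = {}    # open position -> inner base pairs counted so far
    degrees = {}
    for pos, char in enumerate(dot_bracket):
        if char == "(":
            stack.append(pos)
            inner[pos] = 0
        elif char == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            left = stack.pop()
            degrees[(left, pos)] = inner.pop(left) + 1
            if stack:
                # The enclosing open pair is the parent of (left, pos).
                inner[stack[-1]] += 1
    if stack:
        raise ValueError("unbalanced structure")
    return degrees


# Hypothetical structure: a 2-loop enclosing a 1-loop, then a multiloop
# with two inner base pairs.
structure = "(.(...).)((...)(...))"
degrees = loop_degrees(structure)
for pair, k in sorted(degrees.items()):
    print(pair, f"closes a {k}-loop")
```

Here the pair (0, 8) plays the role of $a_{1}$ (2-loop), (2, 6) that of $a_{2}$ (1-loop) and (9, 20) that of $a_{3}$ (3-loop).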

**Fig. 2.**
Recursions of the novel lightweight alignment algorithm PARSE

**Fig. 3.**
(A) Example alignment. Due to stacking effects, the probability of a base pair $a_{3}$ in the loop closed by $a_{2}$ (loop of single structure) is much higher than its probability in the loop closed by $a_{1}$ [loop of the consensus structure $\{(a_{1}, b_{1}), (a_{3}, b_{2})\}$]. (B) Computing represented entries in the sparsified algorithm. We show the matrix $M^{ab}$; the rounded bars left and on top of the matrix symbolize the represented positions for $a$ and $b$; the gray areas contain the represented entries for $M^{ab}$. In our example, the entry $M^{ab}(i, k)$ recurses (solid arrows) to unrepresented entries $M^{ab}(i - 1, k - 1)$, $M^{ab}(i, k - 1)$, $M^{ab}(i - 1, k)$ and $M^{ab}(a_{1}^{L} - 1, b_{1}^{L} - 1)$ (white boxes); the latter via matching base pairs; their left ends correspond to the dashed box at $(a_{1}^{L}, b_{1}^{L})$. The numbers 1-4 at the arrow heads refer to the respective recursion case. The unrepresented entries are computed from represented entries (dashed arrows to black boxes), each in constant time.

**Fig. 4.**
Alignment quality (measured by SPS) at different sequence identities for pairwise alignments (Bralibase 2.1 set k2). The curves are lowess curves (Cleveland, 1981) through data points for each benchmark instance. The thin lines visualize the distribution of scores by estimating the respective instance averages above and below the main lowess curve.
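
For readers unfamiliar with SPS, the following sketch computes a sum-of-pairs score for a pairwise alignment under the common definition: the fraction of residue pairs aligned in the reference that are also aligned in the test alignment. Whether the benchmark tooling behind this figure uses exactly this definition is an assumption, and the function names and toy alignments are hypothetical.

```python
def aligned_pairs(row_a, row_b):
    """Return the set of (i, j) residue indices that share a column."""
    if len(row_a) != len(row_b):
        raise ValueError("alignment rows must have equal length")
    pairs = set()
    i = j = 0  # 0-based residue counters for each sequence
    for char_a, char_b in zip(row_a, row_b):
        has_a = char_a != "-"
        has_b = char_b != "-"
        if has_a and has_b:
            pairs.add((i, j))
        i += has_a
        j += has_b
    return pairs


def sps(test, reference):
    """Fraction of reference residue pairs recovered by the test alignment."""
    ref_pairs = aligned_pairs(*reference)
    if not ref_pairs:
        raise ValueError("reference alignment has no aligned residue pairs")
    return len(aligned_pairs(*test) & ref_pairs) / len(ref_pairs)


reference = ("ACGU-A", "AC-UGA")  # toy reference alignment
test = ("ACGUA", "ACUGA")         # toy gap-free test alignment
score = sps(test, reference)
print(score)  # 0.75
```

The test alignment recovers three of the four reference pairs, hence the score of 0.75.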

**Fig. 5.**
Structure prediction quality, measured by MCC, within different ranges of average pairwise sequence identity (APSI), shown as boxplots (Bralibase 2.1 set k2). Whiskers extend up to one interquartile range from the boxes.
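
As an illustration of the MCC measure, here is a minimal sketch of the standard Matthews correlation coefficient applied to base pairs, assuming that every unordered pair of positions in a sequence of length `seq_len` counts as a candidate. That counting convention, the function name and the toy structures are assumptions; the paper's exact evaluation protocol may differ.

```python
import math


def mcc(predicted, reference, seq_len):
    """Matthews correlation coefficient between two sets of base pairs."""
    predicted, reference = set(predicted), set(reference)
    tp = len(predicted & reference)   # correctly predicted pairs
    fp = len(predicted - reference)   # predicted but not in reference
    fn = len(reference - predicted)   # in reference but missed
    # Every remaining unordered position pair counts as a true negative.
    tn = seq_len * (seq_len - 1) // 2 - tp - fp - fn
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Toy structures on a sequence of length 10 (0-based positions).
reference_pairs = {(0, 9), (1, 8), (2, 7)}
predicted_pairs = {(0, 9), (1, 8), (3, 6)}
value = mcc(predicted_pairs, reference_pairs, seq_len=10)
print(round(value, 3))
```

With two of three pairs correct, the toy example gives 81/126, roughly 0.643; a perfect prediction yields 1.0.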

References

1. Bernhart S.H., et al. (2008) RNAalifold: improved consensus structure prediction for RNA alignments. BMC Bioinformatics, 9, 474.
2. Chambers J.M., et al. (1983) Graphical Methods for Data Analysis. Wadsworth, Belmont, CA.
3. Clark M.B., et al. (2011) The reality of pervasive transcription. PLoS Biol., 9, e1000625; discussion e1001102.
4. Cleveland W.S. (1981) Lowess: a program for smoothing scatterplots by robust locally weighted regression. Am. Stat., 35, 54.
5. Do C.B., et al. (2008) A max-margin model for efficient simultaneous alignment and folding of RNA sequences. Bioinformatics, 24, i68–i76.
