## Fachbereich Informatik

On Abstract Shapes of RNA
(2008)

As any RNA sequence can be folded in many different ways, there are lots of different possible secondary structures for a given sequence. Most computational prediction methods based on free energy minimization compute a number of suboptimal foldings and we have to identify the native structures among all these possible secondary structures. For this reason, much effort has been made to develop approaches for identifying good predictions of RNA secondary structure. Using the abstract shapes approach as introduced by Giegerich et al., each class of similar secondary structures is represented by one shape and the native structures can be found among the top shape representatives. In this article, we derive some interesting results answering enumeration problems for abstract shapes and secondary structures of RNA. We start by computing symptotical representations for the number of shape representations of length n. Our main goal is to find out how much the search space can be reduced by using the concept of abstract shapes. To reach this goal, we analyze the number of secondary structures and shapes compatible with an RNA sequence of length n under the assumption that base pairing is allowed between arbitrary pairs of bases analytically and compare their exponential growths. Additionally, we analyze the number of secondary structures compatible with an RNA sequence of length n under the assumptions that base pairing is allowed only between certain pairs of bases and that the structures meet some appropriate conditions. The exponential growth factors of the resulting asymptotics are compared to the corresponding experimentally obtained value as given by Giegerich et al.

This article focuses on the analytical analysis of the free energy in a realistic model for RNA secondary structures. In fact, the free energy in a stochastic model derived from a database of small and large subunit ribosomal RNA (SSU and LSU rRNA) data is studied. A common thermody-namic model for computing the free energy of a given RNA secondary structure, as well as stochastic context-free grammars and generating functions are used to derive the desired results. These results include asymptotics for the expected free energy and for the corresponding variance of a random RNA secondary structure. The quality of our model is judged by comparing the derived results to the used database of SSU and LSU rRNA data. At the end of this article, it is discussed how our results could be used to help on identifying good predictions of RNA secondary structure.