One of the fundamental problems in computational structural biology is the prediction of RNA secondary structures from a single sequence. To solve this problem, mainly two different approaches have been used over the past decades: the free energy minimization (MFE) approach which is still considered the most popular and successful method and the competing stochastic context-free grammar (SCFG) approach. While the accuracy of the MFE based algorithms is limited by the quality of underlying thermodynamic models, the SCFG method abstracts from free energies and instead tries to learn about the structural behavior of the molecules by training the grammars on known real RNA structures, making it highly dependent on the availability of a rich high quality training set. However, due to the respective problems associated with both methods, new statistics based approaches towards RNA structure prediction have become increasingly appreciated. For instance, over the last years, several statistical sampling methods and clustering techniques have been invented that are based on the computation of partition functions (PFs) and base pair probabilities according to thermodynamic models. A corresponding SCFG based statistical sampling algorithm for RNA secondary structures has been studied just recently. Notably, this probabilistic method is capable of producing accurate (prediction) results, where its worst-case time and space requirements are equal to those of common RNA folding algorithms for single sequences.
The aim of this work is to present a comprehensive study on how enriching the underlying SCFG by additional information on the lengths of generated substructures (i.e. by incorporating length-dependencies into the SCFG based sampling algorithm, which is actually possible without significant losses in performance) affects the reliability of the induced RNA model and the accuracy of sampled secondary structures. As we will see, significant differences with respect to the overall quality of generated sample sets and the resulting predictive accuracy are typically implied. In principle, when considering the more specialized length-dependent SCFG model as basis for statistical sampling, a higher accuracy of predicted foldings can be reached at the price of a lower diversity of generated candidate structures (compared to the more general traditional SCFG variant or sampling based on PFs that rely on free energies).
SHIM is a concurrent deterministic programming language for embedded systems built on rendezvous communication. It abstracts away many details to give the developer a high-level view that includes virtual shared variables, threads as orthogonal statements, and deterministic concurrent exceptions.
In this paper, we present a new way to compile a SHIM-like language into a set of asynchronous guarded actions, a well-established intermediate representation for concurrent systems. By doing so, we build a bridge to many other tools, including hardware synthesis and formal verification. We present our translation in detail, illustrate it through examples, and show how the result can be used by various other tools.
This report gives an overview of the separate translation of synchronous imperative programs to synchronous guarded actions. In particular, we consider problems to be solved for separate compilation that stem from preemption statements and local variable declarations. We explain how we solved these problems and sketch our solutions implemented in the our Averest framework to implement a compiler that allows a separate compilation of imperative synchronous programs with local variables and unrestricted preemption statements. The focus of the report is the big picture of our entire design flow.
In this article we present a method to generate random objects from a large variety of combinatorial classes according to a given distribution. Given a description of the combinatorial class and a set of sample data our method will provide an algorithm that generates objects of size n in worst-case runtime O(n^2) (O(n log(n)) can be achieved at the cost of a higher average-case runtime), with the generated objects following a distribution that closely matches the distribution of the sample data.
This article focuses on the analytical analysis of the free energy in a realistic model for RNA secondary structures. In fact, the free energy in a stochastic model derived from a database of small and large subunit ribosomal RNA (SSU and LSU rRNA) data is studied. A common thermody-namic model for computing the free energy of a given RNA secondary structure, as well as stochastic context-free grammars and generating functions are used to derive the desired results. These results include asymptotics for the expected free energy and for the corresponding variance of a random RNA secondary structure. The quality of our model is judged by comparing the derived results to the used database of SSU and LSU rRNA data. At the end of this article, it is discussed how our results could be used to help on identifying good predictions of RNA secondary structure.
As any RNA sequence can be folded in many different ways, there are lots of different possible secondary structures for a given sequence. Most computational prediction methods based on free energy minimization compute a number of suboptimal foldings and we have to identify the native structures among all these possible secondary structures. For this reason, much effort has been made to develop approaches for identifying good predictions of RNA secondary structure. Using the abstract shapes approach as introduced by Giegerich et al., each class of similar secondary structures is represented by one shape and the native structures can be found among the top shape representatives. In this article, we derive some interesting results answering enumeration problems for abstract shapes and secondary structures of RNA. We start by computing symptotical representations for the number of shape representations of length n. Our main goal is to find out how much the search space can be reduced by using the concept of abstract shapes. To reach this goal, we analyze the number of secondary structures and shapes compatible with an RNA sequence of length n under the assumption that base pairing is allowed between arbitrary pairs of bases analytically and compare their exponential growths. Additionally, we analyze the number of secondary structures compatible with an RNA sequence of length n under the assumptions that base pairing is allowed only between certain pairs of bases and that the structures meet some appropriate conditions. The exponential growth factors of the resulting asymptotics are compared to the corresponding experimentally obtained value as given by Giegerich et al.
Guaranteeing correctness of compilation is a ma jor precondition for correct software. Code generation can be one of the most error-prone tasks in a compiler. One way to achieve trusted compilation is certifying compilation. A certifying compiler generates for each run a proof that it has performed the compilation run correctly. The proof is checked in a separate theorem prover. If the theorem prover is content with the proof, one can be sure that the compiler produced correct code. This paper presents a certifying code generation phase for a compiler translating an intermediate language into assembler code. The time spent for checking the proofs is the bottleneck of certifying compilation. We exhibit an improved framework for certifying compilation and considerable advances to overcome this bottleneck. We compare our implementation featuring the Coq theorem prover to an older implementation. Our current implementation is feasible for medium to large sized programs.
The Super-Peer Selection Problem is an optimization problem in network topology construction. It may be cast as a special case of a Hub Location Problem, more exactly an Uncapacitated Single Allocation p-Hub Median Problem with equal weights. We show that this problem is still NP-hard by reduction from Max Clique.