- Java 7's Dual Pivot Quicksort (2012)
- Recently, a new Quicksort variant due to Yaroslavskiy was chosen as the standard sorting method for Oracle's Java 7 runtime library. The decision for the change was based on empirical studies showing that, on average, the new algorithm is faster than the formerly used classic Quicksort. Surprisingly, the improvement was achieved by using a dual pivot approach, an idea that several theoretical studies in the past had considered unpromising. In this thesis, I try to find the reason for this unexpected success. My focus is on a precise and detailed average case analysis, aiming at the flavor of Knuth's series “The Art of Computer Programming”. In particular, I go beyond abstract measures like counting key comparisons, and try to understand the efficiency of the algorithms at different levels of abstraction. Whenever possible, precise expected values are preferred to asymptotic approximations. This rigor ensures that (a) the sorting methods discussed here are actually usable in practice and (b) the analysis results contribute to a sound comparison of the Quicksort variants.
- An Earley-style Parser for Solving the RNA-RNA Interaction Problem (Bachelor Thesis) (2010)
- It has been observed that for understanding the biological function of certain RNA molecules, one has to study joint secondary structures of interacting pairs of RNA. In this thesis, a new approach for predicting the joint structure is proposed and implemented. For this, we introduce the class of m-dimensional context-free grammars (an extension of stochastic context-free grammars to multiple dimensions) and present an Earley-style semiring parser for this class. Additionally, we develop and thoroughly discuss an implementation variant of Earley parsers tailored to efficiently handle dense grammars, which include the grammars used for structure prediction. A recently proposed partitioning scheme for joint secondary structures is translated into a two-dimensional context-free grammar, which in turn is used as a stochastic model for RNA-RNA interaction. This model is trained on actual data and then used for predicting the most likely joint structures for given RNA molecules. While this technique has been widely used for secondary structure prediction of single molecules, RNA-RNA interaction has rarely been approached this way in the past. Although our parser has O(n^3 m^3) time complexity and O(n^2 m^2) space complexity for two RNA molecules of sizes n and m, it remains practically applicable for typical sizes if enough memory is available. Experiments show that our parser is much more efficient for this application than classical Earley parsers. Moreover, the predictions of joint structures are comparable in quality to current energy minimization approaches.
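
To make the dual pivot idea from the first abstract concrete, the following is a minimal sketch of a Quicksort that partitions around two pivots instead of one. It is an illustrative simplification, not Yaroslavskiy's exact algorithm and not the Java 7 library code: the function name `dual_pivot_quicksort`, the choice of the first and last elements as pivots, and the single left-to-right scan are all assumptions made for brevity.

```python
def dual_pivot_quicksort(a, lo=0, hi=None):
    """Sort the list `a` in place using a simplified dual-pivot partition.

    Illustrative sketch only; pivot selection (first and last element)
    is an assumption, not taken from any library implementation.
    """
    if hi is None:
        hi = len(a) - 1
    if lo >= hi:
        return

    # Make sure the smaller pivot p sits at a[lo] and the larger q at a[hi].
    if a[lo] > a[hi]:
        a[lo], a[hi] = a[hi], a[lo]
    p, q = a[lo], a[hi]

    lt = lo + 1  # a[lo+1 .. lt-1] holds elements smaller than p
    gt = hi - 1  # a[gt+1 .. hi-1] holds elements larger than q
    i = lo + 1   # a[lt .. i-1] holds elements between p and q (inclusive)

    while i <= gt:
        if a[i] < p:
            a[i], a[lt] = a[lt], a[i]
            lt += 1
            i += 1
        elif a[i] > q:
            # The element swapped in from the right is still unclassified,
            # so i is not advanced here.
            a[i], a[gt] = a[gt], a[i]
            gt -= 1
        else:
            i += 1

    # Move both pivots to their final positions.
    lt -= 1
    gt += 1
    a[lo], a[lt] = a[lt], a[lo]
    a[hi], a[gt] = a[gt], a[hi]

    # Recurse on the three parts: < p, between p and q, > q.
    dual_pivot_quicksort(a, lo, lt - 1)
    dual_pivot_quicksort(a, lt + 1, gt - 1)
    dual_pivot_quicksort(a, gt + 1, hi)


data = [5, 3, 8, 1, 9, 2, 7, 3, 5]
dual_pivot_quicksort(data)
print(data)
```

Each partitioning step splits the input into three parts rather than two, which is the structural difference from classic Quicksort that the thesis analyzes. Because this sketch always takes the outermost elements as pivots, it recurses deeply on already sorted input; a practical implementation would choose pivots more carefully and switch to a simpler method for small subarrays.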