I report on two experiments designed to test theoretical predictions about individual behavior in a duopolistic setting. With quantity as the choice variable, a simultaneous Cournot game and a sequential Stackelberg game were tested over two periods. The key feature of both models was that players could lower their marginal cost for period two if they outperformed their competitor in period one in terms of profit. Experimental results suggest that in the Cournot game players are very competitive in period one but become Cournot players in period two. In the Stackelberg game Cournot play is modal, suggesting that players have preferences for equality in payoffs, which may be brought about by punishment by Stackelberg followers and fear of punishment on the part of Stackelberg leaders. Overall, players earned more money in the Stackelberg game than in the Cournot game.
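For readers unfamiliar with the two benchmark games, the following minimal sketch computes the textbook equilibrium quantities. It assumes a linear inverse demand P = a - b(q1 + q2) and constant marginal costs, neither of which is stated in the abstract; the function names and the parameter values in the usage lines are purely illustrative and are not the experimental design.

```python
def cournot(a, b, c1, c2):
    """Nash equilibrium quantities when both firms choose simultaneously.

    Assumes inverse demand P = a - b*(q1 + q2) and an interior solution
    (both quantities positive). These are illustrative assumptions.
    """
    q1 = (a - 2 * c1 + c2) / (3 * b)
    q2 = (a - 2 * c2 + c1) / (3 * b)
    return q1, q2


def stackelberg(a, b, c_leader, c_follower):
    """Subgame-perfect quantities when the leader moves first.

    The follower best-responds with q_F = (a - c_F - b*q_L) / (2b);
    the leader maximizes profit anticipating that response.
    """
    q_leader = (a - 2 * c_leader + c_follower) / (2 * b)
    q_follower = (a - c_follower - b * q_leader) / (2 * b)
    return q_leader, q_follower


def profit(a, b, c_own, q_own, q_other):
    """Profit of one firm given both quantities."""
    price = a - b * (q_own + q_other)
    return (price - c_own) * q_own


if __name__ == "__main__":
    a, b, c = 12, 1, 0  # hypothetical parameters
    print("Cournot quantities:", cournot(a, b, c, c))
    print("Stackelberg quantities:", stackelberg(a, b, c, c))
```

Passing a lower cost for one firm (for example `cournot(12, 1, 0, 2)`) shows how a period-one cost advantage would shift the period-two benchmark under these assumptions.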
Generic layout analysis, the process of decomposing a document image into homogeneous regions for a collection of diverse document images, has many important applications in document image analysis and understanding, such as preprocessing of degraded, warped, camera-captured document images, high-performance layout analysis of document images containing complex cursive scripts, and word spotting in historical document images at page level. Many goals in this field, such as a generic text line extraction method, have so far remained elusive and beyond the reach of state-of-the-art methods [NJ07, LSZT07, KB06]. This thesis addresses this problem by presenting generic, domain-independent text line extraction and text and non-text segmentation methods, and then describes some important applications that were developed based on these methods. An overview of the key contributions of this thesis is as follows.
The first part of this thesis presents a generic text line extraction method using a combination of matched filtering and ridge detection techniques, which are commonly used in computer vision. Unlike the state-of-the-art text line extraction methods in the literature, the generic text line extraction method can be applied equally and robustly to a large variety of document image classes, including scanned and camera-captured documents, binary and grayscale documents, typed-text and handwritten documents, historical and contemporary documents, and documents containing different scripts. Standard datasets belonging to different categories of document images are selected for performance evaluation: the UW-III [GHHP97] dataset of scanned documents, the ICDAR 2007 [GAS07] and UMD [LZDJ08] datasets of handwritten documents, the DFKI-I [SB07] dataset of camera-captured documents, a dataset of Arabic/Urdu script documents, and a dataset of historical documents in German calligraphic (Fraktur) script. The generic text line extraction method achieves 86% (n = 23,763 text lines in 650 documents) text line detection accuracy, which is better than the aggregate accuracy of 73% of the best performing domain-specific state-of-the-art methods. To the best of the author's knowledge, it is the first general-purpose text line extraction method that can be used equally for a diverse collection of documents.
This thesis also presents an active contour (snake) based curled text line extraction method for warped, camera-captured document images. The presented approach is applied to the DFKI-I [SB07] dataset of camera-captured, Latin script document images for curled text line extraction. It achieves above 95% (n = 3,091 text lines in 102 documents) text line detection accuracy, which is significantly better than the competing state-of-the-art curled text line extraction methods. With small modifications, the presented text line extraction method can also be applied to document images containing other scripts such as Chinese, Devanagari, and Arabic.
The second part of this thesis presents an improved version of the state-of-the-art multiresolution morphology (Leptonica) based text and non-text segmentation method [Blo91], which is a domain-independent page segmentation approach and can be equally applied to a diverse collection of binarized document images. It is demonstrated that the presented improvements result in an increase in segmentation accuracy from 93% to 99% (n = 113 documents).
This thesis also introduces a discriminative learning based approach for page segmentation, in which a self-tunable multi-layer perceptron (MLP) classifier [BS10] is trained to distinguish between text and non-text connected components. Unlike other classification based page segmentation approaches in the literature, this connected component based approach is faster than pixel based classification methods and does not require a block segmentation method beforehand. It achieves a segmentation accuracy of 96% (n = 113 documents), compared to 93% for the state-of-the-art multiresolution morphology (Leptonica) based page segmentation method [Blo91]. In addition to text and non-text segmentation of Latin script documents, the presented approach can also be adapted for document images containing other scripts as well as for other specialized layout analysis tasks such as digit and non-digit segmentation [HBSB12], orientation detection [RBSB09], and body-text and side-note segmentation [BAESB12].
Finally, this thesis presents important applications of the two generic layout analysis techniques discussed above: the ridge-based text line extraction method and the multiresolution morphology based text and non-text segmentation method. First, a complete preprocessing pipeline is described for removing different types of degradations from grayscale, warped, camera-captured document images. It includes removal of grayscale degradations such as non-uniform shadows and blurring through binarization, noise cleanup using page frame detection, and document rectification using monocular dewarping. Each of these preprocessing steps shows significant improvement in comparison to the analyzed state-of-the-art methods in the literature. Second, a high-performance layout analysis method is described for complex Arabic script document images written in different languages, such as Arabic, Urdu, and Persian, and in different styles, for example Naskh and Nastaliq. The presented layout analysis system is robust against different types of document image degradations and shows better performance than the state-of-the-art methods for text and non-text segmentation, text line extraction, and reading order determination on a variety of Arabic and Urdu document images. It can be used for large-scale digitization of Arabic and Urdu documents. These applications demonstrate that the layout analysis methods, ridge-based text line extraction and multiresolution morphology based text and non-text segmentation, are generic and can be applied easily to a large collection of diverse document images.
The research for this thesis was conducted to develop a framework that supports the automatic configuration of project-specific software development processes by selecting and combining different technologies: the Process Configuration Framework. The research draws attention to the problem that while the research community develops new technologies, industrial companies continue to use only their well-known ones. Because of this, technology transfer takes decades. In addition, no single solution solves all problems in a software development project, so a number of technologies need to be combined for one project.
The framework developed and explained in this research mainly addresses those problems by building a bridge between research and industry as well as by supporting software companies during the selection of the most appropriate technologies combined in a software process. The technology transformation gap is filled by a repository of (new) technologies, which is used as the foundation of the Process Configuration Framework. The process is configured by providing a SPEM process pattern for each technology, so that companies can build their process by plugging these patterns into each other.
The technologies of the repository were specified in a schema including a technology model, a context model, and an impact model. With context and impact it is possible to provide information about a technology, for example, its benefits to quality, cost, or schedule. The process patterns are produced as output of the Process Configuration Framework in several stages:
I Technology Ranking:
1 Ranking based on Application Domain, Project & Impact
2 Ranking based on Environment
3 Ranking based on Static Context
II Technology Combination:
4 Creation of all possible Technology Chains
5 Restriction of the Technology Chains
6 Ranking based on Static and Dynamic Context
7 Extension of the Chains by Quality Assurance
III Process Configuration:
8 Process Component Diagram
9 Extension of the Process Component Diagram
10 Instantiation of the Components by Technologies of the Technology Chain
11 Providing process patterns
12 Creation of the process based on Patterns
The effectiveness and quality of the Process Configuration Framework have additionally been evaluated in a case study. Here, the Technology Chains manually created by experts were compared to the chains automatically created by the framework after it was configured by those experts. This comparison showed that the framework results are similar and can therefore be used as a recommendation.
We conclude from our research that support during the configuration of a process for software projects is important, especially for non-experts. This support is provided by the Process Configuration Framework developed in this research. In addition, our research has shown that this framework offers a way to close the technology transformation gap between the research community and industrial companies more quickly.
Changing environmental conditions, such as temperature changes or access to nutrients, require specific genetic adaptation programs, especially in sessile organisms such as plants. One such highly conserved mechanism, which among other things protects against temperature peaks, is the heat shock response (HSR) controlled by heat shock factors (HSFs). In this response, specific heat stress proteins (HSPs, chaperones) are produced in increased amounts and protect proteins from denaturation. In plants, a highly complex regulatory network consisting of more than 20 HSFs has evolved, which allows the HSR to be fine-tuned precisely to the respective stress conditions.
However, the high degree of complexity of the HSR in plants makes it considerably harder to study. To understand the basic principles of the HSR in plants, we therefore turned to a simpler model organism that is closely related to plants but contains only a single HSF (HSF1): the unicellular green alga Chlamydomonas reinhardtii. Three approaches were pursued in this work.
First, various chemical substances that inhibit different steps during the activation and attenuation of the HSR were used to elucidate its regulation. This showed that phosphorylation of HSF1 plays a decisive role in activating the HSR, that the trigger is the accumulation of misfolded proteins, and that cytosolic HSP90A plays an important modulating role in the HSR.
Second, changes in all transcripts were measured using microarrays, primarily to identify plant-specific processes that must be specifically adjusted to elevated temperatures. Chlorophyll biosynthesis and protein import into the chloroplast were identified as new, plant-specific targets of the stress response. Furthermore, it was shown directly that HSF1 also regulates plastidic chaperones, in contrast to mitochondrial chaperones, which are controlled separately.
Finally, the expression of genes important for the stress response (HSF1/HSP70B) was specifically suppressed in order to study their influence on the HSR in more detail. For this purpose, I developed a system, novel for the unicellular green alga and based on the RNAi mechanism, that allows even essential genes to be switched off in a targeted manner depending on the nitrogen source in the growth medium. This system made it possible to show that HSF1 itself increases the expression of its own RNA during stress, and does so specifically to further amplify the stress response. It was further shown that the chloroplast chaperone HSP70B is an essential protein for cell growth, which can be studied in more detail using the inducible RNAi system. It was found that the HSP70B-mediated assembly and disassembly of the VIPP1 protein is crucial for its function in the cell. Furthermore, it was shown that HSP70B is probably responsible for folding one or more as yet unknown enzymes of arginine biosynthesis or nitrogen fixation, and that these processes probably represent the essential function of HSP70B.
The main topic of this thesis is to define and analyze a multilevel Monte Carlo algorithm for path-dependent functionals of the solution of a stochastic differential equation (SDE) which is driven by a square integrable, \(d_X\)-dimensional Lévy process \(X\). We work with standard Lipschitz assumptions and denote by \(Y=(Y_t)_{t\in[0,1]}\) the \(d_Y\)-dimensional strong solution of the SDE.
We investigate the computation of expectations \(S(f) = \mathrm{E}[f(Y)]\) using randomized algorithms \(\widehat S\). Thereby, we are interested in the relation of the error and the computational cost of \(\widehat S\), where \(f:D[0,1] \to \mathbb{R}\) ranges in the class \(F\) of measurable functionals on the space of càdlàg functions on \([0,1]\), that are Lipschitz continuous with respect to the supremum norm.
We consider as error \(e(\widehat S)\) the worst case of the root mean square error over the class of functionals \(F\). The computational cost of an algorithm \(\widehat S\), denoted \(\mathrm{cost}(\widehat S)\), should represent the runtime of the algorithm on a computer. We work in the real number model of computation and further suppose that evaluations of \(f\) are possible for piecewise constant functions in time units according to its number of breakpoints.
We state strong error estimates for an approximate Euler scheme on a random time discretization. With these strong error estimates, the multilevel algorithm leads to upper bounds for the convergence order of the error with respect to the computational cost. The main results can be summarized in terms of the Blumenthal-Getoor index of the driving Lévy process, denoted by \(\beta\in[0,2]\). For \(\beta <1\) and no Brownian component present, we almost reach convergence order \(1/2\), which means that there exists a sequence of multilevel algorithms \((\widehat S_n)_{n\in \mathbb{N}}\) with \(\mathrm{cost}(\widehat S_n) \leq n\) such that \( e(\widehat S_n) \precsim n^{-1/2}\). Here, \( \precsim\) denotes a weak asymptotic upper bound, i.e. the inequality holds up to an unspecified positive constant. If \(X\) has a Brownian component, the order has an additional logarithmic term, in which case we reach \( e(\widehat S_n) \precsim n^{-1/2} \, (\log(n))^{3/2}\).
For the special subclass of \(Y\) being the Lévy process itself, we also provide a lower bound, which, up to a logarithmic term, recovers the order \(1/2\), i.e., neglecting logarithmic terms, the multilevel algorithm is order optimal for \( \beta <1\).
An empirical error analysis via numerical experiments matches the theoretical results and completes the analysis.
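The multilevel idea itself can be illustrated with a much simpler setting than the one treated in the thesis. The sketch below is an assumption-laden toy: it replaces the Lévy-driven SDE by geometric Brownian motion, uses a plain Euler scheme on a uniform (not random) grid with 2^level steps, and takes the running maximum as a functional that is Lipschitz with respect to the supremum norm. The function names `coupled_sample` and `mlmc_estimate`, the drift and volatility values, and the sample sizes are all hypothetical.

```python
import math
import random


def coupled_sample(rng, level, mu=0.05, sigma=0.2, y0=1.0):
    """Return f(fine path) and f(coarse path) driven by the same noise.

    Toy model (an assumption, not the thesis setting):
    dY = mu*Y dt + sigma*Y dW on [0, 1], f(path) = max over grid points.
    At level 0 there is no coarser path, so the coarse value is 0.
    """
    n_fine = 2 ** level
    dt = 1.0 / n_fine
    increments = [rng.gauss(0.0, math.sqrt(dt)) for _ in range(n_fine)]

    def running_max(dws, step):
        y = y0
        best = y0
        for dw in dws:
            y += mu * y * step + sigma * y * dw  # Euler step
            best = max(best, y)
        return best

    fine = running_max(increments, dt)
    if level == 0:
        return fine, 0.0
    # The coarse path reuses the fine Brownian increments, summed pairwise.
    coarse_incs = [increments[k] + increments[k + 1] for k in range(0, n_fine, 2)]
    coarse = running_max(coarse_incs, 2 * dt)
    return fine, coarse


def mlmc_estimate(paths_per_level, seed=0):
    """Telescoping sum of level corrections E[f_l - f_{l-1}]."""
    rng = random.Random(seed)
    estimate = 0.0
    for level, n_paths in enumerate(paths_per_level):
        total = 0.0
        for _ in range(n_paths):
            fine, coarse = coupled_sample(rng, level)
            total += fine - coarse
        estimate += total / n_paths
    return estimate


if __name__ == "__main__":
    # Fewer samples on finer (more expensive) levels.
    print(mlmc_estimate([4000, 2000, 1000, 500, 250]))
```

Because fine and coarse paths share their noise, the corrections have small variance, which is why fewer samples suffice on the expensive fine levels; the sample allocation shown is ad hoc rather than optimized.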
This thesis is devoted to furthering the tropical intersection theory as well as to applying the developed theory to gain new insights about tropical moduli spaces.
We use piecewise polynomials to define tropical cocycles that generalise the notion of tropical Cartier divisors to higher codimensions, introduce an intersection product of cocycles with tropical cycles and use the connection to toric geometry to prove a Poincaré duality for certain cases. Our main application of this Poincaré duality is the construction of intersection-theoretic fibres under a large class of tropical morphisms.
We construct an intersection product of cycles on matroid varieties which are a natural generalisation of tropicalisations of classical linear spaces and the local blocks of smooth tropical varieties. The key ingredient is the ability to express a matroid variety contained in another matroid variety by a piecewise polynomial that is given in terms of the rank functions of the corresponding matroids. In particular, this enables us to intersect cycles on the moduli spaces of n-marked abstract rational curves. We also construct a pull-back of cycles along morphisms of smooth varieties, relate pull-backs to tropical modifications and show that every cycle on a matroid variety is rationally equivalent to its recession cycle and can be cut out by a cocycle.
Finally, we define families of smooth rational tropical curves over smooth varieties and construct a tropical fibre product in order to show that every morphism of a smooth variety to the moduli space of abstract rational tropical curves induces a family of curves over the domain of the morphism. This leads to an alternative, inductive way of constructing moduli spaces of rational curves.
Recently, convex optimization models have been successfully applied to various problems in image analysis and restoration. In this paper, we are interested in relations between convex constrained optimization problems of the form \({\rm argmin} \{ \Phi(x) \ \text{subject to} \ \Psi(x) \le \tau \}\) and their penalized counterparts \({\rm argmin} \{\Phi(x) + \lambda \Psi(x)\}\). We recall general results on the topic with the help of an epigraphical projection. Then we deal with the special setting \(\Psi := \| L \cdot\|\) with \(L \in \mathbb{R}^{m,n}\) and \(\Phi := \varphi(H \cdot)\), where \(H \in \mathbb{R}^{n,n}\) and \(\varphi: \mathbb R^n \rightarrow \mathbb{R} \cup \{+\infty\} \) meet certain requirements which are often fulfilled in image processing models. In this case we prove, by incorporating the dual problems, that there exists a bijective function such that the solutions of the constrained problem coincide with those of the penalized problem if and only if \(\tau\) and \(\lambda\) are in the graph of this function. We illustrate the relation between \(\tau\) and \(\lambda\) for various problems arising in image processing. In particular, we point out the relation to the Pareto frontier for joint sparsity problems. We demonstrate the performance of the constrained model in restoration tasks of images corrupted by Poisson noise with the \(I\)-divergence as data fitting term \(\varphi\) and in inpainting models with the constrained nuclear norm. Such models can be useful if we have a priori knowledge on the image rather than on the noise level.
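The correspondence between \(\tau\) and \(\lambda\) can be checked by hand in a toy case. The sketch below assumes \(\Phi(x) = \frac12\|x - b\|_2^2\) and \(\Psi(x) = \|x\|_1\), that is, \(L\) and \(H\) are identity matrices; this choice is made here only for illustration and is not one of the paper's examples. In this case the penalized solution is soft thresholding of \(b\), the constrained solution is the Euclidean projection of \(b\) onto the \(\ell_1\) ball, and the two coincide when \(\tau\) equals the \(\ell_1\) norm of the penalized solution. The function names and the vector `b` are hypothetical.

```python
import math


def soft_threshold(b, lam):
    """Solve argmin_x 0.5*||x - b||^2 + lam*||x||_1 componentwise."""
    return [math.copysign(max(abs(v) - lam, 0.0), v) for v in b]


def project_l1_ball(b, tau):
    """Solve argmin_x 0.5*||x - b||^2 subject to ||x||_1 <= tau (tau > 0).

    Sort-based projection: find the threshold theta such that the
    soft-thresholded vector has l1 norm exactly tau.
    """
    if sum(abs(v) for v in b) <= tau:
        return list(b)  # constraint inactive, b itself is feasible
    magnitudes = sorted((abs(v) for v in b), reverse=True)
    cumulative = 0.0
    theta = 0.0
    for k, u_k in enumerate(magnitudes, start=1):
        cumulative += u_k
        candidate = (cumulative - tau) / k
        if u_k > candidate:
            theta = candidate  # keep the last index where the test holds
    return soft_threshold(b, theta)


def tau_of_lambda(b, lam):
    """The value of tau paired with lam in this toy problem."""
    return sum(abs(v) for v in soft_threshold(b, lam))


if __name__ == "__main__":
    b = [3.0, -1.0, 0.5]  # hypothetical data
    lam = 0.75
    tau = tau_of_lambda(b, lam)
    print(soft_threshold(b, lam), project_l1_ball(b, tau))
```

In this toy problem `tau_of_lambda` is strictly decreasing in `lam` as long as the penalized solution is nonzero, which is a concrete instance of the bijective relation between the two parameters.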
The safety of embedded systems is becoming more and more important. Fault Tree Analysis (FTA) is a widely used technique for analyzing the safety of embedded systems. A standardized tree-like structure called a Fault Tree (FT) models the failures of the systems. The Component Fault Tree (CFT) provides an advanced modeling concept for adapting traditional FTs to the hierarchical architecture model in system design. Minimal Cut Set (MCS) analysis is a method for qualitative analysis based on FTs. Each MCS represents a minimal combination of component failures of a system, called basic events, which together may cause the top-level system failure. The ordinary representations of MCSs consist of plain text and data tables with little additional supporting visual and interactive information. Importance analysis based on FTs or CFTs estimates the contribution of each potential basic event to a top-level system failure. The resulting importance values of basic events are typically represented in summary views, e.g., data tables and histograms. There is little visual integration between these forms and the FT (or CFT) structure. The safety of a system can be improved using an iterative process, called the safety improvement process, based on FTs and taking relevant constraints into account, e.g., cost. Typically, relevant data regarding the safety improvement process are presented across multiple views with few interactive associations. In short, the ordinary representation concepts cannot effectively facilitate these analyses.
We propose a set of visualization approaches that address the issues mentioned above and facilitate these analyses.
Contributions:
1. To support the MCS analysis, we propose a matrix-based visualization that allows detailed data of the MCSs of interest to be viewed while maintaining a satisfactory overview of a large number of MCSs for effective navigation and pattern analysis. Engineers can also intuitively analyze the influence of MCSs of a CFT.
2. To facilitate the importance analysis based on the CFT, we propose a hybrid visualization approach that combines the icicle-layout-style architectural views with the CFT structure. This approach makes it easier to identify the vulnerable components, taking the hierarchies of the system architecture into account, and to investigate the logical failure propagation of the important basic events.
3. We propose a visual safety improvement process that integrates an enhanced decision tree with a scatter plot. This approach allows one to visually investigate the detailed data related to individual steps of the process while maintaining the overview of the process. The approach makes it easier to construct and analyze solutions for improving the safety of a system.
Using our visualization approaches, the MCS analysis, the importance analysis, and the safety improvement process based on the CFT can be facilitated.
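To make the notion of a minimal cut set concrete, here is a small sketch that derives the MCSs of a fault tree built from AND and OR gates. It is a generic textbook-style computation, not the authors' tool; the dictionary encoding of the tree, the function names, and the example events A, B, C are assumptions introduced for illustration.

```python
from itertools import product


def minimize(sets):
    """Drop every cut set that strictly contains another cut set."""
    return {s for s in sets if not any(other < s for other in sets)}


def cut_sets(node, tree):
    """Return the minimal cut sets of `node` as a set of frozensets.

    `tree` maps a gate name to (kind, children) with kind "AND" or "OR".
    Any name that is not a key of `tree` is treated as a basic event.
    """
    if node not in tree:
        return {frozenset([node])}
    kind, children = tree[node]
    child_sets = [cut_sets(child, tree) for child in children]
    if kind == "OR":
        # Any cut set of any child triggers the gate.
        combined = set().union(*child_sets)
    elif kind == "AND":
        # One cut set from every child must occur together.
        combined = {frozenset().union(*combo) for combo in product(*child_sets)}
    else:
        raise ValueError(f"unknown gate kind: {kind}")
    return minimize(combined)


if __name__ == "__main__":
    # Hypothetical tree: TOP fails if both subsystems G1 and G2 fail.
    tree = {
        "TOP": ("AND", ["G1", "G2"]),
        "G1": ("OR", ["A", "B"]),
        "G2": ("OR", ["A", "C"]),
    }
    for mcs in sorted(cut_sets("TOP", tree), key=sorted):
        print(sorted(mcs))
```

In the example, event A alone fails both subsystems, so {A} is a minimal cut set and the larger sets {A, B} and {A, C} are discarded; the only other minimal cut set is {B, C}.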
The scientific aim of this work was to synthesize and characterize new bidentate and tridentate phosphine ligands and their corresponding palladium complexes, and to examine their application as homogeneous catalysts. Later on, some of the obtained palladium catalysts were immobilized and used as heterogeneous catalysts.
Pyrimidinyl functionalized diphenyl phosphine ligands were synthesized by ring closure of [2-(3-dimethylamino-1-oxoprop-2-en-yl)phenyl]diphenylphosphine with an excess of substituted guanidinium salts. Furthermore, to increase the electron density at the phosphorus centre, the two aryl substituents on the phosphanyl group were replaced by two alkyl substituents. Electron-rich pyrimidinyl functionalized dialkyl phosphine ligands were synthesized from pyrimidinyl functionalized bromobenzene in a process involving lithiation followed by reaction with a chlorodialkylphosphine.
Starting from the newly synthesized diaryl phosphine ligands, their corresponding palladium complexes were synthesized. I was able to show that slight changes at the amino group of [(2-aminopyrimidin-4-yl)aryl]phosphines lead to pronounced differences in the stability and catalytic activity of the corresponding palladium(II) complexes. With a P,C coordination mode, the palladium complex rapidly catalyzes the Suzuki coupling reaction of phenylboronic acid with aryl bromides even at room temperature and at low catalyst loading.
Using the NH2 group of the aminopyrimidine as a potential site for the introduction of another substituent, bidentate and tridentate ligands containing phosphorus atoms connected to the aminopyrimidine group and their corresponding palladium complexes were synthesized and characterized.
Two ligands, [2- and 4-(4-(2-amino)pyrimidinyl)phenyl]diphenylphosphine (containing an NH2 group), functionalized with an ethoxysilane group were synthesized. The palladium complexes based on these ligands were prepared and immobilized on commercial silica and MCM-41. Elemental analysis, FT-IR, solid state 31P, 13C and 29Si CP–MAS NMR spectroscopy, XRD and N2 adsorption were used to confirm the success of the immobilization and to investigate the structure of the heterogenized catalysts.
The resulting heterogeneous catalysts were applied for the Suzuki reaction and exhibited excellent activity, selectivity and reusability.
Predicting secondary structures of RNA molecules is one of the fundamental problems, and thus a challenging task, in computational structural biology. Existing prediction methods essentially use the dynamic programming principle and are based either on a general thermodynamic model or on a specific probabilistic model, traditionally realized by a stochastic context-free grammar. To date, the applied grammars have been rather simple and small, and although statistical approaches have become increasingly appreciated in recent years, a corresponding sampling algorithm based on a stochastic RNA structure model has not yet been devised. In addition, virtually all popular state-of-the-art tools for computational structure prediction have the same worst-case time and space requirements of O(n^3) and O(n^2) for sequence length n, which limits their practical applicability because native RNA molecules are often quite large. Accordingly, the prime demand imposed by biologists on computational prediction procedures is a reduced waiting time for results that are not significantly less accurate.
Here we address all of these issues by describing algorithms and performing comprehensive studies based on sophisticated stochastic context-free grammars of similar complexity to those underlying thermodynamic prediction approaches, where all of our methods make use of the concept of sampling. We also employ approximation techniques known from theoretical computer science to reach a heuristic worst-case speedup for RNA folding.
In particular, we start by describing a way of deriving a sequence-independent random sampler for an arbitrary class of RNAs by means of (weighted) unranking. The resulting algorithm can generate any secondary structure of a given fixed size n in only O(n·log(n)) time, and the results are observed to be accurate, validating its practical applicability.
With respect to RNA folding, we present a novel probabilistic sampling algorithm that generates statistically representative and reproducible samples of the entire ensemble of feasible structures for a particular input sequence. This method actually samples the possible foldings from a distribution implied by a suitable (traditional or length-dependent) grammar. Notably, we also propose several (new) ways for obtaining predictions from generated samples. Both variants have the same worst-case time and space complexities of O(n^3) and O(n^2) for sequence length n. Nevertheless, evaluations of our sampling methods show that they are actually capable of producing accurate (prediction) results.
In an attempt to resolve the long-standing problem of reducing the time complexity of RNA folding algorithms without sacrificing much of the accuracy of the results, we devised a heuristic statistical sampling method that can be implemented to require only O(n^2) time for generating a fixed-size sample of candidate structures for a given sequence of length n. Since a reasonable prediction can still be obtained efficiently from the generated sample set, this approach reduces the worst-case time complexity by a linear factor compared to all existing precise methods. Notably, we also propose a novel (heuristic) sampling strategy, as opposed to the common one typically applied for statistical sampling, which may produce more accurate results for particular settings. A validation of our heuristic sampling approach by comparison to several leading RNA secondary structure prediction tools indicates that it is capable of producing competitive predictions, but may require the consideration of large sample sizes.
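The O(n^3) time and O(n^2) space bounds quoted for exact dynamic programming approaches can be seen in the classical base-pair maximization recursion (often attributed to Nussinov). The sketch below is that simplified textbook scheme, used here only to show where the cubic cost comes from; it is neither a thermodynamic model nor the grammar-based sampling method of the thesis. The function name, the `min_loop` parameter, and the set of allowed pairs are illustrative assumptions.

```python
def max_base_pairs(seq, min_loop=3):
    """Maximum number of nested base pairs in `seq`.

    Fills an n x n table (O(n^2) space); every cell scans all split
    points k (O(n) work), which gives O(n^3) time overall.
    `min_loop` is the minimum number of unpaired bases inside a hairpin.
    """
    allowed = {("A", "U"), ("U", "A"), ("G", "C"),
               ("C", "G"), ("G", "U"), ("U", "G")}
    n = len(seq)
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]  # case: position j stays unpaired
            for k in range(i, j - min_loop):
                if (seq[k], seq[j]) in allowed:  # case: k pairs with j
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score
    return best[0][n - 1] if n else 0


if __name__ == "__main__":
    print(max_base_pairs("GGGAAACCC"))
```

The innermost loop over `k` is the linear factor that a sampling-based heuristic with O(n^2) running time avoids.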