68U99 None of the above, but in this section
Cloud Computing, or the Cloud, became one of the most widely used technologies once its possibilities were understood. It enables ubiquitous access to tasks that need collaboration or remote monitoring, and it is widely used in daily life as well as in industry. The paradigm relies on Internet technologies, which provide only best-effort communication. Best-effort communication limits the applicability of the Cloud in domains where timing is critical. Edge Computing is seen as a complementary paradigm to the Cloud. It is expected to solve the Quality of Service (QoS) and latency problems that arise from the growing number of connected devices and from the physical distance between the infrastructure and the devices. Edge Computing adds a new tier between Information Technology (IT) and Operational Technology (OT) and brings computing power close to the source of the data. Computing power near the devices reduces the dependency on the Internet; hence, computation can continue in case of a network failure. Close-proximity deployments also enable the application of Edge Computing in areas where real-time behaviour is necessary. Computation and communication in Edge Computing are performed via Edge Servers. This thesis suggests a standardized and hardware-independent software reference architecture for Edge Servers that can be realized as a framework on servers and used in domains where timing is critical. The suggested architecture is scalable, extensible, modular, decentralized, and supports multiple users. In decentralized systems, several factors must be taken into consideration, such as latencies, delays, and the available resources of the neighbouring servers. The resulting architecture evaluates these factors and enables real-time execution.
It also hides the complexity of low-level communication and automates the collaboration between Edge Servers to enable seamless offloading when resources are lacking. The thesis also validates an exemplary instance of the architecture, a framework called the Real-Time Execution Framework (RTEF), in multiple scenarios. The tasks used are resource-demanding and are requested to be executed on an Edge Server in an Edge Network comprising multiple Edge Servers. The servers can make decisions by evaluating their availability and determine the optimal location to execute a task without causing deadline misses. Even under heavy load, the decisions made by the servers to execute the tasks on time were correct, which proves the concept.
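The kind of deadline-aware placement decision described above can be illustrated with a minimal sketch. It is not the RTEF implementation: the `EdgeServer` and `Task` types, their fields, and the simple cost model (round-trip latency plus queued work scaled by a speed factor) are all assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EdgeServer:
    # All fields are hypothetical; the abstract does not specify a data model.
    name: str
    queued_work_ms: float    # work already waiting on this server
    speed_factor: float      # relative processing speed (1.0 = reference server)
    link_latency_ms: float   # one-way latency from the requesting server


@dataclass
class Task:
    work_ms: float           # execution time on the reference server
    deadline_ms: float       # relative deadline, measured from now


def estimated_finish_ms(server: EdgeServer, task: Task) -> float:
    """Estimate completion time: send the task, wait and execute, return the result."""
    execution = (server.queued_work_ms + task.work_ms) / server.speed_factor
    return 2 * server.link_latency_ms + execution


def choose_server(servers: list[EdgeServer], task: Task) -> Optional[EdgeServer]:
    """Return the server with the earliest estimated finish that meets the deadline."""
    feasible = [s for s in servers if estimated_finish_ms(s, task) <= task.deadline_ms]
    if not feasible:
        return None  # no placement avoids a deadline miss
    return min(feasible, key=lambda s: estimated_finish_ms(s, task))


local = EdgeServer("local", queued_work_ms=80.0, speed_factor=1.0, link_latency_ms=0.0)
neighbour = EdgeServer("neighbour", queued_work_ms=10.0, speed_factor=2.0, link_latency_ms=5.0)
chosen = choose_server([local, neighbour], Task(work_ms=40.0, deadline_ms=60.0))
print(chosen.name if chosen else "reject")
```

In this toy setting the overloaded local server would miss the deadline, so the task is offloaded to the neighbour. A real decision would also need up-to-date resource information from the neighbouring servers, which this sketch simply takes as given.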
Predicting the secondary structures of RNA molecules is one of the fundamental problems in computational structural biology, and a challenging one. Existing prediction methods essentially use the dynamic programming principle and are based either on a general thermodynamic model or on a specific probabilistic model, traditionally realized by a stochastic context-free grammar. To date, the applied grammars have been rather simple and small, and although statistical approaches have become increasingly appreciated in recent years, a corresponding sampling algorithm based on a stochastic RNA structure model has not yet been devised. In addition, essentially all popular state-of-the-art tools for computational structure prediction have the same worst-case time and space requirements of O(n^3) and O(n^2) for sequence length n, which limits their practical applicability given the often quite large sizes of native RNA molecules. Accordingly, the prime demand of biologists on computational prediction procedures is a reduced waiting time for results that are not significantly less accurate.
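To make the O(n^3) time and O(n^2) space bounds concrete, the following sketch uses the classic Nussinov-style base-pair maximization. This is a deliberately simpler stand-in, not one of the thermodynamic or grammar-based models discussed here; the function names and the `min_loop` parameter are illustrative assumptions. The n-by-n table accounts for the quadratic space, and the inner loop over split points k accounts for the cubic time.

```python
# Watson-Crick pairs plus the G-U wobble pair.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def can_pair(a: str, b: str) -> bool:
    return (a, b) in PAIRS


def max_base_pairs(seq: str, min_loop: int = 3) -> int:
    """Maximum number of nested base pairs, with at least min_loop unpaired
    bases enclosed by every hairpin."""
    n = len(seq)
    if n == 0:
        return 0
    # table[i][j] = maximum number of pairs within seq[i..j]; O(n^2) space.
    table = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = table[i][j - 1]  # case 1: position j stays unpaired
            # case 2: position j pairs with some k; this loop gives O(n^3) time.
            for k in range(i, j - min_loop):
                if can_pair(seq[k], seq[j]):
                    left = table[i][k - 1] if k > i else 0
                    inner = table[k + 1][j - 1]
                    best = max(best, left + 1 + inner)
            table[i][j] = best
    return table[0][n - 1]


print(max_base_pairs("GGGAAAUCC"))
```

For a sequence of a few thousand nucleotides the cubic loop already amounts to billions of elementary steps, which is the practical limitation the paragraph above refers to.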
We address all of these issues here by describing algorithms and performing comprehensive studies based on sophisticated stochastic context-free grammars of similar complexity to those underlying thermodynamic prediction approaches; all of our methods make use of the concept of sampling. We also employ an approximation technique known from theoretical computer science in order to reach a heuristic worst-case speedup for RNA folding.
In particular, we start by describing a way to derive a sequence-independent random sampler for an arbitrary class of RNAs by means of (weighted) unranking. The resulting algorithm can generate any secondary structure of a given fixed size n in only O(n·log(n)) time, and the results are observed to be accurate, validating its practical applicability.
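The idea of sampling by unranking can be sketched as follows: count all structures of each size, draw a random integer below that count, and decode the integer into a structure. The sketch below is an unweighted (uniform) toy version under a simple combinatorial model with a minimum hairpin loop length; it does not reproduce the weighted scheme or the O(n·log(n)) running time stated above, and all function names are assumptions.

```python
import random


def structure_counts(n: int, min_loop: int = 1) -> list[int]:
    """counts[m] = number of secondary structures on m positions."""
    counts = [1] * (n + 1)  # counts[0] = 1: the empty structure
    for m in range(1, n + 1):
        total = counts[m - 1]  # first position unpaired
        # first position paired with position k (1-based), enclosing k - 2 positions
        for k in range(min_loop + 2, m + 1):
            total += counts[k - 2] * counts[m - k]
        counts[m] = total
    return counts


def unrank(m: int, rank: int, counts: list[int], min_loop: int = 1) -> str:
    """Decode rank (0 <= rank < counts[m]) into a dot-bracket string of length m."""
    if m == 0:
        return ""
    if rank < counts[m - 1]:
        return "." + unrank(m - 1, rank, counts, min_loop)
    rank -= counts[m - 1]
    for k in range(min_loop + 2, m + 1):
        block = counts[k - 2] * counts[m - k]
        if rank < block:
            inner_rank, outer_rank = divmod(rank, counts[m - k])
            inner = unrank(k - 2, inner_rank, counts, min_loop)
            outer = unrank(m - k, outer_rank, counts, min_loop)
            return "(" + inner + ")" + outer
        rank -= block
    raise ValueError("rank out of range")


def sample_structure(n: int, rng: random.Random, min_loop: int = 1) -> str:
    """Draw one structure of size n uniformly at random."""
    counts = structure_counts(n, min_loop)
    return unrank(n, rng.randrange(counts[n]), counts, min_loop)


print(sample_structure(20, random.Random(0)))
```

Because every rank maps to exactly one structure, a uniformly drawn rank yields a uniformly drawn structure. A weighted variant would replace the counts with accumulated weights. The recursion depth grows with n, so this simple version is only suitable for short lengths.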
With respect to RNA folding, we present a novel probabilistic sampling algorithm that generates statistically representative and reproducible samples of the entire ensemble of feasible structures for a particular input sequence. This method samples the possible foldings from a distribution implied by a suitable (traditional or length-dependent) grammar. Notably, we also propose several (new) ways of obtaining predictions from generated samples. Both variants have the same worst-case time and space complexities of O(n^3) and O(n^2) for sequence length n. Nevertheless, evaluations of our sampling methods show that they are capable of producing accurate (prediction) results.
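As a generic illustration of how a single prediction can be obtained from a sample set, the sketch below shows two common selection rules: the most frequently sampled structure, and the structure formed by all base pairs that occur in more than half of the samples. These are textbook examples and are not claimed to be the prediction principles proposed in the thesis; the function names and the dot-bracket input format are assumptions.

```python
from collections import Counter


def base_pairs(structure: str) -> set[tuple[int, int]]:
    """Extract (i, j) index pairs from a dot-bracket string."""
    stack: list[int] = []
    pairs: set[tuple[int, int]] = set()
    for pos, ch in enumerate(structure):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            pairs.add((stack.pop(), pos))
    return pairs


def most_frequent_structure(samples: list[str]) -> str:
    """Return the structure that was sampled most often."""
    return Counter(samples).most_common(1)[0][0]


def majority_pair_structure(samples: list[str]) -> str:
    """Keep every base pair that occurs in more than half of the samples.
    Such pairs never conflict: two conflicting pairs cannot share a sample,
    so they cannot both exceed a frequency of one half."""
    pair_counts: Counter = Counter()
    for s in samples:
        pair_counts.update(base_pairs(s))
    result = ["."] * len(samples[0])
    for (i, j), count in pair_counts.items():
        if 2 * count > len(samples):
            result[i], result[j] = "(", ")"
    return "".join(result)


samples = ["((...))", "((...))", "(.....)", ".(...).", "......."]
print(most_frequent_structure(samples))
print(majority_pair_structure(samples))
```

Both rules depend on the sample being large enough for the frequencies to be stable, which is why the sample size matters for prediction quality.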
In an attempt to resolve the long-standing problem of reducing the time complexity of RNA folding algorithms without sacrificing much of the accuracy of the results, we devised an innovative heuristic statistical sampling method that can be implemented to require only O(n^2) time for generating a fixed-size sample of candidate structures for a given sequence of length n. Since a reasonable prediction can still be obtained efficiently from the generated sample set, this approach reduces the worst-case time complexity by a linear factor compared to all existing precise methods. Notably, we also propose a novel (heuristic) sampling strategy, as opposed to the common one typically applied for statistical sampling, which may produce more accurate results for particular settings. A validation of our heuristic sampling approach by comparison with several leading RNA secondary structure prediction tools indicates that it is capable of producing competitive predictions, but may require large sample sizes.