## Dissertation

Typically, software engineers implement their software according to the design of the software structure. Relations between classes and interfaces, such as method-call and inheritance relations, are essential parts of a software structure. Accordingly, analyzing several types of relations benefits the static analysis of the software structure. The tasks of this analysis include, but are not limited to: understanding of (legacy) software, checking guidelines, improving product lines, finding structure, or re-engineering of existing software. Graphs with multiple edge types are a possible representation for these relations, treating the relations as edges while nodes represent the classes and interfaces of the software. This multi-edge-type graph can then be mapped to visualizations. However, the visualizations must handle the multiplicity of relation types as well as scalability, while still enabling software engineers to recognize visual patterns.
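As a concrete illustration of this representation, the following minimal sketch (Python with networkx, with invented class and relation names) builds such a multi-edge-type graph and extracts one relation type for a per-type view:

```python
import networkx as nx

# Hedged sketch: one possible encoding of a software structure as a graph with
# typed edges (classes/interfaces as nodes, relations as edges). The class and
# relation names are made up for illustration.
G = nx.MultiDiGraph()
G.add_node("OrderService", kind="class")
G.add_node("Repository", kind="interface")
G.add_node("SqlRepository", kind="class")

G.add_edge("OrderService", "Repository", relation="method-call")   # service calls the interface
G.add_edge("SqlRepository", "Repository", relation="inheritance")  # class implements the interface

# Selecting one edge type yields the per-relation view a visualization could show.
calls = [(u, v) for u, v, d in G.edges(data=True) if d["relation"] == "method-call"]
print(calls)
```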
To advance the usage of visualizations for analyzing the static structure of software systems, I tracked different development phases of the interactive multi-matrix visualization (IMMV), concluding with an extended user study. In the extended user study, visual structures found with IMMV and PNLV were determined and systematically classified into four categories: high degree, within-package edges, cross-package edges, and no edges. In addition to the structures found with these tools, other structures that are interesting for software engineers, such as cycles and hierarchical structures, require additional visualizations to display and investigate them. Therefore, an extended approach for graph layout was presented that improves the quality of the decomposition and the drawing of directed graphs according to their topology, based on rigorous definitions. The extension describes and analyzes the decomposition and drawing algorithms in detail, giving polynomial time and space complexity. Finally, I addressed visualizing graphs with multiple edge types using small multiples, where each tile is dedicated to one edge type and uses the topological graph layout to highlight non-trivial cycles, trees, and DAGs for showing and analyzing the static structure of software. I applied this approach to four software systems to demonstrate its usefulness.
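To make the topological decomposition more tangible, here is a small sketch (Python with networkx, invented edges) that separates non-trivial cycles from the remaining DAG via strongly connected components; the layout approach described above is more elaborate than this:

```python
import networkx as nx

# Hedged sketch: decompose a directed dependency graph into non-trivial cycles
# (strongly connected components with more than one node) and the acyclic rest.
G = nx.DiGraph([("A", "B"), ("B", "C"), ("C", "A"),   # a non-trivial cycle A -> B -> C -> A
                ("C", "D"), ("D", "E")])              # a DAG part hanging off the cycle

cycles = [c for c in nx.strongly_connected_components(G) if len(c) > 1]
dag = nx.condensation(G)              # contracting each SCC yields an acyclic graph
print("non-trivial cycles:", cycles)
print("condensation is a DAG:", nx.is_directed_acyclic_graph(dag))
```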

This thesis presents a novel, generic framework for information segmentation in document images.
A document image contains different types of information, for instance, text (machine printed/handwritten), graphics, signatures, and stamps.
It is necessary to segment the information in documents so that, in automatic document processing workflows, each type of segmented information is processed only when required.
The main contribution of this thesis is the conceptualization and implementation of an information segmentation framework that is based on part-based features.
The generic nature of the presented framework makes it applicable to a variety of documents (technical drawings, magazines, administrative, scientific, and academic documents) digitized using different methods (scanners, RGB cameras, and hyper-spectral imaging (HSI) devices).
A highlight of the presented framework is that it does not require large training sets; rather, a few training samples (for instance, four pages) lead to high performance, i.e., better than that of previously existing methods.
In addition, the presented framework is simple and can be adapted quickly to new problem domains.
This thesis is divided into three major parts on the basis of the document digitization method used (scanning, hyper-spectral imaging, and camera capture).
In the area of scanned document images, three specific contributions have been realized.
The first of them is in the domain of signature segmentation in administrative documents.
In some workflows, it is very important to check the document authenticity before processing the actual content.
This can be done based on the available seal of authenticity, e.g., signatures.
However, signature verification systems expect a pre-segmented signature image, while signatures are usually part of a document.
To use signature verification systems on document images, it is necessary to first segment signatures in documents.
This thesis shows that the presented framework can be used to segment signatures in administrative documents.
The system based on the presented framework is tested on a publicly available dataset, where it outperforms state-of-the-art methods and successfully segments all signatures, with less than half of the detected regions being false positives.
This shows that it can be applied for practical use.
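To give an impression of what a part-based approach can look like in practice, the following sketch uses local keypoint descriptors and a nearest-neighbor classifier; the file names, the choice of SIFT features, and the k-NN classifier are illustrative assumptions and not the exact feature set or model of the presented framework:

```python
import cv2
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Hedged sketch of a part-based pipeline: keypoint descriptors from a few
# labeled training crops decide, per keypoint, whether it belongs to a signature.
sift = cv2.SIFT_create()

def describe(image_path):
    img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    keypoints, descriptors = sift.detectAndCompute(img, None)
    return keypoints, descriptors

# Training: descriptors from a handful of crops with known labels (hypothetical files).
train_desc, train_labels = [], []
for path, label in [("signature_crop.png", 1), ("printed_text_crop.png", 0)]:
    _, desc = describe(path)
    if desc is not None:
        train_desc.append(desc)
        train_labels.extend([label] * len(desc))
clf = KNeighborsClassifier(n_neighbors=3).fit(np.vstack(train_desc), train_labels)

# Segmentation: keypoints classified as "signature" indicate candidate regions.
kps, desc = describe("document_page.png")
signature_points = [kp.pt for kp, lab in zip(kps, clf.predict(desc)) if lab == 1]
print(f"{len(signature_points)} keypoints voted as signature parts")
```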
The second contribution in the area of scanned document images is segmentation of stamps in administrative documents.
A stamp also serves as a seal of document authenticity.
However, the location of a stamp on a document can be more arbitrary than that of a signature, depending on the person sealing the document.
This thesis shows that a system based on our generic framework is able to extract stamps of any arbitrary shape and color.
The evaluation of the presented system on a publicly available dataset shows that it is also able to segment black stamps (that were not addressed in the past) with a recall and precision of 83% and 73%, respectively.
Furthermore, to segment colored stamps, this thesis presents a novel feature set based on the intensity gradient, which is able to extract unseen, colored, arbitrarily shaped, textual as well as graphical stamps, and outperforms the state-of-the-art methods.
The third contribution in the area of scanned document images is in the domain of information segmentation in technical drawings (architectural floorplans, maps, circuit diagrams, etc.), which usually contain a large amount of graphics and comparatively few textual components. Furthermore, in technical drawings, text often overlaps with graphics.
Thus, automatic analysis of technical drawings uses text/graphics segmentation as a pre-processing step.
This thesis presents a method based on our generic information segmentation framework that is able to detect text that touches graphical components in architectural floorplans and maps.
Evaluation of the method on a publicly available dataset of architectural floorplans shows that it is able to extract almost all touching text components with precision and recall of 71% and 95%, respectively.
This means that almost all of the touching text components are successfully extracted.
In the area of hyper-spectral document images, two contributions have been realized.
Unlike normal three-channel RGB images, hyper-spectral images usually have multiple channels ranging from the ultraviolet to the infrared region, including the visible region.
First, this thesis presents a novel automatic method for signature segmentation from hyper-spectral document images (240 spectral bands between 400 and 900 nm).
The presented method is based on a part-based key point detection technique, which does not use any structural information, but relies only on the spectral response of the document regardless of ink color and intensity.
The presented method is capable of segmenting (overlapping and non-overlapping) signatures from varying backgrounds such as printed text, tables, stamps, logos, etc.
Importantly, the presented method can extract signature pixels and not just the bounding boxes.
This is essential when signatures overlap with text and/or other objects in the image. Second, this thesis presents a new dataset comprising 300 documents scanned using a high-resolution hyper-spectral scanner. Evaluation of the presented signature segmentation method on this hyper-spectral dataset shows that it is able to extract signature pixels with a precision and recall of 100% and 79%, respectively.
Further contributions have been made in the area of camera-captured document images. A major problem in the development of Optical Character Recognition (OCR) systems for camera-captured document images is the lack of labeled camera-captured document image datasets. First, this thesis presents a novel, generic method for automatic ground truth generation/labeling of document images. The presented method builds large-scale (i.e., millions of images) datasets of labeled camera-captured/scanned documents without any human intervention. The method is generic and can be used for automatic ground truth generation of (scanned and/or camera-captured) documents in any language, e.g., English, Russian, Arabic, Urdu. The evaluation of the presented method on two different datasets, in English and Russian, shows that 99.98% of the images are correctly labeled in every case.
Another important contribution in the area of camera-captured document images is the compilation of a large dataset comprising 1 million word images (10 million character images), captured in a real camera-based acquisition environment, along with word- and character-level ground truth. The dataset can be used for training as well as testing of character recognition systems for camera-captured documents. Various benchmark tests are performed to analyze the behavior of different open-source OCR systems on camera-captured document images. Evaluation results show that the existing OCR systems, which already achieve very high accuracies on scanned documents, fail on camera-captured document images.
Using the presented camera-captured dataset, a novel character recognition system is developed which is based on a variant of recurrent neural networks, i.e., Long Short-Term Memory (LSTM) networks, and which outperforms all of the existing OCR engines on camera-captured document images with an accuracy of more than 95%.
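The following minimal sketch shows the general family of models referred to here, an LSTM sequence recognizer trained with CTC on word-image columns; the layer sizes, alphabet size, and use of CTC are assumptions for illustration rather than the exact architecture of the presented system:

```python
import torch
import torch.nn as nn

# Hedged sketch: a bidirectional LSTM reads a word image column by column and
# predicts a character class per column; CTC aligns predictions with the label.
class LSTMRecognizer(nn.Module):
    def __init__(self, input_height=32, hidden=128, num_classes=80):  # 79 symbols + CTC blank
        super().__init__()
        self.rnn = nn.LSTM(input_height, hidden, num_layers=2,
                           bidirectional=True, batch_first=True)
        self.fc = nn.Linear(2 * hidden, num_classes)

    def forward(self, x):              # x: (batch, width, height) columns of a word image
        out, _ = self.rnn(x)
        return self.fc(out)            # per-column class scores: (batch, width, classes)

model = LSTMRecognizer()
ctc = nn.CTCLoss(blank=0)

images = torch.randn(4, 100, 32)                          # 4 dummy word images, 100 columns each
logits = model(images).log_softmax(-1).permute(1, 0, 2)   # CTC expects (time, batch, classes)
targets = torch.randint(1, 80, (4, 10))                   # dummy label sequences
loss = ctc(logits, targets,
           torch.full((4,), 100, dtype=torch.long),
           torch.full((4,), 10, dtype=torch.long))
loss.backward()
print(float(loss))
```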
Finally, this thesis provides details on various tasks that have been performed in areas closely related to information segmentation. These include automatic analysis and sketch-based retrieval of architectural floor plan images, a novel scheme for online signature verification, and a part-based approach for signature verification. With these contributions, it has been shown that part-based methods can be successfully applied to document image analysis.

Towards A Non-tracking Web
(2016)

Today, many publishers (e.g., websites, mobile application developers) commonly use third-party analytics services and social widgets. Unfortunately, this scheme allows these third parties to track individual users across the web, creating privacy concerns and leading to reactions to prevent tracking via blocking, legislation and standards. While improving user privacy, these efforts do not consider the functionality third-party tracking enables publishers to use: to obtain aggregate statistics about their users and increase their exposure to other users via online social networks. Simply preventing third-party tracking without replacing the functionality it provides cannot be a viable solution; leaving publishers without essential services will hurt the sustainability of the entire ecosystem.
In this thesis, we present alternative approaches to bridge this gap between privacy for users and functionality for publishers and other entities. We first propose a general, interaction-based third-party cookie policy that prevents third-party tracking via cookies, yet enables social networking features for users when wanted, and does not interfere with non-tracking services for analytics and advertisements. We then present a system that enables publishers to obtain rich web analytics information (e.g., user demographics, other sites visited) without tracking the users across the web. While this system requires no new organizational players and is practical to deploy, it requires publishers to pre-define answer values for the queries, which may not be feasible for many analytics scenarios (e.g., search phrases used, free-text photo labels). Our second system complements the first by enabling publishers to discover previously unknown string values to be used as potential answers in a privacy-preserving fashion and with low computation overhead for clients as well as servers. These systems suggest that it is possible to provide non-tracking services with (at least) the same functionality as today’s tracking services.
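As a rough illustration of the idea behind an interaction-based third-party cookie policy, the following sketch (hypothetical host names, strongly simplified logic, not the exact policy proposed in the thesis) only grants cookie access to a third party if the user has previously interacted with it as a first party:

```python
# Hedged sketch: a third party only gets to use cookies on a site if the user
# has interacted with that third party as a first party before (e.g., clicked
# its widget or visited its site directly).
interacted_as_first_party = {"social-network.example"}   # filled by real user interactions

def allow_cookies(request_host: str, top_level_host: str) -> bool:
    if request_host == top_level_host:
        return True                                        # first-party requests keep cookies
    return request_host in interacted_as_first_party       # third parties need prior interaction

print(allow_cookies("analytics.example", "news.example"))        # False: silently embedded tracker
print(allow_cookies("social-network.example", "news.example"))   # True: user interacted before
```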

The Context and Its Importance: In safety and reliability analysis, the amount of information generated by Minimal Cut Set (MCS) analysis is large.
The top-level event (TLE), which is the root of the fault tree (FT), represents a hazardous state of the system being analyzed.
MCS analysis helps in analyzing the fault tree qualitatively, and quantitatively when accompanied by quantitative measures.
The information reveals bottlenecks in the fault tree design, leading to the identification of weaknesses in the system being examined.
Safety analysis (containing the MCS analysis) is especially important for critical systems, where harm can be done to the environment or to humans, causing injuries or even death, during system usage.
Minimal Cut Set (MCS) analysis is performed using computers and generates a large amount of information.
This phase is called MCS analysis I in this thesis.
The information is then analyzed by the analysts to determine possible issues and to improve the design of the system regarding its safety as early as possible.
This phase is called MCS analysis II in this thesis.
The goal of my thesis was to develop interactive visualizations that support MCS analysis II of a single fault tree (FT).
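For readers unfamiliar with MCS analysis I, the following toy sketch (invented fault tree and event names, naive algorithm; real tools use far more efficient techniques such as BDD-based ones) shows how minimal cut sets are derived from a small AND/OR fault tree:

```python
from itertools import product

# Hedged sketch of the "MCS analysis I" phase: enumerate cut sets of a small
# AND/OR fault tree and keep only the minimal ones.
def cut_sets(node):
    """Return the cut sets of a fault-tree node as a list of frozensets of basic events."""
    if isinstance(node, str):                      # a basic event
        return [frozenset([node])]
    gate, children = node
    child_sets = [cut_sets(c) for c in children]
    if gate == "OR":
        return [cs for sets in child_sets for cs in sets]
    if gate == "AND":
        return [frozenset().union(*combo) for combo in product(*child_sets)]
    raise ValueError(gate)

def minimize(sets):
    """Keep only minimal cut sets (no proper superset of another cut set)."""
    return [s for s in sets if not any(o < s for o in sets)]

tle = ("OR", [("AND", ["pump_fails", "valve_stuck"]), "power_loss",
              ("AND", ["power_loss", "pump_fails"])])   # last branch is non-minimal
print(minimize(cut_sets(tle)))   # e.g. [{pump_fails, valve_stuck}, {power_loss}]
```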
The Methodology: As safety visualization (in this thesis, Minimal Cut Set analysis II visualization) is an emerging field, and no complete checklist of Minimal Cut Set analysis II requirements and gaps was available from the perspective of visualization and interaction capabilities,
I conducted multiple studies using different methods with different data sources (i.e., triangulation of methods and data) to determine these requirements and gaps before developing and evaluating visualizations and interactions supporting Minimal Cut Set analysis II.
Thus, the following approach was taken in my thesis:
1- First, a triangulation of mixed methods and data sources was conducted.
2- Then, four novel interactive visualizations and one novel interaction widget were developed.
3- Finally, these interactive visualizations were evaluated both objectively and subjectively (compared to multiple safety tools),
from the point of view of users and developers of the safety tools that perform MCS analysis I, with respect to their degree of support for MCS analysis II, and from the point of view of non-domain people, using empirical strategies.
The Spiral tool supports analysts with different visions, i.e., full vision, and the color deficiencies protanopia, deuteranopia, and tritanopia.
It supports 100 out of 103 (97%) requirements obtained from the triangulation and fills 37 out of 39 (95%) gaps.
Its usability was rated high (better than their best currently used tools) by the users of the safety and reliability tools (RiskSpectrum, ESSaRel, FaultTree+, and a self-developed tool) and at least similar to the best currently used tools from the point of view of the CAFTA tool developers.
Its quality regarding its degree of supporting MCS analysis II was rated higher compared to the FaultTree+ tool.
The time spent for discovering the critical MCSs in a problem of 540 MCSs (with a worst case of all MCSs having equal order) was less than a minute while achieving 99.5% accuracy.
The scalability of the Spiral visualization was above 4000 MCSs for a comparison task.
The Dynamic Slider reduces the interaction movements by up to 85.71% compared to previous sliders and solves their overlapping-thumb issues.
The tool further provides the 3D model view of the system being analyzed, the ability to change the coloring of MCSs according to the color vision of the user, selection of a BE (i.e., multi-selection of MCSs) so that the BEs' NoO and quality can be observed, two interaction speeds for panning and zooming in the MCS, BE, and model views, and an MCS, a BE, and a physical tab for supporting analyses starting from the MCSs, the BEs, or the physical parts.
It combines MCS analysis results and the model of an embedded system, enabling the analysts to directly relate safety information with the corresponding parts of the system being analyzed, and provides an interactive mapping between the textual information of the BEs and MCSs and the parts related to the BEs.
Verifications and Assessments: I evaluated all visualizations and the interaction widget both objectively and subjectively, and finally evaluated the final Spiral visualization tool, also both objectively and subjectively, regarding its perceived quality and its degree of support for MCS analysis II.

We investigate the long-term behaviour of diffusions on the non-negative real numbers under killing at some random time. Killing can occur at zero as well as in the interior of the state space. The diffusion follows a stochastic differential equation driven by a Brownian motion. The diffusions we are working with will almost surely be killed. In large parts of this thesis we only assume the drift coefficient to be continuous. Further, we suppose that zero is regular and that infinity is natural. We condition the diffusion on survival up to time t and let t tend to infinity looking for a limiting behaviour.
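Schematically, and with notation assumed here rather than taken verbatim from the thesis, the setting can be summarized as
\[ dX_t = b(X_t)\,dt + \sigma(X_t)\,dW_t, \qquad X_t \in [0,\infty), \]
where the killing time \(\zeta\) is almost surely finite, and the object of interest is the limiting behaviour of the conditional laws
\[ \mathbb{P}_x\!\left(X_t \in \cdot \mid \zeta > t\right) \qquad \text{as } t \to \infty. \]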

When designing autonomous mobile robotic systems, there usually is a trade-off between the three opposing goals of safety, low cost, and performance.
Pursuing one of these design goals further usually leads to a regression in one or even both of the other goals.
If for example the performance of a mobile robot is increased by making use of higher vehicle speeds, then the safety of the system is usually decreased, as, under the same circumstances, faster robots are often also more dangerous robots.
This decrease of safety can be mitigated by installing better sensors on the robot, which ensure the safety of the system, even at high speeds.
However, this solution is accompanied by an increase of system cost.
In parallel to mobile robotics, there is a growing number of ambient and aware technology installations in today's environments, whether in private homes, offices, or factories.
Part of this technology are sensors that are suitable for assessing the state of an environment.
For example, motion detectors that are used to automate lighting can be used to detect the presence of people.
This work constitutes a meeting point between the two fields of robotics and aware environment research.
It shows how data from aware environments can be used to approach the above-mentioned goal of establishing safe, performant, and additionally low-cost robotic systems.
Sensor data from aware technology, which is often unreliable due to its low-cost nature, is fed to probabilistic methods for estimating the environment's state.
Together with models, these methods cope with the uncertainty and unreliability associated with the sensor data, gathered from an aware environment.
The estimated state includes positions of people in the environment and is used as an input to the local and global path planners of a mobile robot, enabling safe, cost-efficient and performant mobile robot navigation during local obstacle avoidance as well as on a global scale, when planning paths between different locations.
The probabilistic algorithms enable graceful degradation of the whole system.
Even if, in the extreme case, all aware technology fails, the robots will continue to operate, by sacrificing performance while maintaining safety.
All the presented methods of this work have been validated using simulation experiments as well as using experiments with real hardware.
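As a small illustration of the kind of probabilistic estimation involved, the following sketch (invented sensor error rates and transition model) fuses an unreliable motion detector into a belief about room occupancy with a discrete Bayes filter:

```python
# Hedged sketch: a two-state Bayes filter estimating whether a person occupies
# a room from a cheap, unreliable motion detector.
P_DETECT_GIVEN_PRESENT = 0.7      # cheap sensor: misses people fairly often
P_DETECT_GIVEN_EMPTY = 0.1        # and sometimes fires spuriously
P_STAY = 0.9                      # probability that occupancy persists between steps

def predict(belief_present):
    return P_STAY * belief_present + (1 - P_STAY) * (1 - belief_present)

def update(belief_present, detection):
    if detection:
        likelihood_present, likelihood_empty = P_DETECT_GIVEN_PRESENT, P_DETECT_GIVEN_EMPTY
    else:
        likelihood_present, likelihood_empty = 1 - P_DETECT_GIVEN_PRESENT, 1 - P_DETECT_GIVEN_EMPTY
    num = likelihood_present * belief_present
    return num / (num + likelihood_empty * (1 - belief_present))

belief = 0.5
for z in [True, True, False, True]:         # a stream of noisy motion-detector readings
    belief = update(predict(belief), z)
    print(f"P(person present) = {belief:.2f}")
```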

In this thesis we developed a desynchronization design flow with the goal of easing the development effort of distributed embedded systems. The starting point of this design flow is a network of synchronous components. By transforming this synchronous network into a dataflow process network (DPN), we ensure that important properties, which are difficult or theoretically impossible to analyze directly on DPNs, are preserved by construction. In particular, both deadlock-freeness and buffer boundedness can be preserved after desynchronization. For the correctness of desynchronization, we developed a criterion consisting of two properties: a global property that demands the correctness of the synchronous network, as well as a local property that requires the latency-insensitivity of each local synchronous component. As the global property is also a correctness requirement of synchronous systems in general, we take this property as an assumption of our desynchronization. However, the local property is in general not satisfied by all synchronous components and therefore needs to be verified before desynchronization. In this thesis we developed a novel technique for the verification of the local property that can be carried out very efficiently. Finally, we developed a model transformation method that translates a set of synchronous guarded actions, an intermediate format for synchronous systems, to an asynchronous actor description language (CAL). Our theorem ensures that, once the correctness verification has passed, the generated DPN of asynchronous processes (or actors) preserves the functional behavior of the original synchronous network. Moreover, by the correctness of the synchronous network, our theorem guarantees that the derived DPN is deadlock-free and can be implemented with only finitely bounded buffers.
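As a toy illustration of the target model, a dataflow process network with bounded FIFOs can be mimicked as follows (plain Python with threads and queues, not the CAL actors generated by the presented flow):

```python
import threading
import queue

# Hedged sketch: two asynchronous actors connected by a bounded buffer.
channel = queue.Queue(maxsize=4)             # bounded FIFO between the actors

def producer():
    for i in range(10):
        channel.put(i)                        # blocks when the buffer is full

def consumer():
    for _ in range(10):
        token = channel.get()                 # blocks when the buffer is empty
        print("consumed", token)

threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
for t in threads: t.start()
for t in threads: t.join()
```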

Stochastic Network Calculus (SNC) emerged from two branches in the late 90s: the theory of effective bandwidths and its predecessor, the Deterministic Network Calculus (DNC). As such, SNC's goal is to analyze queueing networks and support their design and control.
In contrast to queueing theory, which strives for similar goals, SNC uses inequalities to circumvent complex situations, such as stochastic dependencies or non-Poisson arrivals. Leaving the objective of computing exact distributions behind, SNC derives stochastic performance bounds. Such a bound would, for example, guarantee a system's maximal queue length that is exceeded only with a known, small probability.
This work includes several contributions towards the theory of SNC. They are organized as follows:
(1) The first chapters give a self-contained introduction to deterministic network calculus and its two branches of stochastic extensions. The focus lies on the notion of network operations. They allow one to derive performance bounds and to simplify complex scenarios.
(2) The author created the first open-source tool to automate the steps of calculating and optimizing MGF-based performance bounds. The tool automatically calculates end-to-end performance bounds via a symbolic approach. In a second step, this solution is numerically optimized. A modular design allows the user to implement their own functions, like traffic models or analysis methods. A numerical sketch of such a bound calculation follows this list of contributions.
(3) The problem of the initial modeling step is addressed with the development of a statistical network calculus. In many applications the properties of the included elements are mostly unknown. To that end, assumptions about the underlying processes are made and backed by measurement-based statistical methods. This thesis presents a way to integrate possible modeling errors into the bounds of SNC. As a byproduct, a dynamic view of the system is obtained that allows SNC to adapt to non-stationarities.
(4) Probabilistic bounds are fundamentally different from deterministic bounds: while deterministic bounds hold for all times of the analyzed system, this is not true for probabilistic bounds. Stochastic bounds, although still valid for every time t, only hold for one time instance at a time. Sample path bounds are only achieved by using Boole's inequality. This thesis presents an alternative method, by adapting the theory of extreme values.
(5) A long-standing problem of SNC is the construction of stochastic bounds for a window flow controller. The corresponding problem for DNC was solved over a decade ago, but it remained an open problem for SNC. This thesis presents two methods for a successful application of SNC to the window flow controller.
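To illustrate the kind of MGF-based bound automated by the tool in (2), the following sketch computes and numerically optimizes a textbook backlog bound for arrivals with an MGF envelope served at a constant rate; the arrival model and all parameter values are illustrative assumptions, not results from the thesis:

```python
import numpy as np

# Hedged sketch: a standard MGF-based backlog bound
#   P(backlog > b) <= exp(theta*sigma_a) / (1 - exp(theta*(rho_a - rho_s))) * exp(-theta*b),
# valid for theta > 0 with rho_a(theta) < rho_s.  The free parameter theta is
# optimized numerically by a simple grid search.
def backlog_bound(theta, b, sigma_a, rho_a, rho_s):
    if rho_a >= rho_s:
        return np.inf
    return np.exp(theta * sigma_a) / (1.0 - np.exp(theta * (rho_a - rho_s))) * np.exp(-theta * b)

# Example: i.i.d. exponential increments with rate lam have the MGF envelope
# rho_a(theta) = log(lam / (lam - theta)) / theta for theta < lam.
lam, rate, b = 2.0, 1.0, 20.0   # illustrative arrival parameter, server rate, backlog target

thetas = np.linspace(1e-3, lam - 1e-3, 2000)
bounds = [backlog_bound(t, b, 0.0, np.log(lam / (lam - t)) / t, rate) for t in thetas]
best = int(np.argmin(bounds))
print(f"optimal theta ~ {thetas[best]:.3f}, violation probability bound ~ {bounds[best]:.2e}")
```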

Gröbner bases are one of the most powerful tools in computer algebra and commutative algebra, with applications in algebraic geometry and singularity theory. From the theoretical point of view, these bases can be computed over any field using Buchberger's algorithm. In practice, however, the computational efficiency depends on the arithmetic of the coefficient field.
In this thesis, we consider Gröbner basis computations over two types of coefficient fields. First, we consider a simple extension \(K=\mathbb{Q}(\alpha)\) of \(\mathbb{Q}\), where \(\alpha\) is an algebraic number, and let \(f\in \mathbb{Q}[t]\) be the minimal polynomial of \(\alpha\). Second, let \(K'\) be the algebraic function field over \(\mathbb{Q}\) with transcendental parameters \(t_1,\ldots,t_m\), that is, \(K' = \mathbb{Q}(t_1,\ldots,t_m)\). In particular, we present efficient algorithms for computing Gröbner bases over \(K\) and \(K'\). Moreover, we present an efficient method for computing syzygy modules over \(K\).
To compute Gröbner bases over \(K\), starting from the ideas of Noro [35], we proceed by joining \(f\) to the ideal to be considered, adding \(t\) as an extra variable. But instead of avoiding superfluous S-pair reductions by inverting algebraic numbers, we achieve the same goal by applying modular methods as in [2,4,27], that is, by inferring information in characteristic zero from information in characteristic \(p > 0\). For suitable primes \(p\), the minimal polynomial \(f\) is reducible over \(\mathbb{F}_p\). This allows us to apply modular methods once again, on a second level, with respect to the modular factors of \(f\). The algorithm thus resembles a divide-and-conquer strategy and is in particular easily parallelizable. Moreover, using a similar approach, we present an algorithm for computing syzygy modules over \(K\).
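As a toy illustration of the basic (non-modular) idea of adjoining the minimal polynomial, the following sketch (Python with SymPy, invented generators) computes a Gröbner basis over \(\mathbb{Q}(\sqrt{2})\) by encoding \(\alpha\) as an extra variable \(t\) and adding \(f = t^2 - 2\) to the ideal; the modular, parallelizable algorithm presented in the thesis goes well beyond this baseline:

```python
from sympy import symbols, groebner, QQ

# Hedged sketch: work over K = Q(alpha) with alpha = sqrt(2) by replacing alpha
# with a new variable t and adjoining its minimal polynomial f to the ideal.
t, x, y = symbols('t x y')

f = t**2 - 2                      # minimal polynomial of alpha = sqrt(2)
ideal = [x**2 - t*y, y**2 - t]    # invented generators, with alpha encoded as t

G = groebner(ideal + [f], t, x, y, order='lex', domain=QQ)
for g in G.exprs:
    print(g)
```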
On the other hand, to compute Gröbner bases over \(K'\), our new algorithm first specializes the parameters \(t_1,\ldots,t_m\) to reduce the problem from \(K'[x_1,\ldots,x_n]\) to \(\mathbb{Q}[x_1,\ldots,x_n]\). The algorithm then computes a set of Gröbner bases of specialized ideals. From this set of Gröbner bases with coefficients in \(\mathbb{Q}\), it obtains a Gröbner basis of the input ideal using sparse multivariate rational interpolation.
At the current state, these algorithms are probabilistic in the sense that, as for other modular Gröbner basis computations, an effective final verification test is only known for homogeneous ideals or for local monomial orderings. The presented timings show that, for most examples, our algorithms, which have been implemented in SINGULAR [17], are considerably faster than other known methods.