Manipulating deformable linear objects - Vision-based recognition of contact state transitions -
(1999)
A new and systematic approach to machine vision-based robot manipulation of deformable (non-rigid) linear objects is introduced. This approach reduces the computational needs by using a simple state-oriented model of the objects. These states describe the relation of the object with respect to an obstacle and are derived from the object image and its features. Therefore, the object is segmented from a standard video frame using a fast segmentation algorithm. Several object features are presented which allow the state recognition of the object while being manipulated by the robot.
A new and systematic basic approach to force- and vision-based robot manipulation of deformable (non-rigid) linear objects is introduced. This approach reduces the computational needs by using a simple state-oriented model of the objects. These states describe the relation between the deformable object and rigid obstacles, and are derived from the object image and its features. We give an enumeration of possible contact states and discuss the main characteristics of each state. We investigate the performance of robust transitions between the contact states and derive criteria and conditions for each of the states and for two sensor systems, i.e. a vision sensor and a force/torque sensor. This results in a new, task-independent approach to the handling of deformable objects and in a sensor-based implementation of manipulation primitives for industrial robots. Thus, sensor processing is an appropriate solution for our problem. Finally, we apply the concept of contact states and state transitions to the description of a typical assembly task. Experimental results show the feasibility of our approach: a robot performs several contact state transitions which can be combined for solving a more complex task.
Self-adaptation allows software systems to autonomously adjust their behavior at run-time by handling all possible operating states that violate the requirements of the managed system. This requires an adaptation engine that receives adaptation requests while monitoring the managed system and responds with an automated and appropriate adaptation response. During the last decade, several engineering methods have been introduced to enable self-adaptation in software systems. However, these methods fail to address (1) the run-time uncertainty that hinders the adaptation process and (2) the performance impact resulting from the complexity and large size of the adaptation space. This paper presents CRATER, a framework that builds an external adaptation engine for self-adaptive software systems. The adaptation engine, which is built on case-based reasoning, handles both challenges together. The paper is supported by an experiment illustrating the benefits of this framework. The experimental results show the potential of CRATER in handling run-time uncertainty and in adaptation remembrance, which enhances performance for large adaptation spaces.
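To make the case-based reasoning core of such an adaptation engine concrete, the following minimal Python sketch retrieves the most similar stored adaptation case for a new monitoring request; the case attributes, weights, and similarity function are hypothetical illustrations and are not taken from the paper.

```python
# Minimal, hypothetical sketch of case-based retrieval for an adaptation engine.
# Cases pair an observed system state (the problem) with the adaptation applied (the solution).

def similarity(state_a, state_b, weights):
    """Weighted similarity over shared numeric attributes, assumed scaled to [0, 1]."""
    score, total = 0.0, 0.0
    for attr, weight in weights.items():
        diff = abs(state_a[attr] - state_b[attr])
        score += weight * (1.0 - min(diff, 1.0))
        total += weight
    return score / total

def retrieve(case_base, request, weights):
    """Return the stored case whose problem state is most similar to the request."""
    return max(case_base, key=lambda case: similarity(case["state"], request, weights))

case_base = [
    {"state": {"cpu_load": 0.9, "latency": 0.8}, "adaptation": "scale_out"},
    {"state": {"cpu_load": 0.2, "latency": 0.9}, "adaptation": "reroute_traffic"},
]
weights = {"cpu_load": 0.6, "latency": 0.4}

best = retrieve(case_base, {"cpu_load": 0.85, "latency": 0.7}, weights)
print(best["adaptation"])  # -> scale_out
```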
Postmortem Analysis of Decayed Online Social Communities: Cascade Pattern Analysis and Prediction
(2018)
Recently, many online social networks, such as MySpace, Orkut, and Friendster, have faced inactivity decay of their members, which contributed to the collapse of these networks. The reasons, mechanics, and prevention mechanisms of such inactivity decay are not fully understood. In this work, we analyze decayed and alive subwebsites from the Stack Exchange platform. The analysis mainly focuses on the inactivity cascades that occur among the members of these communities. We provide measures to understand the decay process and statistical analysis to extract the patterns that accompany the inactivity decay. Additionally, we predict cascade size and cascade virality using machine learning. The results of this work include a statistically significant difference in the decay patterns between the decayed and the alive subwebsites. These patterns are mainly cascade size, cascade virality, cascade duration, and cascade similarity. Additionally, the contributed prediction framework showed satisfactory prediction results compared to a baseline predictor. Supported by empirical evidence, the main findings of this work are (1) there are significantly different decay patterns in the alive and the decayed subwebsites of the Stack Exchange; (2) the cascade’s node degrees contribute more to the decay process than the cascade’s virality, which indicates that the expert members of the Stack Exchange subwebsites were mainly responsible for the activity or inactivity of the Stack Exchange subwebsites; (3) the Statistics subwebsite is going through decay dynamics that may lead to it becoming fully-decayed; (4) the decay process is not governed by only one network measure; it is better described using multiple measures; (5) decayed subwebsites were originally less resilient to inactivity decay, unlike the alive subwebsites; and (6) a network’s structure in the early stages of its evolution dictates the activity/inactivity characteristics of the network.
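The cascade-size prediction step described above can be approximated with an off-the-shelf regressor; the sketch below uses scikit-learn on synthetic, hypothetical cascade features and is only meant to illustrate the setup, not the paper's actual features or models.

```python
# Hypothetical sketch: predicting inactivity-cascade size from simple cascade features.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Synthetic features per cascade: [root degree, mean neighbor degree, early virality proxy]
X = rng.random((500, 3))
y = 5 + 40 * X[:, 0] + 10 * X[:, 2] + rng.normal(0, 2, 500)  # synthetic cascade sizes

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_train, y_train)
print("R^2 on held-out cascades:", round(model.score(X_test, y_test), 3))
```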
Learning From Networked-data: Methods and Models for Understanding Online Social Networks Dynamics
(2020)
Abstract
Nowadays, people and systems created by people are generating an unprecedented amount of
data. This data has brought us data-driven services with a variety of applications that affect
people’s behavior. One of these applications is the emergent online social networks as a method
for communicating with each other, getting and sharing information, looking for jobs, and many
other things. However, the tremendous growth of these online social networks has also led to many
new challenges that need to be addressed. In this context, the goal of this thesis is to better understand
the dynamics between the members of online social networks from two perspectives. The
first perspective is to better understand the process and the motives underlying link formation in
online social networks. We utilize external information to predict whether two members of an online
social network are friends or not. Also, we contribute a framework for assessing the strength of
friendship ties. The second perspective is to better understand the decay dynamics of online social
networks resulting from the inactivity of their members. Hence, we contribute a model, methods,
and frameworks for understanding the decay mechanics among the members, for predicting members’
inactivity, and for understanding and analyzing inactivity cascades occurring during the decay.
The results of this thesis are: (1) The link formation process is at least partly driven by interactions
among members that take place outside the social network itself; (2) external interactions might
help to reduce the noise in social networks and to rank the strength of the ties in these networks;
(3) inactivity dynamics can be modeled, predicted, and controlled using the models contributed in
this thesis, which are based on network measures. The contributions and the results of this thesis
can be beneficial in many respects. For example, improving the quality of a social network by introducing
new meaningful links and removing noisy ones helps to improve the quality of the services
provided by the social network, which, e.g., enables better friend recommendations and helps to
eliminate fake accounts. Moreover, understanding the decay processes involved in the interaction
among the members of a social network can help to prolong the engagement of these members. This
is useful in designing more resilient social networks and can assist in finding influential members
whose inactivity may trigger an inactivity cascade resulting in a potential decay of a network.
Building interoperation among separately developed software units requires checking their conceptual assumptions and constraints. However, eliciting such assumptions and constraints is time-consuming and challenging, as it requires analyzing each of the interoperating software units. To address this issue, we have proposed a new conceptual interoperability analysis approach which aims at decreasing the analysis cost and the conceptual mismatches between the interoperating software units. In this report we present the design of a planned controlled experiment for evaluating the effectiveness, efficiency, and acceptance of our proposed conceptual interoperability analysis approach. The design includes the study objectives, research questions, statistical hypotheses, and experimental design. It also provides the materials that will be used in the execution phase of the planned experiment.
Typically software engineers implement their software according to the design of the software
structure. Relations between classes and interfaces such as method-call relations and inheritance
relations are essential parts of a software structure. Accordingly, analyzing several types of
relations will benefit the static analysis process of the software structure. The tasks of this
analysis include, but are not limited to: understanding of (legacy) software, checking guidelines,
improving product lines, finding structure, or re-engineering of existing software. Graphs with
multi-type edges are a possible representation for these relations, treating the relations as edges
while nodes represent the classes and interfaces of the software. This multi-type-edge graph can
then be mapped to visualizations. However, the visualizations have to deal with the multiplicity
of relation types and with scalability, while at the same time enabling the software engineers to
recognize visual patterns.
To advance the usage of visualizations for analyzing the static structure of software systems,
I tracked different development phases of the interactive multi-matrix visualization (IMMV),
concluding with an extended user study. In this study, the visual structures found with IMMV
and PNLV were determined and systematically classified into four categories: high degree,
within-package edges, cross-package edges, and no edges. In addition to the structures found
with these tools, other structures that are interesting for software engineers, such as cycles and
hierarchical structures, need additional visualizations to display and investigate them. Therefore,
an extended approach for graph layout was presented that improves the quality of the
decomposition and the drawing of directed graphs according to their topology, based on rigorous
definitions. The extension involves describing and analyzing the algorithms for decomposition
and drawing in detail, giving their polynomial time and space complexity. Finally, I handled
visualizing graphs with multi-type edges using small multiples, where each tile is dedicated to
one edge type and utilizes the topological graph layout to highlight non-trivial cycles, trees, and
DAGs for showing and analyzing the static structure of software. I applied this approach to
four software systems to show its usefulness.
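The small-multiples idea, with one tile per edge type, can be sketched with networkx and matplotlib as follows; the class names and relation types are invented, and a spring layout stands in for the topological layout developed in the thesis.

```python
# Illustrative sketch: one tile per edge type for a multi-type software-structure graph.
import matplotlib.pyplot as plt
import networkx as nx

G = nx.MultiDiGraph()
G.add_edge("Client", "Service", etype="calls")
G.add_edge("Service", "Repository", etype="calls")
G.add_edge("Service", "IService", etype="inherits")
G.add_edge("Repository", "IRepository", etype="inherits")

edge_types = sorted({d["etype"] for _, _, d in G.edges(data=True)})
pos = nx.spring_layout(G, seed=42)           # shared node positions across all tiles

fig, axes = plt.subplots(1, len(edge_types), figsize=(5 * len(edge_types), 4))
for ax, etype in zip(axes, edge_types):
    H = nx.DiGraph()
    H.add_nodes_from(G.nodes)                # keep all classes visible in every tile
    H.add_edges_from((u, v) for u, v, d in G.edges(data=True) if d["etype"] == etype)
    nx.draw(H, pos=pos, ax=ax, with_labels=True, node_size=900, font_size=8)
    ax.set_title(etype)
plt.show()
```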
This paper deals with the handling of deformable linear objects (DLOs), such as hoses, wires, or leaf springs. It investigates usable features for the vision-based detection of a changing contact situation between a DLO and a rigid polyhedral obstacle and a classification of such contact state transitions. The result is a complete classification of contact state transitions and of the most significant features for each class. This knowledge enables reliable detection of changes in the DLO contact situation, facilitating implementation of sensor-based manipulation skills for all possible contact changes.
Dealing with information in modern times requires users to cope with hundreds of thousands of documents, such as articles, emails, Web pages, or news feeds.
Among all information sources, the World Wide Web presents information seekers with great challenges.
It offers more text in natural language than one is capable of reading.
The key idea of this research is to provide users with adaptable filtering techniques, supporting them in filtering out the specific information items they need.
Its realization focuses on developing an Information Extraction system
that adapts to a domain of concern by interpreting the formalized knowledge it contains.
Utilizing the Resource Description Framework (RDF), which is the Semantic Web's formal language for exchanging information,
allows extending information extractors to incorporate the given domain knowledge.
Because of this, formal information items from the RDF source can be recognized in the text.
The application of RDF allows a further investigation of operations on recognized information items, such as disambiguating them and rating their relevance.
Switching between different RDF sources allows changing the application scope of the Information Extraction system from one domain of concern to another.
An RDF-based Information Extraction system can be triggered to extract specific kinds of information entities by providing it with formal RDF queries in terms of the SPARQL query language.
Representing extracted information in RDF extends the coverage of the Semantic Web's information degree and provides a formal view on a text from the perspective of the RDF source.
In detail, this work presents the extension of existing Information Extraction approaches by incorporating the graph-based nature of RDF.
Hereby, the pre-processing of RDF sources allows extracting statistical information models dedicated to supporting specific information extractors.
These information extractors refine standard extraction tasks, such as the Named Entity Recognition, by using the information provided by the pre-processed models.
The post-processing of extracted information items enables representing these results in RDF format or lists, which can now be ranked or filtered by relevance.
Post-processing also comprises the enrichment of originating natural language text sources with extracted information items by using annotations in RDFa format.
The results of this research extend the state-of-the-art of the Semantic Web.
This work contributes approaches for computing customizable and adaptable RDF views on the natural language content of Web pages.
Finally, due to the formal nature of RDF, machines can interpret these views allowing developers to process the contained information in a variety of applications.
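As a minimal illustration of how an RDF source and a SPARQL query can steer extraction, the sketch below uses rdflib to load a tiny Turtle graph, query it, and spot the returned labels in a sentence; the vocabulary and the naive string matching are simplifying assumptions, not the system described above.

```python
# Simplified sketch: use an RDF source as a gazetteer for recognizing entities in text.
from rdflib import Graph

turtle = """
@prefix ex:   <http://example.org/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
ex:Kaiserslautern a ex:City ; rdfs:label "Kaiserslautern" .
ex:Berlin         a ex:City ; rdfs:label "Berlin" .
"""

g = Graph()
g.parse(data=turtle, format="turtle")

query = """
PREFIX ex:   <http://example.org/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?city ?label WHERE { ?city a ex:City ; rdfs:label ?label . }
"""

text = "The university is located in Kaiserslautern."
for row in g.query(query):
    if str(row.label) in text:                  # naive string matching for illustration
        print(f"Recognized {row.city} as '{row.label}'")
```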
Optical Character Recognition (OCR) systems play an important role in the digitization of data acquired as images from a variety of sources. Although the area is very well explored for Latin languages, some of the languages based on the Arabic cursive script are not yet explored. This is due to many factors, most importantly the unavailability of proper datasets and the complexities posed by cursive scripts. The Pashto language is one such language, which needs considerable exploration towards OCR. In order to develop such an OCR system, this thesis provides a pioneering study that explores deep learning for the Pashto language in the field of OCR.
The Pashto language is spoken by more than 50 million people across the world, and it is an active medium for both oral and written communication. It is associated with a rich literary heritage and a huge written collection. These written materials present contents of simple to complex nature and layouts from hand-scribed to printed text. The Pashto language presents mainly two types of complexities: (i) generic ones w.r.t. the cursive script, and (ii) specific ones w.r.t. the Pashto language. Generic complexities are cursiveness, context dependency, and breaker character anomalies, as well as space anomalies. Pashto-specific complexities are variations in shape for a single character and shape similarity for some of the additional Pashto characters. Existing research in the area of Arabic OCR did not lead to an end-to-end solution for the mentioned complexities and therefore could not be generalized to build a sophisticated OCR system for Pashto.
The contribution of this thesis spans three levels: the conceptual level, the data level, and the practical level. At the conceptual level, we have deeply explored the Pashto language and identified those characters which are responsible for the challenges mentioned above. At the data level, a comprehensive dataset is introduced containing real images of hand-scribed contents. The dataset is manually transcribed and covers the most frequent layout patterns associated with the Pashto language. The practical-level contribution provides a bridge, in the form of a complete Pashto OCR system, and connects the outcomes of the conceptual- and data-level contributions. The practical contribution comprises skew detection, text-line segmentation, feature extraction, classification, and post-processing. The OCR module is further strengthened by using the deep learning paradigm to recognize Pashto cursive script within the framework of Recurrent Neural Networks (RNN). The proposed Pashto text recognition is based on the Long Short-Term Memory (LSTM) network and realizes a character recognition rate of 90.78% on real hand-scribed Pashto images. All these contributions are integrated into an application to provide a flexible and generic end-to-end Pashto OCR system.
The impact of this thesis is not only specific to the Pashto language; it is also beneficial to other cursive languages such as Arabic, Urdu, and Persian. The main reason is the Pashto character set, which is a superset of the Arabic, Persian, and Urdu character sets. Therefore, the conceptual contribution of this thesis provides insight and proposes solutions to almost all generic complexities associated with the Arabic, Persian, and Urdu languages. For example, the anomaly caused by breaker characters, which is shared among 70 languages that mainly use the Arabic script, is analyzed in depth. This thesis presents a solution to this issue that is equally beneficial to almost all Arabic-like languages.
The scope of this thesis has two important aspects. The first is the social impact, i.e., how society may benefit from it. The main advantages are to bring historical and almost vanished documents to life and to ensure the opportunity to explore, analyze, translate, share, and understand the contents of the Pashto language globally. The second is the advancement and exploration of the technical aspects, because this thesis empirically explores the recognition challenges that are solely related to the Pashto language, both regarding the character set and the materials that present such complexities. Furthermore, the conceptual and practical background of this thesis regarding the complexities of the Pashto language is very beneficial for OCR of other cursive languages.
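In the same spirit as the recognizer described above, a bidirectional LSTM over image columns trained with CTC can be sketched in PyTorch as follows; the alphabet size, image dimensions, and data are placeholders rather than the thesis' actual architecture.

```python
# Rough sketch of a BLSTM + CTC text-line recognizer for cursive script (placeholder sizes).
import torch
import torch.nn as nn

N_CLASSES = 60          # hypothetical alphabet size; index 0 is reserved for the CTC blank

class LineRecognizer(nn.Module):
    def __init__(self, height=48, hidden=128):
        super().__init__()
        self.lstm = nn.LSTM(height, hidden, bidirectional=True, batch_first=True)
        self.fc = nn.Linear(2 * hidden, N_CLASSES + 1)

    def forward(self, x):                 # x: (batch, width, height) pixel columns
        out, _ = self.lstm(x)
        return self.fc(out).log_softmax(dim=-1)

model = LineRecognizer()
ctc = nn.CTCLoss(blank=0)

images = torch.randn(2, 120, 48)                     # two normalized text-line images
log_probs = model(images).permute(1, 0, 2)           # CTC expects (time, batch, classes)
targets = torch.randint(1, N_CLASSES + 1, (2, 20))   # dummy character-label sequences
loss = ctc(log_probs, targets,
           input_lengths=torch.full((2,), 120, dtype=torch.long),
           target_lengths=torch.full((2,), 20, dtype=torch.long))
loss.backward()
```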
This thesis presents a novel, generic framework for information segmentation in document images.
A document image contains different types of information, for instance, text (machine printed/handwritten), graphics, signatures, and stamps.
It is necessary to segment the information in documents so that such segmented information can be processed only when required in automatic document processing workflows.
The main contribution of this thesis is the conceptualization and implementation of an information segmentation framework that is based on part-based features.
The generic nature of the presented framework makes it applicable to a variety of documents (technical drawings, magazines, administrative, scientific, and academic documents) digitized using different methods (scanners, RGB cameras, and hyper-spectral imaging (HSI) devices).
A highlight of the presented framework is that it does not require large training sets, rather a few training samples (for instance, four pages) lead to high performance, i.e., better than previously existing methods.
In addition, the presented framework is simple and can be adapted quickly to new problem domains.
This thesis is divided into three major parts on the basis of the document digitization method used (scanning, hyper-spectral imaging, and camera capture).
In the area of scanned document images, three specific contributions have been realized.
The first of them is in the domain of signature segmentation in administrative documents.
In some workflows, it is very important to check the document authenticity before processing the actual content.
This can be done based on the available seal of authenticity, e.g., signatures.
However, signature verification systems expect a pre-segmented signature image, while signatures are usually part of a document.
To use signature verification systems on document images, it is necessary to first segment the signatures in the documents.
This thesis shows that the presented framework can be used to segment signatures in administrative documents.
The system based on the presented framework was tested on a publicly available dataset, where it outperforms the state-of-the-art methods and successfully segments all signatures, while less than half of the found signatures are false positives.
This shows that it can be applied for practical use.
The second contribution in the area of scanned document images is segmentation of stamps in administrative documents.
A stamp also serves as a seal for documents authenticity.
However, the location of a stamp on the document can be more arbitrary than that of a signature, depending on the person sealing the document.
This thesis shows that a system based on our generic framework is able to extract stamps of any arbitrary shape and color.
The evaluation of the presented system on a publicly available dataset shows that it is also able to segment black stamps (that were not addressed in the past) with a recall and precision of 83% and 73%, respectively.
Furthermore, to segment colored stamps, this thesis presents a novel feature set based on the intensity gradient, which is able to extract unseen, colored, arbitrarily shaped, textual as well as graphical stamps, and which outperforms the state-of-the-art methods.
The third contribution for scanned document images is in the domain of information segmentation in technical drawings (architectural floorplans, maps, circuit diagrams, etc.), which usually contain a large amount of graphics and comparatively few textual components. Furthermore, in technical drawings, text overlaps with graphics.
Thus, automatic analysis of technical drawings uses text/graphics segmentation as a pre-processing step.
This thesis presents a method based on our generic information segmentation framework that is able to detect text touching graphical components in architectural floorplans and maps.
Evaluation of the method on a publicly available dataset of architectural floorplans shows that it is able to extract almost all touching text components with precision and recall of 71% and 95%, respectively.
This means that almost all of the touching text components are successfully extracted.
In the area of hyper-spectral document images, two contributions have been realized.
Unlike normal three-channel RGB images, hyper-spectral images usually have multiple channels that range from the ultraviolet to the infrared region, including the visible region.
First, this thesis presents a novel automatic method for signature segmentation from hyper-spectral document images (240 spectral bands between 400 and 900 nm).
The presented method is based on a part-based key point detection technique, which does not use any structural information, but relies only on the spectral response of the document regardless of ink color and intensity.
The presented method is capable of segmenting (overlapping and non-overlapping) signatures from varying backgrounds such as printed text, tables, stamps, and logos.
Importantly, the presented method can extract signature pixels and not just the bounding boxes.
This is essential when signatures overlap with text and/or other objects in the image. Second, this thesis presents a new dataset comprising 300 documents scanned using a high-resolution hyper-spectral scanner. Evaluation of the presented signature segmentation method on this hyper-spectral dataset shows that it is able to extract signature pixels with a precision and recall of 100% and 79%, respectively.
Further contributions have been made in the area of camera-captured document images. A major problem in the development of Optical Character Recognition (OCR) systems for camera-captured document images is the lack of labeled camera-captured document image datasets. First, this thesis presents a novel, generic method for the automatic ground-truth generation/labeling of document images. The presented method builds large-scale (i.e., millions of images) datasets of labeled camera-captured/scanned documents without any human intervention. The method is generic and can be used for automatic ground-truth generation of (scanned and/or camera-captured) documents in any language, e.g., English, Russian, Arabic, or Urdu. The evaluation of the presented method on two different datasets in English and Russian shows that 99.98% of the images are correctly labeled in every case.
Another important contribution in the area of camera-captured document images is the compilation of a large dataset comprising 1 million word images (10 million character images), captured in a real camera-based acquisition environment, along with word- and character-level ground truth. The dataset can be used for training as well as testing of character recognition systems for camera-captured documents. Various benchmark tests are performed to analyze the behavior of different open-source OCR systems on camera-captured document images. Evaluation results show that existing OCR systems, which already achieve very high accuracy on scanned documents, fail on camera-captured document images.
Using the presented camera-captured dataset, a novel character recognition system is developed based on a variant of recurrent neural networks, i.e., Long Short-Term Memory (LSTM), which outperforms all existing OCR engines on camera-captured document images with an accuracy of more than 95%.
Finally, this thesis provides details on various tasks that have been performed in the area closely related to information segmentation. This includes automatic analysis and sketch based retrieval of architectural floor plan images, a novel scheme for online signature verification, and a part-based approach for signature verification. With these contributions, it has been shown that part-based methods can be successfully applied to document image analysis.
The advent of heterogeneous many-core systems has increased the spectrum
of achievable performance from multi-threaded programming. As the processor components become more distributed, the cost of synchronization and
communication needed to access the shared resources increases. Concurrent
linearizable access to shared objects can be prohibitively expensive in a high
contention workload. Though there are various mechanisms (e.g., lock-free
data structures) to circumvent the synchronization overhead of linearizable
objects, they still incur a performance overhead for many concurrent data types.
Moreover, many applications do not require linearizable objects and apply
ad-hoc techniques to eliminate synchronous atomic updates.
In this thesis, we propose the Global-Local View Model. This programming model exploits the heterogeneous access latencies in many-core systems.
In this model, each thread maintains different views on the shared object: a
thread-local view and a global view. As the thread-local view is not shared,
it can be updated without incurring synchronization costs. The local updates
become visible to other threads only after the thread-local view is merged
with the global view. This scheme improves the performance at the expense
of linearizability.
Besides the weak operations on the local view, the model also allows strong
operations on the global view. Combining operations on the global and the
local views, we can build data types with customizable consistency semantics
on the spectrum between sequential and purely mergeable data types. Thus
the model provides a framework that captures the semantics of Multi-View
Data Types. We discuss a formal operational semantics of the model. We
also introduce a method for verifying the correctness of the implementations of several multi-view data types.
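A toy version of the global-local idea, here for a simple counter, might look as follows in Python: weak updates go to a thread-local buffer and only become globally visible on an explicit merge. The thesis formalizes this for general multi-view data types; this sketch is a deliberately simplified, assumption-laden illustration.

```python
# Toy sketch of a global/local counter: cheap local updates, occasional synchronized merges.
import threading

class GlobalLocalCounter:
    def __init__(self):
        self._global = 0
        self._lock = threading.Lock()
        self._local = threading.local()

    def increment(self, n=1):                 # weak operation: no synchronization
        self._local.buffer = getattr(self._local, "buffer", 0) + n

    def merge(self):                          # make local updates globally visible
        pending = getattr(self._local, "buffer", 0)
        self._local.buffer = 0
        with self._lock:
            self._global += pending

    def read_global(self):                    # strong operation on the global view
        with self._lock:
            return self._global

counter = GlobalLocalCounter()

def worker():
    for _ in range(10_000):
        counter.increment()
    counter.merge()

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads: t.start()
for t in threads: t.join()
print(counter.read_global())  # 40000 once every thread has merged
```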
Frequently, applications require updating shared objects in an “all-or-nothing” manner. Therefore, the mechanisms to synchronize access to individual objects are not sufficient. Software Transactional Memory (STM)
is a mechanism that helps the programmer to correctly synchronize access to
multiple mutable shared data by serializing the transactional reads and writes.
But under high contention, serializable transactions incur frequent aborts and
limit parallelism, which can lead to severe performance degradation.
Mergeable Transactional Memory (MTM), proposed in this thesis, allows accessing multi-view data types within a transaction. Instead of aborting
and re-executing the transaction, MTM merges its changes using the data-type
specific merge semantics. Thus it provides a consistency semantics that allows
for more scalability even under contention. The evaluation of our prototype
implementation in Haskell shows that mergeable transactions outperform serializable transactions even under low contention while providing a structured
and type-safe interface.
Towards A Non-tracking Web
(2016)
Today, many publishers (e.g., websites, mobile application developers) commonly use third-party analytics services and social widgets. Unfortunately, this scheme allows these third parties to track individual users across the web, creating privacy concerns and leading to reactions to prevent tracking via blocking, legislation and standards. While improving user privacy, these efforts do not consider the functionality third-party tracking enables publishers to use: to obtain aggregate statistics about their users and increase their exposure to other users via online social networks. Simply preventing third-party tracking without replacing the functionality it provides cannot be a viable solution; leaving publishers without essential services will hurt the sustainability of the entire ecosystem.
In this thesis, we present alternative approaches to bridge this gap between privacy for users and functionality for publishers and other entities. We first propose a general and interaction-based third-party cookie policy that prevents third-party tracking via cookies, yet enables social networking features for users when wanted, and does not interfere with non-tracking services for analytics and advertisements. We then present a system that enables publishers to obtain rich web analytics information (e.g., user demographics, other sites visited) without tracking the users across the web. While this system requires no new organizational players and is practical to deploy, it requires publishers to pre-define answer values for the queries, which may not be feasible for many analytics scenarios (e.g., search phrases used, free-text photo labels). Our second system complements the first system by enabling publishers to discover previously unknown string values to be used as potential answers in a privacy-preserving fashion and with low computation overhead for clients as well as servers. These systems suggest that it is possible to provide non-tracking services with (at least) the same functionality as today’s tracking services.
The goal of this work is to develop statistical natural language models and processing techniques
based on Recurrent Neural Networks (RNN), especially the recently introduced Long Short-Term
Memory (LSTM). Due to their adapting and predicting abilities, these methods are more
robust and easier to train than traditional methods, i.e., word lists and rule-based models. They
improve the output of recognition systems and make it more accessible to users for browsing
and reading. These techniques are required especially for historical books, which might otherwise
take years of effort and huge costs to transcribe manually.
The contributions of this thesis are several new methods that offer high performance and accuracy.
First, an error model for improving recognition results is designed. As a second contribution,
a hyphenation model for difficult transcriptions for alignment purposes is suggested. Third, a
dehyphenation model is used to classify the hyphens in noisy transcriptions. The fourth contribution
is using LSTM networks for normalizing historical orthography. A size normalization
alignment is implemented to equalize the size of strings before the training phase. Using the
LSTM networks as a language model to improve the recognition results is the fifth contribution.
Finally, the sixth contribution is a combination of Weighted Finite-State Transducers (WFSTs)
and LSTM applied to multiple recognition systems. These contributions are elaborated in more
detail below.
Context-dependent confusion rules are a new technique to build an error model for Optical
Character Recognition (OCR) correction. The rules are extracted from the OCR confusions
which appear in the recognition outputs and are translated into edit operations, e.g., insertions,
deletions, and substitutions, using the Levenshtein edit distance algorithm. The edit operations
are extracted in the form of rules with respect to the context of the incorrect string to build an
error model using WFSTs. The context-dependent rules assist the language model in finding the
best candidate corrections. They avoid the calculations that occur in searching the language
model and they also enable the language model to correct incorrect words by using context-dependent
confusion rules. The context-dependent error model is applied to the University of
Washington (UWIII) dataset and to an Urdu dataset in Nastaleeq script. It improves the OCR
results from an error rate of 1.14% to an error rate of 0.68%. It performs better than the
state-of-the-art single rule-based approach, which returns an error rate of 1.0%.
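The extraction of context-dependent confusion rules from aligned OCR output and ground truth can be approximated with the Python standard library as shown below; difflib's opcodes stand in for the Levenshtein alignment, and the one-character left context is an illustrative choice rather than the thesis' exact rule format.

```python
# Approximate sketch: derive context-dependent confusion rules from OCR/ground-truth pairs.
from difflib import SequenceMatcher

def confusion_rules(ocr, truth, context=1):
    """Yield (left_context, ocr_fragment, correct_fragment) for each mismatch."""
    for tag, i1, i2, j1, j2 in SequenceMatcher(None, ocr, truth).get_opcodes():
        if tag == "equal":
            continue                                   # keep only insert/delete/replace
        left = ocr[max(0, i1 - context):i1]
        yield left, ocr[i1:i2], truth[j1:j2]

for rule in confusion_rules("tbe quick brovvn fox", "the quick brown fox"):
    print(rule)
# e.g. ('t', 'b', 'h') and ('o', 'vv', 'w'): substitutions with their left context
```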
This thesis describes a new, simple, fast, and accurate system for generating correspondences
between real scanned historical books and their transcriptions. The alignment poses many challenges:
first, the transcriptions might have modifications and layout variations compared to the
original book. Second, the recognition of historical books suffers from misrecognition and segmentation
errors, which make the alignment more difficult; in particular, line breaks and pages will
not have the same correspondences. Adapted WFSTs are designed to represent the transcription.
The WFSTs process Fraktur ligatures and adapt the transcription with a hyphenation
model that allows the alignment with respect to the varieties of the hyphenated words at the line
breaks of the OCR documents. In this work, several approaches are implemented to be used for
the alignment, such as text-segment, page-wise, and book-wise approaches. The approaches
are evaluated on a dataset of German calligraphic (Fraktur) script historical documents from the
“Wanderungen durch die Mark Brandenburg” volumes (1862-1889). The text-segmentation approach
returns an error rate of 2.33% without using a hyphenation model and an error rate of 2.0%
using a hyphenation model. Dehyphenation methods are presented to remove the hyphens from
the transcription. They provide the transcription in a readable and reflowable format to be used
for alignment purposes. We consider the task as a classification problem and classify the hyphens
from the given patterns as hyphens for line breaks, combined words, or noise. The methods are
applied to clean and noisy transcriptions in different languages. The Decision Tree classifier
returns the better performance, with an accuracy of 98% on the UWIII dataset and 97% on the
Fraktur script.
A new method for normalizing historical OCRed text using LSTM is implemented for different
texts, ranging from Early New High German of the 14th-16th centuries to modern forms of New
High German, and applied to the Luther bible. It performs better than rule-based word-list
approaches. It provides a transcription for various purposes such as part-of-speech tagging and
n-grams. Also, two new techniques are presented for aligning the OCR results and normalizing the
size by adding Character-Epsilons or Appending-Epsilons. They allow deletion and insertion at
the appropriate position in the string. In normalizing historical wordforms to modern wordforms,
the accuracy of the LSTM on seen data is around 94%, while the state-of-the-art combined
rule-based method returns 93%. On unseen data, the LSTM returns 88% and the combined
rule-based method returns 76%. In normalizing modern wordforms to historical wordforms, the
LSTM delivers the best performance and returns 93.4% on seen data and 89.17% on unknown
data.
In this thesis, a deep investigation has been carried out into constructing high-performance
language models for improving the recognition systems. A new method to construct a language
model using LSTM is designed to correct OCR results. The method is applied to the UWIII and
Urdu script datasets. The LSTM approach outperforms the state-of-the-art, especially for tokens
unseen during training. On the UWIII dataset, the LSTM reduces the OCR error rate from
1.14% to 0.48%. On the Nastaleeq-script Urdu dataset, the LSTM reduces the error rate
from 6.9% to 1.58%.
Finally, the integration of multiple recognition outputs can give higher performance than a
single recognition system. Therefore, a new method for combining the results of OCR systems is
explored using WFSTs and LSTM. It uses multiple OCR outputs and votes for the best output
to improve the OCR results. It performs better than the ISRI tool and the Pairwise of Multiple
Sequence approach, and it helps to improve the OCR results. The purpose is to provide correct
transcriptions so that they can be used for digitizing books, linguistic purposes, n-grams, and
part-of-speech tagging. The method consists of two alignment steps. First, two recognition systems
are aligned using WFSTs. The transducers are designed to be flexible and compatible with the
different symbols at line and page breaks in order to avoid segmentation and misrecognition errors.
The LSTM model is then used to vote for the best candidate correction of the two systems and to
improve the incorrect tokens produced during the first alignment. The approaches are evaluated
on OCR output of the English UWIII and the historical German Fraktur datasets, which is
obtained from state-of-the-art OCR systems. The experiments show that the error rate of
ISRI-Voting is 1.45%, the error rate of the Pairwise of Multiple Sequence approach is 1.32%, the
error rate of the Line-to-Page alignment is 1.26%, and the LSTM approach delivers the best
performance with an error rate of 0.40%.
The purpose of this thesis is to contribute methods providing correct transcriptions corresponding to the original book. This is considered to be the first step towards an accurate and
more effective use of the documents in digital libraries.
The Context and Its Importance: In safety and reliability analysis, the information generated by Minimal Cut Set (MCS) analysis is large.
The Top Level Event (TLE), which is the root of the fault tree (FT), represents a hazardous state of the system being analyzed.
MCS analysis helps in analyzing the fault tree (FT) qualitatively, and quantitatively when accompanied by quantitative measures.
The information shows the bottlenecks in the fault tree design, leading to the identification of weaknesses of the system being examined.
Safety analysis (containing the MCS analysis) is especially important for critical systems, where harm can be done to the environment or to humans, causing injuries or even death during system usage.
Minimal Cut Set (MCS) analysis is performed using computers and generates a lot of information.
This phase is called MCS analysis I in this thesis.
The information is then analyzed by the analysts to determine possible issues and to improve the design of the system regarding its safety as early as possible.
This phase is called MCS analysis II in this thesis.
The goal of my thesis was to develop interactive visualizations that support MCS analysis II of a single fault tree (FT).
The Methodology: As safety visualization (in this thesis, Minimal Cut Set analysis II visualization) is an emerging field, and as no complete checklist of Minimal Cut Set analysis II requirements and gaps was available from the perspective of visualization and interaction capabilities,
I conducted multiple studies using different methods with different data sources (i.e., triangulation of methods and data) to determine these requirements and gaps before developing and evaluating visualizations and interactions supporting Minimal Cut Set analysis II.
Thus, the following approach was taken in my thesis:
1- First, a triangulation of mixed methods and data sources was conducted.
2- Then, four novel interactive visualizations and one novel interaction widget were developed.
3- Finally, these interactive visualizations were evaluated both objectively and subjectively (compared to multiple safety tools),
from the point of view of users and developers of the safety tools that perform MCS analysis I, with respect to their degree of support for MCS analysis II, as well as from the point of view of non-domain people, using empirical strategies.
The Spiral tool supports analysts with different visions, i.e., full vision and the color deficiencies protanopia, deuteranopia, and tritanopia. It supports 100 out of 103 (97%) of the requirements obtained from the triangulation and fills 37 out of 39 (95%) of the gaps. Its usability was rated high (better than their best currently used tools) by the users of the safety and reliability tools RiskSpectrum, ESSaRel, FaultTree+, and a self-developed tool, and at least similar to the best currently used tools from the point of view of the CAFTA tool developers. Its quality regarding its degree of supporting MCS analysis II was higher compared to the FaultTree+ tool. The time spent for discovering the critical MCSs in a problem of 540 MCSs (with a worst case of all MCSs being of equal order) was less than a minute while achieving 99.5% accuracy. The scalability of the Spiral visualization was above 4000 MCSs for a comparison task. The Dynamic Slider reduces the interaction movements by up to 85.71% compared to previous sliders and solves their overlapping-thumb issue. In addition, the tool provides the 3D model view of the system being analyzed; allows changing the coloring of MCSs according to the color vision of the user; allows selecting a BE (i.e., multi-selection of MCSs), so that the BEs' NoO and quality can be observed; offers two interaction speeds for panning and zooming in the MCS, BE, and model views; and provides a MCS tab, a BE tab, and a physical tab for supporting analyses starting from the MCSs, the BEs, or the physical parts. It combines the MCS analysis results and the model of an embedded system, enabling the analysts to directly relate safety information to the corresponding parts of the system being analyzed, and it provides an interactive mapping between the textual information of the BEs and MCSs and the parts related to the BEs.
Verifications and Assessments: I evaluated all visualizations and the interaction widget both objectively and subjectively, and finally evaluated the complete Spiral visualization tool, again both objectively and subjectively, regarding its perceived quality and its degree of support for MCS analysis II.
A growing share of all software development project work is being done by geographically distributed teams. To satisfy shorter product design cycles, expert team members for a development project may need to be recruited globally. Yet to avoid extensive travelling or replacement costs, distributed project work is preferred. Current-generation software engineering tools and associated systems, processes, and methods were for the most part developed to be used within a single enterprise. Major innovations have lately been introduced to enable groupware applications on the Internet to support global collaboration. However, their deployment for distributed software projects requires further research. In particular, groupware methods must seamlessly be integrated with project and product management systems to make them attractive for industry. In this position paper we outline the major challenges concerning distributed (virtual) software projects. Based on our experiences with software process modeling and enactment environments, we then propose approaches to solve those challenges.
Information Visualization (InfoVis) and Human-Computer Interaction (HCI) have strong ties with each other. Visualization supports the human cognitive system by providing interactive and meaningful images of the underlying data. On the other hand, the HCI domain cares about the usability of the designed visualization from the human perspective. Thus, designing a visualization system requires considering many factors in order to achieve the desired functionality and system usability. Achieving these goals helps users understand the inner behavior of complex data sets in less time.
Graphs are widely used data structures to represent the relations between the data elements in complex applications. Due to the diversity of this data type, graphs have been applied in numerous information visualization applications (e.g., state transition diagrams, social networks, etc.). Therefore, many graph layout algorithms have been proposed in the literature to help in visualizing this rich data type. Some of these algorithms are used to visualize large graphs, while others handle medium-sized graphs. Regardless of the graph size, the resulting layout should be understandable from the users’ perspective and at the same time fulfill a list of aesthetic criteria to increase the readability of the representation. Respecting these two principles leads to graph visualizations that help users understand and explore the complex behavior of critical systems.
In this thesis, we utilize the graph visualization techniques in modeling the structural and behavioral aspects of embedded systems. Furthermore, we focus on evaluating the resulting representations from the users’ perspectives.
The core contribution of this thesis is a framework called ESSAVis (Embedded Systems Safety Aspect Visualizer). This framework not only visualizes some of the safety aspects (e.g., CFT models) of embedded systems, but also helps engineers and experts in analyzing safety-critical situations of the system. For this, the framework provides a 2D-plus-3D environment in which the 2D part represents the graph representation of the abstract data about the safety aspects of the underlying embedded system, while the 3D part represents the system's 3D model. Both views are smoothly integrated in a 3D world. In order to check the effectiveness and feasibility of the framework and its sub-components, we conducted many studies with real end users as well as with general users. Results of the main study, which targeted the overall ESSAVis framework, show a high acceptance ratio as well as higher accuracy and better performance when using the visual support provided by the framework.
The ESSAVis framework has been designed to be compatible with different 3D technologies. This enabled us to use the 3D stereoscopic depth of such technologies to encode node attributes in node-link diagrams. In this regard, we conducted an evaluation study to measure the usability of the stereoscopic depth cue approach, called the stereoscopic highlighting technique, against other selected visual cues (i.e., color, shape, and size). Based on the results, the thesis proposes the Reflection Layer extension to the stereoscopic highlighting technique, which was also evaluated from the users’ perspective. Additionally, we present a new technique, called ExpanD (Expand in Depth), that utilizes the depth cue to show the structural relations between different levels of detail in node-link diagrams. The results of this part open a promising research direction in which visualization designers can benefit from the richness of 3D technologies for visualizing abstract data in the information visualization domain.
Finally, this thesis proposes the application of the ESSAVis framework as a visual tool in the educational training of engineers for understanding complex concepts. In this regard, we conducted an evaluation study with computer engineering students in which we used the visual representations produced by ESSAVis to teach the principles of fault detection and failure scenarios in embedded systems. Our work opens directions for investigating many challenges in the design of visualization for educational purposes.
Evaluation is an important issue for every scientific field and a necessity for an emerging software technology like case-based reasoning. This paper is a supplement to the review of industrial case-based reasoning tools by K.-D. Althoff, E. Auriol, R. Barletta and M. Manago, which describes the most detailed evaluation of commercial case-based reasoning tools currently available. The author focuses on some important aspects that correspond to the evaluation of case-based reasoning systems and gives links to ongoing research.
We present two techniques for reasoning from cases to solve classification tasks: Induction and case-based reasoning. We contrast the two technologies (that are often confused) and show how they complement each other. Based on this, we describe how they are integrated in one single platform for reasoning from cases: The Inreca system.
Case-Based Reasoning for Decision Support and Diagnostic Problem Solving: The INRECA Approach
(1995)
INRECA offers tools and methods for developing, validating, and maintaining decision support systems. INRECA's basic technologies are inductive and case-based reasoning, namely KATE-INDUCTION (cf., e.g., Manago, 1989; Manago, 1990) and S3-CASE, a software product based on PATDEX (cf., e.g., Wess, 1991; Richter & Wess, 1991; Althoff & Wess, 1991). Induction extracts decision knowledge from case databases. It brings to light patterns among cases and helps monitor trends over time. Case-based reasoning relates the engineer's current problem to past experiences.
MOLTKE is a research project dealing with a complex technical application. After describing the domain of CNC machining centers and the applied KA methods, we summarize the concrete KA problems which we have to handle. Then we describe a KA mechanism which supports an engineer in developing a diagnosis system. In chapter 6 we introduce learning techniques operating on diagnostic cases and domain knowledge for improving the diagnostic procedure of MOLTKE. In the last section of this chapter we outline some essential aspects of organizational knowledge which is heavily applied by engineers for analysing such technical systems (Qualitative Engineering). Finally we give a short overview of the actual state of realization and our future plans.
In this paper we will present a design model (in the sense of KADS) for the domain of technical diagnosis. Based on this we will describe the fully implemented expert system shell MOLTKE 3.0, which integrates common knowledge acquisition methods with techniques developed in the fields of Model-Based Diagnosis and Machine Learning, especially Case-Based Reasoning.
We present an approach to systematically describing case-based reasoning systems by different kinds of criteria. One main requirement was the practical relevance of these criteria and their usability for real-life applications. We report on the results we achieved from a case study carried out in the INRECA1 Esprit project.
Case-based knowledge acquisition, learning and problem solving for diagnostic real world tasks
(1999)
Within this paper we focus both on the solution of real, complex problems using expert system technology and on the acquisition of the necessary knowledge from a case-based reasoning point of view. The development of systems which can be applied to real-world problems has to meet certain requirements. E.g., all available information sources have to be identified and utilized. Normally, this involves different types of knowledge, for which several knowledge representation schemes are needed, because no scheme is equally natural for all sources. When facing empirical knowledge, it is important to complement the use of manually compiled, statistically and otherwise induced knowledge by exploiting the intuitive understandability of case-based mechanisms. Thus, an integration of case-based and alternative knowledge acquisition and problem solving mechanisms is necessary. For this, the basis is to define the "role" which case-based inference can "play" within a knowledge acquisition workbench. We discuss a concrete case-based architecture, which has been applied to technical diagnosis problems, and its integration into a knowledge acquisition workbench which additionally includes compiled knowledge and explicit deep models.
Retrieval of cases is one important step within the case-based reasoning paradigm. We propose an improvement of this stage in the process model for finding the most similar cases with an average effort of O(log2 n), where n is the number of cases. The basic idea of the algorithm is to use the heterogeneity of the search space for a density-based structuring and to employ this precomputed structure, a k-d tree, for efficient case retrieval according to a given similarity measure sim. In addition to illustrating the basic idea, we present the experimental results of a comparison of four different k-d tree generating strategies and introduce the notion of virtual bounds as a new one that significantly reduces the retrieval effort from a more pragmatic perspective. The presented approach is fully implemented within the Patdex system, a case-based reasoning system for diagnostic applications in engineering domains.
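For the retrieval step, organizing cases in a k-d tree can be sketched with SciPy as follows; note that this uses a plain Euclidean k-d tree rather than the density-based construction and the similarity measure sim discussed above, so it only illustrates the general idea.

```python
# Illustration only: k-d-tree-based retrieval of the most similar cases (Euclidean distance).
import numpy as np
from scipy.spatial import KDTree

# Each case is a vector of normalized numeric symptom attributes; rows form the case base.
case_base = np.array([
    [0.9, 0.1, 0.0],
    [0.2, 0.8, 0.5],
    [0.1, 0.9, 0.4],
    [0.7, 0.3, 0.9],
])

tree = KDTree(case_base)                     # precomputed structure, built once
query = np.array([0.12, 0.88, 0.42])         # new problem description
distances, indices = tree.query(query, k=2)  # retrieve the two most similar cases
print(indices, distances)                    # e.g. cases 2 and 1 with their distances
```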
DeepKAF: A Knowledge Intensive Framework for Heterogeneous Case-Based Reasoning in Textual Domains
(2021)
Business-relevant domain knowledge can be found in plain text across customer support tickets,
employee message exchanges, and other business transactions.
Decoding text-based domain knowledge can be a very demanding task, since traditional
methods focus on a comprehensive representation of the business and its relevant paths. Such
a process can be highly complex, time-consuming, and high in maintenance effort, especially in
environments that change dynamically.
In this thesis, a novel approach is presented for developing hybrid case-based reasoning
(CBR) systems that bring together the benefits of deep learning approaches and the advantages of CBR.
The Deep Knowledge Acquisition Framework (DeepKAF) is a domain-independent
framework that features the usage of deep neural networks and big data technologies to decode
the domain knowledge with minimum involvement of the domain experts. While
this thesis focuses more on textual data because of the availability of the datasets, the
target CBR systems based on DeepKAF are able to deal with heterogeneous data, where a
case can be represented by different attribute types, and to automatically extract the necessary
domain knowledge while keeping the ability to provide an adequate level of explainability.
The main focus within this thesis is on automatic knowledge acquisition, building similarity
measures, and case retrieval.
Throughout the progress of this research, several sets of experiments have been conducted
and validated by domain experts. Past textual data produced over around 15 years has
been used for the needs of the conducted experiments. The text is a mixture
of English and German and was used to describe specific domain problems
with many abbreviations. Based on these data, the necessary knowledge repositories were built
and used afterwards to evaluate the suggested approach towards effective monitoring
and diagnosis of business workflows. Another public dataset, the CaseLaw
dataset, has been used to validate DeepKAF when dealing with longer texts and cases with more attributes.
The CaseLaw dataset represents around 22 million cases from different US states.
Further work motivated by this thesis could investigate how different deep learning models
can be used within the CBR paradigm to solve some of the chronic CBR challenges and be
of benefit to large-scale multi-dimensional enterprises.
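As a rough illustration of retrieving textual cases with learned representations (not DeepKAF's own models, attributes, or pipeline), the sketch below embeds past ticket texts with an off-the-shelf sentence encoder and ranks them by cosine similarity against a new query; the model name and example texts are purely illustrative.

from sentence_transformers import SentenceTransformer, util

# Hypothetical mini case base of past support tickets (illustrative texts).
case_texts = [
    "Login fails after applying patch 4.2 on server A",
    "Druckauftrag haengt in der Warteschlange",          # mixed-language tickets
    "VPN connection drops every 30 minutes",
]
encoder = SentenceTransformer("all-MiniLM-L6-v2")        # stand-in encoder, not DeepKAF's
case_vecs = encoder.encode(case_texts, convert_to_tensor=True)

def retrieve(query, k=2):
    """Rank stored cases by cosine similarity to the query text."""
    q_vec = encoder.encode(query, convert_to_tensor=True)
    scores = util.cos_sim(q_vec, case_vecs)[0]
    top = scores.topk(k)
    return [(case_texts[int(i)], float(s)) for s, i in zip(top.values, top.indices)]

print(retrieve("user cannot log in since the last update"))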
A prime motivation for using XML to directly represent pieces of information is the ability of supporting ad-hoc or 'schema-later' settings. In such scenarios, modeling data under loose data constraints is essential. Of course, the flexibility of XML comes at a price: the absence of a rigid, regular, and homogeneous structure makes many aspects of data management more challenging. Such malleable data formats can also lead to severe information quality problems, because the risk of storing inconsistent and incorrect data is greatly increased. A prominent example of such problems is the appearance of the so-called fuzzy duplicates, i.e., multiple and non-identical representations of a real-world entity. Similarity joins correlating XML document fragments that are similar can be used as core operators to support the identification of fuzzy duplicates. However, similarity assessment is especially difficult on XML datasets because structure, besides textual information, may exhibit variations in document fragments representing the same real-world entity. Moreover, similarity computation is substantially more expensive for tree-structured objects and, thus, is a serious performance concern. This thesis describes the design and implementation of an effective, flexible, and high-performance XML-based similarity join framework. As main contributions, we present novel structure-conscious similarity functions for XML trees - either considering XML structure in isolation or combined with textual information -, mechanisms to support the selection of relevant information from XML trees and organization of this information into a suitable format for similarity calculation, and efficient algorithms for large-scale identification of similar, set-represented objects. Finally, we validate the applicability of our techniques by integrating our framework into a native XML database management system; in this context we address several issues around the integration of similarity operations into traditional database architectures.
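A toy illustration of the set-represented similarity idea (not the structure-conscious similarity functions or optimized join algorithms of the thesis): each XML fragment is mapped to a set of (path, token) features, and fragment pairs are scored with Jaccard similarity; the example fragments and the threshold are made up.

import xml.etree.ElementTree as ET
from itertools import combinations

def features(xml_text):
    """Map an XML fragment to a set of (path, token) features (toy profile)."""
    feats = set()
    def walk(elem, path):
        p = f"{path}/{elem.tag}"
        for tok in (elem.text or "").split():
            feats.add((p, tok.lower()))
        for child in elem:
            walk(child, p)
    walk(ET.fromstring(xml_text), "")
    return feats

def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0

frags = [
    "<author><name>John Smith</name><affil>TU KL</affil></author>",
    "<author><name>J. Smith</name><affil>TU Kaiserslautern</affil></author>",
    "<author><name>Maria Lopez</name></author>",
]
sets = [features(f) for f in frags]
pairs = [(i, j, jaccard(sets[i], sets[j])) for i, j in combinations(range(len(frags)), 2)]
print([(i, j, round(s, 2)) for i, j, s in pairs if s >= 0.2])   # candidate fuzzy duplicates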
Since their invention in the 1980s, behaviour-based systems have become very popular among roboticists. Their component-based nature facilitates the distributed implementation of systems, fosters reuse, and allows for early testing and integration. However, the distributed approach necessitates the interconnection of many components into a network in order to realise complex functionalities. This network is crucial to the correct operation of the robotic system. There are few sound design techniques for behaviour networks, especially if the systems are to realise task sequences. Therefore, the quality of the resulting behaviour-based systems is often highly dependent on the experience of their developers.
This dissertation presents a novel integrated concept for the design and verification of behaviour-based systems that realise task sequences. Part of this concept is a technique for encoding task sequences in behaviour networks. Furthermore, the concept provides guidance to developers of such networks. Based on a thorough analysis of methods for defining sequences, Moore machines have been selected for representing complex tasks. With the help of the structured workflow proposed in this work and the developed accompanying tool support, Moore machines defining task sequences can be transferred automatically into corresponding behaviour networks, resulting in less work for the developer and a lower risk of failure.
Due to the common integration of automatically and manually created behaviour-based components, a formal analysis of the final behaviour network is reasonable. For this purpose, the dissertation at hand presents two verification techniques and justifies the selection of model checking. A novel concept for applying model checking to behaviour-based systems is proposed according to which behaviour networks are modelled as synchronised automata. Based on such automata, properties of behaviour networks that realise task sequences can be verified or falsified. Extensive graphical tool support has been developed in order to assist the developer during the verification process.
Several examples are provided in order to illustrate the soundness of the presented design and verification techniques. The applicability of the integrated overall concept to real-world tasks is demonstrated using the control system of an autonomous bucket excavator. It can be shown that the proposed design concept is suitable for developing sophisticated behaviour networks and that the presented verification technique allows for verifying real-world behaviour-based systems.
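As a small, hypothetical illustration of representing a task sequence as a Moore machine (the transfer into behaviour networks and the actual excavator control system are not shown), consider a simplified dig-transport-dump cycle in which the output depends only on the current state.

# Toy Moore machine: states, transitions on events, and state-dependent outputs.
transitions = {
    ("approach", "at_pile"): "dig",
    ("dig", "bucket_full"): "transport",
    ("transport", "at_truck"): "dump",
    ("dump", "bucket_empty"): "approach",
}
outputs = {"approach": "drive_to_pile", "dig": "fill_bucket",
           "transport": "drive_to_truck", "dump": "empty_bucket"}

state = "approach"
for event in ["at_pile", "bucket_full", "at_truck", "bucket_empty"]:
    print(state, "->", outputs[state])              # output of the current state
    state = transitions.get((state, event), state)  # stay put on unexpected events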
When designing autonomous mobile robotic systems, there usually is a trade-off between the three opposing goals of safety, low-cost and performance.
If one of these design goals is approached further, it usually leads to a recession of one or even both of the other goals.
If for example the performance of a mobile robot is increased by making use of higher vehicle speeds, then the safety of the system is usually decreased, as, under the same circumstances, faster robots are often also more dangerous robots.
This decrease of safety can be mitigated by installing better sensors on the robot, which ensure the safety of the system, even at high speeds.
However, this solution is accompanied by an increase of system cost.
In parallel to mobile robotics, a growing number of ambient and aware technology installations can be found in today's environments - whether in private homes, offices or factories.
Part of this technology are sensors that are suitable to assess the state of an environment.
For example, motion detectors that are used to automate lighting can be used to detect the presence of people.
This work constitutes a meeting point between the two fields of robotics and aware environment research.
It shows how data from aware environments can be used to approach the abovementioned goal of establishing safe, performant and additionally low-cost robotic systems.
Sensor data from aware technology, which is often unreliable due to its low-cost nature, is fed to probabilistic methods for estimating the environment's state.
Together with models, these methods cope with the uncertainty and unreliability associated with the sensor data, gathered from an aware environment.
The estimated state includes positions of people in the environment and is used as an input to the local and global path planners of a mobile robot, enabling safe, cost-efficient and performant mobile robot navigation during local obstacle avoidance as well as on a global scale, when planning paths between different locations.
The probabilistic algorithms enable graceful degradation of the whole system.
Even if, in the extreme case, all aware technology fails, the robots will continue to operate, by sacrificing performance while maintaining safety.
All the presented methods of this work have been validated using simulation experiments as well as using experiments with real hardware.
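One simple way to fuse such unreliable readings probabilistically is a binary Bayes filter over the presence of a person, sketched below with assumed sensor probabilities; the thesis's actual estimators and their coupling to local and global path planning are not reproduced here.

# Binary Bayes update of P(person present) from an unreliable motion detector.
P_DETECT_GIVEN_PRESENT = 0.7     # assumed sensor model for a low-cost detector
P_DETECT_GIVEN_ABSENT = 0.1      # assumed false-alarm rate

def update(prior, detected):
    """Return the posterior probability of presence after one sensor reading."""
    p_present = P_DETECT_GIVEN_PRESENT if detected else 1 - P_DETECT_GIVEN_PRESENT
    p_absent = P_DETECT_GIVEN_ABSENT if detected else 1 - P_DETECT_GIVEN_ABSENT
    return p_present * prior / (p_present * prior + p_absent * (1 - prior))

belief = 0.5
for reading in [True, True, False, True]:
    belief = update(belief, reading)
    print(round(belief, 3))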
PANDA is a run-time package based on a very small operating system kernel which supports distributed applications written in C++. It provides powerful abstractions such as very efficient user-level threads, a uniform global address space, object and thread mobility, garbage collection, and persistent objects. The paper discusses the design rationales underlying the PANDA system. The fundamental features of PANDA are surveyed, and their implementation in the current prototype environment is outlined.
Distributed systems are an alternative to shared-memory multiprocessors for the execution of parallel applications. PANDA is a runtime system which provides architectural support for efficient parallel and distributed programming. PANDA supplies means for fast user-level threads, and for a transparent and coordinated sharing of objects across a homogeneous network. The paper motivates the major architectural choices that guided our design. The problem of sharing data in a distributed environment is discussed, and the performance of appropriate mechanisms provided by the PANDA prototype implementation is assessed.
This paper introduces a new high-level programming language for a novel class of computational devices, namely data-procedural machines. These machines are up to several orders of magnitude more efficient than computers following the von Neumann paradigm, while being just as flexible and universal. Their efficiency and flexibility are achieved by using field-programmable logic as the essential technology platform. The paper briefly summarizes and illustrates the essential new features of this language by means of two example programs.
As the properties of components have gradually become clearer, attention has started to turn to the architectural issues which govern their interaction and composition. In this paper we identify some of the major architectural questions affecting component-based software development and describe the predominant architectural dimensions. Of these, the most interesting is the "architecture hierarchy", which we believe is needed to address the "interface vicissitude" problem that arises whenever interaction refinement is explicitly documented within a component-based system. We present a solution to this problem based on the concept of stratified architectures and object metamorphosis. Finally, we describe how these concepts may assist in increasing the tailorability of component-based frameworks.
This paper presents a new kind of abstraction, which has been developed for the purpose of proof planning. The basic idea of this paper is to abstract a given theorem and to find an abstract proof of it. Once an abstract proof has been found, this proof has to be refined to a real proof of the original theorem. We present a goal-oriented abstraction for the purpose of equality proof planning, which is parameterized by common parts of the left- and right-hand sides of the given equality. Therefore, this abstraction technique provides an abstract equality problem which is more adequate than those generated by the abstractions known so far. The presented abstraction also supports the heuristic search process based on the difference reduction paradigm. We give a formal definition of the abstract space including the objects and their manipulation. Furthermore, we prove some properties in order to allow an efficient implementation of the presented abstraction.
Simultaneous quantifier elimination in sequent calculus is an improvement over the well-known skolemization. It allows a lazy handling of instantiations as well as of the order of certain reductions. We prove the soundness of a sequent calculus which incorporates a rule for simultaneous quantifier elimination. The proof is performed by semantical arguments and provides some insights into the dependencies between various formulas in a sequent.
In this paper we show that distributing the theorem proving task to several experts is a promising idea. We describe the team work method which allows the experts to compete for a while and then to cooperate. In the cooperation phase the best results derived in the competition phase are collected and the less important results are forgotten. We describe some useful experts and explain in detail how they work together. We establish fairness criteria and thereby prove the distributed system to be both complete and correct. We have implemented our system and show by non-trivial examples that drastic speed-ups are possible for a cooperating team of experts compared to the time needed by the best expert in the team.
This report contains a collection of abstracts for talks given at the "Deduktionstreffen" held at Kaiserslautern, October 6 to 8, 1993. The topics of the talks range from theoretical aspects of term rewriting systems and higher-order resolution to descriptions of practical proof systems in various applications. They are grouped together according to the following classification: Distribution and Combination of Theorem Provers, Termination, Completion, Functional Programs, Inductive Theorem Proving, Automatic Theorem Proving, Proof Presentation. The Deduktionstreffen is the annual meeting of the Fachgruppe Deduktionssysteme in the Gesellschaft für Informatik (GI), the German association for computer science.
We study deterministic conditional rewrite systems, i.e. conditional rewrite systems where the extra variables are not totally free but 'input bounded'. If such a system R is quasi-reductive, then the induced rewrite relation is decidable and terminating. We develop a critical pair criterion to prove confluence if R is quasi-reductive and strongly deterministic. In this case we prove that R is logical, i.e. the induced convertibility relation coincides with the equality defined by R. We apply our results to prove Horn clause programs to be uniquely terminating. This research was supported by the Deutsche Forschungsgemeinschaft, SFB 314, Project D4.
We investigate one of the classical problems of the theory of term rewriting, namely termination. We present an ordering for comparing higher-order terms that can be utilized for testing termination and decreasingness of higher-order conditional term rewriting systems. The ordering relies on a first-order interpretation of higher-order terms and a suitable extension of the RPO.
In this paper we are interested in an algebraic specification language that (1) allows for sufficient expressiveness, (2) admits a well-defined semantics, and (3) allows for formal proofs. To that end we study clausal specifications over built-in algebras. To keep things simple, we consider only built-in algebras that are given as the initial model of a Horn clause specification. On top of this Horn clause specification new operators are (partially) defined by positive/negative conditional equations. In the first part of the paper we define three types of semantics for such a hierarchical specification: model-theoretic, operational, and rewrite-based semantics. We show that all these semantics coincide, provided some restrictions are met. We associate a distinguished algebra A_spec to a hierarchical specification spec. This algebra is initial in the class of all models of spec. In the second part of the paper we study how to prove a theorem (a clause) valid in the distinguished algebra A_spec. We first present an abstract framework for inductive theorem provers. Then we instantiate this framework for proving inductive validity. Finally we give some examples to show how concrete proofs are carried out. This report was supported by the Deutsche Forschungsgemeinschaft, SFB 314 (D4-Projekt).
This paper discusses the benefits and drawbacks of caching and replication strategies in the WWW with respect to the Internet infrastructure. Bandwidth consumption, latency, and overall error rates are considered to be most important from a network point of view. The dependencies of these values on input parameters like degree of replication, document popularity, actual cache hit rates, and error rates are highlighted. In order to determine the influence of different caching and replication strategies on the behavior of a single proxy server with respect to these values, trace-based simulations are used. Since the overall effects of such strategies can hardly be decided with this approach alone, a mathematical model has been developed to deal with their influence on the network as a whole. Together, this two-tiered approach permits us to propose quantitative assessments on the influence different caching and replication proposals (are going to) have on the Internet infrastructure.
As global networks are being used by more and more people, they are becoming increasingly interesting for commercial applications. The recent success and change in direction of the World-Wide Web is a clear indication of this. However, this success met a largely unprepared communications infrastructure. The Internet, as an originally non-profit network, offered neither the security nor the globally available accounting infrastructure by itself. These problems were addressed in the recent past, but in a seemingly ad-hoc manner. Several different accounting schemes sensible for only certain types of commercial transactions have been developed, which either seem to neglect the problems of scalability, or trade security for efficiency. Finally, some proposals aim at achieving near perfect security at the expense of efficiency, thus rendering those systems to be of no practical use. In contrast, this paper presents a suitably configurable scheme for accounting in a general, widely distributed client/server environment. When developing the protocol presented in this paper, special attention has been paid to make this approach work well in the future setting of high-bandwidth, high-latency internets. The developed protocol has been applied to a large-scale distributed application, a WWW-based software development environment.
In this paper, a framework for globally distributed software development and management environments, which we call Booster, is presented. Additionally, the first experiences with WebMake, an application developed to serve as an experimental platform for a software development environment based on the World Wide Web and the Booster framework, are introduced. Booster encompasses the basic building blocks and mechanisms necessary to support a truly cooperative distributed software development from the very beginning to the last steps in a software life cycle. It is thus a precursor of the Global Software Highway, in which providers and users can meet for the development, management, exchange and usage of all kinds of software.
In this paper, we compare the BERKOM globally accessible services project (GLASS) with the well-known World-Wide Web with respect to the ease of development, realization, and distribution of multimedia presentations. This comparison is based on the experiences we gained when implementing a gateway between GLASS and the World-Wide Web. Since both systems are shown to have obvious weaknesses, we conclude this paper with a presentation of a better way to multimedia document engineering and distribution. This concept is based on a well-accepted approach to function-shipping in the Internet: the Java language, permitting for example a smooth integration of GLASS MHEG objects and WWW HTML pages within one common environment.
An Adaptive and Dynamic Simulation Framework for Incremental, Collaborative Classifier Fusion
(2016)
To investigate incremental collaborative classifier fusion techniques, we have developed a comprehensive simulation framework. It is highly flexible and customizable, and can be adapted to various settings and scenarios. The toolbox is realized as an extension to the NetLogo multi-agent based simulation environment using its comprehensive Java API. The toolbox has been integrated in two different environments, one for demonstration purposes and another modeled on persons using realistic motion data from Zurich, who communicate in an ad hoc fashion using mobile devices.
In this thesis we developed a desynchronization design flow with the goal of easing the development effort for distributed embedded systems. The starting point of this design flow is a network of synchronous components. By transforming this synchronous network into a dataflow process network (DPN), we ensure that important properties that are difficult or theoretically impossible to analyze directly on DPNs are preserved by construction. In particular, both deadlock-freeness and buffer boundedness can be preserved after desynchronization. For the correctness of desynchronization, we developed a criterion consisting of two properties: a global property that demands the correctness of the synchronous network, as well as a local property that requires the latency-insensitivity of each local synchronous component. As the global property is also a correctness requirement of synchronous systems in general, we take this property as an assumption of our desynchronization. However, the local property is in general not satisfied by all synchronous components, and therefore needs to be verified before desynchronization. In this thesis we developed a novel technique for the verification of the local property that can be carried out very efficiently. Finally, we developed a model transformation method that translates a set of synchronous guarded actions – an intermediate format for synchronous systems – to an asynchronous actor description language (CAL). Our theorem ensures that, once the correctness verification has been passed, the generated DPN of asynchronous processes (or actors) preserves the functional behavior of the original synchronous network. Moreover, by the correctness of the synchronous network, our theorem guarantees that the derived DPN is deadlock-free and can be implemented with only finitely bounded buffers.
Today's pervasive availability of computing devices enabled with wireless communication and location or inertial sensing capabilities is unprecedented. The number of smartphones sold worldwide is still growing, and increasing numbers of sensor-enabled accessories are available which a user can wear in the shoe or at the wrist for fitness tracking, or just temporarily put on to measure vital signs. Despite this availability of computing and sensing hardware, the merits of applications seem rather limited with regard to the full potential of information inherent to such sensor deployments. Most applications build upon a vertical design which encloses a narrowly defined sensor setup and algorithms specifically tailored to suit the application's purpose. Successful technologies, however, such as the OSI model, which serves as the base for Internet communication, have used a horizontal design that allows high-level communication protocols to be run independently from the actual lower-level protocols and physical medium access. This thesis contributes to a more horizontal design of human activity recognition systems at two stages. First, it introduces an integrated toolchain to facilitate the entire process of building activity recognition systems and to foster sharing and reusing of individual components. At a second stage, a novel method for the automatic integration of new sensors to increase a system's performance is presented and discussed in detail.
The integrated toolchain is built around an efficient toolbox of parametrizable components for interfacing sensor hardware, synchronization and arrangement of data streams, filtering and extraction of features, classification of feature vectors, and interfacing output devices and applications. The toolbox emerged as an open-source project through several research projects and is actively used by research groups. Furthermore, the toolchain supports recording, monitoring, annotation, and sharing of large multi-modal data sets for activity recognition through a set of integrated software tools and a web-enabled database.
The method for automatically integrating a new sensor into an existing system is, at its core, a variation of well-established principles of semi-supervised learning: (1) unsupervised clustering to discover structure in data, (2) the assumption that cluster membership is correlated with class membership, and (3) obtaining a small number of labeled data points for each cluster, from which the cluster labels are inferred. In most semi-supervised approaches, however, the labels are the ground truth provided by the user. By contrast, the approach presented in this thesis uses a classifier trained on an N-dimensional feature space (the old classifier) to provide labels for a few points in an (N+1)-dimensional feature space, which are then used to generate a new, (N+1)-dimensional classifier. The different factors that make a distribution difficult to handle are discussed, a detailed description of heuristics designed to mitigate the influences of such factors is provided, and a detailed evaluation on a set of over 3000 sensor combinations from 3 multi-user experiments, which have been used by a variety of previous studies of different activity recognition methods, is presented.
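The following sketch illustrates the described principle on synthetic data: the old classifier trained on N features labels a few points per cluster in the (N+1)-dimensional space, the majority label becomes the cluster's pseudo-label, and a new classifier is trained on all N+1 features. The data, cluster count, and classifier choices are illustrative assumptions; the thesis's heuristics for difficult distributions are omitted.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import make_classification

# Synthetic (N+1)-dimensional data; the old sensor setup only provides the first N=3 features.
X_new, y_true = make_classification(n_samples=600, n_features=4, n_informative=4,
                                    n_redundant=0, n_classes=3, random_state=0)
X_old = X_new[:, :3]

old_clf = DecisionTreeClassifier(random_state=0).fit(X_old[:300], y_true[:300])

clusters = KMeans(n_clusters=6, n_init=10, random_state=0).fit_predict(X_new)
pseudo = np.empty(len(X_new), dtype=int)
rng = np.random.default_rng(0)
for c in np.unique(clusters):
    idx = np.where(clusters == c)[0]
    sample = rng.choice(idx, size=min(10, len(idx)), replace=False)
    votes = old_clf.predict(X_old[sample])        # old classifier labels a few points
    pseudo[idx] = np.bincount(votes).argmax()     # majority vote labels the whole cluster

new_clf = DecisionTreeClassifier(random_state=0).fit(X_new, pseudo)
print("agreement with ground truth:", (new_clf.predict(X_new) == y_true).mean())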
This thesis provides a fully automatic translation from synchronous programs to parallel software for different architectures, in particular, shared memory processing (SMP) and distributed memory systems. Thereby, we exploit characteristics of the synchronous model of computation (MoC) to reduce communication and to improve available parallelism and load-balancing by out-of-order (OOO) execution and data speculation.
Manual programming of parallel software requires the developers to partition a system into tasks and to add synchronization and communication. The model-based approach of development abstracts from details of the target architecture and allows decisions about the target architecture to be made as late as possible. The synchronous MoC supports this approach by abstracting from time and providing implicit parallelism and synchronization. Existing compilation techniques translate synchronous programs into synchronous guarded actions (SGAs), which are an intermediate format abstracting from semantic problems in synchronous languages. Compilers for SGAs analyze causality problems and ensure logical correctness and the absence of schizophrenia problems. Hence, SGAs are a simplified and general starting point and keep the synchronous MoC at the same time. The instantaneous feedback in the synchronous MoC makes the mapping of these systems to parallel software a non-trivial task. In contrast, other MoCs such as dataflow process networks (DPNs) directly match parallel architectures. We translate the SGAs into DPNs, which represent a commonly used model to create parallel software. DPNs have been proposed as a programming model for distributed parallel systems that have communication paths with unpredictable latencies. The purely data-driven execution of DPNs does not require a global coordination, and therefore DPNs can be easily mapped to parallel software for architectures with distributed memory. The generation of efficient parallel code from DPNs challenges compiler design with two issues: to perfectly utilize a parallel system, the communication and synchronization have to be kept low, and the utilization of the computational units has to be balanced. The variety of hardware architectures and dynamic execution techniques in the processing units of these systems makes a statically balanced distributed execution impossible.
The synchronous MoC is still reflected in our generated DPNs, which exhibit characteristics that allow optimizations concerning the previously mentioned issues. In particular, we apply a general communication reduction and OOO execution to achieve a dynamically balanced execution, which is inspired by hardware design.
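For illustration only, the purely data-driven execution model of a DPN can be sketched as actors connected by bounded FIFOs, as below; this is a hand-written toy in Python, not code generated by the described compiler.

import threading, queue

# Two dataflow actors connected by bounded FIFOs; each actor fires whenever
# input tokens are available (purely data-driven, no global coordination).
ch1, ch2 = queue.Queue(maxsize=8), queue.Queue(maxsize=8)

def producer():
    for i in range(5):
        ch1.put(i)                 # emit a token
    ch1.put(None)                  # end-of-stream marker

def doubler():
    while (tok := ch1.get()) is not None:
        ch2.put(2 * tok)           # consume one token, produce one token
    ch2.put(None)

def printer():
    while (tok := ch2.get()) is not None:
        print("received", tok)

threads = [threading.Thread(target=f) for f in (producer, doubler, printer)]
for t in threads: t.start()
for t in threads: t.join()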
The recognition of day-to-day activities is still a very challenging and important research topic. During recent years, a lot of research has gone into designing and realizing smart environments in different application areas such as health care, maintenance, sports or smart homes. As a result, a large number of sensor modalities were developed, different types of activity and context recognition services were implemented and the resulting systems were benchmarked using state-of-the-art evaluation techniques. However, so far hardly any of these approaches have found their way into the market and consequently into the homes of real end-users on a large scale. The reason for this is that almost all systems have one or more of the following characteristics in common: expensive high-end or prototype sensors are used which are not affordable or reliable enough for mainstream applications; many systems are deployed in highly instrumented environments or so-called "living labs", which are far from real-life scenarios and are often evaluated only in research labs; almost all systems are based on complex system configurations and/or extensive training data sets, which means that a large amount of data must be collected in order to install the system. Furthermore, many systems rely on a user- and/or environment-dependent training, which makes it even more difficult to install them on a large scale. Besides, a standardized integration procedure for the deployment of services in existing environments and smart homes has still not been defined. As a matter of fact, service providers use their own closed systems, which are not compatible with other systems, services or sensors. It is clear that these points make it nearly impossible to deploy activity recognition systems in a real daily-life environment, to make them affordable for real users and to deploy them in hundreds or thousands of different homes.
This thesis works towards the solution of the above mentioned problems. Activity and context recognition systems designed for large-scale deployment and real-life scenarios are introduced. The systems are based on low-cost, reliable sensors and can be set up, configured and trained with little effort, even by technical laymen. It is because of these characteristics that we call our approach "minimally invasive". As a consequence, large amounts of training data, as usually required by many state-of-the-art approaches, are not necessary. Furthermore, all systems were integrated unobtrusively in real-world or close-to-real-world environments and were evaluated under real-life as well as close-to-real-life conditions. The thesis addresses the following topics: First, a sub-room-level indoor positioning system is introduced. The system is based on low-cost ceiling cameras and a simple computer vision tracking approach. The problem of user identification is solved by correlating modes of locomotion patterns derived from the trajectories of unidentified objects and from on-body motion sensors. Afterwards, the issue of recognizing how and what mainstream household devices have been used for is considered. Based on a low-cost microphone, the water consumption of water taps can be approximated by analyzing plumbing noise. Besides that, operating modes of mainstream electronic devices were recognized by using rule-based classifiers, electric current features and power measurement sensors. As a next step, the difficulty of spotting subtle, barely distinguishable hand activities and the resulting object interactions within a data set containing a large amount of background data is addressed. The problem is solved by introducing an on-body core system which is configured by simple, one-time physical measurements and minimal data collections. The lack of large training sets is compensated by fusing the system with activity and context recognition systems that are able to reduce the search space observed. Amongst other systems, previously introduced approaches and ideas are revisited in this section. An in-depth evaluation shows the impact of each fusion procedure on the performance and run-time of the system. The approaches introduced are able to provide significantly better results than a state-of-the-art inertial system using large amounts of training data. The idea of using unobtrusive sensors has also been applied to the field of behavior analysis. Integrated smartphone sensors are used to detect behavioral changes of individuals due to medium-term stress periods. Behavioral parameters related to location traces, social interactions and phone usage were analyzed to detect significant behavioral changes of individuals during stress-free and stressful time periods. Finally, as a closing part of the thesis, a standardization approach related to the integration of ambient intelligence systems (as introduced in this thesis) in real-life and large-scale scenarios is shown.
Large displays are becoming more and more popular due to dropping prices. Their size and high resolution facilitate collaboration, and they are capable of displaying even large datasets in one view. This becomes even more interesting as the number of big data applications increases. The increased screen size and other properties of large displays pose new challenges to human-computer interaction with these screens. This includes issues such as limited scalability with the number of users and a diversity of input devices leading to increased learning efforts for users, and more.
Using smart phones and tablets as interaction devices for large displays can solve many of these issues. Since they are almost ubiquitous today, users can bring their own device. This approach scales well with the number of users. These mobile devices are easy and intuitive to use and allow for new interaction metaphors, as they feature a wide array of input and output capabilities, such as touch screens, cameras, accelerometers, microphones, speakers, Near-Field Communication, WiFi, etc.
This thesis presents a concept to solve the issues posed by large displays. We show proofs-of-concept with specialized approaches demonstrating the viability of the concept. A generalized, eyes-free technique using smart phones or tablets to interact with any kind of large display, regardless of hardware or software, then overcomes the limitations of the specialized approaches. This is implemented in a large display application that is designed to run under a multitude of environments, including both 2D and 3D display setups. A special visualization method is used to combine 2D and 3D data in a single visualization.
Additionally, the thesis presents several approaches to solve common issues with large display interaction, such as target sizes on large displays becoming too small, expensive tracking hardware, and eyes-free interaction through virtual buttons. These methods provide alternatives and context for the main contribution.
Stochastic Network Calculus (SNC) emerged from two branches in the late 90s: the theory of effective bandwidths and its predecessor, the Deterministic Network Calculus (DNC). As such, SNC's goal is to analyze queueing networks and support their design and control.
In contrast to queueing theory, which strives for similar goals, SNC uses inequalities to circumvent complex situations, such as stochastic dependencies or non-Poisson arrivals. Leaving the objective to compute exact distributions behind, SNC derives stochastic performance bounds. Such a bound would, for example, guarantee a system's maximal queue length that is violated only with a known, small probability.
This work includes several contributions towards the theory of SNC. They are sorted into five main contributions:
(1) The first chapters give a self-contained introduction to deterministic network calculus and its two branches of stochastic extensions. The focus lies on the notion of network operations. They allow to derive the performance bounds and to simplify complex scenarios.
(2) The author created the first open-source tool to automate the steps of calculating and optimizing MGF-based performance bounds. The tool automatically calculates end-to-end performance bounds via a symbolic approach. In a second step, this solution is numerically optimized. A modular design allows the user to implement their own functions, like traffic models or analysis methods.
(3) The problem of the initial modeling step is addressed with the development of a statistical network calculus. In many applications the properties of the included elements are mostly unknown. To that end, assumptions about the underlying processes are made and backed by measurement-based statistical methods. This thesis presents a way to integrate possible modeling errors into the bounds of SNC. As a byproduct, a dynamic view on the system is obtained that allows SNC to adapt to non-stationarities.
(4) Probabilistic bounds are fundamentally different from deterministic bounds: while deterministic bounds hold for all times of the analyzed system, this is not true for probabilistic bounds. Stochastic bounds, although still valid for every time t, only hold for one time instance at once. Sample path bounds are only achieved by using Boole's inequality. This thesis presents an alternative method, by adapting the theory of extreme values.
(5) A long-standing problem of SNC is the construction of stochastic bounds for a window flow controller. The corresponding problem for DNC had been solved over a decade ago, but remained an open problem for SNC. This thesis presents two methods for a successful application of SNC to the window flow controller.
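To make the notion of such a bound concrete, a textbook MGF-based backlog bound of stochastic network calculus (stated here in a generic form, not as one of the thesis's own results) reads, for independent arrival and service processes A(s,t) and S(s,t) and any free parameter \(\theta > 0\):

\[
\Pr\bigl(B(t) > b\bigr) \;\le\; e^{-\theta b}\,\sum_{s=0}^{t}
\mathbb{E}\!\left[e^{\theta A(s,t)}\right]\,\mathbb{E}\!\left[e^{-\theta S(s,t)}\right],
\]

where the backlog satisfies \(B(t) \le \max_{0 \le s \le t}\{A(s,t) - S(s,t)\}\) for a dynamic S-server. The right-hand side follows from Boole's inequality over s and Chernoff's inequality, and \(\theta\) is subsequently optimized numerically, as automated by the tool mentioned in contribution (2).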
We show how to prove ground confluence of term rewrite relations that are induced by reductive systems of clausal rewrite rules. According to a well-known critical pair criterion it suffices for such systems to prove ground joinability of a suitable set of 'critical clauses'. We outline how the latter can be done in a systematic fashion, using mathematical induction as a key concept of reasoning.
The provision of quality-of-service (QoS) on the network layer is a major challenge in communication networks. This applies particularly to mobile ad-hoc networks (MANETs) in the area of Ambient Intelligence (AmI), especially with the increasing use of delay and bandwidth sensitive applications. The focus of this survey lies on the classification and analysis of selected QoS routing protocols in the domain of mobile ad-hoc networks. Each protocol is briefly described and assessed, and the results are summarized in multiple tables.
We describe a technique to make application programs fault tolerant. This technique is based on the concept of checkpointing from an active program to one or more passive backup copies which serve as an abstraction of stable memory. If the primary copy fails, one of the backup copies takes over and resumes processing service requests. After each failure a new backup copy is created in order to restore the replication degree of the service. All mechanisms necessary to achieve and maintain fault tolerance can be added automatically to the code of a non-fault-tolerant server, thus making fault tolerance completely transparent for the application programmer.
We present a new software architecture in which all concepts necessary to achieve fault tolerance can be added to an application automatically without any source code changes. As a case study, we consider the problem of providing a reliable service despite node failures by executing a group of replicated servers. Replica creation and management as well as failure detection and recovery are performed automatically by a separate fault tolerance layer (ft-layer) which is inserted between the server application and the operating system kernel. The layer is invisible for the application since it provides the same functional interface as the operating system kernel, thus making the fault tolerance property of the service completely transparent for the application. A major advantage of our architecture is that the layer encapsulates both fault tolerance mechanisms and policies. This allows for maximum flexibility in the choice of appropriate methods for fault tolerance without any changes in the application code.
Optical Character Recognition (OCR) is one of the central problems in pattern recognition. Its applications have played a great role in the digitization of document images collected from heterogeneous sources. Many of the well-known scripts have OCR systems with sufficiently high performance that enables OCR applications in industrial/commercial settings. However, OCR systems yield very good results only on a narrow domain and very specific use cases. Thus, it is still a challenging task, and there are other languages with indigenous scripts, such as Amharic, for which no well-developed OCR systems exist.
As many as 100 million people speak Amharic, and it is an official working language of Ethiopia. The Amharic script contains about 317 different characters derived from 34 consonants with small changes. The change involves shortening or elongating one of the main legs or adding small diacritics to the right, left, top, or bottom of the consonant character. Such modifications lead the characters to have similar shapes and make the recognition task complex, but this is particularly interesting for character recognition research. So far, Amharic script recognition models have been developed based on classical machine learning techniques, and they are very limited in addressing the issues of Amharic OCR. The motivation of this thesis is, therefore, to explore and tailor contemporary deep learning techniques for the OCR of Amharic.
This thesis addresses the challenges in Amharic OCR through two main contributions. The first contribution is an algorithmic contribution in which we investigate deep learning approaches that suit the demands of Amharic OCR. The second is a technical contribution that comprises several works towards the OCR model development; it introduces a new Amharic database consisting of collections of images annotated at character and text-line level. It also presents a novel CNN-based framework designed by leveraging the grapheme structure of characters in the Fidel-Gebeta (the full set of Amharic characters arranged in a matrix structure), which achieves 94.97% overall character recognition accuracy.
In addition to character-level methods, text-line-level methods are also investigated and developed based on sequence-to-sequence learning. These models avoid several of the pre-processing stages used in prior works by eliminating the need to segment individual characters. In this design, we use a stack of CNNs before the Bi-LSTM layers and train end-to-end. This model outperforms the LSTM-CTC based network, on average, by a CER of 3.75% on the ADOCR test set. Motivated by the success of attention in addressing the problem of long sequences in Neural Machine Translation (NMT), we propose a novel attention-based methodology by blending the attention mechanism into the CTC objective function. This model performs far better than the existing techniques, with a CER of 1.04% and 0.93% on printed and synthetic text-line images respectively.
Finally, this thesis provides details on various tasks that have been performed for the development of Amharic OCR. As per our empirical analysis, the majority of the errors are due to poor annotation of the dataset. As future work, the methods proposed in this thesis should be further investigated and extended to deal with handwritten and historical Amharic documents.
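A minimal sketch of a CNN-plus-BiLSTM text-line recognizer trained with the CTC objective, of the general kind discussed above, is given below in PyTorch; the layer sizes, image size, and alphabet size (317 characters plus a CTC blank) are illustrative assumptions and do not reproduce the thesis's exact architectures or its attention-blended CTC.

import torch
import torch.nn as nn

class CnnBiLstmCtc(nn.Module):
    """Sketch of a CNN + BiLSTM text-line recognizer trained with CTC."""
    def __init__(self, num_classes=318, height=48):   # 317 characters + CTC blank (assumed)
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.rnn = nn.LSTM(64 * (height // 4), 256, num_layers=2,
                           bidirectional=True, batch_first=True)
        self.fc = nn.Linear(2 * 256, num_classes)

    def forward(self, x):              # x: (batch, 1, height, width)
        f = self.cnn(x)                # (batch, 64, height/4, width/4)
        b, c, h, w = f.shape
        f = f.permute(0, 3, 1, 2).reshape(b, w, c * h)   # one feature vector per image column
        out, _ = self.rnn(f)
        return self.fc(out).log_softmax(-1)              # (batch, width/4, num_classes)

# Illustrative CTC training step on random tensors.
model = CnnBiLstmCtc()
ctc = nn.CTCLoss(blank=0, zero_infinity=True)
images = torch.randn(4, 1, 48, 256)
logits = model(images)                                   # (4, 64, 318)
targets = torch.randint(1, 318, (4, 20))
input_lens = torch.full((4,), logits.size(1), dtype=torch.long)
target_lens = torch.full((4,), 20, dtype=torch.long)
loss = ctc(logits.permute(1, 0, 2), targets, input_lens, target_lens)
loss.backward()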
Coordinating distributed processes, especially engineering and software design processes, has been a research topic for some time now. Several approaches have been published that aim at coordinating large projects in general, and large software development processes in particular. However, most of these approaches focus on the technical part of the design process and omit management activities like planning and scheduling the project, or monitoring it during execution. In this paper, we focus on coordinating the management activities that accompany the technical software design process. We state the requirements for a Software Engineering Environment (SEE) accommodating management, and we describe a possible architecture for such an SEE.
This paper describes the architecture and concept of operation of a Framework for Adaptive Process Modeling and Execution (FAME). The research addresses the absence of robust methods for supporting the software process management life cycle. FAME employs a novel, model-based approach in providing automated support for different activities in the software development life cycle including project definition, process design, process analysis, process enactment, process execution status monitoring, and execution status-triggered process redesign. FAME applications extend beyond the software development domain to areas such as agile manufacturing, project management, logistics planning, and business process reengineering.
In this paper we present an extensional higher-order resolution calculus that is complete relative to Henkin model semantics. The treatment of the extensionality principles - necessary for the completeness result - by specialized (goal-directed) inference rules is of practical applicability, as an implementation of the calculus in the Leo system shows. Furthermore, we prove the long-standing conjecture that it is sufficient to restrict the order of primitive substitutions to the order of input formulae.
In this paper we provide a semantical meta-theory that will support the development of higher-order calculi for automated theorem proving like the corresponding methodology has in first-order logic. To reach this goal, we establish classes of models that adequately characterize the existing theorem-proving calculi, that is, so that they are sound and complete to these calculi, and a standard methodology of abstract consistency methods (by providing the necessary model existence theorems) needed to analyze completeness of machine-oriented calculi.
Fast Internet content delivery relies on two layers of caches on the request path. Firstly, content delivery networks (CDNs) seek to answer user requests before they traverse slow Internet paths. Secondly, aggregation caches in data centers seek to answer user requests before they traverse slow backend systems. The key challenge in managing these caches is the high variability of object sizes, request patterns, and retrieval latencies. Unfortunately, most existing literature focuses on caching with low (or no) variability in object sizes and ignores the intricacies of data center subsystems.
This thesis seeks to fill this gap with three contributions. First, we design a new caching system, called AdaptSize, that is robust under high object size variability. Second, we derive a method (called Flow-Offline Optimum or FOO) to predict the optimal cache hit ratio under variable object sizes. Third, we design a new caching system, called RobinHood, that exploits variances in retrieval latencies to deliver faster responses to user requests in data centers.
The techniques proposed in this thesis significantly improve the performance of CDN and data center caches. On two production traces from one of the world's largest CDNs, AdaptSize achieves 30-91% higher hit ratios than widely-used production systems, and 33-46% higher hit ratios than state-of-the-art research systems. Further, AdaptSize reduces the latency by more than 30% at the median, the 90th percentile and the 99th percentile.
We evaluate the accuracy of our FOO analysis technique on eight different production traces spanning four major Internet companies.
We find that FOO's error is at most 0.3%. Further, FOO reveals that the gap between online policies and OPT is much larger than previously thought: 27% on average, and up to 43% on web application traces.
We evaluate RobinHood with production traces from a major Internet company on a 50-server cluster. We find that RobinHood improves the 99-percentile latency by more than 50% over existing caching systems.
As load imbalances grow, RobinHood's latency improvement can be more than 2x. Further, we show that RobinHood is robust against server failures and adapts to automatic scaling of backend systems.
The results of this thesis demonstrate the power of guiding the design of practical caching policies using mathematical performance models and analysis. These models are general enough to find application in other areas of caching design and future challenges in Internet content delivery.
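As a toy illustration of making cache admission size-aware under high object size variability (a simple exponential admission probability combined with LRU eviction; this is neither AdaptSize's tuned policy nor RobinHood's latency-aware scheme), consider the following sketch.

import math, random
from collections import OrderedDict

class SizeAwareLRU:
    """Toy LRU cache that admits an object with probability exp(-size / c)."""
    def __init__(self, capacity, c):
        self.capacity, self.c = capacity, c
        self.cache = OrderedDict()            # object id -> size
        self.used = 0

    def request(self, obj, size):
        if obj in self.cache:                 # hit: refresh recency
            self.cache.move_to_end(obj)
            return True
        if random.random() < math.exp(-size / self.c) and size <= self.capacity:
            while self.used + size > self.capacity:
                _, old_size = self.cache.popitem(last=False)   # evict least recently used
                self.used -= old_size
            self.cache[obj] = size
            self.used += size
        return False                          # miss

cache = SizeAwareLRU(capacity=10_000, c=500)
hits = sum(cache.request(i % 50, size=(i % 7 + 1) * 100) for i in range(1_000))
print("hit ratio:", hits / 1_000)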
Complex problem solving can be substantially improved by the reuse of experience from previously solved problems. This requires that case libraries of successful problem solutions are transformed into problem solving knowledge with high utility, i.e. knowledge which causes high savings in search time, high application probability and low matching costs in a respective performance component. Planning can be improved by explanation-based learning (EBL) of abstract plans from detailed, successfully solved planning problems. Abstract plans, expressed in well-established terms of the domain, serve as useful problem decompositions which can drastically reduce the planning complexity. Abstractions which are valid for a class of planning cases rather than for a single case, ensure a successful application in a larger spectrum of new situations. The hierarchical organization of the learned shared abstractions causes low matching costs. The presented S-PABS procedure is an EBL-procedure in which abstraction, learning from multiple examples and hierarchical clustering are combined to automatically construct a hierarchy of shared abstract plans by analyzing concrete planning cases. A specific planning procedure has been designed to solve new planning problems guided by the knowledge learned by S-PABS. By allowing a feedback from this planning procedure to the learning component, the integrated system shows an increase in performance through past problem solving.
Although skeletal plan refinement is used in several planning systems, a procedure for the automatic acquisition of such high-level plans has not yet been developed. The proposed explanation-based knowledge acquisition procedure constructs a skeletal plan automatically from a sophisticated concrete planning case. The classification of that case into a well-described class of problems serves as an instrument for adjusting the applicability of the acquired skeletal plans to that class. The four phases of the proposed procedure are constituted as follows: In the first phase, the execution of the source plan is simulated, and explanations for the effects of the operators that occurred are constructed. In the second phase, the generalization of these explanations is performed with respect to a criterion of operationality which specifies the vocabulary for defining abstract operators for the skeletal plan. The third phase, a dependency analysis of the resulting operator effects, unveils the interactions of the concrete plan which are substantial for the specified class. In the fourth phase, the concept descriptions for the abstract operators of the skeletal plan are formed by collecting and normalizing the important constraints for each operation that were indicated by the dependencies. With this procedure, sophisticated planning solutions from human experts can be generalized into skeletal plans and consequently be reused by a planning system in novel situations.
Abstraction is one of the most promising approaches to improve the performance of problem solvers. Abstraction by dropping sentences of a domain description - as used in most hierarchical planners - is known to be very representation dependent. To overcome these drawbacks, we propose a more general view of abstraction involving the change of representation language. We have developed a new abstraction methodology and a related sound and complete learning algorithm that allows the complete change of representation language of planning cases from concrete to abstract.
Recently, the use of abstraction in case-based reasoning (CBR) has become more and more popular. The basic idea is to supply a CBR system with cases at many different levels of abstraction. When a new problem must be solved, one (or several) 'appropriate' concrete or abstract cases are retrieved from the case base and the solution that the case contains is reused to derive a solution for the current problem, e.g. by filling in the details that a retrieved case at some higher level of abstraction does not contain. A major problem that occurs when using this approach is that, for a given new problem, usually several cases, e.g. from different levels of abstraction, could be reused to solve the new problem. Choosing a wrong abstract case can slow down the problem solving process or even prevent the problem from being solved.
Hierarchical planning can be improved by explanation-based learning (EBL) of abstract plans from detailed, successfully solved planning problems. Abstract plans, expressed in well-established terms of the domain, serve as useful problem decompositions which can drastically reduce the planning complexity. The learned plan abstractions must be valid for a class of planning cases rather than for a single case, to ensure their successful application in a larger spectrum of new situations. A hierarchical organization of the newly learned knowledge must be achieved to overcome the utility problem in EBL. This paper presents a new formal model of shared plan abstraction and the closely related explanation-based procedure S-PABS. Unlike other approaches to plan abstraction, our model allows a totally different terminology to be introduced at the abstract level. Finally, an unsupervised incremental procedure for constructing a hierarchy of shared abstract plans is proposed, as a kind of concept formation over explanations.
This paper presents a brief overview of the INRECA-II methodology for building and maintaining CBR applications. It is based on the experience factory and the software process modeling approach from software engineering. CBR development and maintenance experience is documented using software process models and stored in a three-layered experience packet.
Taxonomies occur quite often when defining attribute types to be used in the case representation. The symbolic values at any node of the taxonomy tree are used as attribute values in a case or a query. A taxonomy type represents a relationship between the symbols through their position within the taxonomy tree, which expresses knowledge about the similarity between the symbols. This paper analyzes several situations in which taxonomies are used in different ways and proposes a systematic way of specifying local similarity measures for taxonomy types. The proposed similarity measures have a clear semantics and are easy to compute at runtime.
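One common way to specify such a local similarity measure (sketched here under assumed values, not necessarily the exact measures proposed in the paper) is to attach a similarity value to every inner node of the taxonomy and to define the similarity of two leaf values as the value stored at their deepest common predecessor.

# Toy taxonomy of drive types; each node carries an assumed similarity value
# (deeper, more specific nodes carry higher values). sim(a, b) is the value
# stored at the deepest common predecessor of a and b.
parent = {"stepper": "electric", "servo": "electric",
          "electric": "drive", "hydraulic": "drive", "drive": None}
node_sim = {"drive": 0.2, "electric": 0.7, "stepper": 1.0, "servo": 1.0, "hydraulic": 1.0}

def ancestors(n):
    path = []
    while n is not None:
        path.append(n)
        n = parent[n]
    return path

def sim(a, b):
    if a == b:
        return 1.0
    anc_a = ancestors(a)
    for n in ancestors(b):
        if n in anc_a:
            return node_sim[n]   # deepest common predecessor
    return 0.0

print(sim("stepper", "servo"))      # 0.7 (both electric drives)
print(sim("stepper", "hydraulic"))  # 0.2 (only related via the root)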
Collecting Experience on the Systematic Development of CBR Applications using the INRECA Methodology
(1999)
This paper presents an overview of the INRECA methodology for building and maintaining CBR applications. This methodology supports the collection and reuse of experience on the systematic development of CBR applications. It is based on the experience factory and the software process modeling approach from software engineering. CBR development experience is documented using software process models and stored in different levels of generality in a three-layered experience base. Up to now, experience from 9 industrial projects enacted by all INRECA II partners has been collected.
As the previous chapters of this book have shown, case-based reasoning is a technology that has been successfully applied to a large range of different tasks. Through all the different CBR projects, both basic research projects and industrial development projects, a great deal of knowledge and experience about how to build a CBR application has been collected. Today, there is an increasing number of successful companies developing industrial CBR applications. In the early days, these companies could develop their pioneering CBR applications in an ad-hoc manner: the highly skilled CBR expert of the company was able to manage these projects and to provide the developers with the required expertise.
Planning means constructing a course of actions to achieve a specified set of goals when starting from an initial situation. For example, determining a sequence of actions (a plan) for transporting goods from an initial location to some destination is a typical planning problem in the transportation domain. Many planning problems are of practical interest.
Case-based problem solving can be significantly improved by applying domain knowledge (as opposed to problem solving knowledge), which can be acquired with reasonable effort, to derive explanations of the correctness of a case. Such explanations, constructed on several levels of abstraction, can be employed as the basis for similarity assessment as well as for adaptation by solution refinement. The general approach of explanation-based similarity can be applied to different real-world problem solving tasks such as diagnosis and planning in technical areas. This paper presents the general idea as well as two specific, completely implemented realizations for a diagnosis and a planning task.
Object-oriented case representations require approaches for similarity assessment that allow the comparison of two differently structured objects, in particular objects belonging to different object classes. Currently, such similarity measures are developed more or less in an ad-hoc fashion. It is mostly unclear how the structure of an object-oriented case model, e.g. the class hierarchy, influences similarity assessment. Intuitively, it is obvious that the class hierarchy contains knowledge about the similarity of the objects. However, how this knowledge relates to the knowledge that could be represented in similarity measures is not obvious at all. This paper analyzes several situations in which class hierarchies are used in different ways for case modeling and proposes a systematic way of specifying similarity measures for comparing arbitrary objects from the hierarchy. The proposed similarity measures have a clear semantics and are inexpensive to compute at run-time.
Abstraction is one of the most promising approaches to improve the performance of problem solvers. In several domains abstraction by dropping sentences of a domain description - as used in most hierarchical planners - has proven useful. In this paper we present examples which illustrate significant drawbacks of abstraction by dropping sentences. To overcome these drawbacks, we propose a more general view of abstraction involving the change of representation language. We have developed a new abstraction methodology and a related sound and complete learning algorithm that allows the complete change of representation language of planning cases from concrete to abstract. However, to achieve a powerful change of the representation language, the abstract language itself as well as rules which describe admissible ways of abstracting states must be provided in the domain model. This new abstraction approach is the core of PARIS (Plan Abstraction and Refinement in an Integrated System), a system in which abstract planning cases are automatically learned from given concrete cases. An empirical study in the domain of process planning in mechanical engineering shows significant advantages of the proposed reasoning from abstract cases over classical hierarchical planning.
In this paper, we propose the PARIS approach for improving complex problem solving by learning from previous cases. In this approach, abstract planning cases are learned from given concrete cases. For this purpose, we have developed a new abstraction methodology that allows the representation language of a planning case to be completely changed when the concrete and abstract languages are given by the user. Furthermore, we present a learning algorithm which is correct and complete with respect to the introduced model. An empirical study in the domain of process planning in mechanical engineering shows significant improvements in planning efficiency through learning abstract cases, while an explanation-based learning method causes only a very slight improvement.
Although several systematic analyses of existing approaches to adaptation have been published recently, a general formal adaptation framework is still missing. This paper presents a step toward developing such a formal model of transformational adaptation. The model is based on the notion of the quality of a solution to a problem, where quality is meant in a general sense and can also denote some kind of appropriateness, utility, or degree of correctness. Adaptation knowledge is then defined in terms of functions transforming one case into a successor case. The notion of quality provides us with a semantics for adaptation knowledge and allows us to define terms like soundness, correctness, and completeness. In this view, adaptation (and even the whole CBR process) appears to be a special instance of an optimization problem.
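Viewing adaptation as optimization can be sketched as follows: adaptation knowledge is a set of functions mapping a case to a successor case, and a quality function steers a simple greedy search. The operators, the quality function, and the toy case below are invented for illustration; this is not the formal model itself:

    # Minimal sketch of transformational adaptation viewed as optimization:
    # adaptation operators are case-to-case functions, a quality function
    # steers a greedy search. All concrete contents are made-up placeholders.

    def greedy_adapt(case, operators, quality, max_steps=20):
        """Apply the operator that most improves quality until no improvement."""
        current = case
        for _ in range(max_steps):
            candidates = [op(current) for op in operators]
            best = max(candidates, key=quality)
            if quality(best) <= quality(current):
                break                     # local optimum reached
            current = best
        return current

    # toy example: adapt a price/quantity offer toward a target budget
    target = 100.0
    quality = lambda c: -abs(c["price"] * c["quantity"] - target)
    operators = [
        lambda c: {**c, "quantity": c["quantity"] + 1},
        lambda c: {**c, "quantity": max(1, c["quantity"] - 1)},
    ]
    print(greedy_adapt({"price": 24.0, "quantity": 1}, operators, quality))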
This paper addresses the role of abstraction in case-based reasoning. We develop a general framework for reusing cases at several levels of abstraction, which is particularly suited for describing and analyzing existing and designing new approaches of this kind. We show that in synthetic tasks (e.g. configuration, design, and planning), abstraction can be successfully used to improve the efficiency of similarity assessment, retrieval, and adaptation. Furthermore, a case-based planning system, called Paris, is described and analyzed in detail using this framework. An empirical study done with Paris demonstrates significant advantages concerning retrieval and adaptation efficiency as well as flexibility of adaptation. Finally, we show how other approaches from the literature can be classified according to the developed framework.
This paper presents a new algorithm, called KNNcost, for learning feature weights for CBR systems used for classification. Unlike previously known algorithms, KNNcost considers the profits of correct and the costs of wrong decisions. The need for this algorithm is motivated by two real-world applications, where the cost and profits of decisions play a major role. We introduce a representation of accuracy, cost, and profits of decisions and define the decision cost of a classification system. To compare accuracy optimization with cost optimization, we tested KNNacc against KNNcost. The first optimizes classification accuracy with a conjugate gradient algorithm. The second optimizes the decision cost of the CBR system, taking the cost and profits of the classifications into account. We present experiments with these two algorithms in a real application to demonstrate the usefulness of our approach.
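The notion of decision cost can be illustrated with a small sketch in which a cost/profit matrix assigns a value to every (true class, predicted class) pair and profits are encoded as negative costs. All class names and numbers are invented; this is not the KNNcost algorithm itself:

    # Sketch of the "decision cost" of a classifier under a cost/profit matrix.
    import numpy as np

    classes = ["good", "faulty"]
    # rows: true class, columns: predicted class; negative values are profits
    cost_matrix = np.array([
        [-1.0,  5.0],    # true "good":  profit if accepted, cost if rejected
        [20.0,  0.0],    # true "faulty": high cost if wrongly accepted
    ])

    def decision_cost(y_true, y_pred):
        """Average cost of the classifier's decisions under the cost matrix."""
        idx = {c: i for i, c in enumerate(classes)}
        total = sum(cost_matrix[idx[t], idx[p]] for t, p in zip(y_true, y_pred))
        return total / len(y_true)

    y_true = ["good", "good", "faulty", "good"]
    y_pred = ["good", "faulty", "good", "good"]
    print(decision_cost(y_true, y_pred))   # (-1 + 5 + 20 - 1) / 4 = 5.75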
Paris (Plan Abstraction and Refinement in an Integrated System) [4, 2] is a domain independent case-based planning system which allows the flexible reuse of planning cases by abstraction and refinement. This approach is mainly inspired by the observation that reuse of plans must not be restricted to a single description level. In domains with a high variation in the problems, the reuse of past solutions must be achieved at various levels of abstraction.
When problems are solved through reasoning from cases, the primary kind of knowledge is contained in the specific cases stored in the case base. However, in many situations additional background knowledge is required to cope with the requirements of an application. We describe an approach to integrating such general knowledge into the reasoning process in a way that complements the knowledge contained in the cases. This general knowledge is not by itself sufficient to perform any kind of model-based problem solving, but it is required to interpret the available cases appropriately. Background knowledge is expressed by two different kinds of rules, both of which must be formalized by the knowledge engineer: completion rules describe how to infer additional features from the known features of an old case or the current query case, while adaptation rules describe how an old case can be adapted to fit the current query. This paper shows how these kinds of rules can be integrated into an object-oriented case representation.
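One possible reading of these two kinds of rules, sketched on cases represented as plain attribute dictionaries; the rule contents and attributes are invented, and the paper integrates such rules into an object-oriented case representation rather than this simplified form:

    # Sketch: completion rules add inferred features, adaptation rules modify
    # a retrieved case to fit the query. All rule contents are illustrative.

    def complete(case, completion_rules):
        """Infer additional features from the known features of a case."""
        for condition, add_features in completion_rules:
            if condition(case):
                case = {**case, **add_features(case)}
        return case

    def adapt(old_case, query, adaptation_rules):
        """Modify a retrieved case so that it fits the current query."""
        for condition, transform in adaptation_rules:
            if condition(old_case, query):
                old_case = transform(old_case, query)
        return old_case

    completion_rules = [
        (lambda c: "cpu_ghz" in c and "cpu_class" not in c,
         lambda c: {"cpu_class": "fast" if c["cpu_ghz"] >= 3.0 else "slow"}),
    ]
    adaptation_rules = [
        (lambda c, q: c.get("ram_gb", 0) < q.get("ram_gb", 0),
         lambda c, q: {**c, "ram_gb": q["ram_gb"], "price": c["price"] + 50}),
    ]

    case = complete({"cpu_ghz": 3.2, "ram_gb": 8, "price": 700}, completion_rules)
    print(adapt(case, {"ram_gb": 16}, adaptation_rules))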
This paper motivates the need for negotiation support during Sales Support on the Internet within Case-Based Reasoning solutions. Different negotiation approaches are discussed and a general model of the sales process is presented. Further, the traditional CBR cycle is modified in such a way that iterative retrieval during a CBR consulting session is covered by the new model. Several general characteristics of negotiation are described and a case study is shown where preliminary approaches are used to negotiate with a customer about their demands and the available products in a 'CBR-based' Electronic Commerce solution.
Industrial design has a long history. With the introduction of Computer-Aided Engineering, industrial design was revolutionised. Due to the newly found support, the design workflow changed, and with the introduction of virtual prototyping, new challenges arose. These new engineering problems have triggered new basic research questions in computer science.
In this dissertation, I present a range of methods which support different components of the virtual design cycle, from modifications of a virtual prototype and optimisation of said prototype, to analysis of simulation results.
Starting with a virtual prototype, I support engineers by supplying intuitive discrete normal vectors which can be used to interactively deform the control mesh of a surface. I provide and compare a variety of different normal definitions which have different strengths and weaknesses. The best choice depends on the specific model and on an engineer’s priorities. Some methods have higher accuracy, whereas other methods are faster.
I further provide an automatic means of surface optimisation in the form of minimising total curvature. This minimisation reduces surface bending, and therefore, it reduces material expenses. The best results can be obtained for analytic surfaces, however, the technique can also be applied to real-world examples.
Moreover, I provide engineers with a curvature-aware technique to optimise mesh quality. This helps to avoid degenerated triangles which can cause numerical issues. It can be applied to any component of the virtual design cycle: as a direct modification of the virtual prototype (depending on the surface definition), during optimisation, or dynamically during simulation.
Finally, I have developed two different particle relaxation techniques that both support two components of the virtual design cycle. The first component for which they can be used is discretisation. To run computer simulations on a model, it has to be discretised. Particle relaxation uses an initial sampling, and it improves it with the goal of uniform distances or curvature-awareness. The second component for which they can be used is the analysis of simulation results. Flow visualisation is a powerful tool in supporting the analysis of flow fields through the insertion of particles into the flow, and through tracing their movements. The particle seeding is usually uniform, e.g. for an integral surface, one could seed on a square. Integral surfaces undergo strong deformations, and they can have highly varying curvature. Particle relaxation redistributes the seeds on the surface depending on surface properties like local deformation or curvature.
Scaling up conventional processor architectures cannot translate the ever-increasing number of transistors into comparable application performance. Although the trend is to shift from single-core to multi-core architectures, utilizing these multiple cores is not a trivial task for many applications due to thread synchronization and weak memory consistency issues. This is especially true for applications in real-time embedded systems since timing analysis becomes more complicated due to contention on shared resources. One inherent reason for the limited use of instruction-level parallelism (ILP) by conventional processors is the use of registers. Therefore, some recent processors bypass register usage by directly communicating values from producer processing units to consumer processing units. In widely used superscalar processors, this direct instruction communication is organized by hardware at runtime, adversely affecting its scalability. The exposed datapath architectures provide a scalable alternative by allowing compilers to move values directly from output ports to the input ports of processing units. Though exposed datapath architectures have already been studied in great detail, they still use registers for executing programs, thus limiting the amount of ILP they can exploit. This limitation stems from a drawback in their execution paradigm, code generator, or both.
This thesis considers a novel exposed datapath architecture named Synchronous Control Asynchronous Dataflow (SCAD) that follows a hybrid control-flow dataflow execution paradigm. The SCAD architecture employs first-in-first-out (FIFO) buffers at the output and input ports of processing units. It is programmed by move instructions that transport values from the head of output buffers to the tail of input buffers. Thus, direct instruction communication is facilitated by the architecture. The processing unit triggers the execution of an operation when operand values are available at the heads of its input buffers. We propose a code generation technique for SCAD processors inspired by classical queue machines that completely eliminates the use of registers. On this basis, we first generate optimal code by using satisfiability (SAT) solvers after establishing that optimal code generation is hard. Heuristics based on a novel buffer interference analysis are then developed to compile larger programs. The experimental results demonstrate the efficacy of the execution paradigm of SCAD using our queue-oriented code generation technique.
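The move-based execution paradigm can be illustrated with a small sketch. This is not the actual SCAD instruction set or code generator; the unit names, the move helper, and the example program are invented for illustration:

    # Toy sketch of move code over FIFO-buffered processing units: a program
    # is a list of moves from output buffers to input buffers, and a unit
    # fires as soon as all its input buffers hold operands. No registers.
    from collections import deque

    class Unit:
        """A processing unit with FIFO input buffers and a FIFO output buffer."""
        def __init__(self, op, arity=2):
            self.op = op
            self.inputs = [deque() for _ in range(arity)]
            self.output = deque()

        def try_fire(self):
            # fire as soon as every input buffer holds an operand
            if all(self.inputs):
                args = [buf.popleft() for buf in self.inputs]
                self.output.append(self.op(*args))

    units = {"ADD": Unit(lambda a, b: a + b), "MUL": Unit(lambda a, b: a * b)}

    def move(src_unit, dst_unit, dst_port):
        """Move instruction: head of source output buffer -> tail of input buffer."""
        units[dst_unit].inputs[dst_port].append(units[src_unit].output.popleft())
        units[dst_unit].try_fire()

    # compute (2 + 3) * 4 without any registers
    units["ADD"].inputs[0].append(2)
    units["ADD"].inputs[1].append(3)
    units["ADD"].try_fire()
    units["MUL"].inputs[1].append(4)
    move("ADD", "MUL", 0)                   # transport the sum into MUL
    print(units["MUL"].output.popleft())    # 20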
Visualization is vital to the scientific discovery process.
An interactive high-fidelity rendering provides accelerated insight into complex structures, models and relationships.
However, the efficient mapping of visualization tasks to high performance architectures is often difficult, being subject to a challenging mixture of hardware and software architectural complexities in combination with domain-specific hurdles.
These difficulties are often exacerbated on heterogeneous architectures.
In this thesis, a variety of ray casting-based techniques are developed and investigated with respect to a more efficient usage of heterogeneous HPC systems for distributed visualization, addressing challenges in mesh-free rendering, in-situ compression, task-based workload formulation, and remote visualization at large scale.
A novel direct raytracing scheme for on-the-fly free surface reconstruction of particle-based simulations using an extended anisotropic kernel model is investigated on different state-of-the-art cluster setups.
The versatile system renders up to 170 million particles on 32 distributed compute nodes at close to interactive frame rates at 4K resolution with ambient occlusion.
To address the widening gap between high computational throughput and prohibitively slow I/O subsystems, in situ topological contour tree analysis is combined with a compact image-based data representation to provide an effective and easy-to-control trade-off between storage overhead and visualization fidelity.
Experiments show significant reductions in storage requirements, while preserving flexibility for exploration and analysis.
Driven by an increasingly heterogeneous system landscape, a flexible distributed direct volume rendering and hybrid compositing framework is presented.
Based on a task-based dynamic runtime environment, it enables adaptable performance-oriented deployment on various platform configurations.
Comprehensive benchmarks with respect to task granularity and scaling are conducted to verify the characteristics and potential of the novel task-based system design.
A core challenge of HPC visualization is the physical separation of visualization resources and end-users.
Using more tiles than previously thought reasonable, a distributed, low-latency multi-tile streaming system is demonstrated, being able to sustain a stable 80 Hz when streaming up to 256 synchronized 3840x2160 tiles and achieve 365 Hz at 3840x2160 for sort-first compositing over the internet, thereby enabling lightweight visualization clients and leaving all the heavy lifting to the remote supercomputer.
Ever since Mark Weiser’s vision of Ubiquitous Computing the importance of context has increased in the computer science domain. Future Ambient Intelligent Environments will assist humans in their everyday activities, even without them being constantly aware of it. Objects in such environments will have small computers embedded into them which have the ability to predict human needs from the current context and adapt their behavior accordingly. This vision equally applies to future production environments. In modern factories workers and technical staff members are confronted with a multitude of devices from various manufacturers, all with different user interfaces, interaction concepts and degrees of complexity. Production processes are highly dynamic, whole modules can be exchanged or restructured. Both factors force users to continuously change their mental model of the environment. This complicates their workflows and leads to avoidable user errors or slips in judgement. In an Ambient Intelligent Production Environment these challenges have to be approached. The SmartMote is a universal control device for ambient intelligent production environments like the SmartFactoryKL. It copes with the problems mentioned above by integrating all the user interfaces into a single, holistic and mobile device. Following an automated Model-Based User Interface Development (MBUID) process it generates a fully functional graphical user interface from an abstract task-based description of the environment during run-time. This work introduces an approach to integrating context, namely the user’s location, as an adaptation basis into the MBUID process. A Context Model is specified, which stores location information in a formal and precise way. Connected sensors continuously update the model with new values. The model is complemented by a reasoning component which uses an extensible set of rules. These rules are used to derive more abstract context information from basic sensor data and for providing this information to the MBUID process. The feasibility of the approach is shown by using the example of Interaction Zones, which let developers describe different task models depending on the user’s location. Using the context model to determine when a user enters or leaves a zone, the generator can adapt the graphical user interface accordingly. Context-awareness and the potential to adapt to the current context of use are key requirements of applications in ambient intelligent environments. The approach presented here provides a clear procedure and extension scheme for the consideration of additional context types. As context has significant influence on the overall User Experience, this results not only in a better usefulness, but also in an improved usability of the SmartMote.
Calibration of robots has become a research field of great importance over the last decades, especially in the field of industrial robotics. The main reason for this is that the field of application has broadened significantly due to an increasing number of fully automated or robot-assisted tasks. These applications require a significantly higher level of accuracy because of the more delicate tasks that need to be fulfilled (e.g. assembly in the semiconductor industry or robot-assisted medical surgery). In the past, (industrial) robot calibration had to be performed manually for every single robot under lab conditions in a long and cost-intensive process. Expensive and complex measurement systems had to be operated by highly trained personnel. The result of this process is a set of measurements representing the robot pose in the task space (i.e. world coordinate system) and as joint encoder values. To determine the deviation, the robot pose indicated by the internal joint encoder values has to be compared to the physical pose (i.e. the external measurement data). Hence, the errors in the kinematic model of the robot can be computed and later compensated. These errors are inevitable, caused by varying manufacturing tolerances and other sources of error (e.g. friction and deflection), and they have to be compensated in order to achieve sufficient accuracy for the given tasks. Furthermore, for performance, maintenance, or quality assurance reasons, the robots may have to undergo the calibration process at regular time intervals to monitor and compensate, e.g., ageing effects such as wear and tear. In modern production processes, old-fashioned procedures like the one mentioned above are no longer suitable. Therefore a new method has to be found that is less time-consuming, more cost-effective, and involves less (or in the long term even no) human interaction in the calibration process.
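The core idea of estimating model errors from the comparison of predicted and externally measured poses can be sketched on a toy kinematic model. A planar 2-link arm stands in for a real robot; all values and the least-squares setup are illustrative assumptions, not the calibration method discussed here:

    # Sketch: fit the kinematic parameters of a planar 2-link arm from
    # externally measured end-effector positions at known joint angles.
    import numpy as np

    def forward(lengths, q):
        l1, l2 = lengths
        x = l1 * np.cos(q[0]) + l2 * np.cos(q[0] + q[1])
        y = l1 * np.sin(q[0]) + l2 * np.sin(q[0] + q[1])
        return np.array([x, y])

    nominal = np.array([0.50, 0.40])       # link lengths from the data sheet
    true = np.array([0.505, 0.396])        # actual lengths (unknown in practice)

    rng = np.random.default_rng(0)
    configs = rng.uniform(-np.pi, np.pi, size=(20, 2))        # joint encoder values
    measured = np.array([forward(true, q) for q in configs])  # external measurements

    # the planar model is linear in the link lengths, so a single linear
    # least-squares fit recovers them from the measurements
    A = np.vstack([[[np.cos(q[0]), np.cos(q[0] + q[1])],
                    [np.sin(q[0]), np.sin(q[0] + q[1])]] for q in configs])
    b = measured.reshape(-1)
    estimated, *_ = np.linalg.lstsq(A, b, rcond=None)
    print("estimated lengths:", estimated)   # close to the true lengths
    print("model error:", estimated - nominal)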
In its rather short history, robotics research has come a long way in the half century since it started to exist as a noticeable scientific field. Due to its roots in engineering, computer science, mathematics, and several other 'classical' scientific branches, a grand diversity of methodologies and approaches existed from the very beginning. Hence, the researchers in this field are particularly used to adopting ideas that originate in other fields. As a fairly logical consequence of this, scientists turned to biology during the 1970s in order to find approaches that are ideally adapted to the conditions of our natural environment. Doing so allows for introducing principles to robotics that have already shown their great potential by prevailing in a tough evolutionary selection process for millions of years. The variety of these approaches spans from efficient locomotion, to sensor processing methodologies, all the way to control architectures. Thus, the full spectrum of challenges for autonomous interaction with the surroundings while pursuing a task can be covered by such means. A feature that has proven to be among the most challenging to recreate is the human ability of biped locomotion. This is mainly caused by the fact that walking, running and so on are highly complex processes involving the need for energy-efficient actuation, sophisticated control architectures and algorithms, and an elaborate mechanical design, while at the same time posing restrictions concerning stability and weight. However, it is of special interest since our environment favors this specific kind of locomotion and thus promises to open up an enormous potential if mastered. More than mere scientific interest, it is the fascination of understanding and recreating parts of oneself that drives the ongoing efforts in this area of research. The fact that this is not at all an easy task to tackle is not only caused by the highly dynamic processes but also has its roots in the challenging design process. That is because it cannot be limited to just one aspect, like e.g. the control architecture, actuation, sensors, or mechanical design alone. Each aspect has to be incorporated into a sound general concept in order to allow for a successful outcome in the end. Since control is in this context inseparably coupled with the mechanics of the system, both have to be dealt with here.
Guaranteeing the correctness of compilation is a major precondition for correct software. Code generation can be one of the most error-prone tasks in a compiler. One way to achieve trusted compilation is certifying compilation. A certifying compiler generates, for each run, a proof that it has performed the compilation run correctly. The proof is checked in a separate theorem prover. If the theorem prover accepts the proof, one can be sure that the compiler produced correct code. This paper presents a certifying code generation phase for a compiler translating an intermediate language into assembler code. The time spent checking the proofs is the bottleneck of certifying compilation. We exhibit an improved framework for certifying compilation and considerable advances to overcome this bottleneck. We compare our implementation, featuring the Coq theorem prover, to an older implementation. Our current implementation is feasible for medium to large sized programs.
Abstraction is intensively used in the verification of large, complex, or infinite-state systems. With abstractions getting more complex, it is often difficult to see whether they are valid. However, for using abstraction in model checking it has to be ensured that properties are preserved. In this paper, we use a translation validation approach to verify property preservation of system abstractions. We formulate a correctness criterion based on a simulation relation between the concrete and the abstract system for a property to be verified. For each distinct run of the abstraction procedure, the correctness is verified in the theorem prover Isabelle/HOL. This technique is applied in the verification of embedded adaptive systems. This paper is an extended version of a previously published work.
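For finite toy systems, the flavor of such a simulation-based correctness criterion can be sketched as follows. The transition systems and the abstraction function are invented examples; the actual verification is carried out in Isabelle/HOL, not in code like this:

    # Sketch: check that the abstract system simulates the concrete one via
    # an abstraction function alpha, i.e. every concrete step has an abstract match.

    concrete = {            # concrete system: state -> set of successor states
        "c0": {"c1"}, "c1": {"c2"}, "c2": {"c0"},
    }
    abstract = {            # abstract system obtained by merging c1 and c2
        "a0": {"a1"}, "a1": {"a1", "a0"},
    }
    alpha = {"c0": "a0", "c1": "a1", "c2": "a1"}   # abstraction function

    def simulates(concrete, abstract, alpha):
        """True iff the abstract system simulates the concrete one via alpha."""
        for s, succs in concrete.items():
            for t in succs:
                if alpha[t] not in abstract[alpha[s]]:
                    return False   # concrete step s -> t has no abstract match
        return True

    print(simulates(concrete, abstract, alpha))    # True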
Many applications dealing with geometry acquisition and processing produce polygonal meshes that carry artifacts like discretization noise. While there are many approaches to remove these artifacts by smoothing or filtering the mesh, they are not tailored to any specific application subject to certain restrictive objectives. We show how to incorporate smoothing schemes based on the general Laplacian approximation to satisfy all those objectives at the same time for the results of flow simulation in the application field of car manufacturing. In the presented application setting, the major restrictions come from the bounding volume of the flow simulation, the so-called installation space. In particular, clean mesh regions (without noise) should not be smoothed, while at the same time the installation space must not be violated by the smoothing of the noisy mesh regions. Additionally, aliasing effects at the boundary between clean and noisy mesh regions must be prevented. To address the fact that the meshes come from flow simulation, the presented method is versatile enough to preserve their exact volume and to apply anisotropic filters using the flow information. Although the paper focuses on the results of a specific application, most of its findings can be transferred to different settings as well.
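A minimal sketch of the flavor of constrained smoothing described above, assuming a simple box-shaped installation space and a precomputed noise mask. The mesh data, the plain umbrella operator, and the clamping strategy are illustrative simplifications, not the presented method:

    # Sketch: Laplacian (umbrella) smoothing applied only to noisy vertices,
    # clamped to a box-shaped installation space. All data is toy data.
    import numpy as np

    def constrained_laplacian_smoothing(verts, neighbors, noisy, lower, upper,
                                        lam=0.5, iterations=20):
        """Smooth noisy vertices toward their one-ring average, inside a box."""
        verts = verts.copy()
        for _ in range(iterations):
            updated = verts.copy()
            for i in np.where(noisy)[0]:
                ring = verts[neighbors[i]]
                laplacian = ring.mean(axis=0) - verts[i]
                # move toward the local average, but stay inside the installation space
                updated[i] = np.clip(verts[i] + lam * laplacian, lower, upper)
            verts = updated
        return verts

    # toy example: a zig-zag polyline whose interior vertices are marked as noisy
    verts = np.array([[x, 0.0, 0.2 * (-1) ** x] for x in range(6)], dtype=float)
    neighbors = {i: [max(i - 1, 0), min(i + 1, 5)] for i in range(6)}
    noisy = np.array([False, True, True, True, True, False])
    lower, upper = np.array([-10, -10, -0.05]), np.array([10, 10, 0.05])
    print(constrained_laplacian_smoothing(verts, neighbors, noisy, lower, upper)[:, 2])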
In engineering and science, a multitude of problems exhibit an inherently geometric nature. The computational assessment of such problems requires an adequate representation by means of data structures and processing algorithms. One of the most widely adopted and recognized spatial data structures is the Delaunay triangulation which has its canonical dual in the Voronoi diagram. While the Voronoi diagram provides a simple and elegant framework to model spatial proximity, the core of which is the concept of natural neighbors, the Delaunay triangulation provides robust and efficient access to it. This combination explains the immense popularity of Voronoi- and Delaunay-based methods in all areas of science and engineering. This thesis addresses aspects from a variety of applications that share their affinity to the Voronoi diagram and the natural neighbor concept. First, an idea for the generalization of B-spline surfaces to unstructured knot sets over Voronoi diagrams is investigated. Then, a previously proposed method for \(C^2\) smooth natural neighbor interpolation is backed with concrete guidelines for its implementation. Smooth natural neighbor interpolation is also one of many applications requiring derivatives of the input data. The generation of derivative information in scattered data with the help of natural neighbors is described in detail. In a different setting, the computation of a discrete harmonic function in a point cloud is considered, and an observation is presented that relates natural neighbor coordinates to a continuous dependency between discrete harmonic functions and the coordinates of the point cloud. Attention is then turned to integrating the flexibility and meritable properties of natural neighbor interpolation into a framework that allows the algorithmically transparent and smooth extrapolation of any known natural neighbor interpolant. Finally, essential properties are proved for a recently introduced novel finite element tessellation technique in which a Delaunay triangulation is transformed into a unique polygonal tessellation.
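The natural neighbor concept itself can be illustrated with a short sketch: the natural neighbors of a query point are the points it becomes adjacent to when inserted into the Delaunay triangulation. The point data is random and scipy's incremental Delaunay triangulation is used here purely for illustration:

    # Sketch: find the natural neighbors of a query point by inserting it into
    # the Delaunay triangulation and collecting its adjacent vertices.
    import numpy as np
    from scipy.spatial import Delaunay

    rng = np.random.default_rng(1)
    points = rng.random((30, 2))
    query = np.array([[0.5, 0.5]])

    tri = Delaunay(points, incremental=True)
    tri.add_points(query)                    # insert the query point

    q = len(points)                          # index of the inserted query point
    natural_neighbors = set()
    for simplex in tri.simplices:
        if q in simplex:
            natural_neighbors.update(int(v) for v in simplex if v != q)
    print("natural neighbors of the query:", sorted(natural_neighbors))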
In recent decades, there has been increasing interest in analyzing the behavior of complex systems. A popular approach for analyzing such systems is a network analytic approach where the system is represented by a graph structure (Wassermann & Faust 1994, Boccaletti et al. 2006, Brandes & Erlebach 2005, Vespignani 2018): Nodes represent the system’s entities, edges their interactions. A large toolbox of network analytic methods, such as measures for structural properties (Newman 2010), centrality measures (Koschützki et al. 2005), or methods for identifying communities (Fortunato 2010), is readily available to be applied on any network structure. However, it is often overlooked that a network representation of a system and the (technically applicable) methods contain assumptions that need to be met; otherwise, the results are not interpretable or even misleading. The most important assumption of a network representation is the presence of indirect effects: If A has an impact on B, and B has an impact on C, then A has an impact on C (Zweig 2016, Brandes et al. 2013). The presence of indirect effects can be explained by “something” flowing through the network by moving from node to node. Such network flows (or network processes) may be the propagation of information in social networks, the spread of infections, or entities using the network as infrastructure, such as in transportation networks. Several network measures, particularly most centrality measures, also assume the presence of such a network process, but additionally assume specific properties of the network process (Borgatti 2005). Then, a centrality value indicates a node’s importance with respect to a process with these properties.
While this has been known for several years, only recently have datasets containing real-world network flows become accessible. In this context, the goal of this dissertation is to provide a better understanding of the actual behavior of real-world network processes, with a particular focus on centrality measures: If real-world network processes turn out to show different properties than those assumed by classic centrality measures, these measures might considerably under- or overestimate the importance of nodes for the actual network flow. To the best of our knowledge, there are only very few works addressing this topic.
The contributions of this thesis are therefore as follows: (i) We investigate in which aspects real-world network flows meet the assumptions contained about them in centrality measures. (ii) Since we find that the real-world flows show considerably different properties than assumed, we test to which extent the found properties can be explained by models, i.e., models based on shortest paths or random walks. (iii) We study whether the deviations from the assumed behavior have an impact on the results of centrality measures.
To this end, we introduce flow-based variants of centrality measures which are either based on the assumed behavior or on the actual behavior of the real-world network flow. This enables systematic evaluation of the impact of each assumption on the resulting rankings of centrality measures.
While, on a large scale, we observe a surprisingly large robustness of the measures against deviations in their assumptions, there are nodes whose importance is rated very differently when the real-world network flow is taken into account. (iv) As a technical contribution, we provide a method for efficiently handling large sets of flow trajectories by summarizing them into groups of similar trajectories. (v) We furthermore present the results of an interdisciplinary research project in which the trajectories of humans in a network were analyzed in detail. In general, we are convinced that a process-driven perspective on network analysis, in which the network process is considered in addition to the network representation, can help to better understand the behavior of complex systems.
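The contrast between assumed and observed process behavior can be sketched on a toy graph. The graph, the trajectories, and the simple transit-count measure are invented stand-ins, not the flow-based centrality variants introduced in the thesis:

    # Sketch: centrality under the shortest-path assumption versus a simple
    # count of how often each node is actually traversed by observed flows.
    import networkx as nx
    from collections import Counter

    G = nx.Graph([("A", "B"), ("B", "C"), ("C", "D"), ("A", "E"), ("E", "D")])

    # observed network flow: most trajectories take the route via E
    trajectories = [["A", "E", "D"], ["A", "E", "D"], ["A", "B", "C", "D"]]

    assumed = nx.betweenness_centrality(G)                        # model-based
    observed = Counter(v for t in trajectories for v in t[1:-1])  # transit counts

    for node in G:
        print(node, round(assumed[node], 2), observed.get(node, 0))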