Kaiserslautern - Fachbereich Informatik
Refine
Year of publication
- 1999 (267)
- 1996 (50)
- 1994 (49)
- 1995 (48)
- 1998 (38)
- 1997 (35)
- 2021 (27)
- 2016 (25)
- 2019 (24)
- 2022 (23)
- 1993 (22)
- 2015 (22)
- 2023 (22)
- 2001 (21)
- 2020 (21)
- 2007 (19)
- 2013 (18)
- 2018 (18)
- 2002 (17)
- 2003 (16)
- 2014 (15)
- 2012 (14)
- 1992 (13)
- 2000 (13)
- 2004 (12)
- 2006 (11)
- 2009 (11)
- 2008 (9)
- 2005 (8)
- 2017 (8)
- 1991 (7)
- 2010 (7)
- 2024 (6)
- 2011 (5)
- 1979 (2)
- 1980 (1)
- 1983 (1)
- 1990 (1)
Document Type
- Preprint (346)
- Doctoral Thesis (234)
- Report (139)
- Article (133)
- Master's Thesis (45)
- Study Thesis (13)
- Conference Proceeding (8)
- Bachelor Thesis (4)
- Part of a Book (2)
- Habilitation (2)
Has Fulltext
- yes (926)
Keywords
- AG-RESY (64)
- PARO (31)
- Case-Based Reasoning (20)
- Visualisierung (20)
- SKALP (16)
- CoMo-Kit (15)
- Fallbasiertes Schliessen (12)
- RODEO (12)
- Robotik (12)
- HANDFLEX (11)
Faculty / Organisational entity
Manipulating deformable linear objects - Vision-based recognition of contact state transitions -
(1999)
A new and systematic approach to machine vision-based robot manipulation of deformable (non-rigid) linear objects is introduced. This approach reduces the computational needs by using a simple state-oriented model of the objects. These states describe the relation of the object with respect to an obstacle and are derived from the object image and its features. Therefore, the object is segmented from a standard video frame using a fast segmentation algorithm. Several object features are presented which allow the state recognition of the object while being manipulated by the robot.
A new and systematic basic approach to force- and vision-based robot manipulation of deformable (non-rigid) linear objects is introduced. This approach reduces the computational needs by using a simple state-oriented model of the objects. These states describe the relation between the deformable and rigid obstacles, and are derived from the object image and its features. We give an enumeration of possible contact states and discuss the main characteristics of each state. We investigate the performance of robust transitions between the contact states and derive criteria and conditions for each of the states and for two sensor systems, i.e. a vision sensor and a force/torque sensor. This results in a new and task-independent approach in regarding the handling of deformable objects and in a sensor-based implementation of manipulation primitives for industrial robots. Thus, the usage of sensor processing is an appropriate solution for our problem. Finally, we apply the concept of contact states and state transitions to the description of a typical assembly task. Experimental results show the feasibility of our approach: A robot performs several contact state transitions which can be combined for solving a more complex task.
Self-adaptation allows software systems to autonomously adjust their behavior during run-time by handling all possible
operating states that violate the requirements of the managed system. This requires an adaptation engine that receives adaptation
requests during the monitoring process of the managed system and responds with an automated and appropriate adaptation
response. During the last decade, several engineering methods have been introduced to enable self-adaptation in software systems.
However, these methods lack addressing (1) run-time uncertainty that hinders the adaptation process and (2) the performance
impacts resulted from the complexity and the large number of the adaptation space. This paper presents CRATER, a framework
that builds an external adaptation engine for self-adaptive software systems. The adaptation engine, which is built on Case-based
Reasoning, handles the aforementioned challenges together. This paper is braced with an experiment illustrating the benefits of
this framework. The experimental results shows the potential of CRATER in terms handling run-time uncertainty and adaptation
remembrance that enhances the performance for large number of adaptation space.
Postmortem Analysis of Decayed Online Social Communities: Cascade Pattern Analysis and Prediction
(2018)
Recently, many online social networks, such as MySpace, Orkut, and Friendster, have faced inactivity decay of their members, which contributed to the collapse of these networks. The reasons, mechanics, and prevention mechanisms of such inactivity decay are not fully understood. In this work, we analyze decayed and alive subwebsites from the Stack Exchange platform. The analysis mainly focuses on the inactivity cascades that occur among the members of these communities. We provide measures to understand the decay process and statistical analysis to extract the patterns that accompany the inactivity decay. Additionally, we predict cascade size and cascade virality using machine learning. The results of this work include a statistically significant difference of the decay patterns between the decayed and the alive subwebsites. These patterns are mainly cascade size, cascade virality, cascade duration, and cascade similarity. Additionally, the contributed prediction framework showed satisfactorily prediction results compared to a baseline predictor. Supported by empirical evidence, the main findings of this work are (1) there are significantly different decay patterns in the alive and the decayed subwebsites of the Stack Exchange; (2) the cascade’s node degrees contribute more to the decay process than the cascade’s virality, which indicates that the expert members of the Stack Exchange subwebsites were mainly responsible for the activity or inactivity of the Stack Exchange subwebsites; (3) the Statistics subwebsite is going through decay dynamics that may lead to it becoming fully-decayed; (4) the decay process is not governed by only one network measure, it is better described using multiple measures; (5) decayed subwebsites were originally less resilient to inactivity decay, unlike the alive subwebsites; and (6) network’s structure in the early stages of its evolution dictates the activity/inactivity characteristics of the network.
Learning From Networked-data: Methods and Models for Understanding Online Social Networks Dynamics
(2020)
Abstract
Nowadays, people and systems created by people are generating an unprecedented amount of
data. This data has brought us data-driven services with a variety of applications that affect
people’s behavior. One of these applications is the emergent online social networks as a method
for communicating with each other, getting and sharing information, looking for jobs, and many
other things. However, the tremendous growth of these online social networks has also led to many
new challenges that need to be addressed. In this context, the goal of this thesis is to better understand
the dynamics between the members of online social networks from two perspectives. The
first perspective is to better understand the process and the motives underlying link formation in
online social networks. We utilize external information to predict whether two members of an online
social network are friends or not. Also, we contribute a framework for assessing the strength of
friendship ties. The second perspective is to better understand the decay dynamics of online social
networks resulting from the inactivity of their members. Hence, we contribute a model, methods,
and frameworks for understanding the decay mechanics among the members, for predicting members’
inactivity, and for understanding and analyzing inactivity cascades occurring during the decay.
The results of this thesis are: (1) The link formation process is at least partly driven by interactions
among members that take place outside the social network itself; (2) external interactions might
help reduce the noise in social networks and for ranking the strength of the ties in these networks;
(3) inactivity dynamics can be modeled, predicted, and controlled using the models contributed in
this thesis, which are based on network measures. The contributions and the results of this thesis
can be beneficial in many respects. For example, improving the quality of a social network by introducing
new meaningful links and removing noisy ones help to improve the quality of the services
provided by the social network, which, e.g., enables better friend recommendations and helps to
eliminate fake accounts. Moreover, understanding the decay processes involved in the interaction
among the members of a social network can help to prolong the engagement of these members. This
is useful in designing more resilient social networks and can assist in finding influential members
whose inactivity may trigger an inactivity cascade resulting in a potential decay of a network.
Building interoperation among separately developed software units requires checking their conceptual assumptions and constraints. However, eliciting such assumptions and constraints is time consuming and is a challenging task as it requires analyzing each of the interoperating software units. To address this issue we proposed a new conceptual interoperability analysis approach which aims at decreasing the analysis cost and the conceptual mismatches between the interoperating software units. In this report we present the design of a planned controlled experiment for evaluating the effectiveness, efficiency, and acceptance of our proposed conceptual interoperability analysis approach. The design includes the study objectives, research questions, statistical hypotheses, and experimental design. It also provides the materials that will be used in the execution phase of the planned experiment.
Typically software engineers implement their software according to the design of the software
structure. Relations between classes and interfaces such as method-call relations and inheritance
relations are essential parts of a software structure. Accordingly, analyzing several types of
relations will benefit the static analysis process of the software structure. The tasks of this
analysis include but not limited to: understanding of (legacy) software, checking guidelines,
improving product lines, finding structure, or re-engineering of existing software. Graphs with
multi-type edges are possible representation for these relations considering them as edges, while
nodes represent classes and interfaces of software. Then, this multiple type edges graph can
be mapped to visualizations. However, the visualizations should deal with the multiplicity of
relations types and scalability, and they should enable the software engineers to recognize visual
patterns at the same time.
To advance the usage of visualizations for analyzing the static structure of software systems,
I tracked difierent development phases of the interactive multi-matrix visualization (IMMV)
showing an extended user study at the end. Visual structures were determined and classified
systematically using IMMV compared to PNLV in the extended user study as four categories:
High degree, Within-package edges, Cross-package edges, No edges. In addition to these structures
that were found in these handy tools, other structures that look interesting for software
engineers such as cycles and hierarchical structures need additional visualizations to display
them and to investigate them. Therefore, an extended approach for graph layout was presented
that improves the quality of the decomposition and the drawing of directed graphs
according to their topology based on rigorous definitions. The extension involves describing
and analyzing the algorithms for decomposition and drawing in detail giving polynomial time
complexity and space complexity. Finally, I handled visualizing graphs with multi-type edges
using small-multiples, where each tile is dedicated to one edge-type utilizing the topological
graph layout to highlight non-trivial cycles, trees, and DAGs for showing and analyzing the
static structure of software. Finally, I applied this approach to four software systems to show
its usefulness.
This paper deals with the handling of deformable linear objects (DLOs), such as hoses, wires, or leaf springs. It investigates usable features for the vision-based detection of a changing contact situation between a DLO and a rigid polyhedral obstacle and a classification of such contact state transitions. The result is a complete classification of contact state transitions and of the most significant features for each class. This knowledge enables reliable detection of changes in the DLO contact situation, facilitating implementation of sensor-based manipulation skills for all possible contact changes.
Highly Automated Driving (HAD) vehicles represent complex and safety critical systems. They are deployed in an open context i.e., an intricate environment which undergoes continual changes. The complexity of these systems and insufficiencies in sensing and understanding the open context may result in unsafe and uncertain behaviour. The safety critical nature of the HAD vehicles requires modelling of root causes for unsafe behaviour and their mitigation to argue sufficient reduction of residual risk.
Standardization activities such as ISO 21448 provide guidelines on the Safety Of The Intended Functionality (SOTIF) and focus on the analysis of performance limitations under the influence of triggering conditions that can lead to hazardous behaviour. SOTIF references traditional safety analyses methods e.g., Failure Mode and Effect Analysis (FMEA) and Fault Tree Analysis (FTA) to perform safety analysis. These analyses methods are based on certain assumptions e.g., single point failure in FMEA and independence of basic events in FTA. Moreover, these analyses are generally based on expert knowledge i.e., data-based models or hybrid approaches (expert and data) are seldom practised. The resulting safety model is fixed i.e., it is generally seen as a one-time artefact. Open context environment may contain triggering conditions which may not be evident to the expert. Open context also evolves over time and new phenomena may emerge.
This thesis explores the applicability of the traditional safety analyses techniques to provide safety models for HAD vehicles operating in the open context, under the light of modelling assumptions taken by traditional safety analyses techniques. Moreover, incorporating uncertainties into safety analyses models is also explored. An explicit distinction between the inherent uncertainty of a probabilistic event (aleatory) and uncertainty due to lack of knowledge (epistemic) is made to formalize models to perform SOTIF analysis. A further distinction is made for conditions of complete ignorance and termed as ontological uncertainty. The distinction is important as for HAD vehicles operating in open context the ontological uncertainty can never be completely disregarded.
This thesis proposes a novel framework of SOTIF to model, estimate and dis cover triggering conditions relevant to performance limitations. The framework provides the ability to model uncertainties while also providing a hybrid approach i.e., supporting inclusion of expert knowledge as well as data driven engineering processes. Two representative algorithms are provided to support the framework. Bayesian Network (BN) and p-value hypothesis testing are utilised in this regard. The framework is implemented on a real-world case study in which LIDARs based perception systems are used as vehicle detection system.
Dealing with information in modern times involves users to cope with hundreds of thousands of documents, such as articles, emails, Web pages, or News feeds.
Above all information sources, the World Wide Web presents information seekers with great challenges.
It offers more text in natural language than one is capable to read.
The key idea for this research intends to provide users with adaptable filtering techniques, supporting them in filtering out the specific information items they need.
Its realization focuses on developing an Information Extraction system,
which adapts to a domain of concern, by interpreting the contained formalized knowledge.
Utilizing the Resource Description Framework (RDF), which is the Semantic Web's formal language for exchanging information,
allows extending information extractors to incorporate the given domain knowledge.
Because of this, formal information items from the RDF source can be recognized in the text.
The application of RDF allows a further investigation of operations on recognized information items, such as disambiguating and rating the relevance of these.
Switching between different RDF sources allows changing the application scope of the Information Extraction system from one domain of concern to another.
An RDF-based Information Extraction system can be triggered to extract specific kinds of information entities by providing it with formal RDF queries in terms of the SPARQL query language.
Representing extracted information in RDF extends the coverage of the Semantic Web's information degree and provides a formal view on a text from the perspective of the RDF source.
In detail, this work presents the extension of existing Information Extraction approaches by incorporating the graph-based nature of RDF.
Hereby, the pre-processing of RDF sources allows extracting statistical information models dedicated to support specific information extractors.
These information extractors refine standard extraction tasks, such as the Named Entity Recognition, by using the information provided by the pre-processed models.
The post-processing of extracted information items enables representing these results in RDF format or lists, which can now be ranked or filtered by relevance.
Post-processing also comprises the enrichment of originating natural language text sources with extracted information items by using annotations in RDFa format.
The results of this research extend the state-of-the-art of the Semantic Web.
This work contributes approaches for computing customizable and adaptable RDF views on the natural language content of Web pages.
Finally, due to the formal nature of RDF, machines can interpret these views allowing developers to process the contained information in a variety of applications.
Optical Character Recognition (OCR) system plays an important role in digitization of data acquired as images from a variety of sources. Although the area is very well explored for Latin languages, some of the languages based on Arabic cursive script are not yet explored. It is due to many factors: Most importantly are the unavailability of proper data sets and complexities posed by cursive scripts. The Pashto language is one of such languages which needs considerable exploration towards OCR. In order to develop such an OCR system, this thesis provides a pioneering study that explores deep learning for the Pashto language in the field of OCR.
The Pashto language is spoken by more than $50$ million people across the world, and it is an active medium both for oral as well as written communication. It is associated with rich literary heritage and contains huge written collection. These written materials present contents of simple to complex nature, and layouts from hand-scribed to printed text. The Pashto language presents mainly two types of complexities (i) generic w.r.t. cursive script, (ii) specific w.r.t. Pashto language. Generic complexities are cursiveness, context dependency, and breaker character anomalies, as well as space anomalies. Pashto specific complexities are variations in shape for a single character and shape similarity for some of the additional Pashto characters. Existing research in the area of Arabic OCR did not lead to an end-to-end solution for the mentioned complexities and therefore could not be generalized to build a sophisticated OCR system for Pashto.
The contribution of this thesis spans in three levels, conceptual level, data level, and practical level. In the conceptual level, we have deeply explored the Pashto language and identified those characters, which are responsible for the challenges mentioned above. In the data level, a comprehensive dataset is introduced containing real images of hand-scribed contents. The dataset is manually transcribed and has the most frequent layout patterns associated with the Pashto language. The practical level contribution provides a bridge, in the form of a complete Pashto OCR system, and connects the outcomes of the conceptual and data levels contributions. The practical contribution comprises of skew detection, text-line segmentation, feature extraction, classification, and post-processing. The OCR module is more strengthened by using deep learning paradigm to recognize Pashto cursive script by the framework of Recursive Neural Networks (RNN). Proposed Pashto text recognition is based on Long Short-Term Memory Network (LSTM) and realizes a character recognition rate of $90.78\%$ on Pashto real hand-scribed images. All these contributions are integrated into an application to provide a flexible and generic End-to-End Pashto OCR system.
The impact of this thesis is not only specific to the Pashto language, but it is also beneficial to other cursive languages like Arabic, Urdu, and Persian e.t.c. The main reason is the Pashto character set, which is a superset of Arabic, Persian, and Urdu languages. Therefore, the conceptual contribution of this thesis provides insight and proposes solutions to almost all generic complexities associated with Arabic, Persian, and Urdu languages. For example, an anomaly caused by breaker characters is deeply analyzed, which is shared among 70 languages, mainly use Arabic script. This thesis presents a solution to this issue and is equally beneficial to almost all Arabic like languages.
The scope of this thesis has two important aspects. First, a social impact, i.e., how a society may benefit from it. The main advantages are to bring the historical and almost vanished document to life and to ensure the opportunities to explore, analyze, translate, share, and understand the contents of Pashto language globally. Second, the advancement and exploration of the technical aspects. Because, this thesis empirically explores the recognition and challenges which are solely related to the Pashto language, both regarding character-set and the materials which present such complexities. Furthermore, the conceptual and practical background of this thesis regarding complexities of Pashto language is very beneficial regarding OCR for other cursive languages.
This thesis presents a novel, generic framework for information segmentation in document images.
A document image contains different types of information, for instance, text (machine printed/handwritten), graphics, signatures, and stamps.
It is necessary to segment information in documents so that to process such segmented information only when required in automatic document processing workflows.
The main contribution of this thesis is the conceptualization and implementation of an information segmentation framework that is based on part-based features.
The generic nature of the presented framework makes it applicable to a variety of documents (technical drawings, magazines, administrative, scientific, and academic documents) digitized using different methods (scanners, RGB cameras, and hyper-spectral imaging (HSI) devices).
A highlight of the presented framework is that it does not require large training sets, rather a few training samples (for instance, four pages) lead to high performance, i.e., better than previously existing methods.
In addition, the presented framework is simple and can be adapted quickly to new problem domains.
This thesis is divided into three major parts on the basis of document digitization method (scanned, hyper-spectral imaging, and camera captured) used.
In the area of scanned document images, three specific contributions have been realized.
The first of them is in the domain of signature segmentation in administrative documents.
In some workflows, it is very important to check the document authenticity before processing the actual content.
This can be done based on the available seal of authenticity, e.g., signatures.
However, signature verification systems expect pre-segmented signature image, while signatures are usually a part of document.
To use signature verification systems on document images, it is necessary to first segment signatures in documents.
This thesis shows that the presented framework can be used to segment signatures in administrative documents.
The system based on the presented framework is tested on a publicly available dataset where it outperforms the state-of-the-art methods and successfully segmented all signatures, while less than half of the found signatures are false positives.
This shows that it can be applied for practical use.
The second contribution in the area of scanned document images is segmentation of stamps in administrative documents.
A stamp also serves as a seal for documents authenticity.
However, the location of stamp on the document can be more arbitrary than a signature depending on the person sealing the document.
This thesis shows that a system based on our generic framework is able to extract stamps of any arbitrary shape and color.
The evaluation of the presented system on a publicly available dataset shows that it is also able to segment black stamps (that were not addressed in the past) with a recall and precision of 83% and 73%, respectively.
%Furthermore, to segment colored stamps, this thesis presents a novel feature set which is based on intensity gradient, is able to extract unseen, colored, arbitrary shaped, textual as well as graphical stamps, and outperforms the state-of-the-art methods.
The third contribution in the scanned document images is in the domain of information segmentation in technical drawings (architectural floorplans, maps, circuit diagrams, etc.) containing usually a large amount of graphics and comparatively less textual components. Further, as in technical drawings, text is overlapping with graphics.
Thus, automatic analysis of technical drawings uses text/graphics segmentation as a pre-processing step.
This thesis presents a method based on our generic information segmentation framework that is able to detect the text, which is touching graphical components in architectural floorplans and maps.
Evaluation of the method on a publicly available dataset of architectural floorplans shows that it is able to extract almost all touching text components with precision and recall of 71% and 95%, respectively.
This means that almost all of the touching text components are successfully extracted.
In the area of hyper-spectral document images, two contributions have been realized.
Unlike normal three channels RGB images, hyper-spectral images usually have multiple channels that range from ultraviolet to infrared regions including the visible region.
First, this thesis presents a novel automatic method for signature segmentation from hyper-spectral document images (240 spectral bands between 400 - 900 nm).
The presented method is based on a part-based key point detection technique, which does not use any structural information, but relies only on the spectral response of the document regardless of ink color and intensity.
The presented method is capable of segmenting (overlapping and non-overlapping) signatures from varying backgrounds like, printed text, tables, stamps, logos, etc.
Importantly, the presented method can extract signature pixels and not just the bounding boxes.
This is substantial when signatures are overlapping with text and/or other objects in image. Second, this thesis presents a new dataset comprising of 300 documents scanned using a high-resolution hyper-spectral scanner. Evaluation of the presented signature segmentation method on this hyper-spectral dataset shows that it is able to extract signature pixels with the precision and recall of 100% and 79%, respectively.
Further contributions have been made in the area of camera captured document images. A major problem in the development of Optical Character Recognition (OCR) systems for camera captured document images is the lack of labeled camera captured document images datasets. In the first place, this thesis presents a novel, generic, method for automatic ground truth generation/labeling of document images. The presented method builds large-scale (i.e., millions of images) datasets of labeled camera captured / scanned documents without any human intervention. The method is generic and can be used for automatic ground truth generation of (scanned and/or camera captured) documents in any language, e.g., English, Russian, Arabic, Urdu. The evaluation of the presented method, on two different datasets in English and Russian, shows that 99.98% of the images are correctly labeled in every case.
Another important contribution in the area of camera captured document images is the compilation of a large dataset comprising 1 million word images (10 million character images), captured in a real camera-based acquisition environment, along with the word and character level ground truth. The dataset can be used for training as well as testing of character recognition systems for camera-captured documents. Various benchmark tests are performed to analyze the behavior of different open source OCR systems on camera captured document images. Evaluation results show that the existing OCRs, which already get very high accuracies on scanned documents, fail on camera captured document images.
Using the presented camera-captured dataset, a novel character recognition system is developed which is based on a variant of recurrent neural networks, i.e., Long Short Term Memory (LSTM) that outperforms all of the existing OCR engines on camera captured document images with an accuracy of more than 95%.
Finally, this thesis provides details on various tasks that have been performed in the area closely related to information segmentation. This includes automatic analysis and sketch based retrieval of architectural floor plan images, a novel scheme for online signature verification, and a part-based approach for signature verification. With these contributions, it has been shown that part-based methods can be successfully applied to document image analysis.
The advent of heterogeneous many-core systems has increased the spectrum
of achievable performance from multi-threaded programming. As the processor components become more distributed, the cost of synchronization and
communication needed to access the shared resources increases. Concurrent
linearizable access to shared objects can be prohibitively expensive in a high
contention workload. Though there are various mechanisms (e.g., lock-free
data structures) to circumvent the synchronization overhead in linearizable
objects, it still incurs performance overhead for many concurrent data types.
Moreover, many applications do not require linearizable objects and apply
ad-hoc techniques to eliminate synchronous atomic updates.
In this thesis, we propose the Global-Local View Model. This programming model exploits the heterogeneous access latencies in many-core systems.
In this model, each thread maintains different views on the shared object: a
thread-local view and a global view. As the thread-local view is not shared,
it can be updated without incurring synchronization costs. The local updates
become visible to other threads only after the thread-local view is merged
with the global view. This scheme improves the performance at the expense
of linearizability.
Besides the weak operations on the local view, the model also allows strong
operations on the global view. Combining operations on the global and the
local views, we can build data types with customizable consistency semantics
on the spectrum between sequential and purely mergeable data types. Thus
the model provides a framework that captures the semantics of Multi-View
Data Types. We discuss a formal operational semantics of the model. We
also introduce a verification method to verify the correctness of the implementation of several multi-view data types.
Frequently, applications require updating shared objects in an “all-or-nothing” manner. Therefore, the mechanisms to synchronize access to individual objects are not sufficient. Software Transactional Memory (STM)
is a mechanism that helps the programmer to correctly synchronize access to
multiple mutable shared data by serializing the transactional reads and writes.
But under high contention, serializable transactions incur frequent aborts and
limit parallelism, which can lead to severe performance degradation.
Mergeable Transactional Memory (MTM), proposed in this thesis, allows accessing multi-view data types within a transaction. Instead of aborting
and re-executing the transaction, MTM merges its changes using the data-type
specific merge semantics. Thus it provides a consistency semantics that allows
for more scalability even under contention. The evaluation of our prototype
implementation in Haskell shows that mergeable transactions outperform serializable transactions even under low contention while providing a structured
and type-safe interface.
Towards A Non-tracking Web
(2016)
Today, many publishers (e.g., websites, mobile application developers) commonly use third-party analytics services and social widgets. Unfortunately, this scheme allows these third parties to track individual users across the web, creating privacy concerns and leading to reactions to prevent tracking via blocking, legislation and standards. While improving user privacy, these efforts do not consider the functionality third-party tracking enables publishers to use: to obtain aggregate statistics about their users and increase their exposure to other users via online social networks. Simply preventing third-party tracking without replacing the functionality it provides cannot be a viable solution; leaving publishers without essential services will hurt the sustainability of the entire ecosystem.
In this thesis, we present alternative approaches to bridge this gap between privacy for users and functionality for publishers and other entities. We first propose a general and interaction-based third-party cookie policy that prevents third-party tracking via cookies, yet enables social networking features for users when wanted, and does not interfere with non-tracking services for analytics and advertisements. We then present a system that enables publishers to obtain rich web analytics information (e.g., user demographics, other sites visited) without tracking the users across the web. While this system requires no new organizational players and is practical to deploy, it necessitates the publishers to pre-define answer values for the queries, which may not be feasible for many analytics scenarios (e.g., search phrases used, free-text photo labels). Our second system complements the first system by enabling publishers to discover previously unknown string values to be used as potential answers in a privacy-preserving fashion and with low computation overhead for clients as well as servers. These systems suggest that it is possible to provide non-tracking services with (at least) the same functionality as today’s tracking services.
The goal of this work is to develop statistical natural language models and processing techniques
based on Recurrent Neural Networks (RNN), especially the recently introduced Long Short-
Term Memory (LSTM). Due to their adapting and predicting abilities, these methods are more
robust, and easier to train than traditional methods, i.e., words list and rule-based models. They
improve the output of recognition systems and make them more accessible to users for browsing
and reading. These techniques are required, especially for historical books which might take
years of effort and huge costs to manually transcribe them.
The contributions of this thesis are several new methods which have high-performance computing and accuracy. First, an error model for improving recognition results is designed. As
a second contribution, a hyphenation model for difficult transcription for alignment purposes
is suggested. Third, a dehyphenation model is used to classify the hyphens in noisy transcription. The fourth contribution is using LSTM networks for normalizing historical orthography.
A size normalization alignment is implemented to equal the size of strings, before the training
phase. Using the LSTM networks as a language model to improve the recognition results is
the fifth contribution. Finally, the sixth contribution is a combination of Weighted Finite-State
Transducers (WFSTs), and LSTM applied on multiple recognition systems. These contributions
will be elaborated in more detail.
Context-dependent confusion rules is a new technique to build an error model for Optical
Character Recognition (OCR) corrections. The rules are extracted from the OCR confusions
which appear in the recognition outputs and are translated into edit operations, e.g., insertions,
deletions, and substitutions using the Levenshtein edit distance algorithm. The edit operations
are extracted in a form of rules with respect to the context of the incorrect string to build an
error model using WFSTs. The context-dependent rules assist the language model to find the
best candidate corrections. They avoid the calculations that occur in searching the language
model and they also make the language model able to correct incorrect words by using context-
dependent confusion rules. The context-dependent error model is applied on the university of
Washington (UWIII) dataset and the Nastaleeq script in Urdu dataset. It improves the OCR
results from an error rate of 1.14% to an error rate of 0.68%. It performs better than the
state-of-the-art single rule-based which returns an error rate of 1.0%.
This thesis describes a new, simple, fast, and accurate system for generating correspondences
between real scanned historical books and their transcriptions. The alignment has many challenges, first, the transcriptions might have different modifications, and layout variations than the
original book. Second, the recognition of the historical books have misrecognition, and segmentation errors, which make the alignment more difficult especially the line breaks, and pages will
not have the same correspondences. Adapted WFSTs are designed to represent the transcription. The WFSTs process Fraktur ligatures and adapt the transcription with a hyphenations
model that allows the alignment with respect to the varieties of the hyphenated words in the line
breaks of the OCR documents. In this work, several approaches are implemented to be used for
the alignment such as: text-segments, page-wise, and book-wise approaches. The approaches
are evaluated on German calligraphic (Fraktur) script historical documents dataset from “Wan-
derungen durch die Mark Brandenburg” volumes (1862-1889). The text-segmentation approach
returns an error rate of 2.33% without using a hyphenation model and an error rate of 2.0%
using a hyphenation model. Dehyphenation methods are presented to remove the hyphen from
the transcription. They provide the transcription in a readable and reflowable format to be used
for alignment purposes. We consider the task as classification problem and classify the hyphens
from the given patterns as hyphens for line breaks, combined words, or noise. The methods are
applied on clean and noisy transcription for different languages. The Decision Trees classifier
returns better performance on UWIII dataset and returns an accuracy of 98%. It returns 97%
on Fraktur script.
A new method for normalizing historical OCRed text using LSTM is implemented for different texts, ranging from Early New High German 14th - 16th centuries to modern forms in New
High German applied on the Luther bible. It performed better than the rule-based word-list
approaches. It provides a transcription for various purposes such as part-of-speech tagging and
n-grams. Also two new techniques are presented for aligning the OCR results and normalize the
size by using adding Character-Epsilons or Appending-Epsilons. They allow deletion and insertion in the appropriate position in the string. In normalizing historical wordforms to modern
wordforms, the accuracy of LSTM on seen data is around 94%, while the state-of-the-art combined rule-based method returns 93%. On unseen data, LSTM returns 88% and the combined
rule-based method returns 76%. In normalizing modern wordforms to historical wordforms, the
LSTM delivers the best performance and returns 93.4% on seen data and 89.17% on unknown
data.
In this thesis, a deep investigation has been done on constructing high-performance language
modeling for improving the recognition systems. A new method to construct a language model
using LSTM is designed to correct OCR results. The method is applied on UWIII and Urdu
script. The LSTM approach outperforms the state-of-the-art, especially for unseen tokens
during training. On the UWIII dataset, the LSTM returns reduction in OCR error rates from
1.14% to 0.48%. On the Nastaleeq script in Urdu dataset, the LSTM reduces the error rate
from 6.9% to 1.58%.
Finally, the integration of multiple recognition outputs can give higher performance than a
single recognition system. Therefore, a new method for combining the results of OCR systems is
explored using WFSTs and LSTM. It uses multiple OCR outputs and votes for the best output
to improve the OCR results. It performs better than the ISRI tool, Pairwise of Multiple Sequence and it helps to improve the OCR results. The purpose is to provide correct transcription
so that it can be used for digitizing books, linguistics purposes, N-grams, and part-of-speech
tagging. The method consists of two alignment steps. First, two recognition systems are aligned
using WFSTs. The transducers are designed to be more flexible and compatible with the different symbols in line and page breaks to avoid the segmentation and misrecognition errors.
The LSTM model then is used to vote the best candidate correction of the two systems and
improve the incorrect tokens which are produced during the first alignment. The approaches
are evaluated on OCRs output from the English UWIII and historical German Fraktur dataset
which are obtained from state-of-the-art OCR systems. The Experiments show that the error
rate of ISRI-Voting is 1.45%, the error rate of the Pairwise of Multiple Sequence is 1.32%, the
error rate of the Line-to-Page alignment is 1.26% and the error rate of the LSTM approach has
the best performance with 0.40%.
The purpose of this thesis is to contribute methods providing correct transcriptions corresponding to the original book. This is considered to be the first step towards an accurate and
more effective use of the documents in digital libraries.
One of the biggest social issues in mature societies such as Europe and Japan
is the aging population and declining birth rate. These societies have a serious
problem with the retirement of the expert workers, doctors, and engineers etc.
Especially in the sectors that require long time to make experts in fields like medicine and industry; the retirement and injuries of the experts, is a serious problem. The technology to support the training and assessment of skilled workers (like doctors, manufacturing
workers) is strongly required for the society. Although there are some solutions for
this problem, most of them are video-based which violates the privacy of the subjects.
Furthermore, they are not easy to deploy due to the need for large training data.
This thesis provides a novel framework to recognize, analyze, and assess human
skills with minimum customization cost. The presented framework tackles this problem
in two different domains, industrial setup and medical operations of catheter-based
cardiovascular interventions (CBCVI).
In particular, the contributions of this thesis are four-fold. First, it proposes an
easy-to-deploy framework for human activity recognition based on zero-shot learning
approach, which is based on learning basic actions and objects. The model recognizes
unseen activities by combinations of basic actions learned in a preliminary way and involved objects. Therefore, it is completely configurable by the user and can be used to detect completely new activities.
Second, a novel gaze-estimation model for attention driven object detection task is
presented. The key features of the model are: (i) usage of the deformable convolutional
layers to better incorporate spatial dependencies of different shapes of objects and
backgrounds, (ii) formulation of the gaze-estimation problem in two different way, as a
classification as well as a regression problem. We combine both formulations using a
joint loss that incorporates both the cross-entropy as well as the mean-squared error in
order to train our model. This enhanced the accuracy of the model from 6.8 by using only
the cross-entropy loss to 6.4 for the joint loss.
The third contribution of this thesis targets the area of quantification of quality of
i
actions using wearable sensor. To address the variety of scenarios, we have targeted two
possibilities: a) both expert and novice data is available , b) only expert data is available,
a quite common case in safety critical scenarios.
Both of the developed methods from these scenarios are deep learning based. In the
first one, we use autoencoders with OneClass SVM, and in the second one we use the
Siamese Networks. These methods allow us to encode the expert’s expertise and to learn
the differences between novice and expert workers. This enables quantification of the
performance of the novice in comparison to the expert worker.
The fourth contribution, explicitly targets medical practitioners and provides a
methodology for novel gaze-based temporal spatial analysis of CBCVI data. The developed
methodology allows continuous registration and analysis of gaze data for analysis
of the visual X-ray image processing (XRIP) strategies of expert operators in live-cases scenarios and may assist in transferring experts’ reading skills to novices.
The Context and Its Importance: In safety and reliability analysis, the information generated by Minimal Cut Set (MCS) analysis is large.
The Top Level event (TLE) that is the root of the fault tree (FT) represents a hazardous state of the system being analyzed.
MCS analysis helps in analyzing the fault tree (FT) qualitatively-and quantitatively when accompanied with quantitative measures.
The information shows the bottlenecks in the fault tree design leading to identifying weaknesses of the system being examined.
Safety analysis (containing the MCS analysis) is especially important for critical systems, where harm can be done to the environment or human causing injuries, or even death during the system usage.
Minimal Cut Set (MCS) analysis is performed using computers and generating a lot of information.
This phase is called MCS analysis I in this thesis.
The information is then analyzed by the analysts to determine possible issues and to improve the design of the system regarding its safety as early as possible.
This phase is called MCS analysis II in this thesis.
The goal of my thesis was developing interactive visualizations to support MCS analysis II of one fault tree (FT).
The Methodology: As safety visualization-in this thesis, Minimal Cut Set analysis II visualization-is an emerging field and no complete checklist regarding Minimal Cut Set analysis II requirements and gaps were available from the perspective of visualization and interaction capabilities,
I have conducted multiple studies using different methods with different data sources (i.e., triangulation of methods and data) for determining these requirements and gaps before developing and evaluating visualizations and interactions supporting Minimal Cut Set analysis II.
Thus, the following approach was taken in my thesis:
1- First, a triangulation of mixed methods and data sources was conducted.
2- Then, four novel interactive visualizations and one novel interaction widget were developed.
3- Finally, these interactive visualizations were evaluated both objectively and subjectively (compared to multiple safety tools)
from the point of view of users and developers of the safety tools that perform MCS analysis I with respect to their degree in supporting MCS analysis II and from the point of non-domain people using empirical strategies.
The Spiral tool supports analysts with different visions, i.e., full vision, color deficiency protanopia, deuteranopia, and tritanopia. It supports 100 out of 103 (97%) requirements obtained from the triangulation and it fills 37 out of 39 (95%) gaps. Its usability was rated high (better than their best currently used tools) by the users of the safety and reliability tools (RiskSpectrum, ESSaRel, FaultTree+, and a self-developed tool) and at least similar to the best currently used tools from the point of view of the CAFTA tool developers. Its quality was higher regarding its degree of supporting MCS analysis II compared to the FaultTree+ tool. The time spent for discovering the critical MCSs from a problem size of 540 MCSs (with a worst case of all equal order) was less than a minute while achieving 99.5% accuracy. The scalability of the Spiral visualization was above 4000 MCSs for a comparison task. The Dynamic Slider reduces the interaction movements up to 85.71% of the previous sliders and solves the overlapping thumb issues by the sliders provides the 3D model view of the system being analyzed provides the ability to change the coloring of MCSs according to the color vision of the user provides selecting a BE (i.e., multi-selection of MCSs), thus, can observe the BEs' NoO and provides its quality provides two interaction speeds for panning and zooming in the MCS, BE, and model views provide a MCS, a BE, and a physical tab for supporting the analysis starting by the MCSs, the BEs, or the physical parts. It combines MCS analysis results and the model of an embedded system enabling the analysts to directly relate safety information with the corresponding parts of the system being analyzed and provides an interactive mapping between the textual information of the BEs and MCSs and the parts related to the BEs.
Verifications and Assessments: I have evaluated all visualizations and the interaction widget both objectively and subjectively, and finally evaluated the final Spiral visualization tool also both objectively and subjectively regarding its perceived quality and regarding its degree of supporting MCS analysis II.
A growing share of all software development project work is being done by geographically distributed teams. To satisfy shorter product design cycles, expert team members for a development project may need to be r ecruited globally. Yet to avoid extensive travelling or r eplacement costs, distributed project work is preferred. Current-generation software engineering tools and ass ociated systems, processes, and methods were for the most part developed to be used within a single enterprise. Major innovations have lately been introduced to enable groupware applications on the Internet to support global collaboration. However, their deployment for distributed software projects requires further research. In partic ular, groupware methods must seamlessly be integrated with project and product management systems to make them attractive for industry. In this position paper we outline the major challenges concerning distributed (virtual) software projects. Based on our experiences with software process modeling and enactment environments, we then propose approaches to solve those challenges.
This doctoral dissertation is comprised of nine published articles covering different
methods for ‘Fast, Robust Rigid and Non-Rigid Registration for Globally Consistent
3D Scene and Shape Reconstruction’. Overall the contributing articles are separated
and discussed in three stages – The first part of the thesis i.e., chapter 2 explains
three novel method classes of rigid point set registration namely Gravitational Approach (GA), Fast Gravitational Approach (FGA), and RPSRNet. GA was introduced as the first physics-based rigid point set registration. It includes elegant modeling of rigid by dynamics using Newtonian mechanics. The method proposed many new avenues for other types of pattern matching tasks thank point set registration. Next, FGA method, published 4 years after GA presented as an extension that breaks the algorithmic complexity of GA from O(M N ) to O(M log N ) using Barnes-Hut tree representation of point cloud. It also eliminates the requirement of heuristic optimization parameter settings by GA, and achieve state-of-the-art alignment accuracy on LiDAR odometry. Finally, RPSRNet presents deep learning version of FGA, with custom convolution layers for hierarchical point feature embedding. RPSRNet is robust and the fastest among SoA methods for LiDAR data registration. The second part, i.e., chapter 3, of the thesis introduces NRGA as the fist physics-based non-rigid point set
registration method which is computationally slow but robust against noisy and partial inputs. NRGA preserves structural consistency as it coherently regularize motion of deformable vertices. For articulated hand shape reconstruction, a tailored version of NRGA -- Articulated-NRGA -- is effective to refine final hand shape. Collision and penetration avoidance between source and target surfaces are tackled by constrained optimization in NRGA. This setting has improved hand and object interaction reconstruction. Next contribution FoldMatch method remodels the shape deformation by introducing wrinkle vector field (WVF) for capturing complex clothing and garment details while fitting body models onto 3D Scans. Quantitative evaluation of FoldMatch and NRGA shows their effectiveness in geometrically consistent surface modeling and reconstruction tasks. Finally, the third part of the thesis explains globally consistent outdoor scene reconstruciton, odometry estimation, and uncertainty guided pose-graph optimization in a novel LiDAR-based localization and map building method, called Deep Evidential LiDAR Odometry (DELO). This is the first Odometry method to use predictive uncertainty modeling for sensor pose prediction network.
Information Visualization (InfoVis) and Human-Computer Interaction (HCI) have strong ties with each other. Visualization supports the human cognitive system by providing interactive and meaningful images of the underlying data. On the other side, the HCI domain cares about the usability of the designed visualization from the human perspectives. Thus, designing a visualization system requires considering many factors in order to achieve the desired functionality and the system usability. Achieving these goals will help these people in understanding the inside behavior of complex data sets in less time.
Graphs are widely used data structures to represent the relations between the data elements in complex applications. Due to the diversity of this data type, graphs have been applied in numerous information visualization applications (e.g., state transition diagrams, social networks, etc.). Therefore, many graph layout algorithms have been proposed in the literature to help in visualizing this rich data type. Some of these algorithms are used to visualize large graphs, while others handle the medium sized graphs. Regardless of the graph size, the resulting layout should be understandable from the users’ perspective and at the same time it should fulfill a list of aesthetic criteria to increase the representation readability. Respecting these two principles leads to produce a resulting graph visualization that helps the users in understanding and exploring the complex behavior of critical systems.
In this thesis, we utilize the graph visualization techniques in modeling the structural and behavioral aspects of embedded systems. Furthermore, we focus on evaluating the resulting representations from the users’ perspectives.
The core contribution of this thesis is a framework, called ESSAVis (Embedded Systems Safety Aspect Visualizer). This framework visualizes not only some of the safety aspects (e.g. CFT models) of embedded systems, but also helps the engineers and experts in analyzing the system safety critical situations. For this, the framework provides a 2Dplus3D environment in which the 2D represents the graph representation of the abstract data about the safety aspects of the underlying embedded system while the 3D represents the underlying system 3D model. Both views are integrated smoothly together in the 3D world fashion. In order to check the effectiveness and feasibility of the framework and its sub-components, we conducted many studies with real end users as well as with general users. Results of the main study that targeted the overall ESSAVis framework show high acceptance ratio and higher accuracy with better performance using the provided visual support of the framework.
The ESSAVis framework has been designed to be compatible with different 3D technologies. This enabled us to use the 3D stereoscopic depth of such technologies to encode nodes attributes in node-link diagrams. In this regard, we conducted an evaluation study to measure the usability of the stereoscopic depth cue approach, called the stereoscopic highlighting technique, against other selected visual cues (i.e., color, shape, and sizes). Based on the results, the thesis proposes the Reflection Layer extension to the stereoscopic highlighting technique, which was also evaluated from the users’ perspectives. Additionally, we present a new technique, called ExpanD (Expand in Depth), that utilizes the depth cue to show the structural relations between different levels of details in node-link diagrams. Results of this part opens a promising direction of the research in which visualization designers can get benefits from the richness of the 3D technologies in visualizing abstract data in the information visualization domain.
Finally, this thesis proposes the application of the ESSAVis frame- work as a visual tool in the educational training process of engineers for understanding the complex concepts. In this regard, we conducted an evaluation study with computer engineering students in which we used the visual representations produced by ESSAVis to teach the principle of the fault detection and the failure scenarios in embedded systems. Our work opens the directions to investigate many challenges about the design of visualization for educational purposes.