Asynchronous programs are challenging to implement correctly: the loose coupling between asynchronously executed tasks makes the control and data dependencies difficult to follow. Even subtle design and programming mistakes on the programs have the capability to introduce erroneous or divergent behaviors. As asynchronous programs are typically written to provide a reliable, high-performance infrastructure, there is a critical need for analysis techniques to guarantee their correctness.
In this dissertation, I provide scalable verification and testing tools to make asyn- chronous programs more reliable. I show that the combination of counter abstraction and partial order reduction is an effective approach for the verification of asynchronous systems by presenting PROVKEEPER and KUAI, two scalable verifiers for two types of asynchronous systems. I also provide a theoretical result that proves a counter-abstraction based algorithm called expand-enlarge-check, is an asymptotically optimal algorithm for the coverability problem of branching vector addition systems as which many asynchronous programs can be modeled. In addition, I present BBS and LLSPLAT, two testing tools for asynchronous programs that efficiently uncover many subtle memory violation bugs.
The task of printed Optical Character Recognition (OCR), though considered ``solved'' by many, still poses several challenges. The complex grapheme structure of many scripts, such as Devanagari and Urdu Nastaleeq, greatly lowers the performance of state-of-the-art OCR systems.
Moreover, the digitization of historical and multilingual documents still require much probing. Lack of benchmark datasets further complicates the development of reliable OCR systems. This thesis aims to find the answers to some of these challenges using contemporary machine learning technologies. Specifically, the Long Short-Term Memory (LSTM) networks, have been employed to OCR modern as well historical monolingual documents. The excellent OCR results obtained on these have led us to extend their application for multilingual documents.
The first major contribution of this thesis is to demonstrate the usability of LSTM networks for monolingual documents. The LSTM networks yield very good OCR results on various modern and historical scripts, without using sophisticated features and post-processing techniques. The set of modern scripts include modern English, Urdu Nastaleeq and Devanagari. To address the challenge of OCR of historical documents, this thesis focuses on Old German Fraktur script, medieval Latin script of the 15th century, and Polytonic Greek script. LSTM-based systems outperform the contemporary OCR systems on all of these scripts. To cater for the lack of ground-truth data, this thesis proposes a new methodology, combining segmentation-based and segmentation-free OCR approaches, to OCR scripts for which no transcribed training data is available.
Another major contribution of this thesis is the development of a novel multilingual OCR system. A unified framework for dealing with different types of multilingual documents has been proposed. The core motivation behind this generalized framework is the human reading ability to process multilingual documents, where no script identification takes place.
In this design, the LSTM networks recognize multiple scripts simultaneously without the need to identify different scripts. The first step in building this framework is the realization of a language-independent OCR system which recognizes multilingual text in a single step. This language-independent approach is then extended to script-independent OCR that can recognize multiscript documents using a single OCR model. The proposed generalized approach yields low error rate (1.2%) on a test corpus of English-Greek bilingual documents.
In summary, this thesis aims to extend the research in document recognition, from modern Latin scripts to Old Latin, to Greek and to other ``under-privilaged'' scripts such as Devanagari and Urdu Nastaleeq.
It also attempts to add a different perspective in dealing with multilingual documents.
Computer Vision (CV) problems, such as image classification and segmentation, have traditionally been solved by manual construction of feature hierarchies or incorporation of other prior knowledge. However, noisy images, varying viewpoints and lighting conditions of images, and clutters in real-world images make the problem challenging. Such tasks cannot be efficiently solved without learning from data. Therefore, many Deep Learning (DL) approaches have recently been successful for various CV tasks, for instance, image classification, object recognition and detection, action recognition, video classification, and scene labeling. The main focus of this thesis is to investigate a purely learning-based approach, particularly, Multi-Dimensional LSTM (MD-LSTM) recurrent neural networks to tackle the challenging CV tasks, classification and segmentation on 2D and 3D image data. Due to the structural nature of MD-LSTM, the network learns directly from raw pixel values and takes the complex spatial dependencies of each pixel into account. This thesis provides several key contributions in the field of CV and DL.
Several MD-LSTM network architectural options are suggested based on the type of input and output, as well as the requiring tasks. Including the main layers, which are an input layer, a hidden layer, and an output layer, several additional layers can be added such as a collapse layer and a fully connected layer. First, a single Two Dimensional LSTM (2D-LSTM) is directly applied on texture images for segmentation and show improvement over other texture segmentation methods. Besides, a 2D-LSTM layer with a collapse layer is applied for image classification on texture and scene images and have provided an accurate classification results. In addition, a deeper model with a fully connected layer is introduced to deal with more complex images for scene labeling and outperforms the other state-of-the-art methods including the deep Convolutional Neural Networks (CNN). Here, several input and output representation techniques are introduced to achieve the robust classification. Randomly sampled windows as input are transformed in scaling and rotation, which are integrated to get the final classification. To achieve multi-class image classification on scene images, several pruning techniques are introduced. This framework provides a good results in automatic web-image tagging. The next contribution is an investigation of 3D data with MD-LSTM. The traditional cuboid order of computations in Multi-Dimensional LSTM (MD-LSTM) is re-arranged in pyramidal fashion. The resulting Pyramidal Multi-Dimensional LSTM (PyraMiD-LSTM) is easy to parallelize, especially for 3D data such as stacks of brain slice images. PyraMiD-LSTM was tested on 3D biomedical volumetric images and achieved best known pixel-wise brain image segmentation results and competitive results on Electron Microscopy (EM) data for membrane segmentation.
To validate the framework, several challenging databases for classification and segmentation are proposed to overcome the limitations of current databases. First, scene images are randomly collected from the web and used for scene understanding, i.e., the web-scene image dataset for multi-class image classification. To achieve multi-class image classification, the training and testing images are generated in a different setting. For training, images belong to a single pre-defined category which are trained as a regular single-class image classification. However, for testing, images containing multi-classes are randomly collected by web-image search engine by querying the categories. All scene images include noise, background clutter, unrelated contents, and also diverse in quality and resolution. This setting can make the database possible to evaluate for real-world applications. Secondly, an automated blob-mosaics texture dataset generator is introduced for segmentation. Random 2D Gaussian blobs are generated and filled with random material textures. These textures contain diverse changes in illumination, scale, rotation, and viewpoint. The generated images are very challenging since they are even visually hard to separate the related regions.
Overall, the contributions in this thesis are major advancements in the direction of solving image analysis problems with Long Short-Term Memory (LSTM) without the need of any extra processing or manually designed steps. We aim at improving the presented framework to achieve the ultimate goal of accurate fine-grained image analysis and human-like understanding of images by machines.
Most of today’s wireless communication devices operate on unlicensed bands with uncoordinated spectrum access, with the consequence that RF interference and collisions are impairing the overall performance of wireless networks. In the classical design of network protocols, both packets in a collision are considered lost, such that channel access mechanisms attempt to avoid collisions proactively. However, with the current proliferation of wireless applications, e.g., WLANs, car-to-car networks, or the Internet of Things, this conservative approach is increasingly limiting the achievable network performance in practice. Instead of shunning interference, this thesis questions the notion of „harmful“ interference and argues that interference can, when generated in a controlled manner, be used to increase the performance and security of wireless systems. Using results from information theory and communications engineering, we identify the causes for reception or loss of packets and apply these insights to design system architectures that benefit from interference. Because the effect of signal propagation and channel fading, receiver design and implementation, and higher layer interactions on reception performance is complex and hard to reproduce by simulations, we design and implement an experimental platform for controlled interference generation to strengthen our theoretical findings with experimental results. Following this philosophy, we introduce and evaluate a system architecture that leverage interference.
First, we identify the conditions for successful reception of concurrent transmissions in wireless networks. We focus on the inherent ability of angular modulation receivers to reject interference when the power difference of the colliding signals is sufficiently large, the so-called capture effect. Because signal power fades over distance, the capture effect enables two or more sender–receiver pairs to transmit concurrently if they are positioned appropriately, in turn boosting network performance. Second, we show how to increase the security of wireless networks with a centralized network access control system (called WiFire) that selectively interferes with packets that violate a local security policy, thus effectively protecting legitimate devices from receiving such packets. WiFire’s working principle is as follows: a small number of specialized infrastructure devices, the guardians, are distributed alongside a network and continuously monitor all packet transmissions in the proximity, demodulating them iteratively. This enables the guardians to access the packet’s content before the packet fully arrives at the receiver. Using this knowledge the guardians classify the packet according to a programmable security policy. If a packet is deemed malicious, e.g., because its header fields indicate an unknown client, one or more guardians emit a limited burst of interference targeting the end of the packet, with the objective to introduce bit errors into it. Established communication standards use frame check sequences to ensure that packets are received correctly; WiFire leverages this built-in behavior to prevent a receiver from processing a harmful packet at all. This paradigm of „over-the-air“ protection without requiring any prior modification of client devices enables novel security services such as the protection of devices that cannot defend themselves because their performance limitations prohibit the use of complex cryptographic protocols, or of devices that cannot be altered after deployment.
This thesis makes several contributions. We introduce the first software-defined radio based experimental platform that is able to generate selective interference with the timing precision needed to evaluate the novel architectures developed in this thesis. It implements a real-time receiver for IEEE 802.15.4, giving it the ability to react to packets in a channel-aware way. Extending this system design and implementation, we introduce a security architecture that enables a remote protection of wireless clients, the wireless firewall. We augment our system with a rule checker (similar in design to Netfilter) to enable rule-based selective interference. We analyze the security properties of this architecture using physical layer modeling and validate our analysis with experiments in diverse environmental settings. Finally, we perform an analysis of concurrent transmissions. We introduce a new model that captures the physical properties correctly and show its validity with experiments, improving the state of the art in the design and analysis of cross-layer protocols for wireless networks.
In der aktuellen technologischen Entwicklung spielen verteilte eingebettete Echtzeitsysteme eine immer zentralere Rolle und werden zunehmend zum Träger von Innovationen. Durch den hiermit verbundenen steigenden Funktionsumfang der verteilten Echtzeitsysteme und deren zunehmenden Einsatz in sicherheitsrelevanten Anwendungsgebieten stellt die Entwicklung solcher Systeme eine immer größere Herausforderung dar. Hierbei handelt es sich einerseits um Herausforderungen bezogen auf die Kommunikation hinsichtlich Echtzeitfähigkeit und effizienter Bandbreitennutzung, andererseits werden geeignete Methoden benötigt, um den Entwicklungsprozess solcher komplexen Systeme durch Tests und Evaluationen zu unterstützen und zu begleiten. Die hier vorgestellte Arbeit adressiert diese beiden Aspekte und ist entsprechend in zwei Teile untergliedert.
Der erste Teil der Arbeit beschäftigt sich mit der Entwicklung neuer Kommunikationslösungen, um den gestiegenen Kommunikationsanforderungen begegnen zu können. So erfordert die Nutzung verteilter Echtzeitsysteme im Kontext sicherheitsrelevanter Aufgaben den Einsatz zeitgetriggerter Kommunikationssysteme, die in der Lage sind, deterministische Garantien bezüglich der Echtzeitfähigkeit zu gewähren. Diese klassischen auf exklusiven Reservierungen basierenden Ansätze sind jedoch gerade bei (seltenen) sporadischen Nachrichten sehr ineffizient in Bezug auf die Nutzung der Bandbreite.
Das in dieser Arbeit verwendete Mode-Based Scheduling with Fast Mode-Signaling (modusbasierte Kommunikation) ist ein Verfahren zur Verbesserung der Bandbreitennutzung zeitgetriggerter Kommunikation, bei gleichzeitiger Gewährleistung der Echtzeitfähigkeit. Um dies zu ermöglichen, erlaubt Mode-Based Scheduling einen kontrollierten, slotbasierten Wettbewerb, welcher durch eine schnelle Modussignalisierung (Fast Mode-Signaling) aufgelöst wird. Im Zuge dieser Arbeit werden verschiedene robuste, zuverlässige und vor allem deterministische Realisierungen von Mode-Based Scheduling with Fast Mode-Signaling auf Basis existierender drahtgebundener Kommunikationsprotokolle (TTCAN und FlexRay) vorgestellt sowie Konzepte präsentiert, welche eine einfache Integration in weitere Kommunikationstechnologien (wie drahtlose Ad-Hoc-Netze) ermöglichen.
Der zweite Teil der Arbeit konzentriert sich nicht nur auf Kommunikationsaspekte, sondern stellt einen Ansatz vor, den Entwicklungsprozess verteilter eingebetteter Echtzeitsysteme durch kontinuierliche Tests und Evaluationen in allen Entwicklungsphasen zu unterstützen und zu begleiten. Das im Kontext des Innovationszentrums für Applied Systems Modeling mitentwickelte und erweiterte FERAL (ein Framework für die Kopplung spezialisierter Simulatoren) bietet eine ideale Ausgangsbasis für das Virtual Prototyping komplexer verteilter eingebetteter Echtzeitsysteme und ermöglicht Tests und Evaluationen der Systeme in einer realistisch simulierten Umgebung. Die entwickelten Simulatoren für aktuelle Kommunikationstechnologien ermöglichen hierbei realistische Simulationen der Interaktionen innerhalb des verteilten Systems. Durch die Unterstützung von Simulationssystemen mit Komponenten auf unterschiedlichen Abstraktionsstufen kann FERAL in allen Entwicklungsphasen eingesetzt werden. Anhand einer Fallstudie wird gezeigt, wie FERAL verwendet werden kann, um ein Simulationssystem zusammen mit den zu realisierenden Komponenten schrittweise zu verfeinern. Auf diese Weise steht während jeder Entwicklungsphase ein ausführbares Simulationssystem für Tests zur Verfügung. Die entwickelten Konzepte und Simulatoren für FERAL ermöglichen es, Designalternativen zu evaluieren und die Wahl einer Kommunikationstechnologie durch die Ergebnisse von Simulationen zu stützen.
Typically software engineers implement their software according to the design of the software
structure. Relations between classes and interfaces such as method-call relations and inheritance
relations are essential parts of a software structure. Accordingly, analyzing several types of
relations will benefit the static analysis process of the software structure. The tasks of this
analysis include but not limited to: understanding of (legacy) software, checking guidelines,
improving product lines, finding structure, or re-engineering of existing software. Graphs with
multi-type edges are possible representation for these relations considering them as edges, while
nodes represent classes and interfaces of software. Then, this multiple type edges graph can
be mapped to visualizations. However, the visualizations should deal with the multiplicity of
relations types and scalability, and they should enable the software engineers to recognize visual
patterns at the same time.
To advance the usage of visualizations for analyzing the static structure of software systems,
I tracked difierent development phases of the interactive multi-matrix visualization (IMMV)
showing an extended user study at the end. Visual structures were determined and classified
systematically using IMMV compared to PNLV in the extended user study as four categories:
High degree, Within-package edges, Cross-package edges, No edges. In addition to these structures
that were found in these handy tools, other structures that look interesting for software
engineers such as cycles and hierarchical structures need additional visualizations to display
them and to investigate them. Therefore, an extended approach for graph layout was presented
that improves the quality of the decomposition and the drawing of directed graphs
according to their topology based on rigorous definitions. The extension involves describing
and analyzing the algorithms for decomposition and drawing in detail giving polynomial time
complexity and space complexity. Finally, I handled visualizing graphs with multi-type edges
using small-multiples, where each tile is dedicated to one edge-type utilizing the topological
graph layout to highlight non-trivial cycles, trees, and DAGs for showing and analyzing the
static structure of software. Finally, I applied this approach to four software systems to show
In this thesis we developed a desynchronization design flow in the goal of easing the de- velopment effort of distributed embedded systems. The starting point of this design flow is a network of synchronous components. By transforming this synchronous network into a dataflow process network (DPN), we ensures important properties that are difficult or theoretically impossible to analyze directly on DPNs are preserved by construction. In particular, both deadlock-freeness and buffer boundedness can be preserved after desyn- chronization. For the correctness of desynchronization, we developed a criteria consisting of two properties: a global property that demands the correctness of the synchronous network, as well as a local property that requires the latency-insensitivity of each local synchronous component. As the global property is also a correctness requirement of synchronous systems in general, we take this property as an assumption of our desyn- chronization. However, the local property is in general not satisfied by all synchronous components, and therefore needs to be verified before desynchronization. In this thesis we developed a novel technique for the verification of the local property that can be carried out very efficiently. Finally we developed a model transformation method that translates a set of synchronous guarded actions – an intermediate format for synchronous systems – to an asynchronous actor description language (CAL). Our theorem ensures that one passed the correctness verification, the generated DPN of asynchronous pro- cesses (or actors) preserves the functional behavior of the original synchronous network. Moreover, by the correctness of the synchronous network, our theorem guarantees that the derived DPN is deadlock-free and can be implemented with only finitely bounded buffers.
Ad-Hoc-Netze sind selbstorganisierende Netze ohne zentrale Infrastruktur, die heutzutage in vielen Bereichen Verwendung finden. Sie bestehen aus drahtlosen Knoten, die zur Erfüllung ihrer Aufgaben miteinander kommunizieren. Jedoch befinden sich nicht notwendigerweise alle Knoten in Reichweite zueinander. Damit entfernte Knoten einander erreichen können, werden Routingverfahren benötigt. Die Etablierung einer beliebigen Route ist jedoch oft nicht ausreichend, denn viele Anwendungen stellen spezielle Dienstgüteanforderungen (QoS-Anforderungen) an die Verbindung, beispielsweise die Gewährleistung einer Mindestbandbreite. Um diese QoS-Anforderungen erfüllen zu können, werden sie bereits bei der Ermittlung einer Route berücksichtigt, und die benötigten Ressourcen werden entlang der Route reserviert. Dazu dienen QoS-Routing- und Reservierungsprotokolle.
In dieser Arbeit wird zunächst der Aspekt der deterministischen Reservierung von Bandbreite in Form von konkreten Zeitslots einer TDMA-basierten MAC-Schicht betrachtet. Da sich die Übertragungen verschiedener Knoten in drahtlosen Netzen gegenseitig stören können, wurde ein Interferenzmodell entwickelt. Dieses identifiziert Bedingungen, unter denen Zeitslots innerhalb eines Netzes für mehr als eine Übertragung verwendet werden können. Zudem definiert es durch Aggregation der Informationen anderer Knoten Möglichkeiten zur Ermittlung der benötigten Informationen, um zu entscheiden, welche Zeitslots für eine störungsfreie Übertragung verwendet werden können.
Weiterhin werden existierende QoS-Routing- und Reservierungsprotokolle auf inhärente Probleme untersucht, wobei der Schwerpunkt auf Protokollen liegt, die deterministische Reservierungen von Zeitslots vornehmen. In diese Kategorie fällt auch das im Rahmen der Arbeit entwickelte Protokoll RBBQR, dessen Hauptziel darin besteht, die identifizierten Probleme zu eliminieren. Ferner wird das ebenfalls zu dieser Kategorie gehörende Protokoll QMRP beschrieben, welches zentralisiert Multicast-Routen inklusive der zugehörigen Reservierungen in teilstationären Netzen ermittelt.
Ein weiterer Bestandteil der Arbeit behandelt die Entwicklung von Simulationskomponenten, welche beispielsweise zur Evaluation von QoS-Routing- und Reservierungsprotokollen genutzt werden können. Das existierende Simulationsframework FERAL wurde um eine Komponente erweitert, die die Verwendung von Kommunikationstechnologien des Netzwerksimulators ns-3 ermöglicht. Weiterhin wurde ein Modul zur Simulation eines CC2420-Transceivers entwickelt, welches in eigenständigen ns-3-Simulationen und in Simulationen mit FERAL verwendet werden kann.
The main goal of this thesis is twofold. First, the thesis aims at bridging the gap between existing Pattern Recognition (PR) methods of automatic signature verification and the requirements for their application in forensic science. This gap, attributed by various factors ranging from system definition to evaluation, prevents automatic methods from being used by Forensic Handwriting Examiners (FHEs). Second, the thesis presents novel signature verification methods developed particularly considering the implications of forensic casework, and outperforming the state-of-the-art PR methods.
The first goal of the thesis is attributed by four important factors, i.e., data, terminology, output reporting, and how evaluation of automatic systems is carried out today. It is argued that traditionally the signature data used in PR are not actual/close representative of the real world data (especially that available in forensic cases). The systems trained on such data are, therefore, not suitable for forensic environments. This situation can be tackled by providing more realistic data to PR researchers. To this end, various signature and handwriting datasets are gathered in collaboration with FHEs and are made publicly available through the course of this thesis. A special attention is given to disguised signatures--where authentic authors purposefully make their signatures look like a forgery. This genre was at large neglected in PR research previously.
The terminology used, in the two communities - PR and FHEs, differ greatly. In fact, even in PR, there is no standard terminology and people often differ in the usage of various terms particularly related to various types of forged signatures/handwriting. The thesis presents a new terminology that is equally useful for both forensic scientists and PR researchers. The proposed terminology is hoped to increase the general acceptability of automatic signature analysis systems in forensic science.
The outputs reported by general signature verification systems are not acceptable for FHEs and courts as they are either binary (yes/no) or score (raw evidence) based on similarity/difference. The thesis describes that automatic systems should rather report the probability of observing the evidence (e.g., a certain similarity/difference score) given the signature belongs to the acclaimed identity, and the probability of observing the same evidence given the signature does not belong to the acclaimed identity. This will take automatic systems from hard decisions to soft decisions, thereby enabling them to report likelihood ratios that actually represent the evidential value of the score rather than the raw score (evidence).
When automatic systems report soft decisions (as in the form of likelihood ratios), the thesis argues that there must be some methods to evaluate such systems. This thesis presents one such adaptation. The thesis argues that the state-of-the-art evaluation methods, like equal error rate and area under curve, do not address the needs of forensic science. These needs require an assessment of the evidential value of signature verification, rather than a hard/pure classification (accept/reject binary decision). The thesis demonstrates and validates a relatively simple adaptation of the current verification methods based on the Bayesian inference dependent calibration of continuous scores rather than hard classifications (binary and/or score based classification).
The second goal of this thesis is to introduce various local features based techniques which are capable of performing signature verification in forensic cases and reporting results as anticipated by FHEs and courts. This is an important contribution of the thesis because of the following two reasons. First, to the best of author's knowledge, local feature descriptors are for the first time used for development of signature verification systems for forensic environments (particularly considering disguised signatures). Previously, such methods have been heavily used for recognition tasks, rather than verification of writing behaviors, such as character and digit recognition. Second, the proposed methods not only report the more traditional decisions (like scores-usually reported in PR) but also the Bayesian inference based likelihood ratios (suitable for courts and forensic cases).
Furthermore, the thesis also provides a detailed man vs. machine comparison for signature verification tasks. The men, in this comparison, are forensic scientists serving as forensic handwriting examiners and having experience of varying number of years. The machines are the local features based methods proposed in this thesis, along with various other state-of-the-art signature verification systems. The proposed methods clearly outperform the state-of-the-art systems, and sometimes the human experts.
Finally, the thesis details various tasks that have been performed in the areas closely related to signature verification and its application in forensic casework. These include, developing novel local feature based methods for extraction of signatures/handwritten text from document images, hyper-spectral image analysis for extraction of signatures from forensic documents, and analysis of on-line signatures acquired through specialized pens equipped with Accelerometer and Gyroscope. These tasks are important as they enable the thesis to take PR systems one step further close to direct application in forensic cases.
Attention-awareness is a key topic for the upcoming generation of computer-human interaction. A human moves his or her eyes to visually attends to a particular region in a scene. Consequently, he or she can process visual information rapidly and efficiently without being overwhelmed by vast amount of information from the environment. Such a physiological function called visual attention provides a computer system with valuable information of the user to infer his or her activity and the surrounding environment. For example, a computer can infer whether the user is reading text or not by analyzing his or her eye movements. Furthermore, it can infer with which object he or she is interacting by recognizing the object the user is looking at. Recent developments of mobile eye tracking technologies enable us
to capture human visual attention in ubiquitous everyday environments. There are various types of applications where attention-aware systems may be effectively incorporated. Typical examples are augmented reality (AR) applications such as Wikitude which overlay virtual information onto physical objects. This type of AR application presents augmentative information of recognized objects to the user. However, if it presents information of all recognized objects at once, the over
ow of information could be obtrusive to the user. As a solution for such a problem, attention-awareness can be integrated into a system. If a
system knows to which object the user is attending, it can present only the information of
relevant objects to the user.
Towards attention-aware systems in everyday environments, this thesis presents approaches
for analysis of user attention to visual content. Using a state-of-the-art wearable eye tracking device, one can measure the user's eye movements in a mobile scenario. By capturing the user's eye gaze position in a scene and analyzing the image where the eyes focus, a computer can recognize the visual content the user is currently attending to. I propose several image analysis methods to recognize the user-attended visual content in a scene image. For example, I present an application called Museum Guide 2.0. In Museum Guide 2.0, image-based object recognition and eye gaze analysis are combined together to recognize user-attended objects in a museum scenario. Similarly, optical character recognition
(OCR), face recognition, and document image retrieval are also combined with eye gaze analysis to identify the user-attended visual content in respective scenarios. In addition to Museum Guide 2.0, I present other applications in which these combined frameworks are effectively used. The proposed applications show that the user can benefit from active information presentation which augments the attended content in a virtual environment with
a see-through head-mounted display (HMD).
In addition to the individual attention-aware applications mentioned above, this thesis
Furthermore, I present novel interaction methodologies for a see-through HMD using eye gaze input. A see-through HMD is a suitable device for a wearable attention-aware system for everyday environments because the user can also view his or her physical environment
through the display. I propose methods for the user's attention engagement estimation with the display, eye gaze-driven proactive user assistance functions, and a method for interacting
with a multi-focal see-through display.
Contributions of this thesis include:
• An overview of the state-of-the-art in attention-aware computer-human interaction
and attention-integrated image analysis.
• Methods for the analysis of user-attended visual content in various scenarios.
• Demonstration of the feasibilities and the benefits of the proposed user-attended visual content analysis methods with practical user-supportive applications.
• Methods for interaction with a see-through HMD using eye gaze.
• A comprehensive framework for recognition of user-attended visual content in a complex
scene where multiple visual information resources are present.
This thesis opens a novel field of wearable computer systems where computers can understand the user attention in everyday environments and provide with what the user wants. I will show the potential of such wearable attention-aware systems for everyday
environments for the next generation of pervasive computer-human interaction.