The goal of this work is to develop statistical natural language models and processing techniques
based on Recurrent Neural Networks (RNN), especially the recently introduced Long Short-
Term Memory (LSTM). Due to their adapting and predicting abilities, these methods are more
robust, and easier to train than traditional methods, i.e., words list and rule-based models. They
improve the output of recognition systems and make them more accessible to users for browsing
and reading. These techniques are required, especially for historical books which might take
years of effort and huge costs to manually transcribe them.
The contributions of this thesis are several new methods which have high-performance computing and accuracy. First, an error model for improving recognition results is designed. As
a second contribution, a hyphenation model for difficult transcription for alignment purposes
is suggested. Third, a dehyphenation model is used to classify the hyphens in noisy transcription. The fourth contribution is using LSTM networks for normalizing historical orthography.
A size normalization alignment is implemented to equal the size of strings, before the training
phase. Using the LSTM networks as a language model to improve the recognition results is
the fifth contribution. Finally, the sixth contribution is a combination of Weighted Finite-State
Transducers (WFSTs), and LSTM applied on multiple recognition systems. These contributions
will be elaborated in more detail.
Context-dependent confusion rules is a new technique to build an error model for Optical
Character Recognition (OCR) corrections. The rules are extracted from the OCR confusions
which appear in the recognition outputs and are translated into edit operations, e.g., insertions,
deletions, and substitutions using the Levenshtein edit distance algorithm. The edit operations
are extracted in a form of rules with respect to the context of the incorrect string to build an
error model using WFSTs. The context-dependent rules assist the language model to find the
best candidate corrections. They avoid the calculations that occur in searching the language
model and they also make the language model able to correct incorrect words by using context-
dependent confusion rules. The context-dependent error model is applied on the university of
Washington (UWIII) dataset and the Nastaleeq script in Urdu dataset. It improves the OCR
results from an error rate of 1.14% to an error rate of 0.68%. It performs better than the
state-of-the-art single rule-based which returns an error rate of 1.0%.
This thesis describes a new, simple, fast, and accurate system for generating correspondences
between real scanned historical books and their transcriptions. The alignment has many challenges, first, the transcriptions might have different modifications, and layout variations than the
original book. Second, the recognition of the historical books have misrecognition, and segmentation errors, which make the alignment more difficult especially the line breaks, and pages will
not have the same correspondences. Adapted WFSTs are designed to represent the transcription. The WFSTs process Fraktur ligatures and adapt the transcription with a hyphenations
model that allows the alignment with respect to the varieties of the hyphenated words in the line
breaks of the OCR documents. In this work, several approaches are implemented to be used for
the alignment such as: text-segments, page-wise, and book-wise approaches. The approaches
are evaluated on German calligraphic (Fraktur) script historical documents dataset from “Wan-
derungen durch die Mark Brandenburg” volumes (1862-1889). The text-segmentation approach
returns an error rate of 2.33% without using a hyphenation model and an error rate of 2.0%
using a hyphenation model. Dehyphenation methods are presented to remove the hyphen from
the transcription. They provide the transcription in a readable and reflowable format to be used
for alignment purposes. We consider the task as classification problem and classify the hyphens
from the given patterns as hyphens for line breaks, combined words, or noise. The methods are
applied on clean and noisy transcription for different languages. The Decision Trees classifier
returns better performance on UWIII dataset and returns an accuracy of 98%. It returns 97%
on Fraktur script.
A new method for normalizing historical OCRed text using LSTM is implemented for different texts, ranging from Early New High German 14th - 16th centuries to modern forms in New
High German applied on the Luther bible. It performed better than the rule-based word-list
approaches. It provides a transcription for various purposes such as part-of-speech tagging and
n-grams. Also two new techniques are presented for aligning the OCR results and normalize the
size by using adding Character-Epsilons or Appending-Epsilons. They allow deletion and insertion in the appropriate position in the string. In normalizing historical wordforms to modern
wordforms, the accuracy of LSTM on seen data is around 94%, while the state-of-the-art combined rule-based method returns 93%. On unseen data, LSTM returns 88% and the combined
rule-based method returns 76%. In normalizing modern wordforms to historical wordforms, the
LSTM delivers the best performance and returns 93.4% on seen data and 89.17% on unknown
In this thesis, a deep investigation has been done on constructing high-performance language
modeling for improving the recognition systems. A new method to construct a language model
using LSTM is designed to correct OCR results. The method is applied on UWIII and Urdu
script. The LSTM approach outperforms the state-of-the-art, especially for unseen tokens
during training. On the UWIII dataset, the LSTM returns reduction in OCR error rates from
1.14% to 0.48%. On the Nastaleeq script in Urdu dataset, the LSTM reduces the error rate
from 6.9% to 1.58%.
Finally, the integration of multiple recognition outputs can give higher performance than a
single recognition system. Therefore, a new method for combining the results of OCR systems is
explored using WFSTs and LSTM. It uses multiple OCR outputs and votes for the best output
to improve the OCR results. It performs better than the ISRI tool, Pairwise of Multiple Sequence and it helps to improve the OCR results. The purpose is to provide correct transcription
so that it can be used for digitizing books, linguistics purposes, N-grams, and part-of-speech
tagging. The method consists of two alignment steps. First, two recognition systems are aligned
using WFSTs. The transducers are designed to be more flexible and compatible with the different symbols in line and page breaks to avoid the segmentation and misrecognition errors.
The LSTM model then is used to vote the best candidate correction of the two systems and
improve the incorrect tokens which are produced during the first alignment. The approaches
are evaluated on OCRs output from the English UWIII and historical German Fraktur dataset
which are obtained from state-of-the-art OCR systems. The Experiments show that the error
rate of ISRI-Voting is 1.45%, the error rate of the Pairwise of Multiple Sequence is 1.32%, the
error rate of the Line-to-Page alignment is 1.26% and the error rate of the LSTM approach has
the best performance with 0.40%.
The purpose of this thesis is to contribute methods providing correct transcriptions corresponding to the original book. This is considered to be the first step towards an accurate and
more effective use of the documents in digital libraries.
Unter Ambient Intelligence (AmI) wird die Integration verschiedener Technologien zu einer den Menschen umgebenden, (nahezu) unsichtbaren Gesamtheit verstanden. Diese Intelligente Umgebung wird möglich durch die Miniaturisierung hochintegrierter Bauteile (Sensoren, Aktuatoren und Rechnern), deren zunehmende Intelligenz und vor allem deren lokale und globale zunehmend drahtlose Vernetzung. Unter dem Titel Man-u-Faktur 2012 (man and factoring in 2012) wurde an der Technischen Universität Kaiserslautern im Rahmen des Forschungsschwerpunkts Ambient Intelligence ein Szenario entwickelt, das ein beeindruckendes Gesamtbild einer Technik, die den Menschen in den Mittelpunkt rückt, beschreibt. Man-u-Faktur 2012 steht dabei für ein Weiterdrehen des Rads der Industrialisierung von der heute üblichen variantenreichen, technologiezentrierten Massenfertigung hin zu einer kundenindividuellen, mitarbeiterzentrierten Maßfertigung. Im Speziellen wird hierunter der Aufbau massiv verteiler kunden- aber auch mitarbeiterfreundlicher Produktionsanlagen verstanden, die sich im hochdynamischen Umfeld entsprechend der jeweiligen Gegebenheiten anzupassen wissen. Der Mensch ist überall dort präsent, wo flexibles Arbeiten oder flexible Entscheidungen im Vordergrund stehen. In diesem Bericht wird der Einfluss von Ambient Intelligence beispielhaft auf die Vision einer Fahrradproduktion in der Man-u-Faktur 2012 angewandt. Aus diesem Szenario werden anschließend sowohl die zu entwickelnden Schlüsseltechnologien als auch die Einflüsse auf Wirtschaft und Gesellschaft abgeleitet.
This dissertation focuses on the visualization of urban microclimate data sets,
which describe the atmospheric impact of individual urban features. The application
and adaptation of visualization and analysis concepts to enhance the
insight into observational data sets used this specialized area are explored, motivated
through application problems encountered during active involvement
in urban microclimate research at the Arizona State University in Tempe, Arizona.
Besides two smaller projects dealing with the analysis of thermographs
recorded with a hand-held device and visualization techniques used for building
performance simulation results, the main focus of the work described in
this document is the development of a prototypic tool for the visualization
and analysis of mobile transect measurements. This observation technique involves
a sensor platform mounted to a vehicle, which is then used to traverse
a heterogeneous neighborhood to investigate the relationships between urban
form and microclimate. The resulting data sets are among the most complex
modes of in-situ observations due to their spatio-temporal dependence, their
multivariate nature, but also due to the various error sources associated with
moving platform observations.
The prototype enables urban climate researchers to preprocess their data,
to explore a single transect in detail, and to aggregate observations from multiple
traverses conducted over diverse routes for a visual delineation of climatic
microenvironments. Extending traditional analysis methods, the suggested visualization
tool provides techniques to relate the measured attributes to each
other and to the surrounding land cover structure. In addition to that, an
improved method for sensor lag correction is described, which shows the potential
to increase the spatial resolution of measurements conducted with slow
air temperature sensors.
In summary, the interdisciplinary approach followed in this thesis triggers
contributions to geospatial visualization and visual analytics, as well as to urban
climatology. The solutions developed in the course of this dissertation are
meant to support domain experts in their research tasks, providing means to
gain a qualitative overview over their specific data sets and to detect patterns,
which can then be further analyzed using domain-specific tools and methods.
Dieses Szenario ist eine Erweiterung eines Teilszenarios von Human Centered Manufacturing. Dabei geht es um die Montage der Energieelektrik für industrielle Anlagen. Im Jahr 2015 enthält die Ausrüstung eines Elektromonteurs bei der Verdrahtung von Schaltschränken u.a. einen Schutzhelm mit integrierter Farbkamera, integriertem Mikrofon und einem Lautsprecher im Ohrbereich sowie einen automatisch gesteuerten Laserpointer. Auf der Baustelle sind keine Pläne mehr erforderlich. Der Monteur benötigt keinen Plan während der Montage.