Doctoral Thesis
Industrial robots are vital in automation technology, but their limitations become evident in applications requiring high path accuracy. This research focuses on improving the dynamic path accuracy of industrial robots by integrating additional sensor technology and employing intelligent feed-forward control. Specifically, the inclusion of secondary encoder sensors enables explicit measurement and compensation of robot gear deformations. Three types of model-based feed-forward controllers, namely physics-based, data-based, and hybrid, are developed to effectively counteract dynamic effects.
Firstly, a physics-based feed-forward control method is proposed, explicitly modeling joint deformations, hydraulic weight compensation, and other relevant features. Nonlinear friction parameters are accurately identified using a globally optimized design of experiments. The resulting physics-based model is fully continuously differentiable, facilitating its transformation into a code-optimized flatness-based feed-forward control.
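As a rough illustration of why a continuously differentiable model is convenient for feed-forward control, the sketch below computes a feed-forward torque for a single joint directly from a desired trajectory. It is not the model from the thesis: the joint is assumed rigid (no gear deformation, no hydraulic weight compensation), friction is approximated by a smooth tanh term, and every parameter name and value is hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class JointParams:
    """Hypothetical parameters of a single rigid joint (illustrative values)."""
    inertia: float = 2.0          # kg*m^2
    viscous: float = 0.5          # N*m*s/rad
    coulomb: float = 1.2          # N*m
    smoothing: float = 50.0       # steepness of the tanh friction approximation
    gravity_torque: float = 15.0  # m*g*l in N*m


def feedforward_torque(q: float, dq: float, ddq: float,
                       params: Optional[JointParams] = None) -> float:
    """Inverse dynamics of the assumed joint model.

    q, dq, ddq are the desired position, velocity, and acceleration.
    tanh replaces the discontinuous sign() in Coulomb friction, so the
    torque is a smooth function of the desired trajectory.
    """
    p = params if params is not None else JointParams()
    friction = p.viscous * dq + p.coulomb * math.tanh(p.smoothing * dq)
    gravity = p.gravity_torque * math.sin(q)
    return p.inertia * ddq + friction + gravity


# Usage: torque along a desired sinusoidal trajectory q_d(t) = 0.5 * sin(t).
torques = [
    feedforward_torque(0.5 * math.sin(t), 0.5 * math.cos(t), -0.5 * math.sin(t))
    for t in (0.0, 0.1, 0.2)
]
```

Because the torque depends only on the desired position and its derivatives, no online integration of the model is needed in this simplified setting. A model with joint elasticity would require higher derivatives of the trajectory.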
Secondly, a data-based feed-forward control approach is introduced, leveraging a continuous-time neural network. The continuous-time approach demonstrates enhanced model generalization capabilities even with limited data. Furthermore, a time domain normalization method is introduced, significantly improving numerical properties by concurrently normalizing measurement timelines, robot states, and state derivatives. Building on previous work, a method ensuring input-to-state and global asymptotic stability is presented, employing a Lyapunov function. Model stability is enforced during training using constrained optimization techniques. Moreover, the data-based methods are evaluated on public benchmarks, extending their applicability beyond the field of robotics.
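The abstract does not spell out the normalization itself. One plausible reading, sketched below under that assumption, is that time and state are rescaled affinely and the derivative is rescaled by the chain rule so that all three stay consistent: with t' = (t - t0) / T and x' = (x - m) / s, it follows that dx'/dt' = (T / s) dx/dt. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TimeDomainScaler:
    """Affine scaling of time and state (assumed form, not from the thesis)."""
    t0: float       # time offset
    t_scale: float  # time scale T
    x_mean: float   # state offset m
    x_scale: float  # state scale s

    def normalize(self, t, x, dx):
        t_n = (t - self.t0) / self.t_scale
        x_n = (x - self.x_mean) / self.x_scale
        # Chain rule: dx'/dt' = (dx'/dx) * (dx/dt) * (dt/dt') = (T / s) * dx/dt
        dx_n = dx * self.t_scale / self.x_scale
        return t_n, x_n, dx_n

    def denormalize(self, t_n, x_n, dx_n):
        t = t_n * self.t_scale + self.t0
        x = x_n * self.x_scale + self.x_mean
        dx = dx_n * self.x_scale / self.t_scale
        return t, x, dx


def fit_scaler(times, states):
    """Pick scales so that time lies in [0, 1] and states lie in [-1, 1]."""
    t_scale = (times[-1] - times[0]) or 1.0
    x_mean = sum(states) / len(states)
    x_scale = max(abs(x - x_mean) for x in states) or 1.0
    return TimeDomainScaler(times[0], t_scale, x_mean, x_scale)


# Usage on a toy trajectory x(t) = 3 t^2 with derivative 6 t.
times = [0.0, 0.5, 1.0, 1.5, 2.0]
states = [3.0 * t * t for t in times]
derivs = [6.0 * t for t in times]
scaler = fit_scaler(times, states)
normalized = [scaler.normalize(t, x, dx) for t, x, dx in zip(times, states, derivs)]
```

Scaling the derivative together with time and state keeps the normalized data a valid trajectory, so a continuous-time model trained on it still sees derivatives that match its inputs.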
Both the physics-based and data-based models are combined into a hybrid model. Comparative analysis of the three models reveals that the continuous-time neural network yields the highest model accuracy, while the physics-based model delivers the best safety properties. The effectiveness of all three models is experimentally validated using an industrial robot.
In recent years, the Internet has become a major source of visual information exchange. Popular social platforms have reported an average of 80 million photo uploads a day. These images are often accompanied by a user-provided one-line text, called an image caption. Deep Learning techniques have made significant advances towards the automatic generation of factual image captions. However, captions generated by humans are much more than mere factual image descriptions. This work takes a step towards enhancing a machine's ability to generate image captions with human-like properties. We name this field Affective Image Captioning, to differentiate it from other areas of research focused on generating factual descriptions.
To deepen our understanding of human-generated captions, we first perform a large-scale crowdsourcing study on a subset of the Yahoo Flickr Creative Commons 100 Million Dataset (YFCC100M). Three thousand random image-caption pairs were evaluated by native English speakers with respect to dimensions such as focus, intent, emotion, meaning, and visibility. Our findings indicate three important underlying properties of human captions: subjectivity, sentiment, and variability. Based on these results, we develop Deep Learning models to address each of these properties.
To address the subjectivity dimension, we propose the Focus-Aspect-Value (FAV) model (along with a new task of aspect-detection) to structure the process of capturing subjectivity. We also introduce a novel dataset, aspects-DB, following this way of modeling. To implement the model, we propose a novel architecture called Tensor Fusion. Our experiments show that Tensor Fusion outperforms the state-of-the-art cross residual networks (XResNet) in aspect-detection.
To address the sentiment dimension, we propose two models: Concept & Syntax Transition Network (CAST) and Show & Tell with Emotions (STEM). The CAST model uses a graphical structure to generate sentiment. The STEM model uses a neural network to inject adjectives into a neutral caption. Achieving a score of 93% in human evaluation, these models placed in the top 3 at the ACMMM Grand Challenge 2016.
To address the last dimension, variability, we adopt a generative approach, Generative Adversarial Networks (GANs), combined with multimodal fusion. Our modified GAN, with two discriminators, is trained using Reinforcement Learning. We also show that it is possible to control the properties of the generated caption variations with an external signal. Using sentiment as the external signal, we show that we can easily outperform state-of-the-art sentiment caption models.
In this work we present and estimate an explanatory model with a predefined system of explanatory equations, a so-called lag-dependent model. We present a locally optimal lag estimator based on blocked neural networks, together with theorems about its consistency. We define change points in the context of the lag-dependent model and present a powerful algorithm for change point detection in high-dimensional, highly dynamic systems. We present a special kind of bootstrap for approximating the distribution of statistics of interest in dependent processes.
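The abstract does not say which bootstrap variant is meant. As a point of reference only, the sketch below shows the standard moving block bootstrap, a common way to approximate the distribution of a statistic when observations are dependent: contiguous blocks are resampled instead of single points, so short-range dependence inside each block is preserved. The function name, its parameters, and the choice of the mean as the default statistic are assumptions made for illustration.

```python
import random
import statistics


def moving_block_bootstrap(series, block_len, n_resamples,
                           statistic=statistics.fmean, seed=0):
    """Return n_resamples bootstrap replicates of `statistic`.

    Each resample concatenates randomly chosen overlapping blocks of
    length block_len and is truncated to the original series length.
    """
    n = len(series)
    if not 1 <= block_len <= n:
        raise ValueError("block_len must be between 1 and len(series)")
    rng = random.Random(seed)
    n_blocks = -(-n // block_len)  # ceiling division
    max_start = n - block_len
    replicates = []
    for _ in range(n_resamples):
        resample = []
        for _ in range(n_blocks):
            start = rng.randint(0, max_start)  # inclusive on both ends
            resample.extend(series[start:start + block_len])
        replicates.append(statistic(resample[:n]))
    return replicates


# Usage: bootstrap distribution of the mean of a toy autocorrelated series.
noise = random.Random(42)
values = [0.0]
for _ in range(199):
    values.append(0.8 * values[-1] + noise.gauss(0.0, 1.0))
means = moving_block_bootstrap(values, block_len=10, n_resamples=500)
spread = statistics.stdev(means)  # rough standard error of the mean
```

The block length trades off bias against variance: longer blocks capture more of the dependence but leave fewer distinct blocks to draw from.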