I.2.6 Learning (K.3.2)
In our daily lives, we are confronted with vast amounts of data, the processing of which can dramatically influence our lives, both positively and negatively. The sheer volume of data (images, texts, tables, and time series), its variety, and its possible applications are not always obvious. Due to advancements in the Internet of Things (IoT), billions of sensors now produce time series, which can be found everywhere, whether in medicine, the financial sector, or agriculture. This enormous amount of time series data contains many hidden features that are useful both for industry and for everyday life; e.g., improving cancer prediction can save human lives. Recently, several deep learning methods have been proposed for analyzing time series data. However, due to their black-box nature, their applicability is limited in critical sectors such as medicine, finance, and communication. In addition, the Artificial Intelligence (AI) Act and the General Data Protection Regulation (GDPR) now make it mandatory to protect sensitive data and to provide explanations in safety-critical domains. To enable the use of deep neural networks (DNNs) in a broader range of domains, this thesis presents TimeFrame, a framework for privacy-preserving and interpretable time series analysis. TimeFrame consists of four main components: post-hoc interpretability, intrinsic interpretability, direct privacy, and indirect privacy. Interpretability is indispensable to avoid harming people or infrastructure. In past years, development has mostly focused on image data, which has prevented the full potential of DNNs in time series processing from being exploited. To overcome this limitation, TimeFrame introduces five novel post-hoc interpretability components (Time to Focus, TSViz, TimeREISE, TSInsight, Data Lens) and two novel intrinsic interpretability components (PatchX, P2ExNet).
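The post-hoc components named above are specific to the thesis and are not reproduced here. To illustrate what a post-hoc attribution method for time series produces in general, the following minimal sketch uses plain occlusion, a standard perturbation technique swapped in for illustration: each window of time steps is overwritten with a baseline value, and the resulting drop in the model output is credited to those steps. The function name `occlusion_attribution`, its `window` and `baseline` parameters, and the toy linear model are illustrative assumptions.

```python
import numpy as np
from typing import Callable


def occlusion_attribution(
    model: Callable[[np.ndarray], float],
    series: np.ndarray,
    window: int = 1,
    baseline: float = 0.0,
) -> np.ndarray:
    """Assign each time step the average output drop caused by occluding
    (overwriting with `baseline`) the windows that cover it."""
    series = np.asarray(series, dtype=float)
    reference = model(series)
    scores = np.zeros(len(series))
    counts = np.zeros(len(series))
    # Slide the occlusion window over every valid start position.
    for start in range(len(series) - window + 1):
        occluded = series.copy()
        occluded[start:start + window] = baseline
        drop = reference - model(occluded)
        # Credit the drop to every step covered by this window.
        scores[start:start + window] += drop
        counts[start:start + window] += 1
    # Average over the number of windows that covered each step.
    return scores / np.maximum(counts, 1)


# Toy linear "model" (assumption): output = weights . series
weights = np.array([0.0, 2.0, -1.0, 0.5])
x = np.array([1.0, 3.0, 2.0, 4.0])
print(occlusion_attribution(lambda s: float(weights @ s), x).tolist())
# [0.0, 6.0, -2.0, 2.0]
```

For a linear model with `window=1`, each score equals weight times value, which makes the output easy to check by hand. With a real DNN, `model` would wrap the network's score for the class of interest; larger windows trade temporal resolution for robustness to single-step noise.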
TimeFrame addresses multiple perspectives, such as attribution, compression, visualization, influence, prototyping, and hierarchical splitting. Compared to existing methods, these components provide better explanations, robustness, and scalability. Another crucial factor when applying deep learning to sensitive data is privacy. In this context, TimeFrame introduces two components for direct privacy (PPML, PPML x XAI) and one component for indirect privacy (From Private to Public). These components benchmark privacy-preserving approaches, their effect on interpretability, and synthetic data generation as a way to overcome privacy concerns. TimeFrame thus offers a large set of interpretability and privacy components that can be combined and that cover numerous aspects. Furthermore, the novel approaches have been shown to consistently outperform 20 existing state-of-the-art methods across up to 20 different datasets. To ensure a fair comparison, various metrics were used, including performance change, Sensitivity, Infidelity, Continuity, runtime, model dependency, and compression rate. This broad set of metrics makes it possible to provide guidelines for a more appropriate use of both existing state-of-the-art approaches and the novel components included in TimeFrame.
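Infidelity, one of the metrics listed above, measures how well an attribution predicts the change in model output when the input is perturbed. The following minimal sketch follows the common formulation `E_I[(I·Φ(x) - (f(x) - f(x - I)))^2]`, estimated by Monte Carlo sampling. The Gaussian perturbation distribution, the sample count, and the function name `infidelity` are assumptions and not necessarily the configuration used in the thesis.

```python
import numpy as np
from typing import Callable


def infidelity(
    model: Callable[[np.ndarray], float],
    x: np.ndarray,
    attribution: np.ndarray,
    n_samples: int = 1000,
    noise_std: float = 0.1,
    seed: int = 0,
) -> float:
    """Monte Carlo estimate of E_I[(I . attribution - (f(x) - f(x - I)))^2]
    with Gaussian perturbations I ~ N(0, noise_std^2) per time step."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    fx = model(x)
    errors = np.empty(n_samples)
    for k in range(n_samples):
        perturbation = rng.normal(0.0, noise_std, size=x.shape)
        # Change in output predicted by the attribution ...
        predicted = float(np.sum(perturbation * attribution))
        # ... versus the change the model actually shows.
        actual = fx - model(x - perturbation)
        errors[k] = (predicted - actual) ** 2
    return float(errors.mean())


# Toy linear "model" (assumption): output = weights . series
weights = np.array([0.0, 2.0, -1.0, 0.5])
x = np.array([1.0, 3.0, 2.0, 4.0])
f = lambda s: float(weights @ s)
print(infidelity(f, x, attribution=weights))      # close to 0: faithful
print(infidelity(f, x, attribution=np.zeros(4)))  # clearly larger
```

Lower values are better. For the linear toy model, the weights themselves are a perfectly faithful attribution, while an all-zero attribution ignores every real output change and is penalized accordingly.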
In recent years, enormous progress has been made in the field of Artificial Intelligence (AI). In particular, the introduction of Deep Learning and end-to-end learning, the availability of large datasets, and the necessary computational power in the form of specialised hardware have allowed researchers to build systems with previously unseen performance in areas such as computer vision, machine translation, and game playing. In parallel, the Semantic Web and its Linked Data movement have published many interlinked RDF datasets, forming the world’s largest decentralised and publicly available knowledge base.
Despite these scientific successes, all current systems are still narrow AI systems. Each of them is specialised for a specific task and cannot easily be adapted to other human intelligence tasks, as would be necessary for Artificial General Intelligence (AGI). Furthermore, most currently developed systems are not able to learn from freely available knowledge such as that provided by the Semantic Web. However, the autonomous incorporation of new knowledge is one of the preconditions for human-like problem solving.
This work takes a small step towards teaching machines such human-like reasoning on freely available knowledge from the Semantic Web. We investigate how human associations, one of the building blocks of our thinking, can be simulated with Linked Data. The two main results of these investigations are a ground truth dataset of semantic associations and a machine learning algorithm that is able to identify patterns for them in huge knowledge bases.
The ground truth dataset of semantic associations consists of DBpedia entities that are known to be strongly associated by humans. The dataset is published as RDF and can be used for future research.
The developed machine learning algorithm is an evolutionary algorithm that can learn SPARQL queries from a given SPARQL endpoint based on a given list of exemplary source-target entity pairs. The algorithm operates in an end-to-end learning fashion, extracting features in the form of graph patterns without the need for human intervention. The learned patterns form a feature space adapted to the given list of examples and can be used to predict target candidates from the SPARQL endpoint for new source nodes. On our semantic association ground truth dataset, our evolutionary graph pattern learner reaches a Recall@10 of over 63% and an MRR (and MAP) of over 43%, outperforming all baselines. With a Recall@1 of over 34%, it even reaches the average human performance in predicting the top response. We also demonstrate how the graph pattern learner can be applied to other interesting areas without modification.
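The ranking metrics quoted above can be made concrete with a small sketch. It assumes that each source entity has exactly one ground-truth target and that the learner returns a ranked list of target candidates; under that assumption, the average precision of a query equals the reciprocal rank of its target, which is consistent with MRR and MAP being reported together. The function names and the toy entity lists are hypothetical and not taken from the dataset.

```python
from typing import Sequence


def recall_at_k(ranked: Sequence[str], target: str, k: int) -> float:
    """1.0 if the ground-truth target is among the top-k candidates, else 0.0."""
    return 1.0 if target in ranked[:k] else 0.0


def reciprocal_rank(ranked: Sequence[str], target: str) -> float:
    """1 / (1-based rank of the target), or 0.0 if it was not predicted."""
    for rank, candidate in enumerate(ranked, start=1):
        if candidate == target:
            return 1.0 / rank
    return 0.0


def evaluate(examples: Sequence[tuple[Sequence[str], str]], k: int = 10) -> dict[str, float]:
    """Average the per-source metrics over all (ranked candidates, target) examples."""
    n = len(examples)
    return {
        f"recall@{k}": sum(recall_at_k(r, t, k) for r, t in examples) / n,
        "recall@1": sum(recall_at_k(r, t, 1) for r, t in examples) / n,
        "mrr": sum(reciprocal_rank(r, t) for r, t in examples) / n,
    }


# Hypothetical toy predictions: (ranked candidates for one source, true target)
examples = [
    (["milk", "cheese", "cow"], "milk"),  # target at rank 1
    (["moon", "sun", "star"], "sun"),     # target at rank 2
    (["car", "road"], "wheel"),           # target not predicted
]
print(evaluate(examples, k=10))
# recall@10 = 2/3, recall@1 = 1/3, mrr = (1 + 1/2 + 0) / 3 = 0.5
```

Recall@1 only rewards a correct first guess, which is why it is the natural point of comparison with a human's single top response, whereas Recall@10 and MRR also give credit for targets ranked further down.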