## 90C06 Large-scale problems

### Refine

#### Document Type

- Diploma Thesis (2)
- Doctoral Thesis (2)

#### Keywords

- Aggregation (1)
- Artificial Intelligence (1)
- Association (1)
- Dynamic Network Flow Problem (1)
- Evacuation Planning (1)
- Evolutionary Algorithm (1)
- Large-Scale Problems (1)
- Linked Data (1)
- MLC (1)
- Machine Learning (1)

#### Faculty / Organisational entity

In recent years, enormous progress has been made in the field of Artificial Intelligence (AI). Especially the introduction of Deep Learning and end-to-end learning, the availability of large datasets and the necessary computational power in form of specialised hardware allowed researchers to build systems with previously unseen performance in areas such as computer vision, machine translation and machine gaming. In parallel, the Semantic Web and its Linked Data movement have published many interlinked RDF datasets, forming the world’s largest, decentralised and publicly available knowledge base.
Despite these scientific successes, all current systems are still narrow AI systems. Each of them is specialised to a specific task and cannot easily be adapted to all other human intelligence tasks, as would be necessary for Artificial General Intelligence (AGI). Furthermore, most of the currently developed systems are not able to learn by making use of freely available knowledge such as provided by the Semantic Web. Autonomous incorporation of new knowledge is however one of the pre-conditions for human-like problem solving.
This work provides a small step towards teaching machines such human-like reasoning on freely available knowledge from the Semantic Web. We investigate how human associations, one of the building blocks of our thinking, can be simulated with Linked Data. The two main results of these investigations are a ground truth dataset of semantic associations and a machine learning algorithm that is able to identify patterns for them in huge knowledge bases.
The ground truth dataset of semantic associations consists of DBpedia entities that are known to be strongly associated by humans. The dataset is published as RDF and can be used for future research.
The developed machine learning algorithm is an evolutionary algorithm that can learn SPARQL queries from a given SPARQL endpoint based on a given list of exemplary source-target entity pairs. The algorithm operates in an end-to-end learning fashion, extracting features in form of graph patterns without the need for human intervention. The learned patterns form a feature space adapted to the given list of examples and can be used to predict target candidates from the SPARQL endpoint for new source nodes. On our semantic association ground truth dataset, our evolutionary graph pattern learner reaches a Recall@10 of > 63 % and an MRR (& MAP) > 43 %, outperforming all baselines. With an achieved Recall@1 of > 34% it even reaches average human top response prediction performance. We also demonstrate how the graph pattern learner can be applied to other interesting areas without modification.

In this thesis we have discussed the problem of decomposing an integer matrix \(A\) into a weighted sum \(A=\sum_{k \in {\mathcal K}} \alpha_k Y^k\) of 0-1 matrices with the strict consecutive ones property. We have developed algorithms to find decompositions which minimize the decomposition time \(\sum_{k \in {\mathcal K}} \alpha_k\) and the decomposition cardinality \(|\{ k \in {\mathcal K}: \alpha_k > 0\}|\). In the absence of additional constraints on the 0-1 matrices \(Y^k\) we have given an algorithm that finds the minimal decomposition time in \({\mathcal O}(NM)\) time. For the case that the matrices \(Y^k\) are restricted to shape matrices -- a restriction which is important in the application of our results in radiotherapy -- we have given an \({\mathcal O}(NM^2)\) algorithm. This is achieved by solving an integer programming formulation of the problem by a very efficient combinatorial algorithm. In addition, we have shown that the problem of minimizing decomposition cardinality is strongly NP-hard, even for matrices with one row (and thus for the unconstrained as well as the shape matrix decomposition). Our greedy heuristics are based on the results for the decomposition time problem and produce better results than previously published algorithms.

Aggregation of Large-Scale Network Flow Problems with Application to Evacuation Planning at SAP
(2005)

Our initial situation is as follows: The blueprint of the ground floor of SAP’s main building the EVZ is given and the open question on how mathematic can support the evacuation’s planning process ? To model evacuation processes in advance as well as for existing buildings two models can be considered: macro- and microscopic models. Microscopic models emphasize the individual movement of evacuees. These models consider individual parameters such as walking speed, reaction time or physical abilities as well as the interaction of evacuees during the evacuation process. Because of the fact that the microscopic model requires lots of data, simulations are taken for implementation. Most of the current approaches concerning simulation are based on cellular automats. In contrast to microscopic models, macroscopic models do not consider individual parameters such as the physical abilities of the evacuees. This means that the evacuees are treated as a homogenous group for which only common characteristics are considered; an average human being is assumed. We do not have that much data as in the case of the microscopic models. Therefore, the macroscopic models are mainly based on optimization approaches. In most cases, a building or any other evacuation object is represented through a static network. A time horizon T is added, in order to be able to describe the evolution of the evacuation process over time. Connecting these two components we finally get a dynamic network. Based on this network, dynamic network flow problems are formulated, which can map evacuation processes. We focused on the macroscopic model in our thesis. Our main focus concerning the transfer from the real world problem (e.g. supporting the evacuation planning) will be the modeling of the blueprint as a dynamic network. After modeling the blueprint as a dynamic network, it will be no problem to give a formulation of a dynamic network flow problem, the so-called evacuation problem, which seeks for an optimal evacuation time. However, we have to solve a static large-scale network flow problem to derive a solution for this formulation. In order to reduce the network size, we will examine the possibility of applying aggregation to the evacuation problem. Aggregation (lat. aggregare = piling, affiliate; lat. aggregatio = accumulation, union; the act of gathering something together) was basically used to reduce the size of general large-scale linear or integer programs. The results gained for the general problem definitions were then applied to the transportation problem and the minimum cost network flow problem. We review this theory in detail and look on how results derived there can be used for the evacuation problem, too.

This essay discusses the multileaf collimator leaf sequencing problem, which occurs in every treatment planning in radiation therapy. The problem is to find a good realization in terms of a leaf sequence in the multileaf collimator such that the time needed to deliver the given dose profile is minimized. A mathematical model using an integer programming formulation has been developed. Additionally, a heuristic, based on existing algorithms and an integer programming formulation, has been developed to enhance the quality of the solutions. Comparing the results to those provided by other algorithms, a significant improvement can be observed.