We tackle the problem of obtaining statistics on content and structure of XML documents by using summaries which may provide cardinality estimations for XML query expressions. Our focus is a data-centric processing scenario in which we use a query engine to process such query expressions. We provide three new summary structures called LESS (Leaf-Element-in-Subtree), LWES (Level-Wide Element Summarization), and EXsum (Element-centered XML Summarization) which are targeted to base an estimation process in an XML query optimizer. Each of these collects structural statistical information of XML documents, and the latter (EXsum) gathers, in addition, statistics on document content. Estimation procedures and/or heuristics for specic types of query expressions of each proposed approach are developed. We have incorporated and implemented our proposals in XTC, a native XML database management system (XDBMS). With this common implementation base, we present an empirical and comparative study in which our proposals are stressed against others published in the literature, which are also incorporated into the XTC. Furthermore, an analysis is made based on criteria pertinent to a query optimizer process.
A prime motivation for using XML to directly represent pieces of information is the ability of supporting ad-hoc or 'schema-later' settings. In such scenarios, modeling data under loose data constraints is essential. Of course, the flexibility of XML comes at a price: the absence of a rigid, regular, and homogeneous structure makes many aspects of data management more challenging. Such malleable data formats can also lead to severe information quality problems, because the risk of storing inconsistent and incorrect data is greatly increased. A prominent example of such problems is the appearance of the so-called fuzzy duplicates, i.e., multiple and non-identical representations of a real-world entity. Similarity joins correlating XML document fragments that are similar can be used as core operators to support the identification of fuzzy duplicates. However, similarity assessment is especially difficult on XML datasets because structure, besides textual information, may exhibit variations in document fragments representing the same real-world entity. Moreover, similarity computation is substantially more expensive for tree-structured objects and, thus, is a serious performance concern. This thesis describes the design and implementation of an effective, flexible, and high-performance XML-based similarity join framework. As main contributions, we present novel structure-conscious similarity functions for XML trees - either considering XML structure in isolation or combined with textual information -, mechanisms to support the selection of relevant information from XML trees and organization of this information into a suitable format for similarity calculation, and efficient algorithms for large-scale identification of similar, set-represented objects. Finally, we validate the applicability of our techniques by integrating our framework into a native XML database management system; in this context we address several issues around the integration of similarity operations into traditional database architectures.
In robotics, information is often regarded as a means to an end. The question of how to structure information and how to bridge the semantic gap between different levels of abstraction in a uniform way is still widely regarded as a technical issue. Ignoring these challenges appears to lead robotics into a similar stasis as experienced in the software industry of the late 1960s. From the beginning of the software crisis until today, numerous methods, techniques, and tools for managing the increasing complexity of software systems have evolved. The attempt to transfer several of these ideas towards applications in robotics yielded various control architectures, frameworks, and process models. These attempts mainly provide modularisation schemata which suggest how to decompose a complex system into less complex subsystems. The schematisation of representation and information ﬂow however is mostly ignored. In this work, a set of design schemata is proposed which is embedded into an action/perception-oriented design methodology to promote thorough abstractions between distinct levels of control. Action-oriented design decomposes control systems top-down and sensor data is extracted from the environment as required. This comes with the problem that information is often condensed in a premature fashion. That way, sensor processing is dependent on the control system design resulting in a monolithical system structure with limited options for reusability. In contrast, perception-oriented design constructs control systems bottom-up starting with the extraction of environment information from sensor data. The extracted entities are placed into structures which evolve with the development of the sensor processing algorithms. In consequence, the control system is strictly dependent on the sensor processing algorithms which again results in a monolithic system. In their particular domain, both design approaches have great advantages but fail to create inherently modular systems. The design approach proposed in this work combines the strengths of action orientation and perception orientation into one coherent methodology without inheriting their weaknesses. More precisely, design schemata for representation, translation, and fusion of environmental information are developed which establish thorough abstraction mechanisms between components. The explicit introduction of abstractions particularly supports extensibility and scalability of robot control systems by design.