H.2 DATABASE MANAGEMENT (E.5)

H.2.0 General
H.2.1 Logical Design
H.2.2 Physical Design
H.2.3 Languages (D.3.2)
H.2.4 Systems
H.2.5 Heterogeneous Databases (3)
H.2.6 Database Machines
H.2.7 Database Administration
H.2.8 Database Applications (1)
H.2.m Miscellaneous

2 search hits

1 to 2

Sort by

A Framework for XML Similarity Joins (2010)

A prime motivation for using XML to directly represent pieces of information is the ability of supporting ad-hoc or 'schema-later' settings. In such scenarios, modeling data under loose data constraints is essential. Of course, the flexibility of XML comes at a price: the absence of a rigid, regular, and homogeneous structure makes many aspects of data management more challenging. Such malleable data formats can also lead to severe information quality problems, because the risk of storing inconsistent and incorrect data is greatly increased. A prominent example of such problems is the appearance of the so-called fuzzy duplicates, i.e., multiple and non-identical representations of a real-world entity. Similarity joins correlating XML document fragments that are similar can be used as core operators to support the identification of fuzzy duplicates. However, similarity assessment is especially difficult on XML datasets because structure, besides textual information, may exhibit variations in document fragments representing the same real-world entity. Moreover, similarity computation is substantially more expensive for tree-structured objects and, thus, is a serious performance concern. This thesis describes the design and implementation of an effective, flexible, and high-performance XML-based similarity join framework. As main contributions, we present novel structure-conscious similarity functions for XML trees - either considering XML structure in isolation or combined with textual information -, mechanisms to support the selection of relevant information from XML trees and organization of this information into a suitable format for similarity calculation, and efficient algorithms for large-scale identification of similar, set-represented objects. Finally, we validate the applicability of our techniques by integrating our framework into a native XML database management system; in this context we address several issues around the integration of similarity operations into traditional database architectures.

Unterstützung von datenbankorientierten Software-Produktlinien durch domänenspezifische Spracherweiterungen für SQL (2004)

Weber, Christian

Im Rahmen dieser Diplomarbeit werden die Konzepte zur Unterstützung von datenbankorientierten Software-Produktlinien durch domänenspezifische Sprachen am Beispiel von Versionierungssystemen untersucht. Ziel dieser Arbeit ist es, die zeitlichen Kosten, die durch die Nutzung einer domänenspezifischen Sprache entstehen, zu bestimmen. Dabei werden unterschiedliche Datenbankschemata verwendet, um zu untersuchen, welcher Zusammenhang zwischen der Komplexität des Datenbankschemas und der Übersetzung einer domänenspezifischen Anweisung in eine Reihe von herkömmlichen SQL-Anweisungen besteht. Um die zeitlichen Kosten für die Reduktion zu bestimmen, werden Leistungsuntersuchungen durchgeführt. Grundlage für diese Leistungsuntersuchungen sind domänenspezifische Anweisungen, die von einem speziell für diesen Zweck entwickelten Generator erzeugt wurden. Diese generierten domänenspezifischen Anweisungen werden mit den unterschiedlichen Datenbanktreibern auf dem passenden Datenbankschema ausgeführt.

1 to 2

H.2 DATABASE MANAGEMENT (E.5)

Refine

Author

Year of publication

Document Type

Language

Has Fulltext

Keywords

Faculty / Organisational entity

2 search hits