Mining and Querying Ranked Entities

  • Tables or ranked lists summarize facts about a group of entities in a concise and structured fashion. They are found in all kind of domains and easily comprehensible by humans. Some globally prominent examples of such rankings are the tallest buildings in the World, the richest people in Germany, or most powerful cars. The availability of vast amounts of tables or rankings from open domain allows different ways to explore data. Computing similarity between ranked lists, in order to find those lists where entities are presented in a similar order, carries important analytical insights. This thesis presents a novel query-driven Locality Sensitive Hashing (LSH) method, in order to efficiently find similar top-k rankings for a given input ranking. Experiments show that the proposed method provides a far better performance than inverted-index--based approaches, in particular, it is able to outperform the popular prefix-filtering method. Additionally, an LSH-based probabilistic pruning approach is proposed that optimizes the space utilization of inverted indices, while still maintaining a user-provided recall requirement for the results of the similarity search. Further, this thesis addresses the problem of automatically identifying interesting categorical attributes, in order to explore the entity-centric data by organizing them into meaningful categories. Our approach proposes novel statistical measures, beyond known concepts, like information entropy, in order to capture the distribution of data to train a classifier that can predict which categorical attribute will be perceived suitable by humans for data categorization. We further discuss how the information of useful categories can be applied in PANTHEON and PALEO, two data exploration frameworks developed in our group.

Volltext Dateien herunterladen

Metadaten exportieren

Weitere Dienste

Teilen auf Twitter Suche bei Google Scholar
Verfasserangaben:Koninika Pal
URN (Permalink):urn:nbn:de:hbz:386-kluedo-54188
Betreuer:Sebastian Michel
Sprache der Veröffentlichung:Englisch
Veröffentlichungsdatum (online):16.11.2018
Jahr der Veröffentlichung:2018
Veröffentlichende Institution:Technische Universität Kaiserslautern
Titel verleihende Institution:Technische Universität Kaiserslautern
Datum der Annahme der Abschlussarbeit:11.09.2018
Datum der Publikation (Server):21.11.2018
Seitenzahl:VII, 167
Fachbereiche / Organisatorische Einheiten:Fachbereich Informatik
DDC-Sachgruppen:0 Allgemeines, Informatik, Informationswissenschaft / 004 Informatik
Lizenz (Deutsch):Creative Commons 4.0 - Namensnennung, nicht kommerziell, keine Bearbeitung (CC BY-NC-ND 4.0)