The search result changed since you submitted your search request. Documents might be displayed in a different sort order.
  • search hit 13 of 909
Back to Result List

Mining and Querying Ranked Entities

  • Tables or ranked lists summarize facts about a group of entities in a concise and structured fashion. They are found in all kind of domains and easily comprehensible by humans. Some globally prominent examples of such rankings are the tallest buildings in the World, the richest people in Germany, or most powerful cars. The availability of vast amounts of tables or rankings from open domain allows different ways to explore data. Computing similarity between ranked lists, in order to find those lists where entities are presented in a similar order, carries important analytical insights. This thesis presents a novel query-driven Locality Sensitive Hashing (LSH) method, in order to efficiently find similar top-k rankings for a given input ranking. Experiments show that the proposed method provides a far better performance than inverted-index--based approaches, in particular, it is able to outperform the popular prefix-filtering method. Additionally, an LSH-based probabilistic pruning approach is proposed that optimizes the space utilization of inverted indices, while still maintaining a user-provided recall requirement for the results of the similarity search. Further, this thesis addresses the problem of automatically identifying interesting categorical attributes, in order to explore the entity-centric data by organizing them into meaningful categories. Our approach proposes novel statistical measures, beyond known concepts, like information entropy, in order to capture the distribution of data to train a classifier that can predict which categorical attribute will be perceived suitable by humans for data categorization. We further discuss how the information of useful categories can be applied in PANTHEON and PALEO, two data exploration frameworks developed in our group.

Download full text files

Export metadata

Metadaten
Author:Koninika Pal
URN:urn:nbn:de:hbz:386-kluedo-54188
Advisor:Sebastian Michel
Document Type:Doctoral Thesis
Language of publication:English
Date of Publication (online):2018/11/16
Year of first Publication:2018
Publishing Institution:Technische Universität Kaiserslautern
Granting Institution:Technische Universität Kaiserslautern
Acceptance Date of the Thesis:2018/09/11
Date of the Publication (Server):2018/11/21
Page Number:VII, 167
Faculties / Organisational entities:Kaiserslautern - Fachbereich Informatik
DDC-Cassification:0 Allgemeines, Informatik, Informationswissenschaft / 004 Informatik
Licence (German):Creative Commons 4.0 - Namensnennung, nicht kommerziell, keine Bearbeitung (CC BY-NC-ND 4.0)