Summarizing XML Documents: Contributions, Empirical Studies, and Challenges

  • We tackle the problem of obtaining statistics on content and structure of XML documents by using summaries which may provide cardinality estimations for XML query expressions. Our focus is a data-centric processing scenario in which we use a query engine to process such query expressions. We provide three new summary structures called LESS (Leaf-Element-in-Subtree), LWES (Level-Wide Element Summarization), and EXsum (Element-centered XML Summarization) which are targeted to base an estimation process in an XML query optimizer. Each of these collects structural statistical information of XML documents, and the latter (EXsum) gathers, in addition, statistics on document content. Estimation procedures and/or heuristics for specic types of query expressions of each proposed approach are developed. We have incorporated and implemented our proposals in XTC, a native XML database management system (XDBMS). With this common implementation base, we present an empirical and comparative study in which our proposals are stressed against others published in the literature, which are also incorporated into the XTC. Furthermore, an analysis is made based on criteria pertinent to a query optimizer process.
  • Zusammenfassung des XML Dokumenten: Beiträge, empirische Untersuchungen und Herausforderungen

Export metadata

  • Export Bibtex
  • Export RIS

Additional Services

Share in Twitter Search Google Scholar
Metadaten
Author:José de Aguiar Moraes Filho
URN (permanent link):urn:nbn:de:hbz:386-kluedo-24832
Advisor:Theo Härder
Document Type:Doctoral Thesis
Language of publication:English
Year of Completion:2010
Year of Publication:2010
Publishing Institute:Technische Universität Kaiserslautern
Granting Institute:Technische Universität Kaiserslautern
Acceptance Date of the Thesis:2010/03/30
Tag:XML query estimation; XML summary; content-and-structure summary; statistics; structural summary
Faculties / Organisational entities:Fachbereich Informatik
DDC-Cassification:004 Datenverarbeitung; Informatik

$Rev: 12793 $