Structure-oriented techniques for XML document partitioning

Gianni Costa; Riccardo Ortale
2016

Abstract

Focusing on only one type of structural component when clustering XML documents may produce clusters with some degree of internal structural inhomogeneity. This can happen either because differences in the overall logical structures of the available XML documents go undetected or because the targeted structural component was chosen inappropriately. To overcome these limitations, two approaches to clustering XML documents by multiple heterogeneous structures are proposed. One approach looks at the simultaneous occurrences of such structures across the individual XML documents. The other approach instead combines multiple clusterings of the XML documents, each performed separately with respect to a single type of structure in isolation. A comparative evaluation over both real and synthetic XML data showed that the devised approaches are at least as effective as state-of-the-art competitors, and in some cases more effective. The empirical evidence also reveals that the proposed approaches outperform such competitors in terms of time efficiency.
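The contrast between the two approaches can be illustrated with a small, self-contained sketch. The abstract does not say which structure types, similarity measure, or clustering algorithm the authors use, so everything below is an assumption made for illustration only. The structure types are root-to-leaf tag paths and parent>child tag edges, similarity is Jaccard over sets, and clustering is a simple threshold-plus-connected-components stand-in. The ensemble step uses a co-association matrix, a common consensus technique that is not necessarily the one used in the paper. All function names (`extract_structures`, `joint_clustering`, `ensemble_clustering`, etc.) and the thresholds `tau` and `consensus_tau` are hypothetical.

```python
import xml.etree.ElementTree as ET
from itertools import combinations


def extract_structures(xml_text):
    """Assumed structure types: root-to-leaf tag paths and parent>child tag edges."""
    root = ET.fromstring(xml_text)
    paths, edges = set(), set()

    def visit(node, prefix):
        path = prefix + (node.tag,)
        children = list(node)
        if not children:  # leaf element: record the full tag path
            paths.add("/".join(path))
        for child in children:
            edges.add(f"{node.tag}>{child.tag}")
            visit(child, path)

    visit(root, ())
    return {"path": paths, "edge": edges}


def jaccard(a, b):
    """Jaccard similarity of two sets (two empty sets count as identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def threshold_clusters(items, sim, tau):
    """Stand-in clusterer: connected components of pairs with sim >= tau (union-find)."""
    parent = list(range(len(items)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(items)), 2):
        if sim(items[i], items[j]) >= tau:
            parent[find(i)] = find(j)
    labels = {}  # relabel components 0, 1, ... in order of first appearance
    return [labels.setdefault(find(i), len(labels)) for i in range(len(items))]


def joint_clustering(docs, tau=0.5):
    """Approach 1 (sketch): one transaction per document mixing all structure types."""
    transactions = [
        {f"{kind}:{item}" for kind, items in extract_structures(d).items() for item in items}
        for d in docs
    ]
    return threshold_clusters(transactions, jaccard, tau)


def ensemble_clustering(docs, tau=0.5, consensus_tau=0.5):
    """Approach 2 (sketch): cluster per structure type, then merge via co-association."""
    structs = [extract_structures(d) for d in docs]
    base = [
        threshold_clusters([s[kind] for s in structs], jaccard, tau)
        for kind in ("path", "edge")
    ]
    n = len(docs)
    # co[i][j] = fraction of base clusterings that put documents i and j together
    co = [[sum(lab[i] == lab[j] for lab in base) / len(base) for j in range(n)] for i in range(n)]
    return threshold_clusters(list(range(n)), lambda i, j: co[i][j], consensus_tau)


docs = [
    "<article><title/><author/><body><sec/></body></article>",
    "<article><title/><body><sec/></body></article>",
    "<book><title/><chapter><sec/></chapter></book>",
    "<book><title/><chapter><sec/><fig/></chapter></book>",
]
print(joint_clustering(docs))     # [0, 0, 1, 1]
print(ensemble_clustering(docs))  # [0, 0, 1, 1]
```

On this toy input both sketches separate the `article` documents from the `book` documents. The difference lies in where the structure types meet. The joint variant merges them into a single transaction before clustering. The ensemble variant clusters each type in isolation and only reconciles the resulting partitions afterwards.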
Istituto di Calcolo e Reti ad Alte Prestazioni - ICAR
978-3-319-14194-7
Data mining
Ensemble XML clustering
XML clustering
XML transactional representation

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/320285