CNR Institutional Research Information System

A Tree-based Approach to Clustering XML Documents by Structure

Gianni Costa; Giuseppe Manco; Riccardo Ortale; Andrea Tagarelli

2004

Abstract

We propose a novel methodology for clustering XML documents by structural similarity. The basic idea is to equip each cluster with an XML cluster representative, i.e., an XML document that subsumes the most typical structural features of the documents in that cluster. Clustering is essentially accomplished by comparing cluster representatives and updating them as soon as new clusters are detected. We propose a three-phase algorithm for computing an XML representative, and investigate suitable techniques for identifying significant node matchings and for reliably merging and pruning XML trees. An experimental evaluation on both synthetic and real data shows the effectiveness of our approach.
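The idea of a cluster representative can be illustrated with a minimal Python sketch. It is not the algorithm of the paper: as a simplifying assumption, each document is reduced to its set of root-to-node tag paths instead of being aligned by tree matching, "merging" is a count of how many documents contain each path, and "pruning" drops paths below a support threshold. The names `tag_paths`, `representative`, `jaccard` and the parameter `min_support` are hypothetical and chosen only for this illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def tag_paths(xml_text):
    """Return the set of root-to-node tag paths of an XML document."""
    root = ET.fromstring(xml_text)
    paths = set()
    stack = [(root, root.tag)]
    while stack:
        node, path = stack.pop()
        paths.add(path)
        for child in node:
            stack.append((child, path + "/" + child.tag))
    return paths


def representative(docs, min_support=0.5):
    """Merge the path sets of `docs`, then prune infrequent paths.

    Assumption for this sketch: a path is kept when it occurs in at
    least `min_support` (a fraction) of the documents.
    """
    counts = Counter()
    for doc in docs:
        counts.update(tag_paths(doc))  # each document counts a path once
    threshold = min_support * len(docs)
    return {path for path, count in counts.items() if count >= threshold}


def jaccard(a, b):
    """Structural similarity of two path sets (1.0 means identical)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


cluster = ["<a><b/><c/></a>", "<a><b/><d/></a>", "<a><b><e/></b></a>"]
rep = representative(cluster, min_support=0.6)
similarity = jaccard(rep, tag_paths("<a><b/></a>"))
```

Because every document that contains a path also contains all of its prefixes, the pruned path set is closed under prefixes and still describes a tree. A new document could then be assigned to the cluster whose representative gives the highest `jaccard` score, after which that representative would be recomputed.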

Appears in types:

04.01 Contribution in conference proceedings

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/14609
