A Bayesian Generative Approach to Topic-Based Clustering in XML Corpora

Costa G. (co-first author); Ortale R. (co-first author)
2025

Abstract

This study explores XML partitioning through unsupervised topic modeling. We propose a novel mixed-membership Bayesian generative model for identifying latent topics within XML corpora. We derive approximate posterior inference and parameter estimation for the proposed XML topic model, implemented using a Gibbs sampling algorithm. This approach allows us to infer the topic distributions of the input XML documents, which are subsequently utilized to partition the entire XML corpus based on latent-topic similarity. Experiments conducted on real-world XML corpora demonstrate superior effectiveness compared to several methods.
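
The pipeline outlined in the abstract (Gibbs-sampling inference of per-document topic distributions, followed by partitioning on those distributions) can be illustrated with a minimal sketch. It is not the authors' XML topic model: it substitutes plain latent Dirichlet allocation with a collapsed Gibbs sampler, and it groups documents by their dominant topic rather than by a similarity measure. The function names, the hyperparameters `alpha` and `beta`, and the `path:term` token encoding of XML content are all assumptions made for illustration.

```python
import random

def gibbs_lda(docs, num_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for plain LDA (a stand-in, not the paper's model)."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    doc_topic = [[0] * num_topics for _ in docs]        # tokens of doc d in topic k
    topic_word = [[0] * V for _ in range(num_topics)]   # tokens of word v in topic k
    topic_total = [0] * num_topics                      # tokens in topic k
    assign = []
    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(num_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assign.append(zs)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                k = assign[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Resample its topic from the collapsed conditional.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta) / (topic_total[t] + V * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1
    # Smoothed per-document topic distributions.
    return [
        [(c + alpha) / (len(doc) + num_topics * alpha) for c in row]
        for row, doc in zip(doc_topic, docs)
    ]

def partition(theta):
    """Assign each document to its highest-probability topic (a simplification)."""
    return [max(range(len(row)), key=row.__getitem__) for row in theta]

# Hypothetical toy corpus: each XML document flattened to "path:term" tokens.
docs = [
    ["book/title:bayes", "book/title:inference", "book/author:smith"],
    ["book/title:bayes", "book/author:smith", "book/title:models"],
    ["song/artist:lee", "song/title:river", "song/title:night"],
    ["song/artist:lee", "song/title:night", "song/title:rain"],
]
theta = gibbs_lda(docs, num_topics=2)
labels = partition(theta)
```

Here `theta` holds one topic distribution per document and `labels` gives the resulting partition. A fuller treatment would compare the rows of `theta` with a similarity or divergence measure before grouping, as the abstract's phrase "latent-topic similarity" suggests.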
Istituto di Calcolo e Reti ad Alte Prestazioni - ICAR
Bayesian Probabilistic XML Analysis
Latent Topic Modeling
XML Clustering
Files in this item:
File: 3709026.3709110.pdf (authorized users only)
Description: Article published in PDF format
Type: Publisher's version (PDF)
License: Other license type
Size: 643.5 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/559856