A Bayesian Generative Approach to Topic-Based Clustering in XML Corpora
Costa G. (co-first author); Ortale R. (co-first author)
2025
Abstract
This study explores XML partitioning through unsupervised topic modeling. We propose a novel mixed-membership Bayesian generative model for identifying latent topics within XML corpora. We derive approximate posterior inference and parameter estimation for the proposed XML topic model, implemented with a Gibbs sampling algorithm. This approach lets us infer the topic distributions of the input XML documents, which are then used to partition the entire XML corpus by latent-topic similarity. Experiments on real-world XML corpora demonstrate superior effectiveness compared with several other methods.

| File | Description | Type | License | Access | Size | Format |
|---|---|---|---|---|---|---|
| 3709026.3709110.pdf | Article published in PDF format | Publisher's version (PDF) | Other type of license | Authorized users only | 643.5 kB | Adobe PDF |


