Topical Cluster Discovery in Semistructured Healthcare Data
Gianni Costa; Riccardo Ortale
2018
Abstract
We propose an approach to clustering XML-based corpora of healthcare documents by their latent topic similarity. The approach is a two-step process. First, the latent topic distributions of the input healthcare documents are inferred by performing collapsed Gibbs sampling and parameter estimation under an XML topic model. Second, the inferred distributions are grouped using established clustering techniques.
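The two-step process outlined in the abstract can be illustrated with a minimal sketch. The abstract does not specify the XML topic model, so this sketch substitutes plain LDA over flat token ids for step one and k-means for step two. It ignores XML structure entirely. The function names, hyperparameters (`alpha`, `beta`), and toy corpus are all hypothetical and are not taken from the paper.

```python
import random

def gibbs_lda(docs, n_topics, vocab_size, iters=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for plain LDA (an assumed stand-in for the
    XML topic model). Returns one topic distribution per document."""
    rng = random.Random(seed)
    doc_topic = [[0] * n_topics for _ in docs]                # counts n[d][k]
    topic_word = [[0] * vocab_size for _ in range(n_topics)]  # counts n[k][w]
    topic_total = [0] * n_topics                              # counts n[k]
    assign = []
    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(n_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assign.append(zs)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assign[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Full conditional over topics, up to a normalizing constant.
                weights = [
                    (doc_topic[d][j] + alpha) * (topic_word[j][w] + beta)
                    / (topic_total[j] + vocab_size * beta)
                    for j in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    # Parameter estimation: smoothed per-document topic proportions.
    return [
        [(c + alpha) / (len(doc) + n_topics * alpha) for c in counts]
        for doc, counts in zip(docs, doc_topic)
    ]

def kmeans(points, k, iters=50, seed=0):
    """Basic k-means with squared Euclidean distance; returns cluster labels."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Hypothetical toy corpus: each document is a list of word ids in range(8).
docs = [
    [0, 1, 2, 3, 0, 1, 2, 3],
    [0, 1, 1, 2, 3, 0, 2, 3],
    [4, 5, 6, 7, 4, 5, 6, 7],
    [4, 5, 6, 6, 7, 4, 5, 7],
]
theta = gibbs_lda(docs, n_topics=2, vocab_size=8)  # step 1: infer topic distributions
labels = kmeans(theta, k=2)                        # step 2: cluster the distributions
print(theta)
print(labels)
```

In this sketch, documents are compared only through their inferred topic proportions, so two documents with no words in common can still fall into the same cluster if they share a topic mix. Other clustering algorithms or distance measures could replace k-means in step two.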