Topical Cluster Discovery in Semistructured Healthcare Data

Gianni Costa; Riccardo Ortale
2018

Abstract

We propose an approach to clustering XML-based corpora of healthcare documents by latent topic similarity. The approach is a two-step process. First, the latent topic distributions of the input healthcare documents are inferred by performing collapsed Gibbs sampling and parameter estimation under an XML topic model. Second, the inferred distributions are grouped using established clustering techniques.
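
The two-step process can be illustrated with a minimal sketch. It is not the authors' method: the XML topic model is not specified in this record, so the sketch substitutes plain latent Dirichlet allocation over bag-of-words documents and ignores XML structure entirely, and it uses k-means as one example of an established clustering technique. The function names (gibbs_lda, kmeans), the hyperparameters alpha and beta, and the toy corpus are all assumptions made for illustration.

```python
import random

def gibbs_lda(docs, n_topics, vocab_size, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Collapsed Gibbs sampling for plain LDA (assumed stand-in for the XML topic model).

    docs is a list of documents, each a list of integer word ids.
    Returns one topic distribution per document.
    """
    rng = random.Random(seed)
    doc_topic = [[0] * n_topics for _ in docs]                 # topic counts per document
    topic_word = [[0] * vocab_size for _ in range(n_topics)]   # word counts per topic
    topic_total = [0] * n_topics                               # total words per topic
    assign = []
    # Random initial topic assignment for every word occurrence.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(n_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assign.append(zs)
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = assign[d][i]
                # Remove the current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Conditional probability of each topic given all other assignments.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    # Parameter estimation: smoothed per-document topic proportions.
    return [
        [(c + alpha) / (len(doc) + n_topics * alpha) for c in counts]
        for counts, doc in zip(doc_topic, docs)
    ]

def kmeans(points, k, n_iter=50, seed=0):
    """Basic k-means with squared Euclidean distance; returns a cluster label per point."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(n_iter):
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Hypothetical toy corpus: word ids 0-2 and 3-5 stand for two different vocabularies.
docs = [[0, 1, 2, 0, 1], [1, 2, 0, 2], [3, 4, 5, 3], [4, 5, 3, 5, 4]]
theta = gibbs_lda(docs, n_topics=2, vocab_size=6)   # step 1: infer topic distributions
labels = kmeans(theta, k=2)                         # step 2: cluster the distributions
```

Here each row of theta is a document's estimated topic distribution, and labels assigns each document to a cluster. A faithful implementation would replace gibbs_lda with sampling under the XML topic model described in the paper.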
Istituto di Calcolo e Reti ad Alte Prestazioni - ICAR
Topical Clusters
Semistructured Healthcare Data Analysis

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/353495