Online Clustering for Topic Detection in Social Data Streams

Carmela Comito; Clara Pizzuti; Nicola Procopio
2016

Abstract

Microblogs have become an important source of information about events happening in a given location during a given time period. Analyzing and clustering these streams of short textual messages is an important research activity that is attracting the interest of both public and private organizations, since the extracted knowledge can be exploited to better understand people's behavior and to detect the onset of emergency situations. Clustering these streams requires efficient algorithms capable of analyzing this continuous deluge of data. The paper proposes an online algorithm that incrementally groups tweet streams into clusters. The approach summarizes the examined tweets into the centroids of the clusters generated so far. The assignment of a tweet to a centroid uses a similarity measure that takes into account both the cluster age and the terms occurring in the tweet. Experiments on messages posted by users in the Manhattan area show that the method is able to extract events that actually took place in the examined period.
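
The incremental scheme outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the similarity formula, so the sketch assumes cosine similarity over raw term counts, multiplied by an exponential decay on the time since the cluster was last updated (used here as a stand-in for "cluster age"). The names Cluster, similarity, cluster_stream, threshold and decay are hypothetical, as are the whitespace tokenizer and the default parameter values.

```python
import math
from collections import Counter


class Cluster:
    """A cluster summarized by its centroid (summed term counts) and last-update time."""

    def __init__(self, terms, timestamp):
        self.centroid = Counter(terms)
        self.last_update = timestamp
        self.size = 1

    def add(self, terms, timestamp):
        # Fold the tweet into the centroid instead of storing the tweet itself.
        self.centroid.update(terms)
        self.last_update = timestamp
        self.size += 1


def cosine(a, b):
    """Cosine similarity between two term-count mappings."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similarity(cluster, terms, timestamp, decay=0.01):
    # Assumption: "age" is the time elapsed since the cluster's last update,
    # and older clusters are penalized exponentially.
    age = max(0.0, timestamp - cluster.last_update)
    return cosine(cluster.centroid, Counter(terms)) * math.exp(-decay * age)


def cluster_stream(tweets, threshold=0.3, decay=0.01):
    """Process (timestamp, text) pairs one at a time, in arrival order."""
    clusters = []
    for timestamp, text in tweets:
        terms = text.lower().split()  # assumption: naive whitespace tokenizer
        if not terms:
            continue
        best = max(
            clusters,
            key=lambda c: similarity(c, terms, timestamp, decay),
            default=None,
        )
        if best is not None and similarity(best, terms, timestamp, decay) >= threshold:
            best.add(terms, timestamp)
        else:
            # No sufficiently similar, sufficiently recent cluster: open a new one.
            clusters.append(Cluster(terms, timestamp))
    return clusters
```

Under these assumptions, each tweet is compared only against the current centroids, so memory grows with the number of clusters rather than with the number of tweets. A tweet that matches the terms of an old cluster but arrives long after its last update falls below the threshold and starts a new cluster.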
Istituto di Calcolo e Reti ad Alte Prestazioni - ICAR
Twitter
online detection
clustering
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/321531