Detection, Clustering and Tracking of Life Cycle Events on Twitter Using Electric Fields Analogy
Terrana Diego; Pilato Giovanni
2013
Abstract
With the recent explosion of social networks, there is a growing need for systems capable of extracting useful information from the data they produce. Social networks generate a large amount of text over time through the continuous interaction between people. Given the volume and rate of the data generated by these platforms, classical text mining techniques are not suitable. "Events" can be deduced from aggregations of tweets in the stream. In this paper, we address the detection, clustering and tracking of events in a stream of tweets. We present an online framework that treats a tweet as an electric charge and a new event as an electric field. A new event on Twitter is created when several tweets deal with the same topic; the event disappears over time once no more tweets discuss it. A corpus of 400 million tweets has been created and analyzed using our algorithm. The results show the effectiveness of the technique in terms of both time and memory performance.
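The event life cycle described in the abstract (an event appears when several tweets share a topic and fades when tweets stop arriving) can be illustrated with a toy sketch. The abstract does not give the paper's formulas, so everything below is an assumption made for illustration only: each tweet contributes a fixed "charge", an event's "field strength" decays exponentially with a hypothetical half-life, and hypothetical birth and death thresholds decide when an event is active. Topic assignment is also assumed to be given (for example, a hashtag); the names `Event`, `EventTracker`, `half_life`, `birth_threshold` and `death_threshold` are not from the source.

```python
from dataclasses import dataclass


@dataclass
class Event:
    """Hypothetical event record: accumulated 'field strength' for one topic."""
    topic: str
    strength: float = 0.0
    last_update: float = 0.0

    def strength_at(self, now, half_life):
        # Assumed decay law: strength halves every `half_life` seconds.
        elapsed = now - self.last_update
        return self.strength * 0.5 ** (elapsed / half_life)


class EventTracker:
    """Toy online tracker; thresholds and decay are illustrative assumptions."""

    def __init__(self, half_life=60.0, birth_threshold=2.5, death_threshold=0.5):
        self.half_life = half_life
        self.birth_threshold = birth_threshold
        self.death_threshold = death_threshold
        self.events = {}     # topic -> Event
        self.active = set()  # topics currently considered events

    def add_tweet(self, topic, timestamp, charge=1.0):
        """Add one tweet's 'charge' to its topic, after decaying the old strength."""
        event = self.events.get(topic)
        if event is None:
            event = Event(topic=topic, last_update=timestamp)
            self.events[topic] = event
        event.strength = event.strength_at(timestamp, self.half_life) + charge
        event.last_update = timestamp
        if event.strength >= self.birth_threshold:
            self.active.add(topic)

    def active_events(self, now):
        """Drop topics whose decayed strength fell below the death threshold."""
        for topic in list(self.events):
            if self.events[topic].strength_at(now, self.half_life) < self.death_threshold:
                del self.events[topic]
                self.active.discard(topic)
        return sorted(self.active)
```

In this sketch, calling `add_tweet` several times in quick succession for one topic pushes it over the birth threshold, and a later call to `active_events` with a much larger `now` removes it again. Only per-topic state is kept, which is one simple way to bound memory in an online setting.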