News documents published online represent an important source of information that can be used for event detection and tracking as well as for analyzing the temporal publishing relationships among different news streams. In this paper, we describe our research on detecting, tracking, and predicting events from multiple news streams. We also analyze the temporal publishing patterns of newswires on different platforms and their timeliness in reporting the events. First, we present an approach based on discrete dynamic topic modeling and Hidden Markov Model for event detection and tracking. Then, we predict the events that would persist in the next time slice, which can be important for forecasting facts that would be popular in the future. We leverage the detected events for clustering news documents according to the events they describe. This allows us to determine which newswires published news about an event and to analyze their temporal ordering in reporting events. Finally, we propose two scoring functions for ranking the newswires based on their timeliness. We tested our methodologies on different collections of news articles and tweets. Moreover, we built a collection of heterogeneous news documents with event-document labels which were manually assessed using crowdsourcing. Experimental results showed that, compared to the traditional dynamic topic model, our approach is able to timely detect emerging topics (events). Overall, we could register an event coverage of about 90% w.r.t. the pool of labeled events. The evolution of events is captured by event chains which are highly coherent (0.76) and informative (0.60) allowing to effectively reconstruct the stories. Furthermore, the event-based clustering of news documents has a good trade-off of precision and recall (F-score = 0.83) and the topic keywords provide a semantic description of the events represented by the clusters. Concerning our analysis on the temporal publishing relationships among news streams, we could observe interesting patterns on the usage of the different platforms, for example, some newswires still favor their own official websites, while others tend to publish more timely on Twitter.

Event mining and timeliness analysis from heterogeneous news streams

Mele I;
2019

Abstract

News documents published online represent an important source of information that can be used for event detection and tracking as well as for analyzing the temporal publishing relationships among different news streams. In this paper, we describe our research on detecting, tracking, and predicting events from multiple news streams. We also analyze the temporal publishing patterns of newswires on different platforms and their timeliness in reporting the events. First, we present an approach based on discrete dynamic topic modeling and Hidden Markov Model for event detection and tracking. Then, we predict the events that would persist in the next time slice, which can be important for forecasting facts that would be popular in the future. We leverage the detected events for clustering news documents according to the events they describe. This allows us to determine which newswires published news about an event and to analyze their temporal ordering in reporting events. Finally, we propose two scoring functions for ranking the newswires based on their timeliness. We tested our methodologies on different collections of news articles and tweets. Moreover, we built a collection of heterogeneous news documents with event-document labels which were manually assessed using crowdsourcing. Experimental results showed that, compared to the traditional dynamic topic model, our approach is able to timely detect emerging topics (events). Overall, we could register an event coverage of about 90% w.r.t. the pool of labeled events. The evolution of events is captured by event chains which are highly coherent (0.76) and informative (0.60) allowing to effectively reconstruct the stories. Furthermore, the event-based clustering of news documents has a good trade-off of precision and recall (F-score = 0.83) and the topic keywords provide a semantic description of the events represented by the clusters. Concerning our analysis on the temporal publishing relationships among news streams, we could observe interesting patterns on the usage of the different platforms, for example, some newswires still favor their own official websites, while others tend to publish more timely on Twitter.
2019
Istituto di Scienza e Tecnologie dell'Informazione "Alessandro Faedo" - ISTI
News Stream Mining
Event detection and tracking
Temporal Analysis
File in questo prodotto:
File Dimensione Formato  
prod_415958-doc_146650.pdf

accesso aperto

Descrizione: Event mining and timeliness analysis from heterogeneous news streams
Tipologia: Versione Editoriale (PDF)
Dimensione 603.37 kB
Formato Adobe PDF
603.37 kB Adobe PDF Visualizza/Apri
prod_415958-doc_164467.pdf

non disponibili

Descrizione: Event mining and timeliness analysis from heterogeneous news streams
Tipologia: Versione Editoriale (PDF)
Dimensione 1.95 MB
Formato Adobe PDF
1.95 MB Adobe PDF   Visualizza/Apri   Richiedi una copia

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/20.500.14243/370510
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 47
  • ???jsp.display-item.citation.isi??? ND
social impact