Event mining and timeliness analysis from heterogeneous news streams
Mele I.
2019
Abstract
News documents published online are an important source of information for event detection and tracking, as well as for analyzing the temporal publishing relationships among different news streams. In this paper, we describe our research on detecting, tracking, and predicting events from multiple news streams. We also analyze the temporal publishing patterns of newswires on different platforms and their timeliness in reporting events. First, we present an approach based on discrete dynamic topic modeling and a Hidden Markov Model for event detection and tracking. Then, we predict which events will persist in the next time slice, which can be important for forecasting facts that will be popular in the future. We leverage the detected events to cluster news documents according to the events they describe. This allows us to determine which newswires published news about an event and to analyze their temporal ordering in reporting events. Finally, we propose two scoring functions for ranking the newswires by their timeliness. We tested our methodologies on different collections of news articles and tweets. Moreover, we built a collection of heterogeneous news documents with event-document labels that were manually assessed through crowdsourcing. Experimental results show that, compared to the traditional dynamic topic model, our approach is able to detect emerging topics (events) in a timely manner. Overall, we registered an event coverage of about 90% with respect to the pool of labeled events. The evolution of events is captured by event chains that are highly coherent (0.76) and informative (0.60), allowing the stories to be reconstructed effectively. Furthermore, the event-based clustering of news documents achieves a good trade-off between precision and recall (F-score = 0.83), and the topic keywords provide a semantic description of the events represented by the clusters.
Concerning our analysis of the temporal publishing relationships among news streams, we observed interesting patterns in the usage of the different platforms: for example, some newswires still favor their own official websites, while others tend to publish more promptly on Twitter.