A Multi-source collection of event-labeled news documents

Mele I;
2019

Abstract

In this paper, we present a collection of news documents labeled at the level of crisp events. Compared to other publicly available collections, our dataset consists of heterogeneous documents published by popular news channels on different platforms within the same time window and, therefore, covering roughly the same events and topics. The collection spans 4 months and comprises 147K news documents from 27 news streams, i.e., 9 different channels across 3 platforms: Twitter, RSS portals, and news websites. We also provide relevance labels of news documents for some selected events; these relevance judgments were collected via crowdsourcing. The collection can be useful to researchers investigating challenging news mining tasks, such as event detection and tracking, multi-stream analysis, and temporal analysis of news publishing patterns.
Institution: Istituto di Scienza e Tecnologie dell'Informazione "Alessandro Faedo" - ISTI
Keywords: test collections; news streams; event detection and analysis
Files in this record:
prod_415953-doc_146648.pdf: publisher's version (PDF, 451.03 kB), open access
prod_415953-doc_164462.pdf: publisher's version (PDF, 922.06 kB), not available

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/370505
Citations (Scopus): 4