A Multi-source collection of event-labeled news documents
Mele I
2019
Abstract
In this paper, we present a collection of news documents labeled at the level of crisp events. Compared to other publicly available collections, our dataset consists of heterogeneous documents published by popular news channels on different platforms within the same temporal window, and therefore covering roughly the same events and topics. The collection spans 4 months and comprises 147K news documents from 27 news streams, i.e., 9 different channels on each of 3 platforms: Twitter, RSS portals, and news websites. We also provide relevance labels of news documents for some selected events. These relevance judgments were collected through crowdsourcing. The collection can be useful to researchers investigating challenging news-mining tasks, such as event detection and tracking, multi-stream analysis, and temporal analysis of news publishing patterns.