WebDocs: a real-life huge transactional dataset

Lucchese C; Orlando S; Perego R; Silvestri F
2004

Abstract

This short note describes the main characteristics of WebDocs, a huge real-life transactional dataset that we made publicly available to the data mining community through the FIMI repository. We built WebDocs from a spidered collection of HTML web documents. The whole collection contains about 1.7 million documents, mainly written in English, and its size is about 5 GB.
Istituto di Scienza e Tecnologie dell'Informazione "Alessandro Faedo" - ISTI
English
ICDM Workshop on Frequent Itemset Mining Implementations
2
2
1
0-7695-2142-8
http://ftp.informatik.rwth-aachen.de/Publications/CEUR-WS/Vol-126/
CEUR-WS.org
Aachen
Germany
Yes, but type not specified
1 November 2004
Brighton, UK
Frequent itemsets mining datasets
4
restricted
Lucchese, C; Orlando, S; Perego, R; Silvestri, F
273
info:eu-repo/semantics/conferenceObject
04 Conference contribution::04.01 Contribution in conference proceedings
Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/58442