WebDocs: a real-life huge transactional dataset

Lucchese C; Orlando S; Perego R; Silvestri F
2004

Abstract

This short note describes the main characteristics of WebDocs, a huge real-life transactional dataset that we made publicly available to the Data Mining community through the FIMI repository. We built WebDocs from a collection of HTML web documents gathered by a web spider. The whole collection contains about 1.7 million documents, mainly written in English, and its total size is about 5 GB.
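The note does not detail the conversion here, but a common way to turn a document collection into a transactional dataset is to treat each document as one transaction whose items are its distinct terms. The sketch below assumes that mapping, a deliberately naive regex tag-stripper and tokenizer, and a plain-text layout of one transaction per line with integer items separated by spaces (the layout assumed here for FIMI-style datasets). The function names `html_to_terms`, `documents_to_transactions` and `to_fimi_lines` are hypothetical; this is an illustration, not the authors' actual preprocessing pipeline.

```python
import re
from typing import Dict, Iterable, List


def html_to_terms(html: str) -> List[str]:
    """Crudely strip HTML tags and return lowercase alphanumeric tokens (illustrative only)."""
    text = re.sub(r"<[^>]+>", " ", html)           # replace every tag with a space
    return re.findall(r"[a-z0-9]+", text.lower())  # naive tokenizer, no stopword removal


def documents_to_transactions(docs: Iterable[str]) -> List[List[int]]:
    """Assumed mapping: one document -> one transaction made of the IDs of its distinct terms."""
    vocab: Dict[str, int] = {}          # term -> integer item ID
    transactions: List[List[int]] = []
    for html in docs:
        items = set()                   # a set, so repeated terms count once per document
        for term in html_to_terms(html):
            # assign the next free ID (starting at 1) the first time a term is seen
            item_id = vocab.setdefault(term, len(vocab) + 1)
            items.add(item_id)
        transactions.append(sorted(items))
    return transactions


def to_fimi_lines(transactions: List[List[int]]) -> List[str]:
    """Serialize one transaction per line, items separated by single spaces."""
    return [" ".join(map(str, t)) for t in transactions]


if __name__ == "__main__":
    docs = ["<p>Data mining</p>", "<b>mining</b> the web"]
    for line in to_fimi_lines(documents_to_transactions(docs)):
        print(line)  # prints "1 2" then "2 3 4"
```

Because item IDs are assigned in order of first appearance, the term "mining" keeps ID 2 in both transactions; a real pipeline would likely also remove stopwords and handle HTML entities, which this sketch ignores.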
Istituto di Scienza e Tecnologie dell'Informazione "Alessandro Faedo" - ISTI
0-7695-2142-8
Frequent itemsets mining datasets
File in this record: prod_91780-doc_125585.pdf
Description: WebDocs: a real-life huge transactional dataset
Type: Publisher's version (PDF)
Size: 858.22 kB
Format: Adobe PDF
Access: authorized users only

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/58442