A tutorial on automated text categorisation

Sebastiani F
1999

Abstract

The automated categorisation (or classification) of texts into topical categories has a long history, dating back at least to 1960. Until the late '80s, the dominant approach to the problem involved knowledge-engineering automatic categorisers, i.e. manually building a set of rules encoding expert knowledge on how to classify documents. In the '90s, with the booming production and availability of on-line documents, automated text categorisation has witnessed an increased and renewed interest. A newer paradigm based on machine learning has superseded the previous approach. Within this paradigm, a general inductive process automatically builds a classifier by "learning", from a set of previously classified documents, the characteristics of one or more categories; the advantages are a very good effectiveness, a considerable savings in terms of expert manpower, and domain independence. In this tutorial we look at the main approaches that have been taken towards automatic text categorisation within the general machine learning paradigm. Issues of document indexing, classifier construction, and classifier evaluation, will be touched upon.
Istituto di Scienza e Tecnologie dell'Informazione "Alessandro Faedo" - ISTI
Automated text categorisation
Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/391664