Research in automated classification of texts: trends and perspectives

Sebastiani F
2003

Abstract

Text categorization (also known as text classification, or topic spotting) is the task of automatically sorting a set of documents into categories from a predefined set. This task has several applications, including automated indexing of scientific articles according to predefined thesauri of technical terms, filing patents into patent directories, selective dissemination of information to information consumers, automated population of hierarchical catalogues of Web resources, spam filtering, identification of document genre, authorship attribution, automated survey coding, and even automated essay grading. Automated text classification is attractive because it frees organizations from the need to manually organize document bases, which can be too expensive or simply infeasible given the time constraints of the application or the number of documents involved. The accuracy of modern text classification systems rivals that of trained human professionals, thanks to a combination of information retrieval (IR) technology and machine learning (ML) technology. This paper outlines the fundamental traits of the technologies involved, the applications that can feasibly be tackled through text classification, and the tools and resources available to researchers and developers who wish to adopt these technologies for deploying real-world applications.
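As a rough illustration of how IR and ML techniques can combine in this task, here is a minimal sketch (not the method described in this paper). Documents are represented as tf-idf vectors, an IR weighting scheme. A classifier is then learned from labeled examples by building one centroid per category, a Rocchio-style approach that is one of several classical options. A new document is assigned to the category whose centroid is most cosine-similar. The function names, the naive whitespace tokenizer, and the toy spam/ham training data are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical, deliberately naive tokenizer: lowercase + whitespace split.
    return text.lower().split()


def normalize(vec):
    # Scale a sparse vector {term: weight} to unit Euclidean length.
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def idf_table(docs):
    # Inverse document frequency log(N / df) over the training documents.
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    return {t: math.log(len(docs) / c) for t, c in df.items()}


def tfidf(tokens, idf):
    # tf * idf weights, unit-normalized; terms never seen in training are ignored.
    tf = Counter(tokens)
    return normalize({t: f * idf[t] for t, f in tf.items() if t in idf})


def cosine(u, v):
    # Both vectors are unit-length, so the dot product equals the cosine.
    return sum(w * v.get(t, 0.0) for t, w in u.items())


def train(labeled):
    # labeled: list of (text, category) pairs. Returns (idf, centroids).
    docs = [tokenize(text) for text, _ in labeled]
    idf = idf_table(docs)
    sums = {}
    for tokens, (_, category) in zip(docs, labeled):
        acc = sums.setdefault(category, Counter())
        for t, w in tfidf(tokens, idf).items():
            acc[t] += w
    # Normalizing each summed vector gives the direction of the category mean.
    centroids = {c: normalize(dict(acc)) for c, acc in sums.items()}
    return idf, centroids


def classify(text, model):
    # Assign the category whose centroid is most cosine-similar to the text.
    idf, centroids = model
    vec = tfidf(tokenize(text), idf)
    return max(centroids, key=lambda c: cosine(vec, centroids[c]))


# Toy, made-up training data for illustration only.
training = [
    ("cheap pills buy now", "spam"),
    ("win money now click", "spam"),
    ("meeting agenda for monday", "ham"),
    ("project report attached for review", "ham"),
]
model = train(training)
print(classify("buy cheap pills", model))         # spam
print(classify("monday project meeting", model))  # ham
```

If a new document shares no weighted terms with any centroid, all scores are zero and `max` simply returns the first category seen during training. A real system would need an explicit policy for such cases, along with proper tokenization, stop-word handling, and evaluation on held-out data.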
Istituto di Scienza e Tecnologie dell'Informazione "Alessandro Faedo" - ISTI
Keywords: Classification texts; Classifier Design and Evaluation; Learning; Information Search and Retrieval
Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/101793