Research in automated classification of texts: trends and perspectives
Sebastiani F
2003
Abstract
Text categorization (also known as text classification, or topic spotting) is the task of automatically sorting a set of documents into categories from a predefined set. This task has several applications, including automated indexing of scientific articles according to predefined thesauri of technical terms, filing patents into patent directories, selective dissemination of information to information consumers, automated population of hierarchical catalogues of Web resources, spam filtering, identification of document genre, authorship attribution, automated survey coding, and even automated essay grading. Automated text classification is attractive because it frees organizations from the need to organize document bases manually, which can be too expensive, or simply infeasible given the time constraints of the application or the number of documents involved. The accuracy of modern text classification systems rivals that of trained human professionals, thanks to a combination of information retrieval (IR) technology and machine learning (ML) technology. This paper will outline the fundamental traits of the technologies involved, of the applications that can feasibly be tackled through text classification, and of the tools and resources available to researchers and developers who wish to use these technologies to deploy real-world applications.


