Text Categorization (TC) is the discipline concerned with the construction of automatic text classifiers, i.e. programs capable of assigning to a document one or more among a set of predefined categories based on the content of the document. Building these classifiers is itself done automatically, by means of a general inductive process that learns the characteristics of the categories from a set of preclassified documents. In this paper we discuss a class of applications, automatic indexing with controlled vocabularies, that is of direct concern to organizing digital libraries. We exemplify this class of applications by discussing an ongoing project aimed at classifying scientific papers about computer science with respect to the ACM Classification Scheme.
Organizing digital libraries by automated text categorization
Sebastiani F
2002
Abstract
Text Categorization (TC) is the discipline concerned with the construction of automatic text classifiers, i.e. programs capable of assigning to a document one or more among a set of predefined categories based on the content of the document. Building these classifiers is itself done automatically, by means of a general inductive process that learns the characteristics of the categories from a set of preclassified documents. In this paper we discuss a class of applications, automatic indexing with controlled vocabularies, that is of direct concern to organizing digital libraries. We exemplify this class of applications by discussing an ongoing project aimed at classifying scientific papers about computer science with respect to the ACM Classification Scheme.| File | Dimensione | Formato | |
|---|---|---|---|
|
prod_160601-doc_122857.pdf
accesso aperto
Descrizione: Organizing digital libraries by automated text categorization
Dimensione
187.46 kB
Formato
Adobe PDF
|
187.46 kB | Adobe PDF | Visualizza/Apri |
I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.


