Preference learning for category-ranking based interactive text categorization
Sebastiani F
2007
Abstract
Category ranking is a variant of the multi-label text categorization problem in which, rather than making a hard assignment of zero, one, or more categories from a predefined set C to a document dj, we rank all categories in C according to their estimated 'degree of suitability' to dj. Category ranking has many applications, all pertaining to 'interactive' classification contexts in which the system does not take the final categorization decision but supports a human expert who is in charge of taking it. Despite its potential for application, category ranking has received little attention from the information retrieval and text categorization communities, and has mainly been tackled with standard text categorization methods, i.e. by training one binary classifier for each category and ranking the categories by the confidence scores that the respective classifiers return when asked to classify dj. In this paper we take a radically different approach to category ranking, one in which supervision is provided to the learner not in the standard form of labels attached to training documents, but in the form of preferences of the type 'category c1 is to be preferred to category c2 for document dj'. We apply to this problem a recently proposed, very general model for preferential learning, and show, through experiments on the standard Reuters-21578 benchmark, that it outperforms support vector machines, the learning method that has so far proved the best performer in comparative text categorization experiments.
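The two forms of supervision contrasted in the abstract can be illustrated with a minimal sketch. It is not the model used in the paper. It only shows (a) how a category ranking is obtained from per-category confidence scores, as in the baseline approach, and (b) how preferences of the type 'c1 is to be preferred to c2 for dj' could be written down. The function names, the confidence scores, and the choice of deriving preferences from a label set are all assumptions made for illustration. The category names are taken from Reuters-21578.

```python
from itertools import product


def rank_categories(scores):
    """Rank categories by decreasing confidence score (ties broken by name)."""
    return sorted(scores, key=lambda c: (-scores[c], c))


def label_preferences(categories, relevant):
    """Derive pairwise preferences (c1 preferred to c2) from a label set.

    Assumption for illustration: every relevant category is preferred to
    every non-relevant one. Preference supervision can be richer than this.
    """
    rel = [c for c in categories if c in relevant]
    irr = [c for c in categories if c not in relevant]
    return list(product(rel, irr))


def violated(ranking, preferences):
    """Return the preferences that a given ranking does not respect."""
    pos = {c: i for i, c in enumerate(ranking)}
    return [(a, b) for a, b in preferences if pos[a] > pos[b]]


# Hypothetical confidence scores returned by four binary classifiers
# (one per category) for a single document dj.
categories = ["acq", "earn", "grain", "wheat"]
scores = {"acq": -0.7, "earn": -1.2, "grain": 0.9, "wheat": 0.4}

ranking = rank_categories(scores)
prefs = label_preferences(categories, {"grain", "wheat"})

print(ranking)
print(prefs)
print(violated(ranking, prefs))
```

Counting violated preferences is one simple way to compare a predicted ranking against preference supervision. The evaluation measures actually used in the paper are not stated in this abstract.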