Preference learning for category-ranking based interactive text categorization

Sebastiani F
2007

Abstract

Category ranking is a variant of the multi-label text categorization problem in which, rather than performing a (hard) assignment of zero, one, or more categories from a predefined set C to a document d_j, we rank all categories in C according to their estimated 'degree of suitability' for d_j. Category ranking has many applications, all pertaining to 'interactive' classification contexts in which the system, rather than taking a final categorization decision, is only required to support a human expert who is in charge of taking this decision. Despite its high potential for application, category ranking has received little attention from the information retrieval and text categorization communities. It has mainly been tackled with standard text categorization methods, i.e. by training one binary classifier for each category and ranking the categories by the confidence scores that the respective classifiers return when asked to classify d_j. In this paper we take a radically different stance on category ranking, one in which supervision is provided to the learner not in the standard form of labels attached to training documents, but in the form of preferences of the type 'category c_1 is to be preferred to category c_2 for document d_j'. We apply to this problem a recently proposed, very general model for preferential learning, and show, through experiments performed on the standard Reuters-21578 benchmark, that this approach outperforms support vector machines, the learning method that had so far proved the best performer in comparative text categorization experiments.
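
To make the preference form of supervision concrete, here is a minimal sketch. It is an illustration under stated assumptions, not the preferential learning model used in the paper. It turns ordinary multi-label training labels into pairs of the form 'c_1 is preferred to c_2 for d_j', assuming that every assigned category is preferred to every unassigned one. It then learns one linear scoring vector per category with a plain perceptron that updates whenever a preference is violated. Documents are toy sparse bag-of-words dictionaries. The function names (preferences_from_labels, train_preference_perceptron, rank_categories), the SparseVec alias, the lr/epochs parameters and the example features are all hypothetical.

```python
from itertools import product
from typing import Dict, List, Set, Tuple

SparseVec = Dict[str, float]  # hypothetical toy representation: feature -> weight


def preferences_from_labels(labels: Set[str], categories: List[str]) -> List[Tuple[str, str]]:
    """Assumed construction: each assigned category is preferred to each unassigned one."""
    positives = [c for c in categories if c in labels]
    negatives = [c for c in categories if c not in labels]
    return list(product(positives, negatives))


def dot(w: SparseVec, x: SparseVec) -> float:
    """Sparse dot product between a category's weight vector and a document."""
    return sum(w.get(f, 0.0) * v for f, v in x.items())


def train_preference_perceptron(docs, categories, epochs=10, lr=1.0):
    """Perceptron-style stand-in: one linear scorer per category, trained on preference pairs."""
    model: Dict[str, SparseVec] = {c: {} for c in categories}
    for _ in range(epochs):
        mistakes = 0
        for x, labels in docs:
            for better, worse in preferences_from_labels(labels, categories):
                # The preference "better over worse" is violated if scores do not respect it.
                if dot(model[better], x) <= dot(model[worse], x):
                    mistakes += 1
                    for f, v in x.items():
                        model[better][f] = model[better].get(f, 0.0) + lr * v
                        model[worse][f] = model[worse].get(f, 0.0) - lr * v
        if mistakes == 0:  # every training preference is now satisfied
            break
    return model


def rank_categories(model, x: SparseVec) -> List[str]:
    """Rank all categories for document x, highest score first; no hard decision is taken."""
    return sorted(model, key=lambda c: dot(model[c], x), reverse=True)


# Toy usage: category names follow Reuters-21578, but the features are made up.
categories = ["grain", "wheat", "crude"]
docs = [
    ({"wheat": 1.0, "harvest": 1.0}, {"grain", "wheat"}),
    ({"oil": 1.0, "barrel": 1.0}, {"crude"}),
    ({"corn": 1.0, "harvest": 1.0}, {"grain"}),
]
model = train_preference_perceptron(docs, categories)
print(rank_categories(model, {"oil": 1.0}))  # -> ['crude', 'wheat', 'grain']
```

The output is a ranking of all of C rather than a hard assignment. This fits the interactive setting described above, where a human expert scans the top of the list and accepts or rejects categories. Deriving preferences from labels is only one option. Preference supervision can also express orderings that binary labels cannot, such as preferring one relevant category to another for the same document.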
Istituto di Scienza e Tecnologie dell'Informazione "Alessandro Faedo" - ISTI
Preference learning
Text classification
Kernel machines
Supervised learning
Analysis and Indexing
Files in this item:
prod_91676-doc_131550.pdf (open access)
Description: Preference learning for category-ranking based interactive text categorization
Type: Publisher's version (PDF)
Size: 109.64 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/102635
Citations
  • Scopus: 1