An improved boosting algorithm and its application to text categorization

Sebastiani F.
2000

Abstract

We describe AdaBoost.MH KR, an improved boosting algorithm, and its application to text categorization. Boosting is a supervised learning method that has been applied successfully in many different domains and has proven to be one of the best performers in text categorization exercises so far. Boosting relies on the collective judgement of a committee of classifiers that are trained sequentially. In training the i-th classifier, special emphasis is placed on correctly categorizing the training documents that have proven harder for the previously trained classifiers. AdaBoost.MH KR is based on the idea of building, at every iteration of the learning phase, not a single classifier but a sub-committee of the K classifiers that look the most promising at that iteration. We report the results of systematic experiments with this method on the standard REUTERS-21578 benchmark. These experiments show that AdaBoost.MH KR is both more efficient to train and more effective than the original AdaBoost.MH algorithm.
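
The sub-committee idea summarized in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: it assumes a single binary category (rather than the multi-label setting of AdaBoost.MH), weak classifiers that test for the presence of one term and output a real-valued confidence in the general style of confidence-rated boosting, and a sub-committee formed by averaging the K weak classifiers with the lowest weighted loss. All function names, the smoothing constant, and the toy documents are hypothetical.

```python
import math

def train_stump(docs, labels, weights, term, eps):
    """Fit a one-term weak classifier; return (loss Z, term, c_absent, c_present)."""
    # w[(j, y)]: total weight of documents where (term present == j) and label == y.
    w = {(j, y): 0.0 for j in (0, 1) for y in (-1, 1)}
    for doc, y, d in zip(docs, labels, weights):
        w[(int(term in doc), y)] += d
    # Smoothed real-valued confidence for each outcome of the term test.
    c = [0.5 * math.log((w[(j, 1)] + eps) / (w[(j, -1)] + eps)) for j in (0, 1)]
    # Lower Z means the term separates the weighted classes better.
    z = 2.0 * sum(math.sqrt(w[(j, 1)] * w[(j, -1)]) for j in (0, 1))
    return z, term, c[0], c[1]

def sub_score(sub, doc):
    """Average the predictions of the weak classifiers in one sub-committee."""
    return sum(c1 if t in doc else c0 for t, c0, c1 in sub) / len(sub)

def boost(docs, labels, rounds=5, k=2):
    """Train `rounds` sub-committees of the k most promising one-term classifiers."""
    n = len(docs)
    vocab = sorted(set().union(*docs))
    weights = [1.0 / n] * n
    eps = 1.0 / n  # smoothing constant (an assumption of this sketch)
    committee = []
    for _ in range(rounds):
        ranked = sorted(train_stump(docs, labels, weights, t, eps) for t in vocab)
        sub = [(t, c0, c1) for _, t, c0, c1 in ranked[:k]]  # k lowest-loss stumps
        committee.append(sub)
        # Re-weight: documents the sub-committee handles badly gain weight.
        weights = [d * math.exp(-y * sub_score(sub, doc))
                   for doc, y, d in zip(docs, labels, weights)]
        total = sum(weights)
        weights = [d / total for d in weights]
    return committee

def score(committee, doc):
    """Final score; a positive value means the document is assigned to the category."""
    return sum(sub_score(sub, doc) for sub in committee)

# Toy data: each document is a set of terms; label +1 means "about wheat".
docs = [{"wheat", "price"}, {"wheat", "export"}, {"oil", "price"},
        {"oil", "export"}, {"wheat", "farm"}, {"bank", "rate"}]
labels = [1, 1, -1, -1, 1, -1]
committee = boost(docs, labels, rounds=5, k=2)
print(score(committee, {"wheat", "rate"}))
```

Setting `k=1` in this sketch reduces it to ordinary boosting with one weak classifier per iteration, which makes the effect of the sub-committee easy to compare on the same data.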
Istituto di Scienza e Tecnologie dell'Informazione "Alessandro Faedo" - ISTI
Text categorization
Content analysis and indexing
Artificial intelligence
Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/365714