Survey coding is the task of assigning a symbolic code from a predefined set of such codes to the answer given in response to an open-ended question in a questionnaire (aka survey). We formulate the problem of automated survey coding as a text categorization problem, i.e. as the problem of learning, by means of supervised machine learning techniques, a model of the association between answers and codes from a training set of pre-coded answers, and applying the resulting model to the classi.cation of new answers. In this paper we experiment with two different learning techniques, one based on naÏve Bayesian classi.cation and the other one based on multiclass support vector machines, and test the resulting framework on a corpus of social surveys. The results we have obtained significantly outperform the results achieved by previous automated survey coding approaches.

Multiclass text categorization for automated survey coding

Sebastiani F.
2003

Abstract

Survey coding is the task of assigning a symbolic code from a predefined set of such codes to the answer given in response to an open-ended question in a questionnaire (aka survey). We formulate the problem of automated survey coding as a text categorization problem, i.e. as the problem of learning, by means of supervised machine learning techniques, a model of the association between answers and codes from a training set of pre-coded answers, and applying the resulting model to the classi.cation of new answers. In this paper we experiment with two different learning techniques, one based on naÏve Bayesian classi.cation and the other one based on multiclass support vector machines, and test the resulting framework on a corpus of social surveys. The results we have obtained significantly outperform the results achieved by previous automated survey coding approaches.
2003
Istituto di linguistica computazionale "Antonio Zampolli" - ILC
Istituto di Scienza e Tecnologie dell'Informazione "Alessandro Faedo" - ISTI
Text categorization
Classifier Design and Evaluation
Learning
Information Search and Retrieval
Sociology
File in questo prodotto:
File Dimensione Formato  
prod_171489-doc_123629.pdf

solo utenti autorizzati

Descrizione: Multiclass text categorization for automated survey coding
Tipologia: Versione Editoriale (PDF)
Licenza: NON PUBBLICO - Accesso privato/ristretto
Dimensione 112.35 kB
Formato Adobe PDF
112.35 kB Adobe PDF   Visualizza/Apri   Richiedi una copia

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/20.500.14243/147456
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 10
  • ???jsp.display-item.citation.isi??? ND
social impact