Multiclass text categorization for automated survey coding

Giorgetti D.; Sebastiani F.
2003

Abstract

Survey coding is the task of assigning a symbolic code from a predefined set of such codes to the answer given in response to an open-ended question in a questionnaire (aka survey). We formulate the problem of automated survey coding as a text categorization problem, i.e. as the problem of learning, by means of supervised machine learning techniques, a model of the association between answers and codes from a training set of pre-coded answers, and applying the resulting model to the classification of new answers. In this paper we experiment with two different learning techniques, one based on naïve Bayesian classification and the other one based on multiclass support vector machines, and test the resulting framework on a corpus of social surveys. The results we have obtained significantly outperform the results achieved by previous automated survey coding approaches.
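
The formulation in the abstract (learn a model from pre-coded answers, then apply it to new answers) can be illustrated with a minimal sketch. This is not the authors' implementation: it is a plain multinomial naïve Bayes classifier with add-one smoothing, written with the Python standard library only, and the multiclass support vector machine learner is not shown. The answers, the codes WORKLOAD and PAY, and all function names are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: lowercased whitespace tokens stand in for real preprocessing.
    return text.lower().split()


def train_naive_bayes(coded_answers):
    """Count codes and per-code word frequencies from (answer, code) pairs."""
    code_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, code in coded_answers:
        code_counts[code] += 1
        for word in tokenize(text):
            word_counts[code][word] += 1
            vocabulary.add(word)
    return code_counts, word_counts, vocabulary


def classify(model, text):
    """Return the single code with the highest log-posterior for the answer."""
    code_counts, word_counts, vocabulary = model
    total_answers = sum(code_counts.values())
    best_code, best_score = None, -math.inf
    for code, n_answers in code_counts.items():
        # Log prior of the code plus smoothed log likelihood of each known word.
        score = math.log(n_answers / total_answers)
        denominator = sum(word_counts[code].values()) + len(vocabulary)
        for word in tokenize(text):
            if word in vocabulary:  # words never seen in training are skipped
                score += math.log((word_counts[code][word] + 1) / denominator)
        if score > best_score:
            best_code, best_score = code, score
    return best_code


# Hypothetical pre-coded answers to an open-ended question about job dissatisfaction.
TRAINING = [
    ("long hours and too much stress", "WORKLOAD"),
    ("too much work and constant stress", "WORKLOAD"),
    ("the pay is low", "PAY"),
    ("low salary and no raise", "PAY"),
]

model = train_naive_bayes(TRAINING)
print(classify(model, "stress from long hours"))
print(classify(model, "low pay"))
```

Each new answer receives exactly one code, which matches the multiclass (single-label) setting named in the title. A real system would need a far larger training set and proper evaluation on held-out answers.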
DC Field Value Language
dc.authority.orgunit Istituto di linguistica computazionale "Antonio Zampolli" - ILC en
dc.authority.orgunit Istituto di Scienza e Tecnologie dell'Informazione "Alessandro Faedo" - ISTI en
dc.authority.people Giorgetti D. en
dc.authority.people Sebastiani F. en
dc.collection.id.s 71c7200a-7c5f-4e83-8d57-d3d2ba88f40d *
dc.collection.name 04.01 Contributo in Atti di convegno *
dc.contributor.appartenenza Istituto di Scienza e Tecnologie dell'Informazione "Alessandro Faedo" - ISTI *
dc.contributor.appartenenza.mi 973 *
dc.date.accessioned 2024/02/18 14:16:59 -
dc.date.available 2024/02/18 14:16:59 -
dc.date.issued 2003 -
dc.description.abstracteng Survey coding is the task of assigning a symbolic code from a predefined set of such codes to the answer given in response to an open-ended question in a questionnaire (aka survey). We formulate the problem of automated survey coding as a text categorization problem, i.e. as the problem of learning, by means of supervised machine learning techniques, a model of the association between answers and codes from a training set of pre-coded answers, and applying the resulting model to the classification of new answers. In this paper we experiment with two different learning techniques, one based on naïve Bayesian classification and the other one based on multiclass support vector machines, and test the resulting framework on a corpus of social surveys. The results we have obtained significantly outperform the results achieved by previous automated survey coding approaches. -
dc.description.affiliations CNR-ILC, Pisa, Italy; CNR-ISTI, Pisa, Italy -
dc.description.allpeople Giorgetti, D.; Sebastiani, F. -
dc.description.allpeopleoriginal Giorgetti D.; Sebastiani F. en
dc.description.fulltext restricted en
dc.description.numberofauthors 2 -
dc.identifier.scopus 2-s2.0-0037661005 en
dc.identifier.uri https://hdl.handle.net/20.500.14243/147456 -
dc.language.iso eng en
dc.publisher.country USA en
dc.publisher.name ACM Press en
dc.publisher.place New York en
dc.relation.conferencedate 9-12 March 2003 en
dc.relation.conferencename SAC-03 - 18th ACM Symposium on Applied Computing en
dc.relation.conferenceplace Melbourne en
dc.relation.firstpage 798 en
dc.relation.ispartofbook na en
dc.relation.lastpage 802 en
dc.relation.medium ELETTRONICO en
dc.relation.numberofpages 5 en
dc.subject.keywordseng Text categorization -
dc.subject.keywordseng Classifier Design and Evaluation -
dc.subject.keywordseng Learning -
dc.subject.keywordseng Information Search and Retrieval -
dc.subject.keywordseng Sociology -
dc.subject.singlekeyword Text categorization *
dc.subject.singlekeyword Classifier Design and Evaluation *
dc.subject.singlekeyword Learning *
dc.subject.singlekeyword Information Search and Retrieval *
dc.subject.singlekeyword Sociology *
dc.title Multiclass text categorization for automated survey coding en
dc.type.driver info:eu-repo/semantics/conferenceObject -
dc.type.full 04 Contributo in convegno::04.01 Contributo in Atti di convegno it
dc.type.miur 273 -
dc.type.referee Sì, ma tipo non specificato en
dc.ugov.descaux1 171489 -
iris.mediafilter.data 2025/04/18 03:04:12 *
iris.orcid.lastModifiedDate 2024/09/09 18:31:58 *
iris.orcid.lastModifiedMillisecond 1725899518848 *
iris.scopus.extIssued 2003 -
iris.scopus.extTitle Multiclass text categorization for automated survey coding -
iris.sitodocente.maxattempts 1 -
scopus.category 1712 *
scopus.contributor.affiliation Consiglio Nazionale delle Ricerche -
scopus.contributor.affiliation Consiglio Nazionale delle Ricerche -
scopus.contributor.afid 60008941 -
scopus.contributor.afid 60021199 -
scopus.contributor.auid 7801379715 -
scopus.contributor.auid 7004170314 -
scopus.contributor.country Italy -
scopus.contributor.country Italy -
scopus.contributor.dptid -
scopus.contributor.dptid -
scopus.contributor.name Daniela -
scopus.contributor.name Fabrizio -
scopus.contributor.subaffiliation Ist. di Linguistica Computazionale; -
scopus.contributor.subaffiliation Ist. Sci./Tecnologie dell'I.; -
scopus.contributor.surname Giorgetti -
scopus.contributor.surname Sebastiani -
scopus.date.issued 2003 *
scopus.description.abstracteng Survey coding is the task of assigning a symbolic code from a predefined set of such codes to the answer given in response to an open-ended question in a questionnaire (aka survey). We formulate the problem of automated survey coding as a text categorization problem, i.e. as the problem of learning, by means of supervised machine learning techniques, a model of the association between answers and codes from a training set of pre-coded answers, and applying the resulting model to the classification of new answers. In this paper we experiment with two different learning techniques, one based on naïve Bayesian classification and the other one based on multiclass support vector machines, and test the resulting framework on a corpus of social surveys. The results we have obtained significantly outperform the results achieved by previous automated survey coding approaches. *
scopus.description.allpeopleoriginal Giorgetti D.; Sebastiani F. *
scopus.differences scopus.relation.conferencename *
scopus.differences scopus.publisher.name *
scopus.differences scopus.subject.keywords *
scopus.differences scopus.relation.conferencedate *
scopus.differences scopus.identifier.doi *
scopus.differences scopus.description.abstracteng *
scopus.differences scopus.relation.conferenceplace *
scopus.document.type cp *
scopus.document.types cp *
scopus.identifier.doi 10.1145/952686.952691 *
scopus.identifier.pui 36816908 *
scopus.identifier.scopus 2-s2.0-0037661005 *
scopus.journal.sourceid 89358 *
scopus.language.iso eng *
scopus.publisher.name Association for Computing Machinery (ACM) *
scopus.relation.conferencedate 2003 *
scopus.relation.conferencename Proceedings of the 2003 ACM Symposium on Applied Computing *
scopus.relation.conferenceplace Melbourne, FL, USA *
scopus.relation.firstpage 798 *
scopus.relation.lastpage 802 *
scopus.subject.keywords Multiclass text categorization; Open-ended survey coding; *
scopus.title Multiclass text categorization for automated survey coding *
scopus.titleeng Multiclass text categorization for automated survey coding *
Appears in types: 04.01 Contributo in Atti di convegno
Files in this record:
prod_171489-doc_123629.pdf (authorized users only)
Description: Multiclass text categorization for automated survey coding
Type: Publisher's version (PDF)
License: Not public - private/restricted access
Size: 112.35 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/147456
Citations
  • Scopus 10