Automating survey coding by multiclass text categorization techniques

Giorgetti D; Sebastiani F
2003

Abstract

Survey coding is the task of assigning a symbolic code from a predefined set of such codes to the answer given in response to an open-ended question in a questionnaire (aka survey). This task is usually carried out to group respondents according to a predefined scheme based on their answers. Survey coding has several applications, especially in the social sciences, ranging from the simple classification of respondents to the extraction of statistics on political opinions, health and lifestyle habits, customer satisfaction, brand fidelity, and patient satisfaction. Survey coding is a difficult task, because the code that should be attributed to a respondent based on the answer she has given is a matter of subjective judgment, and thus requires expertise. It is thus unsurprising that this task has traditionally been performed manually, by trained coders. Some attempts have been made at automating this task, most of them based on detecting the similarity between the answer and textual descriptions of the meanings of the candidate codes. We take a radically new stand, and formulate the problem of automated survey coding as a text categorization problem, that is, as the problem of learning, by means of supervised machine learning techniques, a model of the association between answers and codes from a training set of precoded answers, and applying the resulting model to the classification of new answers. In this article we experiment with two different learning techniques: one based on naive Bayesian classification, and the other one based on multiclass support vector machines, and test the resulting framework on a corpus of social surveys. The results we have obtained significantly outperform the results achieved by previous automated survey coding approaches.
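The formulation in the abstract (learn a model from precoded answers, then apply it to new answers) can be illustrated with a minimal sketch. The code below is a plain-Python multinomial naive Bayes classifier with add-one smoothing; it is not the authors' implementation, and the class name `NaiveBayesCoder`, the whitespace tokenizer, the example answers, and the codes `STAFF` and `WAIT` are all hypothetical. The multiclass support vector machine variant mentioned in the abstract is not shown.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


class NaiveBayesCoder:
    """Multinomial naive Bayes with add-one smoothing, for illustration only."""

    def fit(self, answers, codes):
        # Number of training answers per code, used for the class priors.
        self.code_counts = Counter(codes)
        # Word frequencies observed in the answers assigned to each code.
        self.word_counts = defaultdict(Counter)
        for answer, code in zip(answers, codes):
            self.word_counts[code].update(tokenize(answer))
        self.vocabulary = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, answer):
        total = sum(self.code_counts.values())
        vocab_size = len(self.vocabulary)
        best_code, best_score = None, -math.inf
        for code, n in self.code_counts.items():
            counts = self.word_counts[code]
            denom = sum(counts.values()) + vocab_size
            score = math.log(n / total)  # log prior of the code
            for word in tokenize(answer):
                if word in self.vocabulary:  # ignore words unseen in training
                    score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_code, best_score = code, score
        return best_code


# Hypothetical precoded answers; not data from the article.
train_answers = [
    "the staff were friendly and helpful",
    "friendly nurses and helpful doctors",
    "waiting times were far too long",
    "too long a wait before being seen",
]
train_codes = ["STAFF", "STAFF", "WAIT", "WAIT"]

coder = NaiveBayesCoder().fit(train_answers, train_codes)
print(coder.predict("the doctors were helpful"))
print(coder.predict("a very long wait"))
```

In this toy setting each answer receives exactly one code, which matches the single-label multiclass reading of the task; a realistic setup would need far more training answers per code and a held-out set for evaluation.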
DC field Value Language
dc.authority.ancejournal JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY -
dc.authority.orgunit Istituto di linguistica computazionale "Antonio Zampolli" - ILC -
dc.authority.orgunit Istituto di Scienza e Tecnologie dell'Informazione "Alessandro Faedo" - ISTI -
dc.authority.people Giorgetti D it
dc.authority.people Sebastiani F it
dc.collection.id.s b3f88f24-048a-4e43-8ab1-6697b90e068e *
dc.collection.name 01.01 Articolo in rivista *
dc.contributor.appartenenza Istituto di Scienza e Tecnologie dell'Informazione "Alessandro Faedo" - ISTI *
dc.contributor.appartenenza.mi 973 *
dc.date.accessioned 2024/02/19 00:02:09 -
dc.date.available 2024/02/19 00:02:09 -
dc.date.issued 2003 -
dc.description.affiliations CNR-ILC, Pisa, Italy; CNR-ISTI, Pisa, Italy -
dc.description.allpeople Giorgetti, D; Sebastiani, F -
dc.description.allpeopleoriginal Giorgetti D.; Sebastiani F. -
dc.description.fulltext restricted en
dc.description.numberofauthors 2 -
dc.identifier.isi WOS:000186610800002 -
dc.identifier.scopus 2-s2.0-0344443768 -
dc.identifier.uri https://hdl.handle.net/20.500.14243/154345 -
dc.language.iso eng -
dc.relation.firstpage 1269 -
dc.relation.lastpage 1277 -
dc.relation.volume 54 -
dc.subject.keywords survey coding -
dc.subject.keywords text classification -
dc.subject.keywords machine learning -
dc.subject.keywords information retrieval -
dc.subject.singlekeyword survey coding *
dc.subject.singlekeyword text classification *
dc.subject.singlekeyword machine learning *
dc.subject.singlekeyword information retrieval *
dc.title Automating survey coding by multiclass text categorization techniques en
dc.type.driver info:eu-repo/semantics/article -
dc.type.full 01 Contributo su Rivista::01.01 Articolo in rivista it
dc.type.miur 262 -
dc.ugov.descaux1 170365 -
iris.isi.extIssued 2003 -
iris.isi.extTitle Automating survey coding by multiclass text categorization techniques -
iris.mediafilter.data 2025/04/02 00:28:57 *
iris.orcid.lastModifiedDate 2024/04/04 12:37:09 *
iris.orcid.lastModifiedMillisecond 1712227029052 *
iris.scopus.extIssued 2003 -
iris.scopus.extTitle Automating Survey Coding by Multiclass Text Categorization Techniques -
iris.sitodocente.maxattempts 3 -
isi.authority.ancejournal JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY###1532-2882 *
isi.category NU *
isi.category ET *
isi.contributor.affiliation -
isi.contributor.affiliation -
isi.contributor.country -
isi.contributor.country -
isi.contributor.name D -
isi.contributor.name F -
isi.contributor.researcherId MEP-4972-2025 -
isi.contributor.researcherId K-6825-2019 -
isi.contributor.subaffiliation -
isi.contributor.subaffiliation -
isi.contributor.surname Giorgetti -
isi.contributor.surname Sebastiani -
isi.date.issued 2003 *
isi.description.allpeopleoriginal Giorgetti, D; Sebastiani, F; *
isi.document.sourcetype WOS.SCI *
isi.document.type Article *
isi.document.types Article *
isi.identifier.doi 10.1002/asi.10335 *
isi.identifier.eissn 1532-2890 *
isi.identifier.isi WOS:000186610800002 *
isi.journal.journaltitle JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY *
isi.journal.journaltitleabbrev J AM SOC INF SCI TEC *
isi.language.original English *
isi.publisher.place 111 RIVER ST, HOBOKEN 07030-5774, NJ USA *
isi.relation.firstpage 1269 *
isi.relation.issue 14 *
isi.relation.lastpage 1277 *
isi.relation.volume 54 *
isi.title Automating survey coding by multiclass text categorization techniques *
scopus.authority.ancejournal JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY###1532-2882 *
scopus.category 1712 *
scopus.category 1710 *
scopus.category 1709 *
scopus.category 1705 *
scopus.category 1702 *
scopus.contributor.affiliation Consiglio Nazionale delle Ricerche -
scopus.contributor.affiliation Consiglio Nazionale delle Ricerche -
scopus.contributor.afid 60008941 -
scopus.contributor.afid 60021199 -
scopus.contributor.auid 7801379715 -
scopus.contributor.auid 7004170314 -
scopus.contributor.country Italy -
scopus.contributor.country Italy -
scopus.contributor.dptid -
scopus.contributor.dptid -
scopus.contributor.name Daniela -
scopus.contributor.name Fabrizio -
scopus.contributor.subaffiliation Ist. di Linguistica Computazionale; -
scopus.contributor.subaffiliation Ist. di Sci./Tecn. dell'Info.; -
scopus.contributor.surname Giorgetti -
scopus.contributor.surname Sebastiani -
scopus.date.issued 2003 *
scopus.description.allpeopleoriginal Giorgetti D.; Sebastiani F. *
scopus.differences scopus.identifier.doi *
scopus.differences scopus.description.abstracteng *
scopus.differences scopus.relation.issue *
scopus.document.type re *
scopus.document.types re *
scopus.identifier.doi 10.1002/asi.10335 *
scopus.identifier.pui 37473506 *
scopus.identifier.scopus 2-s2.0-0344443768 *
scopus.journal.sourceid 12098 *
scopus.language.iso eng *
scopus.relation.firstpage 1269 *
scopus.relation.issue 14 *
scopus.relation.lastpage 1277 *
scopus.relation.volume 54 *
scopus.title Automating Survey Coding by Multiclass Text Categorization Techniques *
scopus.titleeng Automating Survey Coding by Multiclass Text Categorization Techniques *
Appears in types: 01.01 Articolo in rivista (journal article)
Files in this item:
File: prod_170365-doc_123105.pdf (authorized users only)
Description: Automating survey coding by multiclass text categorization techniques
Type: Publisher's version (PDF)
Size: 113.81 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/154345
Citations
  • PMC: ND
  • Scopus: 21
  • ISI: 14