CNR Institutional Research Information System

Automating survey coding by multiclass text categorization techniques

Giorgetti D;Sebastiani F

2003

Abstract

Survey coding is the task of assigning a symbolic code from a predefined set of such codes to the answer given in response to an open-ended question in a questionnaire (aka survey). This task is usually carried out to group respondents according to a predefined scheme based on their answers. Survey coding has several applications, especially in the social sciences, ranging from the simple classification of respondents to the extraction of statistics on political opinions, health and lifestyle habits, customer satisfaction, brand fidelity, and patient satisfaction. Survey coding is a difficult task, because the code that should be attributed to a respondent based on the answer she has given is a matter of subjective judgment, and thus requires expertise. It is thus unsurprising that this task has traditionally been performed manually, by trained coders. Some attempts have been made at automating this task, most of them based on detecting the similarity between the answer and textual descriptions of the meanings of the candidate codes. We take a radically new stand, and formulate the problem of automated survey coding as a text categorization problem, that is, as the problem of learning, by means of supervised machine learning techniques, a model of the association between answers and codes from a training set of precoded answers, and applying the resulting model to the classification of new answers. In this article we experiment with two different learning techniques: one based on naive Bayesian classification, and the other one based on multiclass support vector machines, and test the resulting framework on a corpus of social surveys. The results we have obtained significantly outperform the results achieved by previous automated survey coding approaches.
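The text categorization formulation described in the abstract (learn a model from precoded answers, then apply it to new answers) can be illustrated with a minimal sketch. This is not the authors' implementation: it is a plain multinomial naive Bayes classifier with add-one smoothing, written with the Python standard library only, and the answers, the codes `SERVICE` and `PRICE`, and all function names are invented for illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Lowercase whitespace tokenization; a real system would do more.
    return text.lower().split()


def train_naive_bayes(coded_answers):
    """Collect per-code counts from (answer, code) training pairs."""
    code_counts = Counter()             # number of training answers per code
    word_counts = defaultdict(Counter)  # word frequencies per code
    vocabulary = set()
    for answer, code in coded_answers:
        code_counts[code] += 1
        for word in tokenize(answer):
            word_counts[code][word] += 1
            vocabulary.add(word)
    return code_counts, word_counts, vocabulary


def classify(model, answer):
    """Return the code with the highest log-posterior for the answer."""
    code_counts, word_counts, vocabulary = model
    total = sum(code_counts.values())
    best_code, best_score = None, -math.inf
    for code in sorted(code_counts):
        # Log prior of the code.
        score = math.log(code_counts[code] / total)
        # Add-one (Laplace) smoothing over the training vocabulary.
        denom = sum(word_counts[code].values()) + len(vocabulary)
        for word in tokenize(answer):
            if word in vocabulary:  # words unseen in training are ignored
                score += math.log((word_counts[code][word] + 1) / denom)
        if score > best_score:
            best_code, best_score = code, score
    return best_code


# Hypothetical precoded answers (invented, not from the paper's corpus).
training = [
    ("the staff were friendly and helpful", "SERVICE"),
    ("helpful staff and quick service", "SERVICE"),
    ("prices are too high", "PRICE"),
    ("too expensive for what you get", "PRICE"),
]
model = train_naive_bayes(training)
print(classify(model, "friendly staff"))
print(classify(model, "too expensive"))
```

The second technique named in the abstract, multiclass support vector machines, would replace the counting model with a learned margin-based classifier but keep the same train-then-classify structure.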

DC field	Value	Language
dc.authority.ancejournal	JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY	-
dc.authority.orgunit	Istituto di linguistica computazionale "Antonio Zampolli" - ILC	-
dc.authority.orgunit	Istituto di Scienza e Tecnologie dell'Informazione "Alessandro Faedo" - ISTI	-
dc.authority.people	Giorgetti D	it
dc.authority.people	Sebastiani F	it
dc.collection.id.s	b3f88f24-048a-4e43-8ab1-6697b90e068e	*
dc.collection.name	01.01 Articolo in rivista	*
dc.contributor.appartenenza	Istituto di Scienza e Tecnologie dell'Informazione "Alessandro Faedo" - ISTI	*
dc.contributor.appartenenza.mi	973	*
dc.date.accessioned	2024/02/19 00:02:09	-
dc.date.available	2024/02/19 00:02:09	-
dc.date.issued	2003	-
dc.description.abstracteng	Survey coding is the task of assigning a symbolic code from a predefined set of such codes to the answer given in response to an open-ended question in a questionnaire (aka survey). This task is usually carried out to group respondents according to a predefined scheme based on their answers. Survey coding has several applications, especially in the social sciences, ranging from the simple classification of respondents to the extraction of statistics on political opinions, health and lifestyle habits, customer satisfaction, brand fidelity, and patient satisfaction. Survey coding is a difficult task, because the code that should be attributed to a respondent based on the answer she has given is a matter of subjective judgment, and thus requires expertise. It is thus unsurprising that this task has traditionally been performed manually, by trained coders. Some attempts have been made at automating this task, most of them based on detecting the similarity between the answer and textual descriptions of the meanings of the candidate codes. We take a radically new stand, and formulate the problem of automated survey coding as a text categorization problem, that is, as the problem of learning, by means of supervised machine learning techniques, a model of the association between answers and codes from a training set of precoded answers, and applying the resulting model to the classification of new answers. In this article we experiment with two different learning techniques: one based on naive Bayesian classification, and the other one based on multiclass support vector machines, and test the resulting framework on a corpus of social surveys. The results we have obtained significantly outperform the results achieved by previous automated survey coding approaches.	-
dc.description.affiliations	CNR-ILC, Pisa, Italy; CNR-ISTI, Pisa, Italy	-
dc.description.allpeople	Giorgetti, D; Sebastiani, F	-
dc.description.allpeopleoriginal	Giorgetti D.; Sebastiani F.	-
dc.description.fulltext	restricted	en
dc.description.numberofauthors	2	-
dc.identifier.isi	WOS:000186610800002	-
dc.identifier.scopus	2-s2.0-0344443768	-
dc.identifier.uri	https://hdl.handle.net/20.500.14243/154345	-
dc.language.iso	eng	-
dc.relation.firstpage	1269	-
dc.relation.lastpage	1277	-
dc.relation.volume	54	-
dc.subject.keywords	survey coding	-
dc.subject.keywords	text classification	-
dc.subject.keywords	machine learning	-
dc.subject.keywords	information retrieval	-
dc.subject.singlekeyword	survey coding	*
dc.subject.singlekeyword	text classification	*
dc.subject.singlekeyword	machine learning	*
dc.subject.singlekeyword	information retrieval	*
dc.title	Automating survey coding by multiclass text categorization techniques	en
dc.type.driver	info:eu-repo/semantics/article	-
dc.type.full	01 Contributo su Rivista::01.01 Articolo in rivista	it
dc.type.miur	262	-
dc.ugov.descaux1	170365	-
iris.isi.extIssued	2003	-
iris.isi.extTitle	Automating survey coding by multiclass text categorization techniques	-
iris.mediafilter.data	2025/04/02 00:28:57	*
iris.orcid.lastModifiedDate	2024/04/04 12:37:09	*
iris.orcid.lastModifiedMillisecond	1712227029052	*
iris.scopus.extIssued	2003	-
iris.scopus.extTitle	Automating Survey Coding by Multiclass Text Categorization Techniques	-
iris.sitodocente.maxattempts	3	-
isi.authority.ancejournal	JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY###1532-2882	*
isi.category	NU	*
isi.category	ET	*
isi.contributor.affiliation		-
isi.contributor.affiliation		-
isi.contributor.country		-
isi.contributor.country		-
isi.contributor.name	D	-
isi.contributor.name	F	-
isi.contributor.researcherId	MEP-4972-2025	-
isi.contributor.researcherId	K-6825-2019	-
isi.contributor.subaffiliation		-
isi.contributor.subaffiliation		-
isi.contributor.surname	Giorgetti	-
isi.contributor.surname	Sebastiani	-
isi.date.issued	2003	*
isi.description.abstracteng	Survey coding is the task of assigning a symbolic code from a predefined set of such codes to the answer given in response to an open-ended question in a questionnaire (aka survey). This task is usually carried out to group respondents according to a predefined scheme based on their answers. Survey coding has several applications, especially in the social sciences, ranging from the simple classification of respondents to the extraction of statistics on political opinions, health and lifestyle habits, customer satisfaction, brand fidelity, and patient satisfaction. Survey coding is a difficult task, because the code that should be attributed to a respondent based on the answer she has given is a matter of subjective judgment, and thus requires expertise. It is thus unsurprising that this task has traditionally been performed manually, by trained coders. Some attempts have been made at automating this task, most of them based on detecting the similarity between the answer and textual descriptions of the meanings of the candidate codes. We take a radically new stand, and formulate the problem of automated survey coding as a text categorization problem, that is, as the problem of learning, by means of supervised machine learning techniques, a model of the association between answers and codes from a training set of precoded answers, and applying the resulting model to the classification of new answers. In this article we experiment with two different learning techniques: one based on naive Bayesian classification, and the other one based on multiclass support vector machines, and test the resulting framework on a corpus of social surveys. The results we have obtained significantly outperform the results achieved by previous automated survey coding approaches.	*
isi.description.allpeopleoriginal	Giorgetti, D; Sebastiani, F;	*
isi.document.sourcetype	WOS.SCI	*
isi.document.type	Article	*
isi.document.types	Article	*
isi.identifier.doi	10.1002/asi.10335	*
isi.identifier.eissn	1532-2890	*
isi.identifier.isi	WOS:000186610800002	*
isi.journal.journaltitle	JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY	*
isi.journal.journaltitleabbrev	J AM SOC INF SCI TEC	*
isi.language.original	English	*
isi.publisher.place	111 RIVER ST, HOBOKEN 07030-5774, NJ USA	*
isi.relation.firstpage	1269	*
isi.relation.issue	14	*
isi.relation.lastpage	1277	*
isi.relation.volume	54	*
isi.title	Automating survey coding by multiclass text categorization techniques	*
scopus.authority.ancejournal	JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY###1532-2882	*
scopus.category	1712	*
scopus.category	1710	*
scopus.category	1709	*
scopus.category	1705	*
scopus.category	1702	*
scopus.contributor.affiliation	Consiglio Nazionale delle Ricerche	-
scopus.contributor.affiliation	Consiglio Nazionale delle Ricerche	-
scopus.contributor.afid	60008941	-
scopus.contributor.afid	60021199	-
scopus.contributor.auid	7801379715	-
scopus.contributor.auid	7004170314	-
scopus.contributor.country	Italy	-
scopus.contributor.country	Italy	-
scopus.contributor.dptid		-
scopus.contributor.dptid		-
scopus.contributor.name	Daniela	-
scopus.contributor.name	Fabrizio	-
scopus.contributor.subaffiliation	Ist. di Linguistica Computazionale;	-
scopus.contributor.subaffiliation	Ist. di Sci./Tecn. dell'Info.;	-
scopus.contributor.surname	Giorgetti	-
scopus.contributor.surname	Sebastiani	-
scopus.date.issued	2003	*
scopus.description.abstracteng	Survey coding is the task of assigning a symbolic code from a predefined set of such codes to the answer given in response to an open-ended question in a questionnaire (aka survey). This task is usually carried out to group respondents according to a predefined scheme based on their answers. Survey coding has several applications, especially in the social sciences, ranging from the simple classification of respondents to the extraction of statistics on political opinions, health and lifestyle habits, customer satisfaction, brand fidelity, and patient satisfaction. Survey coding is a difficult task, because the code that should be attributed to a respondent based on the answer she has given is a matter of subjective judgment, and thus requires expertise. It is thus unsurprising that this task has traditionally been performed manually, by trained coders. Some attempts have been made at automating this task, most of them based on detecting the similarity between the answer and textual descriptions of the meanings of the candidate codes. We take a radically new stand, and formulate the problem of automated survey coding as a text categorization problem, that is, as the problem of learning, by means of supervised machine learning techniques, a model of the association between answers and codes from a training set of precoded answers, and applying the resulting model to the classification of new answers. In this article we experiment with two different learning techniques: one based on naive Bayesian classification, and the other one based on multiclass support vector machines, and test the resulting framework on a corpus of social surveys. The results we have obtained significantly outperform the results achieved by previous automated survey coding approaches.	*
scopus.description.allpeopleoriginal	Giorgetti D.; Sebastiani F.	*
scopus.differences	scopus.identifier.doi	*
scopus.differences	scopus.description.abstracteng	*
scopus.differences	scopus.relation.issue	*
scopus.document.type	re	*
scopus.document.types	re	*
scopus.identifier.doi	10.1002/asi.10335	*
scopus.identifier.pui	37473506	*
scopus.identifier.scopus	2-s2.0-0344443768	*
scopus.journal.sourceid	12098	*
scopus.language.iso	eng	*
scopus.relation.firstpage	1269	*
scopus.relation.issue	14	*
scopus.relation.lastpage	1277	*
scopus.relation.volume	54	*
scopus.title	Automating Survey Coding by Multiclass Text Categorization Techniques	*
scopus.titleeng	Automating Survey Coding by Multiclass Text Categorization Techniques	*
Appears in types:	01.01 Articolo in rivista

Files in this item:

File	Description	Type	Access	Size	Format
prod_170365-doc_123105.pdf	Automating survey coding by multiclass text categorization techniques	Publisher's version (PDF)	Authorized users only	113.81 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/154345
