CNR Institutional Research Information System

Machines that learn how to code open-ended survey data. Part I: the basic approach and a working system

Esuli A; Fagni T; Sebastiani F

2009

Abstract

Over the last seven years we have carried out experimental research aimed at developing software that automatically codes open-ended survey responses. This research has produced an industrial-strength software package that is now in operation at the Customer Insight division of a large international banking group and is integrated into a widely used software platform for the management of open-ended survey data. This software, which can code data at a rate of tens of thousands of open-ended responses per hour and can handle responses formulated in any of five major European languages, is the result of contributions from different fields of computer science, including Information Retrieval, Machine Learning, Computational Linguistics, and Opinion Mining. Our approach is based on a learning metaphor, whereby automated verbatim coders are automatically generated by a general-purpose process that learns, from a user-provided sample of manually coded verbatims, the characteristics that new, uncoded verbatims should have in order to be assigned the codes in the codeframe. In this paper we discuss the basic philosophy underlying this software. In a forthcoming companion paper we present the results of experiments, run on several datasets of real respondent data, in which we compare the accuracy of the software against the accuracy of human coders.
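The learning metaphor described in the abstract can be illustrated with a minimal sketch. This is not the authors' system: it assumes a plain multinomial Naive Bayes classifier with add-one smoothing, a crude tokenizer, and one code per verbatim, whereas a real codeframe may allow several codes per response. The class name `NaiveBayesCoder`, the codes `PRICE` and `SERVICE`, and the training verbatims are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Crude tokenizer (an assumption): lowercase, keep letters only, split on the rest.
    return "".join(c.lower() if c.isalpha() else " " for c in text).split()


class NaiveBayesCoder:
    """Hypothetical single-code verbatim coder learned from manually coded examples."""

    def fit(self, verbatims, codes):
        self.word_counts = defaultdict(Counter)  # code -> word frequencies
        self.code_counts = Counter(codes)        # code -> number of training verbatims
        self.vocab = set()
        for text, code in zip(verbatims, codes):
            tokens = tokenize(text)
            self.word_counts[code].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        tokens = tokenize(text)
        total = sum(self.code_counts.values())
        best_code, best_score = None, float("-inf")
        for code, n in self.code_counts.items():
            # Log prior plus add-one smoothed log likelihood of each token.
            score = math.log(n / total)
            denom = sum(self.word_counts[code].values()) + len(self.vocab)
            for token in tokens:
                score += math.log((self.word_counts[code][token] + 1) / denom)
            if score > best_score:
                best_code, best_score = code, score
        return best_code


# Made-up training sample standing in for the user-provided coded verbatims.
training = [
    ("The fees are too high", "PRICE"),
    ("Charges on my account keep rising", "PRICE"),
    ("Staff at the branch were friendly", "SERVICE"),
    ("The staff were helpful and polite", "SERVICE"),
]
coder = NaiveBayesCoder().fit([t for t, _ in training], [c for _, c in training])
print(coder.predict("friendly and helpful staff"))
print(coder.predict("high fees"))
```

The point of the sketch is only that the coder is generated from examples rather than hand-written rules: changing the training sample changes the coder without touching the code.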

Year: 2009

Organizational units: Istituto di Scienza e Tecnologie dell'Informazione "Alessandro Faedo" - ISTI

Keywords:
Design Methodology. Cla
Administrative Data Processing. Marketing
Survey coding
Open-ended questions

Appears in types: 08.04 Technical report

Files in this item:

File	Size	Format
prod_161101-doc_131388.pdf (open access). Description: Machines that learn how to code open-ended survey data. Part I: the basic approach and a working system	787.8 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/167647
