Machines that learn how to code open-ended survey data: underlying principles, experimental data, and methodological issues

Sebastiani F
2008

Abstract

In the last six years I have led projects aimed at developing software that learns how to code open-ended survey data from data manually coded by humans. These projects have led to software that is now in operation at the Customer Satisfaction division of a large international banking group and is integrated into a major software platform for the management of open-ended survey data. This software, which can code data at a rate of tens of thousands of responses per hour, draws on contributions from several fields of computer science, including Information Retrieval, Machine Learning, Computational Linguistics, and Opinion Mining. In this talk I will discuss the basic philosophy underlying this software, present the results of experiments we have run on several datasets of respondent data in which we compare the accuracy of the software against the accuracy of human coders, and argue for a notion of "accuracy" defined in terms of inter-coder agreement rates. Finally, I will discuss the characteristics that make a survey more or less amenable to automated coding by means of our system.
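
The abstract describes a supervised-learning workflow: a classifier is trained on responses already coded by humans, it then assigns codes to new responses, and its output is judged by inter-coder agreement rather than by raw accuracy alone. The sketch below only illustrates that general workflow; it is not the system described in the talk, and the library (scikit-learn), the model choices, the toy codeframe, and the example responses are all assumptions made for illustration.

# Minimal illustrative sketch (not the system described in the abstract):
# learning to assign codes to open-ended survey responses from manually
# coded examples, then measuring agreement between the learned coder and
# a human coder. Library (scikit-learn), model, and toy data are assumptions.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import cohen_kappa_score
from sklearn.pipeline import make_pipeline

# Manually coded training data: free-text responses and the codes assigned
# by human coders (a tiny made-up customer-satisfaction codeframe).
train_responses = [
    "The branch staff were friendly and helpful",
    "I waited forty minutes to speak to a teller",
    "Online banking keeps logging me out",
    "The fees on my account are far too high",
    "Great service, the advisor explained everything clearly",
    "The mobile app crashes whenever I try to pay a bill",
]
train_codes = ["STAFF", "WAITING", "ONLINE", "FEES", "STAFF", "ONLINE"]

# A standard text-classification pipeline: TF-IDF features feeding a
# linear classifier trained on the human-coded examples.
coder = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    LogisticRegression(max_iter=1000),
)
coder.fit(train_responses, train_codes)

# New, uncoded responses: the learned model assigns a code to each one.
new_responses = [
    "Staff at the counter were rude",
    "The website logged me out three times today",
]
machine_codes = coder.predict(new_responses)

# Evaluation in the spirit of the abstract: compare the machine's codes with
# an independent human coder's codes using an inter-coder agreement measure
# (Cohen's kappa) rather than raw accuracy alone.
human_codes = ["STAFF", "ONLINE"]
print("machine codes:", list(machine_codes))
print("agreement (Cohen's kappa):", cohen_kappa_score(human_codes, machine_codes))

Reporting an agreement coefficient alongside accuracy reflects the abstract's argument that an automated coder should be evaluated the way human coders are: by how well it agrees with other coders.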
Istituto di Scienza e Tecnologie dell'Informazione "Alessandro Faedo" - ISTI
Keywords: Survey coding; Customer satisfaction; Text classification; Sentiment analysis; Machine learning; Market research
Files in this product:

prod_120647-doc_129628.pdf (open access)
Description: Machines that learn how to code open-ended survey data: underlying principles, experimental data, and methodological issues
Type: Published version (PDF)
Size: 2.78 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/85971