Multiclass text categorization for automated survey coding
Giorgetti D.; Sebastiani F.
2003
Abstract
Survey coding is the task of assigning a symbolic code from a predefined set of such codes to the answer given in response to an open-ended question in a questionnaire (aka survey). We formulate the problem of automated survey coding as a text categorization problem, i.e. as the problem of learning, by means of supervised machine learning techniques, a model of the association between answers and codes from a training set of pre-coded answers, and applying the resulting model to the classification of new answers. In this paper we experiment with two different learning techniques, one based on naïve Bayesian classification and the other one based on multiclass support vector machines, and test the resulting framework on a corpus of social surveys. The results we have obtained significantly outperform the results achieved by previous automated survey coding approaches.

| DC field | Value | Language |
|---|---|---|
| dc.authority.orgunit | Istituto di linguistica computazionale "Antonio Zampolli" - ILC | en |
| dc.authority.orgunit | Istituto di Scienza e Tecnologie dell'Informazione "Alessandro Faedo" - ISTI | en |
| dc.authority.people | Giorgetti D. | en |
| dc.authority.people | Sebastiani F. | en |
| dc.collection.id.s | 71c7200a-7c5f-4e83-8d57-d3d2ba88f40d | * |
| dc.collection.name | 04.01 Contribution in conference proceedings | * |
| dc.contributor.appartenenza | Istituto di Scienza e Tecnologie dell'Informazione "Alessandro Faedo" - ISTI | * |
| dc.contributor.appartenenza.mi | 973 | * |
| dc.date.accessioned | 2024/02/18 14:16:59 | - |
| dc.date.available | 2024/02/18 14:16:59 | - |
| dc.date.issued | 2003 | - |
| dc.description.abstracteng | Survey coding is the task of assigning a symbolic code from a predefined set of such codes to the answer given in response to an open-ended question in a questionnaire (aka survey). We formulate the problem of automated survey coding as a text categorization problem, i.e. as the problem of learning, by means of supervised machine learning techniques, a model of the association between answers and codes from a training set of pre-coded answers, and applying the resulting model to the classification of new answers. In this paper we experiment with two different learning techniques, one based on naïve Bayesian classification and the other one based on multiclass support vector machines, and test the resulting framework on a corpus of social surveys. The results we have obtained significantly outperform the results achieved by previous automated survey coding approaches. | - |
| dc.description.affiliations | CNR-ILC, Pisa, Italy; CNR-ISTI, Pisa, Italy | - |
| dc.description.allpeople | Giorgetti, D.; Sebastiani, F. | - |
| dc.description.allpeopleoriginal | Giorgetti D.; Sebastiani F. | en |
| dc.description.fulltext | restricted | en |
| dc.description.numberofauthors | 2 | - |
| dc.identifier.scopus | 2-s2.0-0037661005 | en |
| dc.identifier.uri | https://hdl.handle.net/20.500.14243/147456 | - |
| dc.language.iso | eng | en |
| dc.publisher.country | USA | en |
| dc.publisher.name | ACM Press | en |
| dc.publisher.place | New York | en |
| dc.relation.conferencedate | 9-12 March 2003 | en |
| dc.relation.conferencename | SAC-03 - 18th ACM Symposium on Applied Computing | en |
| dc.relation.conferenceplace | Melbourne, FL | en |
| dc.relation.firstpage | 798 | en |
| dc.relation.ispartofbook | na | en |
| dc.relation.lastpage | 802 | en |
| dc.relation.medium | Electronic | en |
| dc.relation.numberofpages | 5 | en |
| dc.subject.keywordseng | Text categorization | - |
| dc.subject.keywordseng | Classifier Design and Evaluation | - |
| dc.subject.keywordseng | Learning | - |
| dc.subject.keywordseng | Information Search and Retrieval | - |
| dc.subject.keywordseng | Sociology | - |
| dc.subject.singlekeyword | Text categorization | * |
| dc.subject.singlekeyword | Classifier Design and Evaluation | * |
| dc.subject.singlekeyword | Learning | * |
| dc.subject.singlekeyword | Information Search and Retrieval | * |
| dc.subject.singlekeyword | Sociology | * |
| dc.title | Multiclass text categorization for automated survey coding | en |
| dc.type.driver | info:eu-repo/semantics/conferenceObject | - |
| dc.type.full | 04 Conference contribution::04.01 Contribution in conference proceedings | en |
| dc.type.miur | 273 | - |
| dc.type.referee | Yes, but type not specified | en |
| dc.ugov.descaux1 | 171489 | - |
| iris.mediafilter.data | 2025/04/18 03:04:12 | * |
| iris.orcid.lastModifiedDate | 2024/09/09 18:31:58 | * |
| iris.orcid.lastModifiedMillisecond | 1725899518848 | * |
| iris.scopus.extIssued | 2003 | - |
| iris.scopus.extTitle | Multiclass text categorization for automated survey coding | - |
| iris.sitodocente.maxattempts | 1 | - |
| scopus.category | 1712 | * |
| scopus.contributor.affiliation | Consiglio Nazionale delle Ricerche | - |
| scopus.contributor.affiliation | Consiglio Nazionale delle Ricerche | - |
| scopus.contributor.afid | 60008941 | - |
| scopus.contributor.afid | 60021199 | - |
| scopus.contributor.auid | 7801379715 | - |
| scopus.contributor.auid | 7004170314 | - |
| scopus.contributor.country | Italy | - |
| scopus.contributor.country | Italy | - |
| scopus.contributor.dptid | - | |
| scopus.contributor.dptid | - | |
| scopus.contributor.name | Daniela | - |
| scopus.contributor.name | Fabrizio | - |
| scopus.contributor.subaffiliation | Ist. di Linguistica Computazionale; | - |
| scopus.contributor.subaffiliation | Ist. Sci./Tecnologie dell'I.; | - |
| scopus.contributor.surname | Giorgetti | - |
| scopus.contributor.surname | Sebastiani | - |
| scopus.date.issued | 2003 | * |
| scopus.description.abstracteng | Survey coding is the task of assigning a symbolic code from a predefined set of such codes to the answer given in response to an open-ended question in a questionnaire (aka survey). We formulate the problem of automated survey coding as a text categorization problem, i.e. as the problem of learning, by means of supervised machine learning techniques, a model of the association between answers and codes from a training set of pre-coded answers, and applying the resulting model to the classification of new answers. In this paper we experiment with two different learning techniques, one based on naïve Bayesian classification and the other one based on multiclass support vector machines, and test the resulting framework on a corpus of social surveys. The results we have obtained significantly outperform the results achieved by previous automated survey coding approaches. | * |
| scopus.description.allpeopleoriginal | Giorgetti D.; Sebastiani F. | * |
| scopus.differences | scopus.relation.conferencename | * |
| scopus.differences | scopus.publisher.name | * |
| scopus.differences | scopus.subject.keywords | * |
| scopus.differences | scopus.relation.conferencedate | * |
| scopus.differences | scopus.identifier.doi | * |
| scopus.differences | scopus.description.abstracteng | * |
| scopus.differences | scopus.relation.conferenceplace | * |
| scopus.document.type | cp | * |
| scopus.document.types | cp | * |
| scopus.identifier.doi | 10.1145/952686.952691 | * |
| scopus.identifier.pui | 36816908 | * |
| scopus.identifier.scopus | 2-s2.0-0037661005 | * |
| scopus.journal.sourceid | 89358 | * |
| scopus.language.iso | eng | * |
| scopus.publisher.name | Association for Computing Machinery (ACM) | * |
| scopus.relation.conferencedate | 2003 | * |
| scopus.relation.conferencename | Proceedings of the 2003 ACM Symposium on Applied Computing | * |
| scopus.relation.conferenceplace | Melbourne, FL, USA | * |
| scopus.relation.firstpage | 798 | * |
| scopus.relation.lastpage | 802 | * |
| scopus.subject.keywords | Multiclass text categorization; Open-ended survey coding; | * |
| scopus.title | Multiclass text categorization for automated survey coding | * |
| scopus.titleeng | Multiclass text categorization for automated survey coding | * |
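The abstract frames survey coding as single-label, multiclass text categorization: a model is learned from pre-coded answers and then assigns exactly one code to each new answer. The following is a minimal sketch of the naïve Bayesian variant. It assumes a multinomial event model, lowercase word tokens and add-one smoothing, none of which is confirmed by this record. The names `NaiveBayesCoder` and `tokenize` are hypothetical, and the answers and codes (`ECONOMY`, `SECURITY`) are toy examples, not data from the paper's social-survey corpus.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(answer: str) -> list[str]:
    # Assumed preprocessing: lowercase alphabetic word tokens (hypothetical).
    return re.findall(r"[a-z']+", answer.lower())


class NaiveBayesCoder:
    """Hypothetical multinomial naive Bayes coder with add-one smoothing.

    Each answer receives exactly one code from the set seen in training.
    """

    def fit(self, answers: list[str], codes: list[str]) -> "NaiveBayesCoder":
        self.code_counts = Counter(codes)          # training answers per code
        self.word_counts = defaultdict(Counter)    # code -> word frequencies
        for answer, code in zip(answers, codes):
            self.word_counts[code].update(tokenize(answer))
        self.vocab = {w for wc in self.word_counts.values() for w in wc}
        self.total_words = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        self.n_train = len(codes)
        return self

    def log_posterior(self, answer: str, code: str) -> float:
        # log P(code) + sum of log P(word | code); the shared evidence term is dropped.
        score = math.log(self.code_counts[code] / self.n_train)
        denom = self.total_words[code] + len(self.vocab)
        for word in tokenize(answer):
            if word in self.vocab:  # ignore words never seen in training
                score += math.log((self.word_counts[code][word] + 1) / denom)
        return score

    def predict(self, answer: str) -> str:
        # Arg-max over codes: one code per answer.
        return max(self.code_counts, key=lambda c: self.log_posterior(answer, c))


# Toy pre-coded answers (illustrative only).
answers = [
    "I lost my job last year",
    "unemployment is the biggest problem",
    "crime in my neighbourhood is rising",
    "I do not feel safe walking at night",
]
codes = ["ECONOMY", "ECONOMY", "SECURITY", "SECURITY"]
coder = NaiveBayesCoder().fit(answers, codes)
print(coder.predict("too much crime at night"))  # prints SECURITY
```

Scores are summed in log space so that long answers do not underflow. Taking the arg-max over codes is what makes the classifier assign a single code per answer.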
Appears in types: 04.01 Contribution in conference proceedings
| File | Size | Format |
|---|---|---|
| prod_171489-doc_123629.pdf (authorized users only; description: Multiclass text categorization for automated survey coding; type: publisher's version (PDF); license: not public, private/restricted access) | 112.35 kB | Adobe PDF |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.


