Integrating NLP Tools in a Distributed Environment: A Case Study Chaining a Tagger with a Dependency Parser

Rubino, Francesco; Frontini, Francesca; Quochi, Valeria
2012

Abstract

This paper addresses PoS tag conversion within a distributed web service platform for the automatic creation of language resources. PoS tagging is now considered a "solved problem"; yet differences between tagsets still hamper the interchange of the various PoS taggers available. We describe the implementation of a PoS-tagged-corpus converter, which is needed to chain the FreeLing PoS tagger for Italian and the DESR dependency parser in a single workflow, since the two tools were developed independently. The conversion problems encountered during implementation, which stem from the properties of the two tagsets and of tagset conversion in general, are discussed together with the solutions adopted. Finally, the converter is evaluated by measuring the impact of conversion on the performance of the dependency parser, compared with the output of the native pipeline. The evaluation shows that in most cases parsing errors are due to actual tagging errors rather than to the conversion itself. Moreover, information on accuracy loss is an important feature in a distributed environment of NLP services, where users need to decide which services best suit their needs.
DC Field Value Language
dc.authority.orgunit Istituto di linguistica computazionale "Antonio Zampolli" - ILC -
dc.authority.people Rubino Francesco it
dc.authority.people Frontini Francesca it
dc.authority.people Quochi Valeria it
dc.authority.project Platform for Automatic, Normalized Annotation and Cost-Effective Acquisition of Language Resources for Human Language Technologies -
dc.collection.id.s 71c7200a-7c5f-4e83-8d57-d3d2ba88f40d *
dc.collection.name 04.01 Conference proceedings contribution *
dc.contributor.appartenenza Istituto di linguistica computazionale "Antonio Zampolli" - ILC *
dc.contributor.appartenenza.mi 918 *
dc.date.accessioned 2024/02/16 15:53:27 -
dc.date.available 2024/02/16 15:53:27 -
dc.date.issued 2012 -
dc.description.affiliations Istituto di Linguistica Computazionale "A. Zampolli", CNR, Pisa -
dc.description.allpeople Rubino, Francesco; Frontini, Francesca; Quochi, Valeria -
dc.description.allpeopleoriginal Rubino, Francesco; Frontini, Francesca; Quochi, Valeria -
dc.description.fulltext none en
dc.description.numberofauthors 3 -
dc.identifier.isbn 9782951740877 -
dc.identifier.isi WOS:000323927702032 -
dc.identifier.uri https://hdl.handle.net/20.500.14243/128261 -
dc.identifier.url http://www.lrec-conf.org/proceedings/lrec2012/summaries/726.html -
dc.language.iso eng -
dc.publisher.country FRA -
dc.publisher.name European language resources association (ELRA) -
dc.publisher.place Paris -
dc.relation.alleditors Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Mehmet Uğur Doğan, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis -
dc.relation.conferencedate 23-25 May 2012 -
dc.relation.conferencename Language Resources and Evaluation Conference 2012 -
dc.relation.conferenceplace Istanbul, Turkey -
dc.relation.firstpage 2125 -
dc.relation.ispartofbook Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12) -
dc.relation.lastpage 2131 -
dc.relation.numberofpages 7 -
dc.relation.projectAcronym PANACEA -
dc.relation.projectAwardNumber 248064 -
dc.relation.projectAwardTitle Platform for Automatic, Normalized Annotation and Cost-Effective Acquisition of Language Resources for Human Language Technologies -
dc.relation.projectFunderName - en
dc.relation.projectFundingStream FP7 -
dc.subject.keywords PoS tag conversion -
dc.subject.keywords interoperability -
dc.subject.keywords NLP pipelines -
dc.subject.singlekeyword PoS tag conversion *
dc.subject.singlekeyword interoperability *
dc.subject.singlekeyword NLP pipelines *
dc.title Integrating NLP Tools in a Distributed Environment: A Case Study Chaining a Tagger with a Dependency Parser en
dc.type.driver info:eu-repo/semantics/conferenceObject -
dc.type.full 04 Conference contribution::04.01 Conference proceedings contribution it
dc.type.miur 273 -
dc.type.referee Yes, but type not specified -
dc.ugov.descaux1 220773 -
iris.isi.extIssued 2012 -
iris.isi.extTitle Integrating NLP Tools in a Distributed Environment: A Case Study Chaining a Tagger with a Dependency Parser -
iris.orcid.lastModifiedDate 2024/04/04 11:49:10 *
iris.orcid.lastModifiedMillisecond 1712224150282 *
iris.scopus.extIssued 2012 -
iris.scopus.extTitle Integrating NLP tools in a distributed environment: A case study chaining a tagger with a dependency parser -
iris.sitodocente.maxattempts 2 -
isi.authority.sdg Goal 3: Good health and well-being###12083 *
isi.category OT *
isi.category OY *
isi.contributor.affiliation Consiglio Nazionale delle Ricerche (CNR) -
isi.contributor.affiliation Consiglio Nazionale delle Ricerche (CNR) -
isi.contributor.affiliation Consiglio Nazionale delle Ricerche (CNR) -
isi.contributor.country Italy -
isi.contributor.country Italy -
isi.contributor.country Italy -
isi.contributor.name Francesco -
isi.contributor.name Francesca -
isi.contributor.name Valeria -
isi.contributor.researcherId JJJ-6437-2023 -
isi.contributor.researcherId MDT-6613-2025 -
isi.contributor.researcherId E-7468-2011 -
isi.contributor.subaffiliation Ist Linguist Computaz A Zampolli -
isi.contributor.subaffiliation Ist Linguist Computaz A Zampolli -
isi.contributor.subaffiliation Ist Linguist Computaz A Zampolli -
isi.contributor.surname Rubino -
isi.contributor.surname Frontini -
isi.contributor.surname Quochi -
isi.date.issued 2012 *
isi.description.allpeopleoriginal Rubino, F; Frontini, F; Quochi, V; *
isi.document.sourcetype WOS.ISSHP *
isi.document.type Proceedings Paper *
isi.document.types Proceedings Paper *
isi.identifier.isi WOS:000323927702032 *
isi.journal.journaltitle LREC 2012 - EIGHTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION *
isi.language.original English *
isi.publisher.place 55-57, RUE BRILLAT-SAVARIN, PARIS, 75013, FRANCE *
isi.relation.firstpage 2125 *
isi.relation.lastpage 2131 *
isi.title Integrating NLP Tools in a Distributed Environment: A Case Study Chaining a Tagger with a Dependency Parser *