Building the state-of-the-art in POS tagging of Italian Tweets

Cimino A; Dell'Orletta F
2016

Abstract

In this paper we describe our approach to the EVALITA 2016 POS tagging task for Italian Social Media Texts (PoSTWITA). We developed a two-branch bidirectional Long Short-Term Memory (bi-LSTM) recurrent neural network: the first bi-LSTM uses a typical vector representation of the input words, while the second uses a newly introduced word-vector representation that encodes information about the characters in each word while avoiding the additional computational cost of the hierarchical LSTM used in character-based LSTM architectures. The vector representations computed by the two bi-LSTMs are then merged by summation. Although participants were allowed to use other annotated resources, we trained our system only on the distributed data set. On the official test set, our system outperformed all other systems, achieving the highest accuracy in EVALITA 2016 PoSTWITA: 93.19%. Further experiments carried out after the official evaluation period allowed us to develop a system that achieves higher accuracy. These experiments showed the central role played by handcrafted features even when neural-network-based machine learning algorithms are used.
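The two-branch design described in the abstract can be sketched roughly as follows. This is a minimal, untrained illustration in plain Python, not the authors' implementation: the class and function names (`LSTM`, `BiLSTM`, `two_branch_sum`), the dimensions, and the random weights are all assumptions made for the example. In particular, the abstract does not say how the character-aware word vectors are built, so random placeholder vectors stand in for them here.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class LSTM:
    """Single-direction LSTM with random, untrained weights (illustrative only)."""

    def __init__(self, input_dim, hidden_dim, rng):
        self.hidden_dim = hidden_dim
        size = input_dim + hidden_dim  # each gate sees [x_t ; h_{t-1}]
        # One weight matrix and bias per gate: input, forget, output, candidate.
        self.W = {g: [[rng.uniform(-0.1, 0.1) for _ in range(size)]
                      for _ in range(hidden_dim)] for g in "ifoc"}
        self.b = {g: [0.0] * hidden_dim for g in "ifoc"}

    def _affine(self, gate, z):
        return [sum(w * v for w, v in zip(row, z)) + bias
                for row, bias in zip(self.W[gate], self.b[gate])]

    def run(self, xs):
        h = [0.0] * self.hidden_dim
        c = [0.0] * self.hidden_dim
        outputs = []
        for x in xs:
            z = x + h  # list concatenation of input and previous hidden state
            i = [sigmoid(v) for v in self._affine("i", z)]
            f = [sigmoid(v) for v in self._affine("f", z)]
            o = [sigmoid(v) for v in self._affine("o", z)]
            g = [math.tanh(v) for v in self._affine("c", z)]
            c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
            h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
            outputs.append(h)
        return outputs


class BiLSTM:
    """Runs one LSTM left-to-right and one right-to-left, then concatenates."""

    def __init__(self, input_dim, hidden_dim, rng):
        self.fwd = LSTM(input_dim, hidden_dim, rng)
        self.bwd = LSTM(input_dim, hidden_dim, rng)

    def run(self, xs):
        forward = self.fwd.run(xs)
        backward = self.bwd.run(xs[::-1])[::-1]  # re-align with token order
        return [hf + hb for hf, hb in zip(forward, backward)]


def two_branch_sum(word_vecs, char_vecs, word_branch, char_branch):
    """Merge the per-token outputs of the two branches by element-wise sum."""
    a = word_branch.run(word_vecs)
    b = char_branch.run(char_vecs)
    return [[u + v for u, v in zip(ta, tb)] for ta, tb in zip(a, b)]


rng = random.Random(0)
n_tokens = 3
# Placeholder inputs: 4-dim word embeddings and 6-dim character-aware vectors.
word_vecs = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(n_tokens)]
char_vecs = [[rng.uniform(-1, 1) for _ in range(6)] for _ in range(n_tokens)]
word_branch = BiLSTM(4, 5, rng)
char_branch = BiLSTM(6, 5, rng)
merged = two_branch_sum(word_vecs, char_vecs, word_branch, char_branch)
print(len(merged), len(merged[0]))  # 3 tokens, each a 10-dim merged vector
```

Summing requires both branches to produce vectors of the same size, which is why both use the same hidden dimension even though their inputs differ. In a real tagger the merged vector for each token would feed a classification layer over the POS tag set, and the weights would be learned rather than random.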
DC Field Value Language
dc.authority.anceserie CEUR WORKSHOP PROCEEDINGS -
dc.authority.anceserie CEUR Workshop Proceedings -
dc.authority.people Cimino A it
dc.authority.people Dell'orletta F it
dc.collection.id.s 71c7200a-7c5f-4e83-8d57-d3d2ba88f40d *
dc.collection.name 04.01 Contribution in conference proceedings *
dc.contributor.appartenenza Istituto di linguistica computazionale "Antonio Zampolli" - ILC *
dc.contributor.appartenenza.mi 918 *
dc.date.accessioned 2024/02/20 07:46:19 -
dc.date.available 2024/02/20 07:46:19 -
dc.date.issued 2016 -
dc.description.affiliations Istituto di Linguistica Computazionale Antonio Zampolli (ILC-CNR), ItaliaNLP Lab, Italy -
dc.description.allpeople Cimino A.; Dell'orletta F. -
dc.description.allpeopleoriginal Cimino A.; Dell'orletta F. -
dc.description.fulltext none en
dc.description.numberofauthors 2 -
dc.identifier.scopus 2-s2.0-85009243622 -
dc.identifier.uri https://hdl.handle.net/20.500.14243/333953 -
dc.identifier.url http://www.scopus.com/record/display.url?eid=2-s2.0-85009243622&origin=inward -
dc.language.iso eng -
dc.relation.conferencedate 7/12/2016 -
dc.relation.conferencename Evaluation of NLP and Speech Tools for Italian (EVALITA 2016) -
dc.relation.conferenceplace Napoli -
dc.relation.volume 1749 -
dc.subject.keywords nlp -
dc.subject.keywords part-of-speech tagging -
dc.subject.singlekeyword nlp *
dc.subject.singlekeyword part-of-speech tagging *
dc.title Building the state-of-the-art in POS tagging of Italian Tweets en
dc.type.driver info:eu-repo/semantics/conferenceObject -
dc.type.full 04 Conference contribution::04.01 Contribution in conference proceedings it
dc.type.miur 273 -
dc.type.referee Yes, but type not specified -
dc.ugov.descaux1 366728 -
iris.orcid.lastModifiedDate 2024/03/23 09:55:55 *
iris.orcid.lastModifiedMillisecond 1711184155379 *
iris.scopus.extIssued 2016 -
iris.scopus.extTitle Building the state-of-the-art in POS tagging of Italian Tweets -
iris.sitodocente.maxattempts 2 -
scopus.authority.anceserie CEUR WORKSHOP PROCEEDINGS###1613-0073 *
scopus.category 1700 *
scopus.contributor.affiliation ItaliaNLP Lab -
scopus.contributor.affiliation ItaliaNLP Lab -
scopus.contributor.afid 60008941 -
scopus.contributor.afid 60008941 -
scopus.contributor.auid 57002803800 -
scopus.contributor.auid 57540567000 -
scopus.contributor.country Italy -
scopus.contributor.country Italy -
scopus.contributor.dptid 114087935 -
scopus.contributor.dptid 114087935 -
scopus.contributor.name Andrea -
scopus.contributor.name Felice -
scopus.contributor.subaffiliation Istituto di Linguistica Computazionale Antonio Zampolli (ILC-CNR); -
scopus.contributor.subaffiliation Istituto di Linguistica Computazionale Antonio Zampolli (ILC-CNR); -
scopus.contributor.surname Cimino -
scopus.contributor.surname Dell'orletta -
scopus.date.issued 2016 *
scopus.description.allpeopleoriginal Cimino A.; Dell'orletta F. *
scopus.differences scopus.relation.conferencename *
scopus.differences scopus.authority.anceserie *
scopus.differences scopus.publisher.name *
scopus.differences scopus.relation.conferencedate *
scopus.differences scopus.relation.conferenceplace *
scopus.document.type cp *
scopus.document.types cp *
scopus.identifier.pui 614074481 *
scopus.identifier.scopus 2-s2.0-85009243622 *
scopus.journal.sourceid 21100218356 *
scopus.language.iso eng *
scopus.publisher.name CEUR-WS *
scopus.relation.conferencedate 2016 *
scopus.relation.conferencename 3rd Italian Conference on Computational Linguistics, CLiC-it 2016 and 5th Evaluation Campaign of Natural Language Processing and Speech Tools for Italian, EVALITA 2016 *
scopus.relation.conferenceplace ita *
scopus.relation.volume 1749 *
scopus.title Building the state-of-the-art in POS tagging of Italian Tweets *
scopus.titleeng Building the state-of-the-art in POS tagging of Italian Tweets *
Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/333953