Building the state-of-the-art in POS tagging of Italian Tweets
Cimino A.; Dell'orletta F.
2016
Abstract
In this paper we describe our approach to the EVALITA 2016 task on POS tagging for Italian Social Media Texts (PoSTWITA). We developed a two-branch bidirectional Long Short-Term Memory (bi-LSTM) recurrent neural network. The first bi-LSTM uses a typical vector representation of the input words, while the second uses a newly introduced word-vector representation that encodes information about the characters in each word while avoiding the additional computational cost of the hierarchical LSTM used in character-based LSTM architectures. The vector representations computed by the two LSTMs are then merged by summation. Although participants were allowed to use other annotated resources, we trained our system only on the distributed data set. On the official test set, our system outperformed all other systems, achieving the highest accuracy in EVALITA 2016 PoSTWITA: 93.19%. Further experiments carried out after the official evaluation period allowed us to develop a system that achieves higher accuracy. These experiments showed the central role played by handcrafted features even when machine learning algorithms based on neural networks are used.

| DC field | Value | Language |
|---|---|---|
| dc.authority.anceserie | CEUR WORKSHOP PROCEEDINGS | - |
| dc.authority.anceserie | CEUR Workshop Proceedings | - |
| dc.authority.people | Cimino A | it |
| dc.authority.people | Dell'orletta F | it |
| dc.collection.id.s | 71c7200a-7c5f-4e83-8d57-d3d2ba88f40d | * |
| dc.collection.name | 04.01 Conference proceedings contribution | * |
| dc.contributor.appartenenza | Istituto di linguistica computazionale "Antonio Zampolli" - ILC | * |
| dc.contributor.appartenenza.mi | 918 | * |
| dc.date.accessioned | 2024/02/20 07:46:19 | - |
| dc.date.available | 2024/02/20 07:46:19 | - |
| dc.date.issued | 2016 | - |
| dc.description.affiliations | Istituto di Linguistica Computazionale Antonio Zampolli (ILC-CNR), ItaliaNLP Lab, Italy | - |
| dc.description.allpeople | Cimino A.; Dell'orletta F. | - |
| dc.description.allpeopleoriginal | Cimino A.; Dell'orletta F. | - |
| dc.description.fulltext | none | en |
| dc.description.numberofauthors | 2 | - |
| dc.identifier.scopus | 2-s2.0-85009243622 | - |
| dc.identifier.uri | https://hdl.handle.net/20.500.14243/333953 | - |
| dc.identifier.url | http://www.scopus.com/record/display.url?eid=2-s2.0-85009243622&origin=inward | - |
| dc.language.iso | eng | - |
| dc.relation.conferencedate | 7/12/2016 | - |
| dc.relation.conferencename | Evaluation of NLP and Speech Tools for Italian (EVALITA 2016) | - |
| dc.relation.conferenceplace | Napoli | - |
| dc.relation.volume | 1749 | - |
| dc.subject.keywords | nlp | - |
| dc.subject.keywords | part-of-speech tagging | - |
| dc.subject.singlekeyword | nlp | * |
| dc.subject.singlekeyword | part-of-speech tagging | * |
| dc.title | Building the state-of-the-art in POS tagging of Italian Tweets | en |
| dc.type.driver | info:eu-repo/semantics/conferenceObject | - |
| dc.type.full | 04 Conference contribution::04.01 Conference proceedings contribution | it |
| dc.type.miur | 273 | - |
| dc.type.referee | Yes, but type not specified | - |
| dc.ugov.descaux1 | 366728 | - |
| iris.orcid.lastModifiedDate | 2024/03/23 09:55:55 | * |
| iris.orcid.lastModifiedMillisecond | 1711184155379 | * |
| iris.scopus.extIssued | 2016 | - |
| iris.scopus.extTitle | Building the state-of-the-art in POS tagging of Italian Tweets | - |
| iris.sitodocente.maxattempts | 2 | - |
| scopus.authority.anceserie | CEUR WORKSHOP PROCEEDINGS###1613-0073 | * |
| scopus.category | 1700 | * |
| scopus.contributor.affiliation | ItaliaNLP Lab | - |
| scopus.contributor.affiliation | ItaliaNLP Lab | - |
| scopus.contributor.afid | 60008941 | - |
| scopus.contributor.afid | 60008941 | - |
| scopus.contributor.auid | 57002803800 | - |
| scopus.contributor.auid | 57540567000 | - |
| scopus.contributor.country | Italy | - |
| scopus.contributor.country | Italy | - |
| scopus.contributor.dptid | 114087935 | - |
| scopus.contributor.dptid | 114087935 | - |
| scopus.contributor.name | Andrea | - |
| scopus.contributor.name | Felice | - |
| scopus.contributor.subaffiliation | Istituto di Linguistica Computazionale Antonio Zampolli (ILC-CNR); | - |
| scopus.contributor.subaffiliation | Istituto di Linguistica Computazionale Antonio Zampolli (ILC-CNR); | - |
| scopus.contributor.surname | Cimino | - |
| scopus.contributor.surname | Dell'orletta | - |
| scopus.date.issued | 2016 | * |
| scopus.description.allpeopleoriginal | Cimino A.; Dell'orletta F. | * |
| scopus.differences | scopus.relation.conferencename | * |
| scopus.differences | scopus.authority.anceserie | * |
| scopus.differences | scopus.publisher.name | * |
| scopus.differences | scopus.relation.conferencedate | * |
| scopus.differences | scopus.relation.conferenceplace | * |
| scopus.document.type | cp | * |
| scopus.document.types | cp | * |
| scopus.identifier.pui | 614074481 | * |
| scopus.identifier.scopus | 2-s2.0-85009243622 | * |
| scopus.journal.sourceid | 21100218356 | * |
| scopus.language.iso | eng | * |
| scopus.publisher.name | CEUR-WS | * |
| scopus.relation.conferencedate | 2016 | * |
| scopus.relation.conferencename | 3rd Italian Conference on Computational Linguistics, CLiC-it 2016 and 5th Evaluation Campaign of Natural Language Processing and Speech Tools for Italian, EVALITA 2016 | * |
| scopus.relation.conferenceplace | ita | * |
| scopus.relation.volume | 1749 | * |
| scopus.title | Building the state-of-the-art in POS tagging of Italian Tweets | * |
| scopus.titleeng | Building the state-of-the-art in POS tagging of Italian Tweets | * |
| Appears in types: | 04.01 Conference proceedings contribution | |
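
The two-branch architecture summarised in the abstract (two bi-LSTMs whose per-token outputs are merged by the sum operation) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the dimensions, the random weights, the linear output layer and all names (`LSTM`, `BiLSTM`, `two_branch_tagger`, `char_vecs`) are assumptions made for illustration. In particular, the character-aware word representation is not specified in this record, so a random placeholder matrix stands in for it.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class LSTM:
    """Single-direction LSTM layer with random (untrained) weights."""
    def __init__(self, input_dim, hidden_dim, rng):
        # One matrix holds the input, forget, output and candidate gate weights.
        self.W = rng.normal(0.0, 0.1, (input_dim + hidden_dim, 4 * hidden_dim))
        self.b = np.zeros(4 * hidden_dim)
        self.hidden_dim = hidden_dim

    def run(self, xs):
        h = np.zeros(self.hidden_dim)
        c = np.zeros(self.hidden_dim)
        outputs = []
        for x in xs:
            z = np.concatenate([x, h]) @ self.W + self.b
            i, f, o, g = np.split(z, 4)
            c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
            h = sigmoid(o) * np.tanh(c)
            outputs.append(h)
        return np.stack(outputs)

class BiLSTM:
    """Forward and backward LSTM; per-token states are concatenated."""
    def __init__(self, input_dim, hidden_dim, rng):
        self.fwd = LSTM(input_dim, hidden_dim, rng)
        self.bwd = LSTM(input_dim, hidden_dim, rng)

    def run(self, xs):
        forward = self.fwd.run(xs)
        # Run right-to-left, then restore the original token order.
        backward = self.bwd.run(xs[::-1])[::-1]
        return np.concatenate([forward, backward], axis=1)

def two_branch_tagger(word_vecs, char_vecs, branch_a, branch_b, W_out, b_out):
    # Merge the two branches by element-wise sum, as stated in the abstract.
    merged = branch_a.run(word_vecs) + branch_b.run(char_vecs)
    # Assumed output layer: a linear projection followed by argmax over tags.
    logits = merged @ W_out + b_out
    return logits.argmax(axis=1), merged

rng = np.random.default_rng(0)
n_tokens, word_dim, char_dim, hidden, n_tags = 5, 8, 6, 4, 3  # hypothetical sizes
word_vecs = rng.normal(size=(n_tokens, word_dim))  # stand-in word embeddings
char_vecs = rng.normal(size=(n_tokens, char_dim))  # placeholder character-aware vectors
branch_a = BiLSTM(word_dim, hidden, rng)
branch_b = BiLSTM(char_dim, hidden, rng)
W_out = rng.normal(0.0, 0.1, (2 * hidden, n_tags))
b_out = np.zeros(n_tags)

tags, merged = two_branch_tagger(word_vecs, char_vecs, branch_a, branch_b, W_out, b_out)
print(tags.shape, merged.shape)
```

Summation requires both branches to produce outputs of the same size, which is why the two bi-LSTMs share one hidden dimension here even though their input dimensions differ.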