In this paper we present some preliminary results on the generation of word embeddings for the Italian language. We compare two popular word representation models, word2vec and GloVe, and train them on two datasets with different stylistic properties. We test the generated word embeddings on a word analogy test derived from the one originally proposed for word2vec, adapted to capture some of the linguistic aspects that are specific of Italian. Results show that the tested models are able to create syntactically and semantically meaningful word embeddings despite the higher morphological complexity of Italian with respect to English. Moreover, we have found that the stylistic properties of the training dataset plays a relevant role in the type of information captured by the produced vectors.

Word embeddings go to Italy: A comparison of models and training datasets

Esuli A;
2015

Abstract

In this paper we present some preliminary results on the generation of word embeddings for the Italian language. We compare two popular word representation models, word2vec and GloVe, and train them on two datasets with different stylistic properties. We test the generated word embeddings on a word analogy test derived from the one originally proposed for word2vec, adapted to capture some of the linguistic aspects that are specific of Italian. Results show that the tested models are able to create syntactically and semantically meaningful word embeddings despite the higher morphological complexity of Italian with respect to English. Moreover, we have found that the stylistic properties of the training dataset plays a relevant role in the type of information captured by the produced vectors.
2015
Istituto di Scienza e Tecnologie dell'Informazione "Alessandro Faedo" - ISTI
Inglese
Boldi P.; Perego R.; Sebastiani F.
IIR 2015 Italian Information Retrieval Workshop Proceedings of the 6th Italian Information Retrieval Workshop
6th Italian Information Retrieval Workshop
8
http://ceur-ws.org/Vol-1404/
Sì, ma tipo non specificato
25-26/05/2015
Cagliari
Glove
Word embeddings
Word2vec
1
restricted
Berardi G.; Esuli A.; Marcheggiani D.
273
info:eu-repo/semantics/conferenceObject
04 Contributo in convegno::04.01 Contributo in Atti di convegno
File in questo prodotto:
File Dimensione Formato  
prod_344516-doc_107925.pdf

solo utenti autorizzati

Descrizione: Word embeddings go to Italy: A comparison of models and training datasets
Tipologia: Versione Editoriale (PDF)
Dimensione 246.36 kB
Formato Adobe PDF
246.36 kB Adobe PDF   Visualizza/Apri   Richiedi una copia

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/20.500.14243/334416
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 6
  • ???jsp.display-item.citation.isi??? ND
social impact