Lightweight random indexing for polylingual text classification

Moreo Fernandez A; Esuli A; Sebastiani F
2018

Abstract

Polylingual Text Classification (PLC) is a supervised learning task that consists of assigning class labels to documents written in different languages, assuming that a representative set of training documents is available for each language. This scenario is increasingly common, given the large number of multilingual platforms and communities emerging on the Internet. In this work we analyse some important methods proposed in the literature that are machine-translation-free and dictionary-free, and we propose a particular configuration of the Random Indexing method, which we dub Lightweight Random Indexing. We show that it outperforms all compared algorithms and also has a significantly lower computational cost.
Istituto di Scienza e Tecnologie dell'Informazione "Alessandro Faedo" - ISTI
random indexing
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/359292