Pre-trained word embeddings encode general word semantics and lexical regularities of natural language, and have proven useful across many NLP tasks, including word sense disambiguation, machine translation, and sentiment analysis, to name a few. In supervised tasks such as multiclass text classification (the focus of this article) it seems appealing to enhance word representations with ad-hoc embeddings that encode task-specific information. We propose (supervised) word-class embeddings (WCEs), and show that, when concatenated to (unsupervised) pre-trained word embeddings, they substantially facilitate the training of deep-learning models in multiclass classification by topic. We show empirical evidence that WCEs yield a consistent improvement in multiclass classification accuracy, using four popular neural architectures and six widely used and publicly available datasets for multiclass text classification. Our code that implements WCEs is publicly available at this https URL

Word-Class Embeddings for Multiclass Text Classification

Moreo Fernandez AD;Esuli A;Sebastiani F
2019

Abstract

Pre-trained word embeddings encode general word semantics and lexical regularities of natural language, and have proven useful across many NLP tasks, including word sense disambiguation, machine translation, and sentiment analysis, to name a few. In supervised tasks such as multiclass text classification (the focus of this article) it seems appealing to enhance word representations with ad-hoc embeddings that encode task-specific information. We propose (supervised) word-class embeddings (WCEs), and show that, when concatenated to (unsupervised) pre-trained word embeddings, they substantially facilitate the training of deep-learning models in multiclass classification by topic. We show empirical evidence that WCEs yield a consistent improvement in multiclass classification accuracy, using four popular neural architectures and six widely used and publicly available datasets for multiclass text classification. Our code that implements WCEs is publicly available at this https URL
2019
Istituto di Scienza e Tecnologie dell'Informazione "Alessandro Faedo" - ISTI
Word-Class Embeddings
Word embeddings
Distributional hypothesis
Multiclass text classification
File in questo prodotto:
File Dimensione Formato  
prod_415579-doc_146371.pdf

accesso aperto

Descrizione: Word-Class Embeddings for Multiclass Text Classification
Dimensione 7.34 MB
Formato Adobe PDF
7.34 MB Adobe PDF Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/20.500.14243/372076
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus ND
  • ???jsp.display-item.citation.isi??? ND
social impact