CNR Institutional Research Information System

Deep neural network for hierarchical extreme multi-label text classification

Francesco Gargiulo; Stefano Silvestri; Mario Ciampi; Giuseppe De Pietro

2019

Abstract

The classification of natural language texts has gained growing importance in many real-world applications because of its implications for crucial tasks such as Information Retrieval, Question Answering, Text Summarization, and Natural Language Understanding. In this paper we analyze a Deep Learning architecture for text classification, addressing the extreme multi-class, multi-label text classification problem in which a hierarchical label set is defined. We present a methodology named Hierarchical Label Set Expansion (HLSE), used to regularize the data labels, and analyze the impact of different Word Embedding (WE) models that explicitly incorporate grammatical and syntactic features. We evaluate these methodologies on the PubMed collection of scientific articles, where a multi-class, multi-label text classification problem is defined over the Medical Subject Headings (MeSH) label set, a hierarchical set of 27,775 classes. The experimental assessment demonstrates the usefulness of the proposed HLSE methodology and also reports results on the impact of different uses and combinations of WE models as input to the neural network in this kind of application.
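The abstract does not spell out how HLSE operates, so the following is only an illustrative sketch. It assumes that "label set expansion" means adding, for every label assigned to a document, all of that label's ancestors in the hierarchy. The toy hierarchy, the label names, and the function name `expand_labels` are hypothetical and are not taken from the paper or from MeSH.

```python
from typing import Dict, Iterable, Set

# Hypothetical toy hierarchy (NOT real MeSH data): label -> set of parent labels.
# A label may have more than one parent, so the structure is a DAG, not only a tree.
PARENTS: Dict[str, Set[str]] = {
    "A": set(),
    "A.1": {"A"},
    "A.1.1": {"A.1"},
    "B": set(),
    "B.1": {"B"},
}


def expand_labels(labels: Iterable[str], parents: Dict[str, Set[str]]) -> Set[str]:
    """Return the given labels together with all of their ancestors.

    Assumption: this ancestor closure is one plausible reading of
    "Hierarchical Label Set Expansion"; the paper may define it differently.
    """
    expanded: Set[str] = set()
    stack = list(labels)
    while stack:
        label = stack.pop()
        if label in expanded:
            continue  # already visited, avoids repeated work on shared ancestors
        expanded.add(label)
        stack.extend(parents.get(label, ()))
    return expanded


if __name__ == "__main__":
    print(sorted(expand_labels({"A.1.1", "B.1"}, PARENTS)))
```

Under this assumption, a document labeled only with a deep leaf class also becomes a positive example for each of its coarser ancestor classes, which is one way a hierarchy could be used to regularize sparse labels.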

Year: 2019

Organizational units: Istituto di Calcolo e Reti ad Alte Prestazioni - ICAR

Keywords:
Extreme multi-label text classification
Semantic indexing
Deep learning
MeSH
Semi-supervised word embeddings

Appears in types: 01.01 Journal article

Files in this item:

File	Size	Format
Pubblicazione13.pdf (not available; license: non-public, private/restricted access)	1.11 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/392446
