CNR Institutional Research Information System

Question Classification by Convolutional Neural Networks Embodying Subword Information

Pota M; Esposito M

2018

Abstract

Question Classification is a core module of the Question Answering paradigm. The development of classification models based on neural networks has shown that convolutional architectures achieve top results on this task. In particular, this type of approach avoids explicit feature extraction from questions by treating text as a sequence of words and transforming each word into a dense vector, called a word embedding. Among the techniques for learning word embeddings, a recent approach also takes subword information into account, which could be very useful for morphologically rich languages. This paper proposes a Question Classification approach based on Convolutional Neural Networks and word embeddings that use subword information, with the aim of improving classification accuracy. Questions taken from a TREC dataset are considered, and a comparison between English and Italian is reported, highlighting any improvements obtained by initializing the word embeddings with vectors that are learned in an unsupervised manner using the skip-gram model and that include character-based information.
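
As an illustration of the two ideas named in the abstract, the following minimal Python sketch builds a word vector from character n-grams and applies one convolutional filter with max-over-time pooling to a question. It is not the authors' implementation: the names char_ngrams, SubwordEmbedding and conv_max_pool, the n-gram range, the vector size, the bucket count, the CRC32 hashing and the random initialization are all assumptions made for the example. In the approach described above, the vectors would instead be learned with the skip-gram model.

```python
import random
import zlib


def char_ngrams(word, n_min=3, n_max=6):
    """Return the character n-grams of a word wrapped in < and > boundary markers."""
    wrapped = f"<{word}>"
    return [wrapped[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(wrapped) - n + 1)]


class SubwordEmbedding:
    """Toy subword embedding: a word vector is the sum of its n-gram vectors.

    Assumption: n-gram vectors live in a hashed table with random values.
    A real model would learn them from a corpus (e.g. with skip-gram).
    """

    def __init__(self, dim=8, buckets=1000, seed=0):
        rng = random.Random(seed)
        self.dim = dim
        self.buckets = buckets
        self.table = [[rng.uniform(-0.5, 0.5) for _ in range(dim)]
                      for _ in range(buckets)]

    def vector(self, word):
        total = [0.0] * self.dim
        for gram in char_ngrams(word):
            # CRC32 is deterministic across runs, unlike the built-in hash().
            row = self.table[zlib.crc32(gram.encode("utf-8")) % self.buckets]
            total = [t + r for t, r in zip(total, row)]
        return total


def conv_max_pool(vectors, kernel, bias=0.0):
    """Apply one convolutional filter over consecutive word vectors, then ReLU
    and max-over-time pooling. `kernel` has one weight row per word in the window."""
    width = len(kernel)
    scores = []
    for start in range(len(vectors) - width + 1):
        window = vectors[start:start + width]
        score = bias + sum(w * x
                           for k_row, v_row in zip(kernel, window)
                           for w, x in zip(k_row, v_row))
        scores.append(max(0.0, score))
    return max(scores) if scores else 0.0


if __name__ == "__main__":
    emb = SubwordEmbedding()
    question = ["dove", "si", "trova", "Napoli"]
    vectors = [emb.vector(w) for w in question]
    kernel = [[0.1] * emb.dim for _ in range(2)]  # hypothetical filter of width 2
    print(conv_max_pool(vectors, kernel))
```

Because every word is decomposed into n-grams, an inflected or unseen form still receives a vector that shares components with related forms, which is the property that makes subword information attractive for a morphologically rich language such as Italian.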

	Year

				2018

	Organizational units

				Istituto di Calcolo e Reti ad Alte Prestazioni - ICAR

	Keywords

				feature extraction
				feedforward neural nets
				learning (artificial intelligence)
				natural language processing
				pattern classification
				query processing
				text analysis

	Appears in types:

				04.01 Conference proceedings contribution

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/343639
