A forward-selection algorithm for SVM-based question classification in cognitive systems
Pota M; Esposito M; De Pietro G
2016
Abstract
Cognitive systems have attracted attention in recent years, particularly for the high interactivity of Question Answering systems. Question Classification plays an important role in identifying the expected answer type. It involves applying Natural Language Processing to the question, extracting a broad variety of features, and using machine learning algorithms to map those features to a given taxonomy of question classes. This work proposes a novel learning approach based on Support Vector Machines that builds a number of classifiers to be used for different questions, each comprising its own features, chosen through a particular forward-selection procedure. The approach aims to reduce the total number of features and to avoid considering, in some cases, features that contribute little information or even noise in those cases. A Question Classification framework is implemented, comprising new, small sets of features. Applied to a benchmark dataset, it achieves classification accuracy competitive with the state of the art while using a lower total number of features.
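
The abstract does not spell out the forward-selection procedure, so the following is only a minimal sketch of generic greedy forward selection over candidate feature sets, not the authors' specific method. The function `forward_select`, the feature-set names, and the `toy_score` function with its numbers are all hypothetical; in a setting like the one described, the score would presumably be the validation accuracy of an SVM trained on the selected feature sets.

```python
from typing import Callable, FrozenSet, List, Sequence, Tuple


def forward_select(
    candidates: Sequence[str],
    score: Callable[[FrozenSet[str]], float],
    min_gain: float = 1e-9,
) -> Tuple[List[str], float]:
    """Greedy forward selection.

    Starting from no features, repeatedly add the candidate that yields the
    highest score, and stop as soon as the best candidate improves the
    current score by no more than min_gain.
    """
    selected: List[str] = []
    best = score(frozenset())
    remaining = list(candidates)
    while remaining:
        # Score every remaining candidate when added to the current selection.
        trials = [(score(frozenset(selected + [c])), c) for c in remaining]
        top_score, top = max(trials, key=lambda t: t[0])
        if top_score - best <= min_gain:
            break  # no candidate helps enough; keep the feature count low
        selected.append(top)
        remaining.remove(top)
        best = top_score
    return selected, best


def toy_score(features: FrozenSet[str]) -> float:
    """Hypothetical stand-in for validation accuracy (made-up numbers).

    In practice this would train and evaluate a classifier, e.g. an SVM,
    on the given feature sets.
    """
    gains = {"unigrams": 0.60, "head_word": 0.20, "wh_word": 0.10}
    penalty = 0.05 if "noisy_feature" in features else 0.0
    return sum(gains.get(f, 0.0) for f in features) - penalty


chosen, acc = forward_select(
    ["noisy_feature", "wh_word", "head_word", "unigrams"], toy_score
)
print(chosen, round(acc, 2))
```

In this toy run the candidate that only adds noise is never selected, which mirrors the stated aim of leaving out features that contribute little information. Running such a selection separately for each classifier would give each one its own feature list.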