An effective corpus-based question answering pipeline for Italian
Spinelli R; Esposito M; de Pietro G
2018
Abstract
Question Answering is a long-standing field in computer science that aims to build systems able to answer questions expressed in natural language. However, building Question Answering systems that work on Italian and extract answers from a closed-domain corpus is still an open research problem. In particular, two tasks remain difficult: extracting clues from a question to generate a query for the information retrieval engine, and determining the likelihood that a candidate answer is correct. To address these issues, the paper presents a Question Answering pipeline for Italian based on a corpus of documents pertaining to a closed domain. The pipeline provides functionalities for: (i) analyzing natural language questions in Italian by using lexical features; (ii) handling both factoid and description answer types and, depending on the type, filtering contextual stop words from questions; (iii) scoring and selecting candidate answers with respect to their type in order to determine the best one. The proposed solution was evaluated using standard metrics, with promising results.


