A distributed architecture to integrate ontological knowledge into information extraction
Silvestri Stefano
2016
Abstract
In this work we propose a solution to the problem of extracting entities and relations from textual documents in order to build an index for a semantically oriented search engine. The proposed approach integrates statistical classifiers and ontological constraints through Markov random fields. Owing to the high computational complexity of the approach, the architecture of our system is distributed and exploits parallelisation to lower processing time. In the experimental assessment we show how the proposed system can be effectively applied to a large data set, namely BioNLP-ST 2013. While the experimental results reported in the paper refer to a biomedical application, the approach is general and can be ported to different domains.
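To make the idea of combining classifier outputs with ontological constraints more concrete, the following is a minimal sketch and not the authors' implementation. It assumes two entity mentions and one candidate relation, treats the classifier probabilities as unary potentials of a tiny Markov random field, and encodes a single ontological constraint as a soft potential over the joint assignment. All names, labels, scores, and the penalty value are hypothetical, and the exhaustive enumeration stands in for whatever distributed inference the actual system uses.

```python
import itertools
import math

# Hypothetical classifier outputs: P(type | mention) for two entity mentions.
entity_scores = {
    "e1": {"Protein": 0.55, "Gene": 0.45},
    "e2": {"Protein": 0.40, "Gene": 0.60},
}

# Hypothetical classifier output: P(label | e1, e2) for one candidate relation.
relation_scores = {"Binds": 0.7, "NoRelation": 0.3}

# Hypothetical ontological constraint: argument types allowed for each relation.
allowed = {
    "Binds": {("Protein", "Protein")},
}

# Assumed soft-constraint strength: near-zero potential for violations.
VIOLATION_PENALTY = 1e-6


def constraint_potential(rel, t1, t2):
    """Return 1.0 if the assignment respects the ontology, else a small penalty."""
    if rel not in allowed:  # labels without constraints, e.g. NoRelation
        return 1.0
    return 1.0 if (t1, t2) in allowed[rel] else VIOLATION_PENALTY


def map_assignment():
    """Find the most probable joint assignment by exhaustive enumeration."""
    best, best_score = None, -math.inf
    for t1, t2, rel in itertools.product(
        entity_scores["e1"], entity_scores["e2"], relation_scores
    ):
        # Log of the unnormalised joint: unary potentials times the constraint.
        score = (
            math.log(entity_scores["e1"][t1])
            + math.log(entity_scores["e2"][t2])
            + math.log(relation_scores[rel])
            + math.log(constraint_potential(rel, t1, t2))
        )
        if score > best_score:
            best, best_score = (t1, t2, rel), score
    return best
```

In this toy setting, taking each classifier's top label independently would label the second mention as a Gene while still predicting Binds, which the assumed ontology forbids. The joint assignment instead revises the second mention to Protein. Enumeration grows exponentially with the number of variables, which is consistent with the abstract's remark on computational complexity and the motivation for a distributed, parallel architecture.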