Large Language Models Assisting Ontology Evaluation
Lippolis A. S.; Gangemi A.; Blomqvist E.; Nuzzolese A. G.
2026
Abstract
Ontology evaluation through functional requirements—such as testing via competency question (CQ) verification—is a well-established yet costly, labour-intensive, and error-prone endeavour, even for ontology engineering experts. In this work, we introduce OE-Assist, a novel framework designed to assist ontology evaluation through automated and semi-automated CQ verification. By presenting and leveraging a dataset of 1,393 CQs paired with corresponding ontologies and ontology stories, our contributions present, to our knowledge, the first systematic investigation into large language model (LLM)-assisted ontology evaluation, and include: (i) evaluating the effectiveness of an LLM-based approach for automatically performing CQ verification against a manually created gold standard, and (ii) developing and assessing an LLM-powered framework that assists CQ verification in Protégé by providing suggestions. We found that automated LLM-based evaluation with o1-preview and o3-mini performs at a level similar to the average user's performance. Through a user study of the framework with 19 knowledge engineers from eight international institutions, we also show that LLMs can assist manual CQ verification and improve user accuracy, especially when suggestions are correct. Additionally, participants reported a marked decrease in perceived task difficulty. However, we also observed a reduction in human performance when the LLM provided incorrect guidance, revealing a critical trade-off between efficiency and accuracy in assisted ontology evaluation.
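
As an illustration of the kind of automated check the abstract describes, the sketch below prompts an OpenAI reasoning model to judge whether an ontology can answer a given CQ. This is a minimal sketch only, not the OE-Assist implementation from the paper: the prompt wording, the helper name verify_cq, and the YES/NO verdict parsing are assumptions introduced here for illustration.

```python
# Minimal sketch of LLM-based CQ verification (illustrative; not OE-Assist itself).
# Assumptions: the ontology is available as Turtle text, an ontology "story" gives
# context, and a strict YES/NO answer is an adequate verdict format.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment


def verify_cq(cq: str, ontology_turtle: str, story: str, model: str = "o3-mini") -> bool:
    """Ask the model whether the ontology can answer the competency question."""
    prompt = (
        "You are evaluating an OWL ontology against a competency question.\n\n"
        f"Ontology story:\n{story}\n\n"
        f"Ontology (Turtle):\n{ontology_turtle}\n\n"
        f"Competency question: {cq}\n\n"
        "Answer strictly with YES if the ontology models the concepts and relations "
        "needed to answer the question, otherwise answer NO."
    )
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    verdict = response.choices[0].message.content.strip().upper()
    return verdict.startswith("YES")
```

In a semi-automated setting such as the Protégé integration described above, the returned verdict would be surfaced to the knowledge engineer as a suggestion rather than applied automatically, which is the trade-off the user study examines.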


