Enhancing multimodal analogical reasoning with logic augmented generation
Lippolis A. S.; Nuzzolese A. G.; Gangemi A.
2026
Abstract
Recent advances in Large Language Models (LLMs) have demonstrated their capabilities across a variety of tasks. However, automatically extracting implicit knowledge from natural language remains a significant challenge, as machines lack direct experience with the physical world. Given this scenario, semantic knowledge graphs can guide LLMs to achieve more efficient and explainable results. In this paper, we apply a Logic Augmented Generation framework that leverages the explicit representation of a text through a semantic knowledge graph and applies it in combination with prompt heuristics to elicit implicit analogical connections. This method generates extended knowledge graph triples representing implicit meaning, enabling systems to reason on unlabeled multimodal data regardless of the domain. We validate our work through three metaphor detection and understanding tasks across five datasets (text-based, domain-specific, and visual), as these tasks require deep analogical reasoning capabilities. The results show that the proposed integrated approach surpasses current baselines and performs better than humans in understanding visual metaphors. It also provides justifications for the reasoning processes, yet remains susceptible to shortcut cues and cross-modal interference. The error analysis discusses issues with existing metaphor datasets and current evaluation and annotation methods.


