CNR Institutional Research Information System

"The time for action has arrived": Extending the IS Catalogue leveraging Large Language Models

Stefano De Giorgis (co-first author)

2024

Abstract

Image Schema (IS) research has long been hampered by the scarcity of annotated data, which has limited progress in the field. This paper presents a novel approach to this challenge that leverages Large Language Models (LLMs) to extend the IS Catalogue. We systematically tested several LLMs to identify the most effective model for the task, ultimately selecting Claude 3.5 Sonnet. We asked the model to extend the IS Catalogue in two ways: first, by expanding the annotations associated with the sentences in the catalogue to include multiple image schemas; second, by generating new literal sentences to be added to the catalogue. To evaluate the model, we conducted several analyses, including accuracy ratings against the original annotations. The approach proved effective: the chosen model retrieved the original annotation in 81% of cases when the entire set of image schemas extracted in the profile is considered. This approach enables rapid processing and annotation of large text volumes while maintaining high accuracy and consistency. A partial evaluation by domain experts found the enriched IS Catalogue to be sound and plausible, suggesting that LLM-assisted extension can produce high-quality synthetic data aligned with expert knowledge. Our method offers a promising solution to the data scarcity problem in IS research, potentially accelerating advancements in the field.

Year: 2024
Organizational units: Istituto di Scienze e Tecnologie della Cognizione - ISTC
Keywords: Knowledge Representation; Embodied Cognition; Image Schemas
Publication type: 04.01 Conference proceedings contribution

Files in this product:

File	Size	Format
_ISD8____The_time_for_action_has_arrived____Extending_the_IS_Catalogue_leveraging_Large_Language_Models-1.pdf (open access)	1.08 MB	Adobe PDF

Description: “The Time for Action has Arrived”: Extending the IS Catalogue Leveraging Large Language Models, 2024, Stefano De Giorgis, Guendalina Righetti, CEUR Workshop Proceedings, Vol. 3888, paper_7.pdf
Type: Post-print document
License: Creative Commons

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/539446
