"The time for action has arrived": Extending the IS Catalogue leveraging Large Language Models

Stefano De Giorgis (co-first author)
2024

Abstract

Image Schema (IS) research has long been hampered by the scarcity of annotated data. This paper addresses that scarcity by using Large Language Models (LLMs) to extend the IS Catalogue. We systematically tested several LLMs to identify the most effective one for this task and selected Claude 3.5 Sonnet. We asked the model to extend the IS Catalogue in two ways: first, by expanding the annotations attached to the catalogue's sentences so that each sentence can carry multiple image schemas; second, by generating new literal sentences to add to the catalogue. To evaluate the model, we conducted several analyses, including accuracy ratings against the original annotations. The chosen model retrieved the original annotation in 81% of cases when the entire set of image schemas extracted in the profile is considered. The approach enables rapid processing and annotation of large volumes of text while maintaining high accuracy and consistency. A partial evaluation by domain experts found the enriched IS Catalogue to be sound and plausible, suggesting that LLM-assisted extension can produce high-quality synthetic data aligned with expert knowledge. Our method offers a promising solution to the data scarcity problem in IS research and could accelerate progress in the field.
Istituto di Scienze e Tecnologie della Cognizione - ISTC
Knowledge Representation
Embodied Cognition
Image Schemas
Files in this item:
File: _ISD8____The_time_for_action_has_arrived____Extending_the_IS_Catalogue_leveraging_Large_Language_Models-1.pdf
Access: open access
Description: “The Time for Action has Arrived”: Extending the IS Catalogue Leveraging Large Language Models, 2024, Stefano De Giorgis, Guendalina Righetti, CEUR Workshop Proceedings, Vol. 3888, paper_7.pdf
Type: Post-print document
License: Creative Commons
Size: 1.08 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/539446