"The time for action has arrived": Extending the IS Catalogue leveraging Large Language Models
Stefano De Giorgis (co-first author)
2024
Abstract
Image Schema (IS) research has long been hampered by the scarcity of annotated data, limiting advancements in the field. This paper presents a novel approach to overcoming this challenge by leveraging Large Language Models (LLMs) to extend the IS Catalogue. We systematically tested various LLMs to identify the most effective model for this task, ultimately selecting Claude 3.5 Sonnet. We asked the model to extend the IS Catalogue in two ways: first, by expanding the annotations associated with the sentences in the catalogue to include multiple IS; second, by generating new literal sentences to be added to the catalogue. To evaluate the model, we conducted several analyses, including accuracy ratings against the original annotations. Our approach proved effective: the chosen model retrieved the original annotation in 81% of cases when considering the entire set of image schemas extracted in the profile. This approach enables rapid processing and annotation of large text volumes while maintaining high accuracy and consistency. Partial evaluation by domain experts has found the enriched IS Catalogue to be sound and plausible, suggesting that LLM-assisted extension can produce high-quality synthetic data aligned with expert knowledge. Our method offers a promising solution to the data scarcity problem in IS research, potentially accelerating advancements in the field.

File | Size | Format | |
---|---|---|---|
_ISD8____The_time_for_action_has_arrived____Extending_the_IS_Catalogue_leveraging_Large_Language_Models-1.pdf (open access). Description: “The Time for Action has Arrived”: Extending the IS Catalogue Leveraging Large Language Models, 2024, Stefano De Giorgis, Guendalina Righetti, CEUR Workshop Proceedings, Vol. 3888, paper_7.pdf. Type: Post-print document. License: Creative Commons | 1.08 MB | Adobe PDF | View/Open |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.