CNR Institutional Research Information System

An LLM-powered agent for the D4Science digital infrastructure

Oliviero A.; Peccerillo B.; Procaccini M.

2025

Abstract

This report presents the design and implementation of an LLM-powered AI agent integrated into the D4Science digital infrastructure. D4Science provides Virtual Research Environments (VREs) to support collaborative and data-centric scientific workflows. By leveraging the Cheshire Cat framework, we design an AI agent with memory, tool-use capabilities, and a well-defined identity aligned with the infrastructure's mission. Its core functionality centers on assisting researchers by retrieving, processing, and summarizing digital artifacts stored in the D4Science Workspace. The agent interacts with D4Science services via a custom plugin built atop a Python library that abstracts access to RESTful APIs. To ensure consistent and context-aware behavior, we adopt a prompt engineering strategy that embeds a structured preamble defining the agent's personality, capabilities, and operational constraints. Through Retrieval-Augmented Generation (RAG), the agent maintains episodic and declarative memory using a Qdrant-based vector store, allowing it to reason over previously seen documents and interactions. We demonstrate the agent's utility via a representative use case and describe its architecture, tools, and interaction flow. This work illustrates a practical integration of state-of-the-art language models with established research infrastructures, promoting more intuitive, semantically rich access to shared scientific resources.

Year: 2025

Organizational units:
Istituto di Scienza e Tecnologie dell'Informazione "Alessandro Faedo" - ISTI
Istituto di Geoscienze e Georisorse - IGG - Sede Pisa

Keywords:
AI agent
Retrieval-Augmented Generation
D4Science
Virtual Research Environment

Appears in types: 08.04 Technical report

Files in this item:

File	Access	Description	Type	License	Size	Format
ISTI-TR-2025-010.pdf	Open access	An LLM-powered agent for the D4Science digital infrastructure	Other attached material	Other type of license	1.19 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/550387
