An LLM-powered agent for the D4Science digital infrastructure
Oliviero A.; Peccerillo B.; Procaccini M.
2025
Abstract
This report presents the design and implementation of an LLM-powered AI agent integrated into the D4Science digital infrastructure. D4Science provides Virtual Research Environments (VREs) that support collaborative, data-centric scientific workflows. Using the Cheshire Cat framework, we design an AI agent with memory, tool-use capabilities, and a well-defined identity aligned with the infrastructure's mission. Its core function is to assist researchers by retrieving, processing, and summarizing digital artifacts stored in the D4Science Workspace. The agent interacts with D4Science services through a custom plugin built on a Python library that abstracts access to the underlying RESTful APIs. To ensure consistent, context-aware behavior, we adopt a prompt engineering strategy that embeds a structured preamble defining the agent's personality, capabilities, and operational constraints. Through Retrieval-Augmented Generation (RAG), the agent maintains episodic and declarative memory in a Qdrant-based vector store, allowing it to reason over previously seen documents and interactions. We demonstrate the agent's utility through a representative use case and describe its architecture, tools, and interaction flow. This work illustrates a practical integration of state-of-the-art language models with established research infrastructures, promoting more intuitive, semantically rich access to shared scientific resources.

| File | Description | Type | License | Size | Format |
|---|---|---|---|---|---|
| ISTI-TR-2025-010.pdf (open access) | An LLM-powered agent for the D4Science digital infrastructure | Other attached material | Other type of license | 1.19 MB | Adobe PDF |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.


