Enhancing Language Models with Retrieval-Augmented Generation for Accurate and Contextual Responses
Mauro Mazzei
First author
Methodology
2026
Abstract
Large language models (LLMs), now deployed in production environments, are prone to significant inaccuracies, particularly on highly specific topics within specialized domains. Incorrect responses from generative systems are especially problematic for non-expert users, who are unfamiliar with the field and cannot assess the reliability of what is generated, which amplifies the impact of these errors. This paper explores how to mitigate such issues with Retrieval-Augmented Generation (RAG), a technique that enhances LLMs by integrating external information retrieval. RAG helps reduce hallucinations by grounding responses in relevant retrieved content. The study examines the architecture and implementation of a RAG system and evaluates its effectiveness in improving response accuracy through simple experimental examples. It also investigates techniques and mathematical models for improving the relevance of retrieved information and discusses the flow of structured and unstructured data into a vector database. The case study uses an open-source framework to demonstrate the design, implementation, and configuration of a RAG-based architecture on cloud infrastructure.

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.


