Automatic Generation of Job Safety Reports with Explainable RAG-Based LLMs

Giovanni Panella; Riccardo Pecori
2025

Abstract

Monitoring workplace activities is critical for ensuring job safety. Generative Artificial Intelligence (Gen-AI) and Human-centered Artificial Intelligence (Hum-AI) can offer new, trustworthy ways to automate these monitoring procedures and thereby improve the prevention of workplace accidents. In this paper, we present a novel framework that combines Retrieval-Augmented Generation (RAG) with explainable LLMs to automatically generate job safety reports from unstructured accident descriptions. Our method integrates embedding models such as BERT and SciBERT with explainable AI based on Layer-Wise Relevance Propagation (LRP) to highlight the root causes of accidents within the generated reports. We evaluate multiple LLMs, including LLaMA 3.1, Mixtral-8x7B, and DeepSeek v2, on the Aviation Safety Reporting System (ASRS) dataset. Results show that our best configuration (Mixtral-8x7B with SciBERT) achieves F1-scores up to 0.909, with GLEU and METEOR scores above 0.3 and 0.2, respectively. These findings demonstrate the effectiveness and interpretability of the proposed system in real-world job safety contexts and show how the approach could more explicitly assist safety experts or inspectors.
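To make the retrieval step of a RAG pipeline concrete, the following is a minimal sketch and not the authors' implementation. It assumes a toy bag-of-words vector as a stand-in for the BERT or SciBERT embeddings named in the abstract. The function names (`embed`, `cosine`, `retrieve`, `build_prompt`), the prompt wording, and the three incident strings are all hypothetical and invented for illustration; they are not taken from the ASRS dataset.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words vector (assumption: stands in for a BERT/SciBERT embedding)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, corpus, k=2):
    """Return the k corpus entries most similar to the query."""
    q = embed(query)
    ranked = sorted(corpus, key=lambda doc: cosine(q, embed(doc)), reverse=True)
    return ranked[:k]


def build_prompt(query, corpus, k=2):
    """Assemble a hypothetical prompt that gives the LLM retrieved context."""
    context = "\n".join(f"- {doc}" for doc in retrieve(query, corpus, k))
    return (
        f"Similar past incidents:\n{context}\n\n"
        f"New incident:\n{query}\n\n"
        "Write a job safety report."
    )


# Made-up incident descriptions, for illustration only.
corpus = [
    "engine fire warning during climb crew returned to airport",
    "runway incursion caused by miscommunication with tower",
    "hydraulic leak found during preflight inspection",
]
query = "fire warning from engine shortly after takeoff during climb"

if __name__ == "__main__":
    print(build_prompt(query, corpus))
```

In a real system the `embed` function would be replaced by a learned sentence encoder and the assembled prompt would be passed to the chosen LLM; the ranking-then-prompting structure is the part this sketch is meant to show.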
Istituto dei Materiali per l'Elettronica ed il Magnetismo - IMEM
Keywords: job safety; large language models; Explainable AI; RAG; Reporting; Decision support systems; Risk assessment
Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/551461