Automatic Generation of Job Safety Reports with Explainable RAG-Based LLMs
Giovanni Panella
Riccardo Pecori
2025
Abstract
Monitoring workplace activities is critical for ensuring job safety. Generative Artificial Intelligence (Gen-AI) and Human-centered Artificial Intelligence (Hum-AI) can offer new, trustworthy solutions for automating these monitoring procedures and improving work accident prevention. In this paper, we present a novel framework that combines Retrieval Augmented Generation (RAG) with explainable Large Language Models (LLMs) to automatically generate job safety reports from unstructured accident descriptions. Our method integrates embeddings from models such as BERT and SciBERT with explainable AI based on Layer-Wise Relevance Propagation (LRP), which highlights the root causes of accidents within the generated reports. We evaluate multiple LLMs, including LLaMA 3.1, Mixtral-8x7B, and DeepSeek v2, on the Aviation Safety Reporting System (ASRS) dataset. Results show that our best configuration (Mixtral-8x7B with SciBERT) achieves F1-scores of up to 0.909, with GLEU and METEOR scores above 0.3 and 0.2, respectively. These findings demonstrate the effectiveness and interpretability of the proposed system in real-world job safety contexts and show how the approach could assist safety experts or inspectors more explicitly.
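To make the retrieval step of a RAG pipeline concrete, here is a minimal sketch, not the authors' implementation. It assumes that accident descriptions have already been encoded into fixed-size vectors (in the paper these come from BERT or SciBERT; here small hypothetical toy vectors stand in for them). It ranks stored reports by cosine similarity to the query and assembles the top-k matches into a prompt for the generator LLM. The function names `cosine_similarity`, `retrieve_top_k`, and `build_prompt`, the prompt wording, and the example texts are all hypothetical and invented for illustration; none of them is taken from the ASRS dataset.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_top_k(
    query_vec: Sequence[float],
    corpus: List[Tuple[str, Sequence[float]]],
    k: int = 2,
) -> List[Tuple[str, float]]:
    """Return the k corpus texts most similar to the query, best first.

    `corpus` holds (text, embedding) pairs. In a real system the embeddings
    would come from an encoder such as BERT or SciBERT (assumed, not shown).
    """
    scored = [(text, cosine_similarity(query_vec, vec)) for text, vec in corpus]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def build_prompt(query_text: str, retrieved: List[Tuple[str, float]]) -> str:
    """Assemble a hypothetical generation prompt from the retrieved context."""
    context = "\n".join(f"- {text}" for text, _ in retrieved)
    return (
        "Context from similar past reports:\n"
        f"{context}\n\n"
        f"Accident description:\n{query_text}\n\n"
        "Write a job safety report and state the likely root causes."
    )


# Toy 3-dimensional vectors standing in for real sentence embeddings (hypothetical).
corpus = [
    ("Runway incursion after misheard ATC clearance.", [0.9, 0.1, 0.0]),
    ("Hydraulic leak found during preflight inspection.", [0.1, 0.9, 0.2]),
    ("Taxiway confusion at night due to poor signage.", [0.8, 0.3, 0.1]),
]
query_text = "Aircraft entered active runway without clearance."
query_vec = [1.0, 0.2, 0.0]

top = retrieve_top_k(query_vec, corpus, k=2)
prompt = build_prompt(query_text, top)
```

With these toy vectors, the two runway and taxiway reports are retrieved and the hydraulic report is left out of the prompt. A production system would more likely use a real encoder and an approximate nearest-neighbour index instead of a linear scan. The ranking logic stays the same.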


