FoodSafeSum: enabling natural language processing applications for food safety document summarization and analysis

Rocchietti G.; Rulli C.; Nardini F. M.; Trani S.
2025

Abstract

Food safety demands timely detection, regulation, and public communication, yet the lack of structured datasets hinders Natural Language Processing (NLP) research. We present and release a new dataset of human-written and Large Language Model (LLM)-generated summaries of food safety documents, plus food-safety-related metadata. We evaluate its utility on three NLP tasks directly reflecting food safety practices: multilabel classification for organizing documents into domain-specific categories; document retrieval for accessing regulatory and scientific evidence; and question answering via retrieval-augmented generation that improves factual accuracy. We show that LLM summaries perform comparably to or better than human-written ones across tasks. We also demonstrate clustering of summaries for event tracking and compliance monitoring. This dataset enables NLP applications that support core food safety practices, including the organization of regulatory and scientific evidence, monitoring of compliance issues, and communication of risks to the public.
Institution: Istituto di Scienza e Tecnologie dell'Informazione "Alessandro Faedo" - ISTI
ISBN: 9798891763357
Keywords: Document summarization, Documents analysis, Food-safety, Language model, Language processing, Natural language processing applications, Natural languages, Safety document, Safety practices, Scientific evidence
Files in this item:
Rocchietti et al_ACL2025.findings-emnlp.911.pdf

open access

Description: FoodSafeSum: Enabling Natural Language Processing Applications for Food Safety Document Summarization and Analysis
Type: Publisher's version (PDF)
License: Other license type
Size: 1.57 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/580481