CNR Institutional Research Information System


A Novel Real-World Dataset of Italian Clinical Notes for NLP-based Decision Support in Low Back Pain Treatment

Bonfigli, Agnese; Piperno, Ruben; Bacco, Luca; Dell'Orletta, Felice; Brunato, Dominique; Crispino, Filippo; Papalia, Giuseppe Francesco; Russo, Fabrizio; Vadalà, Gianluca; Papalia, Rocco; Merone, Mario; Pecchia, Leandro

2025

Abstract

Low back pain is a leading cause of disability worldwide and poses a significant challenge for evidence-based clinical decision support. Because Italian-language resources covering diversified therapeutic pathways are lacking, we assembled a novel, annotated dataset comprising up to three pre-treatment documents per patient (MRI report, X-ray report, and patient visit notes), alongside demographic information (age and sex). The cohort consists of 176 patient records, stratified into three therapeutic groups: 50 conservative, 92 regenerative, and 34 surgical. The primary aim is to investigate whether the collected dataset can be used to predict which of the three treatment modalities is most appropriate. To this end, six document-combination scenarios were defined, evaluating each single-report modality as well as all possible pairings. For each scenario, two modeling strategies were contrasted: a traditional Support Vector Machine classifier using TF-IDF features based on unigrams, bigrams, and trigrams, and a fine-tuned Italian BERT model adapted to our corpus. Experimental results indicate that classic n-gram-based approaches achieve the highest performance (macro-F1 up to 71.3%). The BERT model, while outperforming the baseline, encounters limitations in this low-resource scenario. These findings suggest that the present dataset has the potential to catalyze the development of Italian-language clinical decision support systems that account for the distinct signatures of treatment pathways.
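As an illustration of the evaluation setup described in the abstract, the following minimal sketch enumerates the six document-combination scenarios (three single documents plus three pairings) and computes macro-F1 by hand. The field names in `DOCS` and the always-predict-the-largest-class predictor are assumptions made here for illustration; they are not taken from the paper, and the predictor is not necessarily the baseline the authors refer to. Only the class counts (50, 92, 34) come from the abstract.

```python
from itertools import combinations

# Hypothetical field names for the three pre-treatment documents.
DOCS = ("mri_report", "xray_report", "visit_notes")
LABELS = ("conservative", "regenerative", "surgical")


def scenarios(docs=DOCS):
    """Return the three single-document and three paired scenarios."""
    singles = [(d,) for d in docs]
    pairs = list(combinations(docs, 2))
    return singles + pairs


def macro_f1(y_true, y_pred, labels=LABELS):
    """Unweighted mean of per-class F1; a class with no true positives scores 0."""
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Class counts as reported in the abstract (176 records in total).
y_true = ["conservative"] * 50 + ["regenerative"] * 92 + ["surgical"] * 34
# Hypothetical predictor that always outputs the largest class.
y_pred = ["regenerative"] * len(y_true)

print(len(scenarios()))                    # 6
print(round(macro_f1(y_true, y_pred), 3))  # 0.229
```

Because macro-F1 weights the three classes equally, a predictor that ignores the two smaller groups scores only about 22.9% here, even though the largest group covers more than half of the cohort. This is one reason macro-F1 is a reasonable metric for a class distribution as uneven as 50/92/34.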

Full record (DC)

DC field	Value	Language
dc.authority.orgunit	Istituto di linguistica computazionale "Antonio Zampolli" - ILC	en
dc.authority.people	Bonfigli, Agnese	en
dc.authority.people	Piperno, Ruben	en
dc.authority.people	Bacco, Luca	en
dc.authority.people	Dell'Orletta, Felice	en
dc.authority.people	Brunato, Dominique	en
dc.authority.people	Crispino, Filippo	en
dc.authority.people	Papalia, Giuseppe Francesco	en
dc.authority.people	Russo, Fabrizio	en
dc.authority.people	Vadalà, Gianluca	en
dc.authority.people	Papalia, Rocco	en
dc.authority.people	Merone, Mario	en
dc.authority.people	Pecchia, Leandro	en
dc.collection.id.s	71c7200a-7c5f-4e83-8d57-d3d2ba88f40d	*
dc.collection.name	04.01 Contributo in Atti di convegno	*
dc.contributor.appartenenza	Istituto di linguistica computazionale "Antonio Zampolli" - ILC	*
dc.contributor.appartenenza.mi	918	*
dc.contributor.area	Non assegn	*
dc.contributor.area	Non assegn	*
dc.date.accessioned	2026/03/03 17:31:06	-
dc.date.available	2026/03/03 17:31:06	-
dc.date.firstsubmission	2026/03/03 17:15:48	*
dc.date.issued	2025	-
dc.date.submission	2026/03/03 17:20:43	*
dc.description.allpeople	Bonfigli, Agnese; Piperno, Ruben; Bacco, Luca; Dell'Orletta, Felice; Brunato, Dominique; Crispino, Filippo; Papalia, Giuseppe Francesco; Russo, Fabrizio; Vadalà, Gianluca; Papalia, Rocco; Merone, Mario; Pecchia, Leandro	-
dc.description.allpeopleoriginal	Bonfigli, Agnese; Piperno, Ruben; Bacco Luca; Dell'Orletta, Felice; Brunato, Dominique; Crispino, Filippo; Papalia, Giuseppe Francesco; Russo, Fabrizio; Vadalà, Gianluca; Papalia, Rocco; Merone, Mario; Pecchia, Leandro	en
dc.description.fulltext	open	en
dc.description.numberofauthors	12	-
dc.identifier.source	manual	*
dc.identifier.uri	https://hdl.handle.net/20.500.14243/570763	-
dc.language.iso	eng	en
dc.relation.ispartofbook	Proceedings of the Eleventh Italian Conference on Computational Linguistics (CLiC-it 2025)	en
dc.subject.keywords	NLP in healthcare	-
dc.subject.keywordseng	Large Language Models (LLMs)	-
dc.subject.keywordseng	Italian Medical Corpus	-
dc.subject.singlekeyword	NLP in healthcare	*
dc.subject.singlekeyword	Large Language Models (LLMs)	*
dc.subject.singlekeyword	Italian Medical Corpus	*
dc.title	A Novel Real-World Dataset of Italian Clinical Notes for NLP-based Decision Support in Low Back Pain Treatment	en
dc.type.driver	info:eu-repo/semantics/conferenceObject	-
dc.type.full	04 Contributo in convegno::04.01 Contributo in Atti di convegno	it
dc.type.miur	273	-
iris.mediafilter.data	2026/03/04 02:52:05	*
iris.orcid.lastModifiedDate	2026/03/03 17:31:06	*
iris.orcid.lastModifiedMillisecond	1772555466506	*
iris.sitodocente.maxattempts	10	-
Appears in types:	04.01 Contributo in Atti di convegno

Files in this item:

File	Size	Format
11_main_long.pdf (open access; license: Creative Commons)	294.14 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/570763
