CNR Institutional Research Information System

Low back pain represents a leading source of disability worldwide and poses a significant challenge for evidence-based clinical decision support. In contexts where Italian-language resources for diversified therapeutic pathways are lacking, we have assembled a novel, annotated dataset comprising up to three pre-treatment documents per patient (MRI report, X-ray report, and patient visit notes), alongside demographic information (age and sex). The cohort consists of 176 patient records, stratified into three therapeutic groups: 50 conservative, 92 regenerative, and 34 surgical. The primary aim is to investigate whether the collected dataset can be harnessed to predict which of the three treatment modalities is most appropriate. To this end, six document-combination scenarios were defined, evaluating each single-report modality as well as all possible pairings. For each scenario, two modeling strategies were contrasted: a traditional Support Vector Machine classifier leveraging TF–IDF features based on unigrams, bigrams, and trigrams, and a fine-tuned Italian BERT model adapted to our corpus. Experimental results indicate that classic n-gram–based approaches achieve the highest performance (macro–𝐹1 up to 71.3%). The BERT model, while outperforming the baseline, encounters limitations in this low-resource scenario.These findings suggest that the present dataset has the potential to catalyze the development of Italian-language clinical decision support systems that account for the distinct signatures of treatment pathways.

A Novel Real-World Dataset of Italian Clinical Notes for NLP-based Decision Support in Low Back Pain Treatment

Bonfigli, Agnese;Piperno, Ruben;Bacco Luca;Dell'Orletta, Felice;Brunato, Dominique;Crispino, Filippo;Papalia, Giuseppe Francesco;Russo, Fabrizio;Vadalà, Gianluca;Papalia, Rocco;Merone, Mario;Pecchia, Leandro

2025

Abstract

Low back pain represents a leading source of disability worldwide and poses a significant challenge for evidence-based clinical decision support. In contexts where Italian-language resources for diversified therapeutic pathways are lacking, we have assembled a novel, annotated dataset comprising up to three pre-treatment documents per patient (MRI report, X-ray report, and patient visit notes), alongside demographic information (age and sex). The cohort consists of 176 patient records, stratified into three therapeutic groups: 50 conservative, 92 regenerative, and 34 surgical. The primary aim is to investigate whether the collected dataset can be harnessed to predict which of the three treatment modalities is most appropriate. To this end, six document-combination scenarios were defined, evaluating each single-report modality as well as all possible pairings. For each scenario, two modeling strategies were contrasted: a traditional Support Vector Machine classifier leveraging TF–IDF features based on unigrams, bigrams, and trigrams, and a fine-tuned Italian BERT model adapted to our corpus. Experimental results indicate that classic n-gram–based approaches achieve the highest performance (macro–𝐹1 up to 71.3%). The BERT model, while outperforming the baseline, encounters limitations in this low-resource scenario.These findings suggest that the present dataset has the potential to catalyze the development of Italian-language clinical decision support systems that account for the distinct signatures of treatment pathways.

Scheda breve

Scheda completa

Scheda completa (DC)

	Anno
	
				2025
			
	Strutture organizzative
	
				Istituto di linguistica computazionale "Antonio Zampolli" - ILC
			
	Parole chiave
	
				NLP in healthcare
			
	Parole chiave
	
				Large Language Models (LLMs)
Italian Medical Corpus
			
	Appare nelle tipologie:
	
				04.01 Contributo in Atti di convegno

File in questo prodotto:

File	Dimensione	Formato
11_main_long.pdf accesso aperto Licenza: Creative commons Dimensione 294.14 kB Formato Adobe PDF Visualizza/Apri	294.14 kB	Adobe PDF	Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/20.500.14243/570763

Citazioni

ND

ND

ND

social impact