Leveraging Large Language Models for Flexible and Robust Table-to-Text Generation
Oro E. (first author); Ruffolo M.
2024
Abstract
Generating natural language descriptions from structured tabular data is a crucial challenge with high-impact applications across diverse domains, including business intelligence, scientific communication, and data analytics. Traditional rule-based and machine learning approaches have faced limitations in reusability, vocabulary coverage, and handling complex table layouts. Recent advances in LLMs pre-trained on vast corpora offer an opportunity to overcome these limitations by leveraging their strong language understanding and generation capabilities in a flexible learning setup. In this paper, we conduct a comprehensive evaluation of two LLMs, GPT-3.5 and LLaMa2-7B, on table-to-text generation across three diverse public datasets: WebNLG, NumericNLG, and ToTTo. Our experiments investigate both zero-shot prompting techniques and finetuning with the parameter-efficient LoRA method. Results demonstrate GPT-3.5’s impressive capabilities, outperforming LLaMa2 in zero-shot settings. However, finetuning LLaMa2 on a subset of the data significantly narrows this performance gap and produces generations much closer to the ground truth and comparable to state-of-the-art approaches. Our findings highlight the promising potential of LLMs for data-to-text generation while identifying key areas for future research.
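The abstract describes two setups: zero-shot prompting of an LLM over a linearized table, and parameter-efficient finetuning of LLaMa2-7B with LoRA. The sketch below illustrates, under stated assumptions, what such a pipeline could look like using the Hugging Face transformers and peft libraries; the prompt wording, the table linearization format, the example table values, and every LoRA hyperparameter are illustrative choices, not the configuration reported in the paper.

```python
# Illustrative sketch only (not the authors' code): zero-shot table-to-text
# prompting plus a LoRA setup for LLaMa2-7B. Prompt text, linearization,
# example data, and LoRA hyperparameters are assumptions for demonstration.

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model


def linearize_table(header, rows):
    """Flatten a table into a plain-text form a decoder-only LLM can read."""
    lines = [" | ".join(header)]
    lines += [" | ".join(str(cell) for cell in row) for row in rows]
    return "\n".join(lines)


def zero_shot_prompt(header, rows):
    """Build a zero-shot instruction prompt for table-to-text generation."""
    table_text = linearize_table(header, rows)
    return (
        "Describe the following table in one fluent paragraph of natural "
        "language, mentioning only facts supported by the table.\n\n"
        f"{table_text}\n\nDescription:"
    )


# Toy example table (hypothetical values, for illustration only).
prompt = zero_shot_prompt(
    header=["City", "Country", "Population"],
    rows=[["Cosenza", "Italy", "65,000"], ["Vienna", "Austria", "1,900,000"]],
)

base = "meta-llama/Llama-2-7b-hf"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

# Parameter-efficient finetuning: wrap the base model with LoRA adapters so
# only the small low-rank update matrices are trained.
lora_config = LoraConfig(
    r=8,                                   # LoRA rank (assumed)
    lora_alpha=16,                         # scaling factor (assumed)
    lora_dropout=0.05,                     # dropout on LoRA layers (assumed)
    target_modules=["q_proj", "v_proj"],   # attention projections (assumed)
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # reports how few parameters LoRA trains
```

The same prompt construction can be reused in both settings: fed directly to GPT-3.5 for zero-shot generation, or paired with reference descriptions as supervised examples when finetuning the LoRA-wrapped LLaMa2 model.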
| File | Size | Format | Access |
|---|---|---|---|
| Dexa_2024_84_ORO.pdf | 308.93 kB | Adobe PDF | Restricted (authorized users only); License: NOT PUBLIC - Private/restricted access |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.