Improving Context-Attribution with Semi-Supervised Cross-Encoders

Ermelinda Oro (Supervision)
2025

Abstract

Ensuring that generated text is accurately attributed to its underlying sources is critical for the transparency, trustworthiness, and verifiability of large language model (LLM) outputs. In this work, we conduct a comparative study of post-hoc context-attribution methods, focusing on cross-encoders (both frozen and fine-tuned) as well as proprietary and open-source LLMs in low-annotation settings. We explore strategies for leveraging frozen LLMs for context-attribution without fine-tuning, and we develop techniques to optimize cross-encoder performance for semantic alignment between generated text and source material. Our evaluation spans four datasets (ASQA, ELI5, TREC-RAG, and a proprietary legal corpus) and covers both answer-level and sentence-level attribution tasks. Additionally, we investigate the impact of training small cross-encoders on synthetic data to assess their scalability and deployment potential in resource-constrained environments. Our results demonstrate that cross-encoders are valid alternatives to LLMs for post-generation answer-level context-attribution. Moreover, with proper hyperparameter tuning, the same model can achieve performance comparable to that of proprietary LLMs for both sentence- and answer-level context-attribution. Finally, the performance of small cross-encoders can be further improved by training solely on synthetic data, offering a scalable and cost-effective solution.
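For illustration, the following minimal sketch shows the core idea of post-hoc, sentence-level context-attribution with a frozen cross-encoder, one of the settings studied here. It uses the sentence-transformers CrossEncoder API; the checkpoint name, the example texts, and the decision threshold are illustrative assumptions, not details taken from the paper.

from sentence_transformers import CrossEncoder

# Load an off-the-shelf relevance cross-encoder (frozen, no fine-tuning).
model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

# A generated answer split into sentences, plus the retrieved source passages.
answer_sentences = [
    "The Eiffel Tower was completed in 1889.",
    "It remained the world's tallest structure until 1930.",
]
source_passages = [
    "Construction of the Eiffel Tower finished in 1889 for the World's Fair.",
    "The Chrysler Building surpassed the Eiffel Tower in height in 1930.",
    "Paris is the capital of France.",
]

# Assumed decision boundary; a suitable value depends on the model's score scale.
THRESHOLD = 0.5

for sentence in answer_sentences:
    # Score the generated sentence against every candidate source passage.
    scores = model.predict([(sentence, passage) for passage in source_passages])
    best = int(scores.argmax())
    # Attribute the sentence to the best-scoring passage if it clears the threshold.
    if scores[best] >= THRESHOLD:
        print(f"{sentence!r} -> passage {best} (score {scores[best]:.2f})")
    else:
        print(f"{sentence!r} -> unattributed")

The same scoring loop extends to answer-level attribution by scoring the full answer, rather than individual sentences, against each passage.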
Istituto di Calcolo e Reti ad Alte Prestazioni - ICAR
ISBN: 978-1-64368-631-8
Keywords: context-attribution, cross-encoders, large language models, retrieval-augmented generation, semi-supervised learning, synthetic data, post-hoc attribution, natural language inference, answer-level attribution, sentence-level attribution
Files in this record:
File: FAIA-413-FAIA251415.pdf (open access)
Type: Publisher's Version (PDF)
License: Creative Commons
Size: 527.46 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/559877