Improving Context-Attribution with Semi-Supervised Cross-Encoders
Ermelinda Oro (Supervision)
2025
Abstract
Ensuring that generated text is accurately attributed to its underlying sources is critical for the transparency, trustworthiness, and verifiability of large language model (LLM) outputs. In this work, we conduct a comparative study of post-hoc context-attribution methods, focusing on the use of cross-encoders—both frozen and fine-tuned—as well as proprietary and open-source LLMs in low-annotation settings. We explore strategies for leveraging frozen LLMs for context-attribution without fine-tuning, and we develop techniques to optimize cross-encoder performance for semantic alignment between generated text and source material. Our evaluation spans four datasets: ASQA, ELI5, TREC-RAG, and a proprietary legal corpus, and includes both answer-level and sentence-level attribution tasks. Additionally, we investigate the impact of training small cross-encoders on synthetic data to assess their scalability and deployment potential in resource-constrained environments. Our results demonstrate that cross-encoders prove to be valid alternatives to LLMs for post-generation answer-level context-attribution. Moreover, after proper hyperparameter tuning, the same model can achieve performance comparable to proprietary LLM performance for sentence- and answer-level context-attribution. Finally, trained solely on synthetic data, small cross-encoders’ performance can be further improved while offering a scalable and cost-effective solution.

| File | Type | License | Size | Format | Access |
|---|---|---|---|---|---|
| FAIA-413-FAIA251415.pdf | Publisher's version (PDF) | Creative Commons | 527.46 kB | Adobe PDF | Open access |
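The post-hoc, sentence-level attribution task described in the abstract can be illustrated with a minimal sketch: each generated sentence is scored against every retrieved source passage, and the highest-scoring passage is returned as the attribution. Here a toy lexical-overlap scorer stands in for the paper's cross-encoder (the actual models, training procedure, and datasets are not reproduced); the function names `overlap_score` and `attribute_sentence` are illustrative, not from the paper.

```python
import math
from collections import Counter

def overlap_score(text_a: str, text_b: str) -> float:
    """Toy stand-in for a cross-encoder relevance score:
    cosine similarity over bag-of-words counts."""
    a, b = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    num = sum(a[w] * b[w] for w in set(a) & set(b))
    denom = (math.sqrt(sum(v * v for v in a.values()))
             * math.sqrt(sum(v * v for v in b.values())))
    return num / denom if denom else 0.0

def attribute_sentence(sentence: str, passages: list[str]) -> int:
    """Sentence-level attribution: index of the passage whose
    alignment score with the generated sentence is highest."""
    return max(range(len(passages)),
               key=lambda i: overlap_score(sentence, passages[i]))

passages = [
    "The Eiffel Tower was completed in 1889 in Paris.",
    "Mount Everest is the highest mountain on Earth.",
]
idx = attribute_sentence("The Eiffel Tower was finished in 1889.", passages)
```

In the study itself, a fine-tuned cross-encoder would replace `overlap_score`, jointly encoding the sentence-passage pair to capture the semantic alignment that lexical overlap misses.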
Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.


