
Efficient re-ranking with cross-encoders via early exit

Busolin F.; Lucchese C.; Nardini F. M.; Orlando S.; Perego R.; Trani S.; Veneri A.
2025

Abstract

Pre-trained language models based on transformer networks are highly effective for document re-ranking in ad-hoc search. Among these, cross-encoders stand out for their effectiveness, as they process query-document pairs through the entire transformer network to compute ranking scores. However, this traversal is computationally expensive. To address this, prior work has explored early-exit strategies, enabling the model to terminate the traversal of query-document pairs. These techniques rely on learned classifiers, placed after each transformer block, that decide if a query-document pair can be dropped. Diverging from previous approaches, we propose Similarity-based Early Exit (SEE), a novel, non-learned strategy that exploits the similarities between query and document token embeddings to early-terminate the inference of documents that will most likely be non-relevant to the query. Even though SEE can be used after every transformer block, we show that the best advantage is achieved when applied before the first transformer block, thus saving most of the inference cost for the query-document pairs. Reproducible experiments on 17 public datasets covering in-domain and out-of-domain evaluation show that SEE can be effectively applied to four different cross-encoders, achieving speedups of up to 3.5× with a limited loss in ranking effectiveness.
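The paper's exact scoring function is in the PDF; purely as a rough illustration of the idea described in the abstract, a non-learned, MaxSim-style similarity over query and document token embeddings with a tuned threshold might look like the sketch below (the function name, aggregation, and threshold value are assumptions for illustration, not the authors' formulation).

```python
import numpy as np

def see_early_exit(query_emb, doc_emb, threshold):
    """Similarity-based early-exit sketch (hypothetical formulation).

    query_emb: (Q, d) array of query token embeddings
    doc_emb:   (D, d) array of document token embeddings
    Returns True if the document should be dropped before the
    full cross-encoder forward pass.
    """
    # L2-normalize token embeddings so dot products are cosine similarities.
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    d = doc_emb / np.linalg.norm(doc_emb, axis=1, keepdims=True)
    # For each query token, take its best-matching document token,
    # then average over query tokens (a MaxSim-style aggregate).
    sim = (q @ d.T).max(axis=1).mean()
    return sim < threshold  # drop likely non-relevant documents early

rng = np.random.default_rng(0)
query = rng.normal(size=(4, 64))
# A document whose tokens overlap with the query, vs. an unrelated one.
relevant_doc = np.vstack([query + 0.05 * rng.normal(size=(4, 64)),
                          rng.normal(size=(8, 64))])
random_doc = rng.normal(size=(12, 64))
print(see_early_exit(query, relevant_doc, threshold=0.5))  # keep: False
print(see_early_exit(query, random_doc, threshold=0.5))    # drop: True
```

Applied before the first transformer block (as the abstract reports works best), such a test only needs the static token embeddings, so a dropped document never pays the cost of the full forward pass.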
Istituto di Scienza e Tecnologie dell'Informazione "Alessandro Faedo" - ISTI
979-8-4007-1592-1
Early exit; LLM-based rankers; Efficiency
Files in this record:
Busolin et al_EfficientReRanking_2025.pdf
  Open access
  Description: Efficient re-ranking with cross-encoders via early exit
  Type: Published version (PDF)
  License: Creative Commons
  Size: 1.72 MB
  Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/562499
Citations
  • PMC: ND
  • Scopus: 3
  • Web of Science: 1