
Efficient re-ranking with cross-encoders via early exit

Busolin F.; Lucchese C.; Nardini F. M.; Orlando S.; Perego R.; Trani S.; Veneri A.
2025

Abstract

Pre-trained language models based on transformer networks are highly effective for document re-ranking in ad-hoc search. Among these, cross-encoders stand out for their effectiveness, as they process query-document pairs through the entire transformer network to compute ranking scores. However, this traversal is computationally expensive. To address this, prior work has explored early-exit strategies, enabling the model to terminate the traversal of query-document pairs. These techniques rely on learned classifiers, placed after each transformer block, that decide if a query-document pair can be dropped. Diverging from previous approaches, we propose Similarity-based Early Exit (SEE), a novel, non-learned strategy that exploits the similarities between query and document token embeddings to early-terminate the inference of documents that will most likely be non-relevant to the query. Even though SEE can be used after every transformer block, we show that the best advantage is achieved when applied before the first transformer block, thus saving most of the inference cost for the query-document pairs. Reproducible experiments on 17 public datasets covering in-domain and out-of-domain evaluation show that SEE can be effectively applied to four different cross-encoders, achieving speedups of up to 3.5× with a limited loss in ranking effectiveness.
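The paper's exact scoring function is in the PDF; purely as a rough illustration of the idea described in the abstract, a non-learned, MaxSim-style similarity over query and document token embeddings with a tuned threshold might look like the sketch below (the function name, aggregation, and threshold value are assumptions for illustration, not the authors' formulation).

```python
import numpy as np

def see_early_exit(query_emb, doc_emb, threshold):
    """Similarity-based early-exit sketch (hypothetical formulation).

    query_emb: (Q, d) array of query token embeddings
    doc_emb:   (D, d) array of document token embeddings
    Returns True if the document should be dropped before the
    full cross-encoder forward pass.
    """
    # L2-normalize token embeddings so dot products are cosine similarities.
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    d = doc_emb / np.linalg.norm(doc_emb, axis=1, keepdims=True)
    # For each query token, take its best-matching document token,
    # then average over query tokens (a MaxSim-style aggregate).
    sim = (q @ d.T).max(axis=1).mean()
    return sim < threshold  # drop likely non-relevant documents early

rng = np.random.default_rng(0)
query = rng.normal(size=(4, 64))
# A document whose tokens overlap with the query, vs. an unrelated one.
relevant_doc = np.vstack([query + 0.05 * rng.normal(size=(4, 64)),
                          rng.normal(size=(8, 64))])
random_doc = rng.normal(size=(12, 64))
print(see_early_exit(query, relevant_doc, threshold=0.5))  # keep: False
print(see_early_exit(query, random_doc, threshold=0.5))    # drop: True
```

Applied before the first transformer block (as the abstract reports works best), such a test only needs the static token embeddings, so a dropped document never pays the cost of the full forward pass.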
Istituto di Scienza e Tecnologie dell'Informazione "Alessandro Faedo" - ISTI
979-8-4007-1592-1
Early exit; LLM-based rankers; Efficiency
Files in this record:
Busolin et al_EfficientReRanking_2025.pdf
  Open access
  Description: Efficient re-ranking with cross-encoders via early exit
  Type: Published version (PDF)
  License: Creative Commons
  Size: 1.72 MB
  Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/562499
Citations
  • PMC: ND
  • Scopus: 3
  • Web of Science: 1