Efficient document re-ranking for transformers by precomputing term representations

Nardini F.M.; Perego R.
2020

Abstract

Deep pretrained transformer networks are effective at various ranking tasks, such as question answering and ad-hoc document ranking. However, their high computational cost makes them prohibitively expensive to deploy in practice. Our proposed approach, called PreTTR (Precomputing Transformer Term Representations), considerably reduces the query-time latency of deep transformer networks (up to a 42x speedup on web document ranking), making these networks more practical to use in a real-time ranking scenario. Specifically, we precompute part of the document term representations at indexing time (without a query), and merge them with the query representation at query time to compute the final ranking score. Because the token representations are large, we also propose an effective approach to reducing the storage requirement: we train a compression layer to match attention scores. Our compression technique reduces the required storage by up to 95% and can be applied without substantially degrading ranking performance.
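
The abstract describes a split transformer: the lower layers process the query and the document independently, so the document half can be computed once at indexing time, and only the upper layers see the joined sequence at query time. Below is a minimal sketch of that idea in Python, assuming a HuggingFace BERT encoder; the split point SPLIT_LAYER, the helper names, and the simplified position/segment handling are illustrative assumptions, not the authors' exact implementation.

```python
# Minimal PreTTR-style sketch, assuming a HuggingFace BERT encoder.
# SPLIT_LAYER, the function names, and the simplified position/segment
# handling are illustrative assumptions, not the paper's exact code.
import torch
from transformers import BertModel, BertTokenizerFast

SPLIT_LAYER = 6  # layers 0..5 run per-text; layers 6..11 see query + document

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased").eval()

@torch.no_grad()
def encode_partial(text, n_layers):
    """Run `text` through the embeddings and the first `n_layers`
    transformer layers; return hidden states and the attention mask."""
    enc = tokenizer(text, return_tensors="pt")
    ext = model.get_extended_attention_mask(enc["attention_mask"],
                                            enc["input_ids"].shape)
    hidden = model.embeddings(input_ids=enc["input_ids"])
    for layer in model.encoder.layer[:n_layers]:
        hidden = layer(hidden, attention_mask=ext)[0]
    return hidden, enc["attention_mask"]

@torch.no_grad()
def merge_and_finish(q_hidden, q_mask, d_hidden, d_mask):
    """Concatenate fresh query states with precomputed document states
    and run the remaining layers; return the final [CLS] vector."""
    hidden = torch.cat([q_hidden, d_hidden], dim=1)
    mask = torch.cat([q_mask, d_mask], dim=1)
    ext = model.get_extended_attention_mask(mask, hidden.shape[:2])
    for layer in model.encoder.layer[SPLIT_LAYER:]:
        hidden = layer(hidden, attention_mask=ext)[0]
    return hidden[:, 0]  # a ranking head would map this to a score

# Indexing time: document states are computed once, without any query;
# this is what would be (compressed and) stored.
d_hidden, d_mask = encode_partial("an example web document ...", SPLIT_LAYER)

# Query time: only the query runs through the lower layers.
q_hidden, q_mask = encode_partial("example query", SPLIT_LAYER)
cls_vec = merge_and_finish(q_hidden, q_mask, d_hidden, d_mask)
```

The compression step mentioned in the abstract would sit between indexing and storage: for instance, a learned linear down-projection of d_hidden (with a matching up-projection applied before merge_and_finish), trained so that the attention scores produced in the upper layers match those computed from the uncompressed states.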
Istituto di Scienza e Tecnologie dell'Informazione "Alessandro Faedo" - ISTI
ISBN: 9781450380164
Keywords: Pre-trained transformer networks; Ranking efficiency; Contextualized language models; Neural ranking
Files in this product:

prod_440216-doc_157961.pdf (open access)
  Description: preprint
  Type: Pre-print document
  License: No license declared (not attributable to products after 2023)
  Size: 757.08 kB, Adobe PDF

prod_440216-doc_158107.pdf (not available)
  Description: Efficient Document Re-Ranking for Transformers by Precomputing Term Representations
  Type: Publisher's version (PDF)
  License: NOT PUBLIC - Private/restricted access
  Size: 1.16 MB, Adobe PDF

Perego_Efficient document re-ranking_AAM.pdf (open access)
  Description: Efficient document re-ranking for transformers by precomputing term representations
  Type: Post-print document
  License: No license declared (not attributable to products after 2023)
  Size: 757.08 kB, Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/420621
Citations
  • PubMed Central: not available
  • Scopus: 88
  • Web of Science: 67