Efficient document re-ranking for transformers by precomputing term representations

Nardini F.M.; Perego R.
2020

Abstract

Deep pretrained transformer networks are effective at various ranking tasks, such as question answering and ad-hoc document ranking. However, their high computational cost makes them prohibitively expensive to deploy in practice. Our proposed approach, called PreTTR (Precomputing Transformer Term Representations), considerably reduces the query-time latency of deep transformer networks (up to a 42x speedup on web document ranking), making these networks more practical to use in a real-time ranking scenario. Specifically, we precompute part of the document term representations at indexing time (without a query), and merge them with the query representation at query time to compute the final ranking score. Because the token representations are large, we also propose an effective approach to reducing the storage requirement: we train a compression layer to match attention scores. Our compression technique reduces the required storage by up to 95% and can be applied without substantially degrading ranking performance.
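
The abstract describes a split transformer: the lower layers process the query and the document independently, so the document half can be computed once at indexing time, and only the upper layers see the joined sequence at query time. Below is a minimal sketch of that idea in Python, assuming a HuggingFace BERT encoder; the split point SPLIT_LAYER, the helper names, and the simplified position/segment handling are illustrative assumptions, not the authors' exact implementation.

```python
# Minimal PreTTR-style sketch, assuming a HuggingFace BERT encoder.
# SPLIT_LAYER, the function names, and the simplified position/segment
# handling are illustrative assumptions, not the paper's exact code.
import torch
from transformers import BertModel, BertTokenizerFast

SPLIT_LAYER = 6  # layers 0..5 run per-text; layers 6..11 see query + document

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased").eval()

@torch.no_grad()
def encode_partial(text, n_layers):
    """Run `text` through the embeddings and the first `n_layers`
    transformer layers; return hidden states and the attention mask."""
    enc = tokenizer(text, return_tensors="pt")
    ext = model.get_extended_attention_mask(enc["attention_mask"],
                                            enc["input_ids"].shape)
    hidden = model.embeddings(input_ids=enc["input_ids"])
    for layer in model.encoder.layer[:n_layers]:
        hidden = layer(hidden, attention_mask=ext)[0]
    return hidden, enc["attention_mask"]

@torch.no_grad()
def merge_and_finish(q_hidden, q_mask, d_hidden, d_mask):
    """Concatenate fresh query states with precomputed document states
    and run the remaining layers; return the final [CLS] vector."""
    hidden = torch.cat([q_hidden, d_hidden], dim=1)
    mask = torch.cat([q_mask, d_mask], dim=1)
    ext = model.get_extended_attention_mask(mask, hidden.shape[:2])
    for layer in model.encoder.layer[SPLIT_LAYER:]:
        hidden = layer(hidden, attention_mask=ext)[0]
    return hidden[:, 0]  # a ranking head would map this to a score

# Indexing time: document states are computed once, without any query;
# this is what would be (compressed and) stored.
d_hidden, d_mask = encode_partial("an example web document ...", SPLIT_LAYER)

# Query time: only the query runs through the lower layers.
q_hidden, q_mask = encode_partial("example query", SPLIT_LAYER)
cls_vec = merge_and_finish(q_hidden, q_mask, d_hidden, d_mask)
```

The compression step mentioned in the abstract would sit between indexing and storage: for instance, a learned linear down-projection of d_hidden (with a matching up-projection applied before merge_and_finish), trained so that the attention scores produced in the upper layers match those computed from the uncompressed states.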
Istituto di Scienza e Tecnologie dell'Informazione "Alessandro Faedo" - ISTI
ISBN: 9781450380164
Keywords: Pre-trained transformer networks; Ranking efficiency; Contextualized language models; Neural ranking
Files in this product:

prod_440216-doc_157961.pdf (open access)
  Description: preprint
  Type: Pre-print document
  License: No license declared (not attributable to products after 2023)
  Size: 757.08 kB, Adobe PDF

prod_440216-doc_158107.pdf (not available)
  Description: Efficient Document Re-Ranking for Transformers by Precomputing Term Representations
  Type: Publisher's version (PDF)
  License: NOT PUBLIC - Private/restricted access
  Size: 1.16 MB, Adobe PDF

Perego_Efficient document re-ranking_AAM.pdf (open access)
  Description: Efficient document re-ranking for transformers by precomputing term representations
  Type: Post-print document
  License: No license declared (not attributable to products after 2023)
  Size: 757.08 kB, Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/420621
Citations
  • PubMed Central: not available
  • Scopus: 88
  • Web of Science: 67