Efficient multi-vector dense retrieval with bit vectors
Nardini F. M.; Rulli C.; Venturini R.
2024
Abstract
Dense retrieval techniques employ pre-trained large language models to build high-dimensional representations of queries and passages. These representations are used to compute the relevance of a passage to a query with efficient similarity measures. Multi-vector representations, which encode queries and documents at the token level, improve effectiveness but increase memory footprint and query latency by one order of magnitude. Recently, PLAID addressed these challenges by introducing a centroid-based term representation that reduces the memory impact of multi-vector systems. By exploiting a centroid interaction mechanism, PLAID filters out non-relevant documents, reducing the cost of the subsequent ranking stages. This paper proposes "Efficient Multi-Vector Dense Retrieval with Bit Vectors" (EMVB), a novel framework for efficient query processing in multi-vector dense retrieval. First, EMVB employs a highly efficient passage pre-filtering step based on optimized bit vectors. Second, the centroid interaction is computed column-wise, exploiting SIMD instructions to reduce latency. Third, EMVB uses Product Quantization (PQ) to reduce the memory footprint of the stored vector representations while still allowing fast late interaction. Finally, we introduce a per-document term filtering method that further improves the efficiency of the final step. Experiments on MS MARCO and LoTTE show that, compared to PLAID, EMVB is up to 2.8× faster and reduces the memory footprint by 1.8× with no loss in retrieval accuracy.
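To make the first step more concrete, the following is a minimal pure-Python sketch of bit-vector pre-filtering. It is an illustration under stated assumptions, not the authors' implementation: it assumes the query-term/centroid similarity scores are already computed, that each passage is represented by the ids of the centroids assigned to its terms, and that a passage's pre-filter score is the number of query terms having at least one centroid in the passage whose score exceeds a threshold `th`. The function names, the threshold and the toy data are hypothetical. A Python `int` stands in for a bit vector with one bit per query term; an optimized implementation would more likely use fixed-width machine words (e.g. one 32-bit word per centroid when the query has at most 32 terms) together with SIMD instructions.

```python
from typing import List, Sequence

# Hypothetical sketch: names, threshold and data are illustrative,
# not EMVB's actual code.


def centroid_bit_vectors(scores: Sequence[Sequence[float]], th: float) -> List[int]:
    """Build one bit vector per centroid.

    scores[i][j] is the precomputed similarity between query term i and
    centroid j. Bit i of the j-th integer is set iff scores[i][j] > th,
    i.e. centroid j is "close" to query term i (assumed criterion).
    """
    n_centroids = len(scores[0]) if scores else 0
    bits = [0] * n_centroids
    for i, row in enumerate(scores):
        for j, s in enumerate(row):
            if s > th:
                bits[j] |= 1 << i  # mark query term i for centroid j
    return bits


def prefilter_score(passage_centroids: Sequence[int], bits: Sequence[int]) -> int:
    """Count the query terms with at least one close centroid in the passage.

    OR-ing the bit vectors of the passage's centroids gives the set of
    covered query terms; its popcount is the score.
    """
    covered = 0
    for c in passage_centroids:
        covered |= bits[c]
    return bin(covered).count("1")  # popcount


def prefilter(passages: Sequence[Sequence[int]], bits: Sequence[int], keep: int) -> List[int]:
    """Return the ids of the `keep` passages with the highest pre-filter score.

    Ties are broken by passage id so the output is deterministic.
    """
    ranked = sorted(
        range(len(passages)),
        key=lambda pid: (-prefilter_score(passages[pid], bits), pid),
    )
    return ranked[:keep]


if __name__ == "__main__":
    # Toy data: 3 query terms x 4 centroids (hypothetical values).
    scores = [
        [0.9, 0.1, 0.2, 0.7],  # query term 0
        [0.2, 0.8, 0.1, 0.3],  # query term 1
        [0.1, 0.2, 0.6, 0.1],  # query term 2
    ]
    bits = centroid_bit_vectors(scores, th=0.5)
    # Each passage is the list of centroid ids assigned to its terms.
    passages = [[0, 3], [1, 2], [0, 1, 2], [3]]
    print(bits)                                # [1, 2, 4, 1]
    print(prefilter(passages, bits, keep=2))   # [2, 1]
```

In this sketch, building the bit vectors costs one pass over the query-by-centroid score matrix; after that, scoring a passage needs only OR and popcount operations over its centroid ids, with no floating-point work. Only the passages kept by `prefilter` would then move on to the more expensive centroid interaction and late-interaction stages.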
File | Description | Type | Access | License | Size | Format
---|---|---|---|---|---|---
ECIR24.pdf | Efficient Multi-vector Dense Retrieval with Bit Vectors | Pre-print | Open access | Creative Commons | 643.43 kB | Adobe PDF
Nardini-Rulli-Venturini-LNCS-2024.pdf | Efficient Multi-vector Dense Retrieval with Bit Vectors | Publisher's version (PDF) | Authorized users only | Not public (private/restricted access) | 438.49 kB | Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.