Efficient multi-vector dense retrieval with bit vectors

Nardini, F. M.; Rulli, C.; Venturini, R.

doi:10.1007/978-3-031-56060-6_1

Dense retrieval techniques employ pre-trained large language models to build high-dimensional representations of queries and passages. These representations are used to compute the relevance of a passage with respect to a query through efficient similarity measures. Multi-vector representations show improved effectiveness, but because they encode queries and documents at the token level, they come with a one-order-of-magnitude increase in memory footprint and query latency. Recently, PLAID addressed these challenges by introducing a centroid-based term representation to reduce the memory impact of multi-vector systems. By exploiting a centroid interaction mechanism, PLAID filters out non-relevant documents, reducing the cost of subsequent ranking stages. This paper proposes "Efficient Multi-Vector Dense Retrieval with Bit Vectors" (EMVB), a novel framework for efficient query processing in multi-vector dense retrieval. First, EMVB employs a highly efficient pre-filtering step of passages using optimized bit vectors. Second, the computation of the centroid interaction happens column-wise, leveraging SIMD instructions to reduce latency. Third, EMVB uses Product Quantization (PQ) to reduce the memory footprint of storing vector representations while allowing for fast late interaction. Finally, we introduce a per-document term filtering method that further improves the efficiency of the final step. Experiments on MS MARCO and LoTTE demonstrate that EMVB is up to 2.8× faster and reduces the memory footprint by 1.8× without any loss in retrieval accuracy compared to PLAID.
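To make the pre-filtering and late-interaction ideas in the abstract more concrete, the following is a minimal sketch in plain Python. It is not the authors' implementation: the function names (`centroid_masks`, `prefilter_score`, `maxsim`), the threshold value, and the toy data are all assumptions chosen for illustration. The sketch assumes that each passage token is assigned to a centroid, that a centroid counts as "close" to a query term when its similarity exceeds a threshold, and that late interaction is the usual sum-of-maxima over dot products.

```python
from typing import List, Sequence


def centroid_masks(scores: Sequence[Sequence[float]], threshold: float) -> List[int]:
    """Build one bit vector (stored as an int) per centroid.

    scores[i][c] is the similarity between query term i and centroid c.
    Bit i of masks[c] is set when centroid c is close to query term i.
    (Hypothetical helper; the threshold is an assumed parameter.)
    """
    n_terms = len(scores)
    n_centroids = len(scores[0])
    masks = [0] * n_centroids
    for i in range(n_terms):
        for c in range(n_centroids):
            if scores[i][c] > threshold:
                masks[c] |= 1 << i
    return masks


def prefilter_score(passage_centroids: Sequence[int], masks: Sequence[int]) -> int:
    """Count the query terms that have at least one close centroid in the passage."""
    covered = 0
    for c in passage_centroids:
        covered |= masks[c]  # OR together the bit vectors of the passage's centroids
    return bin(covered).count("1")  # population count


def maxsim(query: Sequence[Sequence[float]], passage: Sequence[Sequence[float]]) -> float:
    """Late interaction: for each query vector take the best dot product
    over the passage vectors, then sum over query vectors."""
    return sum(
        max(sum(a * b for a, b in zip(q, p)) for p in passage)
        for q in query
    )


# Toy data (assumed): 2 query terms, 3 centroids.
toy_scores = [[0.9, 0.1, 0.2],
              [0.3, 0.8, 0.1]]
toy_masks = centroid_masks(toy_scores, threshold=0.5)

# Each passage is represented by the centroid ids of its tokens.
toy_passages = {"p1": [0, 2], "p2": [0, 1], "p3": [2]}
survivors = [pid for pid, cents in toy_passages.items()
             if prefilter_score(cents, toy_masks) >= 1]

# Full-precision scoring would then run only on the survivors.
toy_late_score = maxsim([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.5, 0.5]])
```

In this sketch the pre-filter needs only bitwise ORs and a population count per passage, which is why such a step can be cheap compared with computing the full late-interaction score for every candidate.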

2024

Year: 2024
Organizational units: Istituto di Scienza e Tecnologie dell'Informazione "Alessandro Faedo" - ISTI
ISBN: 9783031560590; 9783031560606
Keywords: Efficiency, Retrieval, Embeddings
Publication type: 04.01 Contribution in conference proceedings

Files in this record:

File	Description	Access	Type	License	Size	Format
ECIR24.pdf	Efficient Multi-vector Dense Retrieval with Bit Vectors	Open access	Pre-print	Creative Commons	643.43 kB	Adobe PDF
Nardini-Rulli-Venturini-LNCS-2024.pdf	Efficient Multi-vector Dense Retrieval with Bit Vectors	Authorized users only	Publisher's version (PDF)	Not public (private/restricted access)	438.49 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/525274


CNR Institutional Research Information System
