Dense Information Retrieval (IR) systems rely on neural networks to embed documents and queries within a latent low-dimensional space. Among the Dense IR approaches, bi-encoders are particularly popular, as they achieve state-of-the-art performance and allow for efficient encoding of documents and queries. Nevertheless, using this class of systems, by construction, all the documents and queries are represented using the same set of dimensions. In this article, we introduce the Manifold Clustering (MC) hypothesis which states that, for each query, there exists a query-dependent manifold of the original embedding space where the query and documents relevant to it cluster more effectively. We empirically validate the MC hypothesis showing that it is possible to find a query-dependent linear subspace of the original embedding space where high retrieval effectiveness is achieved.
Getting off the DIME: dimension pruning via dimension importance estimation for dense information retrieval
Perego Raffaele;
2026
Abstract
Dense Information Retrieval (IR) systems rely on neural networks to embed documents and queries within a latent low-dimensional space. Among the Dense IR approaches, bi-encoders are particularly popular, as they achieve state-of-the-art performance and allow for efficient encoding of documents and queries. Nevertheless, using this class of systems, by construction, all the documents and queries are represented using the same set of dimensions. In this article, we introduce the Manifold Clustering (MC) hypothesis which states that, for each query, there exists a query-dependent manifold of the original embedding space where the query and documents relevant to it cluster more effectively. We empirically validate the MC hypothesis showing that it is possible to find a query-dependent linear subspace of the original embedding space where high retrieval effectiveness is achieved.| File | Dimensione | Formato | |
|---|---|---|---|
|
Perego et al_DIME_2026.pdf
accesso aperto
Descrizione: Getting off the DIME: dimension pruning via dimension importance estimation for dense information retrieval
Tipologia:
Versione Editoriale (PDF)
Licenza:
Creative commons
Dimensione
937.38 kB
Formato
Adobe PDF
|
937.38 kB | Adobe PDF | Visualizza/Apri |
I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.


