Neural models have transformed Information Retrieval (IR) by enabling semantic search, representing queries and documents as dense embeddings in latent spaces. However, recent works indicate the contribution of single dimensions in these representations to ranking quality is uneven: some dimensions are essential, while others may even degrade performance. Dimension IMportance Estimators (DIMEs) are heuristics to guide the search for the subsets of dimensions that induce an optimal subspace where retrieval is more effective. To explore these subspaces, DIMEs rely on two simplifying assumptions: the linearity of subspaces and the independence of dimensions. In this paper, we move a step forward by relaxing the independence assumption and employing genetic algorithms to select the optimal set of dimensions. We show that selecting optimal dimensions for individual queries can achieve up to 0.981 nDCG@10 and 0.831 AP using state-of-the-art dense retrieval models on the considered datasets. Additionally, we identify subsets of dimensions that improve ranking quality across multiple queries simultaneously. Finally, we show that a dataset-specific subset of dimensions enables dense retrieval models to generalize across other datasets without loss of performance.
When reducing representations improves performance
Perego RaffaeleMembro del Collaboration Group
;
2026
Abstract
Neural models have transformed Information Retrieval (IR) by enabling semantic search, representing queries and documents as dense embeddings in latent spaces. However, recent works indicate the contribution of single dimensions in these representations to ranking quality is uneven: some dimensions are essential, while others may even degrade performance. Dimension IMportance Estimators (DIMEs) are heuristics to guide the search for the subsets of dimensions that induce an optimal subspace where retrieval is more effective. To explore these subspaces, DIMEs rely on two simplifying assumptions: the linearity of subspaces and the independence of dimensions. In this paper, we move a step forward by relaxing the independence assumption and employing genetic algorithms to select the optimal set of dimensions. We show that selecting optimal dimensions for individual queries can achieve up to 0.981 nDCG@10 and 0.831 AP using state-of-the-art dense retrieval models on the considered datasets. Additionally, we identify subsets of dimensions that improve ranking quality across multiple queries simultaneously. Finally, we show that a dataset-specific subset of dimensions enables dense retrieval models to generalize across other datasets without loss of performance.| File | Dimensione | Formato | |
|---|---|---|---|
|
978-3-032-21289-4_30.pdf
solo utenti autorizzati
Descrizione: When Reducing Representations Improves Performance
Tipologia:
Versione Editoriale (PDF)
Licenza:
NON PUBBLICO - Accesso privato/ristretto
Dimensione
568.71 kB
Formato
Adobe PDF
|
568.71 kB | Adobe PDF | Visualizza/Apri Richiedi una copia |
|
Perego et al_ECIR 2026_WRRIP_preprint.pdf
accesso aperto
Descrizione: When Reducing Representations Improves Performance
Tipologia:
Documento in Pre-print
Licenza:
Altro tipo di licenza
Dimensione
286.68 kB
Formato
Adobe PDF
|
286.68 kB | Adobe PDF | Visualizza/Apri |
I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.


