When Reducing Representations Improves Performance

Perego R.
Member of the Collaboration Group
;
Tonellotto N.
2026

Abstract

Neural models have transformed Information Retrieval (IR) by enabling semantic search, in which queries and documents are represented as dense embeddings in a latent space. However, recent work indicates that individual dimensions of these representations contribute unevenly to ranking quality: some dimensions are essential, while others can even degrade performance. Dimension IMportance Estimators (DIMEs) are heuristics that guide the search for the subset of dimensions inducing a subspace in which retrieval is more effective. To explore these subspaces, DIMEs rely on two simplifying assumptions: the linearity of subspaces and the independence of dimensions. In this paper, we go a step further by relaxing the independence assumption and employing genetic algorithms to select the optimal set of dimensions. We show that selecting optimal dimensions for individual queries can achieve up to 0.981 nDCG@10 and 0.831 AP using state-of-the-art dense retrieval models on the considered datasets. Additionally, we identify subsets of dimensions that improve ranking quality across multiple queries simultaneously. Finally, we show that a dataset-specific subset of dimensions enables dense retrieval models to generalize to other datasets without loss of performance.
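The idea of searching for a subset of dimensions with a genetic algorithm can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`masked_dot`, `ndcg_at_k`, `select_dimensions`), the binary-relevance nDCG@10 fitness, the one-point crossover, the bit-flip mutation, and all hyperparameters are assumptions chosen only to keep the example short. A candidate solution is a binary mask over the embedding dimensions, and documents are scored with the inner product restricted to the retained dimensions.

```python
import math
import random


def masked_dot(q, d, mask):
    """Inner product restricted to the dimensions where mask[i] == 1."""
    return sum(qi * di for qi, di, m in zip(q, d, mask) if m)


def ndcg_at_k(q, docs, relevant, mask, k=10):
    """Binary-relevance nDCG@k of the ranking induced by the masked inner product."""
    ranking = sorted(range(len(docs)),
                     key=lambda i: masked_dot(q, docs[i], mask),
                     reverse=True)
    dcg = sum(1.0 / math.log2(r + 2)
              for r, i in enumerate(ranking[:k]) if i in relevant)
    ideal = sum(1.0 / math.log2(r + 2) for r in range(min(k, len(relevant))))
    return dcg / ideal if ideal > 0 else 0.0


def select_dimensions(q, docs, relevant, generations=30, pop_size=20,
                      mutation_rate=0.05, seed=0):
    """Toy genetic search over binary dimension masks for a single query.

    Hypothetical helper: the operators and hyperparameters are illustrative
    assumptions, not taken from the paper.
    """
    rng = random.Random(seed)
    dim = len(q)

    def fitness(mask):
        return ndcg_at_k(q, docs, relevant, mask)

    # Include the full mask so the result is never worse than the baseline.
    population = [[1] * dim] + [
        [rng.randint(0, 1) for _ in range(dim)] for _ in range(pop_size - 1)
    ]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]  # elitist truncation selection
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, dim)  # one-point crossover
            child = a[:cut] + b[cut:]
            # Bit-flip mutation: each gene flips with probability mutation_rate.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        population = parents + children
    best = max(population, key=fitness)
    return best, fitness(best)
```

Calling `select_dimensions(q, docs, relevant)` with a query vector, a list of document vectors, and a set of relevant document indices returns the best mask found and its nDCG@10. Because the fitness of a mask is evaluated on the whole ranking rather than dimension by dimension, the search does not assume that dimensions contribute independently, which is the assumption the abstract says is relaxed.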
Istituto di Scienza e Tecnologie dell'Informazione "Alessandro Faedo" - ISTI
Dense Representations
Effectiveness
Genetic Algorithms
Information Retrieval
Optimization
Ranking

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/583863