The identification of relevance with little textual context is a primary challenge in passage retrieval. We address this problem with a representation-based ranking approach that: (1) explicitly models the importance of each term using a contextualized language model; (2) performs passage expansion by propagating the importance to similar terms; and (3) grounds the representations in the lexicon, making them interpretable. Passage representations can be pre-computed at index time to reduce query-time latency. We call our approach EPIC (Expansion via Prediction of Importance with Contextualization). We show that EPIC significantly outperforms prior importance-modeling and document expansion approaches. We also observe that the performance is additive with the current leading first-stage retrieval methods, further narrowing the gap between inexpensive and cost-prohibitive passage ranking approaches. Specifically, EPIC achieves a MRR@10 of 0.304 on the MS-MARCO passage ranking dataset with 78ms average query latency on commodity hardware. We also find that the latency is further reduced to 68ms by pruning document representations, with virtually no difference in effectiveness.

Expansion via prediction of importance with contextualization

Nardini FM;Perego R;
2020

Abstract

The identification of relevance with little textual context is a primary challenge in passage retrieval. We address this problem with a representation-based ranking approach that: (1) explicitly models the importance of each term using a contextualized language model; (2) performs passage expansion by propagating the importance to similar terms; and (3) grounds the representations in the lexicon, making them interpretable. Passage representations can be pre-computed at index time to reduce query-time latency. We call our approach EPIC (Expansion via Prediction of Importance with Contextualization). We show that EPIC significantly outperforms prior importance-modeling and document expansion approaches. We also observe that the performance is additive with the current leading first-stage retrieval methods, further narrowing the gap between inexpensive and cost-prohibitive passage ranking approaches. Specifically, EPIC achieves a MRR@10 of 0.304 on the MS-MARCO passage ranking dataset with 78ms average query latency on commodity hardware. We also find that the latency is further reduced to 68ms by pruning document representations, with virtually no difference in effectiveness.
2020
Istituto di Scienza e Tecnologie dell'Informazione "Alessandro Faedo" - ISTI
Inglese
43rd International ACM SIGIR Conference on Research and Development in Information Retrieval
1573
1576
9781450380164
https://doi.org/10.1145/3397271.3401262
Sì, ma tipo non specificato
25-30 July, 2020
online
Document representation
Query representation
Neural ranking
Efficient ranking
6
partially_open
Macavaney, S; Nardini, Fm; Perego, R; Tonellotto, N; Goharian, N; Frieder, O
273
info:eu-repo/semantics/conferenceObject
04 Contributo in convegno::04.01 Contributo in Atti di convegno
   Big Data to Enable Global Disruption of the Grapevine-powered Industries
   BigDataGrapes
   H2020
   780751
File in questo prodotto:
File Dimensione Formato  
prod_440218-doc_157963.pdf

accesso aperto

Descrizione: preprint
Tipologia: Versione Editoriale (PDF)
Dimensione 1.06 MB
Formato Adobe PDF
1.06 MB Adobe PDF Visualizza/Apri
prod_440218-doc_158108.pdf

non disponibili

Descrizione: Expansion via prediction of importance with contextualization
Tipologia: Versione Editoriale (PDF)
Dimensione 1.26 MB
Formato Adobe PDF
1.26 MB Adobe PDF   Visualizza/Apri   Richiedi una copia

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/20.500.14243/420623
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 86
  • ???jsp.display-item.citation.isi??? 67
social impact