We observe that most relevant terms in unstructured news articles are primarily concentrated towards the beginning and the end of the document. Exploiting this observation, we propose a novel version of the classical BM25 weighting model, called BM25 Passage (BM25P), which scores query results by computing a linear combination of term statistics in the different portions of news articles. Our experimentation, conducted using three publicly available news datasets, demonstrates that BM25P markedly outperforms BM25 in term of effectiveness by up to 17.44% in NDCG@5 and 85% in NDCG@1.
Enhanced news retrieval: passages lead the way!
Catena M;Nardini FM;Perego R;Muntean CI;Tonellotto N
2019
Abstract
We observe that most relevant terms in unstructured news articles are primarily concentrated towards the beginning and the end of the document. Exploiting this observation, we propose a novel version of the classical BM25 weighting model, called BM25 Passage (BM25P), which scores query results by computing a linear combination of term statistics in the different portions of news articles. Our experimentation, conducted using three publicly available news datasets, demonstrates that BM25P markedly outperforms BM25 in term of effectiveness by up to 17.44% in NDCG@5 and 85% in NDCG@1.File in questo prodotto:
File | Dimensione | Formato | |
---|---|---|---|
prod_415603-doc_146377.pdf
non disponibili
Descrizione: Enhanced News Retrieval: Passages Lead the Way
Tipologia:
Versione Editoriale (PDF)
Dimensione
1.14 MB
Formato
Adobe PDF
|
1.14 MB | Adobe PDF | Visualizza/Apri Richiedi una copia |
I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.