Modern search engines face enormous performance challenges. The most popular ones process tens of thousands of queries per second and manage billions of documents. Besides being efficient in processing the queries, they have also to be effective in satisfying the information needs of the users. However, techniques that improve the quality of the results can also require longer processing time. Search engines have thus often to attain a compromise between effectiveness and efficiency when employing new query processing solutions. In this thesis, we propose three new solutions for improving the efficiency-effectiveness trade-offs in several different tasks of modern query processors, namely: query expansion, search results filtering, and top-1 retrieval. Query expansion is the task of augmenting the user query with additional terms so to overcome some of the difficulties arising from the natural language, such as synonymy and polysemy. To this regard, we propose a thesaurus-based query expansion framework relying on structured Conjunctive Normal Form queries. We also propose three novel term selection supervised models capturing efficiency and effectiveness of the expansion candidates to include into the expanded query. Search results filtering refers to the task of discarding some search results from an attribute-sorted list, e.g., by recency or by price, to improve the effectiveness of the returned results while preserving the ordering. To this end, we propose one efficient optimal algorithm and one faster approximate algorithm with strong approximation guarantees, which enable filtering in scenarios with tight time constraints. Top-1 retrieval regards the identification of the utmost relevant result. It is critical in many applications, such as conversational AI and question answering, returning only one utmost relevant result to each user query. To this regard, we propose an efficient algorithm for finding the result winning the highest number of pairwise comparisons, given by a pairwise machine learning classifier, while performing the minimum number of comparisons needed to solve this problem. In this thesis, we show that by trading efficiency for effectiveness it is possible to achieve a huge improvement of efficiency on these tasks at the cost of a negligible loss of effectiveness. We also demonstrate that new efficient algorithms can enable effective yet inefficient techniques to be efficiently used, thus providing new appealing solutions in the efficiency-effectiveness trade-off space.

Efficiency-Effectiveness Trade-offs in Modern Query Processing / Trani R.. - (04/03/2020).

Efficiency-Effectiveness Trade-offs in Modern Query Processing

Trani R
2020

Abstract

Modern search engines face enormous performance challenges. The most popular ones process tens of thousands of queries per second and manage billions of documents. Besides being efficient in processing the queries, they have also to be effective in satisfying the information needs of the users. However, techniques that improve the quality of the results can also require longer processing time. Search engines have thus often to attain a compromise between effectiveness and efficiency when employing new query processing solutions. In this thesis, we propose three new solutions for improving the efficiency-effectiveness trade-offs in several different tasks of modern query processors, namely: query expansion, search results filtering, and top-1 retrieval. Query expansion is the task of augmenting the user query with additional terms so to overcome some of the difficulties arising from the natural language, such as synonymy and polysemy. To this regard, we propose a thesaurus-based query expansion framework relying on structured Conjunctive Normal Form queries. We also propose three novel term selection supervised models capturing efficiency and effectiveness of the expansion candidates to include into the expanded query. Search results filtering refers to the task of discarding some search results from an attribute-sorted list, e.g., by recency or by price, to improve the effectiveness of the returned results while preserving the ordering. To this end, we propose one efficient optimal algorithm and one faster approximate algorithm with strong approximation guarantees, which enable filtering in scenarios with tight time constraints. Top-1 retrieval regards the identification of the utmost relevant result. It is critical in many applications, such as conversational AI and question answering, returning only one utmost relevant result to each user query. To this regard, we propose an efficient algorithm for finding the result winning the highest number of pairwise comparisons, given by a pairwise machine learning classifier, while performing the minimum number of comparisons needed to solve this problem. In this thesis, we show that by trading efficiency for effectiveness it is possible to achieve a huge improvement of efficiency on these tasks at the cost of a negligible loss of effectiveness. We also demonstrate that new efficient algorithms can enable effective yet inefficient techniques to be efficiently used, thus providing new appealing solutions in the efficiency-effectiveness trade-off space.
4
Istituto di Scienza e Tecnologie dell'Informazione "Alessandro Faedo" - ISTI
Dottorato
tournament graph
top-1 retrieval
relevance-aware filtering
search results filtering
query reformulation
query expansion
efficiency-effectiveness trade-offs
Prof. Rossano Venturini, Dott. Franco Maria Nardini
File in questo prodotto:
File Dimensione Formato  
prod_425139-doc_151747.pdf

non disponibili

Descrizione: Efficiency-Effectiveness Trade-offs in Modern Query Processing
Dimensione 3.36 MB
Formato Adobe PDF
3.36 MB Adobe PDF   Visualizza/Apri   Richiedi una copia

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/20.500.14243/406387
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus ND
  • ???jsp.display-item.citation.isi??? ND
social impact