Mendeley Recommender System: Evaluation Measures
Dazzi P; Mordacchini M
2010
Abstract
This document reports on a set of widely used evaluation measures for recommender systems. The measures described here are well known to the recommender systems scientific community. In particular, the first four measures reported, Precision, Recall, MAE, and ROC, assume that it is known in advance which results are "good" and which are "bad", without examining the actual content of the objects involved. These measures are useful (and widely used) for evaluating recommender systems that try to mimic user behavior by estimating or predicting a user's future choices. With them, we can only give a binary evaluation of recommendations (good/not good), without knowing the real quality of either the recommended or the non-recommended results. To overcome this limitation, a number of other approaches have been proposed. Some provide a quality measure based on the content of the objects, comparing the content of recommended items against a representation of the desired content (e.g. a query, the keywords of a paper, etc.). Another approach uses the context of objects: assuming that similar objects are related to other similar objects, it exploits object-to-object relationships to evaluate their similarity. Finally, user reactions can be studied directly. In this case, recommendations are presented to a set of selected users, who evaluate them directly, e.g. by rating them, clicking the related links, or opening the proposed papers. In the following, all these types of metrics are briefly introduced, and their respective strengths and weaknesses are discussed.
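As an illustration (not taken from the report itself), the binary-relevance measures named in the abstract can be computed as follows. This is a minimal sketch; the item identifiers and ratings are hypothetical.

```python
def precision_recall(recommended, relevant):
    """Precision and Recall for one user, given binary relevance judgments."""
    recommended, relevant = set(recommended), set(relevant)
    hits = len(recommended & relevant)  # recommended items the user actually liked
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

def mae(predicted, observed):
    """Mean Absolute Error between predicted and observed ratings."""
    assert len(predicted) == len(observed)
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(observed)

# Hypothetical example: 5 recommended papers, 4 of which are among
# the 5 papers the user actually judged relevant.
p, r = precision_recall(recommended=["p1", "p2", "p3", "p4", "p5"],
                        relevant=["p1", "p2", "p4", "p5", "p9"])
print(f"Precision = {p:.2f}, Recall = {r:.2f}")              # Precision = 0.80, Recall = 0.80
print(f"MAE = {mae([4.5, 3.0, 2.0], [5.0, 3.0, 1.0]):.2f}")  # MAE = 0.50
```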
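Similarly, the content-based approach mentioned above can be sketched as a similarity score between a recommended item and the desired content representative. The cosine measure over keyword bags below is one common choice, used here purely for illustration; the query and keywords are hypothetical.

```python
from collections import Counter
from math import sqrt

def cosine_similarity(keywords_a, keywords_b):
    """Cosine similarity between two bags of keywords (0.0 .. 1.0)."""
    a, b = Counter(keywords_a), Counter(keywords_b)
    dot = sum(a[k] * b[k] for k in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Hypothetical query and recommended-paper keywords.
query = ["recommender", "evaluation", "precision"]
paper = ["recommender", "systems", "evaluation", "metrics"]
print(f"Content score = {cosine_similarity(query, paper):.2f}")  # ~0.58
```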
| File | Description | Access | Size | Format |
|---|---|---|---|---|
| prod_161226-doc_132557.pdf | Mendeley Recommender System: Evaluation Measures | Authorized users only | 196.09 kB | Adobe PDF |
Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.


