Recommender Systems Evaluation
2018 (English) In: Encyclopedia of Social Network Analysis and Mining / [ed] Reda Alhajj, Jon Rokne, Springer, 2018, 2. Chapter in book (Refereed)
Place, publisher, year, edition, pages
Springer, 2018, 2.
National Category
Other Computer and Information Science
Research subject
Skövde Artificial Intelligence Lab (SAIL); INF301 Data Science
Identifiers
URN: urn:nbn:se:his:diva-15039
DOI: 10.1007/978-1-4939-7131-2_110162
ISBN: 978-1-4939-7130-5 (print)
ISBN: 978-1-4939-7131-2 (electronic)
ISBN: 978-1-4939-7132-9
OAI: oai:DiVA.org:his-15039
DiVA, id: diva2:1197264
Note
The evaluation of RSs has been, and still is, the object of active research in the field. Since the advent of the first RS, recommendation performance has usually been equated with the accuracy of rating prediction: estimated ratings are compared against actual ratings, and the differences between them are computed by means of the Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE) metrics. In terms of the effective utility of recommendations for users, there is, however, an increasing realization that the quality (precision) of a ranking of recommended items can be more important than the accuracy in predicting specific rating values. As a result, precision-oriented metrics are being increasingly considered in the field, and a large amount of recent work has focused on evaluating top-N ranked recommendation lists with this type of metric. Besides that, dimensions other than accuracy – such as coverage, diversity, novelty, and serendipity – have recently been taken into account and analyzed when considering what makes a good recommendation (Said et al, 2014b; Cremonesi et al, 2011; McNee et al, 2006; Bellogín and de Vries, 2013; Bollen et al, 2010).

So, what makes a good evaluation? The realization that high prediction accuracy might not translate into higher perceived performance from the users has brought a plethora of novel metrics and methods focusing on other aspects of recommendation (Said et al, 2013a; Castells et al, 2015; Vargas and Castells, 2014). Recent trends in evaluation methodologies point towards a shift away from traditional methods based solely on statistical analyses of static data, i.e., raising the precision of algorithms on offline data (Ekstrand et al, 2011b) – offline data in this case being recorded user interactions such as movie ratings or product purchases.

Evaluation is the key to identifying how well an algorithm or a system works. Deploying a new algorithm in a system will have an effect on the overall performance of that system, in terms of accuracy as well as other types of metrics. Both prior to deploying the algorithm and after deployment, it is important to evaluate system performance. It is in the evaluation of an RS that one needs to decide what should be sought for, e.g., depending on whether the evaluation is to be performed from the users' perspective (accuracy, serendipity, novelty), the vendor's perspective (catalog, profit, churn), or even from the technical perspective of the system running the RS (CPU load, training time, adaptability). Given the context of the system, there might be other perspectives as well; in summary, what is important is to define the Key Performance Indicator (KPI) that one wants to measure.

Let us imagine an online marketplace where customers buy various goods. An improved recommendation algorithm could result in, e.g., an increased number of goods sold, more expensive goods being sold, more goods sold from a specific section of the catalog, customers returning to the marketplace more often, etc. When evaluating a system like this, one needs to decide on what is to be evaluated – what the sought-for quality is – and how it is going to be measured.
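To make the metrics mentioned above concrete, the following is a minimal sketch in Python of how rating-prediction accuracy (MAE, RMSE) and top-N ranking quality (precision at N) are commonly computed; the function names and data are illustrative assumptions, not taken from the chapter.

# Minimal sketch of standard RS evaluation metrics (illustrative, not from the chapter).
from math import sqrt

def mae(actual, predicted):
    # Mean Absolute Error: average absolute difference between actual and predicted ratings.
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    # Root Mean Squared Error: like MAE, but penalizes large errors more heavily.
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def precision_at_n(recommended, relevant, n):
    # Fraction of the top-N recommended items that are relevant to the user.
    top_n = recommended[:n]
    hits = sum(1 for item in top_n if item in relevant)
    return hits / n

# Example usage with made-up data:
actual = [4.0, 3.5, 5.0, 2.0]
predicted = [3.8, 3.0, 4.5, 2.5]
print(mae(actual, predicted))    # rating-prediction accuracy
print(rmse(actual, predicted))   # rating-prediction accuracy, larger errors weighted more
print(precision_at_n(["a", "b", "c", "d"], {"a", "c"}, 3))  # top-N ranking quality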
Available from: 2018-04-12 Created: 2018-04-12 Last updated: 2019-02-14 Bibliographically approved