Mobasher, Bamshad
A Graph-based Approach for Mitigating Multi-sided Exposure Bias in Recommender Systems
Mansoury, Masoud, Abdollahpouri, Himan, Pechenizkiy, Mykola, Mobasher, Bamshad, Burke, Robin
Fairness is a critical system-level objective in recommender systems that has been the subject of extensive recent research. A specific form of fairness is supplier exposure fairness where the objective is to ensure equitable coverage of items across all suppliers in recommendations provided to users. This is especially important in multistakeholder recommendation scenarios where it may be important to optimize utilities not just for the end-user, but also for other stakeholders such as item sellers or producers who desire a fair representation of their items. This type of supplier fairness is sometimes accomplished by attempting to increasing aggregate diversity in order to mitigate popularity bias and to improve the coverage of long-tail items in recommendations. In this paper, we introduce FairMatch, a general graph-based algorithm that works as a post processing approach after recommendation generation to improve exposure fairness for items and suppliers. The algorithm iteratively adds high quality items that have low visibility or items from suppliers with low exposure to the users' final recommendation lists. A comprehensive set of experiments on two datasets and comparison with state-of-the-art baselines show that FairMatch, while significantly improves exposure fairness and aggregate diversity, maintains an acceptable level of relevance of the recommendations.
Evaluating Team Skill Aggregation in Online Competitive Games
Dehpanah, Arman, Ghori, Muheeb Faizan, Gemmell, Jonathan, Mobasher, Bamshad
One of the main goals of online competitive games is increasing player engagement by ensuring fair matches. These games use rating systems for creating balanced match-ups. Rating systems leverage statistical estimation to rate players' skills and use skill ratings to predict rank before matching players. Skill ratings of individual players can be aggregated to compute the skill level of a team. While research often aims to improve the accuracy of skill estimation and fairness of match-ups, less attention has been given to how the skill level of a team is calculated from the skill level of its members. In this paper, we propose two new aggregation methods and compare them with a standard approach extensively used in the research literature. We present an exhaustive analysis of the impact of these methods on the predictive performance of rating systems. We perform our experiments using three popular rating systems, Elo, Glicko, and TrueSkill, on three real-world datasets including over 100,000 battle royale and head-to-head matches. Our evaluations show the superiority of the MAX method over the other two methods in the majority of the tested cases, implying that the overall performance of a team is best determined by the performance of its most skilled member. The results of this study highlight the necessity of devising more elaborated methods for calculating a team's performance -- methods covering different aspects of players' behavior such as skills, strategy, or goals.
The Evaluation of Rating Systems in Team-based Battle Royale Games
Dehpanah, Arman, Ghori, Muheeb Faizan, Gemmell, Jonathan, Mobasher, Bamshad
Online competitive games have become a mainstream entertainment platform. To create a fair and exciting experience, these games use rating systems to match players with similar skills. While there has been an increasing amount of research on improving the performance of these systems, less attention has been paid to how their performance is evaluated. In this paper, we explore the utility of several metrics for evaluating three popular rating systems on a real-world dataset of over 25,000 team battle royale matches. Our results suggest considerable differences in their evaluation patterns. Some metrics were highly impacted by the inclusion of new players. Many could not capture the real differences between certain groups of players. Among all metrics studied, normalized discounted cumulative gain (NDCG) demonstrated more reliable performance and more flexibility. It alleviated most of the challenges faced by the other metrics while adding the freedom to adjust the focus of the evaluations on different groups of players.
User-centered Evaluation of Popularity Bias in Recommender Systems
Abdollahpouri, Himan, Mansoury, Masoud, Burke, Robin, Mobasher, Bamshad, Malthouse, Edward
Recommendation and ranking systems are known to suffer from popularity bias; the tendency of the algorithm to favor a few popular items while under-representing the majority of other items. Prior research has examined various approaches for mitigating popularity bias and enhancing the recommendation of long-tail, less popular, items. The effectiveness of these approaches is often assessed using different metrics to evaluate the extent to which over-concentration on popular items is reduced. However, not much attention has been given to the user-centered evaluation of this bias; how different users with different levels of interest towards popular items are affected by such algorithms. In this paper, we show the limitations of the existing metrics to evaluate popularity bias mitigation when we want to assess these algorithms from the users' perspective and we propose a new metric that can address these limitations. In addition, we present an effective approach that mitigates popularity bias from the user-centered point of view. Finally, we investigate several state-of-the-art approaches proposed in recent years to mitigate popularity bias and evaluate their performances using the existing metrics and also from the users' perspective. Our experimental results using two publicly-available datasets show that existing popularity bias mitigation techniques ignore the users' tolerance towards popular items. Our proposed user-centered method can tackle popularity bias effectively for different users while also improving the existing metrics.
Opportunistic Multi-aspect Fairness through Personalized Re-ranking
Sonboli, Nasim, Eskandanian, Farzad, Burke, Robin, Liu, Weiwen, Mobasher, Bamshad
As recommender systems have become more widespread and moved into areas with greater social impact, such as employment and housing, researchers have begun to seek ways to ensure fairness in the results that such systems produce. This work has primarily focused on developing recommendation approaches in which fairness metrics are jointly optimized along with recommendation accuracy. However, the previous work had largely ignored how individual preferences may limit the ability of an algorithm to produce fair recommendations. Furthermore, with few exceptions, researchers have only considered scenarios in which fairness is measured relative to a single sensitive feature or attribute (such as race or gender). In this paper, we present a re-ranking approach to fairness-aware recommendation that learns individual preferences across multiple fairness dimensions and uses them to enhance provider fairness in recommendation results. Specifically, we show that our opportunistic and metric-agnostic approach achieves a better trade-off between accuracy and fairness than prior re-ranking approaches and does so across multiple fairness dimensions.
Investigating Potential Factors Associated with Gender Discrimination in Collaborative Recommender Systems
Mansoury, Masoud (Eindhoven University of Technology ) | Abdollahpouri, Himan (University of Colorado Boulder) | Smith, Jessie (University of Colorado Boulder) | Dehpanah, Arman (DePaul University) | Pechenizkiy, Mykola (Eindhoven University of Technology) | Mobasher, Bamshad (DePaul University)
The proliferation of personalized recommendation technologies has raised concerns about discrepancies in their recommendation performance across different genders, age groups, and racial or ethnic populations. This varying degree of performance could impact usersโ trust in the system and may pose legal and ethical issues in domains where fairness and equity are critical concerns, like job recommendation. In this paper, we investigate several potential factors that could be associated with discriminatory performance of a recommendation algorithm for women versus men. We specifically study several characteristics of user profiles and analyze their possible associations with disparate behavior of the system towards different genders. These characteristics include the anomaly in rating behavior, the entropy of usersโ profiles, and the usersโ profile size. Our experimental results on a public dataset using four recommendation algorithms show that, based on all the three mentioned factors, women get less accurate recommendations than men indicating an unfair nature of recommendation algorithms across genders.
Crank up the volume: preference bias amplification in collaborative recommendation
Lin, Kun, Sonboli, Nasim, Mobasher, Bamshad, Burke, Robin
Recommender systems are personalized: we expect the results given to a particular user to reflect that user's preferences. Some researchers have studied the notion of calibration, how well recommendations match users' stated preferences, and bias disparity the extent to which mis-calibration affects different user groups. In this paper, we examine bias disparity over a range of different algorithms and for different item categories and demonstrate significant differences between model-based and memory-based algorithms.
Modeling the Dynamics of User Preferences for Sequence-Aware Recommendation Using Hidden Markov Models
Eskandanian, Farzad (DePaul University) | Mobasher, Bamshad (DePaul University)
In a variety of online settings involving interaction with end-users it is critical for the systems to adapt to changes in user preference. User preferences on items tend to change over time due to a variety of factors such as change in context, the task being performed, or other short-term or long-term external factors. Recommender systems, in particular need to be able to capture these dynamics in user preferences in order to remain tuned to the most current interests of users. In this work we present a recommendation framework which takes into account the dynamics of user preferences. We propose an approach based on Hidden Markov Models (HMM) to identify change-points in the sequence of user interactions which reflect significant changes in preference according to the sequential behavior of all the users in the data. The proposed framework leverages the identified change points to generate recommendations using a sequence-aware non-negative matrix factorization model. We empirically demonstrate the effectiveness of the HMM-based change detection method as compared to standard baseline methods. Additionally, we evaluate the performance of the proposed recommendation method and show that it compares favorably to state-of-the-art sequence-aware recommendation models.
Power of the Few: Analyzing the Impact of Influential Users in Collaborative Recommender Systems
Eskandanian, Farzad, Sonboli, Nasim, Mobasher, Bamshad
Like other social systems, in collaborative filtering a small number of "influential" users may have a large impact on the recommendations of other users, thus affecting the overall behavior of the system. Identifying influential users and studying their impact on other users is an important problem because it provides insight into how small groups can inadvertently or intentionally affect the behavior of the system as a whole. Modeling these influences can also shed light on patterns and relationships that would otherwise be difficult to discern, hopefully leading to more transparency in how the system generates personalized content. In this work we first formalize the notion of "influence" in collaborative filtering using an Influence Discrimination Model. We then empirically identify and characterize influential users and analyze their impact on the system under different underlying recommendation algorithms and across three different recommendation domains: job, movie and book recommendations. Insights from these experiments can help in designing systems that are not only optimized for accuracy, but are also tuned to mitigate the impact of influential users when it might lead to potential imbalance or unfairness in the system's outcomes.
Value-Aware Item Weighting for Long-Tail Recommendation
Abdollahpouri, Himan, Burke, Robin, Mobasher, Bamshad
Many recommender systems suffer from the popularity bias problem: popular items are being recommended frequently while less popular, niche products, are recommended rarely if not at all. However, those ignored products are exactly the products that businesses need to find customers for and their recommendations would be more beneficial. In this paper, we examine an item weighting approach to improve long-tail recommendation. Our approach works as a simple yet powerful add-on to existing recommendation algorithms for making a tunable trade-off between accuracy and long-tail coverage.