Biega, Asia
What's in a Query: Polarity-Aware Distribution-Based Fair Ranking
Balagopalan, Aparna, Wang, Kai, Salaudeen, Olawale, Biega, Asia, Ghassemi, Marzyeh
Machine learning-driven rankings, where individuals (or items) are ranked in response to a query, mediate search exposure or attention in a variety of safety-critical settings. Thus, it is important to ensure that such rankings are fair. Under the goal of equal opportunity, the attention allocated to an individual on a ranking interface should be proportional to their relevance across search queries. In this work, we examine amortized fair ranking, where relevance and attention are accumulated over a sequence of user queries to make fair ranking more feasible in practice. Unlike prior methods that operate on the expected amortized attention for each individual, we define new divergence-based measures for attention distribution-based fairness in ranking (DistFaiR), characterizing unfairness as the divergence between the distributions of attention and relevance associated with an individual over time. First, this allows us to propose new definitions of unfairness that are more reliable at test time. Second, we prove that group fairness is upper-bounded by individual fairness under this definition for a useful class of divergence measures, and experimentally show that maximizing individual fairness through an integer linear programming-based optimization is often beneficial to group fairness. Lastly, we find that prior research in amortized fair ranking ignores critical information about queries, creating a fairwashing risk in practice by making rankings appear fairer than they actually are.
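To make the distribution-based view concrete, here is a minimal sketch of a divergence-based per-individual unfairness measure, comparing the distribution of attention an individual receives across a query sequence against the distribution of their relevance. This is our illustration, not the paper's implementation: the Jensen-Shannon divergence, the function names, and the toy data are all assumptions.

```python
# Sketch: per-individual unfairness as the divergence between an
# individual's attention distribution and relevance distribution
# over a sequence of queries (illustrative, not the paper's code).
import numpy as np

def js_divergence(p: np.ndarray, q: np.ndarray, eps: float = 1e-12) -> float:
    """Jensen-Shannon divergence between two discrete distributions."""
    p = p / (p.sum() + eps)
    q = q / (q.sum() + eps)
    m = 0.5 * (p + q)
    kl = lambda a, b: np.sum(a * np.log((a + eps) / (b + eps)))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def distribution_unfairness(attention: np.ndarray, relevance: np.ndarray) -> np.ndarray:
    """attention, relevance: arrays of shape (n_individuals, n_queries).
    Returns one divergence-based unfairness score per individual."""
    return np.array([js_divergence(a, r) for a, r in zip(attention, relevance)])

# Toy example: 3 individuals, 4 queries.
attention = np.array([[0.4, 0.1, 0.3, 0.2],
                      [0.7, 0.1, 0.1, 0.1],
                      [0.2, 0.2, 0.3, 0.3]])
relevance = np.array([[0.4, 0.1, 0.3, 0.2],    # matched -> divergence ~ 0
                      [0.25, 0.25, 0.25, 0.25],
                      [0.1, 0.1, 0.4, 0.4]])
print(distribution_unfairness(attention, relevance))
```

Note that a measure like this retains per-query information: two individuals with identical total (expected) attention can still receive very different scores if their attention is concentrated on the wrong queries.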
The trade-off between data minimization and fairness in collaborative filtering
Sonboli, Nasim, Li, Sipei, Elahi, Mehdi, Biega, Asia
The General Data Protection Regulation (GDPR) aims to safeguard individuals' personal information from harm. While full compliance is mandatory in the European Union, and related regulations such as the California Privacy Rights Act (CPRA) apply elsewhere, it is not required everywhere. The GDPR requires simultaneous compliance with all of its principles, such as fairness, accuracy, and data minimization. However, it overlooks the potential contradictions among these principles. The matter becomes even more complex when compliance is required from decision-making systems. It is therefore essential to investigate the feasibility of simultaneously achieving the goals of the GDPR and machine learning, and the trade-offs that might be forced upon us. This paper studies the relationship between the principles of data minimization and fairness in recommender systems. We operationalize data minimization via active learning (AL) because, unlike many other methods, it can preserve high accuracy while allowing for strategic data collection, hence minimizing the amount of data collected. We implemented several active learning strategies (personalized and non-personalized) and conducted a comparative analysis focusing on accuracy and fairness on two publicly available datasets. The results demonstrate that different AL strategies may have different impacts on the accuracy of recommender systems, with nearly all strategies negatively impacting fairness. There has been little to no prior work on the trade-off between data minimization and fairness, on the pros and cons of active learning methods as tools for implementing data minimization, or on the potential impacts of AL on fairness. By exploring these critical aspects, we offer valuable insights for developing recommender systems that are GDPR-compliant.
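As a rough illustration of how active learning can serve data minimization here, consider a simple non-personalized acquisition strategy: rather than collecting all ratings, query users only on the few items expected to be most informative. The variance-based heuristic, function names, and toy data below are our assumptions, not the strategies evaluated in the paper.

```python
# Sketch: non-personalized active learning for data minimization in a
# recommender; query every user on the k items with the highest observed
# rating variance (illustrative heuristic, not the paper's strategies).
import numpy as np

def select_items_by_variance(ratings: np.ndarray, k: int) -> np.ndarray:
    """ratings: (n_users, n_items) with np.nan for unobserved entries.
    Returns indices of the k items with the highest rating variance."""
    item_var = np.nanvar(ratings, axis=0)
    return np.argsort(item_var)[::-1][:k]

rng = np.random.default_rng(0)
ratings = rng.integers(1, 6, size=(100, 20)).astype(float)
ratings[rng.random(ratings.shape) < 0.7] = np.nan  # 70% unobserved

queried_items = select_items_by_variance(ratings, k=5)
print("Ask every user to rate only items:", queried_items)
```

The data-minimization appeal is that the system solicits 5 ratings per user instead of 20; the fairness question studied in the paper is what such selective collection does to recommendation quality across user groups.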
On the Trade-Off between Actionable Explanations and the Right to be Forgotten
Pawelczyk, Martin, Leemann, Tobias, Biega, Asia, Kasneci, Gjergji
As machine learning (ML) models are increasingly deployed in high-stakes applications, policymakers have suggested tighter data protection regulations (e.g., GDPR, CCPA). One key principle is the "right to be forgotten", which gives users the right to have their data deleted. Another is the right to an actionable explanation, also known as algorithmic recourse, which allows users to reverse unfavorable decisions. To date, it is unknown whether these two principles can be operationalized simultaneously. Therefore, we introduce and study the problem of recourse invalidation in the context of data deletion requests. More specifically, we theoretically and empirically analyze the behavior of popular state-of-the-art algorithms and demonstrate that the recourses generated by these algorithms are likely to be invalidated if a small number of data deletion requests (e.g., 1 or 2) warrant updates of the predictive model. For the setting of differentiable models, we suggest a framework to identify a minimal subset of critical training points which, when removed, maximizes the fraction of invalidated recourses. Using our framework, we empirically show that the removal of as few as 2 data instances from the training set can invalidate up to 95 percent of all recourses output by popular state-of-the-art algorithms. Our work thus raises fundamental questions about the compatibility of the "right to an actionable explanation" with the "right to be forgotten", while also providing constructive insights into the determining factors of recourse robustness.
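The core phenomenon can be sketched in a few lines: generate a recourse on the original model, honor a couple of deletion requests by retraining, and check whether the recourse is still honored. The naive line-search recourse and scikit-learn logistic regression below are our stand-ins, not the state-of-the-art recourse algorithms analyzed in the paper, and deleting arbitrary (rather than critical) points will not necessarily invalidate the recourse.

```python
# Sketch: checking recourse validity before and after data deletion
# (illustrative setup; the paper studies stronger recourse algorithms
# and searches for the critical points whose removal maximizes invalidation).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

model = LogisticRegression().fit(X, y)

def recourse(x, model, step=0.05, max_iter=200):
    """Naive recourse: move along the model's weight vector until the
    prediction flips to the favorable class (1)."""
    w = model.coef_[0] / np.linalg.norm(model.coef_[0])
    for _ in range(max_iter):
        if model.predict(x.reshape(1, -1))[0] == 1:
            return x
        x = x + step * w
    return x

x0 = np.array([-1.5, -1.0])        # unfavorably classified individual
x_cf = recourse(x0.copy(), model)  # recourse under the original model

# Simulate two deletion requests: drop two training points and retrain.
keep = np.ones(len(X), dtype=bool)
keep[[0, 1]] = False
model_after = LogisticRegression().fit(X[keep], y[keep])

print("valid before deletion:", model.predict(x_cf.reshape(1, -1))[0] == 1)
print("valid after deletion: ", model_after.predict(x_cf.reshape(1, -1))[0] == 1)
```

Because the recourse typically lands just across the decision boundary, even a small boundary shift from retraining can flip its prediction back, which is why a carefully chosen deletion subset can invalidate so many recourses at once.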
The Role of Relevance in Fair Ranking
Balagopalan, Aparna, Jacobs, Abigail Z., Biega, Asia
Online platforms mediate access to opportunity: relevance-based rankings create and constrain options by allocating exposure to job openings and job candidates on hiring platforms, or to sellers in a marketplace. To do so responsibly, these socially consequential systems employ various fairness measures and interventions, many of which seek to allocate exposure based on worthiness. Because such constructs are typically not directly observable, platforms must instead resort to proxy scores, such as relevance, inferred from behavioral signals such as searcher clicks. Yet it remains an open question whether relevance fulfills its role as a worthiness score in high-stakes fair rankings. In this paper, we combine perspectives and tools from the social sciences, information retrieval, and fairness in machine learning to derive a set of criteria that relevance scores should satisfy in order to meaningfully guide fairness interventions. We then empirically show that not all of these criteria are met in a case study of relevance inferred from biased user click data. We assess the impact of these violations on estimated system fairness and analyze whether existing fairness interventions can mitigate the identified issues. Our analyses and results surface the pressing need for new approaches to relevance collection and generation that are suitable for use in fair ranking.
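A small simulation shows why click-derived relevance is a problematic worthiness proxy: raw click-through rates conflate how relevant an item is with how often it was examined at its rank. The ground-truth relevances, examination probabilities, and the inverse-propensity correction below are our assumptions for illustration, not the paper's case study.

```python
# Sketch: position bias in click-inferred relevance. Naive CTR
# underestimates items shown at lower ranks; inverse propensity
# scoring (IPS) with known examination probabilities corrects this.
import numpy as np

rng = np.random.default_rng(2)
true_relevance = np.array([0.9, 0.8, 0.7])   # hypothetical ground truth
examination = np.array([1.0, 0.5, 0.25])     # assumed position bias by rank

n_impressions = 10_000
clicks = np.zeros(3)
for _ in range(n_impressions):
    examined = rng.random(3) < examination
    clicked = examined & (rng.random(3) < true_relevance)
    clicks += clicked

naive_ctr = clicks / n_impressions                     # biased by position
ips_estimate = clicks / (n_impressions * examination)  # debiased estimate

print("naive CTR     :", np.round(naive_ctr, 3))
print("IPS relevance :", np.round(ips_estimate, 3))
```

A fairness intervention that allocates exposure proportionally to the naive CTR would reward items for their past rank positions rather than their relevance, which is precisely the kind of criterion violation the paper examines.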