Farnadi, Golnoosh
From Representational Harms to Quality-of-Service Harms: A Case Study on Llama 2 Safety Safeguards
Chehbouni, Khaoula, Roshan, Megha, Ma, Emmanuel, Wei, Futian Andrew, Taik, Afaf, Cheung, Jackie CK, Farnadi, Golnoosh
Recent progress in large language models (LLMs) has led to their widespread adoption across domains. However, these advancements have also introduced additional safety risks and raised concerns regarding their detrimental impact on already marginalized populations. Despite growing mitigation efforts to develop safety safeguards, such as supervised safety-oriented fine-tuning and safe reinforcement learning from human feedback, multiple concerns regarding the safety and ingrained biases of these models remain. Furthermore, previous work has demonstrated that models optimized for safety often display exaggerated safety behaviors, such as a tendency to refrain from responding to certain requests as a precautionary measure. As such, a clear trade-off between the helpfulness and safety of these models has been documented in the literature. In this paper, we further investigate the effectiveness of safety measures by evaluating models on already mitigated biases. Using Llama 2 as a case study, we illustrate how LLMs' safety responses can still encode harmful assumptions. To do so, we create a set of non-toxic prompts, which we then use to evaluate Llama models. Through our new taxonomy of LLM responses to users, we observe that the safety/helpfulness trade-off is more pronounced for certain demographic groups, which can lead to quality-of-service harms for marginalized populations.
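A minimal sketch of the kind of group-wise comparison this abstract describes: counting how often a model refuses otherwise benign prompts that mention different demographic groups. The keyword-based refusal detector and the group labels are illustrative assumptions, not the paper's taxonomy.

```python
# Minimal sketch (not the paper's taxonomy): compare how often a model
# refuses non-toxic prompts associated with different demographic groups.
from collections import defaultdict

REFUSAL_MARKERS = ("i cannot", "i can't", "i'm sorry", "as an ai")  # heuristic only

def is_refusal(response: str) -> bool:
    """Flag a response as a refusal if it starts with a common refusal phrase."""
    text = response.strip().lower()
    return any(text.startswith(m) for m in REFUSAL_MARKERS)

def refusal_rates(records):
    """records: iterable of (demographic_group, model_response) pairs."""
    counts, refusals = defaultdict(int), defaultdict(int)
    for group, response in records:
        counts[group] += 1
        refusals[group] += is_refusal(response)
    return {g: refusals[g] / counts[g] for g in counts}

# A large gap between groups would suggest a quality-of-service disparity.
records = [("group_a", "I'm sorry, I can't help with that."),
           ("group_a", "Sure, here is some information..."),
           ("group_b", "Sure, here is some information...")]
print(refusal_rates(records))  # {'group_a': 0.5, 'group_b': 0.0}
```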
Towards More Realistic Extraction Attacks: An Adversarial Perspective
More, Yash, Ganesh, Prakhar, Farnadi, Golnoosh
Language models are prone to memorizing large parts of their training data, making them vulnerable to extraction attacks. Existing research on these attacks remains limited in scope, often studying isolated trends rather than the real-world interactions with these models. In this paper, we revisit extraction attacks from an adversarial perspective, exploiting the brittleness of language models. We find significant churn in extraction attack trends, i.e., even minor, unintuitive changes to the prompt, or targeting smaller models and older checkpoints, can exacerbate the risks of extraction by up to $2-4 \times$. Moreover, relying solely on the widely accepted verbatim match underestimates the extent of extracted information, and we provide various alternatives to more accurately capture the true risks of extraction. We conclude our discussion with data deduplication, a commonly suggested mitigation strategy, and find that while it addresses some memorization concerns, it remains vulnerable to the same escalation of extraction risks against a real-world adversary. Our findings highlight the necessity of acknowledging an adversary's true capabilities to avoid underestimating extraction risks.
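To make the "verbatim match underestimates extraction" point concrete, here is a small sketch contrasting a strict verbatim criterion with a looser similarity score. The 50-character prefix threshold and the use of difflib's SequenceMatcher are assumptions chosen for illustration, not the paper's exact metrics.

```python
# Illustrative sketch: strict verbatim match vs. a looser similarity score
# for judging whether generated text leaks a training sample.
from difflib import SequenceMatcher

def verbatim_match(generation: str, target: str, n: int = 50) -> bool:
    """Strict criterion: the first n characters of the target appear verbatim."""
    return target[:n] in generation

def approximate_match(generation: str, target: str) -> float:
    """Looser criterion: longest-matching-blocks ratio in [0, 1]."""
    return SequenceMatcher(None, generation, target).ratio()

target = "John Doe's phone number is 555-0100 and he lives on Elm Street."
generation = "John Doe's phone number is 555 0100, and he lives on Elm St."

print(verbatim_match(generation, target))       # False: one changed character breaks it
print(approximate_match(generation, target))    # high ratio: the leak is still evident
```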
Position: Cracking the Code of Cascading Disparity Towards Marginalized Communities
Farnadi, Golnoosh, Havaei, Mohammad, Rostamzadeh, Negar
The rise of foundation models holds immense promise for advancing AI, but this progress may amplify existing risks and inequalities, leaving marginalized communities behind. In this position paper, we argue that disparities affecting marginalized communities (in performance, representation, privacy, robustness, interpretability, and safety) are not isolated concerns but rather interconnected elements of a cascading disparity phenomenon. We contrast foundation models with traditional models and highlight the potential for exacerbated disparity against marginalized communities. Moreover, we emphasize the unique threat of cascading impacts in foundation models, where interconnected disparities can trigger long-lasting negative consequences, specifically for people on the margins. We define marginalized communities within the machine learning context and explore the multifaceted nature of disparities. We analyze the sources of these disparities, tracing them from data creation through training and deployment procedures, to highlight the complex technical and socio-technical landscape. To mitigate this pressing crisis, we conclude with a set of calls to action to address disparity at its source.
Mitigating Disparate Impact of Differential Privacy in Federated Learning through Robust Clustering
Malekmohammadi, Saber, Taik, Afaf, Farnadi, Golnoosh
Federated Learning (FL) is a decentralized machine learning (ML) approach that keeps data localized and often incorporates Differential Privacy (DP) to strengthen privacy guarantees. As in previous work on DP in ML, we observe that differentially private federated learning (DPFL) introduces performance disparities, particularly affecting minority groups. Recent work has attempted to address performance fairness in vanilla FL through clustering, but this method remains sensitive and prone to errors, which are further exacerbated by the DP noise in DPFL. To fill this gap, we propose a novel clustered DPFL algorithm designed to effectively identify clients' clusters in highly heterogeneous settings while maintaining high accuracy under DP guarantees. To this end, we propose clustering clients based on both their model updates and their training loss values. Our approach also addresses the server's uncertainty in clustering clients' model updates by employing larger batch sizes along with a Gaussian Mixture Model (GMM) to alleviate the impact of noise and potential clustering errors, especially in privacy-sensitive scenarios. We provide a theoretical analysis of the effectiveness of our proposed approach, and we extensively evaluate it across diverse data distributions and privacy budgets, showing its effectiveness in mitigating the disparate impact of DP in FL settings at a small computational cost.
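A simplified sketch of the server-side clustering step described above: clients are grouped by fitting a Gaussian Mixture Model on features built from their (noisy) model updates and reported training losses. The PCA compression, the feature construction, and the toy data are assumptions for illustration and omit the DP mechanics and the paper's full algorithm.

```python
# Simplified sketch (not the paper's exact algorithm): cluster clients on the
# server using both their model updates and their training loss values.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture

def cluster_clients(updates, losses, n_clusters=2, seed=0):
    """updates: (n_clients, n_params) noisy model updates; losses: (n_clients,)."""
    reduced = PCA(n_components=2, random_state=seed).fit_transform(updates)
    features = np.hstack([reduced, losses.reshape(-1, 1)])
    gmm = GaussianMixture(n_components=n_clusters, random_state=seed).fit(features)
    return gmm.predict(features)  # cluster id per client

# Toy example: two groups of clients with distinct update directions and losses.
rng = np.random.default_rng(0)
updates = np.vstack([rng.normal(+1, 0.5, (5, 20)), rng.normal(-1, 0.5, (5, 20))])
losses = np.concatenate([rng.normal(0.8, 0.1, 5), rng.normal(0.3, 0.1, 5)])
print(cluster_clients(updates, losses))
```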
The Cost of Arbitrariness for Individuals: Examining the Legal and Technical Challenges of Model Multiplicity
Ganesh, Prakhar, Daldaban, Ihsan Ibrahim, Cofone, Ignacio, Farnadi, Golnoosh
Model multiplicity, the phenomenon where multiple models achieve similar performance despite learning different underlying functions, introduces arbitrariness in model selection. While this arbitrariness may seem inconsequential in expectation, its impact on individuals can be severe. This paper explores various individual-level concerns stemming from multiplicity, including the effects of arbitrariness beyond final predictions, disparate arbitrariness for individuals belonging to protected groups, and the challenges associated with the arbitrariness of a single algorithmic system that holds a monopoly across various contexts. It provides both an empirical examination of these concerns and a comprehensive analysis from a legal standpoint, addressing how these issues are treated under anti-discrimination law in Canada. We conclude by discussing the technical challenges of meeting legal requirements in the current landscape of model multiplicity, as well as the gap between current law and the implications of arbitrariness in model selection, highlighting relevant future research directions for both disciplines.
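A sketch of how arbitrariness from multiplicity can be surfaced empirically: train several near-equally-accurate models that differ only in random seed and count the individuals whose prediction flips across them. The dataset, model family, and number of seeds are illustrative assumptions, not the paper's experimental setup.

```python
# Illustrative sketch: quantify arbitrariness by training several comparable
# models and measuring per-individual prediction disagreement across them.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

preds = []
for seed in range(10):  # the multiplicity set: same pipeline, different seeds
    clf = RandomForestClassifier(n_estimators=50, random_state=seed).fit(X_tr, y_tr)
    preds.append(clf.predict(X_te))
preds = np.array(preds)  # shape: (n_models, n_individuals)

# Fraction of individuals who receive conflicting decisions across the models.
ambiguity = np.mean(preds.max(axis=0) != preds.min(axis=0))
print(f"Individuals with model-dependent outcomes: {ambiguity:.1%}")
```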
Understanding Intrinsic Socioeconomic Biases in Large Language Models
Arzaghi, Mina, Carichon, Florian, Farnadi, Golnoosh
Large Language Models (LLMs) are increasingly integrated into critical decision-making processes, such as loan approvals and visa applications, where inherent biases can lead to discriminatory outcomes. In this paper, we examine the nuanced relationship between demographic attributes and socioeconomic biases in LLMs, a crucial yet understudied area of fairness in LLMs. We introduce a novel dataset of one million English sentences to systematically quantify socioeconomic biases across various demographic groups. Our findings reveal pervasive socioeconomic biases in both established models such as GPT-2 and state-of-the-art models like Llama 2 and Falcon. We demonstrate that these biases are significantly amplified when considering intersectionality, with LLMs exhibiting a remarkable capacity to extract multiple demographic attributes from names and then correlate them with specific socioeconomic biases. This research highlights the urgent necessity for proactive and robust bias mitigation techniques to safeguard against discriminatory outcomes when deploying these powerful models in critical real-world applications.
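A minimal sketch of one way such name-conditioned socioeconomic associations can be probed: scoring templated sentences with a causal language model and comparing the likelihood of contrasting completions. The templates, the comparison names, and the choice of GPT-2 are assumptions for illustration; the paper's dataset and measurement procedure differ.

```python
# Minimal sketch (assumed templates, not the paper's dataset): compare how
# plausible a causal LM finds contrasting socioeconomic statements per name.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

@torch.no_grad()
def sentence_logprob(text: str) -> float:
    """Average token log-likelihood of a sentence under the model."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    out = model(ids, labels=ids)
    return -out.loss.item()  # loss is the mean negative log-likelihood

def wealth_bias(name: str) -> float:
    """Positive score: the 'wealthy' statement is more plausible for this name."""
    rich = sentence_logprob(f"{name} lives in a wealthy neighborhood.")
    poor = sentence_logprob(f"{name} lives in a poor neighborhood.")
    return rich - poor

for name in ["Emily", "Lakisha"]:  # hypothetical comparison names
    print(name, round(wealth_bias(name), 3))
```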
Advancing Cultural Inclusivity: Optimizing Embedding Spaces for Balanced Music Recommendations
Moradi, Armin, Neophytou, Nicola, Farnadi, Golnoosh
Popularity bias in music recommendation systems, where artists and tracks with the highest listen counts are recommended more often, can also propagate biases along demographic and cultural axes. In this work, we identify these biases in recommendations for artists from underrepresented cultural groups in prototype-based matrix factorization methods. Unlike traditional matrix factorization methods, prototype-based approaches are interpretable, which allows us to directly link the observed bias in recommendations for minority artists (the effect) to specific properties of the embedding space (the cause). We mitigate popularity bias in music recommendation by capturing both users' and songs' cultural nuances in the embedding space. To do so while maintaining recommendation quality, we propose two novel enhancements to the embedding space: i) an approach that filters out the irrelevant prototypes used to represent each user and item, improving generalizability, and ii) regularization techniques that enforce a more uniform distribution of prototypes within the embedding space. Our results demonstrate significant improvements in reducing popularity bias and enhancing demographic and cultural fairness in music recommendations while achieving competitive, if not better, overall performance.
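A simplified sketch of the two enhancements named above: keeping only the top-k most relevant prototypes per user or item, and a regularizer that discourages prototypes from clustering together. The function names, the top-k filtering rule, and the cosine-similarity penalty are assumptions standing in for the paper's formulation.

```python
# Simplified sketch of the two ideas: (i) keep only the top-k most relevant
# prototypes per user/item, and (ii) penalize prototypes that cluster together.
import torch

def filter_prototype_weights(weights: torch.Tensor, k: int) -> torch.Tensor:
    """Zero out all but the k largest prototype weights per row, then renormalize."""
    topk = torch.topk(weights, k, dim=-1)
    mask = torch.zeros_like(weights).scatter(-1, topk.indices, 1.0)
    filtered = weights * mask
    return filtered / filtered.sum(dim=-1, keepdim=True).clamp_min(1e-8)

def uniformity_penalty(prototypes: torch.Tensor) -> torch.Tensor:
    """Encourage prototypes to spread out by penalizing pairwise cosine similarity."""
    normed = torch.nn.functional.normalize(prototypes, dim=-1)
    off_diag = normed @ normed.T - torch.eye(len(prototypes))
    return off_diag.pow(2).mean()

# Hypothetical usage inside training: loss = rec_loss + lam * uniformity_penalty(P)
user_weights = torch.softmax(torch.randn(4, 8), dim=-1)  # 4 users, 8 prototypes
prototypes = torch.randn(8, 16)                          # 8 prototypes, 16-dim space
print(filter_prototype_weights(user_weights, k=3).shape, uniformity_penalty(prototypes))
```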
Causal Fair Metric: Bridging Causality, Individual Fairness, and Adversarial Robustness
Ehyaei, Ahmad-Reza, Farnadi, Golnoosh, Samadi, Samira
Despite the essential need for comprehensive considerations in responsible AI, factors like robustness, fairness, and causality are often studied in isolation. Adversarial perturbation, used to identify vulnerabilities in models, and individual fairness, which aims for equitable treatment of similar individuals despite initial differences, both depend on metrics that generate comparable input data instances. Previous attempts to define such joint metrics often lack general assumptions about data or structural causal models and fail to reflect counterfactual proximity. To address this, our paper introduces a causal fair metric formulated based on causal structures that encompass sensitive attributes and protected causal perturbations. To enhance the practicality of our metric, we propose metric learning as a method for estimating and deploying the metric in real-world problems where structural causal models are unavailable. We also demonstrate the application of our novel metric in classifiers. Empirical evaluation on real-world and synthetic datasets illustrates the effectiveness of our proposed metric in achieving an accurate classifier with fairness, resilience to adversarial perturbations, and a nuanced understanding of causal relationships.
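As a rough illustration of what a learned fair metric can look like (not the paper's causal construction), the sketch below builds a Mahalanobis-style distance that discounts the feature direction most predictive of a sensitive attribute, so individuals differing mainly along that direction are treated as close. The projection-based construction and the synthetic data are assumptions.

```python
# Illustration only: a Mahalanobis-style "fair" distance that ignores the
# direction predictive of a sensitive attribute (a stand-in for a causal metric).
import numpy as np
from sklearn.linear_model import LogisticRegression

def fair_metric_matrix(X: np.ndarray, sensitive: np.ndarray) -> np.ndarray:
    """Project out the direction a linear model uses to predict the sensitive attribute."""
    w = LogisticRegression().fit(X, sensitive).coef_[0]
    w = w / np.linalg.norm(w)
    return np.eye(X.shape[1]) - np.outer(w, w)  # PSD matrix defining the metric

def fair_distance(x1, x2, M):
    d = x1 - x2
    return float(np.sqrt(d @ M @ d))

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
sensitive = (X[:, 0] + 0.1 * rng.normal(size=500) > 0).astype(int)
M = fair_metric_matrix(X, sensitive)
# Two individuals differing mostly along the sensitive direction end up "close".
print(fair_distance(X[0], X[0] + np.array([2.0, 0, 0, 0, 0]), M))
```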
Balancing Act: Constraining Disparate Impact in Sparse Models
Hashemizadeh, Meraj, Ramirez, Juan, Sukumaran, Rohan, Farnadi, Golnoosh, Lacoste-Julien, Simon, Gallego-Posada, Jose
Model pruning is a popular approach to enable the deployment of large deep learning models on edge devices with restricted computational or storage capacities. Although sparse models achieve performance comparable to that of their dense counterparts at the level of the entire dataset, they exhibit high accuracy drops for some data sub-groups. Existing methods to mitigate this disparate impact induced by pruning (i) rely on surrogate metrics that address the problem indirectly and have limited interpretability; or (ii) scale poorly with the number of protected sub-groups in terms of computational cost. We propose a constrained optimization approach that directly addresses the disparate impact of pruning: our formulation bounds the accuracy change between the dense and sparse models, for each sub-group. This choice of constraints provides an interpretable success criterion to determine if a pruned model achieves acceptable disparity levels. Experimental results demonstrate that our technique scales reliably to problems involving large models and hundreds of protected sub-groups.

Current deep learning practice displays a trend towards larger architectures (Bommasani et al., 2021), as exemplified by popular models such as GPT-4 (OpenAI, 2023), Llama 2 (Touvron et al., 2023) and DALL-E 2 (Ramesh et al., 2022). Model compression techniques such as pruning (Gale et al., 2019), knowledge distillation (Hinton et al., 2015), or quantization (Gholami et al., 2021) are crucial towards enabling the deployment of large models across a wide range of platforms, including resource-constrained edge devices like smartphones.

Despite achieving comparable performance at an aggregate level over the entire dataset, pruned models often exhibit significant accuracy reduction for some data sub-groups (Hooker et al., 2019; 2020; Paganini, 2020). In particular, under-represented groups can suffer high performance degradation while the overall performance remains unaffected, thus exacerbating systemic biases in machine learning models. Tran et al. (2022) refer to this phenomenon as the disparate impact of pruning.

Existing mitigation methods face challenges in terms of interpretability and scalability to a large number of sub-groups. Tran et al. (2022) introduce constraints aiming to equalize the loss of the sparse model across sub-groups. However, their approach does not account for the unequal group-level performance of the dense model. Moreover, while the loss can be a useful surrogate for training, this method addresses the disparate impact issue indirectly as it focuses on controlling the loss, rather than group-level changes in accuracy. Alternatively, Lin et al. (2022) compute per-group importance scores for every model parameter to determine the weights to be pruned. This approach becomes prohibitively expensive when the model or the number of sub-groups is large.
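A sketch of the constrained idea using simple primal-dual (Lagrangian) updates: each group's sparse-model loss is constrained not to exceed its dense-model loss by more than a tolerance. The per-group loss gap is a differentiable stand-in for the per-group accuracy-change constraints described above, and the function names, update rule, and toy setup are assumptions rather than the paper's implementation.

```python
# Sketch of the constrained formulation with dual ascent: bound how much each
# group's loss degrades when moving from the dense to the sparse model.
import torch

def constrained_step(sparse_model, dense_group_losses, group_batches,
                     multipliers, tol, opt, dual_lr=0.01):
    """One primal-dual update. group_batches: {group: (x, y)} minibatches."""
    loss_fn = torch.nn.CrossEntropyLoss()
    task_loss, constraint_terms = 0.0, []
    for g, (x, y) in group_batches.items():
        sparse_loss = loss_fn(sparse_model(x), y)
        task_loss = task_loss + sparse_loss
        # Constraint: sparse group loss must not exceed dense group loss + tol.
        constraint_terms.append(multipliers[g] * (sparse_loss - dense_group_losses[g] - tol))
    lagrangian = task_loss + sum(constraint_terms)

    opt.zero_grad()
    lagrangian.backward()
    opt.step()  # primal descent on the sparse model's weights

    with torch.no_grad():  # dual ascent on the multipliers, kept non-negative
        for g, (x, y) in group_batches.items():
            violation = loss_fn(sparse_model(x), y) - dense_group_losses[g] - tol
            multipliers[g] = torch.clamp(multipliers[g] + dual_lr * violation, min=0.0)

# Toy usage: a linear model, two groups, dense reference losses assumed given.
model = torch.nn.Linear(4, 2)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
batches = {g: (torch.randn(8, 4), torch.randint(0, 2, (8,))) for g in ("a", "b")}
dense_losses = {"a": torch.tensor(0.5), "b": torch.tensor(0.7)}
lams = {"a": torch.tensor(0.0), "b": torch.tensor(0.0)}
constrained_step(model, dense_losses, batches, lams, tol=0.05, opt=opt)
print(lams)
```

Multipliers grow only for groups whose constraint is violated, which is what makes the per-group bound an interpretable success criterion: a converged run with small multipliers indicates acceptable disparity for every protected sub-group.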
Causal Adversarial Perturbations for Individual Fairness and Robustness in Heterogeneous Data Spaces
Ehyaei, Ahmad-Reza, Mohammadi, Kiarash, Karimi, Amir-Hossein, Samadi, Samira, Farnadi, Golnoosh
As responsible AI gains importance in machine learning, properties such as fairness, adversarial robustness, and causality have received considerable attention in recent years. However, despite their individual significance, there remains a critical gap in exploring and integrating these properties simultaneously. In this paper, we propose a novel approach that examines the relationship between individual fairness, adversarial robustness, and structural causal models in heterogeneous data spaces, particularly when dealing with discrete sensitive attributes. We use structural causal models and sensitive attributes to create a fair metric and apply it to measure semantic similarity among individuals. By introducing a novel causal adversarial perturbation and applying adversarial training, we create a new regularizer that combines individual fairness, causality, and robustness in the classifier. Our method is evaluated on both real-world and synthetic datasets, demonstrating its effectiveness in achieving an accurate classifier that simultaneously exhibits fairness, adversarial robustness, and causal awareness.
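A crude sketch of an adversarial-training-style regularizer in this spirit: perturbations are restricted to non-sensitive features (a simplistic proxy for causally admissible perturbations, without an actual structural causal model), and the classifier is penalized for changing its output under them. Function names, the masking scheme, and the toy setup are assumptions, not the paper's construction.

```python
# Crude sketch: adversarial-style regularizer that perturbs only non-sensitive
# features and penalizes the resulting change in the classifier's output.
import torch

def fair_robust_penalty(model, x, sensitive_mask, eps=0.1, steps=3, step_size=0.05):
    """sensitive_mask: 1 for sensitive features (never perturbed), 0 otherwise."""
    delta = torch.zeros_like(x, requires_grad=True)
    base = model(x).detach()
    for _ in range(steps):
        out = model(x + delta * (1 - sensitive_mask))
        gap = torch.nn.functional.mse_loss(out, base)
        grad, = torch.autograd.grad(gap, delta)
        with torch.no_grad():
            delta += step_size * grad.sign()  # maximize the output gap
            delta.clamp_(-eps, eps)           # stay within a small perturbation ball
    out = model(x + delta.detach() * (1 - sensitive_mask))
    return torch.nn.functional.mse_loss(out, base)  # add to the training loss

# Hypothetical usage: loss = task_loss + lam * fair_robust_penalty(model, x, mask)
model = torch.nn.Linear(4, 2)
x = torch.randn(8, 4)
mask = torch.tensor([1.0, 0.0, 0.0, 0.0])  # first feature treated as sensitive
print(fair_robust_penalty(model, x, mask))
```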