Plotting

 Joshi, Aparna


Debiasing Text Safety Classifiers through a Fairness-Aware Ensemble

arXiv.org Artificial Intelligence

Increasing use of large language models (LLMs) demand performant guardrails to ensure the safety of inputs and outputs of LLMs. When these safeguards are trained on imbalanced data, they can learn the societal biases. We present a light-weight, post-processing method for mitigating counterfactual fairness in closed-source text safety classifiers. Our approach involves building an ensemble that not only outperforms the input classifiers and policy-aligns them, but also acts as a debiasing regularizer. We introduce two threshold-agnostic metrics to assess the counterfactual fairness of a model, and demonstrate how combining these metrics with Fair Data Reweighting (FDW) helps mitigate biases. We create an expanded Open AI dataset, and a new templated LLM-generated dataset based on user-prompts, both of which are counterfactually balanced across identity groups and cover four key areas of safety; we will work towards publicly releasing these datasets. Our results show that our approach improves counterfactual fairness with minimal impact on model performance.


Investigating Speed Deviation Patterns During Glucose Episodes: A Quantile Regression Approach

arXiv.org Artificial Intelligence

Given the growing prevalence of diabetes, there has been significant interest in determining how diabetes affects instrumental daily functions, like driving. Complication of glucose control in diabetes includes hypoglycemic and hyperglycemic episodes, which may impair cognitive and psychomotor functions needed for safe driving. The goal of this paper was to determine patterns of diabetes speed behavior during acute glucose to drivers with diabetes who were euglycemic or control drivers without diabetes in a naturalistic driving environment. By employing distribution-based analytic methods which capture distribution patterns, our study advances prior literature that has focused on conventional approach of average speed to explore speed deviation patterns.