 unintended bias


Refining Language Models with Compositional Explanations

Neural Information Processing Systems

Pre-trained language models have been successful on text classification tasks, but are prone to learning spurious correlations from biased datasets, and are thus vulnerable when making inferences in a new domain. Prior work reveals such spurious patterns via post-hoc explanation algorithms which compute the importance of input features. Further, the model is regularized to align the importance scores with human knowledge, so that the unintended model behaviors are eliminated. However, such a regularization technique lacks flexibility and coverage, since only importance scores towards a pre-defined list of features are adjusted, while more complex human knowledge such as feature interaction and pattern generalization can hardly be incorporated. In this work, we propose to refine a learned language model for a target domain by collecting human-provided compositional explanations regarding observed biases. By parsing these explanations into executable logic rules, the human-specified refinement advice from a small set of explanations can be generalized to more training examples. We additionally introduce a regularization term allowing adjustments for both importance and interaction of features to better rectify model behavior. We demonstrate the effectiveness of the proposed approach on two text classification tasks by showing improved performance in the target domain as well as improved model fairness after refinement.
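The idea of regularizing importance scores can be illustrated with a minimal sketch. It is not the paper's method: it assumes a toy linear classifier, takes each feature's importance to be its contribution `w_j * x_j` to the logit, and penalizes only importance (not feature interaction). The names `regularized_loss`, `flagged`, and `alpha` are hypothetical.

```python
import math


def importance(weights, x):
    """Per-feature contribution to the logit of a linear model (w_j * x_j)."""
    return [w * xi for w, xi in zip(weights, x)]


def regularized_loss(weights, bias, x, y, flagged, alpha):
    """Binary cross-entropy plus a penalty on the importance of flagged features.

    flagged: indices of features a human marked as spurious (assumption).
    alpha:   strength of the importance penalty (assumption).
    """
    z = sum(importance(weights, x)) + bias
    p = 1.0 / (1.0 + math.exp(-z))
    bce = -(y * math.log(p) + (1 - y) * math.log(1 - p))
    imp = importance(weights, x)
    penalty = sum(imp[j] ** 2 for j in flagged)
    return bce + alpha * penalty


# Toy example: feature 0 is flagged as a spurious cue.
weights = [2.0, -1.0, 0.5]
x = [1.0, 1.0, 0.0]
plain = regularized_loss(weights, 0.0, x, 1, flagged=[0], alpha=0.0)
penalized = regularized_loss(weights, 0.0, x, 1, flagged=[0], alpha=0.1)
```

Minimizing the penalized loss pushes the weight on the flagged feature toward zero while the cross-entropy term still rewards correct predictions from the remaining features.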


Webinar: Avoid Unintended Bias: How to Responsibly Use AI/Machine Learning to Address Health Disparities

#artificialintelligence

Activity Description: Artificial intelligence (AI) and machine learning (ML) are increasingly being used in healthcare settings with applications in decision support, patient care, and disease management. If the underlying data on which AI depends is inherently biased or lacks a diverse representation of populations, the algorithms cannot produce accurate outputs and will further widen the gap of equitable care. This activity will discuss how AI and ML can be used to address social determinants of health and create more equitable healthcare solutions and improve health outcomes.

Learning Objectives:
1. Identify approaches to create inclusive data sets that produce positive health outcomes for all patients.


Google Research, 2022 & beyond: Responsible AI – Google AI Blog

#artificialintelligence

The last year showed tremendous breakthroughs in artificial intelligence (AI), particularly in large language models (LLMs) and text-to-image models. These technological advances require that we are thoughtful and intentional in how they are developed and deployed. In this blogpost, we share ways we have approached Responsible AI across our research in the past year and where we're headed in 2023. We highlight four primary themes covering foundational and socio-technical research, applied research, and product solutions, as part of our commitment to build AI products in a responsible and ethical manner, in alignment with our AI Principles. When machine learning (ML) systems are used in real world contexts, they can fail to behave in expected ways, which reduces their realized benefit.


Study finds that artificial intelligence can determine race from medical images

#artificialintelligence

Artificial intelligence (AI) is used in a wide variety of health care settings, from analyzing medical images to assisting with surgical procedures. While AI can sometimes outperform trained clinicians, these superhuman abilities are not always fully understood. In a recent study published in The Lancet Digital Health, researchers found that AI models could accurately predict self-reported race in several different types of radiographic images--a task not possible for human experts. These findings suggest that race information could be unknowingly incorporated into image analysis models, which could potentially exacerbate racial disparities in the medical setting. "AI has immense potential to revolutionize the diagnosis, treatment, and monitoring of numerous diseases and conditions and could dramatically shape the way that we approach health care," said first study author and NIBIB Data and Technology Advancement (DATA) National Service Scholar Judy Gichoya, M.D. "However, for AI to truly benefit all patients, we need a better understanding of how these algorithms make their decisions to prevent unintended biases."


Reward Modeling for Mitigating Toxicity in Transformer-based Language Models

arXiv.org Artificial Intelligence

Transformer-based language models are able to generate fluent text and be efficiently adapted across various natural language generation tasks. However, language models that are pretrained on large unlabeled web text corpora have been shown to suffer from toxic degeneration and social bias, which hinders their safe deployment. Various detoxification methods have been proposed to mitigate language model toxicity; however, these methods struggle to detoxify language models when conditioned on prompts that contain specific social identities related to gender, race, or religion. In this study, we propose Reinforce-Detoxify, a reinforcement learning-based method for mitigating toxicity in language models. We address the challenge of safety in language models and propose a new reward model that is able to detect toxic content and mitigate unintended bias towards social identities in toxicity prediction. The experiments demonstrate that the Reinforce-Detoxify method outperforms existing detoxification approaches in automatic evaluation metrics, indicating that our approach detoxifies language models effectively and is less prone to unintended bias toward social identities in generated content.


Unintended Bias in Language Model-driven Conversational Recommendation

arXiv.org Artificial Intelligence

Conversational Recommendation Systems (CRSs) have recently started to leverage pretrained language models (LMs) such as BERT for their ability to semantically interpret a wide range of preference statement variations. However, pretrained LMs are well known to be prone to intrinsic biases in their training data, which may be exacerbated by biases embedded in domain-specific language data (e.g., user reviews) used to fine-tune LMs for CRSs. We study a recently introduced LM-driven recommendation backbone (termed LMRec) of a CRS to investigate how unintended bias (i.e., language variations such as name references or indirect indicators of sexual orientation or location that should not affect recommendations) manifests in significantly shifted price and category distributions of restaurant recommendations. The alarming results we observe strongly indicate that LMRec has learned to reinforce harmful stereotypes through its recommendations. For example, offhand mention of names associated with the black community significantly lowers the price distribution of recommended restaurants, while offhand mentions of common male-associated names lead to an increase in recommended alcohol-serving establishments. These and many related results presented in this work raise a red flag that advances in the language handling capability of LM-driven CRSs do not come without significant challenges related to mitigating unintended bias in future deployed CRS assistants with a potential reach of hundreds of millions of end-users.


Can companies police the biases found in artificial intelligence?

#artificialintelligence

Artificial intelligence has seeped into almost every corner of our lives, including how people are hired for work. AI is used to screen and evaluate applicants, but there's a problem with that. Research has shown that AI can produce biased results, especially against women and minorities. That's something that Kenneth Chenault, chairman and managing director at the venture capital firm General Catalyst, is trying to address with his Data and Trust Alliance. Chenault is the co-chair of the organization.


Clearview AI Has New Tools to Identify You in Photos

WIRED

Clearview AI has stoked controversy by scraping the web for photos and applying facial recognition to give police and others an unprecedented ability to peer into our lives. Now the company's CEO wants to use artificial intelligence to make Clearview's surveillance tool even more powerful. It may make it more dangerous and error-prone as well. Clearview has collected billions of photos from across websites that include Facebook, Instagram, and Twitter and uses AI to identify a particular person in images. Police and government agents have used the company's face database to help identify suspects in photos by tying them to online profiles.


Investigating Bias In Automatic Toxic Comment Detection: An Empirical Study

arXiv.org Artificial Intelligence

With the surge in online platforms, there has been an upsurge in user engagement on these platforms via comments and reactions. A large portion of such textual comments are abusive, rude, and offensive to the audience. With machine learning systems in place to check such comments coming onto a platform, biases present in the training data get passed on to the classifier, leading to discrimination against a set of classes, religions, and genders. In this work, we evaluate different classifiers and features to estimate the bias in these classifiers along with their performance on the downstream task of toxicity classification. Results show that improvement in the performance of automatic toxic comment detection models is positively correlated with mitigating biases in these models. In our work, an LSTM with an attention mechanism proved to be a better modelling strategy than a CNN model. Further analysis shows that fastText embeddings are marginally preferable to GloVe embeddings for training toxic comment detection models. Deeper analysis reveals that such automatic models are particularly biased against specific identity groups even though the model has a high AUC score. Finally, in an effort to mitigate bias in toxicity detection models, a multi-task setup trained with an auxiliary task of toxicity sub-types proved to be useful, leading to up to a 0.26% (6% relative) gain in AUC scores.
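The observation that a model can be biased against specific identity groups despite a high overall AUC can be made concrete with a small sketch. It assumes a per-group AUC comparison, which is one common way to surface such bias and is not necessarily the metric used in the study. The data is a toy example, and the names `auc` and `subgroup_auc` are hypothetical.

```python
from itertools import product


def auc(labels, scores):
    """Probability that a random positive outscores a random negative.

    Ties count as half a win. Labels are 1 (toxic) or 0 (non-toxic).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p, n in product(pos, neg)
    )
    return wins / (len(pos) * len(neg))


def subgroup_auc(labels, scores, groups, group):
    """AUC restricted to the examples that belong to one identity group."""
    idx = [i for i, g in enumerate(groups) if g == group]
    return auc([labels[i] for i in idx], [scores[i] for i in idx])


# Toy data: comments in group "b" are ranked in the wrong order.
labels = [1, 0, 1, 0, 1, 0, 1, 0]
scores = [0.9, 0.1, 0.8, 0.2, 0.7, 0.3, 0.4, 0.6]
groups = ["a", "a", "a", "a", "a", "a", "b", "b"]

overall = auc(labels, scores)                       # 0.9375
group_a = subgroup_auc(labels, scores, groups, "a")  # 1.0
group_b = subgroup_auc(labels, scores, groups, "b")  # 0.0
```

In this toy data the overall AUC stays high because group "a" dominates the sample, while the classifier ranks every pair in group "b" incorrectly.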


How The Department Of Defense Approaches Ethical AI

#artificialintelligence

Military and defense organizations using transformative technologies such as artificial intelligence and machine learning can realize tremendous gains and help to maintain advantages over increasingly capable adversaries and competitors. These technologies can allow autonomous vehicles to go into terrain deemed too dangerous for humans, provide predictive analytics and maintenance to keep large fleets running smoothly and safely, and help to provide autonomous operations in difficult conditions. As the US Department of Defense (DoD) increasingly adopts AI technology in a wide variety of use cases ranging from back-office functions to battlefield operations, there is a realization that despite the benefits that AI can bring, there is also a risk of unintended consequences that could cause significant harm by using these various technologies. As a result, the DoD takes the topics of ethics, transparency, and ethics policy very seriously. A few years ago, the DoD created the Joint Artificial Intelligence Center, also referred to as the JAIC, to help figure out how to best move forward with this transformative technology.