AITopics | Dev, Sunipa

Collaborating Authors

Dev, Sunipa

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question "What is Artificial Intelligence?" and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

The Tail Wagging the Dog: Dataset Construction Biases of Social Bias Benchmarks

Selvam, Nikil Roashan, Dev, Sunipa, Khashabi, Daniel, Khot, Tushar, Chang, Kai-Wei

arXiv.org Artificial Intelligence, Jun-16-2023

How reliably can we trust the scores obtained from social bias benchmarks as faithful indicators of problematic social biases in a given language model? In this work, we study this question by contrasting social biases with non-social biases stemming from choices made during dataset construction that might not even be discernible to the human eye. To do so, we empirically simulate various alternative constructions for a given benchmark based on innocuous modifications (such as paraphrasing or random-sampling) that maintain the essence of their social bias. On two well-known social bias benchmarks (Winogender and BiasNLI) we observe that these shallow modifications have a surprising effect on the resulting degree of bias across various models. We hope these troubling observations motivate more robust measures of social biases.
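The idea of scoring "alternative constructions" of the same benchmark can be made concrete with a minimal sketch. It assumes that a benchmark reduces to a list of per-example indicators (1 if the model gave the stereotypical answer, else 0) and that the bias score is their mean. Both the scoring rule and the helper names `bias_score` and `resampled_scores` are hypothetical illustrations, not the paper's procedure. The sketch re-samples the benchmark many times and reports how much the score moves.

```python
import random
import statistics

def bias_score(indicators):
    """Hypothetical bias score: fraction of examples answered stereotypically."""
    return sum(indicators) / len(indicators)

def resampled_scores(indicators, sample_size, n_trials=1000, seed=0):
    """Score many random subsets (drawn without replacement) of one benchmark."""
    rng = random.Random(seed)
    return [bias_score(rng.sample(indicators, sample_size)) for _ in range(n_trials)]

# Toy benchmark (made-up data): 60 of 100 examples flagged as stereotypical.
benchmark = [1] * 60 + [0] * 40
scores = resampled_scores(benchmark, sample_size=30)
print(f"full: {bias_score(benchmark):.2f}  "
      f"subsets: min={min(scores):.2f} max={max(scores):.2f} "
      f"stdev={statistics.stdev(scores):.3f}")
```

Sampling without replacement keeps each subset a plausible alternative construction of the same benchmark. The paper also studies paraphrasing, which this sketch does not model.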

artificial intelligence, machine learning, natural language

2210.1004

Country:

North America > United States > California (0.14)
Europe (0.14)
Asia (0.14)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

SeeGULL: A Stereotype Benchmark with Broad Geo-Cultural Coverage Leveraging Generative Models

Jha, Akshita, Davani, Aida, Reddy, Chandan K., Dave, Shachi, Prabhakaran, Vinodkumar, Dev, Sunipa

arXiv.org Artificial Intelligence, May-19-2023

Stereotype benchmark datasets are crucial to detect and mitigate social stereotypes about groups of people in NLP models. However, existing datasets are limited in size and coverage, and are largely restricted to stereotypes prevalent in Western society. This is especially problematic as language technologies gain hold across the globe. To address this gap, we present SeeGULL, a broad-coverage stereotype dataset, built by utilizing generative capabilities of large language models such as PaLM and GPT-3, and leveraging a globally diverse rater pool to validate the prevalence of those stereotypes in society. SeeGULL is in English, and contains stereotypes about identity groups spanning 178 countries across 8 different geo-political regions across 6 continents, as well as state-level identities within the US and India. We also include fine-grained offensiveness scores for different stereotypes and demonstrate their global disparities. Furthermore, we include comparative annotations about the same groups by annotators living in the region vs. those that are based in North America, and demonstrate that within-region stereotypes about groups differ from those prevalent in North America. CONTENT WARNING: This paper contains stereotype examples that may be offensive.

artificial intelligence, natural language, stereotype

2305.1184

Country:

North America > United States (1.00)
Europe (1.00)
Asia (1.00)
Africa (1.00)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Cultural Re-contextualization of Fairness Research in Language Technologies in India

Bhatt, Shaily, Dev, Sunipa, Talukdar, Partha, Dave, Shachi, Prabhakaran, Vinodkumar

arXiv.org Artificial Intelligence, Nov-21-2022

Recent research has revealed undesirable biases in NLP data and models. However, these efforts largely focus on social disparities in the West, and are not directly portable to other geo-cultural contexts. In this position paper, we outline a holistic research agenda to re-contextualize NLP fairness research for the Indian context, accounting for Indian societal context, bridging technological gaps in capability and resources, and adapting to Indian cultural values. We also summarize findings from an empirical study on various social biases along different axes of disparities relevant to India, demonstrating their prevalence in corpora and models.

artificial intelligence, indian context, natural language

2211.11206

Country: Asia > India (1.00)

Genre: Research Report (0.83)

Technology: Information Technology > Artificial Intelligence > Natural Language (1.00)

Re-contextualizing Fairness in NLP: The Case of India

Bhatt, Shaily, Dev, Sunipa, Talukdar, Partha, Dave, Shachi, Prabhakaran, Vinodkumar

arXiv.org Artificial Intelligence, Nov-21-2022

Recent research has revealed undesirable biases in NLP data and models. However, these efforts focus on social disparities in the West, and are not directly portable to other geo-cultural contexts. In this paper, we focus on NLP fairness in the context of India. We start with a brief account of the prominent axes of social disparities in India. We build resources for fairness evaluation in the Indian context and use them to demonstrate prediction biases along some of the axes. We then delve deeper into social stereotypes for Region and Religion, demonstrating their prevalence in corpora and models. Finally, we outline a holistic research agenda to re-contextualize NLP fairness research for the Indian context, accounting for Indian societal context, bridging technological gaps in NLP capabilities and resources, and adapting to Indian cultural values. While we focus on India, this framework can be generalized to other geo-cultural contexts.

artificial intelligence, natural language, social media

2209.12226

Country: Asia > India (1.00)

Genre: Research Report (0.50)

Industry:

Health & Medicine (0.93)
Government (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Communications > Social Media (0.68)

Auditing Algorithmic Fairness in Machine Learning for Health with Severity-Based LOGAN

Ovalle, Anaelia, Dev, Sunipa, Zhao, Jieyu, Sarrafzadeh, Majid, Chang, Kai-Wei

arXiv.org Artificial Intelligence, Nov-16-2022

Auditing machine learning (ML)-based healthcare tools for bias is critical to preventing patient harm, especially in communities that disproportionately face health inequities. General frameworks are becoming increasingly available to measure ML fairness gaps between groups. However, ML for health (ML4H) auditing principles call for a contextual, patient-centered approach to model assessment. Therefore, ML auditing tools must be (1) better aligned with ML4H auditing principles and (2) able to illuminate and characterize communities vulnerable to the most harm. To address this gap, we propose supplementing ML4H auditing frameworks with SLOGAN (patient Severity-based LOcal Group biAs detectioN), an automatic tool for capturing local biases in a clinical prediction task. SLOGAN adapts an existing tool, LOGAN (LOcal Group biAs detectioN), by contextualizing group bias detection in patient illness severity and past medical history. We investigate and compare SLOGAN's bias detection capabilities to LOGAN and other clustering techniques across patient subgroups in the MIMIC-III dataset. On average, SLOGAN identifies larger fairness disparities than LOGAN in over 75% of patient groups while maintaining clustering quality. Furthermore, in a diabetes case study, health disparity literature corroborates the characterizations of the most biased clusters identified by SLOGAN. Our results contribute to the broader discussion of how machine learning biases may perpetuate existing healthcare disparities.
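"Local" group bias means comparing model performance between groups inside clusters of similar patients rather than only over the whole population. A minimal sketch follows. It assumes that cluster assignments are already given (LOGAN and SLOGAN learn the clusters themselves; here they are fixed inputs), that there are exactly two groups labeled `"A"` and `"B"`, and that a cluster's bias is the absolute accuracy gap between them. The function name and toy data are hypothetical.

```python
from collections import defaultdict

def local_accuracy_gaps(clusters, groups, correct):
    """Per-cluster absolute accuracy gap between groups "A" and "B".

    clusters: cluster id per patient; groups: "A" or "B" per patient;
    correct: 1 if the model's prediction was right, else 0.
    Clusters that lack either group are skipped.
    """
    tally = defaultdict(lambda: {"A": [0, 0], "B": [0, 0]})  # [hits, total]
    for c, g, ok in zip(clusters, groups, correct):
        tally[c][g][0] += ok
        tally[c][g][1] += 1
    gaps = {}
    for c, stats in tally.items():
        (hit_a, n_a), (hit_b, n_b) = stats["A"], stats["B"]
        if n_a and n_b:
            gaps[c] = abs(hit_a / n_a - hit_b / n_b)
    return gaps

# Toy data: globally, group A is 100% correct and group B 50% (gap 0.5).
clusters = [0, 0, 0, 0, 1, 1, 1, 1]
groups   = ["A", "A", "B", "B", "A", "A", "B", "B"]
correct  = [1, 1, 1, 1, 1, 1, 0, 0]
print(local_accuracy_gaps(clusters, groups, correct))  # {0: 0.0, 1: 1.0}
```

The global gap of 0.5 is concentrated entirely in cluster 1, which is the kind of signal local detection is meant to surface. SLOGAN's conditioning on illness severity and medical history is not attempted here.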

artificial intelligence, machine learning, slogan

2211.08742

Country: North America > United States > Minnesota (0.28)

Genre:

Research Report > Experimental Study (0.68)
Research Report > New Finding (0.48)

Industry:

Health & Medicine > Health Care Providers & Services (1.00)
Health & Medicine > Therapeutic Area > Endocrinology > Diabetes (0.74)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.34)

Harms of Gender Exclusivity and Challenges in Non-Binary Representation in Language Technologies

Dev, Sunipa, Monajatipoor, Masoud, Ovalle, Anaelia, Subramonian, Arjun, Phillips, Jeff M, Chang, Kai-Wei

arXiv.org Artificial Intelligence, Sep-10-2021

Gender is widely discussed in the context of language tasks and when examining the stereotypes propagated by language models. However, current discussions primarily treat gender as binary, which can perpetuate harms such as the cyclical erasure of non-binary gender identities. These harms are driven by model and dataset biases, which are consequences of the non-recognition and lack of understanding of non-binary genders in society. In this paper, we explain the complexity of gender and language around it, and survey non-binary persons to understand harms associated with the treatment of gender as binary in English language technologies. We also detail how current language representations (e.g., GloVe, BERT) capture and perpetuate these harms and related challenges that need to be acknowledged and addressed for representations to equitably encode gender information.

machine translation, pronoun, survey article

2108.12084

Country:

Europe (1.00)
Asia (1.00)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre:

Questionnaire & Opinion Survey (1.00)
Overview (0.67)

Industry:

Health & Medicine (1.00)
Law (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.94)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.69)

The Geometry of Distributed Representations for Better Alignment, Attenuated Bias, and Improved Interpretability

Dev, Sunipa

arXiv.org Artificial Intelligence, Nov-24-2020

High-dimensional representations for words, text, images, knowledge graphs and other structured data are commonly used in different paradigms of machine learning and data mining. These representations have different degrees of interpretability, with efficient distributed representations coming at the cost of losing the feature-to-dimension mapping. This implies that there is obfuscation in the way concepts are captured in these embedding spaces. Its effects are seen in many representations and tasks, a particularly problematic case being language representations, where the societal biases learned from underlying data are captured and occluded in unknown dimensions and subspaces. As a result, invalid associations (such as different races and their association with a polar notion of good versus bad) are made and propagated by the representations, leading to unfair outcomes in different tasks where they are used. This work addresses some of these problems pertaining to the transparency and interpretability of such representations. A primary focus is the detection, quantification, and mitigation of socially biased associations in language representation.
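One common way to detect and quantify a biased association is to estimate a bias direction from paired words and then measure how far other words project onto it. The sketch below is a minimal illustration of that general idea, not the thesis's specific method. It assumes the direction is the normalized mean of the pairwise difference vectors; the toy 3-d vectors and the helper names are hypothetical.

```python
import math

def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def bias_direction(pairs):
    """Assumed choice: unit vector along the mean of the pair differences."""
    dims = len(pairs[0][0])
    mean_diff = [sum(a[i] - b[i] for a, b in pairs) / len(pairs) for i in range(dims)]
    return normalize(mean_diff)

# Toy 3-d vectors (made up); real embeddings have hundreds of dimensions.
emb = {
    "he": [1.0, 0.2, 0.1], "she": [-1.0, 0.2, 0.1],
    "man": [0.9, 0.1, 0.3], "woman": [-0.9, 0.1, 0.3],
    "engineer": [0.4, 0.8, 0.1], "nurse": [-0.5, 0.7, 0.2],
}
v = bias_direction([(emb["he"], emb["she"]), (emb["man"], emb["woman"])])
for word in ("engineer", "nurse"):
    # Signed projection: positive leans toward "he", negative toward "she".
    print(word, round(dot(emb[word], v), 2))  # engineer 0.4, then nurse -0.5
```

Occupation words have no inherent gender, so nonzero projections like these are the kind of association the thesis treats as biased.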

deep learning, natural language processing, neural network

2011.12465

Country:

Europe (1.00)
Asia > Middle East (1.00)
Africa (1.00)
North America > United States > California (0.28)

Genre: Research Report (1.00)

Industry:

Media (1.00)
Leisure & Entertainment > Sports (1.00)
Health & Medicine > Therapeutic Area (1.00)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Communications > Social Media (1.00)

OSCaR: Orthogonal Subspace Correction and Rectification of Biases in Word Embeddings

Dev, Sunipa, Li, Tao, Phillips, Jeff M, Srikumar, Vivek

arXiv.org Artificial Intelligence, Jun-30-2020

Language representations are known to carry stereotypical biases and, as a result, lead to biased predictions in downstream tasks. While existing methods are effective at mitigating biases by linear projection, such methods are too aggressive: they not only remove bias, but also erase valuable information from word embeddings. We develop new measures for evaluating specific information retention that demonstrate the tradeoff between bias removal and information retention. To address this challenge, we propose OSCaR (Orthogonal Subspace Correction and Rectification), a bias-mitigating method that focuses on disentangling biased associations between concepts instead of removing concepts wholesale. Our experiments on gender biases show that OSCaR is a well-balanced approach that ensures that semantic information is retained in the embeddings and bias is also effectively mitigated.
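OSCaR is contrasted with linear-projection debiasing. The sketch below shows that baseline, not OSCaR itself. It removes each vector's component along a bias direction `v`, which is assumed to be unit length. It also illustrates why the abstract calls projection "too aggressive": a word that is legitimately gendered, such as "grandmother", loses its gender component in exactly the same way as a stereotyped occupation. The vectors and the name `project_out` are hypothetical.

```python
def project_out(w, v):
    """Linear-projection debiasing: w - (w . v) v, for a unit vector v."""
    c = sum(a * b for a, b in zip(w, v))
    return [a - c * b for a, b in zip(w, v)]

v = [1.0, 0.0, 0.0]              # assumed unit-length gender direction (toy)
nurse = [-0.5, 0.7, 0.2]         # stereotyped association: removal is desired
grandmother = [-0.9, 0.3, 0.6]   # definitional gender: removal loses information
print(project_out(nurse, v))        # [0.0, 0.7, 0.2]
print(project_out(grandmother, v))  # [0.0, 0.3, 0.6]
```

OSCaR instead aims to disentangle the association between two concept subspaces (for example, gender and occupation) rather than deleting one of them. Its subspace-orthogonalizing correction is not reproduced here.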

health & medicine, information, text processing

2007.00049

Country:

Asia (0.46)
North America > United States (0.28)

Genre: Research Report (0.64)

Industry: Health & Medicine > Therapeutic Area (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.48)

Absolute Orientation for Word Embedding Alignment

Dev, Sunipa, Hassan, Safia, Phillips, Jeff M.

arXiv.org Machine Learning, Jun-4-2018

We propose a new technique to align word embeddings which are derived from different source datasets or created using different mechanisms (e.g., GloVe or word2vec). We design a simple, closed-form solution to find the optimal rotation and optionally scaling which minimizes the root mean squared error or maximizes the average cosine similarity between two embeddings of the same vocabulary into the same dimensional space. Our methods extend approaches known as Absolute Orientation, which are popular for aligning objects in three dimensions. We extend them to arbitrary dimensions, and show that a simple scaling solution can be derived independent of the rotation, and also that it optimizes cosine similarity. Then we demonstrate how to evaluate the similarity of embeddings from different sources or mechanisms, and that certain properties like synonyms and analogies are preserved across the embeddings and can be enhanced by simply aligning and averaging ensembles of embeddings.
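The closed-form rotation step can be sketched with NumPy using the standard SVD solution to the orthogonal Procrustes problem, with a Kabsch-style sign fix so the result is a proper rotation. This is the classical closed form behind absolute orientation, not necessarily the paper's exact formulation. The function name `align`, the assumption that both embeddings are already centered with rows in the same vocabulary order, and the least-squares choice of scale are all assumptions; the paper's own scaling derivation may take a different form.

```python
import numpy as np

def align(A, B, scale=False):
    """Rotate (and optionally scale) embedding A onto embedding B.

    A, B: (n, d) arrays whose rows are the same n words in the same order,
    assumed already centered. Returns the transformed copy of A.
    """
    # Closed-form solution from the SVD of the d x d cross-covariance A^T B.
    U, S, Vt = np.linalg.svd(A.T @ B)
    # Kabsch-style sign fix so R is a proper rotation (det +1), not a reflection.
    D = np.eye(len(S))
    D[-1, -1] = np.sign(np.linalg.det(U @ Vt))
    R = U @ D @ Vt
    aligned = A @ R
    if scale:
        # Assumed least-squares scale for this R: tr(D S) / ||A||_F^2.
        s = (S * np.diag(D)).sum() / np.sum(A * A)
        aligned = s * aligned
    return aligned

# Toy check: B is a scaled, rotated copy of A, so alignment should recover it.
rng = np.random.default_rng(0)
A = rng.normal(size=(50, 5))
A -= A.mean(axis=0)
Q, _ = np.linalg.qr(rng.normal(size=(5, 5)))
if np.linalg.det(Q) < 0:
    Q[:, 0] *= -1  # flip one axis so Q is a proper rotation
B = 2.0 * A @ Q
print(np.allclose(align(A, B, scale=True), B))  # True
```

Because the rotation comes from a single SVD of a d x d matrix, the cost scales with the embedding dimension rather than requiring iterative optimization. That fits the abstract's emphasis on a simple closed-form solution.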

alignment, artificial intelligence, text processing

1806.0133

Country:

North America > United States (0.14)
Europe > Spain (0.14)
Europe > Middle East > Malta (0.14)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)