AITopics | Wilson, Shomir

Collaborating Authors

Wilson, Shomir


Race and Privacy in Broadcast Police Communications

Venkit, Pranav Narayanan, Graziul, Christopher, Goodman, Miranda Ardith, Kenny, Samantha Nicole, Wilson, Shomir

arXiv.org Artificial Intelligence, Jul-1-2024

Radios are essential for the operations of modern police departments, and they function as both a collaborative communication technology and a sociotechnical system. However, little prior research has examined their usage or their connections to individual privacy and the role of race in policing, two growing topics of concern in the US. As a case study, we examine the Chicago Police Department's (CPD's) use of broadcast police communications (BPC) to coordinate the activity of law enforcement officers (LEOs) in the city. From a recently assembled archive of 80,775 hours of BPC associated with CPD operations, we analyze text transcripts of radio transmissions broadcast from 9:00 AM to 5:00 PM on August 10th, 2018 in one majority Black, one majority white, and one majority Hispanic area of the city (24 hours of audio) to explore four research questions: (1) Do BPC reflect reported racial disparities in policing? (2) How and when are gender, race/ethnicity, and age mentioned in BPC? (3) To what extent do BPC include sensitive information, and who is put at most risk by this practice? (4) To what extent can large language models (LLMs) heighten this risk? We explore the vocabulary and speech acts used by police in BPC, comparing mentions of personal characteristics to local demographics, the personal information shared over BPC, and the privacy concerns that it poses. Analysis indicates (a) policing professionals in the city of Chicago exhibit disproportionate attention to Black members of the public regardless of context, (b) sociodemographic characteristics like gender, race/ethnicity, and age are primarily mentioned in BPC about event information, and (c) disproportionate attention introduces disproportionate privacy risks for Black members of the public.

large language model, natural language, utterance, (14 more...)

arXiv.org Artificial Intelligence

2407.01817

Country: North America > United States > Illinois > Cook County > Chicago (0.46)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study > Negative Result (0.34)

Industry:

Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
Government > Regional Government > North America Government > United States Government (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.88)


"Confidently Nonsensical?": A Critical Survey on the Perspectives and Challenges of 'Hallucinations' in NLP

Venkit, Pranav Narayanan, Chakravorti, Tatiana, Gupta, Vipul, Biggs, Heidi, Srinath, Mukund, Goswami, Koustava, Rajtmajer, Sarah, Wilson, Shomir

arXiv.org Artificial Intelligence, Apr-10-2024

We investigate how hallucination in large language models (LLMs) is characterized in peer-reviewed literature using a critical examination of 103 publications across NLP research. Through a comprehensive review of sociological and technological literature, we identify a lack of agreement on the term 'hallucination.' Additionally, we conduct a survey with 171 practitioners from the field of NLP and AI to capture varying perspectives on hallucination. Our analysis underscores the necessity for explicit definitions and frameworks outlining hallucination within NLP, highlighting potential challenges, and our survey inputs provide a thematic understanding of the influence and ramifications of hallucination in society.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2404.07461

Country:

Asia > Middle East > UAE (0.14)
Europe > Middle East > Malta (0.14)
North America > United States > Maine (0.14)

Genre:

Overview (1.00)
Research Report > New Finding (0.93)

Industry:

Education (0.93)
Health & Medicine > Therapeutic Area (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.93)


The Sentiment Problem: A Critical Survey towards Deconstructing Sentiment Analysis

Venkit, Pranav Narayanan, Srinath, Mukund, Gautam, Sanjana, Venkatraman, Saranya, Gupta, Vipul, Passonneau, Rebecca J., Wilson, Shomir

arXiv.org Artificial Intelligence, Oct-18-2023

We conduct an inquiry into the sociotechnical aspects of sentiment analysis (SA) by critically examining 189 peer-reviewed papers on their applications, models, and datasets. Our investigation stems from the recognition that SA has become an integral component of diverse sociotechnical systems, exerting influence on both social and technical users. By delving into sociological and technological literature on sentiment, we unveil distinct conceptualizations of this term in domains such as finance, government, and medicine. Our study exposes a lack of explicit definitions and frameworks for characterizing sentiment, resulting in potential challenges and biases. To tackle this issue, we propose an ethics sheet encompassing critical inquiries to guide practitioners in ensuring equitable utilization of SA. Our findings underscore the significance of adopting an interdisciplinary approach to defining sentiment in SA and offer a pragmatic solution for its implementation.

critical survey, deconstructing sentiment analysis, sentiment problem

arXiv.org Artificial Intelligence

2310.12318

Genre: Research Report > New Finding (0.53)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Extraction (0.60)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.60)


CALM: A Multi-task Benchmark for Comprehensive Assessment of Language Model Bias

Gupta, Vipul, Venkit, Pranav Narayanan, Laurençon, Hugo, Wilson, Shomir, Passonneau, Rebecca J.

arXiv.org Artificial Intelligence, Aug-23-2023

As language models (LMs) become increasingly powerful, it is important to quantify and compare them for sociodemographic bias with potential for harm. Prior bias measurement datasets are sensitive to perturbations in their manually designed templates and are therefore unreliable. To achieve reliability, we introduce the Comprehensive Assessment of Language Model bias (CALM), a benchmark dataset to quantify bias in LMs across three tasks. We integrate 16 existing datasets across different domains, such as Wikipedia and news articles, to filter 224 templates from which we construct a dataset of 78,400 examples. We compare the diversity of CALM with prior datasets on metrics such as average semantic similarity and variation in template length, and test the sensitivity to small perturbations. We show that our dataset is more diverse and reliable than previous datasets and thus better captures the breadth of linguistic variation required to reliably evaluate model bias. We evaluate 20 large language models including six prominent families of LMs such as Llama-2. In two LM series, OPT and Bloom, we found that larger parameter models are more biased than lower parameter models. We found the T0 series of models to be the least biased. Furthermore, we noticed a tradeoff between gender and racial bias with increasing model size in some model series. The code is available at https://github.com/vipulgupta1011/CALM.

artificial intelligence, natural language, text processing, (17 more...)

arXiv.org Artificial Intelligence

2308.12539

Country:

Europe (1.00)
Asia (0.93)
North America > United States > Minnesota (0.28)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.53)


Survey on Sociodemographic Bias in Natural Language Processing

Gupta, Vipul, Venkit, Pranav Narayanan, Wilson, Shomir, Passonneau, Rebecca J.

arXiv.org Artificial Intelligence, Aug-21-2023

Deep neural networks often learn unintended bias during training, which might have harmful effects when deployed in real-world settings. This work surveys 214 papers related to sociodemographic bias in natural language processing (NLP). In this study, we aim to provide a more comprehensive understanding of the similarities and differences among approaches to sociodemographic bias in NLP. To better understand the distinction between bias and real-world harm, we turn to ideas from psychology and behavioral economics to propose a definition for sociodemographic bias. We identify three main categories of NLP bias research: types of bias, quantifying bias, and debiasing techniques. We highlight the current trends in quantifying bias and debiasing techniques, offering insights into their strengths and weaknesses. We conclude that current approaches to quantifying bias face reliability issues, that many of the bias metrics do not relate to real-world bias, and that debiasing techniques need to focus more on training methods. Finally, we provide recommendations for future work.

computational linguistic, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2306.08158

Country:

Europe (1.00)
Asia (1.00)
North America > United States > Minnesota (0.29)

Genre:

Overview (1.00)
Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (0.93)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.92)


Unmasking Nationality Bias: A Study of Human Perception of Nationalities in AI-Generated Articles

Venkit, Pranav Narayanan, Gautam, Sanjana, Panchanadikar, Ruchi, Huang, Ting-Hao 'Kenneth', Wilson, Shomir

arXiv.org Artificial Intelligence, Aug-8-2023

We investigate the potential for nationality biases in natural language processing (NLP) models using human evaluation methods. Biased NLP models can perpetuate stereotypes and lead to algorithmic discrimination, posing a significant challenge to the fairness and justice of AI systems. Our study employs a two-step mixed-methods approach that includes both quantitative and qualitative analysis to identify and understand the impact of nationality bias in a text generation model. Through our human-centered quantitative analysis, we measure the extent of nationality bias in articles generated by AI sources. We then conduct open-ended interviews with participants, performing qualitative coding and thematic analysis to understand the implications of these biases on human readers. Our findings reveal that biased NLP models tend to replicate and amplify existing societal biases, which can translate to harm if used in a sociotechnical setting. The qualitative analysis from our interviews offers insights into the experience readers have when encountering such articles, highlighting the potential to shift a reader's perception of a country. These findings emphasize the critical role of public perception in shaping AI's impact on society and the need to correct biases in AI systems.

artificial intelligence, natural language, participant, (14 more...)

arXiv.org Artificial Intelligence

2308.04346

Country:

North America > United States (0.69)
Europe > United Kingdom (0.46)

Genre: Research Report > New Finding (1.00)

Industry:

Government (0.93)
Law Enforcement & Public Safety (0.93)

Technology:

Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (0.94)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.31)


Automated Ableism: An Exploration of Explicit Disability Biases in Sentiment and Toxicity Analysis Models

Venkit, Pranav Narayanan, Srinath, Mukund, Wilson, Shomir

arXiv.org Artificial Intelligence, Jul-18-2023

We analyze sentiment analysis and toxicity detection models to detect the presence of explicit bias against people with disability (PWD). We employ the bias identification framework of Perturbation Sensitivity Analysis to examine conversations related to PWD on social media platforms, specifically Twitter and Reddit, in order to gain insight into how disability bias is disseminated in real-world social settings. We then create the Bias Identification Test in Sentiment (BITS) corpus to quantify explicit disability bias in any sentiment analysis and toxicity detection models. Our study utilizes BITS to uncover significant biases in four open AIaaS (AI as a Service) sentiment analysis tools, namely TextBlob, VADER, Google Cloud Natural Language API, and DistilBERT, and two toxicity detection models, namely two versions of Toxic-BERT. Our findings indicate that all of these models exhibit statistically significant explicit bias against PWD.

artificial intelligence, disability bias, natural language, (17 more...)

arXiv.org Artificial Intelligence

2307.09209

Country: North America > United States (0.28)

Genre: Research Report > New Finding (0.34)

Industry:

Health & Medicine > Therapeutic Area > Neurology (0.68)
Law > Civil Rights & Constitutional Law (0.68)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)


Nationality Bias in Text Generation

Venkit, Pranav Narayanan, Gautam, Sanjana, Panchanadikar, Ruchi, Huang, Ting-Hao 'Kenneth', Wilson, Shomir

arXiv.org Artificial Intelligence, Feb-14-2023

Little attention has been paid to analyzing nationality bias in language models, especially when nationality is highly used as a factor in increasing the performance of social NLP models. This paper examines how a text generation model, GPT-2, accentuates pre-existing societal biases about country-based demonyms. We generate stories using GPT-2 for various nationalities and use sensitivity analysis to explore how the number of internet users and the country's economic status impact the sentiment of the stories. To reduce the propagation of biases through large language models (LLMs), we explore the debiasing method of adversarial triggering. Our results show that GPT-2 demonstrates significant bias against countries with fewer internet users, and adversarial triggering effectively reduces this bias.

gpt-2, large language model, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2302.02463

Country:

Africa (0.94)
North America > United States (0.68)
Europe (0.68)

Genre: Research Report > New Finding (1.00)

Industry: Media > News (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)


Automated Detection of Doxing on Twitter

Karimi, Younes, Squicciarini, Anna, Wilson, Shomir

arXiv.org Artificial Intelligence, Feb-2-2022

The term "dox" is an abbreviation for "documents," and doxing is the act of disclosing private, sensitive, or personally identifiable information about a person without their consent. Sensitive information can be considered as any type of confidential information or any information that can be used to identify a person uniquely. This information is called doxed information and includes demographic information [53] such as birthday, sexual orientation, race, ethnicity, and religion, or location information which can be used to precisely or approximately locate a person such as the street address, ZIP code, IP address, and GPS coordinates. Other categories of doxed information are identity documents like passport number and social security number, contact information like phone number and email address, financial information such as credit card and bank account details, or sign-in credentials such as usernames and passwords [15]. Such disclosure may have various consequences. It may encourage forms of bigotry and hate groups, encourage human or child trafficking and endanger people's lives or reputations, scare and intimidate people by swatting

criminal law, information, machine learning, (29 more...)

arXiv.org Artificial Intelligence

2202.00879

Country: North America > United States (1.00)

Genre:

Research Report > Experimental Study (0.93)
Research Report > New Finding (0.67)

Industry:

Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
Law (1.00)
Information Technology > Security & Privacy (1.00)
(3 more...)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)


Identification of Bias Against People with Disabilities in Sentiment Analysis and Toxicity Detection Models

Venkit, Pranav Narayanan, Wilson, Shomir

arXiv.org Artificial Intelligence, Nov-25-2021

Sociodemographic biases are a common problem for natural language processing, affecting the fairness and integrity of its applications. Within sentiment analysis, these biases may undermine sentiment predictions for texts that mention personal attributes that unbiased human readers would consider neutral. Such discrimination can have great consequences in the applications of sentiment analysis both in the public and private sectors. For example, incorrect inferences in applications like online abuse and opinion analysis in social media platforms can lead to unwanted ramifications, such as wrongful censoring, towards certain populations. In this paper, we address the discrimination against people with disabilities (PWD) by sentiment analysis and toxicity classification models. We provide an examination of sentiment and toxicity analysis models to understand in detail how they discriminate against PWD. We present the Bias Identification Test in Sentiments (BITS), a corpus of 1,126 sentences designed to probe sentiment analysis models for biases in disability. We use this corpus to demonstrate statistically significant biases in four widely used sentiment analysis tools (TextBlob, VADER, Google Cloud Natural Language API and DistilBERT) and two toxicity analysis models trained to predict toxic comments on Jigsaw challenges (Toxic comment classification and Unintended Bias in Toxic comments). The results show that all exhibit strong negative biases on sentences that mention disability. We publicly release the BITS corpus for others to identify potential biases against disability in any sentiment analysis tools and also to update the corpus to be used as a test for other sociodemographic variables as well.

civil rights & constitutional law, disability, natural language, (18 more...)

arXiv.org Artificial Intelligence

2111.13259

Country: North America > United States (0.28)

Genre:

Research Report > Experimental Study (0.69)
Research Report > New Finding (0.67)

Industry:

Health & Medicine > Therapeutic Area > Psychiatry/Psychology (0.68)
Information Technology > Services (0.66)
Law > Civil Rights & Constitutional Law (0.66)
Health & Medicine > Therapeutic Area > Neurology > Autism (0.47)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Extraction (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
