AITopics | Field, Anjalie

Collaborating Authors

Field, Anjalie

Evaluating Differentially Private Synthetic Data Generation in High-Stakes Domains

Ramesh, Krithika, Gandhi, Nupoor, Madaan, Pulkit, Bauer, Lisa, Peris, Charith, Field, Anjalie

arXiv.org Artificial Intelligence, Oct-10-2024

The difficulty of anonymizing text data hinders the development and deployment of NLP in high-stakes domains that involve private data, such as healthcare and social services. Poorly anonymized sensitive data cannot be easily shared with annotators or external researchers, nor can it be used to train public models. In this work, we explore the feasibility of using synthetic data generated from differentially private language models in place of real data to facilitate the development of NLP in these domains without compromising privacy. In contrast to prior work, we generate synthetic data for real high-stakes domains, and we propose and conduct use-inspired evaluations to assess data quality. Our results show that prior simplistic evaluations have failed to highlight utility, privacy, and fairness issues in the synthetic data. Overall, our work underscores the need for further improvements to synthetic data generation for it to be a viable way to enable privacy-preserving data sharing.

computational linguistic, data mining, machine learning, (19 more...)

2410.08327

Country:

North America > United States (1.00)
Europe (0.67)

Genre: Research Report > New Finding (0.68)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine > Health Care Providers & Services (1.00)
Government (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Data Science > Data Mining (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Locating Information Gaps and Narrative Inconsistencies Across Languages: A Case Study of LGBT People Portrayals on Wikipedia

Samir, Farhan, Park, Chan Young, Field, Anjalie, Shwartz, Vered, Tsvetkov, Yulia

arXiv.org Artificial Intelligence, Oct-5-2024

To explain social phenomena and identify systematic biases, much research in computational social science focuses on comparative text analyses. These studies often rely on coarse corpus-level statistics or local word-level analyses, mainly in English. We introduce the InfoGap method -- an efficient and reliable approach to locating information gaps and inconsistencies in articles at the fact level, across languages. We evaluate InfoGap by analyzing LGBT people's portrayals across 2.7K biography pages on English, Russian, and French Wikipedias. We find large discrepancies in factual coverage across the languages. Moreover, our analysis reveals that biographical facts carrying negative connotations are more likely to be highlighted in Russian Wikipedia. Crucially, InfoGap both facilitates large-scale analyses and pinpoints local document- and fact-level information gaps, laying a new foundation for targeted and nuanced comparative language analysis at scale.

artificial intelligence, machine learning, natural language, (19 more...)

2410.04282

Country:

Europe (1.00)
Asia (0.68)
North America > United States (0.28)
North America > Canada (0.28)

Genre: Research Report > New Finding (0.68)

Industry: Government (0.67)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.66)

Designing an Evaluation Framework for Large Language Models in Astronomy Research

Wu, John F., Hyk, Alina, McCormick, Kiera, Ye, Christine, Astarita, Simone, Baral, Elina, Ciuca, Jo, Cranney, Jesse, Field, Anjalie, Iyer, Kartheik, Koehn, Philipp, Kotler, Jenn, Kruk, Sandor, Ntampaka, Michelle, O'Neill, Charles, Peek, Joshua E. G., Sharma, Sanjib, Yunus, Mikaeel

arXiv.org Artificial Intelligence, May-30-2024

Large Language Models (LLMs) are shifting how scientific research is done. It is imperative to understand how researchers interact with these models and how scientific sub-communities like astronomy might benefit from them. However, there is currently no standard for evaluating the use of LLMs in astronomy. Therefore, we present the experimental design for an evaluation study on how astronomy researchers interact with LLMs. We deploy a Slack chatbot that can answer queries from users via Retrieval-Augmented Generation (RAG); these responses are grounded in astronomy papers from arXiv. We record and anonymize user questions and chatbot answers, user upvotes and downvotes to LLM responses, user feedback to the LLM, and retrieved documents and similarity scores with the query. Our data collection method will enable future dynamic evaluations of LLM tools for astronomy.
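
The abstract mentions logging retrieved documents together with their similarity scores against the query. As a rough illustration of what such a retrieval step produces, here is a minimal sketch that ranks documents by cosine similarity over bag-of-words counts. This is an assumption made for illustration only: the abstract does not say which embedding model or similarity measure the chatbot uses, and the names `bag_of_words`, `cosine_similarity`, and `retrieve` are hypothetical.

```python
import math
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lowercase, whitespace-tokenized term counts (a toy stand-in for an embedding)."""
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query: str, documents: list[str], k: int = 2) -> list[tuple[float, str]]:
    """Return the k documents most similar to the query as (score, document) pairs."""
    query_vec = bag_of_words(query)
    scored = [(cosine_similarity(query_vec, bag_of_words(doc)), doc) for doc in documents]
    # Sort by score only, highest first; Python's sort is stable, so ties keep input order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Hypothetical document snippets, not taken from the study.
    docs = [
        "galaxy star formation rates",
        "stellar mass of dwarf galaxy",
        "police body camera footage",
    ]
    for score, doc in retrieve("galaxy star formation", docs):
        print(f"{score:.3f}  {doc}")
```

In a real RAG system the (score, document) pairs would come from a dense embedding index; the logged structure, a ranked list of documents with scores, would look similar.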

large language model, machine learning, natural language, (19 more...)

2405.20389

Country:

North America > United States > Maryland (0.14)
North America > United States > Oregon (0.14)

Genre: Research Report > New Finding (0.94)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Riveter: Measuring Power and Social Dynamics Between Entities

Antoniak, Maria, Field, Anjalie, Mun, Jimin, Walsh, Melanie, Klein, Lauren F., Sap, Maarten

arXiv.org Artificial Intelligence, Dec-15-2023

Riveter provides a complete, easy-to-use pipeline for analyzing verb connotations associated with entities in text corpora. We prepopulate the package with connotation frames of sentiment, power, and agency, which have demonstrated usefulness for capturing social phenomena, such as gender bias, in a broad range of corpora. For decades, lexical frameworks have been foundational tools in computational social science, digital humanities, and natural language processing, facilitating multifaceted analysis of text corpora. But working with verb-centric lexica specifically requires natural language processing skills, reducing their accessibility to other researchers. By organizing the language processing pipeline, providing complete lexicon scores and visualizations for all entities in a corpus, and providing functionality for users to target specific research questions, Riveter greatly improves the accessibility of verb lexica and can facilitate a broad range of future research.
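
To make the idea of verb-lexicon scoring concrete, the sketch below shows one simple way entity scores could be aggregated from a verb lexicon. It is not Riveter's API: the function `score_entities`, the toy lexicon values, and the assumption that (subject, verb, object) triples have already been extracted by a parser are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical toy lexicon: verb -> (score for the agent, score for the theme).
# Real connotation-frame lexica are far larger; these values are made up.
TOY_POWER_LEXICON = {
    "commands": (1, -1),
    "obeys": (-1, 1),
}


def score_entities(triples, lexicon):
    """Average lexicon scores per entity over (subject, verb, object) triples.

    Triples whose verb is missing from the lexicon are skipped.
    """
    totals = defaultdict(int)
    counts = defaultdict(int)
    for subject, verb, obj in triples:
        if verb not in lexicon:
            continue
        agent_score, theme_score = lexicon[verb]
        totals[subject] += agent_score
        counts[subject] += 1
        totals[obj] += theme_score
        counts[obj] += 1
    return {entity: totals[entity] / counts[entity] for entity in totals}


if __name__ == "__main__":
    # Assumed to come from an upstream parsing and coreference step.
    triples = [
        ("officer", "commands", "driver"),
        ("driver", "obeys", "officer"),
        ("driver", "greets", "officer"),  # verb not in the toy lexicon, skipped
    ]
    print(score_entities(triples, TOY_POWER_LEXICON))
```

The step that is hard in practice, and that the abstract says the package organizes for users, is the language processing that produces the triples in the first place.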

artificial intelligence, machine learning, natural language, (20 more...)

doi: 10.18653/v1/2023.acl-demo.36

2312.09536

Country:

Europe (1.00)
North America > United States > Washington > King County > Seattle (0.14)
North America > United States > New York (0.14)
(3 more...)

Genre: Research Report > Experimental Study (0.88)

Industry:

Media > News (0.68)
Health & Medicine (0.68)

Technology: Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)

Developing Speech Processing Pipelines for Police Accountability

Field, Anjalie, Verma, Prateek, San, Nay, Eberhardt, Jennifer L., Jurafsky, Dan

arXiv.org Artificial Intelligence, Jun-9-2023

Police body-worn cameras have the potential to improve accountability and transparency in policing. Yet in practice, they result in millions of hours of footage that is never reviewed. We investigate the potential of large pre-trained speech models for facilitating reviews, focusing on ASR and officer speech detection in footage from traffic stops. Our proposed pipeline includes training data alignment and filtering, fine-tuning with resource constraints, and combining officer speech detection with ASR for a fully automated approach. We find that (1) fine-tuning strongly improves ASR performance on officer speech (WER=12-13%), (2) ASR on officer speech is much more accurate than on community member speech (WER=43.55-49.07%), (3) domain-specific tasks like officer speech detection and diarization remain challenging. Our work offers practical applications for reviewing body camera footage and general guidance for adapting pre-trained speech models to noisy multi-speaker domains.
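
The results above are reported as word error rate (WER). For readers unfamiliar with the metric, here is a minimal sketch of the standard definition: the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and the ASR output, divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenization are assumptions for illustration; the paper's own scoring and text normalization may differ.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i] + [0] * len(hyp_words)
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref_words)


if __name__ == "__main__":
    # Hypothetical transcripts, not drawn from the study's data.
    reference = "please step out of the car"
    hypothesis = "please step out the car"
    print(f"WER = {word_error_rate(reference, hypothesis):.2%}")
```

Because insertions count as errors, WER can exceed 100% when the hypothesis is much longer than the reference.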

artificial intelligence, machine learning, speech, (14 more...)

2306.06086

Country:

North America > United States (0.28)
Europe (0.28)

Genre: Research Report (0.64)

Industry:

Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
Law (0.94)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Examining risks of racial biases in NLP tools for child protective services

Field, Anjalie, Coston, Amanda, Gandhi, Nupoor, Chouldechova, Alexandra, Putnam-Hornstein, Emily, Steier, David, Tsvetkov, Yulia

arXiv.org Artificial Intelligence, May-30-2023

Although much literature has established the presence of demographic bias in natural language processing (NLP) models, most work relies on curated bias metrics that may not be reflective of real-world applications. At the same time, practitioners are increasingly using algorithmic tools in high-stakes settings, with particular recent interest in NLP. In this work, we focus on one such setting: child protective services (CPS). CPS workers often write copious free-form text notes about families they are working with, and CPS agencies are actively seeking to deploy NLP models to leverage these data. Given well-established racial bias in this setting, we investigate possible ways deployed NLP is liable to increase racial disparities. We specifically examine word statistics within notes and algorithmic fairness in risk prediction, coreference resolution, and named entity recognition (NER). We document consistent algorithmic unfairness in NER models, possible algorithmic unfairness in coreference resolution models, and little evidence of exacerbated racial bias in risk prediction. While there is existing pronounced criticism of risk prediction, our results expose previously undocumented risks of racial bias in realistic information extraction systems, highlighting potential concerns in deploying them, even though they may appear more benign. Our work serves as a rare realistic examination of NLP algorithmic fairness in a potential deployed setting and a timely investigation of a specific risk associated with deploying NLP in CPS settings.

computational linguistic, machine learning, natural language, (20 more...)

doi: 10.1145/3593013.3594094

2305.19409

Country:

North America > United States > New York (0.14)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)
North America > United States > Louisiana (0.14)
(4 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Law > Family Law (1.00)
Government > Social Services (1.00)
Health & Medicine > Therapeutic Area > Psychiatry/Psychology (0.93)
Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Mention Annotations Alone Enable Efficient Domain Adaptation for Coreference Resolution

Gandhi, Nupoor, Field, Anjalie, Strubell, Emma

arXiv.org Artificial Intelligence, May-30-2023

Although recent neural models for coreference resolution have led to substantial improvements on benchmark datasets, transferring these models to new target domains containing out-of-vocabulary spans and requiring differing annotation schemes remains challenging. Typical approaches involve continued training on annotated target-domain data, but obtaining annotations is costly and time-consuming. We show that annotating mentions alone is nearly twice as fast as annotating full coreference chains. Accordingly, we propose a method for efficiently adapting coreference models, which includes a high-precision mention detection objective and requires annotating only mentions in the target domain. Extensive evaluation across three English coreference datasets: CoNLL-2012 (news/conversation), i2b2/VA (medical notes), and previously unstudied child welfare notes, reveals that our approach facilitates annotation-efficient transfer and results in a 7-14% improvement in average F1 without increasing annotator time.

annotation, machine learning, natural language, (17 more...)

2210.07602

Country:

Europe (1.00)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre:

Research Report > New Finding (0.93)
Research Report > Experimental Study (0.68)

Industry:

Health & Medicine > Health Care Technology > Medical Record (0.46)
Education > Social Development & Welfare (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Gendered Mental Health Stigma in Masked Language Models

Lin, Inna Wanyin, Njoo, Lucille, Field, Anjalie, Sharma, Ashish, Reinecke, Katharina, Althoff, Tim, Tsvetkov, Yulia

arXiv.org Artificial Intelligence, Apr-11-2023

Mental health stigma prevents many individuals from receiving the appropriate care, and social psychology studies have shown that mental health tends to be overlooked in men. In this work, we investigate gendered mental health stigma in masked language models. In doing so, we operationalize mental health stigma by developing a framework grounded in psychology research: we use clinical psychology literature to curate prompts, then evaluate the models' propensity to generate gendered words. We find that masked language models capture societal stigma about gender in mental health: models are consistently more likely to predict female subjects than male in sentences about having a mental health condition (32% vs. 19%), and this disparity is exacerbated for sentences that indicate treatment-seeking behavior. Furthermore, we find that different models capture dimensions of stigma differently for men and women, associating stereotypes like anger, blame, and pity more with women with mental health conditions than with men. In showing the complex nuances of models' gendered mental health stigma, we demonstrate that context and overlapping dimensions of identity are important considerations when assessing computational models' social biases.

artificial intelligence, natural language, stigma, (14 more...)

2210.15144

Country: North America > United States > Minnesota (0.28)

Genre:

Research Report > New Finding (0.95)
Research Report > Experimental Study (0.70)

Industry: Health & Medicine > Therapeutic Area > Psychiatry/Psychology (1.00)

Technology: Information Technology > Artificial Intelligence > Natural Language (1.00)
