Daumé, Hal III
The Impact of Explanations on Fairness in Human-AI Decision-Making: Protected vs Proxy Features
Goyal, Navita, Baumler, Connor, Nguyen, Tin, Daumé, Hal III
Research in XAI aims to improve fairness in human-AI decision-making by providing insights into model predictions, thereby allowing humans to understand and correct for model biases. On the other hand, in the context of human-AI decision-making, previous work has noted that humans often over-rely on AI predictions, and explanations can exacerbate this concern [9]. This is especially troubling if the underlying model contains systematic biases, which may go unnoticed even when the model is teamed with a human. For the human-AI team to be successful, the human needs to be able to determine when to rely on or override potentially biased AI predictions. Previous work has shown that explanations can help human-AI teams alleviate model biases when those biases depend directly on protected attributes [18, 54], but little is known about the very common case in which protected attributes are not explicitly included and the features used for prediction instead contain proxies for them (e.g., zip code for race, length of credit history for age, and university attended for gender). In particular, it may be difficult for humans to identify and resolve biased model predictions based on the proxy features present in real-world data, even when explanations are provided. In this work, we study whether explanations can help people identify model biases and calibrate their reliance on an AI model based on these biases. We extend this line of investigation beyond direct biases revealed through the use of protected (i.e., sensitive) features by considering the effect of explanations when indirect bias is revealed through proxy features.
ASL Citizen: A Community-Sourced Dataset for Advancing Isolated Sign Language Recognition
Desai, Aashaka, Berger, Lauren, Minakov, Fyodor O., Milan, Vanessa, Singh, Chinmay, Pumphrey, Kriston, Ladner, Richard E., Daumé, Hal III, Lu, Alex X., Caselli, Naomi, Bragg, Danielle
Sign languages are used as a primary language by approximately 70 million D/deaf people worldwide. However, most communication technologies operate in spoken and written languages, creating inequities in access. To help tackle this problem, we release ASL Citizen, the first crowdsourced Isolated Sign Language Recognition (ISLR) dataset, collected with consent and containing 83,399 videos for 2,731 distinct signs filmed by 52 signers in a variety of environments. We propose that this dataset be used for sign language dictionary retrieval for American Sign Language (ASL), where a user demonstrates a sign to their webcam to retrieve matching signs from a dictionary. We show that training supervised machine learning classifiers with our dataset advances the state of the art on metrics relevant for dictionary retrieval, achieving 63% accuracy and a recall-at-10 of 91%, evaluated entirely on videos of users who are not present in the training or validation sets.
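The dictionary-retrieval metrics cited above (top-1 accuracy and recall-at-10) are straightforward to compute from a matrix of similarity scores between query videos and dictionary signs. The sketch below is a minimal illustration under that assumed setup; the function name and array layout are ours, not code from the ASL Citizen release.

```python
import numpy as np

def retrieval_metrics(scores: np.ndarray, gold: np.ndarray, k: int = 10):
    """Top-1 accuracy and recall-at-k for dictionary retrieval.

    scores: (num_queries, num_signs) similarity of each query video to each
            dictionary sign (higher = more similar).
    gold:   (num_queries,) index of the correct sign for each query.
    """
    ranking = np.argsort(-scores, axis=1)                    # best sign first
    gold_rank = np.argmax(ranking == gold[:, None], axis=1)  # rank of the correct sign
    accuracy = float(np.mean(gold_rank == 0))
    recall_at_k = float(np.mean(gold_rank < k))
    return accuracy, recall_at_k

# Toy usage: 3 query videos scored against a 5-sign dictionary.
scores = np.random.rand(3, 5)
gold = np.array([0, 2, 4])
print(retrieval_metrics(scores, gold, k=3))
```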
Evaluating the Social Impact of Generative AI Systems in Systems and Society
Solaiman, Irene, Talat, Zeerak, Agnew, William, Ahmad, Lama, Baker, Dylan, Blodgett, Su Lin, Daumé, Hal III, Dodge, Jesse, Evans, Ellie, Hooker, Sara, Jernite, Yacine, Luccioni, Alexandra Sasha, Lusoli, Alberto, Mitchell, Margaret, Newman, Jessica, Png, Marie-Therese, Strait, Andrew, Vassilev, Apostol
Generative AI systems across modalities, spanning text, image, audio, and video, have broad social impacts, but there exists no official standard for how to evaluate those impacts or for which impacts should be evaluated. We move toward a standard approach for evaluating a generative AI system in any modality, in two overarching categories: what can be evaluated in a base system that has no predetermined application and what can be evaluated in society. We describe specific social impact categories and how to approach and conduct evaluations in the base technical system, then in people and society. Our framework for a base system defines seven categories of social impact: bias, stereotypes, and representational harms; cultural values and sensitive content; disparate performance; privacy and data protection; financial costs; environmental costs; and data and content moderation labor costs. Suggested methods for evaluation apply to all modalities, and analyses of the limitations of existing evaluations serve as a starting point for necessary investment in future evaluations. We offer five overarching categories for what can be evaluated in society, each with its own subcategories: trustworthiness and autonomy; inequality, marginalization, and violence; concentration of authority; labor and creativity; and ecosystem and environment. Each subcategory includes recommendations for mitigating harm. We are concurrently crafting an evaluation repository for the AI research community to contribute existing evaluations along the given categories. This version will be updated following a CRAFT session at ACM FAccT 2023.
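As a reference point for the planned evaluation repository, the seven base-system categories listed above can be collected into a small registry keyed by category. This is only an illustrative sketch; the registry API and field names are assumptions, not an artifact released with the paper.

```python
from collections import defaultdict

# The paper's seven base-system social-impact categories.
BASE_SYSTEM_CATEGORIES = [
    "bias, stereotypes, and representational harms",
    "cultural values and sensitive content",
    "disparate performance",
    "privacy and data protection",
    "financial costs",
    "environmental costs",
    "data and content moderation labor costs",
]

_registry = defaultdict(list)

def register_evaluation(category: str, name: str, reference: str) -> None:
    """File an existing evaluation under one of the base-system categories."""
    if category not in BASE_SYSTEM_CATEGORIES:
        raise ValueError(f"Unknown category: {category}")
    _registry[category].append({"name": name, "reference": reference})

# Hypothetical entry:
register_evaluation("disparate performance", "subgroup accuracy gap", "placeholder-reference")
```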
Define, Evaluate, and Improve Task-Oriented Cognitive Capabilities for Instruction Generation Models
Zhao, Lingjun, Nguyen, Khanh, Daumé, Hal III
Recent work studies the cognitive capabilities of language models through psychological tests designed for humans. While these studies are helpful for understanding the general capabilities of these models, there is no guarantee that a model possessing sufficient capabilities to pass those tests would actually use those capabilities in performing real-life tasks. In this work, we formulate task-oriented cognitive capabilities, which are human-like cognitive capabilities that language models leverage to perform tasks. These capabilities are (i) the ability to quickly generate good candidate utterances (the search capability) and (ii) the ability to predict how a listener interprets those utterances and to choose the most appropriate one (the pragmatic capability). We design an evaluation scheme for comparing these capabilities of a language model with those of a human. Applying this scheme to examine various models in a navigation instruction generation problem, we find that their pragmatic capability is severely lacking. This insight leads us to augment them with better models of the listener and obtain a significant boost of 11% in success rate in guiding real humans. Our work advocates for having a principled procedure for aligning language models with humans that involves (i) formulating task-oriented capabilities, (ii) devising a method to quantify their deficiency, and (iii) iteratively improving them.
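The two capabilities have a natural operational reading: a speaker model proposes candidate instructions (search), and a listener model reranks them by how likely it is to recover the intended route (pragmatics). The sketch below illustrates that pipeline with stand-in speaker and listener functions; it is an assumption-laden illustration of the idea, not the authors' implementation.

```python
from typing import Callable, List

def pragmatic_generate(
    propose: Callable[[str], List[str]],          # search: context -> candidate instructions
    listener_score: Callable[[str, str], float],  # pragmatics: how well a listener recovers the route
    context: str,
) -> str:
    """Generate candidates with the speaker, then pick the one a listener
    model is most likely to interpret as the intended route."""
    candidates = propose(context)
    return max(candidates, key=lambda utterance: listener_score(utterance, context))

# Toy usage with placeholder functions (both are hypothetical stand-ins).
propose = lambda ctx: ["turn left at the lamp", "go left", "walk"]
listener_score = lambda utt, ctx: len(utt) / 100.0  # pretend longer = clearer
print(pragmatic_generate(propose, listener_score, context="hallway"))
```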
It Takes Two to Tango: Navigating Conceptualizations of NLP Tasks and Measurements of Performance
Subramonian, Arjun, Yuan, Xingdi, Daumé, Hal III, Blodgett, Su Lin
Progress in NLP is increasingly measured through benchmarks; hence, contextualizing progress requires understanding when and why practitioners may disagree about the validity of benchmarks. We develop a taxonomy of disagreement, drawing on tools from measurement modeling, and distinguish between two types of disagreement: 1) how tasks are conceptualized and 2) how measurements of model performance are operationalized. To provide evidence for our taxonomy, we conduct a meta-analysis of relevant literature to understand how NLP tasks are conceptualized, as well as a survey of practitioners about their impressions of different factors that affect benchmark validity. Our meta-analysis and survey across eight tasks, ranging from coreference resolution to question answering, uncover that tasks are generally not clearly and consistently conceptualized and that benchmarks suffer from operationalization disagreements. These findings support our proposed taxonomy of disagreement. Finally, based on our taxonomy, we present a framework for constructing benchmarks and documenting their limitations.
How do Authors' Perceptions of their Papers Compare with Co-authors' Perceptions and Peer-review Decisions?
Rastogi, Charvi, Stelmakh, Ivan, Beygelzimer, Alina, Dauphin, Yann N., Liang, Percy, Vaughan, Jennifer Wortman, Xue, Zhenyu, Daumé, Hal III, Pierson, Emma, Shah, Nihar B.
How do author perceptions match up to the outcomes of the peer-review process and the perceptions of others? In a top-tier computer science conference (NeurIPS 2021) with more than 23,000 submitting authors and 9,000 submitted papers, we survey the authors on three questions: (i) their predicted probability of acceptance for each of their papers, (ii) their perceived ranking of their own papers based on scientific contribution, and (iii) the change in their perception about their own papers after seeing the reviews. The salient results are: (1) Authors have roughly a three-fold overestimate of the acceptance probability of their papers: the median prediction is 70% for an approximately 25% acceptance rate. (2) Female authors exhibit a marginally higher (statistically significant) miscalibration than male authors; predictions of authors invited to serve as meta-reviewers or reviewers are similarly calibrated to one another, but better calibrated than those of authors who were not invited to review. (3) Authors' relative rankings of the scientific contribution of two of their own submissions generally agree (93%) with their predicted acceptance probabilities, but in a notable 7% of responses authors think their better paper will face a worse outcome. (4) The author-provided rankings disagreed with the peer-review decisions about a third of the time; when co-authors ranked their jointly authored papers, co-authors disagreed at a similar rate -- about a third of the time. (5) For both accepted and rejected papers, at least 30% of respondents said that their perception of their own paper improved after the review process. The stakeholders in peer review should take these findings into account when setting their expectations of peer review.
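The three-fold overestimate in finding (1) is simply the ratio of the median predicted acceptance probability to the realized acceptance rate. A minimal sketch of that arithmetic on hypothetical survey responses (the numbers below are made up, not the NeurIPS 2021 data):

```python
import numpy as np

# Hypothetical predicted acceptance probabilities from surveyed authors.
predicted = np.array([0.9, 0.7, 0.6, 0.8, 0.5, 0.7, 0.75])
acceptance_rate = 0.25  # overall conference acceptance rate

median_prediction = float(np.median(predicted))            # the survey's median was 0.70
overestimate_factor = median_prediction / acceptance_rate  # roughly 3x in the paper
print(median_prediction, overestimate_factor)
```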
Learning When and What to Ask: a Hierarchical Reinforcement Learning Framework
Nguyen, Khanh, Bisk, Yonatan, Daumé, Hal III
Reliable AI agents should be mindful of the limits of their knowledge and consult humans when sensing that they do not have sufficient knowledge to make sound decisions. We formulate a hierarchical reinforcement learning framework for learning to decide when to request additional information from humans and what type of information would be helpful to request. Our framework extends partially observed Markov decision processes (POMDPs) by allowing an agent to interact with an assistant and leverage the assistant's knowledge in accomplishing tasks. Results on a simulated human-assisted navigation problem demonstrate the effectiveness of our framework: aided by an interaction policy learned with our method, a navigation policy achieves up to a 7x improvement in task success rate compared to performing tasks on its own. The interaction policy is also efficient: on average, only a quarter of all actions taken during a task execution are requests for information. We analyze the benefits and challenges of learning with a hierarchical policy structure and suggest directions for future work.
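At a high level, the framework pairs a low-level task policy with a high-level interaction policy that decides, at each step, whether to act or to request a particular type of information from the assistant. The schematic episode loop below uses toy stand-ins for the environment, assistant, and both policies; it is a hedged sketch of the control flow, not the paper's implementation.

```python
import random

class ToyEnv:
    """Minimal stand-in environment: the agent must reach position 3."""
    def reset(self):
        self.pos = 0
        return [self.pos]
    def step(self, action):
        self.pos += action
        return [self.pos], self.pos >= 3

class ToyAssistant:
    """Stand-in assistant that answers a 'direction' query."""
    def answer(self, request, obs):
        return +1 if request == "direction" else 0

def run_episode(env, task_policy, interaction_policy, assistant, max_steps=20):
    """High-level interaction policy decides when to ask (and what kind of
    information to ask for); the low-level task policy chooses environment
    actions, using any answers folded into the observation."""
    obs = env.reset()
    for _ in range(max_steps):
        request = interaction_policy(obs)                 # None, or a query type
        if request is not None:
            obs = obs + [assistant.answer(request, obs)]  # augment observation
        else:
            obs, done = env.step(task_policy(obs))
            if done:
                return True
    return False

# Toy policies: ask for a direction hint whenever the observation lacks one,
# then move in the hinted direction.
interaction_policy = lambda obs: "direction" if len(obs) == 1 else None
task_policy = lambda obs: obs[-1] if len(obs) > 1 else random.choice([-1, 1])
print(run_episode(ToyEnv(), task_policy, interaction_policy, ToyAssistant()))
```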
A Human-Centered Interpretability Framework Based on Weight of Evidence
Alvarez-Melis, David, Kaur, Harmanpreet, Daumé, Hal III, Wallach, Hanna, Vaughan, Jennifer Wortman
In this paper, we take a human-centered approach to interpretable machine learning. First, drawing inspiration from the study of explanation in philosophy, cognitive science, and the social sciences, we propose a list of design principles for machine-generated explanations that are meaningful to humans. Using the concept of weight of evidence from information theory, we develop a method for producing explanations that adhere to these principles. We show that this method can be adapted to handle high-dimensional, multi-class settings, yielding a flexible meta-algorithm for generating explanations. We demonstrate that these explanations can be estimated accurately from finite samples and are robust to small perturbations of the inputs. We also evaluate our method through a qualitative user study with machine learning practitioners, where we observe that the resulting explanations are usable despite some participants struggling with background concepts like prior class probabilities. We conclude by surfacing design implications for interpretability tools.
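The quantity at the heart of the method is the classical weight of evidence: woe(h : e) = log [P(e | h) / P(e | not h)], which equals the change in the log-odds of hypothesis h after observing evidence e. Below is a minimal numeric sketch of that definition; the paper's contribution is the high-dimensional, multi-class meta-algorithm built on top of it, which this snippet does not reproduce.

```python
import math

def weight_of_evidence(p_e_given_h: float, p_e_given_not_h: float) -> float:
    """woe(h : e) = log [ P(e | h) / P(e | not h) ].

    Positive values mean the evidence speaks in favor of hypothesis h,
    negative values against it; the same quantity is the change in the
    log-odds of h after observing e.
    """
    return math.log(p_e_given_h) - math.log(p_e_given_not_h)

# Toy example: evidence that is four times as likely under h than under not-h.
print(weight_of_evidence(0.8, 0.2))  # ~1.386 nats in favor of h
```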
On the Potential of Lexico-logical Alignments for Semantic Parsing to SQL Queries
Shi, Tianze, Zhao, Chen, Boyd-Graber, Jordan, Daumé, Hal III, Lee, Lillian
Large-scale semantic parsing datasets annotated with logical forms have enabled major advances in supervised approaches. But can richer supervision help even more? To explore the utility of fine-grained, lexical-level supervision, we introduce Squall, a dataset that enriches 11,276 WikiTableQuestions English-language questions with manually created SQL equivalents plus alignments between SQL and question fragments. Our annotation enables new training possibilities for encoder-decoder models, including approaches from machine translation previously precluded by the absence of alignments. We propose and test two methods: (1) supervised attention; (2) adopting an auxiliary objective of disambiguating references in the input queries to table columns. In 5-fold cross-validation, these strategies improve over strong baselines by 4.4% in execution accuracy. Oracle experiments suggest that annotated alignments can support further accuracy gains of up to 23.9%.
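The first strategy, supervised attention, amounts to adding an auxiliary loss that pushes the decoder's attention distribution toward the annotated question-SQL alignments. The PyTorch-style sketch below shows one way such a term could look; the tensor shapes and function name are our assumptions, not code from the Squall repository.

```python
import torch

def supervised_attention_loss(attn_weights: torch.Tensor,
                              gold_alignment: torch.Tensor,
                              eps: float = 1e-8) -> torch.Tensor:
    """Auxiliary loss encouraging model attention to match annotated alignments.

    attn_weights:   (batch, tgt_len, src_len) attention over question tokens
                    for each generated SQL token.
    gold_alignment: (batch, tgt_len, src_len) 0/1 annotated alignments; SQL
                    tokens with no annotation have all-zero rows and are skipped.
    """
    row_sums = gold_alignment.sum(dim=-1, keepdim=True)
    has_gold = (row_sums.squeeze(-1) > 0).float()               # (batch, tgt_len)
    gold_dist = gold_alignment / row_sums.clamp(min=1.0)        # normalize to distributions
    ce = -(gold_dist * (attn_weights + eps).log()).sum(dim=-1)  # cross-entropy per SQL token
    return (ce * has_gold).sum() / has_gold.sum().clamp(min=1.0)

# Toy usage: 1 example, 2 SQL tokens, 3 question tokens.
attn = torch.tensor([[[0.7, 0.2, 0.1], [0.1, 0.1, 0.8]]])
gold = torch.tensor([[[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]])
print(supervised_attention_loss(attn, gold))
```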
Active Imitation Learning from Multiple Non-Deterministic Teachers: Formulation, Challenges, and Algorithms
Nguyen, Khanh, Daumé, Hal III
We formulate the problem of learning to imitate multiple, non-deterministic teachers with minimal interaction cost. Rather than learning a specific policy as in standard imitation learning, the goal in this problem is to learn a distribution over a policy space. We first present a general framework that efficiently models and estimates such a distribution by learning continuous representations of the teacher policies. Next, we develop Active Performance-Based Imitation Learning (APIL), an active learning algorithm for reducing the learner-teacher interaction cost in this framework. By making query decisions based on predictions of future progress, our algorithm avoids the pitfalls of traditional uncertainty-based approaches in the face of teacher behavioral uncertainty. Results on both toy and photo-realistic navigation tasks show that APIL significantly reduces the number of interactions with teachers without compromising on performance. Moreover, it is robust to various degrees of teacher behavioral uncertainty.
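The core of the query rule, asking a teacher when predicted future progress is low rather than when model uncertainty is high, can be sketched as follows. The progress predictor and threshold here are illustrative placeholders and do not reproduce APIL as published.

```python
from typing import Callable, Sequence

def should_query_teacher(
    predict_progress: Callable[[Sequence[float]], float],  # learned predictor of future task progress
    state: Sequence[float],
    progress_threshold: float = 0.5,
) -> bool:
    """Query a teacher only when predicted future progress is low.

    Unlike uncertainty-based active learning, this rule does not mistake the
    teachers' behavioral (non-deterministic) variability for the learner's
    own inability to make progress.
    """
    return predict_progress(state) < progress_threshold

# Toy usage with a hypothetical stand-in progress predictor.
predict_progress = lambda s: sum(s) / (len(s) or 1)
print(should_query_teacher(predict_progress, [0.2, 0.3]))  # True: low predicted progress
```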