
Collaborating Authors

Forbes, Maxwell


Edited Media Understanding Frames: Reasoning About the Intent and Implications of Visual Misinformation

arXiv.org Artificial Intelligence

Multimodal disinformation, from 'deepfakes' to simple edits that deceive, is an important societal problem. Yet at the same time, the vast majority of media edits are harmless, such as a filtered vacation photo. The difference between this example and harmful edits that spread disinformation is one of intent. Recognizing and describing this intent is a major challenge for today's AI systems. We present the task of Edited Media Understanding, which requires models to answer open-ended questions that capture the intent and implications of an image edit. We introduce a dataset for our task, EMU, with 48k question-answer pairs written in rich natural language. We evaluate a wide variety of vision-and-language models for our task, and introduce a new model, PELICAN, which builds upon recent progress in pretrained multimodal representations. Our model obtains promising results on our dataset, with humans rating its answers as accurate 40.35% of the time. At the same time, there is still much work to be done: humans prefer human-annotated captions 93.56% of the time, and we provide analysis that highlights areas for further progress.
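
To make the task format concrete, here is a minimal Python sketch of what an EMU-style instance might look like; the field names and example values are illustrative assumptions, not the dataset's actual schema.

```python
from dataclasses import dataclass

@dataclass
class EmuExample:
    """A hypothetical EMU-style instance (illustrative fields only)."""
    source_image: str  # path to the unedited image
    edited_image: str  # path to the edited image
    question: str      # open-ended question about the edit
    answer: str        # free-text answer about the edit's intent/implications

example = EmuExample(
    source_image="original.jpg",
    edited_image="edited.jpg",
    question="Why might someone have made this edit?",
    answer="To make the subject appear to endorse a product they never used.",
)
print(f"{example.question} -> {example.answer}")
```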


Can Machines Learn Morality? The Delphi Experiment

arXiv.org Artificial Intelligence

As AI systems become increasingly powerful and pervasive, there are growing concerns about machines' morality, or lack thereof. Yet teaching morality to machines is a formidable task: morality remains one of the most intensely debated questions for humanity itself, let alone for AI. Existing AI systems deployed to millions of users, however, are already making decisions loaded with moral implications, which poses a seemingly impossible challenge: teaching machines moral sense while humanity continues to grapple with it. To explore this challenge, we introduce Delphi, an experimental framework based on deep neural networks trained directly to reason about descriptive ethical judgments, e.g., "helping a friend" is generally good, while "helping a friend spread fake news" is not. Empirical results shed new light on the promises and limits of machine ethics: Delphi demonstrates strong generalization in the face of novel ethical situations, while off-the-shelf neural network models exhibit markedly poor judgment, including unjust biases, confirming the need to explicitly teach machines moral sense. Yet Delphi is not perfect, exhibiting susceptibility to pervasive biases and inconsistencies. Despite that, we demonstrate positive use cases of an imperfect Delphi, including using it as a component within other imperfect AI systems. Importantly, we interpret the operationalization of Delphi in light of prominent ethical theories, which leads us to important future research questions.
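
To illustrate the query pattern Delphi embodies, the sketch below feeds a free-text situation to a generic text-to-text model via the Hugging Face `transformers` library. The `t5-small` checkpoint is only a stand-in: without Delphi's training it will not produce meaningful ethical judgments.

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# "t5-small" stands in for a Delphi-style checkpoint; its raw output here
# is not a trained moral judgment.
tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

situation = "helping a friend spread fake news"
inputs = tokenizer(situation, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=8)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```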


Moral Stories: Situated Reasoning about Norms, Intents, Actions, and their Consequences

arXiv.org Artificial Intelligence

In social settings, much of human behavior is governed by unspoken rules of conduct. For artificial systems to be fully integrated into social environments, adherence to such norms is a central prerequisite. We investigate whether contemporary NLG models can function as behavioral priors for systems deployed in social settings by generating action hypotheses that achieve predefined goals under moral constraints. Moreover, we examine whether models can anticipate the likely consequences of (im)moral actions, or explain why certain actions are preferable by generating relevant norms. For this purpose, we introduce 'Moral Stories', a crowd-sourced dataset of structured, branching narratives for the study of grounded, goal-oriented social reasoning. Finally, we propose decoding strategies that effectively combine multiple expert models, e.g., through abductive reasoning, to significantly improve the quality of generated actions, consequences, and norms over strong baselines.
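
The expert-combination idea can be sketched as simple candidate reranking: score each candidate action under several "experts" and keep the best. The scoring functions below are stubs standing in for real model likelihoods, not the paper's actual decoding strategy.

```python
def fluency_score(action: str) -> float:
    # Stub: stands in for a language model's log-probability.
    return -0.01 * len(action)

def norm_score(action: str) -> float:
    # Stub: stands in for a norm-adherence expert (e.g. a classifier).
    return 1.0 if "ask" in action else 0.0

def rerank(candidates: list[str]) -> str:
    # Weighted sum of log-scores, i.e. a product of experts in probability
    # space; the weight 2.0 is an arbitrary illustration.
    return max(candidates, key=lambda a: fluency_score(a) + 2.0 * norm_score(a))

actions = ["take the money quietly", "ask the owner before borrowing it"]
print(rerank(actions))  # -> "ask the owner before borrowing it"
```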


Social Chemistry 101: Learning to Reason about Social and Moral Norms

arXiv.org Artificial Intelligence

Social norms, the unspoken commonsense rules about acceptable social behavior, are crucial for understanding the underlying causes and intents of people's actions in narratives. For example, underlying an action such as "wanting to call cops on my neighbors" are social norms that inform our conduct, such as "It is expected that you report crimes." We present Social Chemistry, a new conceptual formalism for studying people's everyday social norms and moral judgments over a rich spectrum of real-life situations described in natural language. We introduce Social-Chem-101, a large-scale corpus that catalogs 292k rules-of-thumb, such as "it is rude to run a blender at 5am," as its basic conceptual units. Each rule-of-thumb is further broken down along 12 dimensions of people's judgments, including social judgments of good and bad, moral foundations, expected cultural pressure, and assumed legality, which together amount to over 4.5 million annotations of categorical labels and free-text descriptions. Comprehensive empirical results based on state-of-the-art neural models demonstrate that computational modeling of social norms is a promising research direction. Our model framework, Neural Norm Transformer, learns from and generalizes Social-Chem-101 to successfully reason about previously unseen situations, generating relevant (and potentially novel) attribute-aware social rules-of-thumb.
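
As a rough sketch of how such a record might be represented, the dataclass below paraphrases a few of the 12 annotation dimensions; the field names and values are assumptions, not the corpus's actual schema.

```python
from dataclasses import dataclass

@dataclass
class RuleOfThumb:
    """A hypothetical Social-Chem-101-style record (illustrative fields only)."""
    situation: str          # real-life situation the rule was written for
    rot: str                # the rule-of-thumb itself
    social_judgment: str    # e.g. "bad"
    cultural_pressure: str  # e.g. "strong pressure against"
    legality: str           # e.g. "legal"

rot = RuleOfThumb(
    situation="running a blender at 5am",
    rot="It is rude to run a blender at 5am.",
    social_judgment="bad",
    cultural_pressure="strong pressure against",
    legality="legal",
)
print(rot.rot)
```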


Programming by Demonstration with Situated Semantic Parsing

AAAI Conferences

Programming by Demonstration (PbD) is an approach to programming robots by demonstrating the desired behavior. Speech is a natural, hands-free way to augment demonstrations with control commands that guide the PbD process. However, existing speech interfaces for PbD systems rely on ad hoc, predefined command sets that are rigid and require user training. Instead, we aim to develop flexible speech interfaces that accommodate user variation and ambiguous utterances. To that end, we propose using a situated semantic parser that jointly reasons about the user's speech and the robot's state to resolve ambiguities. In this paper, we describe this approach and compare its utility to that of a rigid speech-command interface.
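
The joint-reasoning idea can be sketched as ranking candidate parses by a parser score while filtering out parses that are infeasible in the robot's current state. Everything below is an illustrative assumption, not the paper's parser.

```python
def parser_score(utterance: str, parse: str) -> float:
    # Stub: stands in for a semantic parser's confidence in a parse.
    return 1.0 if parse.split("(")[0] in utterance else 0.1

def feasible(parse: str, state: dict) -> bool:
    # Situated check: "open(gripper)" only makes sense if the gripper is closed.
    if parse == "open(gripper)":
        return state["gripper"] == "closed"
    if parse == "close(gripper)":
        return state["gripper"] == "open"
    return True

def resolve(utterance: str, candidates: list[str], state: dict) -> str:
    viable = [p for p in candidates if feasible(p, state)] or candidates
    return max(viable, key=lambda p: parser_score(utterance, p))

state = {"gripper": "open"}
print(resolve("now do it", ["open(gripper)", "close(gripper)"], state))
# -> "close(gripper)": the ambiguous utterance is resolved by the robot's state.
```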


Robot Programming by Demonstration with Crowdsourced Action Fixes

AAAI Conferences

Programming by Demonstration (PbD) can allow end-users to teach robots new actions simply by demonstrating them. However, learning generalizable actions requires more demonstrations than end-users can reasonably be expected to provide. In this paper, we explore the idea of using crowdsourcing to collect action demonstrations from the crowd. We propose a PbD framework in which the end-user provides an initial seed demonstration, and the robot then searches for scenarios in which the action will not work and asks the crowd to fix the action for those scenarios. We use instance-based learning with a simple yet powerful action representation that allows an intuitive visualization of the action; crowd workers directly interact with these visualizations to fix them. We demonstrate the utility of our approach with a user study involving local crowd workers (N=31), and we analyze the collected data and the impact of alternative design parameters to inform a real-world deployment of our system.
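
A minimal sketch of the instance-based flavor of this approach, under the assumption that scenarios can be reduced to simple feature vectors: store the crowd-fixed (scenario, action) pairs and retrieve the nearest one for a new scenario. The representation is illustrative, not the paper's.

```python
import math

# Crowd-fixed demonstrations: (object_x, object_y) -> action that worked there.
demos = {
    (0.30, 0.10): "grasp_from_side",
    (0.30, 0.60): "grasp_from_top",
}

def nearest_action(scenario: tuple[float, float]) -> str:
    # Instance-based learning in its simplest form: 1-nearest-neighbor lookup.
    key = min(demos, key=lambda d: math.dist(d, scenario))
    return demos[key]

print(nearest_action((0.28, 0.55)))  # -> "grasp_from_top"
```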