

FFCI: A Framework for Interpretable Automatic Evaluation of Summarization

Journal of Artificial Intelligence Research

In this paper, we propose FFCI, a framework for fine-grained summarization evaluation that comprises four elements: faithfulness (degree of factual consistency with the source), focus (precision of summary content relative to the reference), coverage (recall of summary content relative to the reference), and inter-sentential coherence (document fluency between adjacent sentences). We construct a novel dataset for focus, coverage, and inter-sentential coherence, and develop automatic methods for evaluating each of the four dimensions of FFCI based on cross-comparison of evaluation metrics and model-based evaluation methods, including question answering (QA) approaches, semantic textual similarity (STS), next-sentence prediction (NSP), and scores derived from 19 pre-trained language models. We then apply the developed metrics in evaluating a broad range of summarization models across two datasets, with some surprising findings.
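The focus and coverage dimensions are precision- and recall-style views of the same summary/reference comparison. The sketch below is not FFCI's implementation. It assumes greedy sentence matching, and it uses a hypothetical token-overlap `jaccard` scorer in place of the QA, STS, and language-model scorers the paper actually compares. It only illustrates how a single similarity function yields both directions. Faithfulness and inter-sentential coherence are not covered here.

```python
from typing import Callable, List

Similarity = Callable[[str, str], float]

def jaccard(a: str, b: str) -> float:
    """Hypothetical stand-in for a learned sentence scorer (STS, QA, or LM-based)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)

def focus(summary: List[str], reference: List[str], sim: Similarity = jaccard) -> float:
    """Precision-style: each summary sentence is matched to its best reference sentence."""
    return sum(max(sim(s, r) for r in reference) for s in summary) / len(summary)

def coverage(summary: List[str], reference: List[str], sim: Similarity = jaccard) -> float:
    """Recall-style: each reference sentence is matched to its best summary sentence."""
    return sum(max(sim(r, s) for s in summary) for r in reference) / len(reference)

summary = ["the cat sat"]
reference = ["the cat sat", "the dog ran"]
print(focus(summary, reference), coverage(summary, reference))
```

In this example the one summary sentence appears verbatim in the reference, so focus is maximal. Coverage is lower because the second reference sentence is barely captured.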

Neural Natural Language Generation: A Survey on Multilinguality, Multimodality, Controllability and Learning

Journal of Artificial Intelligence Research

Developing artificial learning systems that can understand and generate natural language has been one of the long-standing goals of artificial intelligence. Recent decades have witnessed impressive progress on both of these problems, giving rise to a new family of approaches. In particular, advances in deep learning over the past couple of years have led to neural approaches to natural language generation (NLG). These methods combine generative language learning techniques with neural-network-based frameworks. With a wide range of applications in natural language processing, neural NLG (NNLG) is a new and fast-growing field of research. In this state-of-the-art report, we investigate the recent developments and applications of NNLG in full from a multidimensional view, covering critical perspectives such as multimodality, multilinguality, controllability and learning strategies. We summarize the fundamental building blocks of NNLG approaches from these perspectives and provide detailed reviews of commonly used preprocessing steps and basic neural architectures. This report also focuses on seminal applications of these NNLG models such as machine translation, description generation, automatic speech recognition, abstractive summarization, text simplification, question answering and generation, and dialogue generation. Finally, we conclude with a thorough discussion of the described frameworks, pointing out some open research directions.

Predicting Decisions in Language Based Persuasion Games

Journal of Artificial Intelligence Research

Sender-receiver interactions, and specifically persuasion games, are widely researched in economic modeling and artificial intelligence, and serve as a solid foundation for powerful applications. However, in the classic persuasion games setting, the messages sent from the expert to the decision-maker are abstract or well-structured application-specific signals rather than natural (human) language messages, although natural language is a very common communication signal in real-world persuasion setups. This paper addresses the use of natural language in persuasion games, exploring its impact on the decisions made by the players and aiming to construct effective models for predicting these decisions. For this purpose, we conduct an online repeated interaction experiment. At each trial of the interaction, an informed expert aims to sell an uninformed decision-maker a vacation in a hotel by sending her a review that describes the hotel. While the expert is exposed to several scored reviews, the decision-maker observes only the single review sent by the expert, and her payoff if she chooses the hotel is a random draw from the review score distribution, which is available only to the expert. The expert's payoff, in turn, depends on the number of times the decision-maker chooses the hotel. We also compare the behavioral patterns in this experiment to the equivalent patterns in similar experiments where the communication is based on the numerical values of the reviews rather than the reviews' text, and observe substantial differences that can be explained through an equilibrium analysis of the game. We consider a number of modeling approaches for our verbal communication setup, differing from each other in the model type (deep neural network (DNN) vs. linear classifier), the type of features used by the model (textual, behavioral or both) and the source of the textual features (DNN-based vs. hand-crafted).
Our results demonstrate that given a prefix of the interaction sequence, our models can predict the future decisions of the decision-maker, particularly when a sequential modeling approach and hand-crafted textual features are applied. Further analysis of the hand-crafted textual features allows us to make initial observations about the aspects of text that drive decision making in our setup.

Get out of the BAG! Silos in AI Ethics Education: Unsupervised Topic Modeling Analysis of Global AI Curricula

Journal of Artificial Intelligence Research

The domain of Artificial Intelligence (AI) ethics is not new, with discussions going back at least 40 years. Teaching the principles and requirements of ethical AI to students is considered an essential part of this domain, and a growing number of technical AI courses at higher-education institutions around the globe now include ethics-related content. Using Latent Dirichlet Allocation (LDA), a generative probabilistic topic model, this study uncovers topics in teaching ethics in AI courses and their trends related to where the courses are taught, by whom, and at what level of cognitive complexity and specificity according to Bloom’s taxonomy. In this exploratory study based on unsupervised machine learning, we analyzed a total of 166 courses: 116 from North American universities, 11 from Asia, 36 from Europe, and 10 from other regions. Based on this analysis, we synthesized a model of teaching approaches, which we call BAG (Build, Assess, and Govern), that combines specific cognitive levels, course content topics, and disciplines affiliated with the department(s) in charge of the course. We critically assess the implications of this teaching paradigm and provide suggestions about how to move away from these practices. We challenge teaching practitioners and program coordinators to reflect on their usual procedures so that they may expand their methodology beyond the confines of stereotypical thought and traditional biases regarding what disciplines should teach and how. This article appears in the AI & Society track.
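To make "generative probabilistic topic model" concrete, here is a minimal LDA sketch. The abstract does not say which inference procedure the study used. This version assumes collapsed Gibbs sampling, one common choice, and the toy "course" documents and hyperparameters (`alpha`, `beta`, `iterations`) are hypothetical. Each token is repeatedly reassigned to a topic in proportion to how much its document uses that topic and how much that topic uses the word.

```python
import random

def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Collapsed Gibbs sampling for LDA.

    docs: list of documents, each a list of tokens.
    Returns per-document topic proportions (theta) and the top 3 words per topic.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics

    doc_topic = [[0] * K for _ in docs]       # tokens in doc d assigned to topic k
    topic_word = [[0] * V for _ in range(K)]  # times word v is assigned to topic k
    topic_total = [0] * K                     # total tokens assigned to topic k
    z = []                                    # current topic of every token

    # Random initial assignment of each token to a topic.
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(K)
            z[d].append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], z[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Full conditional: p(z=t | rest) ∝ (n_dt + alpha)(n_tv + beta) / (n_t + V*beta)
                weights = [
                    (doc_topic[d][t] + alpha) * (topic_word[t][v] + beta)
                    / (topic_total[t] + V * beta)
                    for t in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                z[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    theta = [[(doc_topic[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
             for d, doc in enumerate(docs)]
    top_words = [[vocab[v] for v in sorted(range(V), key=lambda v: -topic_word[k][v])[:3]]
                 for k in range(K)]
    return theta, top_words

# Hypothetical toy "syllabus" documents.
docs = [
    ["bias", "fairness", "audit", "bias"],
    ["fairness", "audit", "accountability"],
    ["neural", "network", "training", "gradient"],
    ["gradient", "training", "network"],
]
theta, top_words = lda_gibbs(docs, num_topics=2)
print(top_words)
```

In a real corpus the documents would be tokenized course descriptions. `theta` is what would let topics be related back to course attributes such as region or Bloom level.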

Underspecification Challenging Machine Learning Modeling - AI Trends


The three little bears strived to get it just right, and AI model builders strive to do the same thing when it comes to specifying their model. Underspecification occurs when the model you build performs well on your data but so do other models, which can lead to your model's performance decaying over time. The discussion of underspecification kicked off last fall when Google researchers published a paper on the subject, "Underspecification Presents Challenges for Credibility in Modern Machine Learning." "ML models often exhibit unexpectedly poor behavior when they are deployed in real-world domains. We identify underspecification as a key reason for these failures," stated the paper, put together by a group of scientists led by author Alexander D'Amour, a research scientist with Google Brain of Cambridge, Mass.
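A toy illustration of the idea, not taken from the Google paper, may help. In the hypothetical training data below, features `x1` and `x2` always agree, so three hand-written models (stand-ins for models trained with, say, different random seeds) are indistinguishable on that data. Once deployment data breaks the `x1`/`x2` correlation, the three models behave very differently.

```python
# Hypothetical training data: x1 and x2 are always equal, and the label equals x1.
train = [((0, 0), 0), ((1, 1), 1), ((0, 0), 0), ((1, 1), 1)]
# Hypothetical deployment data: x2 no longer tracks x1.
deploy = [((0, 1), 0), ((1, 0), 1)]

def model_a(x): return x[0]                    # relies on x1
def model_b(x): return x[1]                    # relies on x2
def model_c(x): return int(x[0] + x[1] >= 1)   # fires if either feature is on

def accuracy(model, data):
    return sum(model(x) == y for x, y in data) / len(data)

for name, m in [("a", model_a), ("b", model_b), ("c", model_c)]:
    print(name, accuracy(m, train), accuracy(m, deploy))
```

All three models score perfectly on the training data, so that data alone cannot tell you which one you have. Only `model_a` survives the shift.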

ZeroGen: Efficient Zero-shot Learning via Dataset Generation

Artificial Intelligence

Interest in dataset generation has grown recently due to the superior generative capacity of large pre-trained language models (PLMs). In this paper, we study a flexible and efficient zero-shot learning method, ZeroGen. Given a zero-shot task, we first generate a dataset from scratch using PLMs in an unsupervised manner. Then, we train a tiny task model (e.g., LSTM) under the supervision of the synthesized dataset. This approach allows highly efficient inference, as the final task model has orders of magnitude fewer parameters than PLMs (e.g., GPT2-XL). Apart from being annotation-free and efficient, we argue that ZeroGen can also provide useful insights from the perspective of data-free model-agnostic knowledge distillation and unreferenced text generation evaluation. Experiments and analysis on different NLP tasks (text classification, question answering, and natural language inference) show the effectiveness of ZeroGen.
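The two-step pipeline can be sketched as follows, with several assumptions. The hypothetical `fake_plm_generate` stands in for sampling a PLM such as GPT2-XL with a label-conditioned prompt, and it cycles through canned outputs so the example is deterministic. The prompt wording is also hypothetical. A multinomial naive Bayes classifier replaces the LSTM as the "tiny task model". The point is only that no human-labeled example is used anywhere.

```python
import math
from collections import Counter

# Hypothetical stand-in for sampling a PLM with a label-conditioned prompt.
# It cycles through canned outputs so the example is deterministic.
def fake_plm_generate(prompt: str, step: int) -> str:
    positive = ["a great movie", "wonderful acting", "truly enjoyable"]
    negative = ["a boring movie", "terrible acting", "truly awful"]
    pool = positive if "positive" in prompt else negative
    return pool[step % len(pool)]

def synthesize_dataset(labels, n_per_label):
    """Step 1: build a labeled dataset from scratch, with no human annotation."""
    data = []
    for label in labels:
        prompt = f'The movie review in {label} sentiment is: "'  # hypothetical prompt
        for step in range(n_per_label):
            data.append((fake_plm_generate(prompt, step), label))
    return data

class TinyNaiveBayes:
    """Step 2: a tiny task model trained only on the synthetic data."""

    def fit(self, data):
        self.labels = sorted({y for _, y in data})
        self.prior = Counter(y for _, y in data)
        self.word_counts = {y: Counter() for y in self.labels}
        for text, y in data:
            self.word_counts[y].update(text.split())
        self.vocab_size = len({w for c in self.word_counts.values() for w in c})
        return self

    def predict(self, text):
        n_docs = sum(self.prior.values())

        def log_score(y):
            total = sum(self.word_counts[y].values())
            score = math.log(self.prior[y] / n_docs)
            for w in text.split():
                # Laplace smoothing keeps unseen words from zeroing out the score.
                score += math.log((self.word_counts[y][w] + 1) / (total + self.vocab_size))
            return score

        return max(self.labels, key=log_score)

data = synthesize_dataset(["positive", "negative"], n_per_label=6)
model = TinyNaiveBayes().fit(data)
print(model.predict("wonderful acting"), model.predict("terrible movie"))
```

Inference cost here depends only on the small model, which is the efficiency argument in the abstract. The large generator is used once, offline, to produce training data.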

Contextual Importance and Utility: a Theoretical Foundation

Artificial Intelligence

This paper provides new theory to support the eXplainable AI (XAI) method Contextual Importance and Utility (CIU). CIU arithmetic is based on the concepts of Multi-Attribute Utility Theory, which gives CIU a solid theoretical foundation. The novel concept of contextual influence is also defined, which makes it possible to compare CIU directly with so-called additive feature attribution (AFA) methods for model-agnostic outcome explanation. One key takeaway is that the "influence" concept used by AFA methods is inadequate for outcome explanation purposes, even for simple models. Experiments with simple models show that contextual importance (CI) and contextual utility (CU) produce explanations in cases where influence-based methods fail. It is also shown that CI and CU guarantee explanation faithfulness towards the explained model.
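For reference, CIU's basic quantities for input i at instance x use the output range [Cmin, Cmax] observed while i varies, with the other inputs held at x. CI = (Cmax − Cmin) / (absmax − absmin) and CU = (y − Cmin) / (Cmax − Cmin). The sketch below estimates Cmin and Cmax with a grid over i's range, which is an assumption, since implementations may sample differently. The example `model`, the input and output ranges, and the neutral fallback for CI = 0 are also hypothetical. Contextual influence is not computed here.

```python
from typing import Callable, List, Sequence, Tuple

def ciu(f: Callable[[Sequence[float]], float], x: Sequence[float], i: int,
        in_min: float, in_max: float, out_min: float, out_max: float,
        steps: int = 101) -> Tuple[float, float]:
    """Contextual importance (CI) and contextual utility (CU) of input i at instance x.

    Cmin/Cmax are estimated by sweeping input i over [in_min, in_max] on a grid
    while the other inputs keep their values from x.
    """
    outputs: List[float] = []
    for s in range(steps):
        z = list(x)
        z[i] = in_min + (in_max - in_min) * s / (steps - 1)
        outputs.append(f(z))
    c_min, c_max = min(outputs), max(outputs)
    ci = (c_max - c_min) / (out_max - out_min)
    if c_max == c_min:
        cu = 0.5  # assumption: CU is undefined when CI = 0, so use a neutral value
    else:
        cu = (f(x) - c_min) / (c_max - c_min)
    return ci, cu

def model(z: Sequence[float]) -> float:
    # Hypothetical linear model with outputs in [0, 1] for inputs in [0, 1].
    return 0.3 * z[0] + 0.7 * z[1]

x = [0.5, 0.5]
for i in range(2):
    print(i, ciu(model, x, i, 0.0, 1.0, 0.0, 1.0))
```

For this linear model, CI recovers each input's share of the output range (about 0.3 and 0.7). CU is about 0.5 for both inputs because x sits mid-range on each.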

The New Intelligence Game


The relevance of the video is that the browser identified the application being used by the IAI as Google Earth. According to the OSC 2006 report, the Arabic-language caption reads "Islamic Army in Iraq/The Military Engineering Unit – Preparations for Rocket Attack", and the video was recorded on 5/1/2006; we provide, in Appendix A, a reproduction of the screenshot made available in the OSC report. According to the same report, before the release of this video demonstrating the use of Google Earth to plan attacks, discussions had already taken place in OSC-monitored online forums on the use of Google Earth as a GEOINT tool for terrorist planning. On August 5, 2005, the user "Al-Illiktrony" posted a message to the Islamic Renewal Organization forum titled "A Gift for the Mujahidin, a Program To Enable You to Watch Cities of the World Via Satellite". In this post, the author dedicated Google Earth to the mujahidin brothers and to Shaykh Muhammad al-Mas'ari. The user "Al-Mushtaq al-Jannah" replied in the forum, warning that Google programs retain complete information about their users. This is a relevant issue, but there are two caveats. First, given the number of Google Earth users, it may be difficult for Google to flag a jihadist using the functionality in time to prevent an attack plan. One possible solution would be for Google to flag computers based on searched websites and locations, for instance computers that visit certain critical sites, but this becomes a problem when landmarks are used. Second, one may not use one's own computer to perform the search, or may even mask the IP address.
On October 3, 2005, as described in the OSC 2006 report, in a reply to a posting by Saddam Al-Arab on the Baghdad al-Rashid forum requesting the identification of a roughly sketched map, "Almuhannad" posted a link to a site that provided a free download of Google Earth, suggesting that the satellite imagery from Google's service could help identify the sketch.

StoryBuddy: A Human-AI Collaborative Chatbot for Parent-Child Interactive Storytelling with Flexible Parental Involvement

Artificial Intelligence

Despite its benefits for children's skill development and parent-child bonding, many parents do not often engage in interactive storytelling (having story-related dialogues with their child) due to limited availability or the challenge of coming up with appropriate questions. While recent advances have made it possible for AI to generate questions from stories, a fully automated approach excludes parent involvement, disregards educational goals, and does not sufficiently optimize for child engagement. Informed by need-finding interviews and participatory design (PD) results, we developed StoryBuddy, an AI-enabled system for parents to create interactive storytelling experiences. StoryBuddy's design highlighted the need to accommodate dynamic user needs, balancing the desire for parent involvement and parent-child bonding against the goal of minimizing parent intervention when parents are busy. The PD revealed parents' varied assessment and educational goals, which StoryBuddy addresses by supporting configurable question types and tracking of child progress. A user study validated StoryBuddy's usability and suggested design insights for future parent-AI collaboration systems.

Ultra-fine Entity Typing with Indirect Supervision from Natural Language Inference

Artificial Intelligence

The task of ultra-fine entity typing (UFET) seeks to predict diverse, free-form words or phrases that describe the appropriate types of entities mentioned in sentences. A key challenge for this task lies in the large number of types and the scarcity of annotated data per type. Existing systems formulate the task as a multi-way classification problem and train directly or distantly supervised classifiers. This causes two issues: (i) the classifiers do not capture type semantics, since types are often converted into indices; (ii) systems developed in this way are limited to predicting within a pre-defined type set, and often fail to generalize to types that are rare or unseen in training. This work presents LITE, a new approach that formulates entity typing as a natural language inference (NLI) problem, making use of (i) indirect supervision from NLI to infer type information meaningfully represented as textual hypotheses and alleviate the data scarcity issue, and (ii) a learning-to-rank objective that avoids pre-defining a type set. Experiments show that, with limited training data, LITE obtains state-of-the-art performance on the UFET task. In addition, LITE demonstrates strong generalizability: it not only yields the best results on other fine-grained entity typing benchmarks but, more importantly, a pre-trained LITE system works well on new data containing unseen types.
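The NLI reformulation can be sketched as follows. The sentence is the premise, each candidate type becomes a textual hypothesis, and types are ranked by entailment score, so any type string can be scored, including unseen ones. This is not LITE's code. The hypothesis template, the word-overlap `toy_entailment_score` (a hypothetical stand-in for a trained NLI model), and the margin value are assumptions. The hinge-style `margin_ranking_loss` is one common form of a learning-to-rank objective.

```python
from typing import List

def hypothesis(entity: str, type_label: str) -> str:
    # Hypothetical template: the type stays natural-language text, not an index.
    return f"{entity} is a {type_label}."

def toy_entailment_score(premise: str, hyp: str) -> float:
    """Hypothetical stand-in for an NLI model's entailment score (word overlap)."""
    p = set(premise.lower().replace(".", "").split())
    h = set(hyp.lower().replace(".", "").split())
    return len(p & h) / len(h)

def margin_ranking_loss(pos_score: float, neg_score: float, margin: float = 0.1) -> float:
    # Penalize unless the gold type outscores a negative type by at least `margin`.
    return max(0.0, margin - pos_score + neg_score)

def rank_types(premise: str, entity: str, candidate_types: List[str]) -> List[str]:
    scored = [(toy_entailment_score(premise, hypothesis(entity, t)), t)
              for t in candidate_types]
    return [t for _, t in sorted(scored, reverse=True)]

premise = "Ada Lovelace was a mathematician and writer."
ranking = rank_types(premise, "Ada Lovelace", ["mathematician", "city", "writer"])
print(ranking)
print(margin_ranking_loss(0.8, 0.6), margin_ranking_loss(0.4, 0.6))
```

During training, the loss would be computed on the NLI model's scores for gold versus sampled negative types. At inference, the ranking needs no fixed label set.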