 Clark, Peter


Dynamic Prompt Learning via Policy Gradient for Semi-structured Mathematical Reasoning

arXiv.org Artificial Intelligence

Mathematical reasoning, a core ability of human intelligence, presents unique challenges for machines in abstract thinking and logical reasoning. Recent large pre-trained language models such as GPT-3 have achieved remarkable progress on mathematical reasoning tasks written in text form, such as math word problems (MWP). However, it is unknown if the models can handle more complex problems that involve math reasoning over heterogeneous information, such as tabular data. To fill the gap, we present Tabular Math Word Problems (TabMWP), a new dataset containing 38,431 open-domain grade-level problems that require mathematical reasoning on both textual and tabular data. Each question in TabMWP is aligned with a tabular context, which is presented as an image, semi-structured text, and a structured table. There are two types of questions: free-text and multi-choice, and each problem is annotated with gold solutions to reveal the multi-step reasoning process. We evaluate different pre-trained models on TabMWP, including the GPT-3 model in a few-shot setting. As earlier studies suggest, few-shot GPT-3 is sensitive to the selection of in-context examples, so its performance is unstable and can degrade to near chance. This instability is more severe when handling complex problems like TabMWP. To mitigate this, we further propose a novel approach, PromptPG, which uses policy gradient to learn to select in-context examples from a small amount of training data and then constructs the corresponding prompt for the test example. Experimental results show that our method outperforms the best baseline by 5.31% in accuracy and significantly reduces the prediction variance compared to random selection, which verifies its effectiveness in selecting in-context examples.
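
As a rough illustration of the selection idea (not the authors' released implementation), the sketch below uses a REINFORCE-style update in PyTorch: a small scorer ranks candidate training examples for a given test problem, k examples are sampled to build the prompt, and the reward reflects whether the downstream model answers correctly. The scorer architecture, sampling with replacement, and the reward_fn hook are simplifying assumptions.

```python
# A REINFORCE-style sketch of learning to select in-context examples, in the
# spirit of PromptPG. The scorer, the reward hook, and sampling k examples with
# replacement are simplifying assumptions, not the authors' released code.
import torch
import torch.nn as nn

class ExampleScorer(nn.Module):
    """Scores each candidate training example for a given test problem."""
    def __init__(self, dim):
        super().__init__()
        self.proj = nn.Linear(dim, dim, bias=False)

    def forward(self, problem_emb, candidate_embs):
        # Dot-product compatibility between the test problem and each candidate.
        return candidate_embs @ self.proj(problem_emb)

def reinforce_step(scorer, optimizer, problem_emb, candidate_embs, k, reward_fn):
    logits = scorer(problem_emb, candidate_embs)
    dist = torch.distributions.Categorical(logits=logits)
    picks = dist.sample((k,))                      # indices of the k in-context examples
    reward = reward_fn(picks.tolist())             # e.g. 1.0 if the LM answers correctly
    loss = -(dist.log_prob(picks).sum() * reward)  # policy-gradient objective
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return reward
```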


Memory-assisted prompt editing to improve GPT-3 after deployment

arXiv.org Artificial Intelligence

Large LMs such as GPT-3 are powerful, but can commit mistakes that are obvious to humans. For example, GPT-3 would mistakenly interpret "What word is similar to good?" to mean a homophone, while the user intended a synonym. Our goal is to effectively correct such errors via user interactions with the system but without retraining, which would be prohibitively costly. We pair GPT-3 with a growing memory of recorded cases where the model misunderstood the user's intent, along with user feedback for clarification. Such a memory allows our system to produce enhanced prompts for any new query, based on user feedback for error correction on similar cases in the past. On four tasks (two lexical tasks, two advanced ethical reasoning tasks), we show how a (simulated) user can interactively teach a deployed GPT-3, substantially increasing its accuracy on queries involving different kinds of misunderstandings. Our approach is a step towards low-cost utility enhancement for very large pre-trained LMs. Code, data, and instructions to implement MEMPROMPT for a new task are available at https://www.memprompt.com/.
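
A minimal sketch of the memory-assisted idea, assuming a simple string-similarity lookup and a plain-text clarification appended to the prompt; the retrieval method, threshold, and prompt template are illustrative assumptions, not the system's actual design.

```python
# Store (query, feedback) pairs after deployment, and for a new query append the
# clarification recorded for the most similar past misunderstanding.
from difflib import SequenceMatcher

memory = []  # growing list of (past_query, user_feedback) pairs

def record_feedback(query, feedback):
    memory.append((query, feedback))

def similarity(a, b):
    return SequenceMatcher(None, a, b).ratio()

def build_prompt(query, threshold=0.6):
    best = max(memory, key=lambda m: similarity(m[0], query), default=None)
    if best and similarity(best[0], query) >= threshold:
        return f"{query}\n(clarification: {best[1]})"
    return query

record_feedback("What word is similar to good?",
                "'similar to' asks for a word with a similar meaning, not a similar sound.")
print(build_prompt("What word is similar to happy?"))
```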


Complexity-Based Prompting for Multi-Step Reasoning

arXiv.org Artificial Intelligence

We study the task of prompting large-scale language models to perform multi-step reasoning. Existing work shows that when prompted with a chain of thoughts (CoT), sequences of short sentences describing intermediate reasoning steps towards a final answer, large language models can generate new reasoning chains and predict answers for new inputs. A central question is which reasoning examples make the most effective prompts. In this work, we propose complexity-based prompting, a simple and effective example selection scheme for multi-step reasoning. We show that prompts with higher reasoning complexity, i.e., chains with more reasoning steps, achieve substantially better performance on multi-step reasoning tasks over strong baselines. We further extend our complexity-based criteria from prompting (selecting inputs) to decoding (selecting outputs), where we sample multiple reasoning chains from the model, then choose the majority of generated answers from complex reasoning chains (over simple chains). When used to prompt GPT-3 and Codex, our approach substantially improves multi-step reasoning accuracy and achieves new state-of-the-art (SOTA) performance on three math benchmarks (GSM8K, MultiArith, and MathQA) and two BigBench Hard tasks (Date Understanding and Penguins), with an average +5.3 and up to +18 accuracy improvements. Compared with existing example selection schemes like manual tuning or retrieval-based selection, selection based on reasoning complexity is intuitive, easy to implement, and annotation-efficient. Further results demonstrate the robustness of performance gains from complex prompts under format perturbation and distribution shift. We consider the problem of prompting large language models for multi-step reasoning. Recent breakthroughs (Wei et al., 2022b; Wang et al., 2022b) show that language models, when large enough (>100B parameters), exhibit the emergent ability (Wei et al., 2022a) of performing complex multi-step reasoning when provided with only a few reasoning examples.
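
Both ideas, exemplar selection by complexity and complexity-based voting at decoding time, can be sketched as below, under the simplifying assumption that a reasoning chain is a newline-separated string so that complexity can be approximated by counting steps; the field names are assumptions.

```python
from collections import Counter

def select_complex_examples(annotated_examples, k):
    """Pick the k exemplars whose reasoning chains contain the most steps."""
    return sorted(annotated_examples, key=lambda ex: ex["chain"].count("\n"),
                  reverse=True)[:k]

def complexity_based_vote(sampled_outputs, top_k):
    """Majority vote over answers, restricted to the most complex sampled chains."""
    most_complex = sorted(sampled_outputs, key=lambda o: o["chain"].count("\n"),
                          reverse=True)[:top_k]
    return Counter(o["answer"] for o in most_complex).most_common(1)[0][0]
```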


Improving scripts with a memory of natural feedback

arXiv.org Artificial Intelligence

How can an end-user provide feedback if a deployed structured prediction model generates incorrect output? Our goal is to allow users to correct errors directly through interaction, without retraining, by giving feedback on the model's output. We create a dynamic memory architecture with a growing memory of feedback about errors in the output. Given a new, unseen input, our model can use feedback from a similar, past erroneous state. On a script generation task, we show empirically that the model learns to apply feedback effectively (up to 30 points improvement), while avoiding similar past mistakes after deployment (up to 10 points improvement on an unseen set). This is a first step towards strengthening deployed models, potentially broadening their utility.
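
A minimal sketch of such a growing feedback memory, keyed by erroneous output states; the data structure and the similarity hook are illustrative assumptions rather than the paper's architecture.

```python
from dataclasses import dataclass, field

@dataclass
class FeedbackMemory:
    entries: list = field(default_factory=list)  # (error_state, feedback) pairs

    def add(self, error_state: str, feedback: str):
        self.entries.append((error_state, feedback))

    def retrieve(self, new_state: str, sim):
        """Return the feedback recorded for the most similar past erroneous state."""
        if not self.entries:
            return None
        return max(self.entries, key=lambda e: sim(e[0], new_state))[1]
```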


DREAM: Uncovering Mental Models behind Language Models

arXiv.org Artificial Intelligence

To what extent do language models (LMs) build "mental models" of a scene when answering situated questions (e.g., questions about a specific ethical dilemma)? While cognitive science has shown that mental models play a fundamental role in human problem-solving, it is unclear whether the high question-answering performance of existing LMs is backed by similar model building - and if not, whether that can explain their well-known catastrophic failures. We observed that Macaw, an existing T5-based LM, when probed, provides somewhat useful but inadequate mental models for situational questions (estimated accuracy=43%, usefulness=21%, consistency=42%). We propose DREAM, a model that takes a situational question as input and produces a mental model elaborating the situation, without any additional task-specific training data for mental models. It inherits its social commonsense through distant supervision from existing NLP resources. Our analysis shows that DREAM can produce significantly better mental models (estimated accuracy=67%, usefulness=37%, consistency=71%) compared to Macaw. Finally, mental models generated by DREAM can be used as additional context for situational QA tasks. This additional context improves the answer accuracy of a Macaw zero-shot model by between +1% and +4% (absolute) on three different datasets.
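
A minimal sketch of the elaborate-then-answer pipeline described above, with both models left as placeholder callables; the kinds of elaboration listed in the comment are indicative only, not the system's exact output schema.

```python
# Generate a mental model of the situation, then feed it back as extra context
# for answering. `elaborator` and `qa_model` are placeholder callables, not the
# released DREAM or Macaw models.
def answer_with_mental_model(question, elaborator, qa_model):
    mental_model = elaborator(question)        # e.g. likely motivations, emotions, consequences
    augmented_input = f"{mental_model}\n{question}"
    return qa_model(augmented_input)
```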


Interscript: A dataset for interactive learning of scripts through error feedback

arXiv.org Artificial Intelligence

How can an end-user provide feedback if a deployed structured prediction model generates inconsistent output, ignoring the structural complexity of human language? This is an emerging topic with recent progress in synthetic or constrained settings, and the next big leap would require testing and tuning models in real-world settings. We present a new dataset, Interscript, containing user feedback on a deployed model that generates complex everyday tasks. Interscript contains 8,466 data points -- the input is a possibly erroneous script and user feedback, and the output is a modified script. We posit two use-cases of Interscript that might significantly advance the state-of-the-art in interactive learning. The dataset is available at: https://github.com/allenai/interscript.
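
An illustrative, hypothetical data point in the shape the abstract describes (a possibly erroneous script, user feedback, and the corrected script); the field names and content below are invented for illustration, not taken from the released data.

```python
example = {
    "input_script": ["get a cup", "pour the coffee", "drink the coffee", "boil water"],
    "feedback": "You need to boil the water before pouring the coffee.",
    "output_script": ["get a cup", "boil water", "pour the coffee", "drink the coffee"],
}
```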


How Much Coffee Was Consumed During EMNLP 2019? Fermi Problems: A New Reasoning Challenge for AI

arXiv.org Artificial Intelligence

Many real-world problems require the combined application of multiple reasoning abilities employing suitable abstractions, commonsense knowledge, and creative synthesis of problem-solving strategies. To help advance AI systems towards such capabilities, we propose a new reasoning challenge, namely Fermi Problems (FPs), which are questions whose answers can only be approximately estimated because their precise computation is either impractical or impossible. For example, "How much would the sea level rise if all ice in the world melted?" FPs are commonly used in quizzes and interviews to bring out and evaluate the creative reasoning abilities of humans. To do the same for AI systems, we present two datasets: 1) a collection of 1k real-world FPs sourced from quizzes and olympiads; and 2) a bank of 10k synthetic FPs of intermediate complexity to serve as a sandbox for the harder real-world challenge. In addition to question-answer pairs, the datasets contain detailed solutions in the form of an executable program and supporting facts, helping in the supervision and evaluation of intermediate steps. We demonstrate that even extensively fine-tuned large-scale language models perform poorly on these datasets, on average making estimates that are off by two orders of magnitude. Our contribution is thus the crystallization of several unsolved AI problems into a single, new challenge that we hope will spur further advances in building systems that can reason.
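
As an illustration of an executable Fermi-style solution for the title question, here is a rough back-of-the-envelope program; all quantities and the program format are loose assumptions for illustration, not values from the dataset.

```python
# A rough, executable Fermi-style estimate of coffee consumed during EMNLP 2019.
attendees = 2000               # rough EMNLP 2019 attendance (assumption)
conference_days = 3            # main conference days
cups_per_person_per_day = 2
ml_per_cup = 250

total_liters = attendees * conference_days * cups_per_person_per_day * ml_per_cup / 1000
print(f"~{total_liters:.0f} liters of coffee")  # on the order of a few thousand liters
```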


Think about it! Improving defeasible reasoning by first modeling the question scenario

arXiv.org Artificial Intelligence

Defeasible reasoning is the mode of reasoning where conclusions can be overturned by taking into account new evidence. Existing cognitive science literature on defeasible reasoning suggests that a person forms a mental model of the problem scenario before answering questions. Our research asks whether neural models can similarly benefit from envisioning the question scenario before answering a defeasible query. Our approach is, given a question, to have a model first create a graph of relevant influences, and then leverage that graph as an additional input when answering the question. Our system, CURIOUS, achieves a new state-of-the-art on three different defeasible reasoning datasets. This result is significant as it illustrates that performance can be improved by guiding a system to "think about" a question and explicitly model the scenario, rather than answering reflexively. Code, data, and pre-trained models are located at https://github.com/madaan/thinkaboutit.
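
A minimal sketch of the model-the-scenario-first pipeline for a defeasible query (premise, hypothesis, update): an influence graph is generated and serialized as extra input to the answer model. Both callables, the graph encoding, and the query format are placeholders, not the released system.

```python
def defeasible_answer(premise, hypothesis, update, graph_generator, answer_model):
    graph = graph_generator(premise, hypothesis, update)   # e.g. list of (source, relation, target)
    graph_text = " ; ".join(f"{s} {r} {t}" for s, r, t in graph)
    query = f"premise: {premise} hypothesis: {hypothesis} update: {update} graph: {graph_text}"
    return answer_model(query)   # e.g. "strengthener" or "weakener"
```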


General-Purpose Question-Answering with Macaw

arXiv.org Artificial Intelligence

Despite the successes of pretrained language models, there are still few high-quality, general-purpose QA systems that are freely available. In response, we present Macaw, a versatile, generative question-answering (QA) system that we are making available to the community. Macaw is built on UnifiedQA, itself built on T5, and exhibits strong zero-shot performance on a wide variety of topics, including outperforming GPT-3 by over 10% (absolute) on Challenge300, a suite of 300 challenge questions, despite being an order of magnitude smaller (11 billion vs. 175 billion parameters). In addition, Macaw allows different permutations ("angles") of its inputs and outputs to be used; for example, Macaw can take a question and produce an answer; or take an answer and produce a question; or take an answer and a question, and produce multiple-choice options. We describe the system and illustrate a variety of question types where it produces surprisingly good answers, well outside the training setup. We also identify question classes where it still appears to struggle, offering insights into the limitations of pretrained language models. Macaw is freely available at https://github.com/allenai/macaw, and we hope that it proves useful to the community.
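
A short sketch of querying Macaw through Hugging Face Transformers using its slot-based "angle" inputs; the checkpoint name and slot syntax follow the project repository, but treat the details as assumptions and consult the README at the link above.

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("allenai/macaw-large")
model = AutoModelForSeq2SeqLM.from_pretrained("allenai/macaw-large")

# Angle: given a question, produce an answer.
input_string = "$answer$ ; $question$ = What is the color of a cloudy sky?"
input_ids = tokenizer.encode(input_string, return_tensors="pt")
output = model.generate(input_ids, max_length=200)
print(tokenizer.batch_decode(output, skip_special_tokens=True))  # e.g. ['$answer$ = gray']
```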


Explaining Answers with Entailment Trees

arXiv.org Artificial Intelligence

Our goal, in the context of open-domain textual question-answering (QA), is to explain answers by not just listing supporting textual evidence ("rationales"), but also showing how such evidence leads to the answer in a systematic way. If this could be done, new opportunities for understanding and debugging the system's reasoning would become possible. Our approach is to generate explanations in the form of entailment trees, namely a tree of entailment steps from facts that are known, through intermediate conclusions, to the final answer. To train a model with this skill, we created ENTAILMENTBANK, the first dataset to contain multistep entailment trees. At each node in the tree (typically) two or more facts compose together to produce a new conclusion. Given a hypothesis (question + answer), we define three increasingly difficult explanation tasks: generate a valid entailment tree given (a) all relevant sentences (the leaves of the gold entailment tree), (b) all relevant and some irrelevant sentences, or (c) a corpus. We show that a strong language model only partially solves these tasks, and identify several new directions to improve performance. This work is significant as it provides a new type of dataset (multistep entailments) and baselines, offering a new avenue for the community to generate richer, more systematic explanations.
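
An illustrative, hand-written entailment tree in the nested form the abstract describes, where each intermediate conclusion is entailed by the facts beneath it; the sentences and structure are invented for illustration and are not drawn from ENTAILMENTBANK.

```python
tree = {
    "hypothesis": "An iron nail conducts electricity.",
    "conclusion": "An iron nail conducts electricity.",
    "premises": [
        {
            "conclusion": "Iron conducts electricity.",
            "premises": ["Iron is a metal.", "Metals conduct electricity."],
        },
        "An iron nail is made of iron.",
    ],
}
```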