Goto

Collaborating Authors

 toolformer




Toolformer: Language Models Can Teach Themselves to Use Tools

Neural Information Processing Systems

Language models (LMs) exhibit remarkable abilities to solve new tasks from just a few examples or textual instructions, especially at scale. They also, paradoxically, struggle with basic functionality, such as arithmetic or factual lookup, where much simpler and smaller specialized models excel. In this paper, we show that LMs can teach themselves to via simple APIs and achieve the best of both worlds. We introduce, a model trained to decide which APIs to call, when to call them, what arguments to pass, and how to best incorporate the results into future token prediction. This is done in a self-supervised way, requiring nothing more than a handful of demonstrations for each API. We incorporate a range of tools, including a calculator, a Q&A system, a search engine, a translation system, and a calendar. Toolformer achieves substantially improved zero-shot performance across a variety of downstream tasks, often competitive with much larger models, without sacrificing its core language modeling abilities.


A API Details

Neural Information Processing Systems

API calls for each position identified in a piece of text. Question Answering We use the Atlas model of Izacard et al. (2022) finetuned on Natural Questions Calculator Our calculator is based on a simple Python script and only supports the operators " It does not return any result for syntactically invalid equations. "=", "equals", "equal to", "total of", "average of" followed by a number, or (iii) contain at least three English text before generating API calls. Below, we list the prompts used to sample API calls for each tool considered. Your task is to add calls to a Question Answering API to a piece of text. Input: Joe Biden was born in Scranton, Pennsylvania. Output: Joe Biden was born in [QA("Where was Joe Biden born?")] Scranton, [QA("In Output: Coca-Cola, or [QA("What other name is Coca-Cola known by?")] Coke, is Your task is to add calls to a Calculator API to a piece of text.



Toolformer: Language Models Can Teach Themselves to Use Tools

Neural Information Processing Systems

Language models (LMs) exhibit remarkable abilities to solve new tasks from just a few examples or textual instructions, especially at scale. They also, paradoxically, struggle with basic functionality, such as arithmetic or factual lookup, where much simpler and smaller specialized models excel. In this paper, we show that LMs can teach themselves to use external tools via simple APIs and achieve the best of both worlds. We introduce Toolformer, a model trained to decide which APIs to call, when to call them, what arguments to pass, and how to best incorporate the results into future token prediction. This is done in a self-supervised way, requiring nothing more than a handful of demonstrations for each API.


Efficient Tool Use with Chain-of-Abstraction Reasoning

Gao, Silin, Dwivedi-Yu, Jane, Yu, Ping, Tan, Xiaoqing Ellen, Pasunuru, Ramakanth, Golovneva, Olga, Sinha, Koustuv, Celikyilmaz, Asli, Bosselut, Antoine, Wang, Tianlu

arXiv.org Artificial Intelligence

To achieve faithful reasoning that aligns with human expectations, large language models (LLMs) need to ground their reasoning to real-world knowledge (e.g., web facts, math and physical rules). Tools help LLMs access this external knowledge, but there remains challenges for fine-tuning LLM agents (e.g., Toolformer) to invoke tools in multi-step reasoning problems, where inter-connected tool calls require holistic and efficient tool usage planning. In this work, we propose a new method for LLMs to better leverage tools in multi-step reasoning. Our method, Chain-of-Abstraction (CoA), trains LLMs to first decode reasoning chains with abstract placeholders, and then call domain tools to reify each reasoning chain by filling in specific knowledge. This planning with abstract chains enables LLMs to learn more general reasoning strategies, which are robust to shifts of domain knowledge (e.g., math results) relevant to different reasoning questions. It also allows LLMs to perform decoding and calling of external tools in parallel, which avoids the inference delay caused by waiting for tool responses. In mathematical reasoning and Wiki QA domains, we show that our method consistently outperforms previous chain-of-thought and tool-augmented baselines on both in-distribution and out-of-distribution test sets, with an average ~6% absolute QA accuracy improvement. LLM agents trained with our method also show more efficient tool use, with inference speed being on average ~1.4x faster than baseline tool-augmented LLMs.


Frugal LMs Trained to Invoke Symbolic Solvers Achieve Parameter-Efficient Arithmetic Reasoning

Dutta, Subhabrata, Singh, Joykirat, Pandey, Ishan, Manchanda, Sunny, Chakrabarti, Soumen, Chakraborty, Tanmoy

arXiv.org Artificial Intelligence

Large Language Models (LLM) exhibit zero-shot mathematical reasoning capacity as a behavior emergent with scale, commonly manifesting as chain-of-thoughts (CoT) reasoning. However, multiple empirical findings suggest that this prowess is exclusive to LLMs with exorbitant sizes (beyond 50 billion parameters). Meanwhile, educational neuroscientists suggest that symbolic algebraic manipulation be introduced around the same time as arithmetic word problems to modularize language-to-formulation, symbolic manipulation of the formulation, and endgame arithmetic. In this paper, we start with the hypothesis that much smaller LMs, which are weak at multi-step reasoning, can achieve reasonable arithmetic reasoning if arithmetic word problems are posed as a formalize-then-solve task. In our architecture, which we call SYRELM, the LM serves the role of a translator to map natural language arithmetic questions into a formal language (FL) description. A symbolic solver then evaluates the FL expression to obtain the answer. A small frozen LM, equipped with an efficient low-rank adapter, is capable of generating FL expressions that incorporate natural language descriptions of the arithmetic problem (e.g., variable names and their purposes, formal expressions combining variables, etc.). We adopt policy-gradient reinforcement learning to train the adapted LM, informed by the non-differentiable symbolic solver. This marks a sharp departure from the recent development in tool-augmented LLMs, in which the external tools (e.g., calculator, Web search, etc.) are essentially detached from the learning phase of the LM. SYRELM shows massive improvements (e.g., +30.65 absolute point improvement in accuracy on the SVAMP dataset using GPT-J 6B model) over base LMs, while keeping our testbed easy to diagnose, interpret and within reach of most researchers.


AI capabilities can be significantly improved without expensive retraining

Davidson, Tom, Denain, Jean-Stanislas, Villalobos, Pablo, Bas, Guillem

arXiv.org Artificial Intelligence

State-of-the-art AI systems can be significantly improved without expensive retraining via "post-training enhancements"-techniques applied after initial training like fine-tuning the system to use a web browser. We review recent post-training enhancements, categorizing them into five types: tool-use, prompting methods, scaffolding, solution selection, and data generation. Different enhancements improve performance on different tasks, making it hard to compare their significance. So we translate improvements from different enhancements into a common currency, the compute-equivalent gain: how much additional training compute would be needed to improve performance by the same amount as the enhancement. Our non-experimental work shows that post-training enhancements have significant benefits: most surveyed enhancements improve benchmark performance by more than a 5x increase in training compute, some by more than 20x. Post-training enhancements are relatively cheap to develop: fine-tuning costs are typically <1% of the original training cost. Governing the development of capable post-training enhancements may be challenging because frontier models could be enhanced by a wide range of actors.


👾 Your guide to AI: March 2023

#artificialintelligence

Welcome to the latest issue of your guide to AI, an editorialized newsletter covering key developments in AI research, industry, geopolitics and startups during February 2023. We wrote an op-ed for Sifted on how generative AI will change the software landscape and commented for TIME's cover story on ChatGPT. On the politics side, we reviewed and recommended spinout policy reform in Tony Blair Institute for Global Change's paper A New National Purpose and were included in Politico's 20 people who matter in UK technology. Air Street was featured in Insider's list of top AI investors See some of you at London.AI on Thurs 9 March w/DeepMind, Adept, Palantir and Basecamp Research. Register for our one-day RAAIS conference on research and applied AI 23 June 2023 in London. We'll be hosting speakers from Meta AI, Cruise, Intercom, Genentech, Northvolt and more to come! FYI, you might have to read this issue in full online vs. in your inbox. As usual, we love hearing what you're up to and what's on your mind, just hit reply or forward to your friends:-) Building large-scale AI models requires enormous computing power, which has emerged as the soft power of our time.