AITopics | toolformer

toolformer

d842425e4bf79ba039352da0f658a906-Supplemental-Conference.pdf

Neural Information Processing Systems | Feb-17-2026, 09:49:07 GMT

artificial intelligence, machine learning, natural language, (21 more...)

Neural Information Processing Systems

Country:

Africa > Ghana (0.05)
North America > United States > Pennsylvania > Lackawanna County > Scranton (0.04)
Asia > China > Jiangsu Province > Nanjing (0.04)
Europe > United Kingdom (0.04)

Genre: Personal (0.54)

Industry:

Leisure & Entertainment (0.68)
Health & Medicine (0.68)
Government > Regional Government > North America Government > United States Government (0.47)
Media > Music (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.51)

d842425e4bf79ba039352da0f658a906-Paper-Conference.pdf

Neural Information Processing Systems | Feb-17-2026, 09:49:03 GMT

large language model, machine learning, natural language, (20 more...)

Neural Information Processing Systems

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > Washington > King County > Seattle (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
(8 more...)

Technology:

Information Technology > Information Management (0.94)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.73)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.73)
(2 more...)

Toolformer: Language Models Can Teach Themselves to Use Tools

Neural Information Processing Systems | Dec-26-2025, 22:17:58 GMT

Language models (LMs) exhibit remarkable abilities to solve new tasks from just a few examples or textual instructions, especially at scale. They also, paradoxically, struggle with basic functionality, such as arithmetic or factual lookup, where much simpler and smaller specialized models excel. In this paper, we show that LMs can teach themselves to use external tools via simple APIs and achieve the best of both worlds. We introduce Toolformer, a model trained to decide which APIs to call, when to call them, what arguments to pass, and how to best incorporate the results into future token prediction. This is done in a self-supervised way, requiring nothing more than a handful of demonstrations for each API. We incorporate a range of tools, including a calculator, a Q&A system, a search engine, a translation system, and a calendar. Toolformer achieves substantially improved zero-shot performance across a variety of downstream tasks, often competitive with much larger models, without sacrificing its core language modeling abilities.
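As a rough illustration of the last step the abstract describes, incorporating API results into the text, the sketch below executes inline calls written in the bracketed `[QA("...")]` style quoted in the API Details entry on this page. The `execute_calls` helper, the `tools` registry, the stand-in QA function, and the `->` result separator are all assumptions made for this example, not the paper's implementation. Deciding where to place calls, which is the part Toolformer learns, is not shown.

```python
import re
from typing import Callable, Dict

# Matches inline calls of the form [Name("argument")].
CALL_PATTERN = re.compile(r'\[(\w+)\("([^"]*)"\)\]')


def execute_calls(text: str, tools: Dict[str, Callable[[str], str]]) -> str:
    """Run every recognised inline call and write its result next to the call."""

    def run(match: re.Match) -> str:
        name, argument = match.group(1), match.group(2)
        tool = tools.get(name)
        if tool is None:
            # Unknown tool: leave the call exactly as written.
            return match.group(0)
        # Assumed output format: the call followed by "->" and the result.
        return f'[{name}("{argument}") -> {tool(argument)}]'

    return CALL_PATTERN.sub(run, text)


# Hypothetical stand-in for a question answering system.
def fake_qa(question: str) -> str:
    return "Scranton"


annotated = execute_calls(
    'Joe Biden was born in [QA("Where was Joe Biden born?")] Scranton.',
    {"QA": fake_qa},
)
print(annotated)
```

A registry keyed by tool name keeps the sketch open to the other tools the abstract lists (calculator, search engine, translation system, calendar) without changing the parsing code.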

language model, name change, toolformer, (2 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Natural Language (1.00)

A API Details

Neural Information Processing Systems | Oct-9-2025, 08:52:42 GMT

[…] API calls for each position identified in a piece of text.

Question Answering: We use the Atlas model of Izacard et al. (2022) finetuned on Natural Questions […]

Calculator: Our calculator is based on a simple Python script and only supports the operators […] It does not return any result for syntactically invalid equations. […] "=", "equals", "equal to", "total of", "average of" followed by a number, or (iii) contain at least three […] English text before generating API calls.

Below, we list the prompts used to sample API calls for each tool considered.

Your task is to add calls to a Question Answering API to a piece of text. […]
Input: Joe Biden was born in Scranton, Pennsylvania.
Output: Joe Biden was born in [QA("Where was Joe Biden born?")] Scranton, [QA("In […]
[…] Output: Coca-Cola, or [QA("What other name is Coca-Cola known by?")] Coke, is […]

Your task is to add calls to a Calculator API to a piece of text. […]
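The calculator behaviour described in the excerpt can be sketched as follows. The excerpt's operator list did not survive extraction, so this sketch assumes the four basic arithmetic operators. The function names are hypothetical, and returning `None` stands in for "does not return any result".

```python
import ast
import operator
from typing import Optional

# Assumption: only +, -, * and / are supported (the source list is missing).
_OPERATORS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def _evaluate(node: ast.AST) -> float:
    """Evaluate a parsed expression made of numbers and the allowed operators."""
    if isinstance(node, ast.Constant) and type(node.value) in (int, float):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _OPERATORS:
        return _OPERATORS[type(node.op)](_evaluate(node.left), _evaluate(node.right))
    # Anything else (names, function calls, unary minus, **, ...) is rejected.
    raise ValueError("unsupported expression")


def calculator(expression: str) -> Optional[float]:
    """Return the value of the expression, or None if it cannot be evaluated."""
    try:
        tree = ast.parse(expression, mode="eval")
        return _evaluate(tree.body)
    except (SyntaxError, ValueError, ZeroDivisionError):
        return None


print(calculator("2 + 3 * 4"))
print(calculator("2 +"))
```

Parsing with `ast` rather than calling `eval` means arbitrary Python in the model's output is never executed. That design choice belongs to this sketch, not to the source.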

artificial intelligence, machine learning, natural language, (21 more...)

Neural Information Processing Systems

Country:

North America > United States > Pennsylvania > Lackawanna County > Scranton (0.24)
Africa > Ghana (0.05)
Asia > China > Jiangsu Province > Nanjing (0.04)
Europe > United Kingdom (0.04)

Genre: Personal (0.54)

Industry:

Government > Regional Government > North America Government > United States Government (1.00)
Leisure & Entertainment (0.68)
Health & Medicine (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.51)

Toolformer: Language Models Can Teach Themselves to Use Tools

Neural Information Processing Systems | Oct-9-2025, 08:52:38 GMT

A simple way to overcome the limitations of today's language models is to give them the ability to

large language model, machine learning, natural language, (20 more...)

Neural Information Processing Systems

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > Washington > King County > Seattle (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
(8 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.73)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.73)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.32)

Toolformer: Language Models Can Teach Themselves to Use Tools

Neural Information Processing Systems | Jan-19-2025, 23:40:09 GMT

Language models (LMs) exhibit remarkable abilities to solve new tasks from just a few examples or textual instructions, especially at scale. They also, paradoxically, struggle with basic functionality, such as arithmetic or factual lookup, where much simpler and smaller specialized models excel. In this paper, we show that LMs can teach themselves to use external tools via simple APIs and achieve the best of both worlds. We introduce Toolformer, a model trained to decide which APIs to call, when to call them, what arguments to pass, and how to best incorporate the results into future token prediction. This is done in a self-supervised way, requiring nothing more than a handful of demonstrations for each API.

language model, toolformer, use tool

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Natural Language (1.00)

Efficient Tool Use with Chain-of-Abstraction Reasoning

Gao, Silin, Dwivedi-Yu, Jane, Yu, Ping, Tan, Xiaoqing Ellen, Pasunuru, Ramakanth, Golovneva, Olga, Sinha, Koustuv, Celikyilmaz, Asli, Bosselut, Antoine, Wang, Tianlu

arXiv.org Artificial Intelligence | Jan-30-2024

To achieve faithful reasoning that aligns with human expectations, large language models (LLMs) need to ground their reasoning in real-world knowledge (e.g., web facts, math and physical rules). Tools help LLMs access this external knowledge, but there remain challenges for fine-tuning LLM agents (e.g., Toolformer) to invoke tools in multi-step reasoning problems, where inter-connected tool calls require holistic and efficient tool usage planning. In this work, we propose a new method for LLMs to better leverage tools in multi-step reasoning. Our method, Chain-of-Abstraction (CoA), trains LLMs to first decode reasoning chains with abstract placeholders, and then call domain tools to reify each reasoning chain by filling in specific knowledge. This planning with abstract chains enables LLMs to learn more general reasoning strategies, which are robust to shifts of domain knowledge (e.g., math results) relevant to different reasoning questions. It also allows LLMs to perform decoding and calling of external tools in parallel, which avoids the inference delay caused by waiting for tool responses. In mathematical reasoning and Wiki QA domains, we show that our method consistently outperforms previous chain-of-thought and tool-augmented baselines on both in-distribution and out-of-distribution test sets, with an average ~6% absolute QA accuracy improvement. LLM agents trained with our method also show more efficient tool use, with inference speed being on average ~1.4x faster than baseline tool-augmented LLMs.
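The two-stage idea in the abstract, decoding a chain with abstract placeholders and then filling them in with a tool, can be sketched as below. The bracketed step format `[a op b = y1]`, the placeholder names, and the `reify` function are hypothetical choices for this example, and a small arithmetic routine stands in for the domain tool. The abstract does not specify the actual chain syntax, and the parallel decoding it mentions is not modelled here.

```python
import operator
import re

# Hypothetical step format: [<operand> <op> <operand> = <placeholder>]
STEP = re.compile(r"\[(\w+(?:\.\d+)?) ([+\-*/]) (\w+(?:\.\d+)?) = (\w+)\]")

# Stand-in "domain tool": basic arithmetic.
OPERATIONS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}


def reify(chain: str) -> str:
    """Compute each placeholder in order, then substitute the values into the chain."""
    values = {}

    def resolve(token: str) -> float:
        # An operand is either an earlier placeholder or a literal number.
        return values[token] if token in values else float(token)

    for left, op, right, name in STEP.findall(chain):
        values[name] = OPERATIONS[op](resolve(left), resolve(right))

    if not values:
        return chain

    names = "|".join(re.escape(name) for name in values)
    placeholder = re.compile(r"\b(" + names + r")\b")
    return placeholder.sub(lambda m: f"{values[m.group(1)]:g}", chain)


abstract_chain = (
    "Tom has [20 + 35 = y1] apples, and doubling gives [y1 * 2 = y2]. "
    "The answer is y2."
)
print(reify(abstract_chain))
```

Because the abstract chain contains no concrete results, the same chain could be reused for different operands, which is the robustness property the abstract points to.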

reasoning, reasoning chain, toolformer, (14 more...)

arXiv.org Artificial Intelligence

2401.17464

Country:

North America > United States > New York > New York County > Manhattan (0.04)
Asia > Sri Lanka (0.04)
Asia > Maldives (0.04)
Asia > India (0.04)

Genre: Research Report (0.82)

Industry:

Media > Film (1.00)
Leisure & Entertainment > Sports > Tennis (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)

Frugal LMs Trained to Invoke Symbolic Solvers Achieve Parameter-Efficient Arithmetic Reasoning

Dutta, Subhabrata, Singh, Joykirat, Pandey, Ishan, Manchanda, Sunny, Chakrabarti, Soumen, Chakraborty, Tanmoy

arXiv.org Artificial Intelligence | Dec-19-2023

Large Language Models (LLMs) exhibit zero-shot mathematical reasoning capacity as a behavior emergent with scale, commonly manifesting as chain-of-thought (CoT) reasoning. However, multiple empirical findings suggest that this prowess is exclusive to LLMs with exorbitant sizes (beyond 50 billion parameters). Meanwhile, educational neuroscientists suggest that symbolic algebraic manipulation be introduced around the same time as arithmetic word problems to modularize language-to-formulation, symbolic manipulation of the formulation, and endgame arithmetic. In this paper, we start with the hypothesis that much smaller LMs, which are weak at multi-step reasoning, can achieve reasonable arithmetic reasoning if arithmetic word problems are posed as a formalize-then-solve task. In our architecture, which we call SYRELM, the LM serves the role of a translator to map natural language arithmetic questions into a formal language (FL) description. A symbolic solver then evaluates the FL expression to obtain the answer. A small frozen LM, equipped with an efficient low-rank adapter, is capable of generating FL expressions that incorporate natural language descriptions of the arithmetic problem (e.g., variable names and their purposes, formal expressions combining variables, etc.). We adopt policy-gradient reinforcement learning to train the adapted LM, informed by the non-differentiable symbolic solver. This marks a sharp departure from the recent development in tool-augmented LLMs, in which the external tools (e.g., calculator, Web search, etc.) are essentially detached from the learning phase of the LM. SYRELM shows massive improvements (e.g., +30.65 absolute point improvement in accuracy on the SVAMP dataset using the GPT-J 6B model) over base LMs, while keeping our testbed easy to diagnose, interpret and within reach of most researchers.

dataset, gpt-j, reasoning, (16 more...)

arXiv.org Artificial Intelligence

2312.05571

Country: Asia > India > NCT > Delhi (0.04)

Genre: Research Report (0.84)

Industry:

Education (1.00)
Health & Medicine (0.87)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

AI capabilities can be significantly improved without expensive retraining

Davidson, Tom, Denain, Jean-Stanislas, Villalobos, Pablo, Bas, Guillem

arXiv.org Artificial Intelligence | Dec-12-2023

State-of-the-art AI systems can be significantly improved without expensive retraining via "post-training enhancements": techniques applied after initial training, such as fine-tuning the system to use a web browser. We review recent post-training enhancements, categorizing them into five types: tool-use, prompting methods, scaffolding, solution selection, and data generation. Different enhancements improve performance on different tasks, making it hard to compare their significance. So we translate improvements from different enhancements into a common currency, the compute-equivalent gain: how much additional training compute would be needed to improve performance by the same amount as the enhancement. Our non-experimental work shows that post-training enhancements have significant benefits: most surveyed enhancements improve benchmark performance by more than a 5x increase in training compute, some by more than 20x. Post-training enhancements are relatively cheap to develop: fine-tuning costs are typically <1% of the original training cost. Governing the development of capable post-training enhancements may be challenging because frontier models could be enhanced by a wide range of actors.
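To make the compute-equivalent gain concrete, the sketch below computes it under one simplifying assumption that is not taken from the abstract: that benchmark score grows linearly in the logarithm of training compute, by a fixed number of points per 10x increase. The function name, its parameters, and the example numbers are all hypothetical; the paper's own estimation procedure may differ.

```python
def compute_equivalent_gain(
    baseline_score: float,
    enhanced_score: float,
    points_per_10x_compute: float,
) -> float:
    """Return the compute multiplier that would match the enhancement's score gain.

    Assumes score = a + points_per_10x_compute * log10(compute), so a gain of
    d points corresponds to multiplying compute by 10 ** (d / points_per_10x_compute).
    """
    if points_per_10x_compute <= 0:
        raise ValueError("points_per_10x_compute must be positive")
    score_gain = enhanced_score - baseline_score
    return 10 ** (score_gain / points_per_10x_compute)


# Hypothetical numbers: an enhancement lifts a benchmark score from 50 to 60,
# and each 10x of training compute is assumed to be worth 10 points.
gain = compute_equivalent_gain(50.0, 60.0, 10.0)
print(f"compute-equivalent gain: {gain:g}x")
```

Under this assumed curve, a gain above 5 corresponds to the abstract's "more than a 5x increase in training compute" threshold.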

large language model, machine learning, natural language, (22 more...)

arXiv.org Artificial Intelligence

2312.07413

Country: Asia > Middle East > Jordan (0.04)

Genre:

Research Report (1.00)
Overview (1.00)

Industry: Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

👾 Your guide to AI: March 2023

#artificialintelligence | Apr-13-2023, 01:25:37 GMT

Welcome to the latest issue of your guide to AI, an editorialized newsletter covering key developments in AI research, industry, geopolitics and startups during February 2023. We wrote an op-ed for Sifted on how generative AI will change the software landscape and commented for TIME's cover story on ChatGPT. On the politics side, we reviewed and recommended spinout policy reform in Tony Blair Institute for Global Change's paper A New National Purpose and were included in Politico's 20 people who matter in UK technology. Air Street was featured in Insider's list of top AI investors. See some of you at London.AI on Thurs 9 March w/DeepMind, Adept, Palantir and Basecamp Research. Register for our one-day RAAIS conference on research and applied AI on 23 June 2023 in London. We'll be hosting speakers from Meta AI, Cruise, Intercom, Genentech, Northvolt and more to come! FYI, you might have to read this issue in full online vs. in your inbox. As usual, we love hearing what you're up to and what's on your mind, just hit reply or forward to your friends :-) Building large-scale AI models requires enormous computing power, which has emerged as the soft power of our time.

chatgpt, openai, university, (16 more...)

#artificialintelligence

Country:

Europe > United Kingdom (0.34)
Asia > Russia (0.14)
North America > United States > New York (0.04)
(10 more...)

Industry:

Information Technology (1.00)
Health & Medicine (1.00)
Banking & Finance (0.94)
Government > Regional Government > Europe Government (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.39)
