 action figure


When Not to Answer: Evaluating Prompts on GPT Models for Effective Abstention in Unanswerable Math Word Problems

Saadat, Asir, Sogir, Tasmia Binte, Chowdhury, Md Taukir Azam, Aziz, Syem

arXiv.org Artificial Intelligence

Large language models (LLMs) are increasingly relied upon to solve complex mathematical word problems. However, being susceptible to hallucination, they may generate inaccurate results when presented with unanswerable questions, raising concerns about their potential harm. While GPT models are now widely used and trusted, how they can effectively abstain from answering unanswerable math problems, and how their abstention capabilities can be enhanced, have not been rigorously investigated. In this paper, we investigate whether GPTs can appropriately respond to unanswerable math word problems by applying prompts typically used in solvable mathematical scenarios. Our experiments utilize the Unanswerable Word Math Problem (UWMP) dataset, directly leveraging GPT model APIs. Evaluation metrics are introduced, which integrate three key factors: abstention, correctness and confidence. Our findings reveal critical gaps in GPT models and the hallucinations they suffer from on unsolvable problems, highlighting the need for improved models capable of better managing uncertainty and complex reasoning in math word problem-solving contexts.


From Good to Great: Improving Math Reasoning with Tool-Augmented Interleaf Prompting

Chen, Nuo, Li, Hongguang, Wang, Baoyuan, Li, Jia

arXiv.org Artificial Intelligence

This paper investigates the performance of Large Language Models (LLMs) and Tool-augmented LLMs in tackling complex mathematical reasoning tasks. We introduce IMP-TIP: Improving Math Reasoning with Tool-augmented Interleaf Prompting, a framework that combines the strengths of both LLMs and Tool-augmented LLMs. IMP-TIP follows the "From Good to Great" concept, collecting multiple potential solutions from both LLMs and their Tool-Augmented counterparts for the same math problem, and then selecting or re-generating the most accurate answer after cross-checking these solutions via tool-augmented interleaf prompting. The framework incorporates two key aspects: self-prompt and tool-augmented interleaf prompting (TIP). The former allows LLMs to autonomously refine and improve an initial prompt related to tool usage, while the latter enables LLMs to derive the final answer by dynamically analyzing the problem, cross-checking potential solutions, and revising previous reasoning hints in an interleaved manner. Experimental analysis shows that IMP-TIP achieves enhanced mathematical capabilities and outperforms traditional LLMs and tool-augmented LLMs in accuracy and reasoning diversity on math reasoning tasks. For instance, IMP-TIP can improve Tool-augmented ChatGPT on GSM8K-Hard from 56.0% to 65.2%.


AI in Five, Fifty and Five Hundred Years -- Part Two -- Fifty Years

#artificialintelligence

Check out part one of this series for what the next five to fifteen years look like in AI. In part two we get super sci-fi and see if our crystal ball can reach 50 years into the future. Once-dumb objects have woken up. Your shirt is babbling away with your shades and having a conversation with your girlfriend's pearl earrings when she's traveling to give a talk in Brazil. Everything from our houses, to weapons, to planes, trains and automobiles, to roads, clothes, jewelry, headphones, glasses, and eye contacts is wild with thoughts. The dynamic new algorithms that pushed us past deep learning and powered the fourth wave of the intelligence revolution sprang from worldwide efforts to map every single neuron and connection in the human brain. Eventually the processors and biotechnology caught up with our ambitions and scientists succeeded beyond our wildest expectations.


I Finally Found the Droids I Was Looking For -- But Are They Right For My Kids?

TIME - Tech

Every Christmas in the '80s, I wanted the same thing as many other pint-sized Star Wars fans: a robot sidekick to call my own. And not just any old droid would do: It had to be an R2-D2, specifically one that could drop its third leg down and cruise around the world at my side. Growing up in the Death Star era, our entire generation thought it had "The Force." But eventually we realized that moving objects with our thoughts and duping people with Jedi mind tricks were all in our imaginations. But droids--they were real, or at least they could be, one day.


Lightseekers brings your video game into the real world

Engadget

Action figures can look a little staid next to video games where your character can walk, talk and fire all manner of weaponry. But there's still something special about the tactile experience of holding a cool character in your hand, which is why we've seen game developers embrace the world of toys with products like Skylanders, Amiibo and LEGO Dimensions. But, while placing a figure on a base can unlock characters or entire worlds, the interaction between game and toy tends to end there. Lightseekers, launching today on Kickstarter, changes that dynamic by making its action figures a living (and almost breathing) part of its games. Lightseekers, in some ways, is almost reminiscent of the film Small Soldiers.


The Extraordinary Invention of Intelligence - Universal Mind

#artificialintelligence

In 1948 a young man by the name of Alan Turing penned a report entitled "Intelligent Machinery." Its opening sentence, "I propose to investigate the question as to whether it is possible for machinery to show intelligent behavior" (1), instantly set the stage for what we today would call AI, or Artificial Intelligence. Ever since, the world has looked towards the future with glossy stares and dreams of such a day. In 1935 Turing was the pioneering mind behind the modern computer, though most people recognize the name from the test of machine intelligence called the Turing Test. Turing introduced the test in a 1950 paper titled "Computing Machinery and Intelligence," and his goal was to "test if a machine's ability could exhibit intelligent behavior equivalent to, or indistinguishable from, that of a human."