Scheherazade: Evaluating Chain-of-Thought Math Reasoning in LLMs with Chain-of-Problems

Miner, Stephen, Takashima, Yoshiki, Han, Simeng, Erata, Ferhat, Antonopoulos, Timos, Piskac, Ruzica, Shapiro, Scott J

arXiv.org Artificial Intelligence

Benchmarks are critical for measuring progress in the math reasoning abilities of Large Language Models (LLMs). However, existing widely-used benchmarks such as GSM8K have been rendered less useful as multiple cutting-edge LLMs achieve over 94% accuracy. While harder benchmarks have been proposed, their creation is often manual and expensive. We present Scheherazade, an automated approach for producing challenging mathematical reasoning benchmarks by logically chaining mathematical reasoning problems. We propose two different chaining methods, forward chaining and backward chaining, which require reasoning forward and backward through the chain respectively. We apply Scheherazade to GSM8K to create GSM8K-Scheherazade and evaluate 3 frontier LLMs and OpenAI's o1-preview on it. We show that while frontier models' performance declines precipitously with only a few questions chained, a preliminary evaluation suggests o1-preview performance persists up to 5 questions chained backwards. In addition, while all other models perform worse when problems are chained backwards, o1-preview performs better on backward-chained benchmarks. We will release the dataset and code publicly.


Worried about amoral robots? Try reading them a story.

AITopics Original Links

Why don't we trust robots? Over decades, engineers and scientists have tinkered with and programmed humanoid robots to be eerily like us. But emotions and ethics remain just beyond their reach, and that is the basis of our fears that, when push comes to shove, artificial intelligence won't have our best interests at heart. But storybooks might fix that, a Georgia Institute of Technology team says. "There is no user manual for being human," Dr. Mark O. Riedl and Dr. Brent Harrison, computer scientists at Georgia Tech, emphasize in their latest paper.


Fairy tales teach robots not to murder

AITopics Original Links

Fairy tales perform many functions. They entertain, they encourage imagination, they teach problem-solving skills. They can also provide moral lessons, highlighting the dangers of failing to follow the social codes that let human beings coexist in harmony. Such moral lessons may not mean much to a robot, but a team of researchers at Georgia Institute of Technology believes it has found a way to leverage the humble fable into a moral lesson an artificial intelligence will take to its cold, mechanical heart.


Team uses artificial intelligence to crowdsource interactive fiction

#artificialintelligence

Georgia Institute of Technology researchers have developed a new artificially intelligent system that crowdsources plots for interactive stories, which are popular in video games and let players choose different branching story options. With potentially limitless crowdsourced plot points, the system could allow for more creative stories and an easier method for interactive narrative generation. Current AI models for games have a limited number of scenarios, no matter what a player chooses. They depend on a dataset already programmed into a model by experts. Using the Georgia Tech approach, one might imagine a Star Wars game using online fan fiction to let the AI system generate countless paths for a player to take.


Scheherazade: Crowd-Powered Interactive Narrative Generation

Li, Boyang (Georgia Institute of Technology) | Riedl, Mark (Georgia Institute of Technology)

AAAI Conferences

Interactive narrative is a form of storytelling in which users affect a dramatic storyline through actions by assuming the role of characters in a virtual world. This extended abstract outlines the Scheherazade-IF system, which uses crowdsourcing and artificial intelligence to automatically construct text-based interactive narrative experiences.