

seqBench: A Tunable Benchmark to Quantify Sequential Reasoning Limits of LLMs

Ramezanali, Mohammad, Vazifeh, Mo, Santi, Paolo

arXiv.org Artificial Intelligence

We introduce seqBench, a parametrized benchmark for probing sequential reasoning limits in Large Language Models (LLMs) through precise control over several key complexity dimensions. seqBench allows systematic variation of (1) the logical depth, defined as the number of sequential actions required to solve the task; (2) the number of backtracking steps along the optimal path, quantifying how often the agent must revisit prior states to satisfy deferred preconditions (e.g., retrieving a key after encountering a locked door); and (3) the noise ratio, defined as the ratio between supporting and distracting facts about the environment. Our evaluations on state-of-the-art LLMs reveal a universal failure pattern: accuracy collapses exponentially beyond a model-specific logical depth. Unlike existing benchmarks, seqBench's fine-grained control facilitates targeted analyses of these reasoning failures, illuminating universal scaling laws and statistical limits, as detailed in this paper alongside its generation methodology and evaluation metrics. We find that even top-performing models systematically fail on seqBench's structured reasoning tasks despite minimal search complexity, underscoring key limitations in their commonsense reasoning capabilities. Designed to evolve alongside advancing models, the seqBench datasets are publicly released to spur deeper scientific inquiry into LLM reasoning, aiming to establish a clearer understanding of models' true potential and current boundaries for robust real-world application.
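The three knobs the abstract describes can be made concrete with a toy generator. This is only an illustrative sketch, not seqBench's actual generation code; the fact templates and function name are invented for the example:

```python
import random

def make_task(depth, backtracks, noise_ratio, seed=0):
    """Toy sequential task in the spirit of seqBench's three knobs.

    depth       -- number of sequential actions on the optimal path
    backtracks  -- actions whose precondition forces revisiting an earlier state
    noise_ratio -- distracting facts per supporting fact
    """
    rng = random.Random(seed)
    supporting = [f"room_{i} connects to room_{i + 1}" for i in range(depth)]
    # deferred preconditions force backtracking (e.g. a key held in an earlier room)
    for b in range(backtracks):
        supporting.append(f"the door into room_{depth - b} needs the key from room_{b}")
    distracting = [f"room_{rng.randrange(depth)} has a painting on the wall"
                   for _ in range(int(len(supporting) * noise_ratio))]
    facts = supporting + distracting
    rng.shuffle(facts)  # the model must separate signal from noise itself
    return facts

facts = make_task(depth=5, backtracks=2, noise_ratio=1.0)
```

Sweeping `depth` while holding the other two knobs fixed is the kind of controlled variation that exposes the model-specific collapse point the abstract reports.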


Zero-Shot Iterative Formalization and Planning in Partially Observable Environments

Gong, Liancheng, Zhu, Wang, Thomason, Jesse, Zhang, Li

arXiv.org Artificial Intelligence

Using LLMs not to predict plans but to formalize an environment into the Planning Domain Definition Language (PDDL) has been shown to improve performance and control. Existing work focuses on fully observable environments; we tackle the more realistic and challenging partially observable environments that lack complete, reliable information. We propose PDDLego+, a framework to iteratively formalize, plan, grow, and refine PDDL representations in a zero-shot manner, without needing access to any existing trajectories. On two textual simulated environments, we show that PDDLego+ improves goal-reaching success and exhibits robustness against problem complexity. We also show that the domain knowledge captured after a successful trial can benefit future tasks.
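The formalize-plan-grow-refine loop can be sketched as a generic control loop. The callable names below are assumptions for illustration, not PDDLego+'s actual interface; in the real system `formalize`, `refine`, and `grow` would be LLM calls producing PDDL text and `plan` a classical planner:

```python
def iterative_formalize_and_plan(observe, act, formalize, plan, refine, grow,
                                 max_iters=20):
    """Sketch of a PDDLego+-style loop for partially observable environments.

    observe()             -> current textual observation
    formalize(obs)        -> initial (domain, problem) PDDL draft, zero-shot
    plan(domain, problem) -> list of actions, or None if the planner fails
    refine(d, p, obs)     -> repaired (domain, problem) after a planner failure
    grow(d, p, obs)       -> (domain, problem) extended with newly seen facts
    act(action)           -> (new observation, done flag)
    """
    domain, problem = formalize(observe())
    for _ in range(max_iters):
        actions = plan(domain, problem)
        if actions is None:
            # the formalization is faulty or incomplete: ask for a repair
            domain, problem = refine(domain, problem, observe())
            continue
        obs, done = act(actions[0])  # execute one step, then replan
        if done:
            return True
        domain, problem = grow(domain, problem, obs)  # fold in new facts
    return False
```

Executing only the first planned action before replanning is the natural choice under partial observability, since each step can reveal facts that invalidate the rest of the plan.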


Labour's open door to big tech leaves critics crying foul

The Guardian

The problem with the UK, according to the former Google boss Eric Schmidt, is that it has "so many ways that people can say no". However, for some critics of the Labour government, it has a glaring issue with saying yes: to big tech. Schmidt made his comment in a Q&A conversation with Keir Starmer at a big investment summit in October last year. The prominent position of a tech bigwig at the event underlined the importance of the sector to a government that has made growth a priority and believes the sector is crucial to achieving it. Top US tech firms have a big presence in the UK, including Google, Mark Zuckerberg's Meta, Amazon, Apple, Microsoft and Palantir, the data intelligence firm co-founded by the Maga movement backer Peter Thiel.


A Behavior Tree-inspired programming language for autonomous agents

Biggar, Oliver, Shames, Iman

arXiv.org Artificial Intelligence

We propose a design for a functional programming language for autonomous agents, built on the ideas and motivations of Behavior Trees (BTs). BTs are a popular model for designing agent behavior in robotics and AI. However, as their use has grown dramatically, the simple model of BTs has come to be limiting. There is a growing push to increase the functionality of BTs, with the end goal of BTs evolving into a programming language in their own right, centred around the defining BT properties of modularity and reactiveness. In this paper, we examine how the BT model must be extended in order to grow into such a language. We identify some fundamental problems which must be solved: implementing 'reactive' selection, 'monitoring' safety-critical conditions, and passing data between actions. We provide a variety of small examples which demonstrate that these problems are complex, and that current BT approaches do not handle them in a manner consistent with modularity. We instead provide a simple set of modular programming primitives for handling these use cases, and show how they can be combined to build complex programs. We present a full specification for our BT-inspired language, and give an implementation in the functional programming language Haskell. Finally, we demonstrate our language by translating a large and complex BT into a simple, unambiguous program.
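The 'reactive selection' problem the abstract names can be illustrated with a minimal fallback node. This is a generic Python sketch of the standard BT idiom, not the paper's Haskell primitives:

```python
from enum import Enum

class Status(Enum):
    SUCCESS = 1
    FAILURE = 2
    RUNNING = 3

def fallback(*children):
    """Reactive selector: on every tick, re-evaluate children from the left,
    so a higher-priority child can preempt a running lower-priority one."""
    def tick(blackboard):
        for child in children:
            status = child(blackboard)
            if status != Status.FAILURE:
                return status
        return Status.FAILURE
    return tick

def condition(pred):
    return lambda bb: Status.SUCCESS if pred(bb) else Status.FAILURE

def action(fn):
    return lambda bb: fn(bb)

# door example: succeed immediately if the door is open, otherwise open it
door_task = fallback(
    condition(lambda bb: bb.get("door_open", False)),
    action(lambda bb: (bb.__setitem__("door_open", True), Status.SUCCESS)[1]),
)
```

Because the condition is re-checked on every tick, the action never runs once the door is open; the paper's point is that composing such reactivity with data passing and safety monitoring is where the plain BT model stops being modular.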


Robotic State Recognition with Image-to-Text Retrieval Task of Pre-Trained Vision-Language Model and Black-Box Optimization

Kawaharazuka, Kento, Obinata, Yoshiki, Kanazawa, Naoaki, Okada, Kei, Inaba, Masayuki

arXiv.org Artificial Intelligence

State recognition of the environment and objects, such as the open/closed state of doors and the on/off state of lights, is indispensable for robots that perform daily life support and security tasks. Until now, state recognition methods have been based on training neural networks from manual annotations, preparing special sensors for the recognition, or manually programming the extraction of features from point clouds or raw images. In contrast, we propose a robotic state recognition method using a pre-trained vision-language model capable of Image-to-Text Retrieval (ITR) tasks. We prepare several kinds of language prompts in advance, calculate the similarity between these prompts and the current image by ITR, and perform state recognition. By applying an optimal weighting to each prompt using black-box optimization, state recognition can be performed with higher accuracy. Experiments show that this method enables a variety of state recognition tasks by simply preparing multiple prompts, without retraining neural networks or manual programming. In addition, since only prompts and their weights need to be prepared for each recognizer, there is no need to maintain multiple models, which facilitates resource management. Through language alone, it is possible to recognize the open/closed state of transparent doors, whether water is running from a faucet, and even the qualitative state of whether a kitchen is clean, all of which have been challenging so far.
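The weighted prompt scheme reduces to a small decision rule. In the sketch below the image-prompt similarities are passed in directly; in a real system they would come from a pre-trained vision-language model such as CLIP, and the weights would be tuned offline by a black-box optimizer. The function and dictionary layout are assumptions for illustration:

```python
def recognize_state(similarities, weights, states):
    """Weighted image-to-text retrieval for state recognition (sketch).

    similarities -- prompt -> cosine similarity between the prompt's text
                    embedding and the current image embedding
    weights      -- prompt -> weight, tuned by black-box optimization
    states       -- state label -> list of prompts describing that state
    """
    def score(prompts):
        return sum(weights[p] * similarities[p] for p in prompts)
    # the recognized state is the one whose prompts best match the image
    return max(states, key=lambda s: score(states[s]))
```

The appeal of the approach is visible in the signature: a new recognizer is just a new `states` dictionary plus a weight vector, with no additional model to store.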


Sub-goal Distillation: A Method to Improve Small Language Agents

Hashemzadeh, Maryam, Stengel-Eskin, Elias, Chandar, Sarath, Cote, Marc-Alexandre

arXiv.org Artificial Intelligence

While Large Language Models (LLMs) have demonstrated significant promise as agents in interactive tasks, their substantial computational requirements and restricted number of calls constrain their practical utility, especially in long-horizon interactive tasks such as decision-making or in scenarios involving continuous ongoing tasks. To address these constraints, we propose a method for transferring the performance of an LLM with billions of parameters to a much smaller language model (770M parameters). Our approach involves constructing a hierarchical agent comprising a planning module, which learns through Knowledge Distillation from an LLM to generate sub-goals, and an execution module, which learns to accomplish these sub-goals using elementary actions. Subsequently, we utilize this annotated data to fine-tune both the planning and execution modules. Importantly, neither module relies on real-time access to an LLM during inference, significantly reducing the overall cost associated with LLM interactions to a fixed cost. In ScienceWorld, a challenging and multi-task interactive text environment, our method surpasses standard imitation learning based solely on elementary actions by 16.7% (absolute). Our analysis highlights the efficiency of our approach compared to other LLM-based methods. Recently, Large Language Models (LLMs) have found applications in various fields, including multi-task learning, decision making, answering questions, summarizing documents, translating languages, completing sentences, and serving as search assistants. The promising advantage of LLMs is attributed to their training on extensive text datasets, resulting in impressive capabilities. This prior knowledge can be leveraged for action planning to solve tasks in robotics and reinforcement learning (Huang et al., 2022b; Brohan et al., 2023; Liang et al., 2023). However, the extreme size of LLMs makes them computationally unaffordable for many applications. 
Consequently, there is an increasing demand for approaches that are less computationally intensive while still capitalizing on the knowledge embedded in LLMs. One prevalent technique is Knowledge Distillation (KD) (Buciluǎ et al., 2006; Hinton et al., 2015), wherein a smaller model is trained with guidance from a larger model. Through this approach, we can leverage the knowledge in an LLM to train a more compact model with a reduced number of parameters. [Figure 1: Example of annotating an expert trajectory with sub-goals for a particular variation of task 1-4 (change-the-state-of-matter-of).]
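The classic KD objective referenced above (Hinton et al., 2015) trains the student on temperature-softened teacher outputs. A minimal plain-Python sketch of that loss, omitting the usual hard-label term; note the paper's own method distills sub-goal annotations rather than logits, so this only illustrates the cited technique:

```python
import math

def softmax(logits, T=1.0):
    """Temperature-scaled softmax; T > 1 softens the distribution."""
    exps = [math.exp(l / T) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2
    so gradients keep a comparable magnitude across temperatures."""
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return T * T * sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
```

The loss is zero exactly when the student reproduces the teacher's softened distribution, and grows as the two diverge.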


Successfully Guiding Humans with Imperfect Instructions by Highlighting Potential Errors and Suggesting Corrections

Zhao, Lingjun, Nguyen, Khanh, Daumé, Hal III

arXiv.org Artificial Intelligence

This paper addresses the challenge of leveraging imperfect language models to guide human decision-making in the context of a grounded navigation task. We show that an imperfect instruction generation model can be complemented with an effective communication mechanism to become more successful at guiding humans. The communication mechanism we build comprises models that can detect potential hallucinations in instructions and suggest practical alternatives, and an intuitive interface to present that information to users. We show that this approach reduces the human navigation error by up to 29% with no additional cognitive burden. This result underscores the potential of integrating diverse communication channels into AI systems to compensate for their imperfections and enhance their utility for humans.


A robot dog has learned to open doors with its leg

New Scientist

A robot dog can use a leg to open doors, press buttons and pick up rucksacks while balancing on its other three legs. Four-legged robots like Spot, the star of Boston Dynamics' viral videos, normally need an arm attached to their body to open doors or pick up objects, but this can add significant weight and make it harder for the robot to squeeze through narrow spaces. Philip Arm at ETH Zurich in Switzerland and his colleagues used a machine-learning model to teach an off-the-shelf robotic dog to use one of its legs to perform tasks while standing still or moving with the other three legs. "We cannot do everything with the legs that we could do with an arm – right now, a hand is way more dexterous. But the point is really to make this work for applications where you maybe have mass constraints, or we don't want to have that additional complexity, like for space exploration where every kilogram of such a robot counts," says Arm.


How the world will look in 2050, according to experts

Daily Mail - Science & tech

Futurists of the 1990s predicted that we'd be living underwater or riding flying cars by this point -- but now experts are warning of a much scarier future. Other predictions include making contact with aliens -- but whether or not that's a bad thing remains unknown. It's not all doom and gloom, though, with technology expected to have made the afterlife possible. AI 'overlords' could turn everyone into serfs. Right now, people are focused on AI potentially causing job losses - but the reality could be far worse. That's according to George Stakhov, chief strategy officer for the global ad agency DDB EMEA, who created an AI tool named 'The Uncreative Agency'.


DALL-E AI Art Generator Finally Opens Doors to Wider Internet

#artificialintelligence

Internet art and image archives are already flooded with images developed with the use of artificial intelligence. Expect even more images of high imagination or photos of dubious origin now that the AI image generator that arguably started the current artificial image craze, DALL-E, is open and available to all. In a Wednesday blog post, DALL-E developer OpenAI said it already has 1.5 million users creating more than 2 million AI-generated images a day. Using data and feedback, the company said it has made its filters stronger at rejecting any images made to emulate sexual, violent, or political content. There is no current API available for DALL-E, but apparently one is in development.