shaker
ConceptAgent: LLM-Driven Precondition Grounding and Tree Search for Robust Task Planning and Execution
Rivera, Corban, Byrd, Grayson, Paul, William, Feldman, Tyler, Booker, Meghan, Holmes, Emma, Handelman, David, Kemp, Bethany, Badger, Andrew, Schmidt, Aurora, Jatavallabhula, Krishna Murthy, de Melo, Celso M, Seenivasan, Lalithkumar, Unberath, Mathias, Chellappa, Rama
Robotic planning and execution in open-world environments is a complex problem due to the vast state spaces and high variability of task embodiment. Recent advances in perception algorithms, combined with Large Language Models (LLMs) for planning, offer promising solutions to these challenges, as the common sense reasoning capabilities of LLMs provide a strong heuristic for efficiently searching the action space. However, prior work fails to address the possibility of hallucinations from LLMs, which cause planned actions to fail in execution, largely because of logical fallacies at high or low levels. To contend with automation failure due to such hallucinations, we introduce ConceptAgent, a natural language-driven robotic platform designed for task execution in unstructured environments. With a focus on scalability and reliability of LLM-based planning in complex state and action spaces, we present innovations designed to limit these shortcomings, including 1) Predicate Grounding to prevent and recover from infeasible actions, and 2) an embodied version of LLM-guided Monte Carlo Tree Search with self-reflection. In simulation experiments, ConceptAgent achieved a 19% task completion rate across three room layouts and 30 easy-level embodied tasks, outperforming other state-of-the-art LLM-driven reasoning baselines that scored 10.26% and 8.11% on the same benchmark. Additionally, ablation studies on moderate to hard embodied tasks revealed a 20% increase in task completion from the baseline agent to the fully enhanced ConceptAgent, highlighting the individual and combined contributions of Predicate Grounding and LLM-guided Tree Search to enable more robust automation in complex state and action spaces.
Multiple-input, multiple-output modal testing of a Hawk T1A aircraft: A new full-scale dataset for structural health monitoring
Wilson, James, Champneys, Max D., Tipuric, Matt, Mills, Robin, Wagg, David J., Rogers, Timothy J.
The use of measured vibration data from structures has a long history of enabling the development of methods for inference and monitoring. In particular, applications based on system identification and structural health monitoring have risen to prominence over recent decades and promise significant benefits when implemented in practice. However, significant challenges remain in the development of these methods. The introduction of realistic, full-scale datasets will be an important contribution to overcoming these challenges. This paper presents a new benchmark dataset capturing the dynamic response of a decommissioned BAE Systems Hawk T1A. The dataset reflects the behaviour of a complex structure with a history of service that can still be tested in controlled laboratory conditions, using a variety of known loading and damage simulation conditions. As such, it provides a key stepping stone between simple laboratory test structures and in-service structures. In this paper, the Hawk structure is described in detail, alongside a comprehensive summary of the experimental work undertaken. Following this, key descriptive highlights of the dataset are presented, before a discussion of the research challenges that the data present. Using the dataset, non-linearity in the structure is demonstrated, as well as the sensitivity of the structure to damage of different types. The dataset is highly applicable to many academic enquiries and additional analysis techniques which will enable further advancement of vibration-based engineering techniques.
Empowering Large Language Model Agents through Action Learning
Zhao, Haiteng, Ma, Chang, Wang, Guoyin, Su, Jing, Kong, Lingpeng, Xu, Jingjing, Deng, Zhi-Hong, Yang, Hongxia
Large Language Model (LLM) agents have recently garnered increasing interest, yet they are limited in their ability to learn from trial and error, a key element of intelligent behavior. In this work, we argue that the capacity to learn new actions from experience is fundamental to the advancement of learning in LLM agents. While humans naturally expand their action spaces and develop skills through experiential learning, LLM agents typically operate within fixed action spaces, limiting their potential for growth. To address these challenges, our study explores open-action learning for language agents. We introduce a framework, LearnAct, with an iterative learning strategy to create and improve actions in the form of Python functions. In each iteration, the LLM revises and updates the currently available actions based on the errors identified in unsuccessful training tasks, thereby enhancing action effectiveness. Our experimental evaluations across Robotic Planning and AlfWorld environments reveal that after learning on a few training task instances, our approach to open-action learning markedly improves agent performance for the type of task (by 32 percent in AlfWorld compared to ReAct+Reflexion, for instance), highlighting the importance of experiential action learning in the development of more intelligent LLM agents.
Using LSTM and GRU With a New Dataset for Named Entity Recognition in the Arabic Language
Shaker, Alaa, Aldarf, Alaa, Bessmertny, Igor
Named entity recognition (NER) is a natural language processing (NLP) task that aims to identify named entities and classify them into categories such as person, location, and organization. Arabic has a considerable amount of unstructured data, and it requires different preprocessing tools than languages such as English, Russian, or German. This motivates building a new structured dataset to address the lack of structured data. In this work, we use the BIOES format to tag words, which allows us to handle nested named entities that consist of more than one sentence and to mark the start and the end of each name. The dataset consists of more than thirty-six thousand records. In addition, this work proposes long short-term memory (LSTM) units and Gated Recurrent Units (GRU) for building the named entity recognition model for the Arabic language. The models achieve reasonably good results (approximately 80%) because LSTM and GRU models can capture the relationships between the words of a sentence. We also use Trax, a new library from Google, and the Colab platform.
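As a rough illustration of the BIOES scheme mentioned in this abstract, the sketch below converts entity spans into per-token tags (B = begin, I = inside, E = end, S = single-token entity, O = outside). The function name `bioes_tags`, the span format, and the English example sentence are assumptions made for illustration; they are not taken from the paper or its dataset.

```python
from typing import List, Tuple


def bioes_tags(n_tokens: int, spans: List[Tuple[int, int, str]]) -> List[str]:
    """Turn (start, end_inclusive, label) spans into a BIOES tag per token.

    Hypothetical helper: spans are assumed to be non-overlapping and in range.
    """
    tags = ["O"] * n_tokens  # every token starts as "outside any entity"
    for start, end, label in spans:
        if start == end:
            # One-token entity gets the S (single) tag.
            tags[start] = f"S-{label}"
        else:
            tags[start] = f"B-{label}"        # first token of the entity
            for i in range(start + 1, end):
                tags[i] = f"I-{label}"        # interior tokens
            tags[end] = f"E-{label}"          # last token of the entity
    return tags


tokens = ["Alice", "Smith", "visited", "New", "York", "City", "and", "Paris"]
spans = [(0, 1, "PER"), (3, 5, "LOC"), (7, 7, "LOC")]
tags = bioes_tags(len(tokens), spans)
for token, tag in zip(tokens, tags):
    print(f"{token}\t{tag}")
```

The explicit E and S tags are what let a sequence model see where a name ends, which plain BIO tagging leaves implicit.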
Planning with Critical Section Macros: Theory and Practice
Chrpa, Lukas | Vallati, Mauro (University of Huddersfield)
Macro-operators (macros) are a well-known technique for enhancing the performance of planning engines by providing "short-cuts" in the state space. Existing macro learning systems usually generate macros by considering the most frequent action sequences in training plans. Unfortunately, frequent action sequences might not capture meaningful activities as a whole, leading to a limited beneficial impact on the planning process. In this paper, inspired by resource locking in critical sections in parallel computing, we propose a technique that generates macros able to capture whole activities in which limited resources (e.g., a robotic hand, or a truck) are used. Specifically, such a Critical Section macro starts by locking the resource (e.g., grabbing an object), continues by using the resource (e.g., manipulating the object) and finishes by releasing the resource (e.g., dropping the object). Hence, such a macro bridges states in which the resource is locked and cannot be used. We also introduce versions of Critical Section macros dealing with multiple resources and phased locks. The usefulness of macros is evaluated using a range of state-of-the-art planners, and a large number of benchmarks from the deterministic and learning tracks of recent editions of the International Planning Competition.
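The lock, use, release pattern described in this abstract can be sketched with a standard STRIPS-style composition of actions. This is only a minimal illustration under assumed names (`Action`, `compose`, and the toy `pick`/`move`/`drop` actions with their propositions are hypothetical); it does not reproduce the macro learning procedure of the paper.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class Action:
    """A ground STRIPS-style action (illustrative, not the paper's encoding)."""
    name: str
    pre: frozenset
    add: frozenset
    delete: frozenset


def compose(a: Action, b: Action) -> Action:
    """Merge 'a then b' into a single macro action."""
    # b cannot follow a if a removes (and does not re-add) something b needs.
    if (a.delete - a.add) & b.pre:
        raise ValueError(f"{b.name} is not applicable after {a.name}")
    return Action(
        name=f"{a.name}+{b.name}",
        pre=a.pre | (b.pre - a.add),          # what must hold before the macro
        add=(a.add - b.delete) | b.add,        # what holds after the macro
        delete=(a.delete - b.add) | b.delete,  # what no longer holds afterwards
    )


# Lock the hand, use it while moving, then release it.
pick = Action("pick", frozenset({"hand_free", "ball_at_A", "robot_at_A"}),
              frozenset({"holding_ball"}), frozenset({"hand_free", "ball_at_A"}))
move = Action("move", frozenset({"robot_at_A"}),
              frozenset({"robot_at_B"}), frozenset({"robot_at_A"}))
drop = Action("drop", frozenset({"holding_ball", "robot_at_B"}),
              frozenset({"hand_free", "ball_at_B"}), frozenset({"holding_ball"}))

macro = reduce(compose, [pick, move, drop])
print(macro.name)
print("pre:", sorted(macro.pre))
print("add:", sorted(macro.add))
print("delete:", sorted(macro.delete))
```

In this toy case `hand_free` appears in the macro's preconditions and add effects but not in its delete effects, so the resource is free both before and after the macro and the locked intermediate states are skipped.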
Shaker
We present a demonstration of Ropossum, an authoring tool for the generation and testing of levels of the physics-based game Cut the Rope. Ropossum integrates many features: (1) automatic design of complete solvable content, (2) incorporation of the designer's input through the creation of complete or partial designs, (3) automatic checking for playability and (4) optimization of a given design based on playability. The system includes a physics engine to simulate the game and an evolutionary framework to evolve content, as well as an AI reasoning agent to check for playability. The system is optimised to allow on-line feedback and real-time interaction.
Shaker
In order to automatically generate high-quality game levels, one needs to be able to automatically verify that the levels are playable. The simulation-based approach to playability testing uses an artificial agent to play through the level, but building such an agent is not always an easy task and such an agent is not always readily available. We discuss this problem in the context of the physics-based puzzle game Cut the Rope, which features continuous time and state space, making several approaches such as exhaustive search and reactive agents inefficient. We show that a deliberative Prolog-based agent can be used to suggest all sensible moves at each state, which allows us to restrict the search space so that depth-first search for solutions becomes viable. This agent is successfully used to test playability in Ropossum, a level generator based on grammatical evolution. The method proposed in this paper is likely to be useful for a large variety of games with similar characteristics.
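The idea of restricting depth-first search to moves proposed by a reasoning agent can be sketched generically as follows. The paper uses a Prolog-based agent and a physics simulation; here both are replaced by hypothetical Python callbacks (`suggest_moves`, `apply_move`, `is_goal`) and a toy integer puzzle, purely as an assumption for illustration.

```python
from typing import Callable, Iterable, List, Optional


def dfs_playable(state,
                 is_goal: Callable[[object], bool],
                 suggest_moves: Callable[[object], Iterable],
                 apply_move: Callable[[object, object], object],
                 max_depth: int) -> Optional[List]:
    """Depth-first search that only expands moves the agent considers sensible.

    Returns a list of moves that reaches a goal state, or None if none is
    found within max_depth moves.
    """
    if is_goal(state):
        return []
    if max_depth == 0:
        return None
    for move in suggest_moves(state):  # pruned branching, not all moves
        plan = dfs_playable(apply_move(state, move), is_goal,
                            suggest_moves, apply_move, max_depth - 1)
        if plan is not None:
            return [move] + plan
    return None


# Toy stand-in for a level: reach exactly 10 from 0 using steps of 3 or 1.
TARGET = 10


def sensible_steps(total: int) -> List[int]:
    # Hypothetical "reasoning agent": never suggest a step that overshoots.
    return [step for step in (3, 1) if total + step <= TARGET]


plan = dfs_playable(0, lambda s: s == TARGET, sensible_steps,
                    lambda s, m: s + m, max_depth=6)
print(plan)  # [3, 3, 3, 1]
```

A level would be reported as playable when the search returns a plan rather than `None`.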
Shaker
In this paper we present a procedural content generator using Non-negative Matrix Factorisation (NMF). We use representative levels from five dissimilar content generators to train NMF models that learn patterns about the various components of the game. The constructed models are then used to automatically generate content that resembles the training data as well as to generate novel content by exploring new combinations of patterns. We describe the methodology followed and we show that the proposed generator is more capable than each of the generators taken individually. The generator's output is compared to the other generators using a number of expressivity metrics. The results show that the proposed generator is able to resemble each individual generator and demonstrates the ability to cover a wider and more novel content space.
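A minimal sketch of the general NMF idea, assuming levels are encoded as non-negative feature vectors stacked as rows of a matrix V, is given below. It uses the classic multiplicative update rules rather than whatever solver the authors used, and the synthetic data, the `nmf` function name, and the mixing weights are all assumptions for illustration.

```python
import numpy as np


def nmf(V: np.ndarray, rank: int, n_iter: int = 300,
        eps: float = 1e-9, seed: int = 0):
    """Factorise V (n x m, non-negative) as W @ H with W, H >= 0.

    Uses multiplicative updates for the Frobenius-norm objective.
    """
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, rank)) + eps   # per-level pattern weights
    H = rng.random((rank, m)) + eps   # learned patterns (basis rows)
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


# Synthetic "training levels": 6 levels, 8 features, built from 2 patterns.
rng = np.random.default_rng(42)
V = rng.random((6, 2)) @ rng.random((2, 8))

W, H = nmf(V, rank=2)
rel_err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
print(f"relative reconstruction error: {rel_err:.4f}")

# Content resembling the training data: reconstruct an existing level.
resembling = W[0] @ H
# Novel content: a new combination of the learned patterns.
novel = (0.5 * W[0] + 0.5 * W[3]) @ H
print(novel.shape)
```

Mixing rows of W is only one simple way to explore new combinations of patterns; the abstract does not specify how the combinations are chosen.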
From Classical to Hierarchical: benchmarks for the HTN Track of the International Planning Competition
Pellier, Damien, Fiorino, Humbert
In this short paper, we outline nine classical benchmarks submitted to the first hierarchical planning track of the International Planning Competition in 2020. All of these benchmarks are based on the HDDL language. The choice of the benchmarks was based on a questionnaire sent to the HTN community. They are the following: Barman, Childsnack, Rover, Satellite, Blocksworld, Depots, Gripper, and Hiking. In the rest of the paper we give a short description of these benchmarks. All are totally ordered.
Towards Interpretable Multi-Task Learning Using Bilevel Programming
Alesiani, Francesco, Yu, Shujian, Shaker, Ammar, Yin, Wenzhe
Interpretable Multi-Task Learning can be expressed as learning a sparse graph of the task relationship based on the prediction performance of the learned models. Since many natural phenomena exhibit sparse structures, enforcing sparsity on learned models reveals the underlying task relationship. Moreover, different sparsification degrees from a fully connected graph uncover various types of structures, like cliques, trees, lines, clusters or fully disconnected graphs. In this paper, we propose a bilevel formulation of multi-task learning that induces sparse graphs, thus revealing the underlying task relationships, and an efficient method for its computation. We show empirically how the induced sparse graph improves the interpretability of the learned models and their relationship on synthetic and real data, without sacrificing generalization performance.