AITopics | blocksworld

blocksworld

Classical Planning with LLM-Generated Heuristics: Challenging the State of the Art with Python Code

Neural Information Processing Systems | Jun-16-2026, 14:09:53 GMT

In recent years, large language models (LLMs) have shown remarkable performance in many problems. However, they fail to plan reliably. Specialized attempts to improve their planning capabilities still produce incorrect plans and fail to generalize to larger tasks. Furthermore, LLMs designed for explicit "reasoning" fail to compete with automated planners while increasing computational costs, which reduces one of the advantages of using LLMs. In this paper, we show how to use LLMs to always generate correct plans, even for out-of-distribution tasks of increasing size.

large language model, machine learning, natural language, (19 more...)

Neural Information Processing Systems

Country:

North America > Canada > Alberta (0.28)
South America > Brazil (0.28)
Europe > United Kingdom (0.28)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.93)

Industry: Transportation (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

Benchmark for Planning and Control with Large Language Model Agents: Blocksworld with Model Context Protocol

Jobs, Niklas, da Silva, Luis Miguel Vieira, Somashekaraiah, Jayanth, Weigand, Maximilian, Kube, David, Gehlhoff, Felix

arXiv.org Artificial Intelligence | Dec-4-2025

Industrial automation increasingly requires flexible control strategies that can adapt to changing tasks and environments. Agents based on Large Language Models (LLMs) offer potential for such adaptive planning and execution but lack standardized benchmarks for systematic comparison. We introduce a benchmark with an executable simulation environment for the Blocksworld problem that provides five complexity categories. By integrating the Model Context Protocol (MCP) as a standardized tool interface, diverse agent architectures can be connected to and evaluated against the benchmark without implementation-specific modifications. A single-agent implementation demonstrates the benchmark's applicability, establishing quantitative metrics for comparison of LLM-based planning and execution approaches.

artificial intelligence, large language model, natural language, (18 more...)

arXiv.org Artificial Intelligence

2512.03955

Country: Europe > Germany (0.28)

Genre: Research Report (0.82)

Industry: Leisure & Entertainment > Games (0.35)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Curriculum Reinforcement Learning from Easy to Hard Tasks Improves LLM Reasoning

Parashar, Shubham, Gui, Shurui, Li, Xiner, Ling, Hongyi, Vemuri, Sushil, Olson, Blake, Li, Eric, Zhang, Yu, Caverlee, James, Kalathil, Dileep, Ji, Shuiwang

arXiv.org Artificial Intelligence | Nov-4-2025

We aim to improve the reasoning capabilities of language models via reinforcement learning (RL). Recent RL post-trained models like DeepSeek-R1 have demonstrated reasoning abilities on mathematical and coding tasks. However, prior studies suggest that using RL alone to improve reasoning on inherently difficult tasks is less effective. Here, we draw inspiration from curriculum learning and propose to schedule tasks from easy to hard (E2H), allowing LLMs to build reasoning skills gradually. Our method is termed E2H Reasoner. Empirically, we observe that, although easy tasks are important initially, fading them out through appropriate scheduling is essential in preventing overfitting. Theoretically, we establish convergence guarantees for E2H Reasoner within an approximate policy iteration framework. We derive finite-sample complexity bounds and show that when tasks are appropriately decomposed and conditioned, learning through curriculum stages requires fewer total samples than direct learning. Experiments across multiple domains show that E2H Reasoner significantly improves the reasoning ability of small LLMs (1.5B to 3B), which otherwise struggle when trained with vanilla RL alone, highlighting the effectiveness of our method. Our code can be found on https://github.com/divelab/E2H-Reasoning.
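The easy-to-hard scheduling idea in this abstract can be pictured with a small sketch. The function below is a hypothetical schedule, not the one used by E2H Reasoner: it assumes tasks are bucketed into difficulty levels and that sampling weight follows a bell-shaped curve whose center slides from the easiest to the hardest level as training progresses, so that easy tasks fade out over time. The name `curriculum_weights` and the `sharpness` parameter are assumptions made for illustration.

```python
import math


def curriculum_weights(progress, num_levels, sharpness=4.0):
    """Return sampling probabilities over difficulty levels (0 = easiest).

    `progress` is the fraction of training completed, in [0, 1].
    The weight peaks at the level closest to progress * (num_levels - 1).
    """
    center = progress * (num_levels - 1)
    scores = [
        math.exp(-sharpness * (level - center) ** 2 / num_levels)
        for level in range(num_levels)
    ]
    total = sum(scores)
    return [score / total for score in scores]


# Example: three difficulty levels at the start, middle, and end of training.
for progress in (0.0, 0.5, 1.0):
    weights = curriculum_weights(progress, num_levels=3)
    print(progress, [round(w, 3) for w in weights])
```

A smooth schedule like this keeps a small share of easy tasks early in the transition instead of dropping them abruptly; whether that matches the paper's scheduling is not stated in the abstract.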

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2506.06632

Country: North America > United States (0.28)

Genre: Research Report > New Finding (0.67)

Industry: Education > Curriculum (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Classical Planning with LLM-Generated Heuristics: Challenging the State of the Art with Python Code

Corrêa, Augusto B., Pereira, André G., Seipp, Jendrik

arXiv.org Artificial Intelligence | Oct-27-2025

In recent years, large language models (LLMs) have shown remarkable capabilities in various artificial intelligence problems. However, they fail to plan reliably, even when prompted with a detailed definition of the planning task. Attempts to improve their planning capabilities, such as chain-of-thought prompting, fine-tuning, and explicit "reasoning" still yield incorrect plans and usually fail to generalize to larger tasks. In this paper, we show how to use LLMs to generate correct plans, even for out-of-distribution tasks of increasing size. For a given planning domain, we ask an LLM to generate several domain-dependent heuristic functions in the form of Python code, evaluate them on a set of training tasks within a greedy best-first search, and choose the strongest one. The resulting LLM-generated heuristics solve many more unseen test tasks than state-of-the-art domain-independent heuristics for classical planning. They are even competitive with the strongest learning algorithm for domain-dependent planning. These findings are especially remarkable given that our proof-of-concept implementation is based on an unoptimized Python planner and the baselines all build upon highly optimized C++ code. In some domains, the LLM-generated heuristics expand fewer states than the baselines, revealing that they are not only efficiently computable, but sometimes even more informative than the state-of-the-art heuristics. Overall, our results show that sampling a set of planning heuristic function programs can significantly improve the planning capabilities of LLMs.
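As a rough illustration of the pipeline described in this abstract, the sketch below runs greedy best-first search on a toy Blocksworld, with a hand-written heuristic standing in for an LLM-generated one. The state encoding and the names `successors`, `misplaced_blocks`, and `gbfs` are hypothetical choices for this example; none of it is the authors' planner or one of their generated heuristics.

```python
import heapq
from itertools import count


def successors(state):
    """Yield (action, next_state) pairs.

    A state is a frozenset of stacks; each stack is a tuple of block names
    listed from the table upward.
    """
    stacks = list(state)
    for i, src in enumerate(stacks):
        block, rest = src[-1], src[:-1]
        others = stacks[:i] + stacks[i + 1:]
        if rest:  # a block already alone on the table gains nothing by moving there
            yield f"move {block} to table", frozenset(others + [rest, (block,)])
        for j, dst in enumerate(others):
            new = [s for k, s in enumerate(others) if k != j] + [dst + (block,)]
            if rest:
                new.append(rest)
            yield f"move {block} onto {dst[-1]}", frozenset(new)


def misplaced_blocks(state, goal):
    """Heuristic: count blocks whose supporting tower differs from the goal."""
    goal_below = {b: stack[:k] for stack in goal for k, b in enumerate(stack)}
    return sum(
        goal_below[b] != stack[:k] for stack in state for k, b in enumerate(stack)
    )


def gbfs(start, goal, h):
    """Greedy best-first search: always expand the state with the lowest h."""
    tie = count()  # tie-breaker so the heap never compares states directly
    frontier = [(h(start, goal), next(tie), start, [])]
    seen = {start}
    while frontier:
        _, _, state, plan = heapq.heappop(frontier)
        if state == goal:
            return plan
        for action, nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                heapq.heappush(
                    frontier, (h(nxt, goal), next(tie), nxt, plan + [action])
                )
    return None


# Reverse a three-block tower: A-B-C (C on top) becomes C-B-A (A on top).
start = frozenset({("A", "B", "C")})
goal = frozenset({("C", "B", "A")})
plan = gbfs(start, goal, misplaced_blocks)
print(plan)
```

Because the search only returns a plan once it reaches the goal state, any plan it returns is valid by construction; the heuristic affects how many states are expanded, not correctness.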

large language model, machine learning, spanner, (18 more...)

arXiv.org Artificial Intelligence

2503.18809

Country:

North America > Canada > Alberta (0.28)
South America > Brazil (0.28)
Europe > United Kingdom (0.28)

Genre: Research Report > New Finding (0.86)

Industry:

Transportation > Infrastructure & Services (0.68)
Transportation > Air (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

LexiCon: a Benchmark for Planning under Temporal Constraints in Natural Language

Mantenoglou, Periklis, Hazra, Rishi, Martires, Pedro Zuidberg Dos, De Raedt, Luc

arXiv.org Artificial Intelligence | Oct-8-2025

Owing to their reasoning capabilities, large language models (LLMs) have been evaluated on planning tasks described in natural language. However, LLMs have largely been tested on planning domains without constraints. In order to deploy them in real-world settings where adherence to constraints, in particular safety constraints, is critical, we need to evaluate their performance on constrained planning tasks. We introduce LexiCon -- a natural language-based (Lexi) constrained (Con) planning benchmark, consisting of a suite of environments, that can be used to evaluate the planning capabilities of LLMs in a principled fashion. The core idea behind LexiCon is to take existing planning environments and impose temporal constraints on the states. These constrained problems are then translated into natural language and given to an LLM to solve. A key feature of LexiCon is its extensibility. That is, the set of supported environments can be extended with new (unconstrained) environment generators, for which temporal constraints are constructed automatically. This renders LexiCon future-proof: the hardness of the generated planning problems can be increased as the planning capabilities of LLMs improve. Our experiments reveal that the performance of state-of-the-art LLMs, including reasoning models like GPT-5, o3, and R1, deteriorates as the degree of constrainedness of the planning tasks increases.
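To make "temporal constraints on the states" concrete, here is a minimal sketch of checking a sequence of visited states against three constraint types. The constraint names follow the PDDL3 trajectory-constraint vocabulary (`always`, `sometime`, `sometime-before`), which is an assumption on our part: the abstract does not say which constraint types LexiCon uses. The fact strings and helper names are likewise hypothetical.

```python
def holds(fact):
    """Predicate that is true in any state containing `fact`."""
    return lambda state: fact in state


def always(pred):
    """The predicate must hold in every state of the trajectory."""
    return lambda trajectory: all(pred(s) for s in trajectory)


def sometime(pred):
    """The predicate must hold in at least one state of the trajectory."""
    return lambda trajectory: any(pred(s) for s in trajectory)


def sometime_before(pred, earlier):
    """Whenever `pred` holds, `earlier` must have held in a strictly earlier state."""
    def check(trajectory):
        seen_earlier = False
        for state in trajectory:
            if pred(state) and not seen_earlier:
                return False
            seen_earlier = seen_earlier or earlier(state)
        return True
    return check


# States visited by a two-step Blocksworld plan: unstack B from A, then stack A on B.
trajectory = [
    {"ontable(A)", "on(B,A)"},
    {"ontable(A)", "ontable(B)"},
    {"ontable(B)", "on(A,B)"},
]

never_c_on_a = always(lambda s: "on(C,A)" not in s)
b_touches_table = sometime(holds("ontable(B)"))
table_before_stack = sometime_before(holds("on(A,B)"), holds("ontable(B)"))
stack_before_table = sometime_before(holds("ontable(B)"), holds("on(A,B)"))

print(never_c_on_a(trajectory), b_touches_table(trajectory))
print(table_before_stack(trajectory), stack_before_table(trajectory))
```

A plan that reaches the goal can still be rejected under such constraints, which is one way constrained tasks become harder than their unconstrained counterparts.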

constraint, large language model, machine learning, (21 more...)

arXiv.org Artificial Intelligence

2510.05972

Country: Europe (0.28)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Language Model as Planner and Formalizer under Constraints

Huang, Cassie, Mohan, Stuti, Yang, Ziyi, Tellex, Stefanie, Zhang, Li

arXiv.org Artificial Intelligence | Oct-8-2025

LLMs have been widely used in planning, either as planners to generate action sequences end-to-end, or as formalizers to represent the planning domain and problem in a formal language that can derive plans deterministically. However, both lines of work rely on standard benchmarks that only include generic and simplistic environmental specifications, leading to potential overestimation of the planning ability of LLMs and safety concerns in downstream tasks. We bridge this gap by augmenting widely used planning benchmarks with manually annotated, fine-grained, and rich natural language constraints spanning four formally defined categories. Over 4 state-of-the-art reasoning LLMs, 3 formal languages, 5 methods, and 4 datasets, we show that the introduction of constraints not only consistently halves performance, but also significantly challenges robustness to problem complexity and lexical shift.

constraint, large language model, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2510.05486

Country:

Asia (0.93)
North America > United States (0.46)
North America > Mexico (0.28)
Europe > Austria > Vienna (0.14)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.51)

Improving Large Language Model Planning with Action Sequence Similarity

Zhao, Xinran, Sedghi, Hanie, Bohnet, Bernd, Schuurmans, Dale, Nova, Azade

arXiv.org Artificial Intelligence | May-5-2025

Planning is essential for artificial intelligence systems to look ahead and proactively determine a course of actions to reach objectives in the virtual and real world. However, it remains unclear what signals in the context influence the model performance. In this work, we explore how to improve the model planning capability through in-context learning (ICL), specifically, what signals can help select the exemplars. Through extensive experiments, we observe that commonly used problem similarity may result in false positives with drastically different plans, which can mislead the model. In response, we propose to sample and filter exemplars leveraging plan side action sequence similarity (AS). We propose GRASE-DC: a two-stage pipeline that first re-samples high AS exemplars and then curates the selected exemplars with dynamic clustering on AS to achieve a balance of relevance and diversity. Our experimental result confirms that GRASE-DC achieves significant performance improvement on various planning tasks (up to ~11-40 point absolute accuracy improvement with 27.3% fewer exemplars needed on average). GRASE-DC can further boost the planning accuracy by ~24 absolute points on harder problems using simpler problems as exemplars over a random baseline. This demonstrates its ability to generalize to out-of-distribution problems.

Planning is important for intelligent agents when exploring the environment and conducting complex multi-hop actions to achieve their goals strategically. Classical studies in planning mainly leverage search-based algorithms and reinforcement learning to tackle these problems. Recent advances in utilizing Large Language Models (LLMs) as the backbone of agents, e.g., for games (ToT, Yao et al., 2023) and travel scheduling (Xie et al., 2024), call for the need to improve model planning ability to facilitate various downstream applications.

Recent work achieves good performance on LLM planning with a combination of search-based algorithms and LLM decoding (Besta et al., 2024; Silver et al., 2024; Lehnert et al., 2024); however, multiple rounds of prompting in a tree structure, e.g., Monte-Carlo Tree Search (MCTS), can lead to high inference cost (Yao et al., 2023). To further improve the effectiveness and efficiency, this paper focuses on improving the planning capability of LLMs with direct prompting in the in-context learning (ICL) (Brown et al., 2020) manner. We aim to seek signals that help select the good demonstrative task-plan examples in the context, i.e. exemplars (Rubin et al., 2022).
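The contrast between problem similarity and plan-side similarity can be sketched in a few lines. The measure below uses Python's `difflib.SequenceMatcher` over lists of action strings as a stand-in; the excerpt does not define how the paper computes AS, so this is an assumed proxy, and `action_sequence_similarity` and `rank_exemplars` are hypothetical names. The sketch also assumes a reference plan is available for the query, which sidesteps how AS would be estimated at inference time.

```python
from difflib import SequenceMatcher


def action_sequence_similarity(plan_a, plan_b):
    """Similarity in [0, 1] between two plans given as lists of action strings."""
    return SequenceMatcher(None, plan_a, plan_b, autojunk=False).ratio()


def rank_exemplars(query_plan, exemplars):
    """Sort (name, plan) exemplars by descending similarity to the query plan."""
    return sorted(
        exemplars,
        key=lambda item: action_sequence_similarity(query_plan, item[1]),
        reverse=True,
    )


query = ["unstack C B", "putdown C", "pickup A", "stack A B"]
exemplars = [
    ("same blocks, different goal", ["unstack C B", "putdown C", "pickup B", "stack B A"]),
    ("same plan", ["unstack C B", "putdown C", "pickup A", "stack A B"]),
    ("unrelated plan", ["pickup D", "stack D E"]),
]

for name, plan in rank_exemplars(query, exemplars):
    print(name, round(action_sequence_similarity(query, plan), 2))
```

The first exemplar shares its opening moves with the query but diverges afterward, the kind of case where two problems look alike on the surface yet call for different plans.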

exemplar, large language model, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2505.01009

Country:

North America > United States (0.28)
Asia > Middle East (0.28)

Genre: Research Report > New Finding (0.66)

Industry: Consumer Products & Services > Travel (0.47)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

Complex LLM Planning via Automated Heuristics Discovery

Ling, Hongyi, Parashar, Shubham, Khurana, Sambhav, Olson, Blake, Basu, Anwesha, Sinha, Gaurangi, Tu, Zhengzhong, Caverlee, James, Ji, Shuiwang

arXiv.org Artificial Intelligence | Feb-26-2025

We consider enhancing large language models (LLMs) for complex planning tasks. While existing methods allow LLMs to explore intermediate steps to make plans, they either depend on unreliable self-verification or external verifiers to evaluate these steps, which demand significant data and computations. Here, we propose automated heuristics discovery (AutoHD), a novel approach that enables LLMs to explicitly generate heuristic functions to guide inference-time search, allowing accurate evaluation of intermediate states. These heuristic functions are further refined through a heuristic evolution process, improving their robustness and effectiveness. Our proposed method requires no additional model training or fine-tuning, and the explicit definition of heuristic functions generated by the LLMs provides interpretability and insights into the reasoning process. Extensive experiments across diverse benchmarks demonstrate significant gains over multiple baselines, including nearly twice the accuracy on some datasets, establishing our approach as a reliable and interpretable solution for complex planning tasks.

arxiv preprint arxiv, heuristic function, llm, (15 more...)

arXiv.org Artificial Intelligence

2502.19295

Country:

North America > United States > Texas (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.31)

Query-Efficient Planning with Language Models

Gonzalez-Pumariega, Gonzalo, Chen, Wayne, Kedia, Kushal, Choudhury, Sanjiban

arXiv.org Artificial Intelligence | Dec-8-2024

Planning in complex environments requires an agent to efficiently query a world model to find a feasible sequence of actions from start to goal. Recent work has shown that Large Language Models (LLMs), with their rich prior knowledge and reasoning capabilities, can potentially help with planning by searching over promising states and adapting to feedback from the world. In this paper, we propose and study two fundamentally competing frameworks that leverage LLMs for query-efficient planning. The first uses LLMs as a heuristic within a search-based planner to select promising nodes to expand and propose promising actions. The second uses LLMs as a generative planner to propose an entire sequence of actions from start to goal, query a world model, and adapt based on feedback. We show that while both approaches improve upon comparable baselines, using an LLM as a generative planner results in significantly fewer interactions. Our key finding is that the LLM as a planner can more rapidly adapt its planning strategies based on immediate feedback than LLM as a heuristic. We present evaluations and ablations on Robotouille and PDDL planning benchmarks and discuss connections to existing theory on query-efficient planning algorithms.

Planning is the process of determining a sequence of feasible or optimal actions that guide an agent from an initial state to a desired goal state (LaValle, 2006). Planning assumes access to a world model, enabling the agent to simulate and evaluate potential actions without relying on trial-and-error in the real environment. However, in many domains, such as robot task and motion planning, querying the world model is the most computationally expensive step (Kaelbling & Lozano-Pérez, 2013; Garrett et al., 2021). For instance, each query involves running physics or geometric computations or even running a local optimizer.

Large language models (LLMs), trained on Internet-scale data, offer multiple opportunities to enable query-efficient planning. Notably, LLMs come with key capabilities such as (1) powerful priors to identify promising states that make progress toward the goal (Ahn et al., 2022), (2) tractable posteriors by easily conditioning on feedback to adaptively choose actions (Lee et al., 2023), and (3) generating complex sequences of actions to plan to the goal (Janner et al., 2021). Recent works leverage one or more such capabilities to design LLM-based agents that solve various decision-making tasks (Yao et al., 2022; Shinn et al., 2023b; Huang et al., 2022b; Zhao et al., 2023). However, we show that naively extending such LLM agents to the planning setting quickly becomes intractable. Such an agent must not only select among all possible state-action queries but also condition on the history of all queries and observations.

default, large language model, natural language, (18 more...)

arXiv.org Artificial Intelligence

2412.06162

Genre: Research Report (0.64)

Industry: Energy > Oil & Gas (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Revealing the Barriers of Language Agents in Planning

Xie, Jian, Zhang, Kexun, Chen, Jiangjie, Yuan, Siyu, Zhang, Kai, Zhang, Yikai, Li, Lei, Xiao, Yanghua

arXiv.org Artificial Intelligence | Oct-16-2024

Autonomous planning has been an ongoing pursuit since the inception of artificial intelligence. Based on curated problem solvers, early planning agents could deliver precise solutions for specific tasks but lacked generalization. The emergence of large language models (LLMs) and their powerful reasoning capabilities has reignited interest in autonomous planning by automatically generating reasonable solutions for given tasks. However, prior research and our experiments show that current language agents still lack human-level planning abilities. Even the state-of-the-art reasoning model, OpenAI o1, achieves only 15.6% on one of the complex real-world planning benchmarks. This highlights a critical question: What hinders language agents from achieving human-level planning? Although existing studies have highlighted weak performance in agent planning, the deeper underlying issues and the mechanisms and limitations of the strategies proposed to address them remain insufficiently understood. In this work, we apply the feature attribution study and identify two key factors that hinder agent planning: the limited role of constraints and the diminishing influence of questions. We also find that although current strategies help mitigate these challenges, they do not fully resolve them, indicating that agents still have a long way to go before reaching human-level intelligence.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2410.12409

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)
Asia > Thailand > Bangkok > Bangkok (0.04)
North America > United States > Ohio (0.04)

Genre: Research Report > New Finding (0.68)

Industry: Consumer Products & Services (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.72)
