Goto

Collaborating Authors

 solvability



The Agent Capability Problem: Predicting Solvability Through Information-Theoretic Bounds

arXiv.org Artificial Intelligence

When should an autonomous agent commit resources to a task? We introduce the Agent Capability Problem (ACP), a framework for predicting whether an agent can solve a problem under resource constraints. Rather than relying on empirical heuristics, ACP frames problem-solving as information acquisition: an agent requires $\Itotal$ bits to identify a solution and gains $\Istep$ bits per action at cost $\Cstep$, yielding an effective cost $\Ceff = (\Itotal/\Istep), \Cstep$ that predicts resource requirements before search. We prove that $\Ceff$ lower-bounds expected cost and provide tight probabilistic upper bounds. Experimental validation shows that ACP predictions closely track actual agent performance, consistently bounding search effort while improving efficiency over greedy and random strategies. The framework generalizes across LLM-based and agentic workflows, linking principles from active learning, Bayesian optimization, and reinforcement learning through a unified information-theoretic lens. \


Verifiable Accuracy and Abstention Rewards in Curriculum RL to Alleviate Lost-in-Conversation

arXiv.org Artificial Intelligence

Large Language Models demonstrate strong capabilities in single-turn instruction following but suffer from Lost-in-Conversation (LiC), a degradation in performance as information is revealed progressively in multi-turn settings. Motivated by the current progress on Reinforcement Learning with Verifiable Rewards (RLVR), we propose Curriculum Reinforcement Learning with Verifiable Accuracy and Abstention Rewards (RLAAR), a framework that encourages models not only to generate correct answers, but also to judge the solvability of questions in the multi-turn conversation setting. Our approach employs a competence-gated curriculum that incrementally increases dialogue difficulty (in terms of instruction shards), stabilizing training while promoting reliability. Using multi-turn, on-policy rollouts and a mixed-reward system, RLAAR teaches models to balance problem-solving with informed abstention, reducing premature answering behaviors that cause LiC. Evaluated on LiC benchmarks, RLAAR significantly mitigates LiC performance decay (62.6% to 75.1%) and improves calibrated abstention rates (33.5% to 73.4%). Together, these results provide a practical recipe for building multi-turn reliable and trustworthy LLMs.


Code-driven Number Sequence Calculation: Enhancing the inductive Reasoning Abilities of Large Language Models

arXiv.org Artificial Intelligence

Large language models (LLMs) make remarkable progress in reasoning tasks. Among different reasoning modes, inductive reasoning, due to its better alignment with human learning, attracts increasing interest. However, research on inductive reasoning faces certain challenges. First, existing inductive data mostly focuses on superficial regularities while lacking more complex internal patterns. Second, current works merely prompt LLMs or finetune on simple prompt-response pairs, but do not provide precise thinking processes nor implement difficulty control. Unlike previous work, we address these challenges by introducing \textit{CodeSeq}, a synthetic post-training dataset built from number sequences. We package number sequences into algorithmic problems to discover their general terms, defining a general term generation (GTG) task correspondingly. Our pipeline generates supervised finetuning data by reflecting on failed test cases and incorporating iterative corrections, thereby teaching LLMs to learn autonomous case generation and self-checking. Additionally, it leverages reinforcement learning with a novel Case-Synergy Solvability Scaling Reward based on both solvability, estimated from the problem pass rate, and the success rate of self-directed case generation, enabling models to learn more effectively from both successes and failures. Experimental results show that the models trained with \textit{CodeSeq} improve on various reasoning tasks and can preserve the models' OOD performance.



PDDLFuse: A Tool for Generating Diverse Planning Domains

arXiv.org Artificial Intelligence

Various real-world challenges require planning algorithms that can adapt to a broad range of domains. Traditionally, the creation of planning domains has relied heavily on human implementation, which limits the scale and diversity of available domains. While recent advancements have leveraged generative AI technologies such as large language models (LLMs) for domain creation, these efforts have predominantly focused on translating existing domains from natural language descriptions rather than generating novel ones. In contrast, the concept of domain randomization, which has been highly effective in reinforcement learning, enhances performance and generalizability by training on a diverse array of randomized new domains. Inspired by this success, our tool, PDDLFuse, aims to bridge this gap in Planning Domain Definition Language (PDDL). PDDLFuse is designed to generate new, diverse planning domains that can be used to validate new planners or test foundational planning models. We have developed methods to adjust the domain generators parameters to modulate the difficulty of the domains it generates. This adaptability is crucial as existing domain-independent planners often struggle with more complex problems. Initial tests indicate that PDDLFuse efficiently creates intricate and varied domains, representing a significant advancement over traditional domain generation methods and making a contribution towards planning research.


Guided Game Level Repair via Explainable AI

arXiv.org Artificial Intelligence

Procedurally generated levels created by machine learning models can be unsolvable without further editing. Various methods have been developed to automatically repair these levels by enforcing hard constraints during the post-processing step. However, as levels increase in size, these constraint-based repairs become increasingly slow. This paper proposes using explainability methods to identify specific regions of a level that contribute to its unsolvability. By assigning higher weights to these regions, constraint-based solvers can prioritize these problematic areas, enabling more efficient repairs. Our results, tested across three games, demonstrate that this approach can help to repair procedurally generated levels faster.


Unleashing Reasoning Capability of LLMs via Scalable Question Synthesis from Scratch

arXiv.org Artificial Intelligence

The availability of high-quality data is one of the most important factors in improving the reasoning capability of LLMs. Existing works have demonstrated the effectiveness of creating more instruction data from seed questions or knowledge bases. Recent research indicates that continually scaling up data synthesis from strong models (e.g., GPT-4) can further elicit reasoning performance. Though promising, the open-sourced community still lacks high-quality data at scale and scalable data synthesis methods with affordable costs. To address this, we introduce ScaleQuest, a scalable and novel data synthesis method that utilizes "smallsize" (e.g., 7B) open-source models to generate questions from scratch without the need for seed data with complex augmentation constraints. With the efficient ScaleQuest, we automatically constructed a mathematical reasoning dataset consisting of 1 million problem-solution pairs, which are more effective than existing open-sourced datasets. It can universally increase the performance of mainstream open-source models (i.e., Mistral, Llama3, DeepSeekMath, and Qwen2-Math) by achieving 29.2% to 46.4% gains on MATH. Notably, simply fine-tuning the Qwen2-Math-7B-Base model with our dataset can even surpass Qwen2-Math-7B-Instruct, a strong and well-aligned model on closed-source data, and proprietary models such as GPT-4-Turbo and Claude-3.5 Right: Results of Llama3-8B fine-tuned on publicly available datasets constructed by different methods. Juntao Li is the corresponding author. How to improve the reasoning capabilities of Large Language Models (LLMs) has attracted significant attention. The success of recent advanced models, such as OpenAI o1 and Claude-3.5, However, the proprietary nature of the data presents a significant barrier to the open-source community.


Layered LA-MAPF: a decomposition of large agent MAPF instance to accelerate solving without compromising solvability

arXiv.org Artificial Intelligence

Multi-Agent Path Finding (MAPF) has been widely studied in recent years. However, most existing MAPF algorithms assume that an agent occupies only a single grid in a grid-based map. This assumption limits their applicability in many real-world domains where agents have geometric shapes, rather than being point-like. Such agents, which can occupy multiple cells simultaneously, are referred to as ``large'' agents. When considering the shape and size of agents in MAPF, the computational complexity increases significantly as the number of agents grows, primarily due to the increased overhead in conflict detection between geometric agents. In this paper, we propose two types of subproblems for the LA-MAPF (Large-Agent MAPF) problem: \textbf{cluster} (which has no constraints on the order of solution) and \textbf{level} (which imposes constraints on the solution order). We introduce \textbf{Layered LA-MAPF}, a method that decomposes a MAPF instance involving geometric agents into clusters, and then further decomposes each cluster into levels. This approach aims to reduce time complexity when solving LA-MAPF problems. Our results demonstrate the performance of our method as the number of agents increases across various maps, and how it accelerates LA-MAPF methods, such as LA-CBS and LA-LaCAM. Experiments show that our LA-MAPF method with instance decomposition \textbf{halves the time cost (reducing from an average of 40s to 20s) and triples the success rate (from an average of 0.27 to 0.80)} in finding a solution within 60 seconds. To facilitate further research, we have made the source code for Layered LA-MAPF publicly available at \url{https://github.com/JoeYao-bit/LayeredMAPF/algorithm/LA-MAPF}.


Bipolar fuzzy relation equations systems based on the product t-norm

arXiv.org Artificial Intelligence

Bipolar fuzzy relation equations arise as a generalization of fuzzy relation equations considering unknown variables together with their logical connective negations. The occurrence of a variable and the occurrence of its negation simultaneously can give very useful information for certain frameworks where the human reasoning plays a key role. Hence, the resolution of bipolar fuzzy relation equations systems is a research topic of great interest. This paper focuses on the study of bipolar fuzzy relation equations systems based on the max-product t-norm composition. Specifically, the solvability and the algebraic structure of the set of solutions of these bipolar equations systems will be studied, including the case in which such systems are composed of equations whose independent term be equal to zero. As a consequence, this paper complements the contribution carried out by the authors on the solvability of bipolar max-product fuzzy relation equations.