AITopics

Reverse-Engineered Reasoning for Open-Ended Generation

Wang, Haozhe, Que, Haoran, Xu, Qixin, Liu, Minghao, Zhou, Wangchunshu, Feng, Jiazhan, Zhong, Wanjun, Ye, Wei, Yang, Tong, Huang, Wenhao, Zhang, Ge, Lin, Fangzhen

While the "deep reasoning" paradigm has spurred significant advances in verifiable domains like mathematics, its application to open-ended, creative generation remains a critical challenge. The two dominant methods for instilling reasoning, reinforcement learning (RL) and instruction distillation, falter in this area; RL struggles with the absence of clear reward signals and high-quality reward models, while distillation is prohibitively expensive and capped by the teacher model's capabilities. To overcome these limitations, we introduce REverse-Engineered Reasoning (REER), a new paradigm that fundamentally shifts the approach. Instead of building a reasoning process "forwards" through trial-and-error or imitation, REER works "backwards" from known-good solutions to computationally discover the latent, step-by-step deep reasoning process that could have produced them. Using this scalable, gradient-free approach, we curate and open-source DeepWriting-20K, a large-scale dataset of 20,000 deep reasoning trajectories for open-ended tasks. Our model, DeepWriter-8B, trained on this data, not only surpasses strong open-source baselines but also achieves performance competitive with, and at times superior to, leading proprietary models like GPT-4o and Claude 3.5.

large language model, machine learning, natural language, (18 more...)

2509.0616

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
(2 more...)

Babakano, Faiza, Fahmin, Ahmed, Shen, Bojie, Cheema, Muhammad Aamir, Siddiqui, Isma Farah

Energy-Efficient Path Planning with Multi-Location Object Pickup for Mobile Robots on Uneven Terrain

Autonomous Mobile Robots (AMRs) operate on battery power, making energy efficiency a critical consideration, particularly in outdoor environments where terrain variations affect energy consumption. While prior research has primarily focused on computing energy-efficient paths from a source to a destination, these approaches often overlook practical scenarios where a robot needs to pick up an object en route, an action that can significantly impact energy consumption due to changes in payload. This paper introduces the Object-Pickup Minimum Energy Path Problem (OMEPP), which addresses energy-efficient route planning for AMRs required to pick up an object from one of many possible locations and take it to a destination. To address OMEPP, we first introduce a baseline algorithm that employs the Z* algorithm, a variant of A* tailored for energy-efficient routing, to iteratively visit each pickup point. While this approach guarantees optimality, it suffers from high computational cost due to repeated search efforts at each pickup location. To mitigate this inefficiency, we propose a concurrent PCPD search that manages multiple Z* searches simultaneously across all pickup points. Central to our solution is the Payload-Constrained Path Database (PCPD), an extension of the Compressed Path Database (CPD), a state-of-the-art technique for fast shortest path computation, that incorporates payload constraints. We further demonstrate that PCPD significantly reduces branching factors during search, leading to improved overall performance. Although the concurrent PCPD search may produce slightly suboptimal solutions, extensive experiments on real-world datasets demonstrate that it achieves near-optimal performance while being one to two orders of magnitude faster than the baseline algorithm derived from existing methods.
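The baseline idea in this abstract (try every pickup point and keep the cheapest total) can be illustrated with a minimal sketch. It is not the authors' method: plain Dijkstra stands in for Z*, and the graph, the node names, and the `loaded_factor` multiplier that models extra energy when carrying the object are all assumptions made for illustration.

```python
import heapq

def min_energy(graph, source, payload_factor):
    """Dijkstra over directed edges; each edge energy is scaled by payload_factor."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, base_energy in graph.get(u, []):
            nd = d + base_energy * payload_factor
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

def best_pickup(graph, source, destination, pickups, loaded_factor=1.5):
    """Return (total_energy, pickup) minimising unloaded leg + loaded leg."""
    unloaded = min_energy(graph, source, 1.0)  # one search from the source
    best = (float("inf"), None)
    for p in pickups:
        if p not in unloaded:
            continue  # pickup not reachable from the source
        loaded = min_energy(graph, p, loaded_factor)  # repeated search per pickup
        if destination not in loaded:
            continue
        total = unloaded[p] + loaded[destination]
        if total < best[0]:
            best = (total, p)
    return best

# Hypothetical terrain graph: node -> list of (neighbour, unloaded edge energy).
graph = {
    "S": [("A", 2.0), ("B", 5.0)],
    "A": [("D", 6.0)],
    "B": [("D", 1.0)],
    "D": [],
}
result = best_pickup(graph, "S", "D", ["A", "B"])
```

In this toy graph the nearer pickup `A` is not the best choice, because the loaded leg from `A` is expensive. The per-pickup search inside the loop is the repeated effort that the abstract identifies as the baseline's cost.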

algorithm, artificial intelligence, planning & scheduling, (14 more...)

2509.06061

Country:

Europe (1.00)
Asia (1.00)
North America > United States (0.68)

Genre: Research Report > New Finding (0.67)

Industry: Energy > Energy Storage (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.93)
Information Technology > Artificial Intelligence > Robots > Locomotion (0.82)

Scenario-based Decision-making Using Game Theory for Interactive Autonomous Driving: A Survey

Lin, Zhihao, Tian, Zhen

Game-based interactive driving simulations have emerged as versatile platforms for advancing decision-making algorithms in road transport mobility. While these environments offer safe, scalable, and engaging settings for testing driving strategies, ensuring both realism and robust performance amid dynamic and diverse scenarios remains a significant challenge. Recently, the integration of game-based techniques with advanced learning frameworks has enabled the development of adaptive decision-making models that effectively manage the complexities inherent in varied driving conditions. These models outperform traditional simulation methods, especially when addressing scenario-specific challenges, ranging from obstacle avoidance on highways and precise maneuvering during on-ramp merging to navigation in roundabouts, unsignalized intersections, and even the high-speed demands of autonomous racing. Despite numerous innovations in game-based interactive driving, a systematic review comparing these approaches across different scenarios is still missing. This survey provides a comprehensive evaluation of game-based interactive driving methods by summarizing recent advancements and inherent roadway features in each scenario. Furthermore, the reviewed algorithms are critically assessed based on their adaptation of the standard game model and an analysis of their specific mechanisms to understand their impact on decision-making performance. Finally, the survey discusses the limitations of current approaches and outlines promising directions for future research.

machine learning, real time system, reinforcement learning, (20 more...)

2509.05777

Country: Europe > United Kingdom (0.27)

Genre:

Research Report (1.00)
Overview (1.00)

Industry:

Transportation > Infrastructure & Services (1.00)
Transportation > Ground > Road (1.00)
Leisure & Entertainment > Games (1.00)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
(6 more...)

Ramesh, Vignav, Mardani, Morteza

Test-Time Scaling of Diffusion Models via Noise Trajectory Search

The iterative and stochastic nature of diffusion models enables test-time scaling, whereby spending additional compute during denoising generates higher-fidelity samples. Increasing the number of denoising steps is the primary scaling axis, but this yields quickly diminishing returns. Optimizing the noise trajectory (the sequence of injected noise vectors) is a promising alternative, as the specific noise realizations critically affect sample quality, but it is challenging due to a high-dimensional search space, complex noise-outcome interactions, and costly trajectory evaluations. We address this by first casting diffusion as a Markov Decision Process (MDP) with a terminal reward, showing tree-search methods such as Monte Carlo tree search (MCTS) to be meaningful but impractical. To balance performance and efficiency, we then resort to a relaxation of the MDP, in which we view denoising as a sequence of independent contextual bandits. This allows us to introduce an $ε$-greedy search algorithm that globally explores at extreme timesteps and locally exploits during the intermediate steps where de-mixing occurs. Experiments on EDM and Stable Diffusion reveal state-of-the-art scores for class-conditioned/text-to-image generation, exceeding baselines by up to $164\%$ and matching/exceeding MCTS performance. To our knowledge, this is the first practical method for test-time noise trajectory optimization of arbitrary (non-differentiable) rewards.
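To make the bandit view concrete, here is a toy sketch of an ε-greedy search over the noise injected at a single denoising step. It is an illustration under stated assumptions, not the paper's algorithm: the function `eps_greedy_noise`, the linear "denoising step", the target, and the parameters `eps` and `sigma` are all hypothetical, and the reward is a stand-in for an arbitrary black-box score.

```python
import random

def eps_greedy_noise(reward, dim, iters, eps, sigma, rng):
    """Search for a noise vector with high reward at one denoising step.

    With probability eps a fresh Gaussian sample is drawn (global exploration);
    otherwise the current best is perturbed (local exploitation).
    The best candidate seen so far is always kept.
    """
    best = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    best_r = reward(best)
    for _ in range(iters):
        if rng.random() < eps:
            cand = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        else:
            cand = [b + sigma * rng.gauss(0.0, 1.0) for b in best]
        r = reward(cand)
        if r > best_r:
            best, best_r = cand, r
    return best, best_r

def toy_reward(noise):
    """Hypothetical black-box reward: closeness of a toy step result to a target."""
    x = [1.0, -1.0]          # current (toy) latent
    target = [0.5, 0.0]      # what the toy reward prefers
    step = [0.9 * xi + 0.1 * ni for xi, ni in zip(x, noise)]
    return -sum((s - t) ** 2 for s, t in zip(step, target))

# Same seed, so both runs start from the same initial noise draw.
_, r_no_search = eps_greedy_noise(toy_reward, 2, 0, 0.2, 0.3, random.Random(0))
_, r_search = eps_greedy_noise(toy_reward, 2, 200, 0.2, 0.3, random.Random(0))
```

Because the search only ever replaces the incumbent with a strictly better candidate, the searched reward can never fall below the reward of the initial draw. Only reward evaluations are used, so no gradient of the reward is required.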

artificial intelligence, machine learning, reward function, (17 more...)

2506.03164

Genre: Research Report (1.00)

Industry: Health & Medicine (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

arXiv.org Artificial Intelligence Sep-8-2025

Planning from Point Clouds over Continuous Actions for Multi-object Rearrangement

Saha, Kallol, Li, Amber, Rodriguez-Izquierdo, Angela, Yu, Lifan, Eisner, Ben, Likhachev, Maxim, Held, David

Long-horizon planning for robot manipulation is a challenging problem that requires reasoning about the effects of a sequence of actions on a physical 3D scene. While traditional task planning methods are shown to be effective for long-horizon manipulation, they require discretizing the continuous state and action space into symbolic descriptions of objects, object relationships, and actions. Instead, we propose a hybrid learning-and-planning approach that leverages learned models as domain-specific priors to guide search in high-dimensional continuous action spaces. We introduce SPOT: Search over Point cloud Object Transformations, which plans by searching for a sequence of transformations from an initial scene point cloud to a goal-satisfying point cloud. SPOT samples candidate actions from learned suggesters that operate on partially observed point clouds, eliminating the need to discretize actions or object relationships. We evaluate SPOT on multi-object rearrangement tasks, reporting task planning success and task execution success in both simulation and real-world environments. Our experiments show that SPOT generates successful plans and outperforms a policy-learning approach. We also perform ablations that highlight the importance of search-based planning.

artificial intelligence, configuration, machine learning, (16 more...)

2509.04645

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Robots > Robot Planning & Action (0.86)

Dernedde, Tim, Thyssens, Daniela, Schmidt-Thieme, Lars

Recurrent State Encoders for Efficient Neural Combinatorial Optimization

arXiv.org Artificial Intelligence Sep-8-2025

The primary paradigm in Neural Combinatorial Optimization (NCO) is that of construction methods, where a neural network is trained to sequentially add one solution component at a time until a complete solution is constructed. We observe that the typical changes to the state between two steps are small, since usually only the node that gets added to the solution is removed from the state. An efficient model should be able to reuse computation done in prior steps. To that end, we propose to train a recurrent encoder that computes the state embeddings based not only on the state but also on the embeddings of the previous step. We show that the recurrent encoder can achieve equivalent or better performance than a non-recurrent encoder even if it consists of $3\times$ fewer layers, thus significantly improving latency. We demonstrate our findings on three different problems: the Traveling Salesman Problem (TSP), the Capacitated Vehicle Routing Problem (CVRP), and the Orienteering Problem (OP). We also integrate the models into a large neighborhood search algorithm to showcase the practical relevance of our findings.

artificial intelligence, encoder, machine learning, (14 more...)

2509.05084

Country: North America > United States (0.46)

Genre: Research Report > New Finding (0.54)

Industry: Transportation (0.55)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

arXiv.org Artificial Intelligence Sep-8-2025

CoVeR: Conformal Calibration for Versatile and Reliable Autoregressive Next-Token Prediction

Chen, Yuzhu, Wang, Yingjie, Liu, Shunyu, Jing, Yongcheng, Tao, Dacheng

Autoregressive pre-trained models combined with decoding methods have achieved impressive performance on complex reasoning tasks. While mainstream decoding strategies such as beam search can generate plausible candidate sets, they often lack provable coverage guarantees and struggle to effectively balance search efficiency with the need for versatile trajectories, particularly those involving long-tail sequences that are essential in certain real-world applications. To address these limitations, we propose CoVeR, a novel model-free decoding strategy within the conformal prediction framework that simultaneously maintains a compact search space and ensures high coverage probability over desirable trajectories. Theoretically, we establish a PAC-style generalization bound, guaranteeing that CoVeR asymptotically achieves a coverage rate of at least $1 - α$ for any target level $α\in (0,1)$.
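For readers unfamiliar with the conformal prediction framework mentioned here, the following sketch shows standard split conformal calibration, which is the generic technique and not CoVeR itself. The nonconformity scores, the candidate tokens, and the helper names are assumptions chosen for illustration; a common score is one minus the model probability of the true token.

```python
import math

def conformal_threshold(cal_scores, alpha):
    """Split conformal quantile: the ceil((n + 1) * (1 - alpha))-th smallest score.

    Under exchangeability, a new true label's score falls at or below this
    threshold with probability at least 1 - alpha.
    """
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return float("inf")  # too few calibration points: keep every candidate
    return sorted(cal_scores)[k - 1]

def prediction_set(candidate_scores, threshold):
    """Keep every candidate whose nonconformity score is within the threshold."""
    return [c for c, s in candidate_scores.items() if s <= threshold]

# Hypothetical calibration scores, e.g. 1 - p(true token) on held-out data.
cal_scores = [0.05, 0.40, 0.15, 0.30, 0.10, 0.60, 0.20]
threshold = conformal_threshold(cal_scores, alpha=0.25)

# Hypothetical next-token candidates with their nonconformity scores.
candidates = {"cat": 0.10, "dog": 0.35, "eel": 0.70}
kept = prediction_set(candidates, threshold)
```

With seven calibration scores and `alpha=0.25`, the rank is `ceil(8 * 0.75) = 6`, so the threshold is the sixth smallest score. A smaller `alpha` raises the threshold and enlarges the kept set, which is the trade-off between coverage and a compact search space that the abstract refers to.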

machine learning, natural language, prediction, (18 more...)

2509.04733

Genre: Research Report (0.84)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.70)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)

Struleva, Marina, Hundrieser, Shayan, Schuhmacher, Dominic, Munk, Axel

Sharp Convergence Rates of Empirical Unbalanced Optimal Transport for Spatio-Temporal Point Processes

arXiv.org Machine Learning Sep-5-2025

We statistically analyze empirical plug-in estimators for unbalanced optimal transport (UOT) formalisms, focusing on the Kantorovich-Rubinstein distance, between general intensity measures based on observations from spatio-temporal point processes. Specifically, we model the observations by two weakly time-stationary point processes with spatial intensity measures $μ$ and $ν$ over the expanding window $(0,t]$ as $t$ increases to infinity, and establish sharp convergence rates of the empirical UOT in terms of the intrinsic dimensions of the measures. We assume a sub-quadratic temporal growth condition of the variance of the process, which allows for a wide range of temporal dependencies. As the growth approaches quadratic, the convergence rate becomes slower. This variance assumption is related to the time-reduced factorial covariance measure, and we exemplify its validity for various point processes, including the Poisson cluster, Hawkes, Neyman-Scott, and log-Gaussian Cox processes. Complementary to our upper bounds, we also derive matching lower bounds for various spatio-temporal point processes of interest and establish near minimax rate optimality of the empirical Kantorovich-Rubinstein distance.

artificial intelligence, machine learning, point process, (18 more...)

2509.04225

Country:

Europe > Germany > Lower Saxony > Gottingen (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)
(4 more...)

Genre: Research Report (1.00)

Industry:

Government (1.00)
Health & Medicine > Therapeutic Area > Neurology (0.45)
Health & Medicine > Diagnostic Medicine > Imaging (0.45)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science (0.92)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.34)

Luise, Alberto, Lombardi, Michele, Koenigsbuch, Florent Teichteil

Hybrid Reinforcement Learning and Search for Flight Trajectory Planning

arXiv.org Artificial Intelligence Sep-5-2025

This paper explores the combination of Reinforcement Learning (RL) and search-based path planners to speed up the optimization of flight paths for airliners, where fast route recalculation can be crucial in an emergency. The fundamental idea is to train an RL agent to pre-compute near-optimal paths based on location and atmospheric data, and to use those paths at runtime to constrain the underlying path planning solver so that it finds a solution within a certain distance of the initial guess. The approach effectively reduces the size of the solver's search space, significantly speeding up route optimization. Although global optimality is not guaranteed, empirical results obtained with Airbus aircraft performance models show that fuel consumption remains nearly identical to that of an unconstrained solver, with deviations typically within 1%. At the same time, computation speed can be improved by up to 50% compared to using a conventional solver alone.
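The idea of constraining a solver to stay near an initial guess can be sketched on a toy grid. This is an illustration only: the guess path is hard-coded where the paper would use the RL agent's output, Dijkstra stands in for the actual flight-path solver, and the grid, `radius`, and function names are hypothetical.

```python
import heapq

def grid_search(cost, start, goal, allowed=None):
    """Dijkstra on a 4-connected grid; entering a cell costs cost[r][c].

    If `allowed` is given, only cells in that set may be entered.
    Returns (path_cost, number_of_expanded_cells).
    """
    rows, cols = len(cost), len(cost[0])
    dist = {start: 0}
    heap = [(0, start)]
    expanded = 0
    while heap:
        d, (r, c) = heapq.heappop(heap)
        if d > dist[(r, c)]:
            continue  # stale heap entry
        expanded += 1
        if (r, c) == goal:
            return d, expanded
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if not (0 <= nr < rows and 0 <= nc < cols):
                continue
            if allowed is not None and (nr, nc) not in allowed:
                continue
            nd = d + cost[nr][nc]
            if nd < dist.get((nr, nc), float("inf")):
                dist[(nr, nc)] = nd
                heapq.heappush(heap, (nd, (nr, nc)))
    return None, expanded

def corridor(guess_path, radius, rows, cols):
    """All in-bounds cells within Manhattan distance `radius` of the guess path."""
    cells = set()
    for r, c in guess_path:
        for dr in range(-radius, radius + 1):
            for dc in range(-radius, radius + 1):
                nr, nc = r + dr, c + dc
                if abs(dr) + abs(dc) <= radius and 0 <= nr < rows and 0 <= nc < cols:
                    cells.add((nr, nc))
    return cells

# Hypothetical cost map with an expensive region in the middle.
cost = [
    [1, 1, 1, 1, 1],
    [1, 5, 5, 5, 1],
    [1, 5, 1, 5, 1],
    [1, 5, 5, 5, 1],
    [1, 1, 1, 1, 1],
]
# Stand-in for a learned initial guess: along the top row, then down the right column.
guess = [(0, 0), (0, 1), (0, 2), (0, 3), (0, 4), (1, 4), (2, 4), (3, 4), (4, 4)]
allowed = corridor(guess, radius=0, rows=5, cols=5)

full_cost, full_expanded = grid_search(cost, (0, 0), (4, 4))
fast_cost, fast_expanded = grid_search(cost, (0, 0), (4, 4), allowed)
```

The constrained search can never return a cheaper path than the unconstrained one, and it expands at most as many cells as the corridor contains. When the guess is good, as in this toy map, the two costs coincide while far fewer cells are expanded.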

machine learning, reinforcement learning, trajectory, (16 more...)

2509.041

Country: Europe > Italy (0.14)

Genre: Research Report (1.00)

Industry: Transportation > Air (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)