online planning
AdaptiveOnlinePacking-guidedSearchforPOMDPs
Thepartially observableMarkovdecision process (POMDP) provides ageneral framework for modeling an agent's decision process with state uncertainty, and online planning plays a pivotal role in solving it. A belief is a distribution of states representing state uncertainty. Methods forlarge-scale POMDP problems rely on the same idea of sampling both states and observations.
- Europe > Netherlands > South Holland > Delft (0.04)
- North America > Canada (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
- (3 more...)
Online Planning with Lookahead Policies
Real Time Dynamic Programming (RTDP) is an online algorithm based on Dynamic Programming (DP) that acts by 1-step greedy planning. Unlike DP, RTDP does not require access to the entire state space, i.e., it explicitly handles the exploration. This fact makes RTDP particularly appealing when the state space is large and it is not possible to update all states simultaneously. In this we devise a multi-step greedy RTDP algorithm, which we call $h$-RTDP, that replaces the 1-step greedy policy with a $h$-step lookahead policy. We analyze $h$-RTDP in its exact form and establish that increasing the lookahead horizon, $h$, results in an improved sample complexity, with the cost of additional computations. This is the first work that proves improved sample complexity as a result of {\em increasing} the lookahead horizon in online planning. We then analyze the performance of $h$-RTDP in three approximate settings: approximate model, approximate value updates, and approximate state representation. For these cases, we prove that the asymptotic performance of $h$-RTDP remains the same as that of a corresponding approximate DP algorithm, the best one can hope for without further assumptions on the approximation errors.
- Europe > Netherlands > South Holland > Delft (0.04)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
- (3 more...)
DyRo-MCTS: A Robust Monte Carlo Tree Search Approach to Dynamic Job Shop Scheduling
Chen, Ruiqi, Mei, Yi, Zhang, Fangfang, Zhang, Mengjie
Dynamic job shop scheduling, a fundamental combinatorial optimisation problem in various industrial sectors, poses substantial challenges for effective scheduling due to frequent disruptions caused by the arrival of new jobs. State-of-the-art methods employ machine learning to learn scheduling policies offline, enabling rapid responses to dynamic events. However, these offline policies are often imperfect, necessitating the use of planning techniques such as Monte Carlo Tree Search (MCTS) to improve performance at online decision time. The unpredictability of new job arrivals complicates online planning, as decisions based on incomplete problem information are vulnerable to disturbances. To address this issue, we propose the Dynamic Robust MCTS (DyRo-MCTS) approach, which integrates action robustness estimation into MCTS. DyRo-MCTS guides the production environment toward states that not only yield good scheduling outcomes but are also easily adaptable to future job arrivals. Extensive experiments show that DyRo-MCTS significantly improves the performance of offline-learned policies with negligible additional online planning time. Moreover, DyRo-MCTS consistently outperforms vanilla MCTS across various scheduling scenarios. Further analysis reveals that its ability to make robust scheduling decisions leads to long-term, sustainable performance gains under disturbances.
- Asia > Singapore (0.04)
- Oceania > New Zealand (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- Asia > China > Jiangsu Province > Nanjing (0.04)
- Asia > China > Guangdong Province > Guangzhou (0.04)
Online Planning for Cooperative Air-Ground Robot Systems with Unknown Fuel Requirements
Agarwal, Ritvik, Hatami, Behnoushsadat, Gautam, Alvika, Maini, Parikshit
We consider an online variant of the fuel-constrained UAV routing problem with a ground-based mobile refueling station (FCURP-MRS), where targets incur unknown fuel costs. We develop a two-phase solution: an offline heuristic-based planner computes initial UAV and UGV paths, and a novel online planning algorithm that dynamically adjusts rendezvous points based on real-time fuel consumption during target processing. Preliminary Gazebo simulations demonstrate the feasibility of our approach in maintaining UAV-UGV path validity, ensuring mission completion. Link to video: https://youtu.be/EmpVj-fjqNY
- Transportation (0.88)
- Law > Statutes (0.40)
- Law > Environmental Law (0.40)
Review for NeurIPS paper: Online Planning with Lookahead Policies
Additional Feedback: COMMENTS AFTER REBUTTAL Thank you for your response. However, in this paper's case I find that the significance of the paper (i.e., support for your claim that "theoretical results provided in this work are important on their own") is severely lacking without experiments showing a link between this theory and an algorithm's performance in terms of measures like running time, number of 1-step Bellman backups, etc. ***Note: this is not a claim that every theoretical paper needs experiments; it applies only to this specific work, due to the theory issues mentioned in the original review.*** The rebuttal's attempted arguments against providing experiments really miss the mark: -- The rebuttal gives the "Beyond the one step greedy approach in RL" as an example of a paper similar in the degree of its theoretical focus to this submission, but that paper actually has experiments! Light experiments could do the job. That "Beyond the one step greedy approach in RL" paper that you mentioned yourself is a case in point.
Autonomous Tail-Sitter Flights in Unknown Environments
Lu, Guozheng, Ren, Yunfan, Zhu, Fangcheng, Li, Haotian, Xue, Ruize, Cai, Yixi, Lyu, Ximin, Zhang, Fu
Trajectory generation for fully autonomous flights of tail-sitter unmanned aerial vehicles (UAVs) presents substantial challenges due to their highly nonlinear aerodynamics. In this paper, we introduce, to the best of our knowledge, the world's first fully autonomous tail-sitter UAV capable of high-speed navigation in unknown, cluttered environments. The UAV autonomy is enabled by cutting-edge technologies including LiDAR-based sensing, differential-flatness-based trajectory planning and control with purely onboard computation. In particular, we propose an optimization-based tail-sitter trajectory planning framework that generates high-speed, collision-free, and dynamically-feasible trajectories. To efficiently and reliably solve this nonlinear, constrained \textcolor{black}{problem}, we develop an efficient feasibility-assured solver, EFOPT, tailored for the online planning of tail-sitter UAVs. We conduct extensive simulation studies to benchmark EFOPT's superiority in planning tasks against conventional NLP solvers. We also demonstrate exhaustive experiments of aggressive autonomous flights with speeds up to 15m/s in various real-world environments, including indoor laboratories, underground parking lots, and outdoor parks. A video demonstration is available at https://youtu.be/OvqhlB2h3k8, and the EFOPT solver is open-sourced at https://github.com/hku-mars/EFOPT.
- Asia > China (0.28)
- North America > United States (0.28)
- Transportation > Air (1.00)
- Information Technology > Robotics & Automation (1.00)
- Aerospace & Defense > Aircraft (1.00)
- Energy > Oil & Gas > Upstream (0.46)
Overcoming Slow Decision Frequencies in Continuous Control: Model-Based Sequence Reinforcement Learning for Model-Free Control
Patel, Devdhar, Siegelmann, Hava
Reinforcement learning (RL) is rapidly reaching and surpassing human-level control capabilities. However, state-of-the-art RL algorithms often require timesteps and reaction times significantly faster than human capabilities, which is impractical in real-world settings and typically necessitates specialized hardware. Such speeds are difficult to achieve in the real world and often requires specialized hardware. We introduce Sequence Reinforcement Learning (SRL), an RL algorithm designed to produce a sequence of actions for a given input state, enabling effective control at lower decision frequencies. SRL addresses the challenges of learning action sequences by employing both a model and an actor-critic architecture operating at different temporal scales. We propose a "temporal recall" mechanism, where the critic uses the model to estimate intermediate states between primitive actions, providing a learning signal for each individual action within the sequence. Once training is complete, the actor can generate action sequences independently of the model, achieving model-free control at a slower frequency. We evaluate SRL on a suite of continuous control tasks, demonstrating that it achieves performance comparable to state-of-the-art algorithms while significantly reducing actor sample complexity. To better assess performance across varying decision frequencies, we introduce the Frequency-Averaged Score (FAS) metric. Our results show that SRL significantly outperforms traditional RL algorithms in terms of FAS, making it particularly suitable for applications requiring variable decision frequencies. Additionally, we compare SRL with model-based online planning, showing that SRL achieves superior FAS while leveraging the same model during training that online planners use for planning.
- North America > United States > Massachusetts > Hampshire County > Amherst (0.14)
- North America > United States > California > Los Angeles County > Los Angeles (0.14)
- Asia > Middle East > Jordan (0.04)
- Health & Medicine > Therapeutic Area > Neurology (1.00)
- Leisure & Entertainment (0.68)
- Information Technology (0.68)