Online Planning with Lookahead Policies

Oct-10-2024, 23:39:28 GMT–Neural Information Processing Systems

Real Time Dynamic Programming (RTDP) is an online algorithm based on Dynamic Programming (DP) that acts by 1-step greedy planning. Unlike DP, RTDP does not require access to the entire state space, i.e., it explicitly handles the exploration. This fact makes RTDP particularly appealing when the state space is large and it is not possible to update all states simultaneously. In this we devise a multi-step greedy RTDP algorithm, which we call h -RTDP, that replaces the 1-step greedy policy with a h -step lookahead policy. We analyze h -RTDP in its exact form and establish that increasing the lookahead horizon, h, results in an improved sample complexity, with the cost of additional computations.

algorithm, lookahead policy, online planning, (5 more...)

Neural Information Processing Systems

Oct-10-2024, 23:39:28 GMT

Conferences Web Page

Add feedback

Technology:
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)