action persistence
Select before Act: Spatially Decoupled Action Repetition for Continuous Control
Nie, Buqing, Fu, Yangqing, Gao, Yue
Reinforcement Learning (RL) has achieved remarkable success in various continuous control tasks, such as robot manipulation and locomotion. Unlike mainstream RL, which makes decisions at individual steps, recent studies have incorporated action repetition into RL, achieving enhanced action persistence with improved sample efficiency and superior performance. However, existing methods treat all action dimensions as a whole during repetition, ignoring variations among them. This constraint makes decisions inflexible, reducing policy agility and effectiveness. In this work, we propose a novel repetition framework called SDAR, which implements Spatially Decoupled Action Repetition by performing closed-loop act-or-repeat selection for each action dimension individually. SDAR achieves more flexible repetition strategies, leading to an improved balance between action persistence and diversity. Compared to existing repetition frameworks, SDAR is more sample efficient, with higher policy performance and reduced action fluctuation. Experiments are conducted on various continuous control scenarios, demonstrating the effectiveness of the spatially decoupled repetition design proposed in this work.
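The core idea of per-dimension act-or-repeat selection can be sketched as follows. This is a minimal illustration, not the paper's implementation: `select_policy` and `act_policy` are hypothetical stand-ins for SDAR's learned selection and action components.

```python
import numpy as np

def sdar_step(select_policy, act_policy, state, prev_action):
    """One step of spatially decoupled action repetition (illustrative sketch).

    select_policy(state, prev_action) -> binary mask over action dimensions
    (1 = take a fresh value, 0 = repeat the previous one).
    act_policy(state) -> a fresh action proposal.
    """
    mask = select_policy(state, prev_action)   # per-dimension act-or-repeat decision
    fresh = act_policy(state)                  # new action proposal
    # Repeated dimensions keep their previous value; the rest are updated.
    return np.where(mask == 1, fresh, prev_action)
```

Because the mask is chosen anew at every step from the current state, the selection remains closed-loop: a dimension can stop repeating as soon as the state calls for it.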
Review for NeurIPS paper: Reinforcement Learning for Control with Multiple Frequencies
Summary and Contributions: This work introduces an algorithm for reinforcement learning in settings with factored action spaces in which each element of the action space may have a different control frequency. To motivate the necessity of such an algorithm, it provides an argument that in this setting, a naive approach with a stationary Markovian policy on the states (which does not observe the timestep) can be suboptimal. Further, it argues that simply augmenting the state or action spaces and applying standard RL methods results in costs that are exponential in L, the least common multiple of the set of action persistences. In constructing the method, this paper introduces c-persistent Bellman operators, a way of updating a Q-function in an environment with multiple action persistences, and proves their convergence. This leads to a method that uses L Q-functions, one for each step in the periodic structure of action persistences.
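The periodic structure the review refers to can be made concrete with a short sketch. Under the assumption of fixed per-dimension persistences (the helper name `controllable_dims` is invented for illustration), the joint schedule of which dimensions may change repeats with period L, the least common multiple of the persistences, which is why the method keeps one Q-function per step of that period.

```python
from functools import reduce
from math import lcm

def controllable_dims(persistences, t):
    """Indices of action dimensions allowed to change at timestep t,
    given each dimension's control persistence (steps between decisions)."""
    return [i for i, c in enumerate(persistences) if t % c == 0]

persistences = [1, 2, 3]
# The joint pattern of controllable dimensions repeats every L steps.
L = reduce(lcm, persistences)
```

For persistences [1, 2, 3], L = 6: at t = 0 all three dimensions may change, while at t = 2 only the first two may, and the schedule repeats from t = 6 onward.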
State-Novelty Guided Action Persistence in Deep Reinforcement Learning
Hu, Jianshu, Weng, Paul, Ban, Yutong
While a powerful and promising approach, deep reinforcement learning (DRL) still suffers from sample inefficiency, which can be notably improved by resorting to more sophisticated techniques to address the exploration-exploitation dilemma. One such technique relies on action persistence (i.e., repeating an action over multiple steps). However, previous work exploiting action persistence either applies a fixed strategy or learns additional value functions (or a policy) for selecting the repetition number. In this paper, we propose a novel method to dynamically adjust the action persistence based on the current exploration status of the state space. In this way, our method does not require training additional value functions or a policy. Moreover, the use of a smooth schedule for the repeat probability allows a more effective balance between exploration and exploitation. Furthermore, our method can be seamlessly integrated into various basic exploration strategies to incorporate temporal persistence. Finally, extensive experiments on different DMControl tasks demonstrate that our state-novelty guided action persistence method significantly improves sample efficiency.
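One plausible way to realize a novelty-driven repeat probability is a count-based sketch like the one below. Both the novelty estimate and the direction of the mapping (higher novelty leading to a higher repeat probability) are assumptions made for illustration, not the paper's exact schedule; `NoveltyRepeat` is an invented name.

```python
import math
from collections import Counter

class NoveltyRepeat:
    """Count-based sketch of a smooth, novelty-guided repeat probability."""

    def __init__(self, p_max=0.8):
        self.counts = Counter()  # visitation counts per (discretized) state
        self.p_max = p_max       # cap on the repeat probability

    def repeat_prob(self, state_key):
        """Update the visit count and return a repeat probability that
        decays smoothly as the state becomes well-explored."""
        self.counts[state_key] += 1
        novelty = 1.0 / math.sqrt(self.counts[state_key])  # in (0, 1]
        return self.p_max * novelty
```

Because the probability falls off smoothly with visitation counts rather than switching abruptly, exploration in novel regions and exploitation in familiar ones blend gradually, matching the abstract's emphasis on a smooth schedule.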
Control Frequency Adaptation via Action Persistence in Batch Reinforcement Learning
Metelli, Alberto Maria, Mazzolini, Flavio, Bisi, Lorenzo, Sabbioni, Luca, Restelli, Marcello
The choice of the control frequency of a system has a relevant impact on the ability of reinforcement learning algorithms to learn a highly performing policy. In this paper, we introduce the notion of action persistence, which consists in repeating an action for a fixed number of decision steps, having the effect of modifying the control frequency. We start by analyzing how action persistence affects the performance of the optimal policy, and then we present a novel algorithm, Persistent Fitted Q-Iteration (PFQI), that extends FQI with the goal of learning the optimal value function at a given persistence. After providing a theoretical study of PFQI and a heuristic approach to identify the optimal persistence, we present an experimental campaign on benchmark domains to show the advantages of action persistence and prove the effectiveness of our persistence selection method.
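The effect of a fixed persistence k is simply that the policy is queried every k steps and its action is held in between. A minimal sketch, assuming a hypothetical step interface `env_step(state, action) -> (next_state, reward)`:

```python
def persistent_rollout(env_step, policy, state, k, horizon):
    """Roll out a trajectory under fixed action persistence k:
    the policy is only consulted every k steps, and its action is
    repeated for the intermediate steps (illustrative sketch)."""
    traj = []
    action = None
    for t in range(horizon):
        if t % k == 0:              # decision step: query a fresh action
            action = policy(state)
        state, reward = env_step(state, action)
        traj.append((action, reward))
    return traj
```

With k = 1 this reduces to ordinary per-step control; larger k lowers the effective control frequency, which is the knob PFQI's persistence selection heuristic tunes.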