

Steady State Analysis of Episodic Reinforcement Learning

Neural Information Processing Systems

Reinforcement Learning (RL) tasks generally divide into two kinds: continual learning and episodic learning. The concept of steady state has played a foundational role in the continual setting, where a unique steady-state distribution is typically presumed to exist in the task being studied; this assumption enables both a principled conceptual framework and efficient data collection methods for continual RL algorithms. On the other hand, the concept of steady state has been widely considered irrelevant for episodic RL tasks, in which the decision process terminates in finite time. Alternative concepts, such as episode-wise visitation frequency, are used in episodic RL algorithms instead; these are not only inconsistent with their counterparts in continual RL but also make it harder to design and analyze RL algorithms in the episodic setting. In this paper we prove that unique steady-state distributions pervasively exist in the learning environments of episodic tasks, and that the marginal distribution of the system state indeed approaches the steady state in essentially all episodic tasks.
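The claim is easy to visualize on a toy example. Below is a minimal sketch (ours, not the paper's construction) of the standard "reset" view the abstract alludes to: episode termination is modeled as a transition back to the initial state, which turns the episodic process into a recurrent Markov chain whose marginal state distribution converges to a unique stationary distribution. The 3-state transition matrix is made up purely for illustration.

```python
import numpy as np

# Hypothetical 3-state episodic chain: state 2 is terminal.
# Modeling termination as a reset to the initial state 0 makes
# the chain recurrent (and, here, aperiodic), so a unique
# stationary distribution exists.
P = np.array([
    [0.0, 0.9, 0.1],   # from state 0
    [0.2, 0.0, 0.8],   # from state 1
    [1.0, 0.0, 0.0],   # terminal state resets to state 0
])

# Marginal distribution of the state at step t, starting in state 0.
mu = np.array([1.0, 0.0, 0.0])
for _ in range(200):
    mu = mu @ P

# Stationary distribution: left eigenvector of P for eigenvalue 1.
eigvals, eigvecs = np.linalg.eig(P.T)
stat = np.real(eigvecs[:, np.isclose(eigvals, 1.0)][:, 0])
stat /= stat.sum()

print("marginal after 200 steps:", mu)
print("stationary distribution: ", stat)
# The two vectors agree to numerical precision, illustrating how the
# marginal state distribution approaches the unique steady state.
```

Running this prints two (numerically) identical vectors, which is exactly the convergence-of-marginals phenomenon the abstract describes, reduced to its simplest instance.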


Review for NeurIPS paper: Steady State Analysis of Episodic Reinforcement Learning

Neural Information Processing Systems

Clarity: In my opinion the main weakness of the paper is its presentation. First, there is a lack of clear, direct explanations of what the paper is trying to accomplish. Several crucial points are either only implied or mentioned in passing without proper emphasis. This is true even for the positioning of the paper itself: the analysis seems to be mostly concerned with policy gradient methods, but this is never explicitly stated.


Review for NeurIPS paper: Steady State Analysis of Episodic Reinforcement Learning

Neural Information Processing Systems

This paper provides a new perspective on episodic RL and should be of interest to anyone working with MDPs in reinforcement learning. Three reviewers (R1, R2, R3) commented that it was well-written and clear, although R4 disagreed. All reviewers commented on the interesting contribution (proving that the MDPs underlying episodic RL are ergodic). R1, R2, and R3 had concerns that it was a mostly theoretical paper and wondered how to apply these insights in practice. However, the rebuttal goes some way toward addressing these points, and R4 was convinced to raise their recommendation to weak accept.

