[D] Is there any bottleneck with online reinforcement learning that makes it not mainstream yet? • r/MachineLearning
Online learning sometimes refers to training with a batch size of 1, but here I mean online reinforcement learning: RL in which the agent is updated at every timestep. Naively speaking, online reinforcement learning sounds very much like how humans learn, and it's very effective for tasks like stochastic games. Because it performs an update at every timestep, the agent may be more robust when the current state is relatively unfamiliar: it has just been updated on the past ten or so timesteps, which are close to the current one, so it is better adapted to the unfamiliar state. Also, the agent's weights can be considered to be conditioned on past events in the same episode, which may alleviate a limitation of LSTMs and memory networks, namely that there is still a limit to how far back they can remember events within the same episode.
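To make "updated at every timestep" concrete, here is a minimal sketch using tabular Q-learning, which applies one update immediately after each transition (no replay buffer, no end-of-episode batch). Everything in it is an assumption for illustration only: the toy stochastic chain environment, the function names `step` and `run_online_q_learning`, and the hyperparameter values. It is not meant as the one true way to do online RL, just the simplest version of the idea I could write down.

```python
import random


def step(state, action, n_states, rng):
    """Hypothetical toy chain environment (an assumption, not a real benchmark).

    Action 1 tries to move right, action 0 tries to move left. With
    probability 0.1 the move is reversed, so transitions are stochastic.
    Reaching the last state yields reward 1 and ends the episode.
    """
    move = 1 if action == 1 else -1
    if rng.random() < 0.1:
        move = -move
    next_state = min(max(state + move, 0), n_states - 1)
    done = next_state == n_states - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def run_online_q_learning(episodes=200, n_states=5, alpha=0.1,
                          gamma=0.95, epsilon=0.1, seed=0):
    """Tabular Q-learning with one update per timestep."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]

    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap the episode length
            # Epsilon-greedy action selection, breaking ties at random.
            if rng.random() < epsilon:
                action = rng.randrange(2)
            else:
                best = max(q[state])
                action = rng.choice(
                    [a for a in range(2) if q[state][a] == best])

            next_state, reward, done = step(state, action, n_states, rng)

            # The online part: update right away from this single
            # transition, before choosing the next action.
            bootstrap = 0.0 if done else gamma * max(q[next_state])
            td_error = reward + bootstrap - q[state][action]
            q[state][action] += alpha * td_error

            state = next_state
            if done:
                break
    return q


if __name__ == "__main__":
    for s, row in enumerate(run_online_q_learning()):
        print(s, [round(v, 3) for v in row])
```

Note that the value table used to pick the action at timestep t+1 already includes what was learned at timestep t, which is the within-episode adaptation I am describing above. With a neural network in place of the table, the same loop would do one gradient step per transition.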
Jan-16-2018, 00:19:00 GMT