[D] Better reinforcement learning algorithms than A3C? • r/MachineLearning
This sounds like an underspecified example. I mean, A3C and DQN/Q-learning aren't even the same kind of method: A3C is on-policy, while DQN/Q-learning is off-policy. A3C has mostly been replaced by PPO, and on-policy SOTA has since moved on from that to IMPALA/Unicorn. I'm not sure what the SOTA for off-policy learning is, but Rainbow outperforms DQN and most of the DQN zoo. And progress here may be somewhat illusory, as the methodological papers have been pointing out: a lot of these tasks are not inherently difficult, there's so much variance between training runs, and improvements may be due to undocumented tweaks or just somewhat better hyperparameters...
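To make the on-policy/off-policy distinction concrete, here is a minimal tabular sketch. It is not A3C or DQN themselves: SARSA stands in as the simplest on-policy case and tabular Q-learning as the simplest off-policy one. The Q-table values, reward, and discount are made up for illustration.

```python
def q_learning_target(q, reward, next_state, gamma):
    """Off-policy target: bootstrap from the greedy action in next_state,
    regardless of which action the behaviour policy actually takes there."""
    return reward + gamma * max(q[next_state])


def sarsa_target(q, reward, next_state, next_action, gamma):
    """On-policy target: bootstrap from the action the current policy
    actually chose in next_state."""
    return reward + gamma * q[next_state][next_action]


# Hypothetical Q-table (2 states x 2 actions); all numbers are assumptions.
q = [[0.0, 0.0], [1.0, 5.0]]
reward, gamma = 1.0, 0.5
next_state, next_action = 1, 0  # the policy explored and picked action 0

off_policy = q_learning_target(q, reward, next_state, gamma)
on_policy = sarsa_target(q, reward, next_state, next_action, gamma)
print(off_policy, on_policy)
```

The two targets differ whenever the sampled action is not the greedy one, which is roughly why off-policy methods can learn from replayed or stale data while on-policy methods need fresh samples from the current policy.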
May-26-2018, 02:15:26 GMT