The Policy of Truth
This is the sixth part of "An Outsider's Tour of Reinforcement Learning." Our first generic candidate for solving reinforcement learning is Policy Gradient. I find it shocking that Policy Gradient wasn't ruled out as a bad idea in 1993. Policy Gradient is seductive because it apparently lets one fine-tune a program to solve any problem without any domain knowledge. Of course, anything that makes such a claim must be too general for its own good.
Feb-27-2018, 12:18:23 GMT