The Policy of Truth

This is the sixth part of "An Outsider's Tour of Reinforcement Learning." Our first generic candidate for solving reinforcement learning is Policy Gradient. I find it shocking that Policy Gradient wasn't ruled out as a bad idea in 1993. Policy Gradient is seductive because it apparently lets one fine-tune a program to solve any problem without any domain knowledge. Of course, anything that makes such a claim must be too general for its own good.