Solving Deep Reinforcement Learning Benchmarks with Linear Policy Networks