Large Language Models Can Implement Policy Iteration Ethan Brooks 1, Logan Walls 2, Richard L. Lewis

Neural Information Processing Systems 

Gradient techniques are inherently slow, sacrificing the "few-shot" quality