Strongly-polynomial time and validation analysis of policy gradient methods