Why do policy gradient methods work so well in cooperative MARL? Evidence from policy representation