The RL Perceptron: Generalisation Dynamics of Policy Learning in High Dimensions