Multilinear Tensor Low-Rank Approximation for Policy-Gradient Methods in Reinforcement Learning