Matrix Low-Rank Trust Region Policy Optimization