Learning robust control for LQR systems with multiplicative noise via policy gradient