Finite-Sample Convergence Bounds for Trust Region Policy Optimization in Mean-Field Games