Trust Region Reward Optimization and Proximal Inverse Reward Optimization Algorithm