Offline Reinforcement Learning with Closed-Form Policy Improvement Operators