A Unified Principle of Pessimism for Offline Reinforcement Learning under Model Mismatch