Balancing policy constraint and ensemble size in uncertainty-based offline reinforcement learning