Q-Distribution guided Q-learning for offline reinforcement learning: Uncertainty penalized Q-value via consistency model