Projected Off-Policy Q-Learning (POP-QL) for Stabilizing Offline Reinforcement Learning