RVI-SAC: Average Reward Off-Policy Deep Reinforcement Learning