EMaQ: Expected-Max Q-Learning Operator for Simple Yet Effective Offline and Online RL
