Q-Learning in enormous action spaces via amortized approximate maximization