Learning Human-Like RL Agents Through Trajectory Optimization With Action Quantization