Combining information-seeking exploration and reward maximization: Unified inference on continuous state and action spaces under partial observability