 Reinforcement Learning


884d247c6f65a96a7da4d1105d584ddd-Paper.pdf

Neural Information Processing Systems

DDPG [24] extends Q-learning to continuous control based on the Deterministic Policy Gradient [31] algorithm, which learns a deterministic policy π(s; φ), parameterized by φ, that maximizes the Q-function and thereby approximates the max operator.
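As a rough illustration of how a deterministic policy can stand in for the max operator, the sketch below runs gradient ascent on Q(s, π(s; φ)) with respect to φ. It is a toy under stated assumptions, not the DDPG algorithm itself: the critic is a fixed, hand-written quadratic rather than a learned network, the policy is linear, and the names `W_STAR`, `q_value`, `dq_da`, `policy`, and `train_policy` are all hypothetical. Replay buffers, target networks, and exploration noise are omitted.

```python
import numpy as np

# Hypothetical toy setup: the critic's maximizing action is a* = s . W_STAR.
W_STAR = np.array([0.5, -1.0])


def q_value(s, a):
    """Toy critic Q(s, a) = -(a - s . W_STAR)^2, maximized at a = s . W_STAR."""
    return -(a - s @ W_STAR) ** 2


def dq_da(s, a):
    """Derivative of the toy critic with respect to the action."""
    return -2.0 * (a - s @ W_STAR)


def policy(s, phi):
    """Deterministic linear policy pi(s; phi) = s . phi (an assumption)."""
    return s @ phi


def train_policy(steps=500, lr=0.05, batch=32, seed=0):
    """Gradient ascent on Q(s, pi(s; phi)) over randomly drawn states."""
    rng = np.random.default_rng(seed)
    phi = np.zeros(2)
    for _ in range(steps):
        s = rng.normal(size=(batch, 2))
        a = policy(s, phi)
        # Chain rule: grad_phi Q(s, pi(s; phi)) = dQ/da * dpi/dphi, and dpi/dphi = s.
        grad = (dq_da(s, a)[:, None] * s).mean(axis=0)
        phi += lr * grad
    return phi


if __name__ == "__main__":
    phi = train_policy()
    print("learned phi:", phi)
```

In this toy, the learned action π(s; φ) ends up close to the action that maximizes Q(s, ·), so evaluating Q(s, π(s; φ)) plays the role that the max over actions plays in discrete Q-learning.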








Deep Recurrent Optimal Stopping

Neural Information Processing Systems

Deep neural networks (DNNs) have recently emerged as a powerful paradigm for solving Markovian optimal stopping problems. However, a ready extension of DNN-based methods to non-Markovian settings requires significant state and parameter space expansion, manifesting the curse of dimensionality.


Outcome-Driven Reinforcement Learning via Variational Inference

Neural Information Processing Systems

Standard reinforcement learning (RL) addresses reward maximization in a Markov decision process (MDP) defined by the tuple (S, A, p_{S_0}, p_d, r, γ) [43, 44], where S and A denote the state and action spaces, respectively, p_{S_0} denotes the initial state distribution, p_d is a state transition distribution, r is an immediate reward function, and γ is a discount factor. To sample trajectories, an initial state is sampled according to p_{S_0}, successive states are sampled from the state transition distribution, S_{t+1} ∼ p_d(· | s_t, a_t), and actions are sampled from a policy, A_t ∼ π(· | s_t).
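The sampling procedure described above can be sketched for a small tabular MDP. This is a minimal illustration under assumptions that are not in the source: states and actions are finite and indexed by integers, the distributions are stored as NumPy arrays, and the function names `sample_trajectory` and `discounted_return` are hypothetical.

```python
import numpy as np


def sample_trajectory(p0, p_d, pi, r, horizon, rng):
    """Sample states, actions, and rewards from a tabular MDP.

    p0:  shape (S,), initial state distribution p_{S_0}
    p_d: shape (S, A, S), p_d[s, a] is the distribution over next states
    pi:  shape (S, A), pi[s] is the policy's distribution over actions
    r:   shape (S, A), immediate reward for taking action a in state s
    """
    n_states, n_actions = pi.shape
    s = rng.choice(n_states, p=p0)               # S_0 ~ p_{S_0}
    states, actions, rewards = [], [], []
    for _ in range(horizon):
        a = rng.choice(n_actions, p=pi[s])       # A_t ~ pi(. | s_t)
        states.append(int(s))
        actions.append(int(a))
        rewards.append(float(r[s, a]))
        s = rng.choice(n_states, p=p_d[s, a])    # S_{t+1} ~ p_d(. | s_t, a_t)
    return states, actions, rewards


def discounted_return(rewards, gamma):
    """Sum of gamma^t * r_t over the sampled trajectory."""
    return sum(gamma ** t * x for t, x in enumerate(rewards))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    p0 = np.array([0.5, 0.5])
    p_d = np.full((2, 2, 2), 0.5)
    pi = np.array([[0.9, 0.1], [0.2, 0.8]])
    r = np.array([[0.0, 1.0], [2.0, 0.0]])
    states, actions, rewards = sample_trajectory(p0, p_d, pi, r, horizon=5, rng=rng)
    print(states, actions, discounted_return(rewards, gamma=0.9))
```

The discounted return is included only to show where γ enters; the finite horizon is a convenience of the sketch.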