

Variational Inference with Tail-adaptive f-Divergence

Dilin Wang, Hao Liu, Qiang Liu

Neural Information Processing Systems

However, estimating and optimizing α-divergences requires importance sampling, which may have large or infinite variance due to heavy tails of the importance weights.
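The heavy-tail problem the abstract mentions is easy to see numerically. The toy sketch below (not the paper's method; distributions and sample size are illustrative choices) computes importance weights w = p(x)/q(x) for a wide Gaussian target p under a narrow Gaussian proposal q, a setting where the weights have infinite variance and a handful of extreme samples dominate the estimate:

```python
import numpy as np

rng = np.random.default_rng(0)

def log_normal_pdf(x, mu, sigma):
    # Log-density of N(mu, sigma^2), evaluated elementwise.
    return -0.5 * ((x - mu) / sigma) ** 2 - np.log(sigma * np.sqrt(2 * np.pi))

# Proposal q = N(0, 1), target p = N(0, 3): since 3^2 > 2 * 1^2, the
# importance weights w = p(x)/q(x) have infinite variance under q.
x = rng.normal(0.0, 1.0, size=100_000)
log_w = log_normal_pdf(x, 0.0, 3.0) - log_normal_pdf(x, 0.0, 1.0)
w = np.exp(log_w)

# The mean weight is near 1 (E_q[p/q] = 1), but the largest weight
# dwarfs it: a few tail samples dominate any alpha-divergence estimate.
print(w.mean(), w.max() / w.mean())
```

This instability is exactly what motivates replacing a fixed α with a tail-adaptive choice.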


Real-Time Reinforcement Learning

Simon Ramstedt, Chris Pal

Neural Information Processing Systems

While it is well suited to describe turn-based decision problems such as board games, this framework is ill suited for real-time applications in which the environment's state continues to evolve while the agent selects an action (Travnik et al., 2018). Nevertheless, this framework has been used for real-time problems using what are essentially tricks, e.g.


cf5a019ae9c11b4be88213ce3f85d85c-Paper-Conference.pdf

Neural Information Processing Systems

Here, we focus on a more practical setting in object rearrangement, i.e., rearranging objects from shuffled layouts to a normative target distribution without explicit goal specification. However, it remains challenging for AI agents, as it is hard to describe the target distribution (goal specification) for reward engineering or collect expert trajectories as demonstrations. Hence, it is infeasible to directly employ reinforcement learning or imitation learning algorithms to address the task. This paper aims to search for a policy only with a set of examples from a target distribution instead of a handcrafted reward function. We employ the score-matching objective to train a Target Gradient Field (TarGF), indicating a direction on each object to increase the likelihood of the target distribution.
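The core idea of a target gradient field can be illustrated with a toy example. The sketch below is a hedged stand-in for the paper's learned TarGF: instead of training a score network, it uses a Gaussian target distribution over one object's 2-D position, whose score ∇x log p(x) is available in closed form; following the field with small steps moves the object toward higher likelihood of the target layout. The names `target`, `sigma`, and `score` are illustrative, not from the paper.

```python
import numpy as np

# Toy target distribution over an object's 2-D position: N(target, sigma^2 I).
target = np.array([1.0, 2.0])
sigma = 0.5

def score(x):
    # Closed-form score: grad_x log N(x; target, sigma^2 I) = (target - x) / sigma^2.
    # In TarGF this field is learned via score matching from example layouts.
    return (target - x) / sigma**2

# Follow the gradient field with small steps: each step increases the
# likelihood of the current layout under the target distribution.
x = np.array([-1.0, 0.0])
for _ in range(200):
    x = x + 0.01 * score(x)

print(np.linalg.norm(x - target))  # distance shrinks toward 0
```

In the paper's setting the field is learned from examples and then used either as a pseudo-reward or as a direct action signal; here it simply drives gradient ascent on log-likelihood.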






Value Function Decomposition for Iterative Design of Reinforcement Learning Agents

Neural Information Processing Systems

In BW, the reward components include forward progress, failure, cost control, and a shaping reward.
Require: Experience buffer B; twin Q-functions Q1, Q2 (with parameters θ1, θ2); policy parameters φ; discount γ; entropy coefficient; learning rates λq, λπ; target networks; Boolean flag
1: Sample transition (s, a, r, s′) ∼ B; r ∈ R^m
2: Sample policy actions a′ ∼ π(·|s′; φ) and u ∼ π(·|s; φ)
3: r_{m+1} ← −α log π(a′|s′; φ). Extend
4: j ← argmin
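The truncated pseudocode above follows a SAC-style update with twin Q-functions. A minimal sketch of the key step, the clipped double-Q target with an entropy bonus, is below; the toy callables `q1`/`q2` and the constants stand in for the networks and hyperparameters, which are assumptions for illustration:

```python
# Hedged sketch of the clipped double-Q (twin Q) target used in SAC-style
# updates: take the argmin over the two Q estimates to curb overestimation,
# and add the entropy bonus -alpha * log pi(a'|s').

gamma, alpha = 0.99, 0.2  # discount and entropy coefficient (toy values)

def q1(s, a): return float(s + a)        # stand-ins for the twin Q-functions
def q2(s, a): return float(s + a + 0.5)

def td_target(r, s_next, a_next, log_pi_next):
    # j = argmin_i Q_i(s', a'): use the smaller of the two target estimates.
    q_min = min(q1(s_next, a_next), q2(s_next, a_next))
    return r + gamma * (q_min - alpha * log_pi_next)

y = td_target(r=1.0, s_next=0.5, a_next=0.1, log_pi_next=-1.2)
print(y)  # 1.0 + 0.99 * (0.6 + 0.2 * 1.2) = 1.8316
```

Both Q-networks are then regressed toward this single target y, which is the mechanism the decomposition paper extends to vector-valued rewards r ∈ R^m.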