
 Reinforcement Learning



Appendix

Neural Information Processing Systems

In this section, we provide additional discussions of applying decision-focused learning to MDP problems. Specifically, the assumption on smooth policy is similar to the idea of soft Q-learning [12] and soft actor-critic [13] proposed by Haarnoja et al. The randomly initialized neural network uses ReLU layers as nonlinearity followed by a linear layer in the end. Training parameters: Across all three examples, we consider the discounted setting where the discount factor is γ = 0.95. To relax the optimal policy given by the RL solver, we relax the Bellman equation used to run value iteration by relaxing all the argmax and max operators in the Bellman equation to softmax with temperature 0.1, i.e., we use SOFTMAX(0.1 · Q-values) to replace all the argmax over Q values.
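The relaxed value iteration described above can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function names, the tabular MDP layout (`P`, `R`), and the toy two-state problem are all hypothetical. It also assumes that the max in the value backup is replaced by the softmax-weighted average of the Q-values, which is one plausible reading of the snippet. The scale 0.1 is applied as written in SOFTMAX(0.1 · Q-values).

```python
import math

def softmax(xs, scale):
    """Numerically stable softmax of scale * xs."""
    m = max(xs)
    exps = [math.exp(scale * (x - m)) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def soft_value_iteration(P, R, gamma=0.95, scale=0.1, iters=500):
    """Value iteration with argmax/max relaxed to a softmax (assumed form).

    P[s][a] is a list of (next_state, probability) pairs.
    R[s][a] is the immediate reward for taking action a in state s.
    """
    n_states = len(P)
    V = [0.0] * n_states
    for _ in range(iters):
        # Standard Q backup using the current value estimate.
        Q = [[R[s][a] + gamma * sum(p * V[s2] for s2, p in P[s][a])
              for a in range(len(P[s]))]
             for s in range(n_states)]
        # Relaxed policy: softmax over scaled Q-values instead of argmax.
        pi = [softmax(Q[s], scale) for s in range(n_states)]
        # Relaxed value: policy-weighted average of Q instead of max (assumption).
        V = [sum(w * q for w, q in zip(pi[s], Q[s])) for s in range(n_states)]
    return V, Q, pi

# Hypothetical toy MDP: action 0 moves to state 0, action 1 moves to state 1;
# only staying in state 1 (action 1 from state 1) yields reward 1.
P = [
    [[(0, 1.0)], [(1, 1.0)]],
    [[(0, 1.0)], [(1, 1.0)]],
]
R = [[0.0, 0.0], [0.0, 1.0]]

V, Q, pi = soft_value_iteration(P, R)
```

Because the resulting policy is a smooth function of the Q-values, it is differentiable with respect to the MDP parameters, which is the property the smooth-policy assumption relies on.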





DigiRL: Training In-The-Wild Device-Control Agents with Autonomous Reinforcement Learning Hao Bai 1,2 Yifei Zhou

Neural Information Processing Systems

While training with static demonstrations has shown some promise, we show that such methods fall short for controlling real GUIs because they fail to handle the real-world stochasticity and non-stationarity that static observational data does not capture.




The NetHack Learning Environment

Neural Information Processing Systems

As advocated by [39, 38, 18], procedurally generated environments are a promising direction for testing systematic generalization of RL agents.


Checklist

Neural Information Processing Systems

The checklist follows the references. Please do not modify the questions and only use the provided macros for your answers. Checklist section does not count towards the page limit. Do the main claims made in the abstract and introduction accurately reflect the paper's Did you describe the limitations of your work? Did you discuss any potential negative societal impacts of your work?