 Reinforcement Learning







Supplementary Material for HandMeThat: Human-Robot Communication in Physical and Social Environments Yanming Wan

Neural Information Processing Systems

In Section A, we provide the detailed information for HandMeThat data generation and its textual interface. In Section B, we summarize the statistics of the dataset. Recall that HandMeThat uses an object-centric representation for states. "Location" consists of all non-movable entities. Each class (except for "location") is composed of multiple subclasses, and each subclass contains multiple object categories. In total, there are 155 object categories. Each object category is also associated with several attributes.
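As a rough illustration of the hierarchy described above (class, subclass, object category, attributes), the following Python sketch shows one way such a structure could be stored and counted. It is not the HandMeThat implementation: the type names `ObjectClass`, `Subclass`, and `ObjectCategory`, the helper `count_categories`, and every entry in the toy hierarchy are hypothetical placeholders, and the special "location" class is not modeled.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ObjectCategory:
    """Leaf of the hierarchy: a category with its attribute names."""
    name: str
    attributes: List[str] = field(default_factory=list)


@dataclass
class Subclass:
    """Middle level: groups several object categories."""
    name: str
    categories: List[ObjectCategory] = field(default_factory=list)


@dataclass
class ObjectClass:
    """Top level: groups several subclasses."""
    name: str
    subclasses: List[Subclass] = field(default_factory=list)


def count_categories(classes: List[ObjectClass]) -> int:
    """Count leaf categories across all classes and subclasses."""
    return sum(len(sub.categories) for cls in classes for sub in cls.subclasses)


# Toy hierarchy with placeholder names (assumption: not taken from the dataset).
toy_hierarchy = [
    ObjectClass("class_a", [
        Subclass("subclass_a1", [
            ObjectCategory("category_x", ["attr_1", "attr_2"]),
            ObjectCategory("category_y", ["attr_1"]),
        ]),
    ]),
    ObjectClass("class_b", [
        Subclass("subclass_b1", [ObjectCategory("category_z", ["attr_3"])]),
    ]),
]

n_categories = count_categories(toy_hierarchy)
```

Applied to the full dataset, a count of this kind would correspond to the 155 object categories reported in the text.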


Value Function Decomposition for Iterative Design of Reinforcement Learning Agents

Neural Information Processing Systems

Despite these successes, applying RL techniques to complex control problems remains a daunting undertaking, where initial attempts often result in underwhelming performance.


Appendix: Continuous Doubly Constrained Batch Reinforcement Learning A Experiment Details Evaluation Procedure

Neural Information Processing Systems

Since the Bellman-evaluation operator is also a contraction under standard conditions [3, 8, 31], our overall argument remains otherwise intact. D.2 Proof of Theorem 2.
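The contraction property invoked above can be checked numerically in the simplest setting. The sketch below is an illustration only, assuming a finite state space, a fixed policy already folded into a row-stochastic transition matrix, and a discount factor below 1; it is not the paper's continuous-action setting or its proof. The function names and the randomly generated problem are hypothetical.

```python
import random


def bellman_eval(v, rewards, transitions, gamma):
    """Policy-evaluation operator: (T v)(s) = r(s) + gamma * sum_s' P(s'|s) v(s')."""
    return [
        rewards[s] + gamma * sum(p * v[s2] for s2, p in enumerate(transitions[s]))
        for s in range(len(v))
    ]


def sup_norm_diff(u, v):
    """Sup-norm distance between two value vectors."""
    return max(abs(a - b) for a, b in zip(u, v))


def random_stochastic_matrix(n, rng):
    """Random n x n matrix whose rows are probability distributions."""
    rows = []
    for _ in range(n):
        weights = [rng.random() + 1e-9 for _ in range(n)]
        total = sum(weights)
        rows.append([w / total for w in weights])
    return rows


rng = random.Random(0)
n_states, gamma = 5, 0.9
P = random_stochastic_matrix(n_states, rng)
r = [rng.uniform(-1.0, 1.0) for _ in range(n_states)]
v1 = [rng.uniform(-10.0, 10.0) for _ in range(n_states)]
v2 = [rng.uniform(-10.0, 10.0) for _ in range(n_states)]

# One application shrinks the sup-norm gap by at least a factor of gamma.
gap_before = sup_norm_diff(v1, v2)
gap_after = sup_norm_diff(
    bellman_eval(v1, r, P, gamma), bellman_eval(v2, r, P, gamma)
)
contracts = gap_after <= gamma * gap_before + 1e-12

# Repeated application drives both iterates toward the same fixed point.
for _ in range(200):
    v1 = bellman_eval(v1, r, P, gamma)
    v2 = bellman_eval(v2, r, P, gamma)
final_gap = sup_norm_diff(v1, v2)
```

Because each row of the transition matrix sums to one, the difference between the two updated vectors is a convex combination of the old differences scaled by gamma, which is why the bound holds for any pair of starting vectors.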