Goto

Collaborating Authors

 Reinforcement Learning



GALOIS: Boosting Deep Reinforcement Learning via Generalizable Logic Synthesis

Neural Information Processing Systems

Figure 2: The illustration of knowledge reused from DoorKey to BoxKey. BoxKey As shown in Figure 1b, different from DoorKey, it has to open the box to get the key. Thus the learned program is color-agnostic (i.e., the agent's policy would remain robust no matter The valuation vector representations are fed to all the methods as input. The reward from the MiniGrid environment is sparse (i.e., only a positive reward will be given after We use a batch size of 256. The code is available at: https://github.com/caoysh/GALOIS









Finite-Time Analysis of Adaptive Temporal Difference Learning with Deep Neural Networks

Neural Information Processing Systems

However, from the theoretical perspective, establishing theoretical convergence guarantees for training DNNs is much more complicated than that for the linear approximation algorithms, which is still widely open.