 Reinforcement Learning




Finite-Time Analysis for Double Q-learning

Neural Information Processing Systems

The theoretical performance of Q-learning has also been studied intensively. Asymptotic convergence was established in Tsitsiklis (1994); Jaakkola et al. (1994); Borkar and Meyn (2000); Melo (2001); Lee and He (2019).







Supplementary Material: Deep Reinforcement Learning with Stacked Hierarchical Attention for Text-based Games

Neural Information Processing Systems

Figure 1 shows an example of the raw interface of the game "ztuu", where raw textual observations ...

In this section, we show the first 15 interaction steps of two games: "zork1" and "ztuu".

Chosen action and reward
Action: west
Reward: 0 | Score: 0
===== Step 2 =====
===== 1.
Chosen action and reward
Action: south
Reward: 0 | Score: 0
===== Step 3 =====
===== 1.
Chosen action and reward
Action: south
Reward: 0 | Score: 0
===== Step 4 =====
===== 1.
Chosen action and reward
Action: west
Reward: 0 | Score: 0
===== Step 5 =====
===== 1.