Dynamic Bottleneck for Robust Self-Supervised Exploration

Bai, Chenjia, Wang, Lingxiao, Han, Lei, Garg, Animesh, Hao, Jianye, Liu, Peng, Wang, Zhaoran

Oct-25-2021–arXiv.org Artificial Intelligence

Exploration methods based on pseudo-counts of transitions or curiosity about dynamics have achieved promising results in reinforcement learning with sparse rewards. However, such methods are usually sensitive to dynamics-irrelevant information in the environment, e.g., white noise. To handle such dynamics-irrelevant information, we propose a Dynamic Bottleneck (DB) model, which attains a dynamics-relevant representation based on the information-bottleneck principle. Based on the DB model, we further propose DB-bonus, which encourages the agent to explore state-action pairs with high information gain. We establish theoretical connections between the proposed DB-bonus, the upper confidence bound (UCB) for the linear case, and the visiting count for the tabular case. We evaluate the proposed method on the Atari suite with dynamics-irrelevant noise. Our experiments show that exploration with DB-bonus outperforms several state-of-the-art exploration methods in noisy environments.
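The abstract relates DB-bonus to the UCB bonus in the linear case and to visiting counts in the tabular case. The sketch below does not implement DB-bonus or the DB model. It only illustrates the standard linear UCB bonus, beta * sqrt(phi^T Lambda^{-1} phi) with Lambda = lam * I + sum of phi phi^T, and shows that with one-hot (tabular) features it reduces to beta / sqrt(N + lam), where N is the visit count. All function names, and the parameters `lam` and `beta`, are hypothetical choices made for illustration.

```python
import math


def make_inverse_covariance(dim, lam=1.0):
    """Return the inverse of lam * I as a list of lists (assumed regularizer lam)."""
    return [[(1.0 / lam if i == j else 0.0) for j in range(dim)] for i in range(dim)]


def observe(inv_cov, phi):
    """Rank-one Sherman-Morrison update: replace A^-1 by (A + phi phi^T)^-1 in place."""
    n = len(phi)
    # a_phi = A^-1 phi (A^-1 is symmetric, so phi^T A^-1 is its transpose)
    a_phi = [sum(inv_cov[i][j] * phi[j] for j in range(n)) for i in range(n)]
    denom = 1.0 + sum(phi[i] * a_phi[i] for i in range(n))
    for i in range(n):
        for j in range(n):
            inv_cov[i][j] -= a_phi[i] * a_phi[j] / denom


def ucb_bonus(inv_cov, phi, beta=1.0):
    """Linear UCB exploration bonus: beta * sqrt(phi^T Lambda^-1 phi)."""
    n = len(phi)
    quad = sum(phi[i] * inv_cov[i][j] * phi[j] for i in range(n) for j in range(n))
    return beta * math.sqrt(quad)


def one_hot(index, dim):
    """Tabular feature map: one coordinate per state-action pair."""
    return [1.0 if i == index else 0.0 for i in range(dim)]


if __name__ == "__main__":
    dim = 3
    inv_cov = make_inverse_covariance(dim, lam=1.0)
    for _ in range(3):                       # visit state-action pair 0 three times
        observe(inv_cov, one_hot(0, dim))
    visited = ucb_bonus(inv_cov, one_hot(0, dim))
    unvisited = ucb_bonus(inv_cov, one_hot(1, dim))
    # With one-hot features the bonus equals beta / sqrt(N + lam)
    print(visited, 1.0 / math.sqrt(3 + 1.0))
    print(unvisited, 1.0 / math.sqrt(0 + 1.0))
```

With one-hot features the matrix Lambda stays diagonal, so the bonus for a pair depends only on how often that pair was visited, which is the count-based behavior the abstract refers to for the tabular case.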

artificial intelligence, machine learning, reinforcement learning

Country:
- Asia > China (0.14)
- North America > Canada
  - Ontario > Toronto (0.14)

Genre:
- Research Report > New Finding (0.46)

Industry:
- Energy > Oil & Gas (0.45)
- Health & Medicine > Pharmaceuticals & Biotechnology (0.35)
- Leisure & Entertainment > Games (0.30)

Technology:
- Information Technology
  - Artificial Intelligence
    - Machine Learning
      - Learning Graphical Models > Directed Networks > Bayesian Learning (0.92)
      - Neural Networks > Deep Learning (0.92)
      - Reinforcement Learning (0.67)
      - Statistical Learning (1.00)
    - Representation & Reasoning > Uncertainty
      - Bayesian Inference (0.67)
  - Data Science (0.92)