Seek Commonality but Preserve Differences: Dissected Dynamics Modeling for Multi-modal Visual RL

May-28-2025, 07:09:35 GMT – Neural Information Processing Systems

Accurate environment dynamics modeling is crucial for obtaining effective state representations in visual reinforcement learning (RL) applications. However, when facing multiple input modalities, existing dynamics modeling methods (e.g., DeepMDP) often struggle to handle the complex and volatile relationships between modalities. In this paper, we study the problem of efficient dynamics modeling for multi-modal visual RL. We find that, in the presence of modality heterogeneity, modality-correlated and modality-distinct features are equally important but play different roles in reflecting the evolution of environmental dynamics. Motivated by this finding, we propose Dissected Dynamics Modeling (DDM), a novel multi-modal dynamics modeling method for visual RL.

Country:
- Asia > China (0.14)

Genre:
- Research Report
  - Experimental Study (1.00)
  - New Finding (1.00)

Industry:
- Information Technology (0.68)
- Transportation > Ground
  - Road (0.46)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning
    - Neural Networks (1.00)
    - Reinforcement Learning (0.90)
  - Natural Language (0.93)
  - Representation & Reasoning (1.00)
  - Robots (1.00)
  - Vision (1.00)