
CoIRL-AD: Collaborative-Competitive Imitation-Reinforcement Learning in Latent World Models for Autonomous Driving

Zheng, Xiaoji, Yang, Ziyuan, Chen, Yanhao, Peng, Yuhang, Tang, Yuanrong, Liu, Gengyuan, Chen, Bokui, Gong, Jiangtao

arXiv.org Artificial Intelligence

End-to-end autonomous driving models trained solely with imitation learning (IL) often suffer from poor generalization. In contrast, reinforcement learning (RL) promotes exploration through reward maximization but faces challenges such as sample inefficiency and unstable convergence. A natural solution is to combine IL and RL. Moving beyond the conventional two-stage paradigm (IL pretraining followed by RL fine-tuning), we propose CoIRL-AD, a competitive dual-policy framework that enables IL and RL agents to interact during training. CoIRL-AD introduces a competition-based mechanism that facilitates knowledge exchange while preventing gradient conflicts. Experiments on the nuScenes dataset show an 18% reduction in collision rate compared to baselines, along with stronger generalization and improved performance on long-tail scenarios. Code is available at: https://github.com/SEU-zxj/CoIRL-AD.
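The abstract describes a competition-based mechanism through which the IL and RL agents exchange knowledge during training. A heavily simplified, hypothetical sketch of such a mechanism follows: two policies each propose an action, a stand-in critic scores both, and the losing policy's proposal is nudged toward the winner's. All names, the scoring rule, and the update are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

# Hypothetical sketch of competition-based knowledge exchange between an
# IL policy and an RL policy. The critic and the distillation step are
# illustrative stand-ins only.

def compete(a_il, a_rl, critic):
    """Score both proposed actions; return the higher-scoring one."""
    s_il, s_rl = critic(a_il), critic(a_rl)
    winner = a_il if s_il >= s_rl else a_rl
    return winner, s_il, s_rl

def distill_toward(a_loser, a_winner, lr=0.5):
    """Soft update of the losing policy's action toward the winner's."""
    return a_loser + lr * (a_winner - a_loser)

critic = lambda a: -np.sum(a ** 2)  # toy score: prefer small control inputs
a_il = np.array([0.2, 0.1])         # action proposed by the IL policy
a_rl = np.array([0.8, -0.4])        # action proposed by the RL policy

winner, s_il, s_rl = compete(a_il, a_rl, critic)
a_rl_new = distill_toward(a_rl, winner)
# Here the IL proposal scores higher, so the RL action is pulled toward it.
```

Keeping the exchange at the action level, rather than mixing gradients from both objectives into one network, is one plausible way a dual-policy setup could avoid the gradient conflicts the abstract mentions.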


Review for NeurIPS paper: Latent World Models For Intrinsically Motivated Exploration

Neural Information Processing Systems

Summary and Contributions: The paper proposes a novel method to address the problem of exploration in RL. It is a known problem in RL that sparse rewards make random exploration _very_ inefficient. One approach to overcoming this limitation is intrinsic motivation: building an auxiliary reward signal that encourages an agent to seek novel or rare states, for example proportional to inverse visit counts or, as proposed in this paper, to some prediction error. Prediction error as a measure of novelty can be heavily affected by three sources of uncertainty: 1. novelty (epistemic) -- this is the signal we are typically after. The paper proposes a belief-state formulation that the authors claim is not too sensitive to stochasticity and is able to extrapolate the state dynamics, so that the prediction error can be a genuine measure of novelty.
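The intrinsic-motivation idea the review summarizes can be sketched in a few lines: encode observations into a latent space, predict the next latent with a forward model, and use the prediction error as an auxiliary reward. The encoder and dynamics below are toy stand-ins, not the paper's networks.

```python
import numpy as np

# Illustrative sketch: intrinsic reward as latent prediction error.
# `encode` and `forward_model` are toy stand-ins for learned networks.

def encode(obs):
    """Toy encoder: project an observation to a 2-D latent."""
    return np.tanh(obs[:2])

def forward_model(z, action):
    """Toy latent dynamics: a small linear step per action component."""
    return z + 0.1 * action

def intrinsic_reward(obs, action, next_obs):
    """Squared error of the latent forward prediction as a novelty signal."""
    z, z_next = encode(obs), encode(next_obs)
    z_pred = forward_model(z, action)
    return float(np.linalg.norm(z_pred - z_next) ** 2)

obs = np.array([0.5, -0.2, 1.0])
next_obs = np.array([0.6, -0.1, 1.0])
action = np.array([1.0, 1.0])
r_int = intrinsic_reward(obs, action, next_obs)
```

In transitions the model predicts well (familiar states) the reward shrinks toward zero, while poorly predicted (novel) transitions yield a larger bonus; the review's point is that this signal is only meaningful if the error is driven by epistemic uncertainty rather than environment stochasticity.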


Review for NeurIPS paper: Latent World Models For Intrinsically Motivated Exploration

Neural Information Processing Systems

All reviewers unanimously agree that this paper should be accepted to NeurIPS. The authors did a great job addressing almost all of the reviewers' concerns, leading to three reviewers increasing their scores after the author response. Reviewers particularly praised the readability of the paper, the fact that the method is clearly defined, and that the authors did a good job of visually demonstrating how it works. However, the reviewers also agree that CPC Action would be an important baseline to compare against, so I strongly encourage the authors to take the suggested improvements seriously and work towards an improved version of the paper. I am confident that the authors can make the requested changes and am recommending acceptance.


Latent World Models For Intrinsically Motivated Exploration

Neural Information Processing Systems

In this work we consider partially observable environments with sparse rewards. We present a self-supervised representation learning method for image-based observations, which arranges embeddings so that they respect the temporal distance between observations. This representation is empirically robust to stochasticity and suitable for novelty detection from the error of a predictive forward model. We consider both episodic and life-long uncertainties to guide exploration. We propose to estimate the missing information about the environment with the world model, which operates in the learned latent space.


Learning Multiple Probabilistic Decisions from Latent World Model in Autonomous Driving

Xiao, Lingyu, Liu, Jiang-Jiang, Yang, Sen, Li, Xiaofan, Ye, Xiaoqing, Yang, Wankou, Wang, Jingdong

arXiv.org Artificial Intelligence

The autoregressive world model exhibits robust generalization capabilities in vectorized scene understanding but encounters difficulties in deriving actions due to insufficient uncertainty modeling and self-delusion. In this paper, we explore the feasibility of deriving decisions from an autoregressive world model by addressing these challenges through the formulation of multiple probabilistic hypotheses. We propose LatentDriver, a framework that models the environment's next states and the ego vehicle's possible actions as a mixture distribution, from which a deterministic control signal is then derived. By incorporating mixture modeling, the stochastic nature of decision-making is captured. Additionally, the self-delusion problem is mitigated by providing intermediate actions sampled from a distribution to the world model. Experimental results on the recently released closed-loop benchmark Waymax demonstrate that LatentDriver surpasses state-of-the-art reinforcement learning and imitation learning methods, achieving expert-level performance. The code and models will be made available at https://github.com/Sephirex-X/LatentDriver.
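The step of collapsing a mixture distribution over candidate actions into one deterministic control signal can be illustrated with a minimal sketch. One simple rule, assumed here for illustration (the paper may use a different derivation), is to emit the mean of the highest-weight mixture component.

```python
import numpy as np

# Sketch: derive a deterministic control signal from a mixture over
# candidate actions. The "pick the highest-weight component's mean" rule
# is an illustrative assumption, not necessarily the paper's method.

def deterministic_action(weights, means):
    """Return the mean of the mixture component with the largest weight."""
    return means[int(np.argmax(weights))]

weights = np.array([0.2, 0.5, 0.3])   # mixture weights, summing to 1
means = np.array([[ 0.1, -0.3],       # (steering, acceleration) per mode
                  [ 0.0,  0.5],
                  [-0.2,  0.1]])

a = deterministic_action(weights, means)
# a is the mean of the second (highest-weight) component: [0.0, 0.5]
```

Keeping the full mixture during training preserves the multi-modality of driving decisions (e.g., yield vs. overtake), while the argmax collapse yields a single executable command at control time.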


Contrastive Variational Reinforcement Learning for Complex Observations

Ma, Xiao, Chen, Siwei, Hsu, David, Lee, Wee Sun

arXiv.org Machine Learning

Model-free reinforcement learning (MFRL) has achieved great success in game playing [1, 2], robot navigation [3, 4], and other domains. However, extending existing RL methods to real-world environments remains challenging, because they require long-horizon reasoning over low-dimensional useful features, e.g., the position of a robot, embedded in high-dimensional complex observations, e.g., visually rich images. Consider a four-legged mini-cheetah robot [5] navigating a campus. To determine the traversable path, the robot must extract the relevant geometric features that coexist with irrelevant, variable backgrounds, such as moving pedestrians, paintings on the wall, etc. Model-based RL (MBRL), in contrast to model-free methods, learns a world model through generative learning and greatly improves sample efficiency [6, 7, 8]. Recent MBRL methods learn compact latent world models from high-dimensional visual inputs with Variational Autoencoders (VAEs) [9] by optimizing the evidence lower bound (ELBO) of an observation sequence [10, 11]. However, learning a generative model under complex observations is challenging.
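The ELBO objective mentioned above can be written, for a single observation with a Gaussian posterior and a unit-variance Gaussian likelihood (constants dropped), as reconstruction log-likelihood minus a KL term. The sketch below computes it with numpy; the factorization and noise model are standard VAE assumptions, not details from this particular paper.

```python
import numpy as np

# Sketch of a single-observation ELBO under standard VAE assumptions:
# Gaussian posterior N(mu, exp(logvar)), standard-normal prior,
# unit-variance Gaussian likelihood (additive constants dropped).

def gaussian_kl(mu, logvar):
    """Closed-form KL( N(mu, diag(exp(logvar))) || N(0, I) )."""
    return 0.5 * np.sum(np.exp(logvar) + mu ** 2 - 1.0 - logvar)

def elbo(x, x_recon, mu, logvar):
    """Reconstruction log-likelihood minus the KL regularizer."""
    recon_ll = -0.5 * np.sum((x - x_recon) ** 2)
    return recon_ll - gaussian_kl(mu, logvar)

x = np.array([1.0, 0.0])          # observation
x_recon = np.array([0.9, 0.1])    # decoder output
mu = np.zeros(2)                  # posterior mean (at the prior here)
logvar = np.zeros(2)              # posterior log-variance

val = elbo(x, x_recon, mu, logvar)
# With the posterior equal to the prior, the KL term is 0 and the ELBO
# reduces to the (negative) squared reconstruction error: -0.01.
```

For a sequence model, the same two terms are summed over time steps with the prior supplied by the learned latent dynamics; the introduction's point is that the reconstruction term becomes problematic when observations contain large amounts of task-irrelevant detail.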