bisimulation metric
- Asia > Macao (0.14)
- Asia > China > Zhejiang Province > Hangzhou (0.04)
- Asia > China > Hong Kong (0.04)
- Africa > Ethiopia > Addis Ababa > Addis Ababa (0.04)
Learning Causal States Under Partial Observability and Perturbation
Li, Na, Shan, Hangguan, Ni, Wei, Zhang, Wenjie, Li, Xinyu, Wang, Yamin
A critical challenge for reinforcement learning (RL) is making decisions based on incomplete and noisy observations, especially in perturbed and partially observable Markov decision processes (P$^2$OMDPs). Existing methods fail to mitigate perturbations while also addressing partial observability. We propose \textit{Causal State Representation under Asynchronous Diffusion Model (CaDiff)}, a framework that enhances any RL algorithm by uncovering the underlying causal structure of P$^2$OMDPs. It does so by combining a novel asynchronous diffusion model (ADM) with a new bisimulation metric. ADM allows the forward and reverse processes to use different numbers of steps, so that the perturbation of a P$^2$OMDP is treated as part of the noise that diffusion suppresses. The bisimulation metric quantifies the similarity between partially observable environments and their causal counterparts. Moreover, we establish a theoretical guarantee for CaDiff by deriving an upper bound on the value function approximation error between perturbed observations and denoised causal states, reflecting a principled trade-off between the approximation errors of the reward and the transition model. Experiments on Roboschool tasks show that CaDiff improves returns by at least 14.18\% compared to baselines. CaDiff is the first framework that approximates causal states using diffusion models with both theoretical rigor and practicality.
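To make the notion of a bisimulation metric concrete, the following is a minimal sketch of the classical fixed-point construction on a small, fully observable, deterministic MDP. It is not the metric proposed in the abstract above, whose definition is not given here. The function name `bisimulation_metric`, the toy `rewards` and `next_state` tables, and the weighting (reward difference plus `gamma` times the distance between successor states) are all assumptions chosen for illustration.

```python
# Illustrative only: a classical bisimulation-style distance for a tiny
# deterministic MDP. This is NOT the CaDiff metric; all names are hypothetical.

def bisimulation_metric(rewards, next_state, gamma=0.9, tol=1e-9, max_iters=10_000):
    """Iterate d(s, t) = max_a |r(s,a) - r(t,a)| + gamma * d(s', t') to a fixed point.

    rewards[s][a]    -- immediate reward for action a in state s
    next_state[s][a] -- deterministic successor of state s under action a
    """
    n_states = len(rewards)
    n_actions = len(rewards[0])
    d = [[0.0] * n_states for _ in range(n_states)]
    for _ in range(max_iters):
        new_d = [
            [
                max(
                    abs(rewards[s][a] - rewards[t][a])
                    + gamma * d[next_state[s][a]][next_state[t][a]]
                    for a in range(n_actions)
                )
                for t in range(n_states)
            ]
            for s in range(n_states)
        ]
        # Largest change in any pairwise distance during this sweep.
        delta = max(
            abs(new_d[s][t] - d[s][t])
            for s in range(n_states)
            for t in range(n_states)
        )
        d = new_d
        if delta < tol:
            break
    return d


# Toy MDP (assumed for illustration): states 0 and 1 behave identically,
# state 2 is absorbing with zero reward.
rewards = [[1.0, 0.0], [1.0, 0.0], [0.0, 0.0]]
next_state = [[2, 0], [2, 1], [2, 2]]

d = bisimulation_metric(rewards, next_state)
print(d[0][1], d[0][2])
```

In this toy example, states 0 and 1 receive distance zero because they yield the same rewards and lead to equivalent successors, while state 2 is separated from both by the one-step reward gap. A metric for the partially observable, perturbed setting described in the abstract would need to compare observations with causal states rather than raw states, which this sketch does not attempt.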
- North America > United States > New Jersey > Mercer County > Princeton (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
- Research Report > New Finding (0.93)
- Research Report > Experimental Study (0.92)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
- Information Technology > Mathematics of Computing (0.68)
- Information Technology > Data Science (0.67)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.48)
- Asia > Macao (0.14)
- Asia > China > Hubei Province > Wuhan (0.04)
- Asia > China > Hong Kong (0.04)
- Research Report > Experimental Study (0.46)
- Research Report > New Finding (0.46)
- Leisure & Entertainment > Games > Computer Games (0.46)
- Information Technology (0.46)
- North America > Canada > Quebec > Montreal (0.04)
- Asia > Middle East > Jordan (0.04)
- North America > Canada > Ontario > Toronto (0.14)
- North America > United States > Virginia > Arlington County > Arlington (0.04)