Reinforcement Learning (RL) agents often learn policies that do not generalise across tasks in which the environmental features and optimal skills are different [des Combes et al., 2018, Garcin et al., 2024].
When artificial agents are jointly trained to perform collaborative tasks using a communication channel, they develop opaque goal-oriented communication protocols.
A well-known dilemma in large vision-language models ( e.g., GPT -4, LLaV A) is that while increasing the number of vision tokens generally enhances visual
With high-dimensional state spaces, visual reinforcement learning (RL) faces significant challenges in exploitation and exploration, resulting in low sample efficiency and training stability.