Markov Models
Export Reviews, Discussions, Author Feedback and Meta-Reviews
If you google "fully adapted particle filters" you will find a lot more material. The authors have considered four different application examples, all of them relevant. The experimental section shows that the iFDM seems to work and that it can provide interesting results. The only comparison provided is against the FFBS-type algorithm, which we know will perform worse due to its construction. I know that it is a lot of work to implement other solutions to the problem, but doing so would probably provide an even better understanding of the performance of the model, and it would be interesting to see how existing solutions to these problems perform. For example, for the multitarget tracking example, the simplest solution would probably be to use an extended Kalman filter together with nearest neighbour data association. Since your targets are very well separated, I would expect this solution to perform quite well. It would be interesting to compare your performance against this simple standard solution. I have not worked with the cocktail party problem or the multiuser detection problem, but for the power disaggregation problem there are interesting solutions available; see for example the following NIPS paper (which is gaining some influence): Kolter, J. Z.; Batra, S.; and Ng, A. Y. Energy disaggregation via discriminative sparse coding.
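The nearest neighbour data association step mentioned in the review can be sketched as follows. This is only a minimal illustration under stated assumptions, not the reviewer's or the authors' implementation: the function name `nearest_neighbour_associate`, the gating threshold `gate`, and the 2-D tuple representation of positions are all hypothetical, and the extended Kalman filter prediction that would produce `predicted` is omitted.

```python
import math

def nearest_neighbour_associate(predicted, measurements, gate=5.0):
    """Greedily match predicted target positions to measurements.

    predicted, measurements: lists of (x, y) tuples (assumed representation).
    gate: maximum distance for a valid match (assumed value).
    Returns {target_index: measurement_index}; unmatched targets are absent.
    """
    # Collect every target/measurement pair that falls inside the gate.
    pairs = []
    for i, p in enumerate(predicted):
        for j, m in enumerate(measurements):
            d = math.dist(p, m)
            if d <= gate:
                pairs.append((d, i, j))
    # Closest pairs are considered first.
    pairs.sort()

    assignment = {}
    used = set()
    for _, i, j in pairs:
        # Each target and each measurement is used at most once.
        if i not in assignment and j not in used:
            assignment[i] = j
            used.add(j)
    return assignment
```

With well-separated targets, as in the scenario the reviewer describes, each prediction has at most one measurement inside its gate, so the greedy matching is unambiguous. For closely spaced targets a global assignment method would likely be more appropriate.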
The Yokai Learning Environment: Tracking Beliefs Over Space and Time
Ruhdorfer, Constantin, Bortoletto, Matteo, Bulling, Andreas
Developing collaborative AI hinges on Theory of Mind (ToM) - the ability to reason about the beliefs of others to build and maintain common ground. Existing ToM benchmarks, however, are restricted to passive observer settings or lack an assessment of how agents establish and maintain common ground over time. To address these gaps, we introduce the Yokai Learning Environment (YLE) - a multi-agent reinforcement learning (RL) environment based on the cooperative card game Yokai. In the YLE, agents take turns peeking at hidden cards and moving them to form clusters based on colour. Success requires tracking evolving beliefs, remembering past observations, using hints as grounded communication, and maintaining common ground with teammates. Our evaluation yields two key findings: First, current RL agents struggle to solve the YLE, even when given access to perfect memory. Second, while belief modelling improves performance, agents are still unable to effectively generalise to unseen partners or form accurate beliefs over longer games, exposing a reliance on brittle conventions rather than robust belief tracking. We use the YLE to investigate research questions in belief modelling, memory, partner generalisation, and scaling to higher-order ToM.
Recent Advances in Transformer and Large Language Models for UAV Applications
Kheddar, Hamza, Habchi, Yassine, Ghanem, Mohamed Chahine, Hemis, Mustapha, Niyato, Dusit
The rapid advancement of Transformer-based models has reshaped the landscape of uncrewed aerial vehicle (UAV) systems by enhancing perception, decision-making, and autonomy. This review paper systematically categorizes and evaluates recent developments in Transformer architectures applied to UAVs, including attention mechanisms, CNN-Transformer hybrids, reinforcement learning Transformers, and large language models (LLMs). Unlike previous surveys, this work presents a unified taxonomy of Transformer-based UAV models, highlights emerging applications such as precision agriculture and autonomous navigation, and provides comparative analyses through structured tables and performance benchmarks. The paper also reviews key datasets, simulators, and evaluation metrics used in the field. Furthermore, it identifies existing gaps in the literature, outlines critical challenges in computational efficiency and real-time deployment, and offers future research directions. This comprehensive synthesis aims to guide researchers and practitioners in understanding and advancing Transformer-driven UAV technologies.
Centralized Permutation Equivariant Policy for Cooperative Multi-Agent Reinforcement Learning
Xu, Zhuofan, Bollig, Benedikt, Függer, Matthias, Nowak, Thomas, Dréau, Vincent Le
The Centralized Training with Decentralized Execution (CTDE) paradigm has gained significant attention in multi-agent reinforcement learning (MARL) and is the foundation of many recent algorithms. However, decentralized policies operate under partial observability and often yield suboptimal performance compared to centralized policies, while fully centralized approaches typically face scalability challenges as the number of agents increases. We propose Centralized Permutation Equivariant (CPE) learning, a centralized training and execution framework that employs a fully centralized policy to overcome these limitations. Our approach leverages a novel permutation equivariant architecture, Global-Local Permutation Equivariant (GLPE) networks, that is lightweight, scalable, and easy to implement. Experiments show that CPE integrates seamlessly with both value decomposition and actor-critic methods, substantially improving the performance of standard CTDE algorithms across cooperative benchmarks including MPE, SMAC, and RWARE, and matching the performance of state-of-the-art RWARE implementations.
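To make the notion of permutation equivariance concrete, here is a minimal sketch of a generic equivariant layer that combines each agent's own features with the mean over all agents. It is an assumption-laden illustration and not the GLPE architecture from the abstract, whose details are not given here: the function name `equivariant_layer` and the scalar weights `w_self` and `w_mean` are hypothetical.

```python
def equivariant_layer(agent_features, w_self=1.0, w_mean=0.5):
    """Apply a simple permutation equivariant map to per-agent feature vectors.

    agent_features: list of equal-length lists of floats, one per agent.
    w_self, w_mean: hypothetical scalar weights for the per-agent and pooled terms.
    """
    n = len(agent_features)
    dim = len(agent_features[0])
    # The mean over agents does not depend on agent ordering.
    mean = [sum(f[k] for f in agent_features) / n for k in range(dim)]
    # Each output row depends only on its own input row and the pooled mean.
    return [
        [w_self * f[k] + w_mean * mean[k] for k in range(dim)]
        for f in agent_features
    ]


# Reordering the agents reorders the outputs in the same way.
x = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
perm = [2, 0, 1]
out = equivariant_layer(x)
out_permuted = equivariant_layer([x[p] for p in perm])
assert out_permuted == [out[p] for p in perm]
```

Because the pooled term is shared by all agents, the parameter count of such a layer does not grow with the number of agents, which is one reason equivariant designs are attractive for centralized policies.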
SLAC: Simulation-Pretrained Latent Action Space for Whole-Body Real-World RL
Hu, Jiaheng, Stone, Peter, Martín-Martín, Roberto
Building capable household and industrial robots requires mastering the control of versatile, high-degree-of-freedom (DoF) systems such as mobile manipulators. While reinforcement learning (RL) holds promise for autonomously acquiring robot control policies, scaling it to high-DoF embodiments remains challenging. Direct RL in the real world demands both safe exploration and high sample efficiency, which are difficult to achieve in practice. Sim-to-real RL, on the other hand, is often brittle due to the reality gap. This paper introduces SLAC, a method that renders real-world RL feasible for complex embodiments by leveraging a low-fidelity simulator to pretrain a task-agnostic latent action space. SLAC trains this latent action space via a customized unsupervised skill discovery method designed to promote temporal abstraction, disentanglement, and safety, thereby facilitating efficient downstream learning. Once a latent action space is learned, SLAC uses it as the action interface for a novel off-policy RL algorithm to autonomously learn downstream tasks through real-world interactions. We evaluate SLAC against existing methods on a suite of bimanual mobile manipulation tasks, where it achieves state-of-the-art performance. Notably, SLAC learns contact-rich whole-body tasks in under an hour of real-world interactions, without relying on any demonstrations or hand-crafted behavior priors. More information and robot videos at robo-rl.github.io
From Shadows to Safety: Occlusion Tracking and Risk Mitigation for Urban Autonomous Driving
Moller, Korbinian, Schwarzmeier, Luis, Betz, Johannes
Autonomous vehicles (AVs) must navigate dynamic urban environments where occlusions and perception limitations introduce significant uncertainties. This research builds upon and extends existing approaches in risk-aware motion planning and occlusion tracking to address these challenges. While prior studies have developed individual methods for occlusion tracking and risk assessment, a comprehensive method integrating these techniques has not been fully explored. We, therefore, enhance a phantom agent-centric model by incorporating sequential reasoning to track occluded areas and predict potential hazards. Our model enables realistic scenario representation and context-aware risk evaluation by modeling diverse phantom agents, each with distinct behavior profiles. Simulations demonstrate that the proposed approach improves situational awareness and balances proactive safety with efficient traffic flow. While these results underline the potential of our method, validation in real-world scenarios is necessary to confirm its feasibility and generalizability. By utilizing and advancing established methodologies, this work contributes to safer and more reliable AV planning in complex urban environments. To support further research, our method is available as open-source software at https://github.com/