escort
ESCORT: Efficient Stein-variational and Sliced Consistency-Optimized Temporal Belief Representation for POMDPs
Zhang, Yunuo, Luo, Baiting, Mukhopadhyay, Ayan, Karsai, Gabor, Dubey, Abhishek
In Partially Observable Markov Decision Processes (POMDPs), maintaining and updating belief distributions over possible underlying states provides a principled way to summarize action-observation history for effective decision-making under uncertainty. As environments grow more realistic, belief distributions develop complexity that standard mathematical models cannot accurately capture, creating a fundamental challenge in maintaining representational accuracy. Despite advances in deep learning and probabilistic modeling, existing POMDP belief approximation methods fail to accurately represent complex uncertainty structures such as high-dimensional, multi-modal belief distributions, resulting in estimation errors that lead to suboptimal agent behaviors. To address this challenge, we present ESCORT (Efficient Stein-variational and sliced Consistency-Optimized Representation for Temporal beliefs), a particle-based framework for capturing complex, multi-modal distributions in high-dimensional belief spaces. ESCORT extends SVGD with two key innovations: correlation-aware projections that model dependencies between state dimensions, and temporal consistency constraints that stabilize updates while preserving correlation structures. This approach retains SVGD's attractive-repulsive particle dynamics while enabling accurate modeling of intricate correlation patterns. Unlike particle filters prone to degeneracy or parametric methods with fixed representational capacity, ESCORT dynamically adapts to belief landscape complexity without resampling or restrictive distributional assumptions. We demonstrate ESCORT's effectiveness through extensive evaluations on both POMDP domains and synthetic multi-modal distributions of varying dimensionality, where it consistently outperforms state-of-the-art methods in terms of belief approximation accuracy and downstream decision quality.
Optimally Solving Colored Generalized Sliding-Tile Puzzles: Complexity and Bounds
The Generalized Sliding-Tile Puzzle (GSTP), allowing many square tiles on a board to move in parallel while enforcing natural geometric collision constraints on the movement of neighboring tiles, provide a high-fidelity mathematical model for many high-utility existing and future multi-robot applications, e.g., at mobile robot-based warehouses or autonomous garages. Motivated by practical relevance, this work examines a further generalization of GSTP called the Colored Generalized Sliding-Tile Puzzle (CGSP), where tiles can now assume varying degrees of distinguishability, a common occurrence in the aforementioned applications. Our study establishes the computational complexity of CGSP and its key sub-problems under a broad spectrum of possible conditions and characterizes solution makespan lower and upper bounds that differ by at most a logarithmic factor. These results are further extended to higher-dimensional versions of the puzzle game.
f-GANs in an Information Geometric Nutshell
Richard Nock, Zac Cranko, Aditya K. Menon, Lizhen Qu, Robert C. Williamson
Nowozin et al showed last year how to extend the GAN principle to all f-divergences. The approach is elegant but falls short of a full description of the supervised game, and says little about the key player, the generator: for example, what does the generator actually converge to if solving the GAN game means convergence in some space of parameters? How does that provide hints on the generator's design and compare to the flourishing but almost exclusively experimental literature on the subject? In this paper, we unveil a broad class of distributions for which such convergence happens -- namely, deformed exponential families, a wide superset of exponential families --. We show that current deep architectures are able to factorize a very large number of such densities using an especially compact design, hence displaying the power of deep architectures and their concinnity in the f-GAN game. This result holds given a sufficient condition on activation functions -- which turns out to be satisfied by popular choices. The key to our results is a variational generalization of an old theorem that relates the KL divergence between regular exponential families and divergences between their natural parameters. We complete this picture with additional results and experimental insights on how these results may be used to ground further improvements of GAN architectures, via (i) a principled design of the activation functions in the generator and (ii) an explicit integration of proper composite losses' link function in the discriminator.
On Computing Makespan-Optimal Solutions for Generalized Sliding-Tile Puzzles
In the $15$-puzzle game, $15$ labeled square tiles are reconfigured on a $4\times 4$ board through an escort, wherein each (time) step, a single tile neighboring it may slide into it, leaving the space previously occupied by the tile as the new escort. We study a generalized sliding-tile puzzle (GSTP) in which (1) there are $1+$ escorts and (2) multiple tiles can move synchronously in a single time step. Compared with popular discrete multi-agent/robot motion models, GSTP provides a more accurate model for a broad array of high-utility applications, including warehouse automation and autonomous garage parking, but is less studied due to the more involved tile interactions. In this work, we analyze optimal GSTP solution structures, establishing that computing makespan-optimal solutions for GSTP is NP-complete and developing polynomial time algorithms yielding makespans approximating the minimum with expected/high probability constant factors, assuming randomized start and goal configurations.
Reinforcement learning for multi-item retrieval in the puzzle-based storage system
He, Jing, Liu, Xinglu, Duan, Qiyao, Chan, Wai Kin Victor, Qi, Mingyao
Nowadays, fast delivery services have created the need for high-density warehouses. The puzzle-based storage system is a practical way to enhance the storage density, however, facing difficulties in the retrieval process. In this work, a deep reinforcement learning algorithm, specifically the Double&Dueling Deep Q Network, is developed to solve the multi-item retrieval problem in the system with general settings, where multiple desired items, escorts, and I/O points are placed randomly. Additionally, we propose a general compact integer programming model to evaluate the solution quality. Extensive numerical experiments demonstrate that the reinforcement learning approach can yield high-quality solutions and outperforms three related state-of-the-art heuristic algorithms. Furthermore, a conversion algorithm and a decomposition framework are proposed to handle simultaneous movement and large-scale instances respectively, thus improving the applicability of the PBS system.
Defensive Escort Teams via Multi-Agent Deep Reinforcement Learning
Garg, Arpit, Hasan, Yazied A., Yaรฑez, Adam, Tapia, Lydia
-- Coordinated defensive escorts can aid a navigating payload by positioning themselves in order to maintain the safety of the payload from obstacles. In this paper, we present a novel, end-to-end solution for coordinating an escort team for protecting high-value payloads. Our solution employs deep reinforcement learning (RL) in order to train a team of escorts to maintain payload safety while navigating alongside the payload. This is done in a distributed fashion, relying only on limited range positional information of other escorts, the payload, and the obstacles. When compared to a state-of-art algorithm for obstacle avoidance, our solution with a single escort increases navigation success up to 31%. Additionally, escort teams increase success rate by up to 75% percent over escorts in static formations. We also show that this learned solution is general to several adaptations in the scenario including: a changing number of escorts in the team, changing obstacle density, and changes in payload conformation. Successful navigation in crowded scenarios often requires assuming a nonzero collision probability between the agent and stochastic obstacles [1]. This required assumption of risk is potentially frightening given the value of cargo that modern autonomous agents will be transporting, e.g., human life.
Artificial Intelligence to escort your thought in learning and deciding
The same is told about learning. The more knowledge and agility of mind a person has -- the faster one can operate in rapidly changing environments under pressure. Intelligence is the natural way to employ your mind to solve problems and make decisions. Before all of this, a person acquires knowledge, data or information -- different shades of the same thing. Human intelligence goes for both competently.