hierarchical approach
24cceab7ffc1118f5daaace13c670885-Supplemental.pdf
A.1 Algorithm The code is available at https://github.com/mklissa/MOC. A.2 Tabular experiments A.2.1 Implementation Details For our experiments of the FourRooms domain we based our implementation on [Bacon et al., 2016] and ran the experiments for 500 episodes that last a maximum of 1000 steps with goal located in the right hallway. In the first experiment we verify whether learning a fixed set of options can be accelerated by our method. We define this fixed set as the hallway options from Sutton et al. [1999b]. As the policies of these options were deterministic and we use importance sampling, we relax them to stochastic policies where the most likely action is the one leading to a hallway.
MARL Warehouse Robots
Allman, Price, Thang, Lian, Simmons, Dre, Riaz, Salmon
Our research investigates the complex task of multiple autonomous agents learning to coordinate and deliver packages in warehouse environments--a problem requiring implicit communication, collision avoidance, and efficient task allocation without centralized control. Traditional warehouse automation relies on centralized planning systems that face scalability limitations; multi-agent reinforcement learning (MARL) offers an alternative through decentralized learned policies, but requires solving the credit assignment problem. We compare MARL algorithms on warehouse coordination: QMIX [Rashid et al., 2018] (value decomposition), IPPO (independent learning), and MASAC (centralized critic). Our study progresses from MPE for validation to RWARE for warehouse evaluation, culminating in Unity 3D deployment where agents demonstrate learned package delivery behavior. QMIX emerged as the best performer after systematic comparison. Our contributions: (1) hyperparameter analysis showing default configurations fail on sparse-reward warehouse tasks, (2) comparative evaluation across algorithms and scales, (3) Unity ML-Agents integration demonstrating sim-to-sim transfer with successful package delivery, and (4) identification of scaling challenges. Full experimental details and results are documented in our Quarto documentation book. 1
Reinforcement Learning with Anticipation: A Hierarchical Approach for Long-Horizon Tasks
Solving long-horizon goal-conditioned tasks remains a significant challenge in reinforcement learning (RL). Hierarchical reinforcement learning (HRL) addresses this by decomposing tasks into more manageable sub-tasks, but the automatic discovery of the hierarchy and the joint training of multi-level policies often suffer from instability and can lack theoretical guarantees. In this paper, we introduce Reinforcement Learning with Anticipation (RLA), a principled and potentially scalable framework designed to address these limitations. The RLA agent learns two synergistic models: a low-level, goal-conditioned policy that learns to reach specified subgoals, and a high-level anticipation model that functions as a planner, proposing intermediate subgoals on the optimal path to a final goal. The key feature of RLA is the training of the anticipation model, which is guided by a principle of value geometric consistency, regularized to prevent degenerate solutions. We present proofs that RLA approaches the globally optimal policy under various conditions, establishing a principled and convergent method for hierarchical planning and execution in long-horizon goal-conditioned tasks.
Compression versus Accuracy: A Hierarchy of Lifted Models
Speller, Jan, Luttermann, Malte, Gehrke, Marcel, Braun, Tanya
Probabilistic graphical models that encode indistinguishable objects and relations among them use first-order logic constructs to compress a propositional factorised model for more efficient (lifted) inference. To obtain a lifted representation, the state-of-the-art algorithm Advanced Colour Passing (ACP) groups factors that represent matching distributions. In an approximate version using $\varepsilon$ as a hyperparameter, factors are grouped that differ by a factor of at most $(1\pm \varepsilon)$. However, finding a suitable $\varepsilon$ is not obvious and may need a lot of exploration, possibly requiring many ACP runs with different $\varepsilon$ values. Additionally, varying $\varepsilon$ can yield wildly different models, leading to decreased interpretability. Therefore, this paper presents a hierarchical approach to lifted model construction that is hyperparameter-free. It efficiently computes a hierarchy of $\varepsilon$ values that ensures a hierarchy of models, meaning that once factors are grouped together given some $\varepsilon$, these factors will be grouped together for larger $\varepsilon$ as well. The hierarchy of $\varepsilon$ values also leads to a hierarchy of error bounds. This allows for explicitly weighing compression versus accuracy when choosing specific $\varepsilon$ values to run ACP with and enables interpretability between the different models.
A hierarchical approach for assessing the vulnerability of tree-based classification models to membership inference attack
Machine learning models can inadvertently expose confidential properties of their training data, making them vulnerable to membership inference attacks (MIA). While numerous evaluation methods exist, many require computationally expensive processes, such as training multiple shadow models. This article presents two new complementary approaches for efficiently identifying vulnerable tree-based models: an ante-hoc analysis of hyperparameter choices and a post-hoc examination of trained model structure. While these new methods cannot certify whether a model is safe from MIA, they provide practitioners with a means to significantly reduce the number of models that need to undergo expensive MIA assessment through a hierarchical filtering approach. More specifically, it is shown that the rank order of disclosure risk for different hyperparameter combinations remains consistent across datasets, enabling the development of simple, human-interpretable rules for identifying relatively high-risk models before training. While this ante-hoc analysis cannot determine absolute safety since this also depends on the specific dataset, it allows the elimination of unnecessarily risky configurations during hyperparameter tuning. Additionally, computationally inexpensive structural metrics serve as indicators of MIA vulnerability, providing a second filtering stage to identify risky models after training but before conducting expensive attacks. Empirical results show that hyperparameter-based risk prediction rules can achieve high accuracy in predicting the most at risk combinations of hyperparameters across different tree-based model types, while requiring no model training. Moreover, target model accuracy is not seen to correlate with privacy risk, suggesting opportunities to optimise model configurations for both performance and privacy.
VQA-Levels: A Hierarchical Approach for Classifying Questions in VQA
Madaka, Madhuri Latha, Bhagvati, Chakravarthy
Designing datasets for Visual Question Answering (VQA) is a difficult and complex task that requires NLP for parsing and computer vision for analysing the relevant aspects of the image for answering the question asked. Several benchmark datasets have been developed by researchers but there are many issues with using them for methodical performance tests. This paper proposes a new benchmark dataset -- a pilot version called VQA-Levels is ready now -- for testing VQA systems systematically and assisting researchers in advancing the field. The questions are classified into seven levels ranging from direct answers based on low-level image features (without needing even a classifier) to those requiring high-level abstraction of the entire image content. The questions in the dataset exhibit one or many of ten properties. Each is categorised into a specific level from 1 to 7. Levels 1 - 3 are directly on the visual content while the remaining levels require extra knowledge about the objects in the image. Each question generally has a unique one or two-word answer. The questions are 'natural' in the sense that a human is likely to ask such a question when seeing the images. An example question at Level 1 is, ``What is the shape of the red colored region in the image?" while at Level 7, it is, ``Why is the man cutting the paper?". Initial testing of the proposed dataset on some of the existing VQA systems reveals that their success is high on Level 1 (low level features) and Level 2 (object classification) questions, least on Level 3 (scene text) followed by Level 6 (extrapolation) and Level 7 (whole scene analysis) questions. The work in this paper will go a long way to systematically analyze VQA systems.
Hierarchical Trajectory (Re)Planning for a Large Scale Swarm
Pan, Lishuo, Wang, Yutong, Ayanian, Nora
We consider the trajectory replanning problem for a large-scale swarm in a cluttered environment. Our path planner replans for robots by utilizing a hierarchical approach, dividing the workspace, and computing collision-free paths for robots within each cell in parallel. Distributed trajectory optimization generates a deadlock-free trajectory for efficient execution and maintains the control feasibility even when the optimization fails. Our hierarchical approach combines the benefits of both centralized and decentralized methods, achieving a high task success rate while providing real-time replanning capability. Compared to decentralized approaches, our approach effectively avoids deadlocks and collisions, significantly increasing the task success rate. We demonstrate the real-time performance of our algorithm with up to 142 robots in simulation, and a representative 24 physical Crazyflie nano-quadrotor experiment.
Trade-offs of Dynamic Control Structure in Human-swarm Systems
Kelly, Thomas G., Soorati, Mohammad D., Zauner, Klaus-Peter, Ramchurn, Sarvapali D., Tarapore, and Danesh
Swarm robotics is a study of simple robots that exhibit complex behaviour only by interacting locally with other robots and their environment. The control in swarm robotics is mainly distributed whereas centralised control is widely used in other fields of robotics. Centralised and decentralised control strategies both pose a unique set of benefits and drawbacks for the control of multi-robot systems. While decentralised systems are more scalable and resilient, they are less efficient compared to the centralised systems and they lead to excessive data transmissions to the human operators causing cognitive overload. We examine the trade-offs of each of these approaches in a human-swarm system to perform an environmental monitoring task and propose a flexible hybrid approach, which combines elements of hierarchical and decentralised systems. We find that a flexible hybrid system can outperform a centralised system (in our environmental monitoring task by 19.2%) while reducing the number of messages sent to a human operator (here by 23.1%). We conclude that establishing centralisation for a system is not always optimal for performance and that utilising aspects of centralised and decentralised systems can keep the swarm from hindering its performance.
Hierarchical Delay Attribution Classification using Unstructured Text in Train Management Systems
Borg, Anton, Lingvall, Per, Svensson, Martin
EU directives stipulate a systematic follow-up of train delays. In Sweden, the Swedish Transport Administration registers and assigns an appropriate delay attribution code. However, this delay attribution code is assigned manually, which is a complex task. In this paper, a machine learning-based decision support for assigning delay attribution codes based on event descriptions is investigated. The text is transformed using TF-IDF, and two models, Random Forest and Support Vector Machine, are evaluated against a random uniform classifier and the classification performance of the Swedish Transport Administration. Further, the problem is modeled as both a hierarchical and flat approach. The results indicate that a hierarchical approach performs better than a flat approach. Both approaches perform better than the random uniform classifier but perform worse than the manual classification.