Markov Models
Shared Control of Holonomic Wheelchairs through Reinforcement Learning
Bรคhler, Jannis, Paez-Granados, Diego, Peรฑa-Queralta, Jorge
--Smart electric wheelchairs can improve user experience by supporting the driver with shared control. State-of-the-art work showed the potential of shared control in improving safety in navigation for non-holonomic robots. However, for holonomic systems, current approaches often lead to unintuitive behavior for the user and fail to utilize the full potential of omnidirectional driving. Therefore, we propose a reinforcement learning-based method, which takes a 2D user input and outputs a 3D motion while ensuring user comfort and reducing cognitive load on the driver . Our approach is trained in Isaac Gym and tested in simulation in Gazebo. We compare different RL agent architectures and reward functions based on metrics considering cognitive load and user comfort. We show that our method ensures collision-free navigation while smartly orienting the wheelchair and showing better or competitive smoothness compared to a previous non-learning-based method. We further perform a sim-to-real transfer and demonstrate, to the best of our knowledge, the first real-world implementation of RL-based shared control for an omnidirectional mobility platform.
Novel Multi-Agent Action Masked Deep Reinforcement Learning for General Industrial Assembly Lines Balancing Problems
Ali, Ali Mohamed, Tirel, Luca, Hashim, Hashim A.
Personal use of this material is permitted. Abstract --Efficient planning of activities is essential for modern industrial assembly lines to uphold manufacturing standards, prevent project constraint violations, and achieve cost-effective operations. While exact solutions to such challenges can be obtained through Integer Programming (IP), the dependence of the search space on input parameters often makes IP computationally infeasible for large-scale scenarios. Heuristic methods, such as Genetic Algorithms, can also be applied, but they frequently produce suboptimal solutions in extensive cases. This paper introduces a novel mathematical model of a generic industrial assembly line formulated as a Markov Decision Process (MDP), without imposing assumptions on the type of assembly line a notable distinction from most existing models. The proposed model is employed to create a virtual environment for training Deep Reinforcement Learning (DRL) agents to optimize task and resource scheduling. T o enhance the efficiency of agent training, the paper proposes two innovative tools. The first is an action-masking technique, which ensures the agent selects only feasible actions, thereby reducing training time. The second is a multi-agent approach, where each workstation is managed by an individual agent, as a result, the state and action spaces were reduced. A centralized training framework with decentralized execution is adopted, offering a scalable learning architecture for optimizing industrial assembly lines. This framework allows the agents to learn offline and subsequently provide real-time solutions during operations by leveraging a neural network that maps the current factory state to the optimal action. The effectiveness of the proposed scheme is validated through numerical simulations, demonstrating significantly faster convergence to the optimal solution compared to a comparable model-based approach.
Integrating Reason-Based Moral Decision-Making in the Reinforcement Learning Architecture
Reinforcement Learning is a machine learning methodology that has demonstrated strong performance across a variety of tasks. In particular, it plays a central role in the development of artificial autonomous agents. As these agents become increasingly capable, market readiness is rapidly approaching, which means those agents, for example taking the form of humanoid robots or autonomous cars, are poised to transition from laboratory prototypes to autonomous operation in real-world environments. This transition raises concerns leading to specific requirements for these systems - among them, the requirement that they are designed to behave ethically. Crucially, research directed toward building agents that fulfill the requirement to behave ethically - referred to as artificial moral agents(AMAs) - has to address a range of challenges at the intersection of computer science and philosophy. This study explores the development of reason-based artificial moral agents (RBAMAs). RBAMAs are build on an extension of the reinforcement learning architecture to enable moral decision-making based on sound normative reasoning, which is achieved by equipping the agent with the capacity to learn a reason-theory - a theory which enables it to process morally relevant propositions to derive moral obligations - through case-based feedback. They are designed such that they adapt their behavior to ensure conformance to these obligations while they pursue their designated tasks. These features contribute to the moral justifiability of the their actions, their moral robustness, and their moral trustworthiness, which proposes the extended architecture as a concrete and deployable framework for the development of AMAs that fulfills key ethical desiderata. This study presents a first implementation of an RBAMA and demonstrates the potential of RBAMAs in initial experiments.
Analogy making as amortised model construction
Nagy, David G., Shen, Tingke, Zhou, Hanqi, Wu, Charley M., Dayan, Peter
Humans flexibly construct internal models to navigate novel situations. To be useful, these internal models must be sufficiently faithful to the environment that resource-limited planning leads to adequate outcomes; equally, they must be tractable to construct in the first place. We argue that analogy plays a central role in these processes, enabling agents to reuse solution-relevant structure from past experiences and amortise the computational costs of both model construction (construal) and planning. Formalis-ing analogies as partial homomorphisms between Markov decision processes, we sketch a framework in which abstract modules, derived from previous construals, serve as com-posable building blocks for new ones. This modular reuse allows for flexible adaptation of policies and representations across domains with shared structural essence.
Assessing Adaptive World Models in Machines with Novel Games
Ying, Lance, Collins, Katherine M., Sharma, Prafull, Colas, Cedric, Zhao, Kaiya Ivy, Weller, Adrian, Tavares, Zenna, Isola, Phillip, Gershman, Samuel J., Andreas, Jacob D., Griffiths, Thomas L., Chollet, Francois, Allen, Kelsey R., Tenenbaum, Joshua B.
Human intelligence exhibits a remarkable capacity for rapid adaptation and effective problem-solving in novel and unfamiliar contexts. We argue that this profound adaptability is fundamentally linked to the efficient construction and refinement of internal representations of the environment, commonly referred to as world models, and we refer to this adaptation mechanism as world model induction . However, current understanding and evaluation of world models in artificial intelligence (AI) remains narrow, often focusing on static representations learned from training on massive corpora of data, instead of the efficiency and efficacy in learning these representations through interaction and exploration within a novel environment. In this Perspective, we provide a view of world model induction drawing on decades of research in cognitive science on how humans learn and adapt so efficiently; we then call for a new evaluation framework for assessing adaptive world models in AI. Concretely, we propose a new benchmarking paradigm based on suites of carefully designed games with genuine, deep and continually refreshing novelty in the underlying game structures -- we refer to this class of games as novel games . We detail key desiderata for constructing these games and propose appropriate metrics to explicitly challenge and evaluate the agent's ability for rapid world model induction. We hope that this new evaluation framework will inspire future evaluation efforts on world models in AI and provide a crucial step towards developing AI systems capable of human-like rapid adaptation and robust generalization -- a critical component of artificial general intelligence.
Statistical and Algorithmic Foundations of Reinforcement Learning
Chi, Yuejie, Chen, Yuxin, Wei, Yuting
As a paradigm for sequential decision making in unknown environments, reinforcement learning (RL) has received a flurry of attention in recent years. However, the explosion of model complexity in emerging applications and the presence of nonconvexity exacerbate the challenge of achieving efficient RL in sample-starved situations, where data collection is expensive, time-consuming, or even high-stakes (e.g., in clinical trials, autonomous systems, and online advertising). How to understand and enhance the sample and computational efficacies of RL algorithms is thus of great interest. In this tutorial, we aim to introduce several important algorithmic and theoretical developments in RL, highlighting the connections between new ideas and classical topics. Employing Markov Decision Processes as the central mathematical model, we cover several distinctive RL scenarios (i.e., RL with a simulator, online RL, offline RL, robust RL, and RL with human feedback), and present several mainstream RL approaches (i.e., model-based approach, value-based approach, and policy optimization). Our discussions gravitate around the issues of sample complexity, computational efficiency, as well as algorithm-dependent and information-theoretic lower bounds from a non-asymptotic viewpoint.
Skill Learning via Policy Diversity Yields Identifiable Representations for Reinforcement Learning
Reizinger, Patrik, Mucsรกnyi, Bรกlint, Guo, Siyuan, Eysenbach, Benjamin, Schรถlkopf, Bernhard, Brendel, Wieland
Self-supervised feature learning and pretraining methods in reinforcement learning (RL) often rely on information-theoretic principles, termed mutual information skill learning (MISL). These methods aim to learn a representation of the environment while also incentivizing exploration thereof. However, the role of the representation and mutual information parametrization in MISL is not yet well understood theoretically. Our work investigates MISL through the lens of identifiable representation learning by focusing on the Contrastive Successor Features (CSF) method. We prove that CSF can provably recover the environment's ground-truth features up to a linear transformation due to the inner product parametrization of the features and skill diversity in a discriminative sense. This first identifiability guarantee for representation learning in RL also helps explain the implications of different mutual information objectives and the downsides of entropy regularizers. We empirically validate our claims in MuJoCo and DeepMind Control and show how CSF provably recovers the ground-truth features both from states and pixels.
Better Models and Algorithms for Learning Ising Models from Dynamics
Gaitonde, Jason, Moitra, Ankur, Mossel, Elchanan
We study the problem of learning the structure and parameters of the Ising model, a fundamental model of high-dimensional data, when observing the evolution of an associated Markov chain. A recent line of work has studied the natural problem of learning when observing an evolution of the well-known Glauber dynamics [Bresler, Gamarnik, Shah, IEEE Trans. Inf. Theory 2018, Gaitonde, Mossel STOC 2024], which provides an arguably more realistic generative model than the classical i.i.d. setting. However, this prior work crucially assumes that all site update attempts are observed, \emph{even when this attempt does not change the configuration}: this strong observation model is seemingly essential for these approaches. While perhaps possible in restrictive contexts, this precludes applicability to most realistic settings where we can observe \emph{only} the stochastic evolution itself, a minimal and natural assumption for any process we might hope to learn from. However, designing algorithms that succeed in this more realistic setting has remained an open problem [Bresler, Gamarnik, Shah, IEEE Trans. Inf. Theory 2018, Gaitonde, Moitra, Mossel, STOC 2025]. In this work, we give the first algorithms that efficiently learn the Ising model in this much more natural observation model that only observes when the configuration changes. For Ising models with maximum degree $d$, our algorithm recovers the underlying dependency graph in time $\mathsf{poly}(d)\cdot n^2\log n$ and then the actual parameters in additional $\widetilde{O}(2^d n)$ time, which qualitatively matches the state-of-the-art even in the i.i.d. setting in a much weaker observation model. Our analysis holds more generally for a broader class of reversible, single-site Markov chains that also includes the popular Metropolis chain by leveraging more robust properties of reversible Markov chains.
Robots for Kiwifruit Harvesting and Pollination
This research was a part of a project that developed mobile robots that performed targeted pollen spraying and automated harvesting in pergola structured kiwifruit orchards. Multiple kiwifruit detachment mechanisms were designed and field testing of one of the concepts showed that the mechanism could reliably pick kiwifruit. Furthermore, this kiwifruit detachment mechanism was able to reach over 80 percent of fruit in the cluttered kiwifruit canopy, whereas the previous state of the art mechanism was only able to reach less than 70 percent of the fruit. Artificial pollination was performed by detecting flowers and then spraying pollen in solution onto the detected flowers from a line of sprayers on a boom, while driving at up to 1.4 ms-1. In addition, the height of the canopy was measured and the spray boom was moved up and down to keep the boom close enough to the flowers for the spray to reach the flowers, while minimising collisions with the canopy. Mobile robot navigation was performed using a 2D lidar in apple orchards and vineyards. Lidar navigation in kiwifruit orchards was more challenging because the pergola structure only provides a small amount of data for the direction of rows, compared to the amount of data from the overhead canopy, the undulating ground and other objects in the orchards. Multiple methods are presented here for extracting structure defining features from 3D lidar data in kiwifruit orchards. In addition, a 3D lidar navigation system -- which performed row following, row end detection and row end turns -- was tested for over 30 km of autonomous driving in kiwifruit orchards. Computer vision algorithms for row detection and row following were also tested. The computer vision algorithm worked as well as the 3D lidar row following method in testing.
Joint-Local Grounded Action Transformation for Sim-to-Real Transfer in Multi-Agent Traffic Control
Turnau, Justin, Da, Longchao, Vo, Khoa, Rafi, Ferdous Al, Bachiraju, Shreyas, Chen, Tiejin, Wei, Hua
Traffic Signal Control (TSC) is essential for managing urban traffic flow and reducing congestion. Reinforcement Learning (RL) offers an adaptive method for TSC by responding to dynamic traffic patterns, with multi-agent RL (MARL) gaining traction as intersections naturally function as coordinated agents. However, due to shifts in environmental dynamics, implementing MARL-based TSC policies in the real world often leads to a significant performance drop, known as the sim-to-real gap. Grounded Action Transformation (GAT) has successfully mitigated this gap in single-agent RL for TSC, but real-world traffic networks, which involve numerous interacting intersections, are better suited to a MARL framework. In this work, we introduce JL-GAT, an application of GAT to MARL-based TSC that balances scalability with enhanced grounding capability by incorporating information from neighboring agents. JL-GAT adopts a decentralized approach to GAT, allowing for the scalability often required in real-world traffic networks while still capturing key interactions between agents. Comprehensive experiments on various road networks under simulated adverse weather conditions, along with ablation studies, demonstrate the effectiveness of JL-GAT. The code is publicly available at https://github.com/DaRL-LibSignal/JL-GAT/.