
 Soltoggio, Andrea


Statistical Context Detection for Deep Lifelong Reinforcement Learning

arXiv.org Artificial Intelligence

Context detection involves labeling segments of an online stream of data as belonging to different tasks. Task labels are used in lifelong learning algorithms to perform consolidation or other procedures that prevent catastrophic forgetting. Inferring task labels from online experiences remains a challenging problem. Most approaches assume finite and low-dimensional observation spaces or a preliminary training phase during which task labels are learned. Moreover, changes in the transition or reward functions can be detected only in combination with a policy, and therefore are more difficult to detect than changes in the input distribution. This paper presents an approach to learning both policies and labels in an online deep reinforcement learning setting. The key idea is to use distance metrics, obtained via optimal transport methods, i.e., the Wasserstein distance, on suitable latent action-reward spaces to measure distances between sets of data points from past and current streams. Such distances can then be used in statistical tests based on an adapted Kolmogorov-Smirnov calculation to assign labels to sequences of experiences. A rollback procedure is introduced to learn multiple policies by ensuring that only the appropriate data is used to train the corresponding policy. The combination of task detection and policy deployment allows for the optimization of lifelong reinforcement learning agents without an oracle that provides task labels. The approach is tested on two benchmarks and the results show promising performance compared with related context detection algorithms. The results suggest that optimal transport statistical methods provide an explainable and justifiable procedure for online context detection and reward optimization in lifelong reinforcement learning.
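
The detection step can be sketched compactly: compute Wasserstein distances between the current batch of latent points and batches from earlier in the stream, then test whether those distances are consistent with a within-task null distribution. The following is a minimal sketch assuming latent action-reward embeddings as NumPy arrays and using the POT library; the generic two-sample `ks_2samp` test stands in for the paper's adapted Kolmogorov-Smirnov calculation, and all names are illustrative.

```python
import numpy as np
import ot                          # POT: Python Optimal Transport
from scipy.stats import ks_2samp

def wasserstein_set_distance(X, Y):
    """Exact optimal-transport cost between two point sets, uniform weights."""
    a = np.full(len(X), 1.0 / len(X))
    b = np.full(len(Y), 1.0 / len(Y))
    M = ot.dist(X, Y)              # pairwise squared-Euclidean cost matrix
    return ot.emd2(a, b, M)

def is_same_task(past_batches, current_batch, null_distances, alpha=0.05):
    """Compare current-batch distances against a within-task null sample."""
    dists = [wasserstein_set_distance(B, current_batch) for B in past_batches]
    _, p_value = ks_2samp(dists, null_distances)
    return p_value > alpha         # True: same context; False: assign new label
```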


R^3: On-device Real-Time Deep Reinforcement Learning for Autonomous Robotics

arXiv.org Artificial Intelligence

Autonomous robotic systems, like autonomous vehicles and robotic search and rescue, require efficient on-device training for continuous adaptation of Deep Reinforcement Learning (DRL) models in dynamic environments. This research is fundamentally motivated by the need to understand and address the challenges of on-device real-time DRL, which involves balancing timing and algorithm performance under memory constraints, as exposed through our extensive empirical studies. This intricate balance requires co-optimizing two pivotal parameters of DRL training -- batch size and replay buffer size. Configuring these parameters significantly affects timing and algorithm performance, while both (unfortunately) require substantial memory allocation to achieve near-optimal performance. This paper presents R^3, a holistic solution for managing timing, memory, and algorithm performance in on-device real-time DRL training. R^3 employs (i) a deadline-driven feedback loop with dynamic batch sizing for optimizing timing, (ii) efficient memory management to reduce memory footprint and allow larger replay buffer sizes, and (iii) a runtime coordinator guided by heuristic analysis and a runtime profiler for dynamically adjusting memory resource reservations. These components collaboratively tackle the trade-offs in on-device DRL training, improving timing and algorithm performance while minimizing the risk of out-of-memory (OOM) errors. We implemented and evaluated R^3 extensively across various DRL frameworks and benchmarks on three hardware platforms commonly adopted by autonomous robotic systems. Additionally, we integrated R^3 with a popular realistic autonomous car simulator to demonstrate its real-world applicability. Evaluation results show that R^3 is effective across diverse platforms, ensuring consistent latency performance and timing predictability with minimal overhead.
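
The timing component (i) lends itself to a compact illustration. Below is a hedged sketch of a deadline-driven feedback loop with dynamic batch sizing; the `agent`/`replay_buffer` APIs and the multiplicative halving/doubling rule are assumptions for illustration, not R^3's actual controller.

```python
import time

def deadline_driven_training(agent, replay_buffer, deadline_s, num_steps,
                             batch_size=256, min_batch=32, max_batch=1024):
    for _ in range(num_steps):
        start = time.monotonic()
        batch = replay_buffer.sample(batch_size)  # hypothetical API
        agent.update(batch)                       # hypothetical API
        elapsed = time.monotonic() - start
        # Feedback rule: shrink the batch when an update overruns its
        # deadline, grow it when there is ample slack; clamping bounds
        # both latency and memory use.
        if elapsed > deadline_s:
            batch_size = max(min_batch, batch_size // 2)
        elif elapsed < 0.5 * deadline_s:
            batch_size = min(max_batch, batch_size * 2)
```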


Lifelong Reinforcement Learning with Modulating Masks

arXiv.org Artificial Intelligence

Lifelong learning aims to create AI systems that continuously and incrementally learn during a lifetime, similar to biological learning. Attempts so far have encountered problems, including catastrophic forgetting, interference among tasks, and the inability to exploit previous knowledge. While considerable research has focused on learning multiple supervised classification tasks that involve changes in the input distribution, lifelong reinforcement learning (LRL) must deal with variations in the state and transition distributions, and in the reward functions. Modulating masks with a fixed backbone network, recently developed for classification, are particularly suitable for dealing with such a large spectrum of task variations. In this paper, we adapted modulating masks to work with deep LRL, specifically PPO and IMPALA agents. The comparison with LRL baselines in both discrete and continuous RL tasks shows superior performance. We further investigated the use of a linear combination of previously learned masks to exploit previous knowledge when learning new tasks: not only is learning faster, but the algorithm also solves tasks that we could not otherwise solve from scratch due to extremely sparse rewards. The results suggest that RL with modulating masks is a promising approach to lifelong learning, to the composition of knowledge for learning increasingly complex tasks, and to knowledge reuse for efficient and faster learning.
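
To make the mechanism concrete, here is a minimal PyTorch sketch of a modulating-mask layer: a frozen backbone weight gated elementwise by a per-task binary mask derived from learnable scores (a supermask-style construction). The layer sizes, initialization, and straight-through estimator are common choices, not necessarily those of the paper.

```python
import torch
import torch.nn as nn

class MaskedLinear(nn.Module):
    def __init__(self, in_features, out_features, num_tasks):
        super().__init__()
        # Fixed backbone: the weight itself is never trained.
        self.weight = nn.Parameter(torch.randn(out_features, in_features),
                                   requires_grad=False)
        # One learnable score tensor per task; scores determine the mask.
        self.scores = nn.ParameterList(
            [nn.Parameter(torch.randn(out_features, in_features))
             for _ in range(num_tasks)])

    def forward(self, x, task_id):
        s = self.scores[task_id]
        # Straight-through estimator: hard 0/1 mask in the forward pass,
        # sigmoid gradient in the backward pass.
        mask = (s > 0).float() + s.sigmoid() - s.sigmoid().detach()
        return nn.functional.linear(x, self.weight * mask)
```

A linear combination of previously learned masks, as investigated in the paper, would replace `s` with a weighted sum of stored score tensors whose combination coefficients are the only new parameters trained on the new task.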


Sharing Lifelong Reinforcement Learning Knowledge via Modulating Masks

arXiv.org Artificial Intelligence

Lifelong learning agents aim to learn multiple tasks sequentially over a lifetime. This involves the ability to exploit previous knowledge when learning new tasks and to avoid forgetting. Modulating masks, a specific type of parameter isolation approach, have recently shown promise in both supervised and reinforcement learning. While lifelong learning algorithms have been investigated mainly in a single-agent setting, the question of how multiple agents can share lifelong learning knowledge with each other remains open. We show that the parameter isolation mechanism used by modulating masks is particularly suitable for exchanging knowledge among agents in a distributed and decentralized system of lifelong learners. The key idea is that the isolation of specific task knowledge to specific masks allows agents to transfer only specific knowledge on demand, resulting in robust and effective distributed lifelong learning. We assume fully distributed and asynchronous scenarios with dynamic agent numbers and connectivity. An on-demand communication protocol ensures agents query their peers for specific masks to be transferred and integrated into their policies when facing each task. Experiments indicate that on-demand mask communication is an effective way to implement distributed lifelong reinforcement learning and provides a lifelong learning benefit with respect to distributed RL baselines such as DD-PPO, IMPALA, and PPO+EWC. The system is particularly robust to connection drops and demonstrates rapid learning due to knowledge exchange.
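
The on-demand protocol can be summarized in a few lines. The sketch below assumes a hypothetical `peer.query` RPC returning a similarity score and a mask; the abstract specifies the on-demand idea but not this interface, so every name here is an illustrative assumption.

```python
def request_mask(peers, task_embedding):
    """Query peers for the mask best matching the current task; the RPC,
    similarity measure, and offer format are all illustrative assumptions."""
    best = None
    for peer in peers:                       # asynchronous in practice
        offer = peer.query(task_embedding)   # hypothetical: (similarity, mask)
        if offer and (best is None or offer.similarity > best.similarity):
            best = offer
    # A received mask can seed the local policy (e.g., as initialization)
    # without overwriting knowledge isolated in the agent's other masks.
    return best.mask if best else None
```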


The configurable tree graph (CT-graph): measurable problems in partially observable and distal reward environments for lifelong reinforcement learning

arXiv.org Artificial Intelligence

Many real-world problems are characterized by a large number of observations, confounding and spurious correlations, partially observable states, and distal, dynamic rewards with hierarchical reward structures. Such conditions make it hard for both animals and machines to learn complex skills. The learning process requires discovering what is important and what can be ignored, how the reward function is structured, and how to reuse knowledge across different tasks that share common properties. For these reasons, the application of standard reinforcement learning (RL) algorithms (Sutton and Barto, 2018) to solve structured problems is often not effective. Limitations of current RL algorithms include the problem of exploration with sparse rewards (Pathak et al., 2017), dealing with partially observable Markov decision processes (POMDPs) (Ladosz et al., 2021), coping with large amounts of confounding stimuli (Thrun, 2000; Kim et al., 2019), and reusing skills for efficiently learning multiple tasks in a lifelong learning setting (Mendez and Eaton, 2020). Standard reinforcement learning algorithms are best suited to problems that can be formulated as single-task, fully observable Markov decision processes (MDPs). Under these assumptions, with complete observability and with static and frequent rewards, deep reinforcement learning (DRL) (Mnih et al., 2015; Li, 2017) has gained popularity due to its ability to learn an approximate Q-value function directly from raw pixel data on the Atari 2600 platform. These and similar algorithms stack multiple frames to derive states of an MDP, and use a basic ɛ-greedy exploration policy. In more complex cases with partial observability and sparse rewards, extensions have been proposed to include more advanced exploration techniques (Ladosz et al., 2022).
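
For reference, the ɛ-greedy policy mentioned above is the standard textbook rule sketched below; it is not tied to any specific CT-graph implementation.

```python
import random

def epsilon_greedy(q_values, epsilon=0.1):
    """With probability ɛ take a random action, otherwise act greedily."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])
```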


A Domain-Agnostic Approach for Characterization of Lifelong Learning Systems

arXiv.org Artificial Intelligence

Despite the advancement of machine learning techniques in recent years, state-of-the-art systems lack robustness to "real world" events, where the input distributions and tasks encountered by the deployed systems will not be limited to the original training context, and systems will instead need to adapt to novel distributions and tasks while deployed. This critical gap may be addressed through the development of "Lifelong Learning" systems that are capable of 1) Continuous Learning, 2) Transfer and Adaptation, and 3) Scalability. Unfortunately, efforts to improve these capabilities are typically treated as distinct areas of research that are assessed independently, without regard to the impact of each separate capability on other aspects of the system. We instead propose a holistic approach, using a suite of metrics and an evaluation framework to assess Lifelong Learning in a principled way that is agnostic to specific domains or system techniques. Through five case studies, we show that this suite of metrics can inform the development of varied and complex Lifelong Learning systems. We highlight how the proposed suite of metrics quantifies performance trade-offs present during Lifelong Learning system development: both the widely discussed Stability-Plasticity dilemma and the newly proposed relationship between Sample Efficient and Robust Learning. Further, we make recommendations for the formulation and use of metrics to guide the continuing development of Lifelong Learning systems and assess their progress in the future.
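
As an illustration of the kind of quantity such a framework computes, the sketch below derives forgetting and forward transfer from a performance matrix R, where R[i][j] is performance on task j after training on task i. These formulas are one common formulation in the lifelong learning literature, not necessarily the exact metrics of the proposed suite.

```python
import numpy as np

def forgetting(R):
    """Mean drop from each task's best past performance to its final one."""
    T = R.shape[0]
    return np.mean([R[:T - 1, j].max() - R[T - 1, j] for j in range(T - 1)])

def forward_transfer(R, scratch_baseline):
    """Mean gain on each task when learned in sequence versus from scratch."""
    T = R.shape[0]
    return np.mean([R[j, j] - scratch_baseline[j] for j in range(1, T)])
```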


Context Meta-Reinforcement Learning via Neuromodulation

arXiv.org Artificial Intelligence

Meta-reinforcement learning (meta-RL) algorithms enable agents to adapt quickly to tasks from few samples in dynamic environments. Such a feat is achieved through dynamic representations in an agent's policy network (obtained via reasoning about task context, model parameter updates, or both). However, obtaining rich dynamic representations for fast adaptation beyond simple benchmark problems is challenging due to the burden placed on the policy network to accommodate different policies. This paper addresses the challenge by introducing neuromodulation as a modular component that augments a standard policy network and regulates neuronal activities in order to produce efficient dynamic representations for task adaptation. The proposed extension to the policy network is evaluated across multiple discrete and continuous control environments of increasing complexity. To prove the generality and benefits of the extension in meta-RL, the neuromodulated network was applied to two state-of-the-art meta-RL algorithms (CAVIA and PEARL). The results demonstrate that meta-RL augmented with neuromodulation produces significantly better results and richer dynamic representations than the baselines.
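
The core idea, a modulatory subnetwork that regulates the activations of a standard policy layer, can be sketched as follows in PyTorch. Per-neuron sigmoid gating is an assumption for illustration; the paper's architecture may differ in detail.

```python
import torch
import torch.nn as nn

class NeuromodulatedLayer(nn.Module):
    def __init__(self, in_features, out_features):
        super().__init__()
        self.standard = nn.Linear(in_features, out_features)
        self.modulator = nn.Linear(in_features, out_features)  # one gate per neuron

    def forward(self, x):
        gate = torch.sigmoid(self.modulator(x))    # context-dependent gain
        return gate * torch.relu(self.standard(x)) # modulated activations
```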


Deep Reinforcement Learning with Modulated Hebbian plus Q Network Architecture

arXiv.org Machine Learning

This paper introduces the modulated Hebbian plus Q network architecture (MOHQA) for solving challenging deep reinforcement learning problems in partially observable Markov decision processes (POMDPs) with sparse rewards and confounding observations. The proposed architecture combines a deep Q-network (DQN) and a modulated Hebbian network with neural eligibility traces (MOHN). Bio-inspired neural traces are used to bridge temporal delays between actions and rewards. The purpose is to discover distal cause-effect relationships where confounding observations and sparse rewards cause standard RL algorithms to fail. Each of the two modules of the network (DQN and MOHN) is responsible for different aspects of learning. The DQN learns low-level features and control, while the MOHN contributes to high-level decisions by bridging rewards with past actions. The strength of the approach is to support a standard DQN framework when temporal difference errors are difficult to compute due to non-observable states. The system is tested on a set of generalized decision-making problems encoded as decision tree graphs that deliver delayed rewards after key decision points and confounding observations. The simulations show that the proposed approach helps solve problems that are currently challenging for state-of-the-art deep reinforcement learning algorithms.
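
The eligibility-trace mechanism at the heart of the MOHN module can be illustrated with a reward-modulated Hebbian update: co-activity accumulates in a decaying trace and is committed to the weights only when a (possibly delayed) reward arrives. The decay and learning-rate values below are illustrative, not the paper's.

```python
import numpy as np

def modulated_hebbian_update(W, trace, pre, post, reward, lr=0.01, decay=0.9):
    # Accumulate pre/post co-activity into a decaying eligibility trace.
    trace = decay * trace + np.outer(post, pre)
    # Commit the trace to the weights in proportion to the reward signal,
    # bridging the temporal gap between actions and distal rewards.
    W = W + lr * reward * trace
    return W, trace
```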


A Multi-Scale Mapping Approach Based on a Deep Learning CNN Model for Reconstructing High-Resolution Urban DEMs

arXiv.org Machine Learning

Abstract: The shortage of high-resolution urban digital elevation model (DEM) datasets has been a challenge for modelling urban floods and managing their risk. A solution is to develop effective approaches to reconstruct high-resolution DEMs from their low-resolution equivalents, which are more widely available. However, current high-resolution DEM reconstruction approaches mainly focus on natural topography. Few attempts have been made for urban topography, which is typically an integration of complex man-made and natural features. This study proposes a novel multi-scale mapping approach based on a convolutional neural network (CNN) to deal with the complex characteristics of urban topography and reconstruct high-resolution urban DEMs. The proposed multi-scale CNN model is first trained using urban DEMs that contain topographic features at different resolutions, and then used to reconstruct the urban DEM at a specified (high) resolution from a low-resolution equivalent. A two-level accuracy assessment approach is also designed to evaluate the performance of the proposed urban DEM reconstruction method, in terms of numerical accuracy and morphological accuracy. Compared with other commonly used methods, the current CNN-based approach produces superior results, providing a cost-effective innovative method to acquire high-resolution DEMs in other data-scarce environments.

Introduction: Digital elevation models (DEMs) have been widely used in many fields such as landform evolution, soil erosion modeling, and other geo-simulations (Bishop et al., 2012; Liu et al., 2015; Mondal et al., 2017; Li and Wong, 2010). In particular, DEMs provide indispensable data to support water resources management and flood risk assessment (Moore et al., 1991; O'Loughlin et al., 2016). In urban flood risk assessment, the availability of high-resolution urban DEMs is crucial for the accurate representation of complex urban topographic features and required for a reliable prediction of flood inundation to inform risk calculation (Ramirez et al., 2016; Leitão and de Sousa, 2018).
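
A minimal single-scale sketch of the kind of super-resolution CNN described above is given below (SRCNN-style); the layer sizes, bicubic pre-upsampling, and single elevation channel are assumptions, and the paper's multi-scale model is more elaborate.

```python
import torch.nn as nn

class DEMSuperResolution(nn.Module):
    def __init__(self, scale=4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Upsample(scale_factor=scale, mode='bicubic'),  # coarse upsampling
            nn.Conv2d(1, 64, kernel_size=9, padding=4), nn.ReLU(),
            nn.Conv2d(64, 32, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv2d(32, 1, kernel_size=5, padding=2))  # refined elevations

    def forward(self, low_res_dem):  # shape: (batch, 1, H, W)
        return self.net(low_res_dem)
```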


Born to Learn: the Inspiration, Progress, and Future of Evolved Plastic Artificial Neural Networks

arXiv.org Artificial Intelligence

Biological plastic neural networks are systems of extraordinary computational capabilities shaped by evolution, development, and lifetime learning. The interplay of these elements leads to the emergence of adaptive behavior and intelligence. Inspired by such intricate natural phenomena, Evolved Plastic Artificial Neural Networks (EPANNs) use simulated evolution in silico to breed plastic neural networks with a large variety of dynamics, architectures, and plasticity rules: these artificial systems are composed of inputs, outputs, and plastic components that change in response to experiences in an environment. These systems may autonomously discover novel adaptive algorithms and lead to hypotheses on the emergence of biological adaptation. EPANNs have seen considerable progress over the last two decades. Current scientific and technological advances in artificial neural networks are now setting the conditions for radically new approaches and results. In particular, the limitations of hand-designed networks could be overcome by more flexible and innovative solutions. This paper brings together a variety of inspiring ideas that define the field of EPANNs. The main methods and results are reviewed. Finally, new opportunities and developments are presented.