Mobile edge computing (MEC) is considered a novel paradigm for computation-intensive and delay-sensitive tasks in fifth generation (5G) networks and beyond. However, its uncertainty, referred to as dynamic and randomness, from the mobile device, wireless channel, and edge network sides, results in high-dimensional, nonconvex, nonlinear, and NP-hard optimization problems. Thanks to the evolved reinforcement learning (RL), upon iteratively interacting with the dynamic and random environment, its trained agent can intelligently obtain the optimal policy in MEC. Furthermore, its evolved versions, such as deep RL (DRL), can achieve higher convergence speed efficiency and learning accuracy based on the parametric approximation for the large-scale state-action space. This paper provides a comprehensive research review on RL-enabled MEC and offers insight for development in this area. More importantly, associated with free mobility, dynamic channels, and distributed services, the MEC challenges that can be solved by different kinds of RL algorithms are identified, followed by how they can be solved by RL solutions in diverse mobile applications. Finally, the open challenges are discussed to provide helpful guidance for future research in RL training and learning MEC.
While artificial-intelligence-based methods suffer from lack of transparency, rule-based methods dominate in safety-critical systems. Yet, the latter cannot compete with the first ones in robustness to multiple requirements, for instance, simultaneously addressing safety, comfort, and efficiency. Hence, to benefit from both methods they must be joined in a single system. This paper proposes a decision making and control framework, which profits from advantages of both the rule- and machine-learning-based techniques while compensating for their disadvantages. The proposed method embodies two controllers operating in parallel, called Safety and Learned. A rule-based switching logic selects one of the actions transmitted from both controllers. The Safety controller is prioritized every time, when the Learned one does not meet the safety constraint, and also directly participates in the safe Learned controller training. Decision making and control in autonomous driving is chosen as the system case study, where an autonomous vehicle learns a multi-task policy to safely cross an unprotected intersection. Multiple requirements (i.e., safety, efficiency, and comfort) are set for vehicle operation. A numerical simulation is performed for the proposed framework validation, where its ability to satisfy the requirements and robustness to changing environment is successfully demonstrated.
Opponent modeling is the ability to use prior knowledge and observations in order to predict the behavior of an opponent. This survey presents a comprehensive overview of existing opponent modeling techniques for adversarial domains, many of which must address stochastic, continuous, or concurrent actions, and sparse, partially observable payoff structures. We discuss all the components of opponent modeling systems, including feature extraction, learning algorithms, and strategy abstractions. These discussions lead us to propose a new form of analysis for describing and predicting the evolution of game states over time. We then introduce a new framework that facilitates method comparison, analyze a representative selection of techniques using the proposed framework, and highlight common trends among recently proposed methods. Finally, we list several open problems and discuss future research directions inspired by AI research on opponent modeling and related research in other disciplines.
A common approach to prediction and planning in partially observable domains is to use recurrent neural networks (RNNs), which ideally develop and maintain a latent memory about hidden, task-relevant factors. We hypothesize that many of these hidden factors in the physical world are constant over time, changing only sparsely. To study this hypothesis, we propose Gated $L_0$ Regularized Dynamics (GateL0RD), a novel recurrent architecture that incorporates the inductive bias to maintain stable, sparsely changing latent states. The bias is implemented by means of a novel internal gating function and a penalty on the $L_0$ norm of latent state changes. We demonstrate that GateL0RD can compete with or outperform state-of-the-art RNNs in a variety of partially observable prediction and control tasks. GateL0RD tends to encode the underlying generative factors of the environment, ignores spurious temporal dependencies, and generalizes better, improving sampling efficiency and overall performance in model-based planning and reinforcement learning tasks. Moreover, we show that the developing latent states can be easily interpreted, which is a step towards better explainability in RNNs.
Affordance refers to the perception of possible actions allowed by an object. Despite its relevance to human-computer interaction, no existing theory explains the mechanisms that underpin affordance-formation; that is, how affordances are discovered and adapted via interaction. We propose an integrative theory of affordance-formation based on the theory of reinforcement learning in cognitive sciences. The key assumption is that users learn to associate promising motor actions to percepts via experience when reinforcement signals (success/failure) are present. They also learn to categorize actions (e.g., "rotating" a dial), giving them the ability to name and reason about affordance. Upon encountering novel widgets, their ability to generalize these actions determines their ability to perceive affordances. We implement this theory in a virtual robot model, which demonstrates human-like adaptation of affordance in interactive widgets tasks. While its predictions align with trends in human data, humans are able to adapt affordances faster, suggesting the existence of additional mechanisms.
Artificial intelligence (AI) has become a part of everyday conversation and our lives. It is considered as the new electricity that is revolutionizing the world. AI is heavily invested in both industry and academy. However, there is also a lot of hype in the current AI debate. AI based on so-called deep learning has achieved impressive results in many problems, but its limits are already visible. AI has been under research since the 1940s, and the industry has seen many ups and downs due to over-expectations and related disappointments that have followed. The purpose of this book is to give a realistic picture of AI, its history, its potential and limitations. We believe that AI is a helper, not a ruler of humans. We begin by describing what AI is and how it has evolved over the decades. After fundamentals, we explain the importance of massive data for the current mainstream of artificial intelligence. The most common representations for AI, methods, and machine learning are covered. In addition, the main application areas are introduced. Computer vision has been central to the development of AI. The book provides a general introduction to computer vision, and includes an exposure to the results and applications of our own research. Emotions are central to human intelligence, but little use has been made in AI. We present the basics of emotional intelligence and our own research on the topic. We discuss super-intelligence that transcends human understanding, explaining why such achievement seems impossible on the basis of present knowledge,and how AI could be improved. Finally, a summary is made of the current state of AI and what to do in the future. In the appendix, we look at the development of AI education, especially from the perspective of contents at our own university.
Programming intelligent control strategies for complex robot systems is a challenging task. Reinforcement learning (RL) promises the automated synthesis of control strategies through a data-driven approach instead of explicitly designing hand-crafted solutions through expert knowledge. In recent years, the field of RL has witnessed outstanding successes and raised a surge of interest in the control of dynamical systems through such a trial-and-error paradigm. The combination of RL and deep learning methods excel at problems that can be quickly simulated like robotics [13, 22] or video games [18, 29] and in domains where the exact model is known but long-horizon planning is not computationally tractable, e.g. board games like Go and Chess . Despite the significant advances in recent years, the applicability of RL algorithms is still limited when the data at test time differ from those seen during training . Since many real-world systems cannot afford to learn policies from scratch due to the expense of data, simulations are the preferred approach to build data-driven control policies in the RL community. In fact, a gap between the simulation and the real world still persists because the modeling of all effects either requires in-depth expert knowledge or is simply not desirable, e.g.
This work investigates the effects of Curriculum Learning (CL)-based approaches on the agent's performance. In particular, we focus on the safety aspect of robotic mapless navigation, comparing over a standard end-to-end (E2E) training strategy. To this end, we present a CL approach that leverages Transfer of Learning (ToL) and fine-tuning in a Unity-based simulation with the Robotnik Kairos as a robotic agent. For a fair comparison, our evaluation considers an equal computational demand for every learning approach (i.e., the same number of interactions and difficulty of the environments) and confirms that our CL-based method that uses ToL outperforms the E2E methodology. In particular, we improve the average success rate and the safety of the trained policy, resulting in 10% fewer collisions in unseen testing scenarios. To further confirm these results, we employ a formal verification tool to quantify the number of correct behaviors of Reinforcement Learning policies over desired specifications.
In the autonomous driving field, the fusion of human knowledge into Deep Reinforcement Learning (DRL) is often based on the human demonstration recorded in the simulated environment. This limits the generalization and the feasibility of application in real-world traffic. We proposed a two-stage DRL method, that learns from real-world human driving to achieve performance that is superior to the pure DRL agent. Training a DRL agent is done within a framework for CARLA with Robot Operating System (ROS). For evaluation, we designed different real-world driving scenarios to compare the proposed two-stage DRL agent with the pure DRL agent. After extracting the 'good' behavior from the human driver, such as anticipation in a signalized intersection, the agent becomes more efficient and drives safer, which makes this autonomous agent more adapt to Human-Robot Interaction (HRI) traffic.
--In the trial-and-error mechanism of reinforcement learning (RL), a notorious contradiction arises when we expect to learn a safe policy: how to learn a safe policy without enough data and prior model about the dangerous region? Existing methods mostly use the posterior penalty for dangerous actions, which means that the agent is not penalized until experiencing danger . This fact causes that the agent cannot learn a zero-violation policy even after convergence . Otherwise, it would not receive any penalty and lose the knowledge about danger . In this paper, we propose the safe set actor-critic (SSAC) algorithm, which confines the policy update using safety-oriented energy functions, or the safety indexes . The safety index is designed to increase rapidly for potentially dangerous actions, which allow us to locate the safe set on the action space, or the control safe set . Therefore, we can identify the dangerous actions prior to taking them, and further obtain a zero constraint-violation policy after convergence. We claim that we can learn the energy function in a model-free manner similar to learning a value function. By using the energy function transition as the constraint objective, we formulate a constrained RL problem. We prove that our Lagrangian-based solutions make sure that the learned policy will converge to the constrained optimum under some assumptions. The proposed algorithm is evaluated on both the complex simulation environments and a hardware-in-loop (HIL) experiment with a real controller from the autonomous vehicle. Experimental results suggest that the converged policy in all environments achieve zero constraint violation and comparable performance with model-based baseline. EINFORCEMENT learning has drawn rapidly growing attention for its superhuman learning capabilities in many sequential decision making problems like Go , Atari Games , and Starcraft .