Amiri, Saeid (Cleveland State University) | Wei, Suhua (Cleveland State University) | Zhang, Shiqi (Cleveland State University) | Sinapov, Jivko (Tufts University) | Thomason, Jesse (The University of Texas at Austin) | Stone, Peter (The University of Texas at Austin)
Intelligent robots frequently need to explore the objects in their working environments. Modern sensors have enabled robots to learn object properties via perception of multiple modalities. However, object exploration in the real world poses a challenging trade-off between information gains and exploration action costs. Mixed observability Markov decision process (MOMDP) is a framework for planning under uncertainty, while accounting for both fully and partially observable components of the state. Robot perception frequently has to face such mixed observability. This work enables a robot equipped with an arm to dynamically construct query-oriented MOMDPs for object exploration. The robot’s behavioral policy is learned from two datasets collected using real robots. Our approach enables a robot to explore object properties in a way that is significantly faster while improving accuracies in comparison to existing methods that rely on hand-coded exploration strategies.
Andrew W. Moore School of Computer Science Carnegie-Mellon University Pittsburgh, PA 15213 Abstract Parti-game is a new algorithm for learning from delayed rewards in high dimensional real-valued state-spaces. In high dimensions it is essential that learning does not explore or plan over state space uniformly. Parti-game maintains a decision-tree partitioning of state-space and applies game-theory and computational geometry techniquesto efficiently and reactively concentrate high resolution onlyon critical areas. Many simulated problems have been tested, ranging from 2-dimensional to 9-dimensional state-spaces, including mazes, path planning, nonlinear dynamics, and uncurling snakerobots in restricted spaces. In all cases, a good solution is found in less than twenty trials and a few minutes. 1 REINFORCEMENT LEARNING Reinforcement learning [Samuel, 1959, Sutton, 1984, Watkins, 1989, Barto et al., 1991] is a promising method for control systems to program and improve themselves.
Urban Traffic Control (UTC) plays an essential role in Intelligent Transportation System (ITS) but remains difficult. Since model-based UTC methods may not accurately describe the complex nature of traffic dynamics in all situations, model-free data-driven UTC methods, especially reinforcement learning (RL) based UTC methods, received increasing interests in the last decade. However, existing DL approaches did not propose an efficient algorithm to solve the complicated multiple intersections control problems whose state-action spaces are vast. To solve this problem, we propose a Deep Reinforcement Learning (DRL) algorithm that combines several tricks to master an appropriate control strategy within an acceptable time. This new algorithm relaxes the fixed traffic demand pattern assumption and reduces human invention in parameter tuning. Simulation experiments have shown that our method outperforms traditional rule-based approaches and has the potential to handle more complex traffic problems in the real world.
In this paper, a MIMO simulated annealing SA based Q learning method is proposed to control a line follower robot. The conventional controller for these types of robots is the proportional P controller. Considering the unknown mechanical characteristics of the robot and uncertainties such as friction and slippery surfaces, system modeling and controller designing can be extremely challenging. The mathematical modeling for the robot is presented in this paper, and a simulator is designed based on this model. The basic Q learning methods are based pure exploitation and the epsilon-greedy methods, which help exploration, can harm the controller performance after learning completion by exploring nonoptimal actions. The simulated annealing based Q learning method tackles this drawback by decreasing the exploration rate when the learning increases. The simulation and experimental results are provided to evaluate the effectiveness of the proposed controller.
A desirable property in fault-tolerant controllers is adaptability to system changes as they evolve during systems operations. An adaptive controller does not require optimal control policies to be enumerated for possible faults. Instead it can approximate one in real-time. We present two adaptive fault-tolerant control schemes for a discrete time system based on hierarchical reinforcement learning. We compare their performance against a model predictive controller in presence of sensor noise and persistent faults. The controllers are tested on a fuel tank model of a C-130 plane. Our experiments demonstrate that reinforcement learning-based controllers perform more robustly than model predictive controllers under faults, partially observable system models, and varying sensor noise levels.