AITopics

2011.06231

Country: Asia > China > Beijing > Beijing (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.72)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.67)

#artificialintelligence, Nov-11-2020, 08:43:08 GMT

Algorithms in Reinforcement Learning

Q-learning is a simple reinforcement learning algorithm for learning optimal actions in an unknown environment. It is model-free: without a model of the environment, it can learn actions that are optimal over the long term. It distinguishes two policies, a target policy and a behavior policy. Tabular methods store policies and value functions exactly in tables. In Q-learning, the behavior policy (for example, epsilon-greedy) collects experience, while the action-value estimates of the greedy target policy are updated toward the optimal action-value function.
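A minimal sketch of tabular Q-learning may make the two policies concrete. The five-state chain environment (`chain_step`), the hyperparameter values, and all function names below are illustrative assumptions, not taken from the article. The behavior policy here is epsilon-greedy with random tie-breaking, and the target policy is the greedy one implied by the `max` in the update.

```python
import random


def chain_step(state, action, n_states=5):
    """Hypothetical toy environment: a chain of states 0..n_states-1.

    Action 1 moves right, action 0 moves left (staying put at state 0).
    Reaching the right end yields reward 1 and ends the episode.
    """
    if action == 1:
        next_state = min(state + 1, n_states - 1)
    else:
        next_state = max(state - 1, 0)
    done = next_state == n_states - 1
    return next_state, (1.0 if done else 0.0), done


def q_learning(step, n_states, n_actions, episodes=500, alpha=0.1,
               gamma=0.9, epsilon=0.1, max_steps=100, seed=0):
    """Tabular Q-learning with an epsilon-greedy behavior policy."""
    rng = random.Random(seed)
    q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            # Behavior policy: explore with probability epsilon,
            # otherwise act greedily (ties broken at random).
            if rng.random() < epsilon:
                action = rng.randrange(n_actions)
            else:
                best = max(q[state])
                action = rng.choice(
                    [a for a, v in enumerate(q[state]) if v == best])
            next_state, reward, done = step(state, action)
            # Target policy: greedy, expressed by the max over next actions.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


q_table = q_learning(chain_step, n_states=5, n_actions=2)
# Greedy action per state, read from the learned table.
greedy_policy = [max(range(2), key=lambda a: row[a]) for row in q_table]
```

No model of `chain_step` is used inside `q_learning`; the table is updated only from sampled transitions, which is what "model-free" refers to.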

algorithm, function approximation, monte carlo method, (11 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.33)

#artificialintelligence, Nov-11-2020, 06:55:04 GMT

Autonomous Particle Accelerators: Accelerate Smarter With Artificial Intelligence

At DESY's ARES accelerator, the research team wants to gain experience with autonomous operation. Particle accelerators are universal tools: they support production processes in industry and tumor therapy in hospitals, and they enable unique discoveries and insights in research. Growing demands on the stability and properties of particle beams make manual operation of these complex devices increasingly challenging and require the highest possible level of automation to support operators. A new project of DESY and KIT (Karlsruhe Institute of Technology) is now taking the first steps towards a fully autonomously operated accelerator. The cooperation "Autonomous Accelerator," which is supported by the Helmholtz Association and the two participating Helmholtz research centers within the framework of the Helmholtz Artificial Intelligence Cooperation Unit, brings "reinforcement learning" to the operation of two linear accelerators at DESY and KIT. Reinforcement learning involves measuring state values and adjusting control variables to determine their influence on each other, thus learning a control strategy that also takes into account its effects in the future.

accelerator, artificial intelligence, autonomous particle accelerator, (7 more...)

Country: Europe > Germany > Baden-Württemberg > Karlsruhe Region > Karlsruhe (0.26)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.50)

Danielczuk, Michael, Balakrishna, Ashwin, Brown, Daniel S., Devgon, Shivin, Goldberg, Ken

Exploratory Grasping: Asymptotically Optimal Algorithms for Grasping Challenging Polyhedral Objects

There has been significant recent work on data-driven algorithms for learning general-purpose grasping policies. However, these policies can consistently fail to grasp challenging objects which are significantly out of the distribution of objects in the training data or which have very few high-quality grasps. Motivated by such objects, we propose a novel problem setting, Exploratory Grasping, for efficiently discovering reliable grasps on an unknown polyhedral object via sequential grasping, releasing, and toppling. We formalize Exploratory Grasping as a Markov Decision Process, study the theoretical complexity of Exploratory Grasping in the context of reinforcement learning, and present an efficient bandit-style algorithm, Bandits for Online Rapid Grasp Exploration Strategy (BORGES), which leverages the structure of the problem to efficiently discover high-performing grasps for each stable pose of the object. BORGES can be used to complement any general-purpose grasping algorithm with any grasp modality (parallel-jaw, suction, multi-fingered, etc.) to learn policies for objects on which it exhibits persistent failures. Simulation experiments suggest that BORGES can significantly outperform both general-purpose grasping pipelines and two other online learning algorithms and achieves performance within 5% of the optimal policy within 1000 and 8000 timesteps on average across 46 challenging objects from the Dex-Net adversarial and EGAD! object datasets, respectively. Initial physical experiments suggest that BORGES can improve grasp success rate by 45% over a Dex-Net baseline with just 200 grasp attempts in the real world. See https://tinyurl.com/exp-grasping for supplementary material and videos.
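The idea of running a bandit over candidate grasps for each stable pose can be illustrated with a generic sketch. The code below is not BORGES: it uses plain UCB1 as a stand-in, and the class name `PerPoseUCB`, the `simulate` helper, the pose labels, and the success probabilities are all hypothetical. Drawing the pose at random each round is only a crude stand-in for toppling.

```python
import math
import random
from collections import defaultdict


class PerPoseUCB:
    """One UCB1 bandit per stable pose; arms are candidate grasps."""

    def __init__(self, n_grasps):
        self.n_grasps = n_grasps
        self.counts = defaultdict(lambda: [0] * n_grasps)
        self.successes = defaultdict(lambda: [0] * n_grasps)

    def select(self, pose):
        counts = self.counts[pose]
        # Try every grasp once before using confidence bounds.
        for grasp, count in enumerate(counts):
            if count == 0:
                return grasp
        total = sum(counts)

        def upper_bound(grasp):
            mean = self.successes[pose][grasp] / counts[grasp]
            bonus = math.sqrt(2.0 * math.log(total) / counts[grasp])
            return mean + bonus

        return max(range(self.n_grasps), key=upper_bound)

    def update(self, pose, grasp, success):
        self.counts[pose][grasp] += 1
        self.successes[pose][grasp] += int(success)


def simulate(success_prob, attempts=500, seed=0):
    """Run the bandit against hypothetical per-pose success probabilities."""
    rng = random.Random(seed)
    poses = list(success_prob)
    bandit = PerPoseUCB(n_grasps=len(success_prob[poses[0]]))
    for _ in range(attempts):
        pose = rng.choice(poses)  # stand-in for the pose after toppling
        grasp = bandit.select(pose)
        succeeded = rng.random() < success_prob[pose][grasp]
        bandit.update(pose, grasp, succeeded)
    return bandit


# Two hypothetical poses, two candidate grasps each.
bandit = simulate({"pose_a": [0.2, 0.8], "pose_b": [0.9, 0.1]})
```

Keeping separate statistics per pose reflects the point in the abstract that a grasp which works in one stable pose may fail in another.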

educational setting, stable pose, upstream oil & gas, (19 more...)

2011.05632

Country: North America > United States (0.28)

Genre: Research Report (1.00)

Industry:

Education > Educational Setting (0.34)
Energy > Oil & Gas > Upstream (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

Rafailidis, Dimitrios, Antaris, Stefanos

Adaptive Neural Architectures for Recommender Systems

Deep learning has proved an effective means to capture the non-linear associations of user preferences. However, the main drawback of existing deep learning architectures is that they follow a fixed recommendation strategy, ignoring users' real-time feedback. Recent advances in deep reinforcement strategies have shown that recommendation policies can be continuously updated while users interact with the system. In doing so, we can learn the optimal policy that fits users' preferences over the recommendation sessions. The main drawback of deep reinforcement strategies is that they are based on predefined and fixed neural architectures. To shed light on how to handle this issue, in this study we first present deep reinforcement learning strategies for recommendation and discuss the main limitations due to the fixed neural architectures. Then, we detail how recent advances in progressive neural architectures are used for consecutive tasks in other research domains. Finally, we present the key challenges in bridging the gap between deep reinforcement learning and adaptive neural architectures. We provide guidelines for searching for the best neural architecture based on each user's feedback via reinforcement learning, while considering the prediction performance on real-time recommendations and the model complexity.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

2012.00743

Country:

Asia > Myanmar > Tanintharyi Region > Dawei (0.05)
Europe > Netherlands > Limburg > Maastricht (0.05)
Europe > Sweden > Stockholm > Stockholm (0.04)

Genre: Research Report (0.84)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Systems & Languages > Problem-Independent Architectures (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

pymgrid: An Open-Source Python Microgrid Simulator for Applied Artificial Intelligence Research

Henri, Gonzague, Levent, Tanguy, Halev, Avishai, Alami, Reda, Cordier, Philippe

Microgrids, self-contained electrical grids that are capable of disconnecting from the main grid, hold potential both for climate change mitigation, by reducing CO2 emissions, and for adaptation, by increasing infrastructure resiliency. Due to their distributed nature, microgrids are often idiosyncratic; as a result, control of these systems is nontrivial. While microgrid simulators exist, many are limited in scope and in the variety of microgrids they can simulate. We propose pymgrid, an open-source Python package to generate and simulate a large number of microgrids, and the first open-source tool that can generate more than 600 different microgrids. pymgrid abstracts most of the domain expertise, allowing users to focus on control algorithms. In particular, pymgrid is built to be a reinforcement learning (RL) platform, and includes the ability to model microgrids as Markov decision processes. pymgrid also introduces two pre-computed lists of microgrids, intended to allow for research reproducibility in the microgrid setting.

algorithm, microgrid, pymgrid, (9 more...)

2011.08004

Country:

North America > United States > California > Yolo County > Davis (0.14)
Europe > France (0.05)
North America > United States > Texas > Harris County > Houston (0.04)

Genre: Research Report (0.65)

Industry: Energy > Power Industry > Utilities (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.76)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)

Xu, Tengyu, Liang, Yingbin, Lan, Guanghui

A Primal Approach to Constrained Policy Optimization: Global Optimality and Finite-Time Analysis

arXiv.org Machine Learning, Nov-11-2020

Safe reinforcement learning (SRL) problems are typically modeled as a constrained Markov Decision Process (CMDP), in which an agent explores the environment to maximize the expected total reward while avoiding violations of constraints on a number of expected total costs. In general, such SRL problems have nonconvex objective functions subject to multiple nonconvex constraints, and hence are very challenging to solve, particularly when a globally optimal policy is required. Many popular SRL algorithms adopt a primal-dual structure, which updates dual variables to satisfy the constraints. In contrast, we propose a primal approach, called constraint-rectified policy optimization (CRPO), which updates the policy alternatingly between objective improvement and constraint satisfaction. CRPO provides a primal-type algorithmic framework to solve SRL problems, where each policy update can take any variant of policy optimization step. To demonstrate the theoretical performance of CRPO, we adopt natural policy gradient (NPG) for each policy update step and show that CRPO achieves an $\mathcal{O}(1/\sqrt{T})$ convergence rate to the global optimal policy in the constrained policy set and an $\mathcal{O}(1/\sqrt{T})$ error bound on constraint satisfaction. This is the first finite-time analysis of SRL algorithms with a global optimality guarantee. Our empirical results demonstrate that CRPO can significantly outperform the existing primal-dual baseline algorithms.
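The alternation between objective improvement and constraint satisfaction can be sketched on a toy problem. This is an illustration under stated assumptions, not the authors' implementation: it uses plain gradient steps on a scalar parameter instead of natural policy gradient on a policy, it assumes the switch is decided by comparing the cost against its limit plus a tolerance, and the function name `crpo_style_sketch` and the toy objective are hypothetical.

```python
def crpo_style_sketch(grad_reward, cost, grad_cost, limit,
                      theta=0.0, lr=0.05, tol=0.01, steps=400):
    """Primal-only alternation on a scalar parameter (toy illustration).

    If the constraint holds within a tolerance, take a step that improves
    the objective; otherwise take a step that reduces the violated cost.
    No dual variable is maintained.
    """
    for _ in range(steps):
        if cost(theta) <= limit + tol:
            theta += lr * grad_reward(theta)  # objective improvement
        else:
            theta -= lr * grad_cost(theta)    # constraint rectification
    return theta


# Hypothetical toy problem: maximize theta subject to theta**2 <= 1.
theta_final = crpo_style_sketch(
    grad_reward=lambda theta: 1.0,
    cost=lambda theta: theta ** 2,
    grad_cost=lambda theta: 2.0 * theta,
    limit=1.0,
)
```

On this toy problem the iterate climbs until the constraint is violated, is pushed back, and then hovers near the constrained optimum at `theta = 1`.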

algorithm, constraint, probability, (11 more...)

2011.05869

Country:

North America > United States > Ohio (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.34)

Industry: Leisure & Entertainment > Games (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

Offline Learning of Counterfactual Perception as Prediction for Real-World Robotic Reinforcement Learning

Jin, Jun, Graves, Daniel, Haigh, Cameron, Luo, Jun, Jagersand, Martin

We propose a method for offline learning of counterfactual predictions to address real-world robotic reinforcement learning challenges. The proposed method encodes action-oriented visual observations as several "what if" questions learned offline from prior experience using reinforcement learning methods. These "what if" questions counterfactually predict how action-conditioned observations would evolve on multiple temporal scales if the agent were to stick to its current action. We show that combining these offline counterfactual predictions with online in-situ observations (e.g. force feedback) allows efficient policy learning with only a sparse terminal (success/failure) reward. We argue that the learned predictions form an effective representation of the visual task, and guide the online exploration towards high-potential success interactions (e.g. contact-rich regions). Experiments were conducted in both simulation and real-world scenarios for evaluation. Our results demonstrate that it is practical to train a reinforcement learning agent to perform real-world fine manipulation in about half a day, without hand-engineered perception systems or calibrated instrumentation. Recordings of the real robot training can be found via https://sites.google.com/view/realrl.

counterfactual prediction, reinforcement, representation, (13 more...)

2011.05857

Country: North America > Canada > Alberta > Census Division No. 11 > Edmonton Metropolitan Region > Edmonton (0.04)

Genre: Research Report > New Finding (1.00)

Industry:

Leisure & Entertainment (0.46)
Education > Educational Setting > Online (0.33)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Aumjaud, Pierre, McAuliffe, David, Lera, Francisco Javier Rodríguez, Cardiff, Philip

Reinforcement Learning Experiments and Benchmark for Solving Robotic Reaching Tasks

Reinforcement learning has shown great promise in robotics thanks to its ability to develop efficient robotic control procedures through self-training. In particular, reinforcement learning has been successfully applied to solving the reaching task with robotic arms. In this paper, we define a robust, reproducible and systematic experimental procedure to compare the performance of various model-free algorithms at solving this task. The policies are trained in simulation and are then transferred to a physical robotic manipulator. It is shown that augmenting the reward signal with the Hindsight Experience Replay exploration technique increases the average return of off-policy agents by a factor of 7 to 9 when the target position is initialised randomly at the beginning of each episode.
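The core step of Hindsight Experience Replay, relabeling a failed episode with a goal that was actually reached, can be sketched as follows. This is a generic illustration using the "final state" relabeling strategy with a sparse 0/-1 reward; the `Transition` layout, the function names, and the one-dimensional example episode are assumptions and do not reproduce the paper's experimental setup.

```python
from typing import List, NamedTuple, Tuple


class Transition(NamedTuple):
    state: Tuple[float, ...]
    action: int
    next_state: Tuple[float, ...]
    goal: Tuple[float, ...]
    reward: float


def sparse_reward(achieved, goal, tol=1e-6):
    """0 when the achieved position matches the goal, -1 otherwise."""
    reached = all(abs(a - g) <= tol for a, g in zip(achieved, goal))
    return 0.0 if reached else -1.0


def relabel_with_final_goal(episode: List[Transition]) -> List[Transition]:
    """Replace the goal of every transition with the episode's final state.

    The relabeled copies are stored in the replay buffer alongside the
    originals, so even an episode that missed its goal yields a transition
    with a non-negative reward.
    """
    new_goal = episode[-1].next_state
    return [
        t._replace(goal=new_goal, reward=sparse_reward(t.next_state, new_goal))
        for t in episode
    ]


# Hypothetical 1-D reaching episode that never reaches its goal at 5.0.
episode = [
    Transition((0.0,), 1, (1.0,), (5.0,), -1.0),
    Transition((1.0,), 1, (2.0,), (5.0,), -1.0),
]
relabeled = relabel_with_final_goal(episode)
```

Because the relabeled transitions were generated under a different goal than the one they are trained on, this technique is paired with off-policy agents, consistent with the abstract.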

algorithm, international conference, reinforcement learning, (14 more...)

doi: 10.1007/978-3-030-62579-5_22

2011.05782

Country:

Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
Europe > Spain > Castile and León > León Province > León (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.43)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Fryen, Thilo, Eppe, Manfred, Nguyen, Phuong D. H., Gerkmann, Timo, Wermter, Stefan

Reinforcement Learning with Time-dependent Goals for Robotic Musicians

Reinforcement learning is a promising method to accomplish robotic control tasks. The task of playing musical instruments is, however, largely unexplored because it involves the challenge of achieving sequential goals (melodies) that have a temporal dimension. In this paper, we address robotic musicianship by introducing a temporal extension to goal-conditioned reinforcement learning: time-dependent goals. We demonstrate that these can be used to train a robotic musician to play the theremin. We train the robotic agent in simulation and transfer the acquired policy to a real-world robotic thereminist. Supplemental video: https://youtu.be/jvC9mPzdQN4

agent, reinforcement, robot, (15 more...)

2011.05715

Country: Europe > Germany (0.04)

Genre: Research Report (1.00)

Industry:

Media > Music (1.00)
Leisure & Entertainment (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)