AITopics

Shah, Syed Naveed Hussain (Microsoft Corporation) | Hougen, Dean Frederick (University of Oklahoma)

Stochastic Reinforcement Learning for Continuous Actions in Dynamic Environments

AAAI Conferences, May-16-2020

Reinforcement learning (RL) agents use trial and error to learn action policies for environment states. Environments with continuous action spaces are far more challenging for RL than those with discrete actions because there are infinitely many possible continuous action values from which to choose. Dynamic environments create additional challenges for RL agents, which must adjust rapidly to changes. We recently introduced REINFORCE SUN, a superclass of REINFORCE with Gaussian units, that allows for stochasticity at different levels of granularity in artificial neural networks (synapse, unit, or network), and have shown that moving stochasticity to synapses greatly aids RL in both static and dynamic environments with continuous action spaces. However, we also found that performance in dynamic environments remained substantially lower than desired. To rectify this, we here consider alternative parameter update equations for learning in dynamic environments. These equations form the core of Stochastic Synapse Reinforcement Learning (SSRL), which we here generalize to create S*RL, a superclass of SSRL that allows for stochasticity at these levels. Empirical results using multi-dimensional robot inverse kinematic data sets show that S*RL update equations greatly outperform traditional REINFORCE equations in dynamic, continuous state and action spaces.
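As a point of reference for the baseline named in the abstract, the following is a minimal sketch of the classic REINFORCE update for a single Gaussian unit. It is not the S*RL or SSRL update, which the abstract does not state. The function names, the learning rate, the baseline, and the floor on sigma are illustrative assumptions.

```python
def gaussian_eligibility(action, mu, sigma):
    """Gradients of ln N(action; mu, sigma) with respect to mu and sigma."""
    d_mu = (action - mu) / sigma ** 2
    d_sigma = ((action - mu) ** 2 - sigma ** 2) / sigma ** 3
    return d_mu, d_sigma


def reinforce_update(mu, sigma, action, reward, baseline=0.0, lr=0.1,
                     min_sigma=1e-3):
    """One REINFORCE step: parameter += lr * (reward - baseline) * eligibility."""
    d_mu, d_sigma = gaussian_eligibility(action, mu, sigma)
    advantage = reward - baseline
    new_mu = mu + lr * advantage * d_mu
    # min_sigma is an assumed floor that keeps the unit stochastic.
    new_sigma = max(min_sigma, sigma + lr * advantage * d_sigma)
    return new_mu, new_sigma
```

For example, with mu = 0, sigma = 1, a sampled action of 1, and a reward of 1, `reinforce_update(0.0, 1.0, action=1.0, reward=1.0)` moves the mean toward the rewarded action and leaves sigma unchanged, because the action lies exactly one standard deviation from the mean.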

artificial intelligence, machine learning, stochastic reinforcement learning, (2 more...)

The Thirty-Third International Flairs Conference

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Vaidyanath, Skanda (Birla Institute of Technology and Science, Pilani-Hyderabad Campus) | Georgila, Kallirroi (University of Southern California) | Traum, David (University of Southern California)

Using Reinforcement Learning to Manage Communications Between Humans and Artificial Agents in an Evacuation Scenario

AAAI Conferences, May-16-2020

In search and rescue missions, robots can potentially help save survivors faster than human emergency responders alone would. In our experimental virtual reality simulation environment we have a system that comprises a swarm of unmanned aerial vehicles (UAVs) and a virtual "spokesperson". The system and a human operator work together on locating and guiding survivors to safety away from an active wildfire encroaching on a small town. The UAVs and the spokesperson are equipped with natural language capabilities through which they can communicate with the survivors to convince them to evacuate. If they fail to do so, they can ask the human operator to intervene. We use reinforcement learning to automatically learn a policy to be followed when a UAV has located survivors. The system learns the best course of action to help the survivors evacuate, i.e., warn them through the UAV or the spokesperson, ask the human operator to intervene if needed, guide them to safety via their preferred method of transportation, or just wait for more information. We vary the distance of the fire, the level of cooperativeness of the survivors, and how busy the human operator is, and we report results in terms of percentage of survivors saved in each condition.

artificial intelligence, machine learning, reinforcement learning, (4 more...)

The Thirty-Third International Flairs Conference

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.60)
Information Technology > Human Computer Interaction > Interfaces > Virtual Reality (0.53)
Information Technology > Artificial Intelligence > Robots (0.53)

#artificialintelligence, May-15-2020, 04:03:04 GMT

Artificial Intelligence Can Devise An Optimal Tax Policy To Reduce Inequality -- AI Daily - Artificial Intelligence News

Identifying the optimal level of taxation is quite complex. Human behaviour is highly unpredictable and gathering data can be time-consuming. Despite decades of economic research being put into finding the optimal tax rate, it remains an open problem. But scientists at the US business technology company Salesforce believe they may have found the key to solving the problem – Artificial Intelligence. The team has developed an AI system called the AI Economist, which uses reinforcement learning technology to identify the optimal level of taxation to reduce inequality.

artificial intelligence, machine learning, reinforcement learning, (11 more...)

Industry:

Law > Taxation Law (0.80)
Government > Tax (0.80)
Information Technology (0.80)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.40)

Lynch, Corey, Sermanet, Pierre

Grounding Language in Play

Natural language is perhaps the most versatile and intuitive way for humans to communicate tasks to a robot. Prior work on Learning from Play (LfP) [Lynch et al., 2019] provides a simple approach for learning a wide variety of robotic behaviors from general sensors. However, each task must be specified with a goal image, something that is not practical in open-world environments. In this work we present a simple and scalable way to condition policies on human language instead. We extend LfP by pairing short robot experiences from play with relevant human language after the fact. To make this efficient, we introduce multicontext imitation, which allows us to train a single agent to follow image or language goals, then use just language conditioning at test time. This reduces the cost of language pairing to less than 1% of collected robot experience, with the majority of control still learned via self-supervised imitation. At test time, a single agent trained in this manner can perform many different robotic manipulation skills in a row in a 3D environment, directly from images, and specified only with natural language (e.g. "open the drawer...now pick up the block...now press the green button..."). Finally, we introduce a simple technique that transfers knowledge from large unlabeled text corpora to robotic learning. We find that transfer significantly improves downstream robotic manipulation. It also allows our agent to follow thousands of novel instructions at test time in zero shot, in 16 different languages. See videos of our experiments at language-play.github.io

artificial intelligence, machine learning, reinforcement learning, (16 more...)

2005.07648

Country: North America > United States > New York (0.04)

Genre: Research Report (0.82)

Industry: Energy (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

Abdolmaleki, Abbas, Huang, Sandy H., Hasenclever, Leonard, Neunert, Michael, Song, H. Francis, Zambelli, Martina, Martins, Murilo F., Heess, Nicolas, Hadsell, Raia, Riedmiller, Martin

A Distributional View on Multi-Objective Policy Optimization

Many real-world problems require trading off multiple competing objectives. However, these objectives are often in different units and/or scales, which can make it challenging for practitioners to express numerical preferences over objectives in their native units. In this paper we propose a novel algorithm for multi-objective reinforcement learning that enables setting desired preferences for objectives in a scale-invariant way. We propose to learn an action distribution for each objective, and we use supervised learning to fit a parametric policy to a combination of these distributions. We demonstrate the effectiveness of our approach on challenging high-dimensional real and simulated robotics tasks, and show that setting different preferences in our framework allows us to trace out the space of nondominated solutions.
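To make the term "nondominated solutions" concrete, here is a small sketch of a Pareto filter over objective vectors. It only illustrates the notion of dominance and is not the algorithm proposed in the paper. The function names are hypothetical, and every objective is assumed to be maximized.

```python
def dominates(a, b):
    """True if a is no worse than b on every objective and strictly better on one."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))


def nondominated(points):
    """Return the points that no other point dominates, in input order."""
    return [p for p in points if not any(dominates(q, p) for q in points)]
```

Given the two-objective returns `[(1, 5), (2, 4), (3, 3), (2, 2), (1, 1)]`, the filter keeps `(1, 5)`, `(2, 4)`, and `(3, 3)`: each trades one objective against the other, while `(2, 2)` and `(1, 1)` are beaten on both objectives by at least one other point.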

artificial intelligence, machine learning, reinforcement learning, (15 more...)

2005.07513

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)

Hertweck, Tim, Riedmiller, Martin, Bloesch, Michael, Springenberg, Jost Tobias, Siegel, Noah, Wulfmeier, Markus, Hafner, Roland, Heess, Nicolas

Simple Sensor Intentions for Exploration

Modern reinforcement learning algorithms can learn solutions to increasingly difficult control problems while at the same time reducing the amount of prior knowledge needed for their application. One of the remaining challenges is the definition of reward schemes that appropriately facilitate exploration without biasing the solution in undesirable ways, and that can be implemented on real robotic systems without expensive instrumentation. In this paper we focus on a setting in which goal tasks are defined via simple sparse rewards, and exploration is facilitated via agent-internal auxiliary tasks. We introduce the idea of simple sensor intentions (SSIs) as a generic way to define auxiliary tasks. SSIs reduce the amount of prior knowledge that is required to define suitable rewards. They can further be computed directly from raw sensor streams and thus do not require expensive and possibly brittle state estimation on real systems. We demonstrate that a learning system based on these rewards can solve complex robotic tasks in simulation and in real world settings. In particular, we show that a real robotic arm can learn to grasp and lift and solve a Ball-in-a-Cup task from scratch, when only raw sensor streams are used for both controller input and in the auxiliary reward definition.

experiment, machine learning, reinforcement learning, (18 more...)

2005.07541

Country:

Europe > Germany > Baden-Württemberg > Freiburg (0.04)
North America > United States > Texas > Travis County > Austin (0.04)
North America > United States > California > Santa Clara County > Mountain View (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.68)

Cha, Han, Park, Jihong, Kim, Hyesung, Bennis, Mehdi, Kim, Seong-Lyun

Proxy Experience Replay: Federated Distillation for Distributed Reinforcement Learning

Traditional distributed deep reinforcement learning (RL) commonly relies on exchanging the experience replay memory (RM) of each agent. Since the RM contains all state observations and action policy history, it may incur huge communication overhead while violating the privacy of each agent. Alternatively, this article presents a communication-efficient and privacy-preserving distributed RL framework, coined federated reinforcement distillation (FRD). In FRD, each agent exchanges its proxy experience replay memory (ProxRM), in which policies are locally averaged with respect to proxy states clustering actual states. To provide FRD design insights, we present ablation studies on the impact of ProxRM structures, neural network architectures, and communication intervals. Furthermore, we propose an improved version of FRD, coined mixup augmented FRD (MixFRD), in which ProxRM is interpolated using the mixup data augmentation algorithm. Simulations in a Cartpole environment validate the effectiveness of MixFRD in reducing the variance of mission completion time and communication cost, compared to the benchmark schemes, vanilla FRD, federated reinforcement learning (FRL), and policy distillation (PD).
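The mixup interpolation mentioned in the abstract can be sketched as a convex combination of two memory entries. The sketch assumes each ProxRM entry is a (proxy state, averaged policy) pair of equal-length lists; how MixFRD selects pairs and sets the Beta parameter is not given in the abstract, so `alpha` and both function names are hypothetical.

```python
import random


def mixup(entry_a, entry_b, lam):
    """Convexly combine two (state, policy) pairs with weight lam in [0, 1]."""
    (state_a, policy_a), (state_b, policy_b) = entry_a, entry_b

    def mix(u, v):
        return [lam * x + (1.0 - lam) * y for x, y in zip(u, v)]

    return mix(state_a, state_b), mix(policy_a, policy_b)


def sample_lambda(alpha=1.0, rng=random):
    """Draw the mixing weight from Beta(alpha, alpha), as in standard mixup."""
    return rng.betavariate(alpha, alpha)
```

Because both policies are probability vectors and the weights sum to one, the interpolated policy is again a valid probability vector.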

artificial intelligence, machine learning, reinforcement learning, (15 more...)

2005.06105

Country:

Europe > Finland > Northern Ostrobothnia > Oulu (0.06)
Asia > South Korea > Seoul > Seoul (0.06)
Europe > Sweden > Stockholm > Stockholm (0.05)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles > Drones (0.46)

Moerland, Thomas M., Deichler, Anna, Baldi, Simone, Broekens, Joost, Jonker, Catholijn M.

Think Too Fast Nor Too Slow: The Computational Trade-off Between Planning And Reinforcement Learning

Planning and reinforcement learning are two key approaches to sequential decision making. Multi-step approximate real-time dynamic programming, a recently successful algorithm class of which AlphaZero [Silver et al., 2018] is an example, combines both by nesting planning within a learning loop. However, the combination of planning and learning introduces a new question: how should we balance time spent on planning, learning and acting? The importance of this trade-off has not been explicitly studied before. We show that it is actually of key importance, with computational results indicating that we should neither plan too long nor too short. Conceptually, we identify a new spectrum of planning-learning algorithms which ranges from exhaustive search (long planning) to model-free RL (no planning), with optimal performance achieved midway.

budget, machine learning, reinforcement learning, (17 more...)

2005.07404

Country:

Europe > Netherlands > South Holland > Delft (0.05)
Europe > Netherlands > South Holland > Leiden (0.04)
Asia > Malaysia (0.04)
Asia > China (0.04)

Genre: Research Report (0.64)

Industry: Leisure & Entertainment > Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Miralles-Pechuán, Luis, Jiménez, Fernando, Ponce, Hiram, Martínez-Villaseñor, Lourdes

A Deep Q-learning/genetic Algorithms Based Novel Methodology For Optimizing Covid-19 Pandemic Government Actions

arXiv.org Machine Learning, May-15-2020

Whenever countries are threatened by a pandemic, as is the case with the COVID-19 virus, governments should take the right actions to safeguard public health as well as to mitigate the negative effects on the economy. In this regard, there are two completely different approaches governments can take: a restrictive one, in which drastic measures such as self-isolation can seriously damage the economy, and a more liberal one, where more relaxed restrictions may put at risk a high percentage of the population. The optimal approach could be somewhere in between, and, in order to make the right decisions, it is necessary to accurately estimate the future effects of taking one measure or another. In this paper, we use the SEIR epidemiological model (Susceptible - Exposed - Infected - Recovered) for infectious diseases to represent the evolution of the COVID-19 virus over time in the population. To optimize the sequences of actions governments can take, we propose a methodology with two approaches, one based on Deep Q-Learning and another one based on Genetic Algorithms. The sequences of actions (confinement, self-isolation, two-meter distance or no restrictions) are evaluated according to a reward system focused on meeting two objectives: firstly, getting few people infected so that hospitals are not overwhelmed with critical patients, and secondly, avoiding taking drastic measures for too long, which can potentially cause serious damage to the economy. The conducted experiments prove that our methodology is a valid tool to discover actions governments can take to reduce the negative effects of a pandemic in both senses. We also prove that the approach based on Deep Q-Learning outperforms the one based on Genetic Algorithms for optimizing the sequences of actions.
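For readers unfamiliar with the SEIR model named in the abstract, a minimal forward-Euler sketch of its standard compartment dynamics follows. The parameter values, the step size, and the `contact_scale` knob (a hypothetical stand-in for how a government action might reduce transmission) are assumptions for illustration and are not taken from the paper.

```python
def seir_step(s, e, i, r, beta, sigma, gamma, dt=1.0, contact_scale=1.0):
    """Advance the SEIR compartments by one forward-Euler step of length dt."""
    n = s + e + i + r
    new_exposed = contact_scale * beta * s * i / n * dt  # S -> E
    new_infectious = sigma * e * dt                      # E -> I
    new_recovered = gamma * i * dt                       # I -> R
    return (s - new_exposed,
            e + new_exposed - new_infectious,
            i + new_infectious - new_recovered,
            r + new_recovered)


def simulate(state, steps, **params):
    """Return the list of (S, E, I, R) tuples visited over the given steps."""
    trajectory = [state]
    for _ in range(steps):
        state = seir_step(*state, **params)
        trajectory.append(state)
    return trajectory
```

A call such as `simulate((990.0, 0.0, 10.0, 0.0), 50, beta=0.5, sigma=0.2, gamma=0.1)` yields a 51-point trajectory whose compartments always sum to the initial population, since every flow leaving one compartment enters the next.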

evolutionary algorithm, machine learning, reinforcement learning, (20 more...)

2005.07656

Country:

Europe (0.67)
Asia > China (0.14)
North America > Mexico (0.14)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)