AITopics | Reinforcement Learning


Reinforcement Learning

"Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them."
– Sutton, Richard S. and Andrew G. Barto. Reinforcement Learning: An Introduction. (1.1). MIT Press, Cambridge, MA, 1998.


UAV Path Planning for Wireless Data Harvesting: A Deep Reinforcement Learning Approach

Bayerlein, Harald, Theile, Mirco, Caccamo, Marco, Gesbert, David

arXiv.org Machine Learning, Oct-26-2020

Autonomous deployment of unmanned aerial vehicles (UAVs) supporting next-generation communication networks requires efficient trajectory planning methods. We propose a new end-to-end reinforcement learning (RL) approach to UAV-enabled data collection from Internet of Things (IoT) devices in an urban environment. An autonomous drone is tasked with gathering data from distributed sensor nodes subject to limited flying time and obstacle avoidance. While previous approaches, learning and non-learning based, must perform expensive recomputations or relearn a behavior when important scenario parameters such as the number of sensors, sensor positions, or maximum flying time, change, we train a double deep Q-network (DDQN) with combined experience replay to learn a UAV control policy that generalizes over changing scenario parameters. By exploiting a multi-layer map of the environment fed through convolutional network layers to the agent, we show that our proposed network architecture enables the agent to make movement decisions for a variety of scenario parameters that balance the data collection goal with flight time efficiency and safety constraints. Considerable advantages in learning efficiency from using a map centered on the UAV's position over a non-centered map are also illustrated.

artificial intelligence, machine learning, reinforcement learning, (14 more...)

arXiv.org Machine Learning

2007.00544

Country:

Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
Europe > France (0.04)
Asia > Japan (0.04)

Genre: Research Report (0.50)

Industry: Information Technology (0.54)

Technology:

Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles > Drones (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
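
As a rough illustration of the double deep Q-network (DDQN) idea named in the abstract above, the sketch below computes DDQN training targets for a small batch. It is not the authors' implementation: the function name `ddqn_targets`, the plain-list representation of Q-values, and the default discount factor are all assumptions made for this example. The only point it shows is the decoupling that defines double Q-learning: the online network selects the next action and the target network evaluates it.

```python
def ddqn_targets(rewards, dones, q_online_next, q_target_next, gamma=0.95):
    """Compute Double DQN targets for a batch of transitions.

    rewards        -- list of immediate rewards, one per transition
    dones          -- list of booleans, True if the episode ended
    q_online_next  -- per-transition Q-values of the next state from the
                      online network (hypothetical: one list per transition)
    q_target_next  -- same shape, but from the target network
    gamma          -- discount factor (value chosen for illustration only)
    """
    targets = []
    for r, done, q_on, q_tg in zip(rewards, dones, q_online_next, q_target_next):
        if done:
            # No bootstrapping past a terminal state.
            targets.append(r)
            continue
        # The online network chooses the greedy next action ...
        best_action = max(range(len(q_on)), key=lambda a: q_on[a])
        # ... and the target network supplies that action's value.
        targets.append(r + gamma * q_tg[best_action])
    return targets


if __name__ == "__main__":
    batch_targets = ddqn_targets(
        rewards=[1.0, 2.0],
        dones=[False, True],
        q_online_next=[[0.1, 0.9], [0.5, 0.2]],
        q_target_next=[[3.0, 4.0], [5.0, 6.0]],
        gamma=0.5,
    )
    print(batch_targets)
```

In a full agent these targets would be regressed against the online network's Q-value for the action actually taken; the map-based convolutional input described in the abstract is outside the scope of this sketch.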


Refactoring Policy for Compositional Generalizability using Self-Supervised Object Proposals

Mu, Tongzhou, Gu, Jiayuan, Jia, Zhiwei, Tang, Hao, Su, Hao

arXiv.org Artificial Intelligence, Oct-26-2020

We study how to learn a policy with compositional generalizability. We propose a two-stage framework, which refactorizes a high-reward teacher policy into a generalizable student policy with strong inductive bias. Particularly, we implement an object-centric GNN-based student policy, whose input objects are learned from images through self-supervised learning. Empirically, we evaluate our approach on four difficult tasks that require compositional generalizability, and achieve superior performance compared to baselines.

detector, machine learning, reinforcement learning, (17 more...)

arXiv.org Artificial Intelligence

2011.00971

Country:

North America > United States > California > San Diego County > San Diego (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Asia > China > Shanghai > Shanghai (0.04)

Genre: Research Report (0.50)

Industry:

Education (0.68)
Leisure & Entertainment > Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.94)
Information Technology > Artificial Intelligence > Vision (0.93)
(2 more...)


Graph-based Reinforcement Learning for Active Learning in Real Time: An Application in Modeling River Networks

Jia, Xiaowei, Lin, Beiyu, Zwart, Jacob, Sadler, Jeffery, Appling, Alison, Oliver, Samantha, Read, Jordan

arXiv.org Artificial Intelligence, Oct-26-2020

Effective training of advanced ML models requires large amounts of labeled data, which is often scarce in scientific problems given the substantial human labor and material cost of collecting it. This poses a challenge in determining when and where we should deploy measuring instruments (e.g., in-situ sensors) to collect labeled data efficiently. This problem differs from traditional pool-based active learning settings in that the labeling decisions have to be made immediately after we observe the input data, which arrive as a time series. In this paper, we develop a real-time active learning method that uses spatial and temporal contextual information to select representative query samples in a reinforcement learning framework. To reduce the need for large training data, we further propose to transfer the policy learned from simulation data generated by existing physics-based models. We demonstrate the effectiveness of the proposed method by predicting streamflow and water temperature in the Delaware River Basin given a limited budget for collecting labeled data. We further study the spatial and temporal distribution of selected samples to verify the ability of this method to select informative samples over space and time.

machine learning, reinforcement learning, river segment, (16 more...)

arXiv.org Artificial Intelligence

2010.14

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > Wisconsin > Dane County > Madison (0.04)
North America > United States > Texas (0.04)
North America > United States > Delaware > New Castle County > Wilmington (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.70)


MELD: Meta-Reinforcement Learning from Images via Latent State Models

Zhao, Tony Z., Nagabandi, Anusha, Rakelly, Kate, Finn, Chelsea, Levine, Sergey

arXiv.org Artificial Intelligence, Oct-26-2020

Meta-reinforcement learning algorithms can enable autonomous agents, such as robots, to quickly acquire new behaviors by leveraging prior experience in a set of related training tasks. However, the onerous data requirements of meta-training compounded with the challenge of learning from sensory inputs such as images have made meta-RL challenging to apply to real robotic systems. Latent state models, which learn compact state representations from a sequence of observations, can accelerate representation learning from visual inputs. In this paper, we leverage the perspective of meta-learning as task inference to show that latent state models can also perform meta-learning given an appropriately defined observation space. Building on this insight, we develop meta-RL with latent dynamics (MELD), an algorithm for meta-RL from images that performs inference in a latent state model to quickly acquire new skills given observations and rewards. MELD outperforms prior meta-RL methods on several simulated image-based robotic control problems, and enables a real WidowX robotic arm to insert an Ethernet cable into new locations given a sparse task completion signal after only 8 hours of real world meta-training. To our knowledge, MELD is the first meta-RL algorithm trained in a real-world robotic control setting from images.

machine learning, meld, reinforcement learning, (15 more...)

arXiv.org Artificial Intelligence

2010.13957

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > California > Alameda County > Berkeley (0.04)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (1.00)

Industry:

Leisure & Entertainment (0.68)
Education (0.66)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.47)


Forethought and Hindsight in Credit Assignment

Chelu, Veronica, Precup, Doina, van Hasselt, Hado

arXiv.org Artificial Intelligence, Oct-26-2020

We address the problem of credit assignment in reinforcement learning and explore fundamental questions regarding the way in which an agent can best use additional computation to propagate new information, by planning with internal models of the world to improve its predictions. In particular, we work to understand the gains and peculiarities of planning employed as forethought via forward models or as hindsight operating with backward models. We establish the relative merits, limitations and complementary properties of both planning mechanisms in carefully constructed scenarios. Further, we investigate the best use of models in planning, primarily focusing on the selection of states in which predictions should be (re-)evaluated. Lastly, we discuss the issue of model estimation and highlight a spectrum of methods that stretch from explicit environment-dynamics predictors to more abstract planner-aware models.

backward model, machine learning, reinforcement learning, (16 more...)

arXiv.org Artificial Intelligence

2010.13685

Country:

North America > Canada > Quebec > Montreal (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.14)
(12 more...)

Genre: Research Report > New Finding (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (0.93)


Energy and Service-priority aware Trajectory Design for UAV-BSs using Double Q-Learning

Hoseini, Sayed Amir, Bokani, Ayub, Hassan, Jahan, Salehi, Shavbo, Kanhere, Salil S.

arXiv.org Artificial Intelligence, Oct-26-2020

Proposals for next-generation mobile networks include integrating Unmanned Aerial Vehicles (UAVs) as aerial base stations (UAV-BSs) to serve ground nodes. Despite the advantages of UAV-BSs, their dependence on an on-board, limited-capacity battery hinders their service continuity. Shorter trajectories can save flying energy; however, UAV-BSs must also serve nodes based on their service priority, since nodes' service requirements are not always the same. In this paper, we present an energy-efficient trajectory optimization for a UAV-assisted IoT system in which the UAV-BS considers the IoT nodes' service priorities in making its movement decisions. We solve the trajectory optimization problem using the Double Q-Learning algorithm. Simulation results reveal that the Q-Learning-based optimized trajectory outperforms a benchmark algorithm, namely the Greedily-served algorithm, in terms of reducing the average energy consumption of the UAV-BS as well as the service delay for high-priority nodes.

machine learning, node, reinforcement learning, (15 more...)

arXiv.org Artificial Intelligence

2010.13346

Country:

Oceania > Australia > Queensland (0.04)
Oceania > Australia > New South Wales > Sydney (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
(4 more...)

Genre: Research Report (0.64)

Industry:

Information Technology > Robotics & Automation (0.48)
Aerospace & Defense > Aircraft (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles > Drones (0.67)
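
For readers unfamiliar with the Double Q-Learning algorithm that the abstract above relies on, here is a minimal tabular sketch of a single update step. It is not taken from the paper: the function name `double_q_update`, the list-of-lists tables indexed by integer state and action, the `rng` parameter, and the learning rate and discount values are all assumptions for illustration. The paper's actual state, action, and reward definitions (energy use, service priority) are not given in the abstract and are not modeled here.

```python
import random


def double_q_update(q_a, q_b, state, action, reward, next_state, done,
                    alpha=0.1, gamma=0.9, rng=random):
    """Apply one tabular Double Q-learning update in place.

    q_a, q_b -- two Q-tables, each indexed as table[state][action]
    rng      -- any object with a random() method returning a float in [0, 1)
    """
    # A fair coin decides which table is updated; the other one is used
    # only to evaluate the greedy action, which reduces overestimation.
    if rng.random() < 0.5:
        update, evaluate = q_a, q_b
    else:
        update, evaluate = q_b, q_a

    if done:
        target = reward
    else:
        next_values = update[next_state]
        # Greedy action according to the table being updated ...
        best = max(range(len(next_values)), key=lambda a: next_values[a])
        # ... valued by the other table.
        target = reward + gamma * evaluate[next_state][best]

    update[state][action] += alpha * (target - update[state][action])


if __name__ == "__main__":
    # Hypothetical toy problem with 3 states and 2 actions.
    q_a = [[0.0, 0.0] for _ in range(3)]
    q_b = [[0.0, 0.0] for _ in range(3)]
    double_q_update(q_a, q_b, state=0, action=1, reward=1.0,
                    next_state=2, done=False)
    print(q_a, q_b)
```

When acting, an agent of this kind would typically choose actions from the sum or average of the two tables, for example with an epsilon-greedy rule.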


Understanding Reinforcement Learning Hands-On: The Bellman Equation pt.1

#artificialintelligence, Oct-25-2020, 19:10:05 GMT

Welcome to the fifth entry in a series on Reinforcement Learning. In the previous article, we presented the MDP Framework for describing complex environments. This allowed us to create a more robust and diverse scenario for the basic Multi-Armed Bandits problem, which we called the Casinos Environment. We then implemented this scenario using OpenAI's gym, and made a simple agent that acted randomly to showcase how an interaction is realized under the MDP Framework. Today, we're going to turn our focus back to the agents, and show a way to describe an agent's behavior in complex scenarios, where past actions determine future rewards.

data mining, machine learning, reinforcement learning, (19 more...)

#artificialintelligence

Technology:

Information Technology > Data Science > Data Mining > Big Data (0.89)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.86)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.60)


Can Animals Help Us Build Better AI?

#artificialintelligence, Oct-25-2020, 10:40:18 GMT

Machine learning has been making plenty of headlines in the past few years. Rightfully so, even though headlines tend to oversell. Advances in computing power, algorithmic complexity, data handling capacities, and models of learning mean that machine learning/AI is increasingly being used in many fields. In previous posts, I have written about machine learning/AI in general science and art, but also more specifically in (warning, link fest) historical research, genetic enhancement, mental health, aging research (including the development of 'aging clocks'), video game ecology, Hollywood, astrobiology, epidemiology, stock markets, and the job market. Plenty of AI to go around, it seems.

artificial intelligence, machine learning, reinforcement learning, (7 more...)

#artificialintelligence

Industry:

Health & Medicine (0.57)
Leisure & Entertainment > Games (0.37)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.34)


Robust Hierarchical Planning with Policy Delegation

Lai, Tin, Morere, Philippe

arXiv.org Artificial Intelligence, Oct-25-2020

We propose a novel framework and algorithm for hierarchical planning based on the principle of delegation. This framework, the Markov Intent Process, features a collection of skills which are each specialised to perform a single task well. Skills are aware of their intended effects and are able to analyse planning goals to delegate planning to the best-suited skill. This principle dynamically creates a hierarchy of plans, in which each skill plans for sub-goals for which it is specialised. The proposed planning method features on-demand execution: skill policies are only evaluated when needed. Plans are only generated at the highest level, then expanded and optimised when the latest state information is available. The high-level plan retains the initial planning intent and previously computed skills, effectively reducing the computation needed to adapt to environmental changes. We show experimentally that this planning approach is very competitive with classic planning and reinforcement learning techniques on a variety of domains, both in terms of solution length and planning time.

noise, planning & scheduling, upstream oil & gas, (20 more...)

arXiv.org Artificial Intelligence

2010.13033

Genre: Research Report (0.50)

Industry:

Energy > Oil & Gas > Upstream (0.93)
Materials > Metals & Mining (0.68)
Materials > Chemicals (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.69)


Model-based Reinforcement Learning for Semi-Markov Decision Processes with Neural ODEs

Du, Jianzhun, Futoma, Joseph, Doshi-Velez, Finale

arXiv.org Machine Learning, Oct-25-2020

We present two elegant solutions for modeling continuous-time dynamics, in a novel model-based reinforcement learning (RL) framework for semi-Markov decision processes (SMDPs), using neural ordinary differential equations (ODEs). Our models accurately characterize continuous-time dynamics and enable us to develop high-performing policies using a small amount of data. We also develop a model-based approach for optimizing time schedules to reduce interaction rates with the environment while maintaining the near-optimal performance, which is not possible for model-free methods. We experimentally demonstrate the efficacy of our methods across various continuous-time domains.

arxiv preprint arxiv, machine learning, reinforcement learning, (15 more...)

arXiv.org Machine Learning

2006.1621

Country:

North America > United States > California > Alameda County > Berkeley (0.14)
North America > United States > North Carolina (0.04)
North America > United States > New York (0.04)
(2 more...)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology > HIV (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.70)
