AITopics | Reinforcement Learning

Reinforcement Learning

"Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them."
– Sutton, Richard S. and Andrew G. Barto. Reinforcement Learning: An Introduction. (1.1). MIT Press, Cambridge, MA, 1998.
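The trial-and-error idea in the quotation can be illustrated with a minimal sketch. It assumes a hypothetical three-armed Bernoulli bandit and an epsilon-greedy learner; the function name `run_bandit`, the reward probabilities, and the parameter values are illustrative choices, not taken from Sutton and Barto's text.

```python
import random

def run_bandit(true_means, steps=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy learner on a hypothetical Bernoulli bandit."""
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms    # how often each action was tried
    values = [0.0] * n_arms  # running average reward per action
    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try a random action.
            action = rng.randrange(n_arms)
        else:
            # Exploit: pick the action with the best estimate so far.
            action = max(range(n_arms), key=lambda a: values[a])
        # The learner is never told the best action; it only sees a reward.
        reward = 1.0 if rng.random() < true_means[action] else 0.0
        counts[action] += 1
        values[action] += (reward - values[action]) / counts[action]
    return values, counts

values, counts = run_bandit([0.2, 0.5, 0.8])
best_action = max(range(len(values)), key=lambda a: values[a])
```

The learner discovers which action pays most purely from sampled rewards, which is the mapping from situations to actions that the definition describes, reduced here to a single situation.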

Online Model Selection for Reinforcement Learning with Function Approximation

Lee, Jonathan N., Pacchiano, Aldo, Muthukumar, Vidya, Kong, Weihao, Brunskill, Emma

arXiv.org Machine Learning | Nov-19-2020

Deep reinforcement learning has achieved impressive successes yet often requires a very large amount of interaction data. This result is perhaps unsurprising, as using complicated function approximation often requires more data to fit, and early theoretical results on linear Markov decision processes provide regret bounds that scale with the dimension of the linear approximation. Ideally, we would like to automatically identify the minimal dimension of the approximation that is sufficient to encode an optimal policy. Towards this end, we consider the problem of model selection in RL with function approximation, given a set of candidate RL algorithms with known regret guarantees. The learner's goal is to adapt to the complexity of the optimal algorithm without knowing it a priori. We present a meta-algorithm that successively rejects increasingly complex models using a simple statistical test. Given at least one candidate that satisfies realizability, we prove the meta-algorithm adapts to the optimal complexity with $\tilde{O}(L^{5/6} T^{2/3})$ regret compared to the optimal candidate's $\tilde{O}(\sqrt T)$ regret, where $T$ is the number of episodes and $L$ is the number of algorithms. The dimension and horizon dependencies remain optimal with respect to the best candidate, and our meta-algorithmic approach is flexible to incorporate multiple candidate algorithms and models. Finally, we show that the meta-algorithm automatically admits significantly improved instance-dependent regret bounds that depend on the gaps between the maximal values attainable by the candidates.
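To get a feel for the two regret rates quoted in this abstract, the following sketch evaluates them numerically. It is only an illustration: constants, logarithmic factors, and the dimension and horizon terms hidden by the $\tilde{O}$ notation are all dropped, and the function names are hypothetical.

```python
def meta_regret_rate(num_algorithms, num_episodes):
    """L^(5/6) * T^(2/3), with constants and log factors ignored."""
    return num_algorithms ** (5 / 6) * num_episodes ** (2 / 3)

def best_candidate_rate(num_episodes):
    """sqrt(T), with constants, log factors, dimension and horizon ignored."""
    return num_episodes ** 0.5

def per_episode(total_regret, num_episodes):
    """Average regret per episode; it shrinks when total regret is sublinear in T."""
    return total_regret / num_episodes

# Hypothetical setting: L = 4 candidate algorithms.
for T in (10**3, 10**6):
    meta = meta_regret_rate(4, T)
    best = best_candidate_rate(T)
    print(T, round(meta), round(best), per_episode(meta, T))
```

Both rates are sublinear in $T$, so average regret per episode shrinks as episodes accumulate; the meta-algorithm's rate grows faster than $\sqrt T$, which is the price the abstract reports for not knowing the right model in advance.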

algorithm, online model selection, probability, (11 more...)

arXiv.org Machine Learning

2011.0975

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (0.81)

Teaching KNIME to Play Tic-Tac-Toe

#artificialintelligence | Nov-18-2020, 08:26:09 GMT

In this blog post I want to introduce some basic concepts of reinforcement learning and some important terminology, and to show a simple use case where I create a game-playing AI in KNIME Analytics Platform. After reading this, I hope you'll have a better understanding of the usefulness of reinforcement learning, as well as some key vocabulary to facilitate learning more. You may have heard of Reinforcement Learning (RL) being used to train robots to walk or gently pick up objects, or perhaps you have heard of its uses in the discovery of new chemical compounds for medical use. It's even being applied to regulating vehicle and network traffic! Reinforcement learning is an area of Machine Learning and has become a broad field of study with many different algorithmic frameworks.

agent, reinforcement learning, tic-tac-toe, (10 more...)

#artificialintelligence

Country:

North America > United States > Texas > Travis County > Austin (0.05)
North America > United States > Michigan > Wayne County > Detroit (0.05)

Industry: Leisure & Entertainment > Games > Tic-Tac-Toe (0.48)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Create a Custom Deep Reinforcement Learning Environment in UE4

#artificialintelligence | Nov-18-2020, 08:25:17 GMT

While the scope of reinforcement learning (RL) is likely to soon extend far beyond computer simulation, today RL agents are mainly trained within digital environments. In the world of artificial intelligence, simulators are often the environments in which an algorithm functions. We humans are born directly into our simulator, and it requires no effort on our part to go on functioning. We call this simulator the universe and it exists whether we believe in it or not. Similarly, the laws of physics apply whether you acknowledge them or not. They require no effort or acquiescence on our part.

artificial intelligence, node, reinforcement learning, (15 more...)

#artificialintelligence

Industry:

Energy > Oil & Gas (0.69)
Education (0.65)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

DeepMind open-sources Lab2D, a grid-based environment for reinforcement learning research

#artificialintelligence | Nov-18-2020, 01:30:26 GMT

DeepMind this week open-sourced Lab2D, a software system designed to support the creation of 2D environments for AI and machine learning research. The Alphabet subsidiary says that Lab2D was built with the needs of deep reinforcement learning researchers in mind, but that it can be useful beyond that particular subfield of machine learning. The DeepMind team behind Lab2D makes the case that 2D environments are inherently easier to understand than 3D ones at little loss of expressiveness. Even a game as simple as Pong, which essentially consists of three moving rectangles on a black background, can capture something fundamental about the real game of table tennis, the researchers assert. This abstraction ostensibly makes it easier to capture the essence of problems and concepts in AI. "Rich complexity along numerous dimensions can be studied in 2D just as readily as in 3D, if not more so … In addition, 2D worlds are significantly less resource-intensive to run, and typically do not require any specialized hardware (like GPUs) to attain reasonable performance," the researchers continued in their paper describing Lab2D. "2D worlds have been successfully used to study problems as diverse as social complexity, navigation, imperfect information, abstract reasoning, exploration, and many more."

information, lab2d, reinforcement, (5 more...)

#artificialintelligence

Industry:

Leisure & Entertainment > Sports (0.57)
Leisure & Entertainment > Games (0.35)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.95)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.91)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.91)

Is Independent Learning All You Need in the StarCraft Multi-Agent Challenge?

de Witt, Christian Schroeder, Gupta, Tarun, Makoviichuk, Denys, Makoviychuk, Viktor, Torr, Philip H. S., Sun, Mingfei, Whiteson, Shimon

arXiv.org Artificial Intelligence | Nov-18-2020

Most recently developed approaches to cooperative multi-agent reinforcement learning in the centralized training with decentralized execution setting involve estimating a centralized, joint value function. In this paper, we demonstrate that, despite its various theoretical shortcomings, Independent PPO (IPPO), a form of independent learning in which each agent simply estimates its local value function, can perform just as well as or better than state-of-the-art joint learning approaches on the popular multi-agent benchmark suite SMAC with little hyperparameter tuning. We also compare IPPO to several variants; the results suggest that IPPO's strong performance may be due to its robustness to some forms of environment non-stationarity.
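The notion of independent learning, where each agent estimates only a local value function, can be sketched on a toy problem. The example below is not IPPO and does not use SMAC: it swaps in simple tabular value learners on a hypothetical two-agent cooperative matrix game, and every name and number in it is an illustrative assumption.

```python
import random

# Hypothetical shared team reward: the agents are paid only when they
# choose the same action, and matching on action 1 pays more.
TEAM_REWARD = [[1.0, 0.0],
               [0.0, 2.0]]

def train_independent_learners(episodes=5000, epsilon=0.1, lr=0.1, seed=0):
    rng = random.Random(seed)
    # Each agent keeps estimates over its OWN two actions only;
    # no joint value over action pairs is ever stored.
    q = [[0.0, 0.0], [0.0, 0.0]]
    for _ in range(episodes):
        actions = []
        for agent in range(2):
            if rng.random() < epsilon:
                actions.append(rng.randrange(2))  # explore
            else:
                actions.append(max(range(2), key=lambda a: q[agent][a]))
        reward = TEAM_REWARD[actions[0]][actions[1]]
        # Each agent updates from the team reward as if the other agent
        # were part of the environment.
        for agent in range(2):
            a = actions[agent]
            q[agent][a] += lr * (reward - q[agent][a])
    return q

q = train_independent_learners()
```

From either agent's point of view, the reward for a given action changes as the other agent's behaviour changes, which is the environment non-stationarity the abstract refers to. Which joint action the learners settle on depends on the seed and hyperparameters.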

agent, arxiv, ippo, (14 more...)

arXiv.org Artificial Intelligence

2011.09533

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.28)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.48)

Industry: Leisure & Entertainment > Games > Computer Games (0.41)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.47)

Using Unity to Help Solve Intelligence

Ward, Tom, Bolt, Andrew, Hemmings, Nik, Carter, Simon, Sanchez, Manuel, Barreira, Ricardo, Noury, Seb, Anderson, Keith, Lemmon, Jay, Coe, Jonathan, Trochim, Piotr, Handley, Tom, Bolton, Adrian

arXiv.org Artificial Intelligence | Nov-18-2020

In the pursuit of artificial general intelligence, our most significant measurement of progress is an agent's ability to achieve goals in a wide range of environments. Existing platforms for constructing such environments are typically constrained by the technologies they are founded on, and are therefore only able to provide a subset of scenarios necessary to evaluate progress. To overcome these shortcomings, we present our use of Unity, a widely recognized and comprehensive game engine, to create more diverse, complex, virtual simulations. We describe the concepts and components developed to simplify the authoring of these environments, intended for use predominantly in the field of reinforcement learning. We also introduce a practical approach to packaging and re-distributing environments in a way that attempts to improve the robustness and reproducibility of experiment results. To illustrate the versatility of our use of Unity compared to other solutions, we highlight environments already created using our approach from published papers. We hope that others can draw inspiration from how we adapted Unity to our needs, and anticipate increasingly varied and complex environments to emerge from our approach as familiarity grows.

agent, arxiv preprint arxiv, simulation, (12 more...)

arXiv.org Artificial Intelligence

2011.09294

Country:

Europe > Sweden > Skåne County > Malmö (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Greece (0.04)

Genre: Research Report (0.52)

Industry:

Leisure & Entertainment > Games > Computer Games (1.00)
Information Technology > Software (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.91)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.30)

Indoor Point-to-Point Navigation with Deep Reinforcement Learning and Ultra-wideband

Sutera, Enrico, Mazzia, Vittorio, Salvetti, Francesco, Fantin, Giovanni, Chiaberge, Marcello

arXiv.org Artificial Intelligence | Nov-18-2020

Indoor autonomous navigation requires a precise and accurate localization system able to guide robots through cluttered, unstructured and dynamic environments. Ultra-wideband (UWB) technology, as an indoor positioning system, offers precise localization and tracking, but moving obstacles and non-line-of-sight occurrences can generate noisy and unreliable signals. That, combined with sensor noise, unmodeled dynamics and environment changes, can result in a failure of the robot's guidance algorithm. We demonstrate how a power-efficient, computationally cheap point-to-point local planner, learnt with deep reinforcement learning (RL) and combined with UWB localization technology, can constitute a complete short-range guidance system that is robust and resilient to noise. We trained the RL agent in a simulated environment that encapsulates the robot dynamics and task constraints, and then tested the learnt point-to-point navigation policies in a real setting with more than two hundred experimental evaluations using UWB localization. Our results show that the computationally efficient end-to-end policy learnt in plain simulation, which directly maps low-range sensor signals to robot controls, deployed in combination with noisy ultra-wideband localization in a real environment, can provide a robust, scalable, low-cost navigation system at the edge.

agent, navigation, robot, (14 more...)

arXiv.org Artificial Intelligence

2011.09241

Country:

Europe > Italy > Piedmont > Turin Province > Turin (0.14)
North America > United States > California (0.04)

Genre: Research Report > New Finding (0.68)

Industry:

Energy (0.68)
Leisure & Entertainment > Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Machine Learning

#artificialintelligence | Nov-17-2020, 04:55:29 GMT

In this era of big data, there is an increasing need to develop and deploy algorithms that can analyze and identify connections in that data. Using machine learning (a subset of artificial intelligence), it is now possible to create computer systems that automatically improve with experience. This technology has numerous real-world applications, including robotic control, data mining, autonomous navigation, and bioinformatics. This course features classroom videos and assignments adapted from the CS229 graduate course as delivered on-campus at Stanford in Autumn 2018 and Autumn 2019. In order to make the content and workload more manageable for working professionals, the course has been split into two parts, XCS229i: Machine Learning and XCS229ii: Machine Learning Strategy and Intro to Reinforcement Learning.

learning, machine learning, machine learning strategy, (7 more...)

#artificialintelligence

Country: North America > United States > California > Santa Clara County > Palo Alto (0.42)

Genre: Instructional Material > Course Syllabus & Notes (1.00)

Industry:

Education > Educational Setting > Online (0.59)
Education > Educational Technology > Educational Software > Computer Based Training (0.37)

Technology:

Information Technology > Enterprise Applications > Human Resources > Learning Management (0.37)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.31)

Combining Reinforcement Learning with Model Predictive Control for On-Ramp Merging

Lubars, Joseph, Gupta, Harsh, Raja, Adnan, Srikant, R., Li, Liyun, Wu, Xinzhou

arXiv.org Artificial Intelligence | Nov-17-2020

We consider the problem of designing an algorithm to allow a car to autonomously merge onto a highway from an on-ramp. Two broad classes of techniques have been proposed to solve motion planning problems in autonomous driving: Model Predictive Control (MPC) and Reinforcement Learning (RL). In this paper, we first establish the strengths and weaknesses of state-of-the-art MPC and RL-based techniques through simulations. We show that the performance of the RL agent is worse than that of the MPC solution from the perspective of safety and robustness to out-of-distribution traffic patterns, i.e., traffic patterns which were not seen by the RL agent during training. On the other hand, the performance of the RL agent is better than that of the MPC solution when it comes to efficiency and passenger comfort. We subsequently present an algorithm which blends the model-free RL agent with the MPC solution and show that it provides better trade-offs between all metrics: passenger comfort, efficiency, crash rate and robustness.

agent, artificial intelligence, ground transportation, (17 more...)

arXiv.org Artificial Intelligence

2011.08484

Country: North America > United States (0.14)

Genre: Research Report (0.64)

Industry:

Transportation > Ground > Road (1.00)
Energy > Oil & Gas (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Explaining Conditions for Reinforcement Learning Behaviors from Real and Imagined Data

Acharya, Aastha, Russell, Rebecca, Ahmed, Nisar R.

arXiv.org Artificial Intelligence | Nov-17-2020

The deployment of reinforcement learning (RL) in the real world comes with challenges in calibrating user trust and expectations. As a step toward developing RL systems that are able to communicate their competencies, we present a method of generating human-interpretable abstract behavior models that identify the experiential conditions leading to different task execution strategies and outcomes. Our approach consists of extracting experiential features from state representations, abstracting strategy descriptors from trajectories, and training an interpretable decision tree that identifies the conditions most predictive of different RL behaviors. We demonstrate our method on trajectory data generated from interactions with the environment and on imagined trajectory data that comes from a trained probabilistic world model in a model-based RL setting.
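The last step described in this abstract, fitting an interpretable tree that identifies which conditions predict a behavior, can be sketched in miniature. The example below is not the authors' method: it fits a depth-one decision tree (a stump) in plain Python, and the feature names, strategy labels, and data are all hypothetical.

```python
from collections import Counter

def majority(labels):
    """Most common label in a non-empty list."""
    return Counter(labels).most_common(1)[0][0]

def fit_stump(rows, labels):
    """Return (errors, feature, threshold, label_if_le, label_if_gt)
    for the single split that misclassifies the fewest trajectories."""
    best = None
    for feature in rows[0]:
        for threshold in sorted({row[feature] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[feature] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[feature] > threshold]
            if not left or not right:
                continue  # a split must separate the data
            left_label, right_label = majority(left), majority(right)
            errors = (sum(y != left_label for y in left)
                      + sum(y != right_label for y in right))
            if best is None or errors < best[0]:
                best = (errors, feature, threshold, left_label, right_label)
    return best

# Hypothetical experiential features extracted from six trajectories.
rows = [
    {"start_distance": 2.0, "obstacles_seen": 0},
    {"start_distance": 8.0, "obstacles_seen": 1},
    {"start_distance": 5.0, "obstacles_seen": 0},
    {"start_distance": 3.0, "obstacles_seen": 3},
    {"start_distance": 9.0, "obstacles_seen": 2},
    {"start_distance": 4.0, "obstacles_seen": 4},
]
# Hypothetical strategy descriptor abstracted from each trajectory.
labels = ["direct", "direct", "direct", "detour", "detour", "detour"]

print(fit_stump(rows, labels))
# → (0, 'obstacles_seen', 1, 'direct', 'detour')
```

The resulting rule reads as a human-interpretable condition: on this toy data the agent detours when it has seen more than one obstacle. A deeper tree would apply the same split search recursively to each side.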

agent, experiential feature, trajectory data, (15 more...)

arXiv.org Artificial Intelligence

2011.09004

Country:

North America > United States > Colorado > Boulder County > Boulder (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
