AITopics | Reinforcement Learning

Reinforcement Learning

"Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them."
– Sutton, Richard S. and Andrew G. Barto. Reinforcement Learning: An Introduction. (1.1). MIT Press, Cambridge, MA, 1998.

K-spin Hamiltonian for quantum-resolvable Markov decision processes

Jones, Eric B., Graf, Peter, Kapit, Eliot, Jones, Wesley

arXiv.org Artificial Intelligence, Apr-13-2020

The Markov decision process is the mathematical formalization underlying the modern field of reinforcement learning when transition and reward functions are unknown. We derive a pseudo-Boolean cost function that is equivalent to a K-spin Hamiltonian representation of the discrete, finite, discounted Markov decision process with infinite horizon. This K-spin Hamiltonian furnishes a starting point from which to solve for an optimal policy using heuristic quantum algorithms such as adiabatic quantum annealing and the quantum approximate optimization algorithm on near-term quantum hardware. In proving that the variational minimization of our Hamiltonian is equivalent to the Bellman optimality condition we establish an interesting analogy with classical field theory. Along with proof-of-concept calculations to corroborate our formulation by simulated and quantum annealing against classical Q-Learning, we analyze the scaling of physical resources required to solve our Hamiltonian on quantum hardware.
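
The Bellman optimality condition mentioned in the abstract can be illustrated classically. The following is a minimal sketch of value iteration on a hypothetical two-state MDP. It is not the K-spin Hamiltonian construction from the paper; the function name, the toy transition table `P`, and the rewards `R` are assumptions made for illustration.

```python
def value_iteration(P, R, gamma=0.9, tol=1e-10, max_iter=10_000):
    """Solve a finite discounted MDP by iterating the Bellman optimality operator.

    P[s][a] is a list of (probability, next_state) pairs.
    R[s][a] is the immediate reward for taking action a in state s.
    """
    n_states = len(P)
    V = [0.0] * n_states
    for _ in range(max_iter):
        # Bellman optimality backup: V(s) = max_a [R(s,a) + gamma * E V(s')]
        new_V = [
            max(R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a])
                for a in range(len(P[s])))
            for s in range(n_states)
        ]
        delta = max(abs(x - y) for x, y in zip(new_V, V))
        V = new_V
        if delta < tol:
            break
    # Greedy policy with respect to the converged values
    policy = [
        max(range(len(P[s])),
            key=lambda a: R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a]))
        for s in range(n_states)
    ]
    return V, policy


# Hypothetical toy MDP: two states, two deterministic actions each.
P = [
    [[(1.0, 0)], [(1.0, 1)]],  # state 0: action 0 stays, action 1 moves to state 1
    [[(1.0, 1)], [(1.0, 0)]],  # state 1: action 0 stays, action 1 moves to state 0
]
R = [
    [0.0, 1.0],  # rewards in state 0
    [2.0, 0.0],  # rewards in state 1
]
V, policy = value_iteration(P, R, gamma=0.5)
print(V, policy)
```

With a discount of 0.5, staying in state 1 is worth 2 / (1 - 0.5) = 4, so the greedy policy moves from state 0 to state 1 and then stays there.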

hamiltonian, k-spin hamiltonian, optimal policy, (14 more...)

arXiv.org Artificial Intelligence

2004.0604

Country:

North America > United States > Colorado > Jefferson County > Golden (0.14)
North America > United States > New Mexico > Los Alamos County > Los Alamos (0.04)
North America > United States > California > Los Angeles County > Santa Monica (0.04)

Genre: Research Report (0.40)

Industry: Energy (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

A Deep Reinforcement Learning Framework for Continuous Intraday Market Bidding

Boukas, Ioannis, Ernst, Damien, Théate, Thibaut, Bolland, Adrien, Huynen, Alexandre, Buchwald, Martin, Wynants, Christelle, Cornélusse, Bertrand

arXiv.org Artificial Intelligence, Apr-13-2020

The large-scale integration of variable energy resources is expected to shift a large part of the energy exchanges closer to real-time, where more accurate forecasts are available. In this context, the short-term electricity markets, and in particular the intraday market, are considered a suitable trading floor for these exchanges to occur. A key component for the successful integration of renewable energy sources is the use of energy storage. In this paper, we propose a novel modelling framework for the strategic participation of energy storage in the European continuous intraday market, where exchanges occur through a centralized order book. The goal of the storage device operator is to maximize the profits received over the entire trading horizon while taking into account the operational constraints of the unit. The sequential decision-making problem of trading in the intraday market is modelled as a Markov Decision Process. An asynchronous distributed version of the fitted Q iteration algorithm is chosen for solving this problem due to its sample efficiency. The large and variable number of existing orders in the order book motivates the use of high-level actions and an alternative state representation. Historical data are used to generate a large number of artificial trajectories in order to address exploration issues during the learning process. The resulting policy is back-tested and compared against a benchmark strategy that is the current industrial standard. Results indicate that the agent converges to a policy that achieves, on average, higher total revenues than the benchmark strategy.
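
For readers unfamiliar with fitted Q iteration, the batch update it relies on can be sketched as follows. This is a plain, synchronous, tabular version in which the "regression" step is just a per-pair average; it is not the asynchronous distributed variant used in the paper, and the function name and the toy transitions are hypothetical.

```python
from collections import defaultdict


def fitted_q_iteration(transitions, n_actions, gamma=0.9, n_iters=50):
    """Batch fitted Q iteration with a tabular (mean) regressor.

    transitions: list of (state, action, reward, next_state, done) tuples.
    Returns a dict-like mapping (state, action) -> estimated Q-value.
    """
    Q = defaultdict(float)
    for _ in range(n_iters):
        targets = defaultdict(list)
        for s, a, r, s2, done in transitions:
            # Bootstrapped regression target built from the previous Q estimate
            bootstrap = 0.0 if done else max(Q[(s2, b)] for b in range(n_actions))
            targets[(s, a)].append(r + gamma * bootstrap)
        # "Fit" step: the tabular least-squares fit is the mean target per pair
        Q = defaultdict(float, {k: sum(v) / len(v) for k, v in targets.items()})
    return Q


# Hypothetical batch of transitions on a three-state chain (state 2 is terminal).
batch = [
    (0, 0, 0.0, 0, False),  # stay in state 0
    (0, 1, 0.0, 1, False),  # move from 0 to 1
    (1, 0, 0.0, 0, False),  # move back from 1 to 0
    (1, 1, 1.0, 2, True),   # reach the terminal state, reward 1
]
Q = fitted_q_iteration(batch, n_actions=2, gamma=0.5)
print(dict(Q))
```

In practice the mean would be replaced by a regressor that generalizes across states (the paper's choice of function approximator is not reproduced here), but the target construction is the same.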

agent, deep reinforcement learning framework, storage device, (12 more...)

arXiv.org Artificial Intelligence

2004.0594

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > Massachusetts > Middlesex County > Belmont (0.04)
North America > Trinidad and Tobago > Trinidad > Arima > Arima (0.04)
(3 more...)

Genre: Research Report (0.63)

Industry:

Energy > Power Industry (1.00)
Banking & Finance > Trading (1.00)
Energy > Renewable > Hydroelectric (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.87)

Regret Bounds for Kernel-Based Reinforcement Learning

Domingues, Omar Darwiche, Ménard, Pierre, Pirotta, Matteo, Kaufmann, Emilie, Valko, Michal

arXiv.org Machine Learning, Apr-12-2020

We consider the exploration-exploitation dilemma in finite-horizon reinforcement learning problems whose state-action space is endowed with a metric. We introduce Kernel-UCBVI, a model-based optimistic algorithm that leverages the smoothness of the MDP and a non-parametric kernel estimator of the rewards and transitions to efficiently balance exploration and exploitation. Unlike existing approaches with regret guarantees, it does not use any kind of partitioning of the state-action space. For problems with $K$ episodes and horizon $H$, we provide a regret bound of $O\left( H^3 K^{\max\left(\frac{1}{2}, \frac{2d}{2d+1}\right)}\right)$, where $d$ is the covering dimension of the joint state-action space. We empirically validate Kernel-UCBVI on discrete and continuous MDPs.
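
To see how the exponent on $K$ in the stated bound behaves, here is a small arithmetic sketch that evaluates $\max(1/2, 2d/(2d+1))$ for a few integer covering dimensions. The function name is hypothetical, and restricting $d$ to non-negative integers is an assumption made to keep the arithmetic exact.

```python
from fractions import Fraction


def regret_exponent(d):
    """Exponent of K in the bound H^3 * K^max(1/2, 2d/(2d+1)).

    d is the covering dimension, assumed here to be a non-negative integer.
    """
    return max(Fraction(1, 2), Fraction(2 * d, 2 * d + 1))


for d in (0, 1, 2, 5):
    print(d, regret_exponent(d))
```

For any $d \ge 1$ the second term dominates, and the exponent approaches 1 as $d$ grows, so the bound becomes closer to linear in $K$ in high-dimensional state-action spaces.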

algorithm, artificial intelligence, upstream oil & gas, (20 more...)

arXiv.org Machine Learning

2004.05599

Country:

Europe > Hungary (0.14)
North America > United States (0.14)

Genre: Research Report (0.40)

Industry: Energy > Oil & Gas > Upstream (0.54)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.86)

5 Hacks to speed up your AI Training (Reinforcement Learning with Unity ML-Agents)

#artificialintelligence, Apr-11-2020, 10:14:15 GMT

Easy tips to train your Reinforcement Learning AI with Unity3D using the ML-Agents Framework. My name is Sebastian Schuchmann, an AI enthusiast from Germany, and we are going to cover simple, beginner-friendly ways to improve your Machine Learning process. The algorithm used is PPO (Proximal Policy Optimization), which was developed by OpenAI (co-founded by Elon Musk). After watching this video you will hopefully be able to train an Artificial Intelligence to crack your favorite game. I am very curious about what you guys will create!

ai training, reinforcement learning, unity ml-agent, (1 more...)

#artificialintelligence

Country: Europe > Germany (0.33)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.33)

Artificial Intelligence: Reinforcement Learning in Python

#artificialintelligence, Apr-11-2020, 10:10:42 GMT

Created by Lazy Programmer Inc. When people talk about artificial intelligence, they usually don't mean supervised and unsupervised machine learning. These tasks are pretty trivial compared to what we think of AIs doing: playing chess and Go, driving cars, and beating video games at a superhuman level. Reinforcement learning has recently become popular for doing all of that and more. Much like deep learning, a lot of the theory was discovered in the 70s and 80s, but it hasn't been until recently that we've been able to observe first hand the amazing results that are possible.

artificial intelligence, learning, reinforcement learning, (7 more...)

#artificialintelligence

Genre: Instructional Material > Course Syllabus & Notes (0.33)

Industry:

Leisure & Entertainment > Games (0.94)
Education > Educational Setting > Online (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.39)

Certified Adversarial Robustness for Deep Reinforcement Learning

Everett, Michael, Lutjens, Bjorn, How, Jonathan P.

arXiv.org Machine Learning, Apr-11-2020

Deep Neural Network-based systems are now the state-of-the-art in many robotics tasks, but their application in safety-critical domains remains dangerous without formal guarantees on network robustness. Small perturbations to sensor inputs (from noise or adversarial examples) are often enough to change network-based decisions, which was recently shown to cause an autonomous vehicle to swerve into another lane. In light of these dangers, numerous algorithms have been developed as defensive mechanisms against these adversarial inputs, some of which provide formal robustness guarantees or certificates. This work leverages research on certified adversarial robustness to develop an online certified defense for deep reinforcement learning algorithms. The proposed defense computes guaranteed lower bounds on state-action values during execution to identify and choose a robust action under a worst-case deviation in input space due to possible adversaries or noise. The approach is demonstrated on a Deep Q-Network policy and is shown to increase robustness to noise and adversaries in pedestrian collision avoidance scenarios and a classic control task. This work extends our previous paper with new performance guarantees, expanded results aggregated across more scenarios, an extension into scenarios with adversarial behavior, comparisons with a more computationally expensive method, and visualizations that provide intuition about the robustness algorithm.
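
The idea of choosing the action with the best guaranteed lower bound on its state-action value can be sketched on a much simpler model than the paper's Deep Q-Network. The example below assumes a linear Q-function, Q(s, a) = w_a · s + b_a, and an ℓ∞-bounded perturbation of radius `eps`, for which the worst-case value is exactly w_a · s + b_a − eps · ‖w_a‖₁. The linear model, the function names, and the numbers are all assumptions made for illustration; the paper's neural-network bound computation is not reproduced here.

```python
def q_lower_bounds(weights, biases, state, eps):
    """Worst-case Q-values of a linear Q-function under an l-infinity
    perturbation of the observed state with radius eps."""
    bounds = []
    for w, b in zip(weights, biases):
        nominal = sum(wi * si for wi, si in zip(w, state)) + b
        # Each coordinate can shift by at most eps in the worst direction
        bounds.append(nominal - eps * sum(abs(wi) for wi in w))
    return bounds


def robust_action(weights, biases, state, eps):
    """Pick the action whose guaranteed lower bound is largest."""
    bounds = q_lower_bounds(weights, biases, state, eps)
    return max(range(len(bounds)), key=bounds.__getitem__)


# Hypothetical two-action example: action 0 has the higher nominal value
# but is far more sensitive to the first state coordinate.
weights = [[4.0, 0.0], [0.5, 0.5]]
biases = [0.0, 1.5]
state = [1.0, 1.0]

print(robust_action(weights, biases, state, eps=0.0))  # nominal choice
print(robust_action(weights, biases, state, eps=0.6))  # robust choice
```

With no perturbation budget the nominally best action 0 is chosen; with `eps=0.6` its worst-case value drops below that of the less sensitive action 1, so the robust choice switches.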

artificial intelligence, machine learning, reinforcement learning, (16 more...)

arXiv.org Machine Learning

2004.06496

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
North America > Canada > Ontario > Toronto (0.14)
Europe > Switzerland > Zürich > Zürich (0.14)
(7 more...)

Genre: Personal (0.93)

Industry:

Information Technology > Security & Privacy (0.47)
Transportation (0.35)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)

Meta-Learning in Neural Networks: A Survey

Hospedales, Timothy, Antoniou, Antreas, Micaelli, Paul, Storkey, Amos

arXiv.org Machine Learning, Apr-11-2020

The field of meta-learning, or learning-to-learn, has seen a dramatic rise in interest in recent years. Contrary to conventional approaches to AI where a given task is solved from scratch using a fixed learning algorithm, meta-learning aims to improve the learning algorithm itself, given the experience of multiple learning episodes. This paradigm provides an opportunity to tackle many of the conventional challenges of deep learning, including data and computation bottlenecks, as well as the fundamental issue of generalization. In this survey we describe the contemporary meta-learning landscape. We first discuss definitions of meta-learning and position it with respect to related fields, such as transfer learning, multi-task learning, and hyperparameter optimization. We then propose a new taxonomy that provides a more comprehensive breakdown of the space of meta-learning methods today. We survey promising applications and successes of meta-learning including few-shot learning, reinforcement learning and architecture search. Finally, we discuss outstanding challenges and promising areas for future research.

learning, meta-learning, neurips, (13 more...)

arXiv.org Machine Learning

2004.05439

Country:

North America > United States (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
Asia > Middle East > Jordan (0.04)

Genre:

Overview (1.00)
Instructional Material (0.67)

Industry:

Education (1.00)
Health & Medicine > Therapeutic Area (0.67)
Leisure & Entertainment > Games > Computer Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
(2 more...)

Reinforcement Learning via Reasoning from Demonstration

Torrey, Lisa

arXiv.org Artificial Intelligence, Apr-11-2020

Demonstration is an appealing way for humans to provide assistance to reinforcement-learning agents. Most approaches in this area view demonstrations primarily as sources of behavioral bias. But in sparse-reward tasks, humans seem to treat demonstrations more as sources of causal knowledge. This paper proposes a framework for agents that benefit from demonstration in this human-inspired way. In this framework, agents develop causal models through observation, and reason from this knowledge to decompose tasks for effective reinforcement learning. Experimental results show that a basic implementation of Reasoning from Demonstration (RfD) is effective in a range of sparse-reward tasks.

agent, demonstration, objective, (16 more...)

arXiv.org Artificial Intelligence

2004.05512

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > Panama (0.04)

Genre: Research Report > New Finding (0.34)

Industry:

Leisure & Entertainment > Games (0.69)
Transportation > Passenger (0.49)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Compress Data And Win Hutter Prize Worth Half A Million Euros

#artificialintelligence, Apr-10-2020, 01:17:14 GMT

"Entities should not be multiplied unnecessarily." To incentivize the scientific community to focus on AGI, Marcus Hutter, one of the most prominent researchers of our generation, has increased his decade-old prize tenfold to half a million euros (€500,000). The Hutter Prize, named after Marcus Hutter, is given to those who can successfully create new benchmarks for lossless data compression. The data here is a dataset based on Wikipedia. Marcus Hutter, who now works at DeepMind as a senior research scientist, is famous for his work on reinforcement learning along with Juergen Schmidhuber. Dr Hutter proposed AIXI in 2000, which is a reinforcement learning agent that works in line with Occam's razor and sequential decision theory.

compression, dr hutter, hutter, (13 more...)

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.79)

Reinforcement Learning via Gaussian Processes with Neural Network Dual Kernels

Goumiri, Imène R., Priest, Benjamin W., Schneider, Michael D.

arXiv.org Machine Learning, Apr-10-2020

While deep neural networks (DNNs) and Gaussian Processes (GPs) are both popularly utilized to solve problems in reinforcement learning, both approaches feature undesirable drawbacks for challenging problems. DNNs learn complex nonlinear embeddings, but do not naturally quantify uncertainty and are often data-inefficient to train. GPs infer posterior distributions over functions, but popular kernels exhibit limited expressivity on complex and high-dimensional data. Fortunately, recently discovered conjugate and neural tangent kernel functions encode the behavior of overparameterized neural networks in the kernel domain. We demonstrate that these kernels can be efficiently applied to regression and reinforcement learning problems by analyzing a baseline case study. We apply GPs with neural network dual kernels to solve reinforcement learning tasks for the first time. We demonstrate, using the well-understood mountain-car problem, that GPs empowered with dual kernels perform at least as well as those using the conventional radial basis function kernel. We conjecture that by inheriting the probabilistic rigor of GPs and the powerful embedding properties of DNNs, GPs using NN dual kernels will empower future reinforcement learning models on difficult domains.
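
As background for the baseline the abstract compares against, the following is a minimal sketch of Gaussian process regression with the conventional radial basis function kernel, computing only the posterior mean. It does not implement the neural network dual kernels or the mountain-car experiment; the function names, the noise level, and the toy data are assumptions, and the linear solver is a plain Gaussian elimination kept dependency-free for illustration.

```python
import math


def rbf(x, y, length_scale=1.0):
    """Radial basis function (squared exponential) kernel on scalars."""
    return math.exp(-((x - y) ** 2) / (2.0 * length_scale ** 2))


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def gp_posterior_mean(xs, ys, x_star, noise=1e-6, length_scale=1.0):
    """Posterior mean k_*^T (K + noise * I)^{-1} y of a zero-mean GP."""
    K = [[rbf(a, b, length_scale) + (noise if i == j else 0.0)
          for j, b in enumerate(xs)] for i, a in enumerate(xs)]
    alpha = solve(K, ys)
    return sum(rbf(x_star, a, length_scale) * al for a, al in zip(xs, alpha))


# Hypothetical training data: three noisy-free samples of a bowl-shaped function.
xs = [-1.0, 0.0, 1.0]
ys = [1.0, 0.0, 1.0]
print(gp_posterior_mean(xs, ys, 0.5))
```

Swapping `rbf` for another positive-definite kernel function leaves the rest of the computation unchanged, which is the sense in which a GP can inherit the properties of whatever kernel it is given.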

kernel, neural network, value function, (12 more...)

arXiv.org Machine Learning

2004.05198

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > California > Alameda County > Livermore (0.04)
Europe > Ukraine > Kyiv Oblast > Kyiv (0.04)

Genre: Research Report (0.64)

Industry: Energy (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.89)
