AITopics | Reinforcement Learning

Collaborating Authors

Reinforcement Learning

"Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them."
– Sutton, Richard S. and Andrew G. Barto. Reinforcement Learning: An Introduction. (1.1). MIT Press, Cambridge, MA, 1998.

News Overviews Instructional Materials AI-Alerts Classics

Python Machine Learning - Third Edition - Free PDF Download

#artificialintelligenceApr-15-2020, 08:30:41 GMT

Revised and expanded for TensorFlow 2, GANs, and reinforcement learning. Python Machine Learning, Third Edition is a comprehensive guide to machine learning and deep learning with Python. It acts as both a step-by-step tutorial, and a reference you'll keep coming back to as you build your machine learning systems. Packed with clear explanations, visualizations, and working examples, the book covers all the essential machine learning techniques in depth. While some books teach you only to follow instructions, with this machine learning book, Raschka and Mirjalili teach the principles behind machine learning, allowing you to build models and applications for yourself.

free pdf download, python machine learning, tensorflow 2, (1 more...)

#artificialintelligence

Genre:

Collection > Book (0.46)
Instructional Material (0.42)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.37)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.32)

Add feedback

Order Matters: Generating Progressive Explanations for Planning Tasks in Human-Robot Teaming

Zakershahrak, Mehrdad, Marpally, Shashank Rao, Sharma, Akshay, Gong, Ze, Zhang, Yu

arXiv.org Artificial IntelligenceApr-15-2020

Prior work on generating explanations has been focused on providing the rationale behind the robot's decision making. While these approaches provide the right explanations from the explainer's perspective, they fail to heed the cognitive requirement of understanding an explanation from the explainee's perspective. In this work, we set out to address this issue from a planning context by considering the order of information provided in an explanation, which is referred to as the progressiveness of explanations. Progressive explanations contribute to a better understanding by minimizing the cumulative cognitive effort required for understanding all the information in an explanation. As a result, such explanations are easier to understand. Given the sequential nature of communicating information, a general formulation based on goal-based Markov Decision Processes for generating progressive explanation is presented. The reward function of this MDP is learned via inverse reinforcement learning based on explanations that are provided by human subjects. Our method is evaluated in an escape-room domain. The results show that our progressive explanation generation method reduces the cognitive load over two baselines.

explanation, feature change, robot, (15 more...)

arXiv.org Artificial Intelligence

2004.07822

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
North America > United States > Arizona (0.04)
Europe > United Kingdom > Wales (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report > New Finding (0.34)

Industry: Government > Regional Government > North America Government > United States Government (0.69)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.87)
Information Technology > Artificial Intelligence > Natural Language > Explanation & Argumentation (0.68)

Add feedback

Reinforcement Learning in a Physics-Inspired Semi-Markov Environment

Bellinger, Colin, Coles, Rory, Crowley, Mark, Tamblyn, Isaac

arXiv.org Artificial IntelligenceApr-15-2020

Reinforcement learning (RL) has been demonstrated to have great potential in many applications of scientific discovery and design. Recent work includes, for example, the design of new structures and compositions of molecules for therapeutic drugs. Much of the existing work related to the application of RL to scientific domains, however, assumes that the available state representation obeys the Markov property. For reasons associated with time, cost, sensor accuracy, and gaps in scientific knowledge, many scientific design and discovery problems do not satisfy the Markov property. Thus, something other than a Markov decision process (MDP) should be used to plan / find the optimal policy. In this paper, we present a physics-inspired semi-Markov RL environment, namely the phase change environment. In addition, we evaluate the performance of value-based RL algorithms for both MDPs and partially observable MDPs (POMDPs) on the proposed environment. Our results demonstrate deep recurrent Q-networks (DRQN) significantly outperform deep Q-networks (DQN), and that DRQNs benefit from training with hindsight experience replay. Implications for the use of semi-Markovian RL and POMDPs for scientific laboratories are also discussed.

agent, change environment, phase change environment, (15 more...)

arXiv.org Artificial Intelligence

2004.07333

Country:

North America > Canada > Ontario > National Capital Region > Ottawa (0.14)
North America > United States > District of Columbia > Washington (0.04)
North America > Canada > Ontario > Waterloo Region > Waterloo (0.04)
(2 more...)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.76)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Add feedback

BabyAI++: Towards Grounded-Language Learning beyond Memorization

Cao, Tianshi, Wang, Jingkang, Zhang, Yining, Manivasagam, Sivabalan

arXiv.org Artificial IntelligenceApr-15-2020

Despite success in many real-world tasks (e.g., robotics), reinforcement learning (RL) agents still learn from tabula rasa when facing new and dynamic scenarios. By contrast, humans can offload this burden through textual descriptions. Although recent works have shown the benefits of instructive texts in goal-conditioned RL, few have studied whether descriptive texts help agents to generalize across dynamic environments. To promote research in this direction, we introduce a new platform, BabyAI++, to generate various dynamic environments along with corresponding descriptive texts. Moreover, we benchmark several baselines inherited from the instruction following setting and develop a novel approach towards visually-grounded language learning on our platform. Extensive experiments show strong evidence that using descriptive texts improves the generalization of RL agents across environments with varied dynamics.

agent, descriptive text, image text, (17 more...)

arXiv.org Artificial Intelligence

2004.072

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > United States > California > Los Angeles County > Long Beach (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)

Genre: Research Report (1.00)

Industry: Education > Curriculum > Subject-Specific Education (0.60)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning > Rote Learning (0.40)

Add feedback

ActionSpotter: Deep Reinforcement Learning Framework for Temporal Action Spotting in Videos

Vaudaux-Ruth, Guillaume, Chan-Hon-Tong, Adrien, Achard, Catherine

arXiv.org Artificial IntelligenceApr-15-2020

Summarizing video content is an important task in many applications. This task can be defined as the computation of the ordered list of actions present in a video. Such a list could be extracted using action detection algorithms. However, it is not necessary to determine the temporal boundaries of actions to know their existence. Moreover, localizing precise boundaries usually requires dense video analysis to be effective. In this work, we propose to directly compute this ordered list by sparsely browsing the video and selecting one frame per action instance, task known as action spotting in literature. To do this, we propose ActionSpotter, a spotting algorithm that takes advantage of Deep Reinforcement Learning to efficiently spot actions while adapting its video browsing speed, without additional supervision. Experiments performed on datasets THUMOS14 and ActivityNet show that our framework outperforms state of the art detection methods. In particular, the spotting mean Average Precision on THUMOS14 is significantly improved from 59.7% to 65.6% while skipping 23% of video.

actionspotter, spot frame, video, (13 more...)

arXiv.org Artificial Intelligence

2004.06971

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > France > Île-de-France > Paris > Paris (0.04)

Genre: Research Report (0.41)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Adversarial Evaluation of Autonomous Vehicles in Lane-Change Scenarios

Chen, Baiming, Li, Liang

arXiv.org Machine LearningApr-14-2020

Autonomous vehicles must be comprehensively evaluated before deployed in cities and highways. Current evaluation procedures lack the abilities of weakness-aiming and evolving, thus they could hardly generate adversarial environments for autonomous vehicles, leading to insufficient challenges. To overcome the shortage of static evaluation methods, this paper proposes a novel method to generate adversarial environments with deep reinforcement learning, and to cluster them with a nonparametric Bayesian method. As a representative task of autonomous driving, lane-change is used to demonstrate the superiority of the proposed method. First, two lane-change models are separately developed by a rule-based method and a learning-based method, waiting for evaluation and comparison. Next, adversarial environments are generated by training surrounding interactive vehicles with deep reinforcement learning for local optimal ensembles. Then, a nonparametric Bayesian approach is utilized to cluster the adversarial policies of the interactive vehicles. Finally, the adversarial environment patterns are illustrated and the performances of two lane-change models are evaluated and compared. The simulation results indicate that both models perform significantly worse in adversarial environments than in naturalistic environments, with plenty of weaknesses successfully extracted in a few tests.

ego vehicle, scenario, vehicle, (16 more...)

arXiv.org Machine Learning

2004.06531

Country:

Asia > China > Beijing > Beijing (0.04)
North America > United States > Michigan > Washtenaw County > Ann Arbor (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > Promising Solution (0.34)

Industry:

Transportation > Ground > Road (1.00)
Government (1.00)
Automobiles & Trucks (0.89)

Technology:

Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.54)

Add feedback

Sequential Batch Learning in Finite-Action Linear Contextual Bandits

Han, Yanjun, Zhou, Zhengqing, Zhou, Zhengyuan, Blanchet, Jose, Glynn, Peter W., Ye, Yinyu

arXiv.org Machine LearningApr-14-2020

We study the sequential batch learning problem in linear contextual bandits with finite action sets, where the decision maker is constrained to split incoming individuals into (at most) a fixed number of batches and can only observe outcomes for the individuals within a batch at the batch's end. Compared to both standard online contextual bandits learning or offline policy learning in contexutal bandits, this sequential batch learning problem provides a finer-grained formulation of many personalized sequential decision making problems in practical applications, including medical treatment in clinical trials, product recommendation in e-commerce and adaptive experiment design in crowdsourcing. We study two settings of the problem: one where the contexts are arbitrarily generated and the other where the contexts are \textit{iid} drawn from some distribution. In each setting, we establish a regret lower bound and provide an algorithm, whose regret upper bound nearly matches the lower bound. As an important insight revealed therefrom, in the former setting, we show that the number of batches required to achieve the fully online performance is polynomial in the time horizon, while for the latter setting, a pure-exploitation algorithm with a judicious batch partition scheme achieves the fully online performance even when the number of batches is less than logarithmic in the time horizon. Together, our results provide a near-complete characterization of sequential decision making in linear contextual bandits when batch constraints are present.

algorithm, batch, sequential batch, (13 more...)

arXiv.org Machine Learning

2004.06321

Country:

North America > United States > California > Santa Clara County > Palo Alto (0.04)
North America > United States > New York (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre:

Research Report > New Finding (0.54)
Research Report > Experimental Study (0.34)

Industry:

Information Technology > Services (0.48)
Education > Focused Education > Special Education (0.44)
Health & Medicine > Pharmaceuticals & Biotechnology (0.34)

Technology:

Information Technology > Data Science > Data Mining > Big Data (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.68)

Add feedback

A non-cooperative meta-modeling game for automated third-party calibrating, validating, and falsifying constitutive laws with parallelized adversarial attacks

Wang, Kun, Sun, WaiChing, Du, Qiang

arXiv.org Artificial IntelligenceApr-13-2020

The evaluation of constitutive models, especially for high-risk and high-regret engineering applications, requires efficient and rigorous third-party calibration, validation and falsification. While there are numerous efforts to develop paradigms and standard procedures to validate models, difficulties may arise due to the sequential, manual and often biased nature of the commonly adopted calibration and validation processes, thus slowing down data collections, hampering the progress towards discovering new physics, increasing expenses and possibly leading to misinterpretations of the credibility and application ranges of proposed models. This work attempts to introduce concepts from game theory and machine learning techniques to overcome many of these existing difficulties. We introduce an automated meta-modeling game where two competing AI agents systematically generate experimental data to calibrate a given constitutive model and to explore its weakness, in order to improve experiment design and model robustness through competition. The two agents automatically search for the Nash equilibrium of the meta-modeling game in an adversarial reinforcement learning framework without human intervention. By capturing all possible design options of the laboratory experiments into a single decision tree, we recast the design of experiments as a game of combinatorial moves that can be resolved through deep reinforcement learning by the two competing players. Our adversarial framework emulates idealized scientific collaborations and competitions among researchers to achieve a better understanding of the application range of the learned material laws and prevent misinterpretations caused by conventional AI-based third-party validation.

computer game, game score, upstream oil & gas, (24 more...)

arXiv.org Artificial Intelligence

doi: 10.1016/j.cma.2020.113514

2004.09392

Country: North America > United States > Wisconsin (0.14)

Genre: Research Report (1.00)

Industry:

Government > Military (1.00)
Energy > Oil & Gas > Upstream (1.00)
Leisure & Entertainment > Games > Computer Games (0.68)
Government > Regional Government > North America Government > United States Government (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

A Demonstration of Issues with Value-Based Multiobjective Reinforcement Learning Under Stochastic State Transitions

Vamplew, Peter, Foale, Cameron, Dazeley, Richard

arXiv.org Machine LearningApr-13-2020

We report a previously unidentified issue with model-free, value-based approaches to multiobjective reinforcement learning in the context of environments with stochastic state transitions. An example multiobjective Markov Decision Process (MOMDP) is used to demonstrate that under such conditions these approaches may be unable to discover the policy which maximises the Scalarised Expected Return, and in fact may converge to a Pareto-dominated solution. We discuss several alternative methods which may be more suitable for maximising SER in MOMDPs with stochastic transitions.

agent, momdp, reinforcement, (11 more...)

arXiv.org Machine Learning

2004.06277

Country:

Oceania > Australia > Victoria (0.05)
North America > United States > Oklahoma > Payne County > Cushing (0.04)
North America > United States > Arizona (0.04)
(2 more...)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Distributed Learning: Sequential Decision Making in Resource-Constrained Environments

Madhushani, Udari, Leonard, Naomi Ehrich

arXiv.org Machine LearningApr-13-2020

We study cost-effective communication strategies that can be used to improve the performance of distributed learning systems in resource-constrained environments. For distributed learning in sequential decision making, we propose a new cost-effective partial communication protocol. We illustrate that with this protocol the group obtains the same order of performance that it obtains with full communication. Moreover, we prove that under the proposed partial communication protocol the communication cost is $O(\log T)$, where $T$ is the time horizon of the decision-making process. This improves significantly on protocols with full communication, which incur a communication cost that is $O(T)$. We validate our theoretical results using numerical simulations.

agent, communication, communication cost, (15 more...)

arXiv.org Machine Learning

2004.06171

Country:

North America > United States (0.04)
Africa > South Sudan > Equatoria > Central Equatoria > Juba (0.04)

Genre: Research Report (0.50)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.68)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.71)

Add feedback