AITopics | Reinforcement Learning

Reinforcement Learning

"Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them."
– Sutton, Richard S. and Andrew G. Barto. Reinforcement Learning: An Introduction. (1.1). MIT Press, Cambridge, MA, 1998.
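
The trial-and-error idea in the quote can be illustrated with a small sketch. The multi-armed bandit setting, the epsilon-greedy rule, and all names and parameter values below are assumptions chosen for illustration; they are not taken from the quoted text.

```python
import random

def run_bandit(true_means, steps=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy trial-and-error on a hypothetical multi-armed bandit."""
    rng = random.Random(seed)
    n_actions = len(true_means)
    estimates = [0.0] * n_actions   # running estimate of each action's mean reward
    counts = [0] * n_actions        # how many times each action has been tried

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try a random action to discover its reward.
            action = rng.randrange(n_actions)
        else:
            # Exploit: take the action that currently looks best.
            action = max(range(n_actions), key=lambda a: estimates[a])
        # The learner only observes a noisy numerical reward, never the "right" action.
        reward = rng.gauss(true_means[action], 1.0)
        counts[action] += 1
        # Incremental update of the sample mean for the chosen action.
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates, counts

estimates, counts = run_bandit([0.2, 0.5, 1.0])
best_action = max(range(len(counts)), key=lambda a: counts[a])
print("most frequently chosen action:", best_action)
```

The learner is never told which action is best; it shifts toward the highest-reward action only because its reward estimates improve as it tries each one.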

Stock Market News NSDQ, NYSE, and AMEX Stock Market News, Market News Categories, Market Indicators

#artificialintelligence, Sep-23-2019, 15:43:47 GMT

VBTransform 2019, billed as the AI event of the year, was a large AI conference held in San Francisco, CA by VentureBeat. Accenture Chief Data Scientist Dr. Ganapathi Pulipaka joined 900 AI executives and practitioners (director level and above, including the C-suite) from innovative brands, and spoke alongside 120 other speakers from disruptive emerging companies, who presented more than 48 sessions. Exhibitors and speakers from top-tier brands such as Accenture, Google, Verizon, IBM, Amazon, Cisco, Oracle, New York University, Microsoft, Uber, DataRobot, Intel, eBay, Johnson & Johnson, GE, Gap, Lyft, Etsy, Kohl's, the New York Times, and many more showcased their AI products, presented real business results and practical lessons from their production deployments, and took the audience through the disruptive AI technologies to keep an eye on. The sessions focused on six AI trends, spanning natural language processing, smart speech, computer vision, business AI integration, implementing AI across the organization, IoT and AI at the edge, and intelligent RPA and automation. Reinforcement learning has been disruptive, and the history of AI has shown that it took the gaming industry by storm.

amex stock market news, reinforcement learning, stock market news nsdq, (6 more...)

Country:

North America > United States > California > San Francisco County > San Francisco (0.56)
North America > United States > New York (0.25)

Industry:

Information Technology (1.00)
Banking & Finance > Trading (1.00)
Aerospace & Defense (0.74)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.86)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.52)

r/MachineLearning - [P] SOTA Atari learning with Recurrent IQN

#artificialintelligence, Sep-23-2019, 10:28:26 GMT

I've recently implemented a recurrent version of the IQN reinforcement learning algorithm, combining IQN/Rainbow/R2D2 features, which can reach state-of-the-art results (in sample efficiency) on the Atari benchmark. Any feedback is more than welcome.

machinelearning, recurrent iqn, sota atari

Industry: Media > News (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.96)

Sensor-Augmented Neural Adaptive Bitrate Video Streaming on UAVs

Xiao, Xuedou, Wang, Wei, Chen, Taobin, Cao, Yang, Jiang, Tao, Zhang, Qian

arXiv.org Machine Learning, Sep-23-2019

Recent advances in unmanned aerial vehicle (UAV) technology have revolutionized a broad class of civil and military applications. However, the design of wireless technologies that enable real-time streaming of high-definition video between UAVs and ground clients presents a conundrum. Most existing adaptive bitrate (ABR) algorithms are not optimized for the air-to-ground links, which usually fluctuate dramatically due to the dynamic flight states of the UAV. In this paper, we present SA-ABR, a new sensor-augmented system that generates ABR video streaming algorithms with the assistance of various kinds of inherent sensor data that are used to pilot UAVs. By incorporating the inherent sensor data with network observations, SA-ABR trains a deep reinforcement learning (DRL) model to extract salient features from the flight state information and automatically learn an ABR algorithm to adapt to the varying UAV channel capacity through the training process. SA-ABR does not rely on any assumptions or models about the UAV's flight states or the environment; instead, it makes decisions by exploiting temporal properties of past throughput through a long short-term memory (LSTM) network to adapt itself to a wide range of highly dynamic environments. We have implemented SA-ABR in a commercial UAV and evaluated it in the wild. We compare SA-ABR with a variety of existing state-of-the-art ABR algorithms, and the results show that our system outperforms the best-known existing ABR algorithm by 21.4% in terms of the average quality of experience (QoE) reward.

algorithm, sensor data, throughput, (16 more...)

arXiv.org Machine Learning

1909.10914

Country:

Asia > China > Hong Kong (0.04)
Asia > China > Hubei Province > Wuhan (0.04)

Genre: Research Report > New Finding (0.88)

Industry: Information Technology (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Integrating independent and centralized multi-agent reinforcement learning for traffic signal network optimization

Zhang, Zhi, Yang, Jiachen, Zha, Hongyuan

arXiv.org Machine Learning, Sep-23-2019

Traffic congestion in metropolitan areas is a world-wide problem that can be ameliorated by traffic lights that respond dynamically to real-time conditions. Recent studies applying deep reinforcement learning (RL) to optimize single traffic lights have shown significant improvement over conventional control. However, optimizing global traffic conditions over a large road network is fundamentally a cooperative multi-agent control problem, for which single-agent RL is not suitable due to environment non-stationarity and the infeasibility of optimizing over an exponential joint-action space. Motivated by these challenges, we propose QCOMBO, a simple yet effective multi-agent reinforcement learning (MARL) algorithm that combines the advantages of independent and centralized learning. We ensure scalability by selecting actions from individually optimized utility functions, which are shaped to maximize global performance via a novel consistency regularization loss between individual utility and a global action-value function. Experiments on diverse road topologies and traffic flow conditions in the SUMO traffic simulator show competitive performance of QCOMBO versus recent state-of-the-art MARL algorithms. We further show that policies trained on small sub-networks can effectively generalize to larger networks under different traffic flow conditions, providing empirical evidence for the suitability of MARL for intelligent traffic control.
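
The abstract above does not spell out the consistency regularization loss, so the following is only a minimal sketch of one plausible form: a squared difference between a global action-value and a weighted combination of the individual utilities at the chosen actions. The function names, the uniform weights, and the numbers are all hypothetical and should not be read as the paper's exact formulation.

```python
def greedy_joint_action(utilities):
    """Each agent picks the argmax of its own utility vector.

    This costs N * |A| comparisons instead of searching |A| ** N joint actions.
    """
    return [max(range(len(u)), key=lambda a: u[a]) for u in utilities]

def consistency_loss(global_q, individual_qs, weights=None):
    """Assumed regularizer: 0.5 * (Q_global - sum_n k_n * q_n) ** 2."""
    n = len(individual_qs)
    if weights is None:
        weights = [1.0 / n] * n   # assumption: uniform weighting across agents
    combined = sum(w * q for w, q in zip(weights, individual_qs))
    return 0.5 * (global_q - combined) ** 2

# Hypothetical utilities for three intersections with two signal phases each.
utilities = [[0.1, 0.7], [0.4, 0.2], [0.3, 0.9]]
joint_action = greedy_joint_action(utilities)
chosen = [u[a] for u, a in zip(utilities, joint_action)]

# Hypothetical output of a centralized critic for the same state and joint action.
global_q = 1.0
loss = consistency_loss(global_q, chosen)
print(joint_action, round(loss, 5))
```

In a trained system both terms would come from neural networks and the loss would be minimized by gradient descent; here plain numbers stand in for those outputs.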

agent, algorithm, traffic light, (15 more...)

arXiv.org Machine Learning

1909.10651

Country:

North America > Canada > Ontario > Toronto (0.04)
Asia > Singapore (0.04)

Genre: Research Report (1.00)

Industry:

Transportation > Infrastructure & Services (1.00)
Transportation > Ground > Road (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.48)

Loaded DiCE: Trading off Bias and Variance in Any-Order Score Function Estimators for Reinforcement Learning

Farquhar, Gregory, Whiteson, Shimon, Foerster, Jakob

arXiv.org Machine Learning, Sep-23-2019

Gradient-based methods for optimisation of objectives in stochastic settings with unknown or intractable dynamics require estimators of derivatives. We derive an objective that, under automatic differentiation, produces low-variance unbiased estimators of derivatives at any order. Our objective is compatible with arbitrary advantage estimators, which allows the control of the bias and variance of any-order derivatives when using function approximation. Furthermore, we propose a method to trade off bias and variance of higher order derivatives by discounting the impact of more distant causal dependencies. We demonstrate the correctness and utility of our objective in analytically tractable MDPs and in meta-reinforcement-learning for continuous control.
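
As background for the abstract above, the score-function identity and the MagicBox operator of the earlier DiCE estimator (on which the title builds) can be written as follows. This is a sketch of the prior construction only; the Loaded DiCE objective itself is not given in the abstract and is not reproduced here. The symbol \bot denotes a stop-gradient operation.

```latex
% Score-function (likelihood-ratio) identity for a cost f that does not depend on \theta
\nabla_\theta \, \mathbb{E}_{x \sim p(x;\theta)}\big[f(x)\big]
  = \mathbb{E}_{x \sim p(x;\theta)}\big[f(x)\, \nabla_\theta \log p(x;\theta)\big]

% MagicBox operator over a set \mathcal{W} of stochastic nodes
\Box(\mathcal{W}) = \exp\big(\tau - \bot(\tau)\big),
\qquad \tau = \sum_{w \in \mathcal{W}} \log p(w;\theta)

% It evaluates to 1 in the forward pass but differentiates like the likelihood
\nabla_\theta \Box(\mathcal{W})
  = \Box(\mathcal{W}) \sum_{w \in \mathcal{W}} \nabla_\theta \log p(w;\theta)
```

Because the operator reproduces itself under differentiation, repeatedly differentiating an objective built from it with automatic differentiation yields score-function estimators of higher-order derivatives.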

estimator, objective, variance, (15 more...)

arXiv.org Machine Learning

1909.10549

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
North America > Canada (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.73)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.49)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.48)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.47)

Active Learning for Risk-Sensitive Inverse Reinforcement Learning

Chen, Rui, Wang, Wenshuo, Zhao, Zirui, Zhao, Ding

arXiv.org Machine Learning, Sep-23-2019

One typical assumption in inverse reinforcement learning (IRL) is that human experts act to optimize the expected utility of a stochastic cost with a fixed distribution. Risk-sensitive inverse reinforcement learning (RS-IRL) relaxes this assumption by assuming that humans act according to a random cost with respect to a set of subjectively distorted distributions instead of a fixed one. This assumption provides additional flexibility to model human risk preferences, represented by a risk envelope, in safety-critical tasks. However, like other learning-from-demonstration techniques, RS-IRL can suffer from inefficient learning due to redundant demonstrations. Inspired by the concept of active learning, this research derives a probabilistic disturbance sampling scheme that enables an RS-IRL agent to query expert support that is likely to expose unrevealed boundaries of the expert's risk envelope. Experimental results confirm that our approach accelerates the convergence of RS-IRL algorithms with lower variance while still guaranteeing unbiased convergence.

Inverse reinforcement learning (IRL) provides a novel framework for recovering cost functions utilized in human decision making [1]-[6]. The original IRL algorithms [1], [2] are formulated as linear programs constrained by optimality conditions [7]. More recent advancements in IRL include the guided cost learning algorithm [10], which combines MaxEnt IRL and deep learning techniques. The flexibility of the IRL framework has prompted its application to a variety of tasks such as autonomous helicopter aerobatics [11] and robot locomotion [12].

constraint, demonstration, disturbance, (10 more...)

arXiv.org Machine Learning

1909.07843

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)
North America > United States > Michigan > Washtenaw County > Ann Arbor (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)
(3 more...)

Genre: Research Report (0.70)

Industry: Transportation (0.88)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.54)

Modular Deep Reinforcement Learning with Temporal Logic Specifications

Yuan, Lim Zun, Hasanbeig, Mohammadhosein, Abate, Alessandro, Kroening, Daniel

arXiv.org Artificial Intelligence, Sep-23-2019

We propose an actor-critic, model-free, and online Reinforcement Learning (RL) framework for continuous-state continuous-action Markov Decision Processes (MDPs) when the reward is highly sparse but encompasses a high-level temporal structure. We represent this temporal structure by a finite-state machine and construct an on-the-fly synchronised product of the MDP and the finite-state machine. The temporal structure acts as a guide for the RL agent within the product, where a modular Deep Deterministic Policy Gradient (DDPG) architecture is proposed to generate a low-level control policy. We evaluate our framework in a Mars rover experiment and we present the success rate of the synthesised policy.
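
The product construction described above can be sketched on a toy problem. Everything below is a hypothetical, discrete stand-in (a five-cell corridor and a three-state machine for "visit a, then visit b"); the paper itself targets continuous states and actions with DDPG, which this sketch does not attempt to reproduce.

```python
# Hypothetical finite-state machine for the task "visit 'a', then visit 'b'".
# States: 0 = nothing seen yet, 1 = 'a' seen, 2 = accepting ('a' then 'b').
FSM_TRANSITIONS = {
    (0, "a"): 1,
    (1, "b"): 2,
}
ACCEPTING = {2}

def fsm_step(q, label):
    """Advance the machine; unlisted (state, label) pairs leave it unchanged."""
    return FSM_TRANSITIONS.get((q, label), q)

def label_of(position):
    """Hypothetical labelling function mapping environment states to symbols."""
    if position <= 0:
        return "a"
    if position >= 4:
        return "b"
    return None

def product_step(state, action):
    """One step of the on-the-fly product: move in the MDP, then in the machine."""
    position, q = state
    position = max(0, min(4, position + action))   # toy deterministic dynamics
    q_next = fsm_step(q, label_of(position))
    # Sparse reward: paid only on the transition that first reaches acceptance.
    reward = 1.0 if q_next in ACCEPTING and q not in ACCEPTING else 0.0
    return (position, q_next), reward

state = (2, 0)
trace = []
for action in [1, 1, -1, -1, -1, -1, 1, 1, 1, 1]:
    state, reward = product_step(state, action)
    trace.append((state, reward))

total_reward = sum(r for _, r in trace)
print(state, total_reward)
```

Reaching cell 4 first earns nothing because the machine is still waiting for 'a'. The machine component of the product state is what lets the agent tell the two visits to the same cell apart.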

agent, algorithm, automaton state, (13 more...)

arXiv.org Artificial Intelligence

1909.11591

Country:

North America > United States > Arizona (0.04)
North America > United States > Massachusetts > Hampshire County > Amherst (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.48)

Why Does Hierarchy (Sometimes) Work So Well in Reinforcement Learning?

Nachum, Ofir, Tang, Haoran, Lu, Xingyu, Gu, Shixiang, Lee, Honglak, Levine, Sergey

arXiv.org Artificial Intelligence, Sep-23-2019

Hierarchical reinforcement learning has demonstrated significant success at solving difficult reinforcement learning (RL) tasks. Previous works have motivated the use of hierarchy by appealing to a number of intuitive benefits, including learning over temporally extended transitions, exploring over temporally extended periods, and training and exploring in a more semantically meaningful action space, among others. However, in fully observed, Markovian settings, it is not immediately clear why hierarchical RL should provide benefits over standard "shallow" RL architectures. In this work, we isolate and evaluate the claimed benefits of hierarchical RL on a suite of tasks encompassing locomotion, navigation, and manipulation. Surprisingly, we find that most of the observed benefits of hierarchy can be attributed to improved exploration, as opposed to easier policy learning or imposed hierarchical structures. Given this insight, we present exploration techniques inspired by hierarchy that achieve performance competitive with hierarchical RL while at the same time being much simpler to use and implement.

agent, exploration, hierarchy, (14 more...)

arXiv.org Artificial Intelligence

1909.10618

Country: North America > United States > Massachusetts > Hampshire County > Amherst (0.04)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Improving Generative Visual Dialog by Answering Diverse Questions

Murahari, Vishvak, Chattopadhyay, Prithvijit, Batra, Dhruv, Parikh, Devi, Das, Abhishek

arXiv.org Artificial Intelligence, Sep-23-2019

Prior work on training generative Visual Dialog models with reinforcement learning (Das et al.) has explored a Qbot-Abot image-guessing game and shown that this 'self-talk' approach can lead to improved performance at the downstream dialog-conditioned image-guessing task. However, this improvement saturates and starts degrading after a few rounds of interaction, and does not lead to a better Visual Dialog model. We find that this is due in part to repeated interactions between Qbot and Abot during self-talk, which are not informative with respect to the image. To improve this, we devise a simple auxiliary objective that incentivizes Qbot to ask diverse questions, thus reducing repetitions and in turn enabling Abot to explore a larger state space during RL, i.e., be exposed to more visual concepts to talk about and varied questions to answer. We evaluate our approach via a host of automatic metrics and human studies, and demonstrate that it leads to better dialog, i.e., dialog that is more diverse (i.e., less repetitive), consistent (i.e., has fewer conflicting exchanges), fluent (i.e., more human-like), and detailed, while still being comparably image-relevant as prior work and ablations.
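
The abstract does not define the auxiliary objective, so the sketch below shows only one hypothetical way to penalize repetition: summing the cosine similarity between embeddings of consecutive dialog rounds. The embeddings, the choice of cosine similarity, and the function names are assumptions for illustration, not the authors' formulation.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def repetition_penalty(embeddings):
    """Assumed auxiliary loss: total similarity between consecutive rounds."""
    return sum(cosine(a, b) for a, b in zip(embeddings, embeddings[1:]))

# Hypothetical 2-D embeddings of the questions asked over three rounds.
repetitive = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]
diverse = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]

penalty_repetitive = repetition_penalty(repetitive)
penalty_diverse = repetition_penalty(diverse)
print(penalty_repetitive, penalty_diverse)
```

Adding such a term to the training loss would push the questioner away from asking near-identical questions in successive rounds, which is the effect the abstract describes.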

a-bot, dialog, q-bot, (17 more...)

arXiv.org Artificial Intelligence

1909.1047

Country: North America > United States (0.28)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.48)

Robot Navigation in Crowds by Graph Convolutional Networks with Attention Learned from Human Gaze

Chen, Yuying, Liu, Congcong, Liu, Ming, Shi, Bertram E.

arXiv.org Artificial Intelligence, Sep-23-2019

Safe and efficient crowd navigation for mobile robots is a crucial yet challenging task. Previous work has shown the power of deep reinforcement learning frameworks to train efficient policies. However, their performance deteriorates when the crowd size grows. We suggest that this can be addressed by enabling the network to identify and pay attention to the humans in the crowd that are most critical to navigation. We propose a novel network utilizing a graph representation to learn the policy. We first train a graph convolutional network based on human gaze data that accurately predicts human attention to different agents in the crowd. Then we incorporate the learned attention into a graph-based reinforcement learning architecture. The proposed attention mechanism enables the assignment of meaningful weightings to the neighbors of the robot, and has the additional benefit of interpretability. Experiments on real-world dense pedestrian datasets with various crowd sizes demonstrate that our model outperforms state-of-the-art methods by 18.4% in task accomplishment and by 16.4% in time efficiency.
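
The idea of assigning weightings to the robot's neighbors can be sketched as a softmax over attention scores followed by a weighted sum of neighbor features. This is a generic attention-pooling step under assumed inputs, not the paper's graph convolutional architecture; the feature vectors and scores below are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax: non-negative weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def aggregate_neighbors(neighbor_features, attention_scores):
    """Weight each neighbor's feature vector by its attention and sum them."""
    weights = softmax(attention_scores)
    dim = len(neighbor_features[0])
    pooled = [
        sum(w * f[i] for w, f in zip(weights, neighbor_features))
        for i in range(dim)
    ]
    return weights, pooled

# Hypothetical features for three pedestrians near the robot.
features = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]

# The first pedestrian is scored as most critical to navigation.
weights, pooled = aggregate_neighbors(features, [2.0, 0.0, 0.0])

# With equal scores the pooling reduces to a plain average.
uniform_weights, uniform_pooled = aggregate_neighbors(features, [0.0, 0.0, 0.0])
print(weights, pooled)
```

The weights are explicit per-neighbor numbers, so they can be inspected directly, which is the interpretability benefit the abstract mentions.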

adjacency matrix, attention weight, robot, (15 more...)

arXiv.org Artificial Intelligence

1909.104

Country: Asia > China > Hong Kong (0.04)

Genre: Research Report (0.64)

Industry: Leisure & Entertainment > Games > Computer Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
