Reinforcement Learning
Intercon World Keynote Dr. Ganapathi Pulipaka Receives a Top 50 Technology Leader Award for His Contributions to AI, Machine Learning, Mathematics, and Data Science
At the Intercon conference, Dr. GP gave a motivational keynote speech on deep reinforcement learning and the landscape of machine learning and artificial intelligence. He noted that MIT Technology Review downloaded the 16,625 publicly available research papers in the computer science and artificial intelligence section of arXiv through November 2018. Natural language processing techniques were applied to the abstracts, and the frequency of the words "constraint," "theory," "rule," "logic," "program," "learning," "network," "data," "task," and "performance" was evaluated to trace the recent boom in reinforcement learning. Dr. GP said the trends show the rise of traditional neural networks in the 1950s and 1960s, symbolic approaches in the 1970s, knowledge-based and rule-based systems in the 1980s, support vector machines in the 1990s, and the reign of neural networks in the 2010s with the heavy use of deep neural networks. Deep Traffic is a reinforcement learning simulation based on the 24,000 entries received in MIT's Deep Traffic competition, in which self-driving cars drive on a multi-lane freeway using a model-free, off-policy reinforcement learning process. The competition has inspired many data scientists and machine learning enthusiasts to evaluate Deep Q-Learning network variants and hyperparameter configurations, with episodic training amounting to 96.6 years of RL simulations and 572.2 million crowdsourced and optimized DQN hyperparameters used to train the agents successfully.
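The "model-free off-policy" learning mentioned above can be illustrated with a minimal tabular Q-learning sketch. This is not the Deep Traffic implementation: a DQN replaces the table with a neural network, and the state names, action names, and the `alpha`, `gamma`, and `epsilon` values here are assumptions chosen only for illustration.

```python
import random
from collections import defaultdict


def q_learning_update(q, state, action, reward, next_state, actions,
                      alpha=0.1, gamma=0.9):
    """Apply one off-policy Q-learning step and return the new Q-value.

    The target bootstraps from the greedy action in next_state, regardless
    of which action the behaviour policy actually takes next. That is what
    makes the update off-policy.
    """
    best_next = max(q[(next_state, a)] for a in actions)
    target = reward + gamma * best_next
    q[(state, action)] += alpha * (target - q[(state, action)])
    return q[(state, action)]


def epsilon_greedy(q, state, actions, epsilon, rng):
    """Behaviour policy: explore with probability epsilon, else act greedily."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q[(state, a)])


# Hypothetical lane-driving actions and states, for illustration only.
ACTIONS = ["accelerate", "brake", "left", "right"]
q_table = defaultdict(float)  # unseen (state, action) pairs start at 0.0
rng = random.Random(0)

action = epsilon_greedy(q_table, "lane_2_clear", ACTIONS, epsilon=0.1, rng=rng)
q_learning_update(q_table, "lane_2_clear", action, 1.0, "lane_2_blocked", ACTIONS)
```

No environment model appears anywhere in the update, only sampled transitions, which is the sense in which the method is model-free.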
Urban flows prediction from spatial-temporal data using machine learning: A survey
Xie, Peng, Li, Tianrui, Liu, Jia, Du, Shengdong, Yang, Xin, Zhang, Junbo
Urban spatial-temporal flows prediction is of great importance to traffic management, land use, public safety, etc. Urban flows are affected by several complex and dynamic factors, such as patterns of human activities, weather, events and holidays. Datasets used to evaluate the flows come from various sources in different domains, e.g. mobile phone data, taxi trajectory data, metro/bus swiping data, bike-sharing data and so on. To summarize the methodologies of urban flows prediction, in this paper, we first introduce four main factors affecting urban flows. Second, in order to further analyze urban flows, the preparation process of multi-source spatial-temporal data related to urban flows is partitioned into three groups. Third, we choose spatial-temporal dynamic data as a case study for the urban flows prediction task. Fourth, we analyze and compare some well-known and state-of-the-art flows prediction methods in detail, classifying them into five categories: statistics-based, traditional machine learning-based, deep learning-based, reinforcement learning-based and transfer learning-based methods. Finally, we present open challenges of urban flows prediction and an outlook on the future of this field. This paper will help researchers find suitable methods and open datasets for addressing urban spatial-temporal flows forecasting problems.
AI Development and Trends in E-Commerce
The traditional retail industry is undergoing a significant reinvention and upgrade as more and more brick-and-mortar stores boost business by adopting e-commerce platforms powered by cutting-edge tech. The recent rapid development and deployment of AI technologies such as machine learning, computer vision and reinforcement learning have enabled new e-commerce products and solutions for various scenarios and strengthened the retail value chain. ... Alibaba's Taobao and Tmall, Amazon, JD.com); or on a brand's own official web stores (e.g. ...). Thanks to recent advancements in AI and digital technologies, operating costs for e-commerce have been reduced, enabling more retailers to realize e-commerce transformations. The 2018 global retail e-commerce market amounted to US$2.8 trillion and is expected to grow 75 percent to US$4.9 trillion by 2021.
Tutorial and Survey on Probabilistic Graphical Model and Variational Inference in Deep Reinforcement Learning
Probabilistic Graphical Modeling and Variational Inference play an important role in recent advances in Deep Reinforcement Learning. Aiming at a self-consistent tutorial survey, this article illustrates basic concepts of reinforcement learning with Probabilistic Graphical Models and derives some basic formulas as a recap. Recent advances in deep reinforcement learning along different research directions are reviewed and compared from various aspects. We offer Probabilistic Graphical Models, along with detailed explanations and derivations, for several use cases of Variational Inference, which serve as complementary material on top of the original contributions.
Reinforcement Learning in Healthcare: A Survey
Yu, Chao, Liu, Jiming, Nemati, Shamim
As a subfield of machine learning, reinforcement learning (RL) aims at empowering one's capabilities in behavioural decision making by using interaction experience with the world and evaluative feedback. Unlike traditional supervised learning methods that usually rely on one-shot, exhaustive and supervised reward signals, RL tackles sequential decision-making problems with sampled, evaluative and delayed feedback simultaneously. Such distinctive features make RL a suitable candidate for developing powerful solutions in a variety of healthcare domains, where diagnostic decisions or treatment regimes are usually characterized by a prolonged and sequential procedure. This survey discusses the broad applications of RL techniques in healthcare domains, in order to provide the research community with a systematic understanding of the theoretical foundations, enabling methods and techniques, existing challenges, and new insights of this emerging paradigm. After briefly examining theoretical foundations and key techniques in RL research from efficient and representational directions, we provide an overview of RL applications in a variety of healthcare domains, ranging from dynamic treatment regimes in chronic diseases and critical care, to automated medical diagnosis from both unstructured and structured clinical data, to many other control or scheduling domains that have infiltrated many aspects of a healthcare system. Finally, we summarize the challenges and open issues in current research, and point out some potential solutions and directions for future research.
Practical Risk Measures in Reinforcement Learning
Di Castro, Dotan, Oren, Joel, Mannor, Shie
Practical application of Reinforcement Learning (RL) often involves risk considerations. We study a generalized approximation scheme for risk measures, based on Monte-Carlo simulations, where the risk measures need not necessarily be coherent. We demonstrate that, even in simple problems, measures such as the variance of the reward-to-go do not capture the risk in a satisfactory manner. In addition, we show how a risk measure can be derived from a model's realizations. We propose a neural architecture for estimating the risk and suggest the risk critic architecture that can be used to optimize a policy under general risk measures. We conclude our work with experiments that demonstrate the efficacy of our approach.
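The point about variance of the reward-to-go can be made concrete with a small Monte-Carlo style sketch. It is not the paper's approximation scheme or risk critic; the two sets of sampled returns are invented for illustration, and the lower-tail average (CVaR) is one assumed choice of alternative risk measure.

```python
import statistics


def reward_to_go(rewards, gamma=1.0):
    """Discounted sum of one sampled episode's rewards from its first step."""
    total = 0.0
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


def cvar(returns, alpha):
    """Average of the worst alpha-fraction of sampled returns (lower tail)."""
    ordered = sorted(returns)
    k = max(1, int(alpha * len(ordered)))
    return sum(ordered[:k]) / k


# Two hypothetical policies, each summarised by five sampled returns.
# Policy A occasionally wins big; policy B occasionally loses big.
returns_a = [0.0, 0.0, 0.0, 0.0, 10.0]
returns_b = [4.0, 4.0, 4.0, 4.0, -6.0]

for name, returns in (("A", returns_a), ("B", returns_b)):
    print(name,
          "mean:", statistics.mean(returns),
          "variance:", statistics.pvariance(returns),
          "CVaR(0.2):", cvar(returns, 0.2))
```

Both policies have the same mean and the same variance, yet only policy B exposes the agent to a large loss. A tail-based measure separates the two, while variance alone cannot.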
Opponent Aware Reinforcement Learning
Gallego, Victor, Naveiro, Roi, Insua, David Rios, Oteiza, David Gomez-Ullate
In several reinforcement learning (RL) scenarios, such as security settings, there may be adversaries trying to interfere with the reward-generating process for their own benefit. We introduce Threatened Markov Decision Processes (TMDPs) as a framework to support an agent against potential opponents in an RL context. We also propose a level-k thinking scheme resulting in a novel learning approach to deal with TMDPs. After introducing our framework and deriving theoretical results, relevant empirical evidence is given via extensive experiments, showing the benefits of accounting for adversaries in RL while the agent learns.
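One way to picture "accounting for adversaries" is a Q-learner whose values depend on the opponent's action as well as its own, combined with an empirical model of what the opponent tends to do. The sketch below is an assumption-laden illustration of that idea, not the TMDP algorithm or level-k scheme from the paper; the class name, the count-based opponent model, and the toy actions are all hypothetical.

```python
from collections import defaultdict


class OpponentAwareQ:
    """Tabular Q-learner over (state, own action, opponent action)."""

    def __init__(self, agent_actions, opponent_actions, alpha=0.1, gamma=0.9):
        self.agent_actions = list(agent_actions)
        self.opponent_actions = list(opponent_actions)
        self.alpha = alpha
        self.gamma = gamma
        self.q = defaultdict(float)              # key: (state, a, b)
        self.counts = defaultdict(lambda: 1.0)   # key: (state, b); prior count 1

    def opponent_probs(self, state):
        """Estimated opponent action distribution from observed counts."""
        totals = [self.counts[(state, b)] for b in self.opponent_actions]
        norm = sum(totals)
        return [c / norm for c in totals]

    def expected_q(self, state, a):
        """Value of own action a, averaged over the opponent model."""
        probs = self.opponent_probs(state)
        return sum(p * self.q[(state, a, b)]
                   for p, b in zip(probs, self.opponent_actions))

    def best_action(self, state):
        return max(self.agent_actions, key=lambda a: self.expected_q(state, a))

    def update(self, state, a, b, reward, next_state):
        """Record the opponent's move, then apply a Q-learning style step."""
        self.counts[(state, b)] += 1.0
        best_next = max(self.expected_q(next_state, a2)
                        for a2 in self.agent_actions)
        target = reward + self.gamma * best_next
        self.q[(state, a, b)] += self.alpha * (target - self.q[(state, a, b)])


# Toy security setting: the opponent always attacks; defending pays off.
learner = OpponentAwareQ(["defend", "ignore"], ["attack", "wait"])
for _ in range(50):
    learner.update("s", "defend", "attack", 1.0, "s")
    learner.update("s", "ignore", "attack", -1.0, "s")
print(learner.best_action("s"), learner.opponent_probs("s"))
```

Averaging over the estimated opponent distribution lets the agent's choice shift as the opponent's observed behaviour changes, which a standard Q-learner that ignores the opponent cannot do.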