AITopics

1911.01547

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Genre:

Research Report (0.81)
Instructional Material (0.67)

Industry:

Leisure & Entertainment > Games > Computer Games (1.00)
Education > Assessment & Standards > Measuring Intelligence (1.00)
Leisure & Entertainment > Games > Chess (0.93)
Leisure & Entertainment > Sports (0.92)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > History (1.00)
(6 more...)

#artificialintelligenceNov-24-2019, 23:04:37 GMT

How AI Is Manipulating Economics to Create Appreciating Assets

"If you buy a Tesla today, I believe you're buying an appreciating asset, not a depreciating asset." Think about that statement for a second…you're buying an appreciating asset, not a depreciating asset. And what is driving the appreciation of that asset? Tesla cars become "smarter" and consequently more valuable with every mile each of the 400,000 Autopilot-equipped cars are driven. Imagine a mindset of leveraging Deep Reinforcement Learning with new operational data to create products (vehicles, trains, cranes, compressors, chillers, turbines, drills) that appreciate with usage because the products are getting more reliable, more predictive, more efficient, more effective, safer and consequently more valuable.

autonomous vehicle, create appreciating asset, product and operational insight, (10 more...)

#artificialintelligence

Country: North America > United States > California > San Francisco County > San Francisco (0.07)

Genre: Personal (0.30)

Industry: Transportation > Ground > Road (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.59)

Zhang, Kaiqing, Yang, Zhuoran, Başar, Tamer

Multi-Agent Reinforcement Learning: A Selective Overview of Theories and Algorithms

Recent years have witnessed significant advances in reinforcement learning (RL), which has registered great success in solving various sequential decision-making problems in machine learning. Most of the successful RL applications, e.g., the games of Go and Poker, robotics, and autonomous driving, involve the participation of more than one single agent, which naturally fall into the realm of multi-agent RL (MARL), a domain with a relatively long history, and has recently re-emerged due to advances in single-agent RL techniques. Though empirically successful, theoretical foundations for MARL are relatively lacking in the literature. In this chapter, we provide a selective overview of MARL, with focus on algorithms backed by theoretical analysis. More specifically, we review the theoretical results of MARL algorithms mainly within two representative frameworks, Markov/stochastic games and extensive-form games, in accordance with the types of tasks they address, i.e., fully cooperative, fully competitive, and a mix of the two. We also introduce several significant but challenging applications of these algorithms. Orthogonal to the existing reviews on MARL, we highlight several new angles and taxonomies of MARL theory, including learning in extensive-form games, decentralized MARL with networked agents, MARL in the mean-field regime, (non-)convergence of policy-based methods for learning in games, etc. Some of the new angles extrapolate from our own research endeavors and interests. Our overall goal with this chapter is, beyond providing an assessment of the current state of the field on the mark, to identify fruitful future research directions on theoretical studies of MARL. We expect this chapter to serve as continuing stimulus for researchers interested in working on this exciting while challenging topic.

agent, algorithm, arxiv preprint arxiv, (11 more...)

1911.10635

Country:

North America > Canada > Alberta (0.14)
North America > United States > Texas (0.04)
Asia > Middle East > Jordan (0.04)
(8 more...)

Genre:

Research Report (1.00)
Overview (1.00)

Industry:

Transportation (1.00)
Leisure & Entertainment > Games > Computer Games (1.00)
Energy (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.68)

A Deep Reinforcement Learning Architecture for Multi-stage Optimal Control

Yang, Yuguang

Deep reinforcement learning for high dimensional, hierarchical control tasks usually requires the use of complex neural networks as functional approximators, which can lead to inefficiency, instability and even divergence in the training process. Here, we introduce stacked deep Q learning (SDQL), a flexible modularized deep reinforcement learning architecture, that can enable finding of optimal control policy of control tasks consisting of multiple linear stages in a stable and efficient way. SDQL exploits the linear stage structure by approximating the Q function via a collection of deep Q sub-networks stacking along an axis marking the stage-wise progress of the whole task. By back-propagating the learned state values from later stages to earlier stages, all sub-networks co-adapt to maximize the total reward of the whole task, although each sub-network is responsible for learning optimal control policy for its own stage. This modularized architecture offers considerable flexibility in terms of environment and policy modeling, as it allows choices of different state spaces, action spaces, reward structures, and Q networks for each stage, Further, the backward stage-wise training procedure of SDQL can offers additional transparency, stability, and flexibility to the training process, thus facilitating model fine-tuning and hyper-parameter search. We demonstrate that SDQL is capable of learning competitive strategies for problems with characteristics of high-dimensional state space, heterogeneous action space(both discrete and continuous), multiple scales, and sparse and delayed rewards.

control task, discount factor, neural network, (15 more...)

1911.10684

Country: North America > United States (0.14)

Genre: Research Report (0.50)

Industry: Energy (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

arXiv.org Machine LearningNov-24-2019

Merging Deterministic Policy Gradient Estimations with Varied Bias-Variance Tradeoff for Effective Deep Reinforcement Learning

Chen, Gang

Deep reinforcement learning (DRL) on Markov decision processes (MDPs) with continuous action spaces is often approached by directly updating parametric policies along the direction of estimated policy gradients (PGs). Previous research revealed that the performance of these PG algorithms depends heavily on the bias-variance tradeoff involved in estimating and using PGs. A notable approach towards balancing this tradeoff is to merge both on-policy and off-policy gradient estimations for the purpose of training stochastic policies. However this method cannot be utilized directly by sample-efficient off-policy PG algorithms such as Deep Deterministic Policy Gradient (DDPG) and twin-delayed DDPG (TD3), which have been designed to train deterministic policies. It is hence important to develop new techniques to merge multiple off-policy estimations of deterministic PG (DPG). Driven by this research question, this paper introduces elite DPG which will be estimated differently from conventional DPG to emphasize on the variance reduction effect at the expense of increased learning bias. To mitigate the extra bias, policy consolidation techniques will be developed to distill policy behavioral knowledge from elite trajectories and use the distilled generative model to further regularize policy training. Moreover, we will study both theoretically and experimentally two different DPG merging methods, i.e., interpolation merging and two-step merging, with the aim to induce varied bias-variance tradeoff through combined use of both conventional DPG and elite DPG. Experiments on six benchmark control tasks confirm that these two merging methods can noticeably improve the learning performance of TD3, significantly outperforming several state-of-the-art DRL algorithms.

algorithm, arxiv preprint arxiv, dpg, (12 more...)

arXiv.org Machine Learning

1911.10527

Country:

Asia > Middle East > Jordan (0.04)
Oceania > New Zealand > North Island > Wellington Region > Wellington (0.04)

Genre:

Research Report > New Finding (0.48)
Research Report > Experimental Study (0.48)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

ORL: Reinforcement Learning Benchmarks for Online Stochastic Optimization Problems

Balaji, Bharathan, Bell-Masterson, Jordan, Bilgin, Enes, Damianou, Andreas, Garcia, Pablo Moreno, Jain, Arpit, Luo, Runfei, Maggiar, Alvaro, Narayanaswamy, Balakrishnan, Ye, Chun

Reinforcement Learning (RL) has achieved state-of-the-art results in domains such as robotics and games. We build on this previous work by applying RL algorithms to a selection of canonical online stochastic optimization problems with a range of practical applications: Bin Packing, Newsvendor, and Vehicle Routing. While there is a nascent literature that applies RL to these problems, there are no commonly accepted benchmarks which can be used to compare proposed approaches rigorously in terms of performance, scale, or generalizability. This paper aims to fill that gap. For each problem we apply both standard approaches as well as newer RL algorithms and analyze results. In each case, the performance of the trained RL policy is competitive with or superior to the corresponding baselines, while not requiring much in the way of domain knowledge. This highlights the potential of RL in real-world dynamic resource allocation problems.

algorithm, baseline, bin, (14 more...)

1911.10641

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)

Genre: Research Report (0.50)

Industry: Transportation > Freight & Logistics Services (0.89)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Tschantz, Alexander, Baltieri, Manuel, Seth, Anil. K., Buckley, Christopher L.

Scaling active inference

In reinforcement learning (RL), agents often operate in partially observed and uncertain environments. Model-based RL suggests that this is best achieved by learning and exploiting a probabilistic model of the world. 'Active inference' is an emerging normative framework in cognitive and computational neuroscience that offers a unifying account of how biological agents achieve this. On this framework, inference, learning and action emerge from a single imperative to maximize the Bayesian evidence for a niched model of the world. However, implementations of this process have thus far been restricted to low-dimensional and idealized situations. Here, we present a working implementation of active inference that applies to high-dimensional tasks, with proof-of-principle results demonstrating efficient exploration and an order of magnitude increase in sample efficiency over strong model-free baselines. Our results demonstrate the feasibility of applying active inference at scale and highlight the operational homologies between active inference and current model-based approaches to RL.

active inference, agent, inference, (12 more...)

1911.10601

Country:

Europe > United Kingdom > England > East Sussex > Brighton (0.04)
Asia > Japan > Honshū > Kantō > Saitama Prefecture > Saitama (0.04)

Genre: Research Report > New Finding (0.88)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.87)

Technology:

Information Technology > Artificial Intelligence > Cognitive Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)
(2 more...)

Causality for Machine Learning

Schölkopf, Bernhard

Graphical causal inference as pioneered by Judea Pearl arose from research on artificial intelligence (AI), and for a long time had little connection to the field of machine learning. This article discusses where links have been and should be established, introducing key concepts along the way. It argues that the hard open problems of machine learning and AI are intrinsically related to causality, and explains how the field is beginning to understand them.

information, lkopf, mechanism, (14 more...)

1911.105

Country:

Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
(11 more...)

Genre: Research Report (1.00)

Industry:

Government (0.46)
Information Technology (0.46)
Leisure & Entertainment > Games (0.45)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

arXiv.org Artificial IntelligenceNov-23-2019

From Persistent Homology to Reinforcement Learning with Applications for Retail Banking

Charlier, Jeremy

The retail banking services are one of the pillars of the modern economic growth. However, the evolution of the client's habits in modern societies and the recent European regulations promoting more competition mean the retail banks will encounter serious challenges for the next few years, endangering their activities. They now face an impossible compromise: maximizing the satisfaction of their hyper-connected clients while avoiding any risk of default and being regulatory compliant. Therefore, advanced and novel research concepts are a serious game-changer to gain a competitive advantage. In this context, we investigate in this thesis different concepts bridging the gap between persistent homology, neural networks, recommender engines and reinforcement learning with the aim of improving the quality of the retail banking services. Our contribution is threefold. First, we highlight how to overcome insufficient financial data by generating artificial data using generative models and persistent homology. Then, we present how to perform accurate financial recommendations in multi-dimensions. Finally, we underline a reinforcement learning model-free approach to determine the optimal policy of money management based on the aggregated financial transactions of the clients. Our experimental data sets, extracted from well-known institutions where the privacy and the confidentiality of the clients were not put at risk, support our contributions. In this work, we provide the motivations of our retail banking research project, describe the theory employed to improve the financial services quality and evaluate quantitatively and qualitatively our methodologies for each of the proposed research scenarios.

accurate tensor resolution, persistent homology, user-device authentication, (14 more...)

1911.11573

Country:

Europe > Switzerland > Basel-City > Basel (0.04)
Africa > Senegal > Kolda Region > Kolda (0.04)
North America > United States > New Mexico > Los Alamos County > Los Alamos (0.04)
(14 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Banking & Finance > Trading (0.92)
Banking & Finance > Financial Services (0.88)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Sherstan, Craig, Dohare, Shibhansh, MacGlashan, James, Günther, Johannes, Pilarski, Patrick M.

Gamma-Nets: Generalizing Value Estimation over Timescale

arXiv.org Artificial IntelligenceNov-23-2019

We present $\Gamma$-nets, a method for generalizing value function estimation over timescale. By using the timescale as one of the estimator's inputs we can estimate value for arbitrary timescales. As a result, the prediction target for any timescale is available and we are free to train on multiple timescales at each timestep. Here we empirically evaluate $\Gamma$-nets in the policy evaluation setting. We first demonstrate the approach on a square wave and then on a robot arm using linear function approximation. Next, we consider the deep reinforcement learning setting using several Atari video games. Our results show that $\Gamma$-nets can be effective for predicting arbitrary timescales, with only a small cost in accuracy as compared to learning estimators for fixed timescales. $\Gamma$-nets provide a method for compactly making predictions at many timescales without requiring a priori knowledge of the task, making it a valuable contribution to ongoing work on model-based planning, representation learning, and lifelong learning algorithms.

agent, prediction, timescale, (15 more...)

1911.07794

Country:

North America > Canada > Alberta (0.14)
Oceania > Australia > New South Wales > Sydney (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
(7 more...)

Genre: Research Report > New Finding (0.68)

Industry:

Leisure & Entertainment > Games > Computer Games (0.49)
Education (0.48)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)