AITopics

2002.05513

Country:

Asia > China > Beijing > Beijing (0.04)
Europe > Germany (0.04)

Genre: Research Report (1.00)

Industry: Transportation > Freight & Logistics Services (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

The Big Three: A Methodology to Increase Data Science ROI by Answering the Questions Companies Care About

Griffin, Daniel K.

Companies may be achieving only a third of the value they could be getting from data science in industry applications. In this paper, we propose a methodology for categorizing and answering 'The Big Three' questions (what is going on, what is causing it, and what actions can I take that will optimize what I care about) using data science. The applications of data science seem to be nearly endless in today's modern landscape, with each company jockeying for position in the new data and insights economy. Yet, data scientists seem to be solely focused on using classification, regression, and clustering methods to answer the question 'what is going on'. Answering questions about why things are happening or how to take optimal actions to improve metrics are relegated to niche fields of research and generally neglected in industry data science analysis. We survey technical methods to answer these other important questions, describe areas in which some of these methods are being applied, and provide a practical example of how to apply our methodology and selected methods to a real business use case.

application, information, reinforcement, (16 more...)

2002.07069

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > New York > New York County > New York City (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(7 more...)

Genre:

Research Report > Experimental Study (1.00)
Overview (0.92)

Industry:

Leisure & Entertainment > Games (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Information Technology > Services (0.93)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
(3 more...)

Volodin, Sergei, Wichers, Nevan, Nixon, Jeremy

Resolving Spurious Correlations in Causal Models of Environments via Interventions

Causal models could increase interpretability, robustness to distributional shift and sample efficiency of RL agents. In this vein, we address the question of learning a causal model of an RL environment. This problem is known to be difficult due to spurious correlations. We overcome this difficulty by rewarding an RL agent for designing and executing interventions to discover the true model. We compare rewarding the agent for disproving uncertain edges in the causal graph, rewarding the agent for activating a certain node, or rewarding the agent for increasing the causal graph loss. We show that our methods result in a better causal graph than one generated by following the random policy, or a policy trained on the environment's reward. We find that rewarding for the causal graph loss works the best.

causal graph, graph, intervention, (12 more...)

2002.05217

Country: Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.65)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Diagnosis (0.83)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.34)

Regret Bounds for Discounted MDPs

Liu, Shuang, Su, Hao

Recently, it has been shown that carefully designed reinforcement learning (RL) algorithms can achieve near-optimal regret in the episodic or the average-reward setting. However, in practice, RL algorithms are applied mostly to the infinite-horizon discounted-reward setting, so it is natural to ask what the lowest regret an algorithm can achieve is in this case, and how close to the optimal the regrets of existing RL algorithms are. In this paper, we prove a regret lower bound of $\Omega\left(\frac{\sqrt{SAT}}{1 - \gamma} - \frac{1}{(1 - \gamma)^2}\right)$ when $T\geq SA$ on any learning algorithm for infinite-horizon discounted Markov decision processes (MDP), where $S$ and $A$ are the numbers of states and actions, $T$ is the number of actions taken, and $\gamma$ is the discounting factor. We also show that a modified version of the double Q-learning algorithm gives a regret upper bound of $\tilde{O}\left(\frac{\sqrt{SAT}}{(1 - \gamma)^{2.5}}\right)$ when $T\geq SA$. Compared to our bounds, previous best lower and upper bounds both have worse dependencies on $T$ and $\gamma$, while our dependencies on $S, A, T$ are optimal. The proof of our upper bound is inspired by recent advances in the analysis of Q-learning in the episodic setting, but the cyclic nature of infinite-horizon MDPs poses many new challenges.

algorithm, dependency, mdp, (13 more...)

2002.05138

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > California > San Diego County > San Diego (0.04)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Fallah, Alireza, Mokhtari, Aryan, Ozdaglar, Asuman

Provably Convergent Policy Gradient Methods for Model-Agnostic Meta-Reinforcement Learning

We consider Model-Agnostic Meta-Learning (MAML) methods for Reinforcement Learning (RL) problems where the goal is to find a policy (using data from several tasks represented by Markov Decision Processes (MDPs)) that can be updated by one step of stochastic policy gradient for the realized MDP. In particular, using stochastic gradients in MAML update step is crucial for RL problems since computation of exact gradients requires access to a large number of possible trajectories. For this formulation, we propose a variant of the MAML method, named Stochastic Gradient Meta-Reinforcement Learning (SG-MRL), and study its convergence properties. We derive the iteration and sample complexity of SG-MRL to find an $\epsilon$-first-order stationary point, which, to the best of our knowledge, provides the first convergence guarantee for model-agnostic meta-reinforcement learning algorithms. We further show how our results extend to the case where more than one step of stochastic policy gradient method is used in the update during the test time.

dtest, gradient, lipschitz, (16 more...)

2002.05135

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
North America > United States > Texas > Travis County > Austin (0.14)
North America > United States > California > Los Angeles County > Long Beach (0.04)
Oceania > Australia > New South Wales > Sydney (0.04)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.69)

Raschka, Sebastian, Patterson, Joshua, Nolet, Corey

Machine Learning in Python: Main developments and technology trends in data science, machine learning, and artificial intelligence

Smarter applications are making better use of the insights gleaned from data, having an impact on every industry and research discipline. At the core of this revolution lies the tools and the methods that are driving it, from processing the massive piles of data generated each day to learning from and taking useful action. Deep neural networks, along with advancements in classical ML and scalable general-purpose GPU computing, have become critical components of artificial intelligence, enabling many of these astounding breakthroughs and lowering the barrier to adoption. Python continues to be the most preferred language for scientific computing, data science, and machine learning, boosting both performance and productivity by enabling the use of low-level libraries and clean high-level APIs. This survey offers insight into the field of machine learning with Python, taking a tour through important topics to identify some of the core hardware and software paradigms that have enabled it. We cover widely-used libraries and concepts, collected together for holistic comparison, with the goal of educating the reader and driving the field of Python machine learning forward.

algorithm, arxiv preprint arxiv, library, (15 more...)

2002.04803

Country:

North America > United States > Wisconsin > Dane County > Madison (0.14)
North America > United States > Maryland > Baltimore County (0.04)
North America > United States > Maryland > Baltimore (0.04)
Asia > Middle East > Jordan (0.04)

Genre:

Overview (1.00)
Research Report > Promising Solution (0.67)

Industry:

Information Technology > Security & Privacy (0.68)
Leisure & Entertainment > Games (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.93)
(4 more...)

Bontrager, Philip, Togelius, Julian

Fully Differentiable Procedural Content Generation through Generative Playing Networks

To procedurally create interactive content such as environments or game levels, we need agents that can evaluate the content; but to train such agents, we need content they can train on. Generative Playing Networks is a framework that learns agent policies and generates environments in tandem through a symbiotic process. Policies are learned using an actor-critic reinforcement learning algorithm so as to master the environment, and environments are created by a generator network which tries to provide an appropriate level of challenge for the agent. This is accomplished by the generator learning to make content based on estimates by the critic. Thus, this process provides an implicit curriculum for the agent, creating more complex environments over time. Unlike previous approaches to procedural content generation, Generative Playing Networks is end-to-end differentiable and does not require human-designed examples or domain knowledge. We demonstrate the capability of this framework by training an agent and level generator for a 2D dungeon crawler game.

agent, generative, generator, (12 more...)

2002.05259

Country: North America > United States > New York (0.04)

Genre: Research Report (0.82)

Industry: Leisure & Entertainment > Games > Computer Games (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Pesce, Emanuele, Montana, Giovanni

Connectivity-driven Communication in Multi-agent Reinforcement Learning through Diffusion Processes on Graphs

We discuss the problem of learning collaborative behaviour in multi-agent systems using deep reinforcement learning (DRL). A connectivity-driven communication (CDC) algorithm is proposed to address three key aspects: what agents to involve in the communication, what information content to share, and how often to share it. We introduce the notion of a connectivity network, modelled as a weighted graph, where nodes represent agents and edges represent the degree of connectivity between pairs of agents. The optimal graph topology is learned end-to-end concurrently with the stochastic policy so as to maximise future expected returns. The communication patterns depend on the graph's topology through a diffusion process on the graph, the heat kernel, which is found by exponentiating the Laplacian eigensystem through time and is fully differentiable. Empirical results show that CDC is capable of superior performance over alternative algorithms for a range of cooperative navigation tasks.

agent, connectivity-driven communication, multi-agent reinforcement learning, (11 more...)

2002.05233

Country:

North America > United States > Montana (0.05)
North America > United States > New Mexico > Los Alamos County > Los Alamos (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
(4 more...)

Genre: Research Report (1.00)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Data Efficient Training for Reinforcement Learning with Adaptive Behavior Policy Sharing

Liu, Ge, Wu, Rui, Cheng, Heng-Tze, Wang, Jing, Ooi, Jayden, Li, Lihong, Li, Ang, Li, Wai Lok Sibon, Boutilier, Craig, Chi, Ed

Deep Reinforcement Learning (RL) is proven powerful for decision making in simulated environments. However, training deep RL model is challenging in real world applications such as production-scale health-care or recommender systems because of the expensiveness of interaction and limitation of budget at deployment. One aspect of the data inefficiency comes from the expensive hyper-parameter tuning when optimizing deep neural networks. We propose Adaptive Behavior Policy Sharing (ABPS), a data-efficient training algorithm that allows sharing of experience collected by behavior policy that is adaptively selected from a pool of agents trained with an ensemble of hyper-parameters. We further extend ABPS to evolve hyper-parameters during training by hybridizing ABPS with an adapted version of Population Based Training (ABPS-PBT). We conduct experiments with multiple Atari games with up to 16 hyper-parameter/architecture setups. ABPS achieves superior overall performance, reduced variance on top 25% agents, and equivalent performance on the best agent compared to conventional hyper-parameter tuning with independent training, even though ABPS only requires the same number of environmental interactions as training a single agent. We also show that ABPS-PBT further improves the convergence speed and reduces the variance.

adaptive behavior policy, agent, hyper-parameter pool, (14 more...)

2002.05229

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.66)

Industry:

Leisure & Entertainment > Games > Computer Games (0.54)
Health & Medicine (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

Jeon, Hong Jun, Milli, Smitha, Dragan, Anca D.

Reward-rational (implicit) choice: A unifying formalism for reward learning

It is often difficult to hand-specify what the correct reward function is for a task, so researchers have instead aimed to learn reward functions from human behavior or feedback. The types of behavior interpreted as evidence of the reward function have expanded greatly in recent years. We've gone from demonstrations, to comparisons, to reading into the information leaked when the human is pushing the robot away or turning it off. And surely, there is more to come. How will a robot make sense of all these diverse types of behavior? Our key insight is that different types of behavior can be interpreted in a single unifying formalism - as a reward-rational choice that the human is making, often implicitly. The formalism offers both a unifying lens with which to view past work, as well as a recipe for interpreting new sources of information that are yet to be uncovered. We provide two examples to showcase this: interpreting a new feedback type, and reading into how the choice of feedback itself leaks information about the reward.

information, robot, trajectory, (17 more...)

2002.04833

Country:

North America > United States > Illinois > Cook County > Chicago (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
North America > United States > California > Alameda County > Berkeley (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)