AITopics

2002.09162

Country:

North America > United States > Wisconsin (0.04)
North America > United States > New York (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > California > San Mateo County > Menlo Park (0.04)

Genre: Research Report (1.00)

Industry: Health & Medicine > Therapeutic Area > Endocrinology > Diabetes (0.35)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.68)
(2 more...)

Zhong, Yuanyi, Schwing, Alexander, Peng, Jian

Disentangling Controllable Object through Video Prediction Improves Visual Reinforcement Learning

arXiv.org Machine LearningFeb-21-2020

In many vision-based reinforcement learning (RL) problems, the agent controls a movable object in its visual field, e.g., the player's avatar in video games and the robotic arm in visual grasping and manipulation. Leveraging action-conditioned video prediction, we propose an end-to-end learning framework to disentangle the controllable object from the observation signal. The disentangled representation is shown to be useful for RL as additional observation channels to the agent. Experiments on a set of Atari games with the popular Double DQN algorithm demonstrate improved sample efficiency and game performance (from 222.8% to 261.4% measured in normalized game scores, with prediction bonus reward).

agent, learning, prediction model, (11 more...)

2002.09136

Country:

North America > United States > Illinois > Champaign County > Champaign (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.50)

Industry: Leisure & Entertainment > Games > Computer Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

arXiv.org Machine LearningFeb-21-2020

Discrete Action On-Policy Learning with Action-Value Critic

Yue, Yuguang, Tang, Yunhao, Yin, Mingzhang, Zhou, Mingyuan

Reinforcement learning (RL) in discrete action space is ubiquitous in real-world applications, but its complexity grows exponentially with the action-space dimension, making it challenging to apply existing on-policy gradient based deep RL algorithms efficiently. To effectively operate in multidimensional discrete action spaces, we construct a critic to estimate action-value functions, apply it on correlated actions, and combine these critic estimated action values to control the variance of gradient estimation. We follow rigorous statistical analysis to design how to generate and combine these correlated actions, and how to sparsify the gradients by shutting down the contributions from certain dimensions. These efforts result in a new discrete action on-policy RL algorithm that empirically outperforms related on-policy algorithms relying on variance control techniques. We demonstrate these properties on OpenAI Gym benchmark tasks, and illustrate how discretizing the action space could benefit the exploration phase and hence facilitate convergence to a better local optimal solution thanks to the flexibility of discrete policy.

action space, algorithm, gradient, (11 more...)

2002.03534

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > Texas (0.04)
Europe > Italy > Sicily > Palermo (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.35)

Colas, Cédric, Karch, Tristan, Lair, Nicolas, Dussoux, Jean-Michel, Moulin-Frier, Clément, Dominey, Peter Ford, Oudeyer, Pierre-Yves

Language as a Cognitive Tool to Imagine Goals in Curiosity-Driven Exploration

arXiv.org Artificial IntelligenceFeb-21-2020

Autonomous reinforcement learning agents must be intrinsically motivated to explore their environment, discover potential goals, represent them and learn how to achieve them. As children do the same, they benefit from exposure to language, using it to formulate goals and imagine new ones as they learn their meaning. In our proposed learning architecture (IMAGINE), the agent freely explores its environment and turns natural language descriptions of interesting interactions from a social partner into potential goals. IMAGINE learns to represent goals by jointly learning a language model and a goal-conditioned reward function. Just like humans, our agent uses language compositionality to generate new goals by composing known ones. Leveraging modular model architectures based on Deep Sets and gated-attention mechanisms, IMAGINE autonomously builds a repertoire of behaviors and shows good zero-shot generalization properties for various types of generalization. When imagining its own goals, the agent leverages zero-shot generalization of the reward function to further train on imagined goals and refine its behavior. We present experiments in a simulated domain where the agent interacts with procedurally generated scenes containing objects of various types and colors, discovers goals, imagines others and learns to achieve them.

architecture, generalization, reward function, (12 more...)

arXiv.org Artificial Intelligence

2002.09253

Country: North America > United States > California > Los Angeles County > Long Beach (0.04)

Genre:

Research Report (0.64)
Instructional Material > Course Syllabus & Notes (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
(2 more...)

arXiv.org Artificial IntelligenceFeb-21-2020

Estimating Q(s,s') with Deep Deterministic Dynamics Gradients

Edwards, Ashley D., Sahni, Himanshu, Liu, Rosanne, Hung, Jane, Jain, Ankit, Wang, Rui, Ecoffet, Adrien, Miconi, Thomas, Isbell, Charles, Yosinski, Jason

In this paper, we introduce a novel form of value function, $Q(s, s')$, that expresses the utility of transitioning from a state $s$ to a neighboring state $s'$ and then acting optimally thereafter. In order to derive an optimal policy, we develop a forward dynamics model that learns to make next-state predictions that maximize this value. This formulation decouples actions from values while still learning off-policy. We highlight the benefits of this approach in terms of value function transfer, learning within redundant action spaces, and learning off-policy from state observations generated by sub-optimal or completely random policies. Code and videos are available at \url{sites.google.com/view/qss-paper}.

dynamic model, experiment, qss, (10 more...)

arXiv.org Artificial Intelligence

2002.09505

Country: Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.65)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.72)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

#artificialintelligenceFeb-20-2020, 04:26:14 GMT

Researchers Improve Robotic Arm Used in Surgery

Facebook has recently created an algorithm that enhances an AI agent's ability to navigate an environment, letting the agent determine the shortest route through new environments without access to a map. While mobile robots typically have a map programmed into them, the new algorithm that Facebook designed could enable the creation of robots that can navigate environments without the need for maps. According to a post created by Facebook researchers, a major challenge for robot navigation is endowing AI systems with the ability to navigate through novel environments and reaching programmed destinations without a map. In order to tackle this challenge, Facebook created a reinforcement learning algorithm distributed across multiple learners. The algorithm was called decentralized distributed proximal policy optimization (DD-PPO).

agent, algorithm, research team, (12 more...)

#artificialintelligence

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.57)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.39)

Munir, Md. Shirajum, Abedin, Sarder Fakhrul, Tran, Nguyen H., Han, Zhu, Huh, Eui Nam, Hong, Choong Seon

Risk-Aware Energy Scheduling for Edge Computing with Microgrid: A Multi-Agent Deep Reinforcement Learning Approach

In recent years, multi-access edge computing (MEC) is a key enabler for handling the massive expansion of Internet of Things (IoT) applications and services. However, energy consumption of a MEC network depends on volatile tasks that induces risk for energy demand estimations. As an energy supplier, a microgrid can facilitate seamless energy supply. However, the risk associated with energy supply is also increased due to unpredictable energy generation from renewable and non-renewable sources. Especially, the risk of energy shortfall is involved with uncertainties in both energy consumption and generation. In this paper, we study a risk-aware energy scheduling problem for a microgrid-powered MEC network. First, we formulate an optimization problem considering the conditional value-at-risk (CVaR) measurement for both energy consumption and generation, where the objective is to minimize the loss of energy shortfall of the MEC networks and we show this problem is an NP-hard problem. Second, we analyze our formulated problem using a multi-agent stochastic game that ensures the joint policy Nash equilibrium, and show the convergence of the proposed model. Third, we derive the solution by applying a multi-agent deep reinforcement learning (MADRL)-based asynchronous advantage actor-critic (A3C) algorithm with shared neural networks. This method mitigates the curse of dimensionality of the state space and chooses the best policy among the agents for the proposed problem. Finally, the experimental results establish a significant performance gain by considering CVaR for high accuracy energy scheduling of the proposed model than both the single and random agent models.

energy consumption, energy scheduling, university, (11 more...)

2003.02157

Country:

North America > United States > Maryland > Prince George's County > College Park (0.14)
Asia > South Korea > Seoul > Seoul (0.04)
Oceania > Australia > New South Wales > Sydney (0.04)
(24 more...)

Genre:

Personal (0.68)
Research Report (0.63)

Industry:

Energy > Renewable (1.00)
Energy > Power Industry (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Wang, Ran, Parunandi, Karthikeya S., Yu, Dan, Kalathil, Dileep, Chakravorty, Suman

On the Search for Feedback in Reinforcement Learning

This paper addresses the problem of learning the optimal feedback policy for a nonlinear stochastic dynamical system with continuous state space, continuous action space and unknown dynamics. Feedback policies are complex objects that typically need a large dimensional parametrization, which makes Reinforcement Learning algorithms that search for an optimum in this large parameter space, sample inefficient and subject to high variance. We propose a "decoupling" principle that drastically reduces the feedback parameter space while still remaining near-optimal to the fourth-order in a small noise parameter. Based on this principle, we propose a decoupled data-based control (D2C) algorithm that addresses the stochastic control problem: first, an open-loop deterministic trajectory optimization problem is solved using a black-box simulation model of the dynamical system. Then, a linear closed-loop control is developed around this nominal trajectory using only a simulation model. Empirical evidence suggests significant reduction in training time, as well as the training variance, compared to other state of the art Reinforcement Learning algorithms.

algorithm, equation, noise, (17 more...)

2002.09478

Country:

North America > United States > Texas > Brazos County > College Station (0.04)
Asia > China > Jiangsu Province > Nanjing (0.04)
North America > United States > New York (0.04)
(2 more...)

Genre: Research Report (0.50)

Industry:

Energy (0.50)
Transportation > Air (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Brown, Daniel S., Coleman, Russell, Srinivasan, Ravi, Niekum, Scott

Safe Imitation Learning via Fast Bayesian Reward Inference from Preferences

Bayesian reward learning from demonstrations enables rigorous safety and uncertainty analysis when performing imitation learning. However, Bayesian reward learning methods are typically computationally intractable for complex control problems. We propose a highly efficient Bayesian reward learning algorithm that scales to high-dimensional imitation learning problems by first pre-training a low-dimensional feature encoding via self-supervised tasks and then leveraging preferences over demonstrations to perform fast Bayesian inference. We evaluate our proposed approach on the task of learning to play Atari games from demonstrations, without access to the game score. For Atari games our approach enables us to generate 100,000 samples from the posterior over reward functions in only 5 minutes using a personal laptop. Furthermore, our proposed approach achieves comparable or better imitation learning performance than state-of-the-art methods that only find a point estimate of the reward function. Finally, we show that our approach enables efficient high-confidence policy performance bounds. We show that these high-confidence performance bounds can be used to rank the performance and risk of a variety of evaluation policies, despite not having samples of the reward function. We also show evidence that high-confidence performance bounds can be used to detect reward hacking in complex imitation learning problems.

bayesian rex, demonstration, reward function, (7 more...)

2002.09089

Country:

North America > United States > Texas (0.04)
North America > United States > New York (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Leisure & Entertainment > Games > Computer Games (0.94)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.87)
(2 more...)

GenDICE: Generalized Offline Estimation of Stationary Values

Zhang, Ruiyi, Dai, Bo, Li, Lihong, Schuurmans, Dale

An important problem that arises in reinforcement learning and Monte Carlo methods is estimating quantities defined by the stationary distribution of a Markov chain. In many real-world applications, access to the underlying transition operator is limited to a fixed set of data that has already been collected, without additional interaction with the environment being available. We show that consistent estimation remains possible in this challenging scenario, and that effective estimation can still be achieved in important applications. Our approach is based on estimating a ratio that corrects for the discrepancy between the stationary and empirical distributions, derived from fundamental properties of the stationary distribution, and exploiting constraint reformulations based on variational divergence minimization. The resulting algorithm, GenDICE, is straightforward and effective. We prove its consistency under general conditions, provide an error analysis, and demonstrate strong empirical performance on benchmark problems, including off-line PageRank and off-policy policy evaluation.

behavior policy, conference paper, gendice, (14 more...)

2002.09072

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)
Asia > Middle East > Jordan (0.04)
(2 more...)

Genre: Research Report (0.81)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)