acktr
- North America > Canada > Ontario > Toronto (0.15)
- Asia > Middle East > Jordan (0.04)
- North America > United States > California > Los Angeles County > Long Beach (0.04)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.30)
Reviews: Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation
The manuscript addresses an important topic: optimization in deep reinforcement learning. The authors extend the use of Kronecker-factored approximation to develop a second-order optimization method for deep reinforcement learning. The method uses a Kronecker-factored approximation to the Fisher matrix to estimate the curvature of the cost, resulting in a scalable approximation to natural gradients. The authors demonstrate the power of the method (termed ACKTR) in terms of the performance of agents in Atari and MuJoCo RL environments, and compare the proposed algorithm to two previous methods (A2C and TRPO). Overall, the manuscript is well written and, to my knowledge, the methodology is a novel application of Kronecker-factored approximation.
Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation
Yuhuai Wu, Elman Mansimov, Roger B. Grosse, Shun Liao, Jimmy Ba
In this work, we propose to apply trust region optimization to deep reinforcement learning using a recently proposed Kronecker-factored approximation to the curvature. We extend the framework of natural policy gradient and propose to optimize both the actor and the critic using Kronecker-factored approximate curvature (K-FAC) with trust region; hence we call our method Actor Critic using Kronecker-Factored Trust Region (ACKTR). To the best of our knowledge, this is the first scalable trust region natural gradient method for actor-critic methods. It is also a method that learns non-trivial tasks in continuous control as well as discrete control policies directly from raw pixel inputs. We tested our approach across discrete domains in Atari games as well as continuous domains in the MuJoCo environment. With the proposed methods, we are able to achieve higher rewards and a 2- to 3-fold improvement in sample efficiency on average, compared to previous state-of-the-art on-policy actor-critic methods.
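The core computational trick named in this abstract is the Kronecker-factored approximation of each layer's Fisher block, which turns the otherwise intractable natural-gradient solve into two small matrix inversions. Below is a minimal NumPy sketch of that update for one dense layer, together with ACKTR's trust-region rescaling of the step size; the variable names and the damping scheme are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def kfac_natural_gradient(a, g, grad_W, damping=1e-2):
    """One K-FAC natural-gradient direction for a single dense layer.

    a:      (batch, n_in)  layer inputs
    g:      (batch, n_out) gradients of the loss w.r.t. pre-activations
    grad_W: (n_out, n_in)  ordinary gradient of the loss w.r.t. the weights
    """
    batch = a.shape[0]
    # Kronecker factors of the layer's Fisher block: F ~= A (x) S
    A = a.T @ a / batch          # input second-moment matrix, (n_in, n_in)
    S = g.T @ g / batch          # pre-activation gradient covariance, (n_out, n_out)
    # Damp each factor so the small inverses stay well conditioned
    pi = np.sqrt(damping)
    A_inv = np.linalg.inv(A + pi * np.eye(A.shape[0]))
    S_inv = np.linalg.inv(S + pi * np.eye(S.shape[0]))
    # Kronecker identity: (A (x) S)^-1 vec(grad_W) == vec(S^-1 grad_W A^-1)
    return S_inv @ grad_W @ A_inv

def trust_region_scale(grad_W, nat_grad, lr_max=0.25, kl_budget=1e-3):
    # Shrink the step so the predicted KL change stays within the budget:
    # eta = min(lr_max, sqrt(2 * delta / (grad . natgrad)))
    quad = np.sum(grad_W * nat_grad)
    return min(lr_max, np.sqrt(2.0 * kl_budget / (quad + 1e-12)))
```

The two inversions are over n_in x n_in and n_out x n_out matrices rather than the full (n_in*n_out)-dimensional Fisher block, which is what makes the approach scalable.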
- North America > Canada > Ontario > Toronto (0.29)
- Asia > Middle East > Jordan (0.04)
- North America > United States > New York (0.04)
- North America > United States > California > Los Angeles County > Long Beach (0.04)
Emergence of Different Modes of Tool Use in a Reaching and Dragging Task
Nguyen, Khuong, Choe, Yoonsuck
Tool use is an important milestone in the evolution of intelligence. In this paper, we investigate different modes of tool use that emerge in a reaching and dragging task. In this task, a jointed arm with a gripper must grab a tool (T-, I-, or L-shaped) and drag an object down to the target location (the bottom of the arena). The simulated environment had realistic physics, such as gravity and friction. We trained a deep reinforcement learning based controller (with raw visual and proprioceptive input) with minimal reward shaping information to tackle this task. We observed the emergence of a wide range of unexpected behaviors not directly encoded in the motor primitives or reward functions. Examples include hitting the object toward the target location, correcting the error of an initial contact, and throwing the tool toward the object, as well as normal expected behavior such as a wide sweep. We further analyzed these behaviors based on the type of tool and the initial position of the target object. Our results show a rich repertoire of behaviors, beyond the basic built-in mechanisms of the deep reinforcement learning method we used.
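The task description above (grab a tool, drag the object to the bottom of the arena, minimal reward shaping) implies a largely sparse reward. A hypothetical minimal-shaping reward of that flavor is sketched below; the success margin and the small grasp bonus are invented for illustration, since the paper's exact terms are not reproduced here.

```python
def reward(object_y, target_y, tool_grasped,
           success_margin=0.05, grasp_bonus=0.01):
    """Hypothetical minimal-shaping reward for the reaching-and-dragging task.

    object_y / target_y: vertical positions; the target is the arena bottom.
    """
    if abs(object_y - target_y) < success_margin:
        return 1.0                                  # sparse success signal
    return grasp_bonus if tool_grasped else 0.0     # tiny shaping term for grasping
```

With so little shaping, the strategies the paper reports (hitting, throwing, error correction) are free to emerge rather than being prescribed.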
- North America > United States > Texas > Brazos County > College Station (0.14)
- Asia > Middle East > Jordan (0.04)
ROS2Learn: a reinforcement learning framework for ROS 2
Nuin, Yue Leire Erro, Lopez, Nestor Gonzalez, Moral, Elias Barba, Juan, Lander Usategui San, Rueda, Alejandro Solano, Vilches, Víctor Mayoral, Kojcev, Risto
We propose a novel framework for Deep Reinforcement Learning (DRL) in modular robotics to train a robot directly from joint states, using traditional robotic tools. We use state-of-the-art implementations of the Proximal Policy Optimization, Trust Region Policy Optimization, and Actor-Critic Kronecker-Factored Trust Region algorithms to learn policies in four different Modular Articulated Robotic Arm (MARA) environments. We support this process with a framework that communicates with typical tools used in robotics, such as Gazebo and the Robot Operating System 2 (ROS 2). We evaluate the algorithms on modular robots through an empirical study in simulation.
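The framework described here exposes Gazebo-simulated MARA arms to standard RL code through a gym-style interface over ROS 2. The sketch below shows the general shape of a rollout against such an environment; the environment id "MARA-v0" and the random-action stand-in for the learned policy are assumptions for illustration, not the framework's verified API.

```python
import gym  # the framework registers its Gazebo/ROS 2 environments behind a gym-style API

def collect_episode(env_id="MARA-v0", max_steps=500):    # env id is an assumption
    env = gym.make(env_id)    # under the hood: Gazebo simulation driven via ROS 2
    obs = env.reset()         # observation: the arm's joint states
    trajectory = []
    for _ in range(max_steps):
        action = env.action_space.sample()   # stand-in for a PPO/TRPO/ACKTR policy
        obs_next, reward, done, info = env.step(action)  # command joints, read back reward
        trajectory.append((obs, action, reward))
        obs = obs_next
        if done:
            break
    env.close()
    return trajectory
```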
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- North America > United States > California > Los Angeles County > Long Beach (0.04)
- Asia > Middle East > Jordan (0.04)
On-Policy Trust Region Policy Optimisation with Replay Buffers
Kangin, Dmitry, Pugeault, Nicolas
Building upon the recent success of deep reinforcement learning methods, we investigate the possibility of improving on-policy reinforcement learning by reusing the data from several consecutive policies. On-policy methods bring many benefits, such as the ability to evaluate each resulting policy. However, they usually discard all information about the policies which existed before. In this work, we propose an adaptation of the replay buffer concept, borrowed from the off-policy learning setting, to create a method that combines the advantages of on- and off-policy learning. To achieve this, the proposed algorithm generalises the $Q$-, value and advantage functions for data from multiple policies. The method uses trust region optimisation while avoiding some common problems of algorithms such as TRPO or ACKTR: it uses hyperparameters to replace the trust region selection heuristics, as well as a trainable covariance matrix instead of a fixed one. In many cases, the method improves the results not only compared to state-of-the-art trust region on-policy learning algorithms such as PPO, ACKTR and TRPO, but also with respect to their off-policy counterpart, DDPG.
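The central mechanism in this abstract, reusing transitions gathered under several recent policies inside a single on-policy-style update, is typically realised with importance sampling. The sketch below shows one plausible surrogate of that kind; the ratio cap and the buffer layout are illustrative assumptions rather than the authors' exact formulation.

```python
import numpy as np

def replay_surrogate(logp_new, logp_gen, advantages, ratio_cap=10.0):
    """Importance-weighted surrogate over a buffer of several past policies.

    logp_new:   log pi_theta(a|s) under the current policy
    logp_gen:   log prob under whichever stored policy generated each sample
    advantages: advantage estimates generalised across the stored policies
    """
    ratio = np.exp(logp_new - logp_gen)    # per-sample importance weight
    ratio = np.minimum(ratio, ratio_cap)   # keep stale samples from dominating
    return -np.mean(ratio * advantages)    # negated so it can be minimised
```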
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
- Europe > United Kingdom > England > Devon > Exeter (0.04)
Improving On-policy Learning with Statistical Reward Accumulation
Deng, Yubin, Yu, Ke, Lin, Dahua, Tang, Xiaoou, Loy, Chen Change
Deep reinforcement learning has achieved significant breakthroughs in recent years. Most methods in deep RL achieve good results via the maximization of the reward signal provided by the environment, typically in the form of discounted cumulative returns. Such reward signals represent the immediate feedback for a particular action performed by an agent. However, tasks with sparse reward signals remain challenging for on-policy methods. In this paper, we introduce an effective characterization of past reward statistics (which can be seen as long-term feedback signals) to supplement this immediate reward feedback. In particular, value functions are learned with multi-critic supervision, enabling complex value functions to be more easily approximated in on-policy learning, even when the reward signals are sparse. We also introduce a novel exploration mechanism called "hot-wiring" that can give a boost to seemingly trapped agents. We demonstrate the effectiveness of our advantage actor multi-critic (A2MC) method across the discrete domains in Atari games as well as continuous domains in the MuJoCo environments. A video demo is provided at https://youtu.be/zBmpf3Yz8tc.
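The multi-critic supervision named in this abstract can be pictured as one shared trunk with an actor head and two value heads trained against different targets: the usual discounted return and a long-term reward statistic. The PyTorch sketch below is schematic; the head names, the specific auxiliary target, and the plain sum of the two regression losses are assumptions, since the paper's exact construction is not reproduced here.

```python
import torch
import torch.nn as nn

class A2MCNet(nn.Module):
    """Actor with two critics: one regressing discounted returns, one
    regressing a long-term reward statistic (illustrative layout)."""
    def __init__(self, obs_dim, n_actions, hidden=128):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(obs_dim, hidden), nn.Tanh())
        self.policy = nn.Linear(hidden, n_actions)  # actor head
        self.v_ret = nn.Linear(hidden, 1)           # critic on discounted returns
        self.v_stat = nn.Linear(hidden, 1)          # critic on past-reward statistics

    def forward(self, obs):
        h = self.trunk(obs)
        return self.policy(h), self.v_ret(h), self.v_stat(h)

def multi_critic_loss(v_ret, v_stat, returns, reward_stats):
    # Each critic head regresses onto its own supervision signal
    return ((v_ret.squeeze(-1) - returns) ** 2).mean() + \
           ((v_stat.squeeze(-1) - reward_stats) ** 2).mean()
```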
An Empirical Analysis of Proximal Policy Optimization with Kronecker-factored Natural Gradients
In this technical report, we consider an approach that combines the PPO objective and K-FAC natural gradient optimization, which we call PPOKFAC. We perform a range of empirical analyses on various aspects of the algorithm, such as sample complexity, training speed, and sensitivity to batch size and training epochs. We observe that PPOKFAC is able to outperform PPO in terms of sample complexity and speed in a range of MuJoCo environments, while being scalable in terms of batch size. In spite of this, adding more epochs is not necessarily helpful for sample efficiency, and PPOKFAC seems to be worse than its A2C counterpart, ACKTR.
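Both ingredients of PPOKFAC are standard on their own: the clipped surrogate below follows the published PPO objective, and the report's combination is to precondition its gradient with a K-FAC natural-gradient step (as in ACKTR) instead of Adam. The NumPy rendering is a sketch of the objective only.

```python
import numpy as np

def ppo_clip_loss(logp_new, logp_old, advantages, eps=0.2):
    """Clipped PPO surrogate; in PPOKFAC its gradient would be fed to a
    K-FAC optimizer rather than a first-order method such as Adam."""
    ratio = np.exp(logp_new - logp_old)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantages
    return -np.mean(np.minimum(unclipped, clipped))  # pessimistic bound, negated
```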
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- North America > Canada > Ontario > Toronto (0.04)
Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation
Wu, Yuhuai, Mansimov, Elman, Grosse, Roger B., Liao, Shun, Ba, Jimmy
In this work, we propose to apply trust region optimization to deep reinforcement learning using a recently proposed Kronecker-factored approximation to the curvature. We extend the framework of natural policy gradient and propose to optimize both the actor and the critic using Kronecker-factored approximate curvature (K-FAC) with trust region; hence we call our method Actor Critic using Kronecker-Factored Trust Region (ACKTR). To the best of our knowledge, this is the first scalable trust region natural gradient method for actor-critic methods. It is also a method that learns non-trivial tasks in continuous control as well as discrete control policies directly from raw pixel inputs. We tested our approach across discrete domains in Atari games as well as continuous domains in the MuJoCo environment. With the proposed methods, we are able to achieve higher rewards and a 2- to 3-fold improvement in sample efficiency on average, compared to previous state-of-the-art on-policy actor-critic methods. Code is available at https://github.com/openai/baselines
- North America > Canada > Ontario > Toronto (0.15)
- Asia > Middle East > Jordan (0.04)
- North America > United States > New York (0.04)
- North America > United States > California > Los Angeles County > Long Beach (0.04)
Elon Musk's Research Venture Has Trained AI To Teach Itself
As part of its effort to find better ways to develop and train "safe artificial general intelligence," OpenAI has been releasing its own versions of reinforcement learning algorithms. They call these OpenAI Baselines, and the most recent additions to these algorithms are two baselines that are meant to enhance machine learning performance by making it more efficient. The first is a baseline implementation called Actor Critic using Kronecker-factored Trust Region (ACKTR). Developed by researchers from the University of Toronto (UofT) and New York University (NYU), ACKTR improves on the way AI policies perform deep reinforcement learning -- learning that is accomplished only by trial and error, and obtained only through raw observation. In a paper published online, the UofT and NYU researchers used simulated robots and Atari games to test how ACKTR learns control policies.
- North America > Canada > Ontario > Toronto (0.57)
- North America > United States > New York (0.26)