AITopics

1901.10995

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > Massachusetts > Middlesex County > Reading (0.04)
North America > United States > Iowa (0.04)
(3 more...)

Genre: Research Report > New Finding (0.45)

Industry: Leisure & Entertainment > Games > Computer Games (0.86)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
(2 more...)

Chu, Casey, Blanchet, Jose, Glynn, Peter

Probability Functional Descent: A Unifying Perspective on GANs, Variational Inference, and Reinforcement Learning

arXiv.org Machine LearningJan-30-2019

The goal of this paper is to provide a unifying view of a wide range of problems of interest in machine learning by framing them as the minimization of functionals defined on the space of probability measures. In particular, we show that generative adversarial networks, variational inference, and actor-critic methods in reinforcement learning can all be seen through the lens of our framework. We then discuss a generic optimization algorithm for our formulation, called probability functional descent (PFD), and show how this algorithm recovers existing methods developed independently in the settings mentioned earlier.

algorithm, descent step, influence function, (10 more...)

1901.10691

Country:

North America > United States > California > Santa Clara County > Stanford (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

arXiv.org Artificial IntelligenceJan-30-2019

Transfer in Deep Reinforcement Learning Using Successor Features and Generalised Policy Improvement

Barreto, André, Borsa, Diana, Quan, John, Schaul, Tom, Silver, David, Hessel, Matteo, Mankowitz, Daniel, Žídek, Augustin, Munos, Rémi

The ability to transfer skills across tasks has the potential to scale up reinforcement learning (RL) agents to environments currently out of reach. Recently, a framework based on two ideas, successor features (SFs) and generalised policy improvement (GPI), has been introduced as a principled way of transferring skills. In this paper we extend the SFs & GPI framework in two ways. One of the basic assumptions underlying the original formulation of SFs & GPI is that rewards for all tasks of interest can be computed as linear combinations of a fixed set of features. We relax this constraint and show that the theoretical guarantees supporting the framework can be extended to any set of tasks that only differ in the reward function. Our second contribution is to show that one can use the reward functions themselves as features for future tasks, without any loss of expressiveness, thus removing the need to specify a set of features beforehand. This makes it possible to combine SFs & GPI with deep learning in a more stable way. We empirically verify this claim on a complex 3D environment where observations are images from a first-person perspective. We show that the transfer promoted by SFs & GPI leads to very good policies on unseen tasks almost instantaneously. We also describe how to learn policies specialised to the new tasks in a way that allows them to be added to the agent's set of skills, and thus be reused in the future.

deep reinforcement learning, environment step, successor feature, (11 more...)

arXiv.org Artificial Intelligence

1901.10964

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
North America > United States (0.04)
Europe > Sweden > Stockholm > Stockholm (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Mishkin, Dmytro, Dosovitskiy, Alexey, Koltun, Vladlen

Benchmarking Classic and Learned Navigation in Complex 3D Environments

arXiv.org Artificial IntelligenceJan-30-2019

Navigation research is attracting renewed interest with the advent of learning-based methods. However, this new line of work is largely disconnected from well-established classic navigation approaches. In this paper, we take a step towards coordinating these two directions of research. We set up classic and learning-based navigation systems in common simulated environments and thoroughly evaluate them in indoor spaces of varying complexity, with access to different sensory modalities. Additionally, we measure human performance in the same environments. We find that a classic pipeline, when properly tuned, can perform very well in complex cluttered environments. On the other hand, learned systems can operate more robustly with a limited sensor suite. Both approaches are still far from human-level performance.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

arXiv.org Artificial Intelligence

1901.10915

Country: Europe (0.28)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

#artificialintelligenceJan-29-2019, 11:03:25 GMT

Berkeley startup to train robots like puppets – RtoZ.Org – Latest Technology News

The Researchers from the University of California, Berkeley, have launched a start-up, Embodied Intelligence, Inc., to use the latest techniques of deep reinforcement learning and artificial intelligence to make industrial robots easily teachable. Robots today must be programmed by writing computer code, but imagine donning a VR headset and virtually guiding a robot through a task, like you would move the arms of a puppet, and then letting the robot take it from there. That's the vision of Pieter Abbeel, a professor of electrical engineering and computer science at the University of California, Berkeley, and his students, Peter Chen, Rocky Duan and Tianhao Zhang, who have launched a startup, Embodied Intelligence Inc., to use the latest techniques of deep reinforcement learning and artificial intelligence to make industrial robots easily teachable. "Right now, if you want to set up a robot, you program that robot to do what you want it to do, which takes a lot of time and a lot of expertise," said Abbeel, who is currently on leave to turn his vision into reality. "With our advances in machine learning, we can write a piece of software once -- machine learning code that enables the robot to learn -- and then when the robot needs to be equipped with a new skill, we simply provide new data."

artificial intelligence, machine learning, reinforcement learning, (12 more...)

#artificialintelligence

Country: North America > United States > California > Alameda County > Berkeley (0.46)

Industry:

Education (0.70)
Leisure & Entertainment > Games > Go (0.33)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

#artificialintelligenceJan-29-2019, 05:04:30 GMT

Quantum generative adversarial learning in a superconducting quantum circuit

Machine learning (1), or more broadly artificial intelligence (2), represents an important area with general practical applications where near-term quantum devices may offer a substantial speedup over classical ones. With this vision, an intriguing interdisciplinary field of quantum machine learning/artificial intelligence has emerged and attracted tremendous attention in recent years (3, 4). A number of quantum algorithms that promise exponential speedups have been theoretically proposed (3–6), and some were demonstrated in proof-of-principle experiments (7, 8). Yet, in most of these previous scenarios, the input datasets considered are typically classical. As a result, certain costly processes or techniques, such as quantum random access memories (9), are required to first map the classical data to quantum wave functions so as to be processed by quantum devices, rendering the potential speedups nullified (10).

artificial intelligence, machine learning, reinforcement learning, (6 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.43)

Safe, Efficient, and Comfortable Velocity Control based on Reinforcement Learning for Autonomous Driving

Zhu, Meixin, Wang, Yinhai, Hu, Jingyun, Wang, Xuesong, Ke, Ruimin

A model used for velocity control during car following was proposed based on deep reinforcement learning (RL). To fulfil the multi-objectives of car following, a reward function reflecting driving safety, efficiency, and comfort was constructed. With the reward function, the RL agent learns to control vehicle speed in a fashion that maximizes cumulative rewards, through trials and errors in the simulation environment. A total of 1,341 car-following events extracted from the Next Generation Simulation (NGSIM) dataset were used to train the model. Car-following behavior produced by the model were compared with that observed in the empirical NGSIM data, to demonstrate the model's ability to follow a lead vehicle safely, efficiently, and comfortably. Results show that the model demonstrates the capability of safe, efficient, and comfortable velocity control in that it 1) has small percentages (8\%) of dangerous minimum time to collision values (\textless\ 5s) than human drivers in the NGSIM data (35\%); 2) can maintain efficient and safe headways in the range of 1s to 2s; and 3) can follow the lead vehicle comfortably with smooth acceleration. The results indicate that reinforcement learning methods could contribute to the development of autonomous driving systems.

car-following event, headway, ngsim data, (14 more...)

1902.00089

Country:

North America > United States > Florida > Orange County > Orlando (0.14)
North America > United States > California > Los Angeles County > Los Angeles (0.14)
Asia > China > Shanghai > Shanghai (0.05)
(6 more...)

Genre: Research Report > New Finding (0.88)

Industry:

Transportation > Ground > Road (1.00)
Automobiles & Trucks (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Krupnik, Orr, Mordatch, Igor, Tamar, Aviv

Multi Agent Reinforcement Learning with Multi-Step Generative Models

The dynamics between agents and the environment are an important component of multi-agent Reinforcement Learning (RL), and learning them provides a basis for decision making. However, a major challenge in optimizing a learned dynamics model is the accumulation of error when predicting multiple steps into the future. Recent advances in variational inference provide model based solutions that predict complete trajectory segments, and optimize over a latent representation of trajectories. For single-agent scenarios, several recent studies have explored this idea, and showed its benefits over conventional methods. In this work, we extend this approach to the multi-agent case, and effectively optimize over a latent space that encodes multi-agent strategies. We discuss the challenges in optimizing over a latent variable model for multiple agents, both in the optimization algorithm and in the model representation, and propose a method for both cooperative and competitive settings based on risk-sensitive optimization. We evaluate our method on tasks in the multi-agent particle environment and on a simulated RoboCup domain.

agent, multi agent reinforcement learning, trajectory, (13 more...)

1901.10251

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Israel > Haifa District > Haifa (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Leisure & Entertainment > Sports > Soccer (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Tallec, Corentin, Blier, Léonard, Ollivier, Yann

Making Deep Q-learning methods robust to time discretization

Despite remarkable successes, Deep Reinforcement Learning (DRL) is not robust to hyperparameterization, implementation details, or small environment changes (Henderson et al. 2017, Zhang et al. 2018). Overcoming such sensitivity is key to making DRL applicable to real world problems. In this paper, we identify sensitivity to time discretization in near continuous-time environments as a critical factor; this covers, e.g., changing the number of frames per second, or the action frequency of the controller. Empirically, we find that Q-learning-based approaches such as Deep Q- learning (Mnih et al., 2015) and Deep Deterministic Policy Gradient (Lillicrap et al., 2015) collapse with small time steps. Formally, we prove that Q-learning does not exist in continuous time. We detail a principled way to build an off-policy RL algorithm that yields similar performances over a wide range of time discretizations, and confirm this robustness empirically.

algorithm, deep q-learning method robust, time discretization, (11 more...)

1901.09732

Country: Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Wu, Yueh-Hua, Charoenphakdee, Nontawat, Bao, Han, Tangkaratt, Voot, Sugiyama, Masashi

Imitation Learning from Imperfect Demonstration

Imitation learning (IL) has become of great interest because obtaining demonstrations is usually easier than designing reward. Reward is a signal to instruct agents to complete the desired tasks. However, ill-designed reward functions usually lead to unexpected behaviors [Amodei et al., 2016; Dewey, 2014; Everitt and Hutter, 2016]. There are two main approaches that can be used to solve IL: behavioral cloning (BC) [Schaal, 1999], which adopts supervised learning approaches to learn an action predictor that is trained directly from demonstration data; and apprenticeship learning (AL), which attempts to find a policy that is better than the expert policy for a class of cost functions [Abbeel and Ng, 2004]. Even though BC can be trained with supervised learning approaches directly, it has been shown that BC cannot imitate the expert policy without a large amount of demonstration data for not considering the transition of environments [Ross et al., 2011].

demonstration, optimal policy, unlabeled data, (15 more...)

1901.09387

Country:

North America > United States > New York > New York County > New York City (0.04)
Asia > Taiwan (0.04)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)

Genre: Research Report (0.64)

Industry: Leisure & Entertainment > Sports > Basketball (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)