 Reinforcement Learning


AI learning technique may illustrate function of reward pathways in the brain

#artificialintelligence

A team of researchers from DeepMind, University College London and Harvard University has found that lessons learned in applying learning techniques to AI systems may help explain how reward pathways work in the brain. In their paper published in the journal Nature, the group describes comparing distributional reinforcement learning in a computer with dopamine processing in the mouse brain, and what they learned from it. Prior research has shown that dopamine produced in the brain is involved in reward processing--it is produced when something good happens, and its expression results in feelings of pleasure. Some studies have also suggested that the neurons in the brain that respond to the presence of dopamine all respond in the same way--an event causes a person or a mouse to feel either good or bad. Other studies have suggested that neuronal response is more of a gradient.


Reinforcement Learning -- The Fellowship of Las Vegas

#artificialintelligence

I wanted to name this Adventures in Reinforcement Learning. Then I realized that it was probably the lamest name I could ever create. Wikipedia can explain it better. Then why do you even have a blog post? Well, I had to take a graduate-level AI class to understand Reinforcement Learning well enough to try playing around with examples I found online and tweak them to my interests, ultimately creating something in an hour while listening to RetroWave.


cube2net: Efficient Query-Specific Network Construction with Data Cube Organization

arXiv.org Machine Learning

Networks are widely used to model objects with interactions and have enabled various downstream applications. However, in the real world, network mining is often done on particular query sets of objects, which does not require the construction and computation of networks including all objects in the datasets. In this work, for the first time, we propose to address the problem of query-specific network construction, to break the efficiency bottlenecks of existing network mining algorithms and facilitate various downstream tasks. To deal with real-world massive networks with complex attributes, we propose to leverage the well-developed data cube technology to organize network objects w.r.t. their essential attributes. An efficient reinforcement learning algorithm is then developed to automatically explore the data cube structures and construct the optimal query-specific networks. With extensive experiments on two classic network mining tasks over different real-world large datasets, we show that our proposed cube2net pipeline is general, and much more effective and efficient in query-specific network construction, compared with other methods that do not leverage data cubes or reinforcement learning.


Effects of sparse rewards of different magnitudes in the speed of learning of model-based actor critic methods

arXiv.org Machine Learning

Actor critic methods with sparse rewards in model-based deep reinforcement learning typically require a deterministic binary reward function that reflects only two possible outcomes: whether, at each step, the goal has been achieved or not. Our hypothesis is that we can influence an agent to learn faster by applying an external environmental pressure during training, which adversely impacts its ability to get higher rewards. As such, we deviate from the classical paradigm of sparse rewards and add a uniformly sampled reward value to the baseline reward to show that (1) sample efficiency of the training process can be correlated to the adversity experienced during training, (2) it is possible to achieve higher performance in less time and with fewer resources, (3) we can reduce the performance variability experienced seed over seed, (4) there is a maximum point after which more pressure will not generate better results, and (5) random positive incentives have an adverse effect when using a negative reward strategy, making an agent under those conditions learn poorly and more slowly. These results have been shown to be valid for Deep Deterministic Policy Gradients using Hindsight Experience Replay in a well-known MuJoCo environment, but we argue that they could be generalized to other methods and environments as well.
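
The reward modification described in this abstract can be sketched as follows. This is a minimal illustration rather than the authors' code: the 0 / -1 baseline convention, the function names, and the sampling range are all assumptions made for the example.

```python
import random


def sparse_reward(goal_reached: bool) -> float:
    """Binary sparse reward (assumed convention: 0 on success, -1 otherwise)."""
    return 0.0 if goal_reached else -1.0


def perturbed_reward(goal_reached: bool, low: float, high: float,
                     rng: random.Random) -> float:
    """Baseline reward plus a value drawn uniformly from [low, high].

    A negative range acts as extra pressure on the agent; a positive range
    acts as a random incentive. The range is a hypothetical parameter.
    """
    return sparse_reward(goal_reached) + rng.uniform(low, high)


# Five failed steps under a hypothetical pressure range of [-0.5, 0.0].
rng = random.Random(0)
samples = [perturbed_reward(False, -0.5, 0.0, rng) for _ in range(5)]
```

Under these assumptions every failed step yields a reward between -1.5 and -1.0, so the agent still sees a sparse signal, only with a randomly varying penalty on top.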


Graph Ordering: Towards the Optimal by Learning

arXiv.org Artificial Intelligence

Graph representation learning has achieved remarkable success in many graph-based applications, such as node classification, link prediction, and community detection. These models are usually designed to preserve the vertex information at different granularities and reduce the problems in discrete space to machine learning tasks in continuous space. However, despite this fruitful progress, some kinds of graph applications, such as graph compression and edge partition, are very hard to reduce to graph representation learning tasks. Moreover, these problems are closely related to reformulating a global layout for a specific graph, which is an important NP-hard combinatorial optimization problem: graph ordering. In this paper, we propose to attack the graph ordering problem behind such applications with a novel learning approach. In contrast to greedy algorithms based on predefined heuristics, we propose a neural network model, Deep Order Network (DON), to capture the hidden locality structure from partial vertex order sets. Supervised by sampled partial orders, DON has the ability to infer unseen combinations. Furthermore, to alleviate the combinatorial explosion in the training space of DON and make partial vertex order sampling efficient, we employ a reinforcement learning model, the Policy Network, to automatically adjust the partial order sampling probabilities during the training phase of DON. In this way, the Policy Network can improve the training efficiency and guide DON to evolve towards a more effective model automatically. Comprehensive experiments on both synthetic and real data validate that DON-RL consistently outperforms the current state-of-the-art heuristic algorithm. Two case studies on graph compression and edge partitioning demonstrate the potential power of DON-RL in real applications.


Multi-agent Motion Planning for Dense and Dynamic Environments via Deep Reinforcement Learning

arXiv.org Artificial Intelligence

Abstract -- This paper introduces a hybrid algorithm of deep reinforcement learning (RL) and force-based motion planning (FMP) to solve the distributed motion planning problem in dense and dynamic environments. Individually, RL and FMP algorithms each have their own limitations. FMP is not able to produce time-optimal paths and existing RL solutions are not able to produce collision-free paths in dense environments. Therefore, we first tried improving the performance of recent RL approaches by introducing a new reward function that not only eliminates the requirement of a supervised learning (SL) pre-training step but also decreases the chance of collision in crowded environments. That improved things, but there were still a lot of failure cases. So, we developed a hybrid approach to leverage the simpler FMP approach in stuck, simple and high-risk cases, and continue using RL for normal cases in which FMP cannot produce an optimal path. Also, we extend the GA3C-CADRL algorithm to 3D environments. Simulation results show that the proposed algorithm outperforms both deep RL and FMP algorithms and produces up to 50% more successful scenarios than deep RL and up to 75% less extra time to reach the goal than FMP. Index Terms -- Motion planning, distributed algorithms, collision avoidance, deep learning, reinforcement learning, trajectory optimization, hybrid control. Introduction -- Multi-agent motion planning has recently attracted much interest in the research community and has many applications including robot navigation among pedestrians, self-driving cars, and drone shows.
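
The switching idea in this abstract (FMP for stuck, simple and high-risk cases, RL otherwise) might look roughly like the sketch below. The abstract does not say how those cases are detected, so the `AgentStatus` fields, the thresholds, and the `choose_planner` function are hypothetical stand-ins, not the paper's criteria.

```python
from dataclasses import dataclass


@dataclass
class AgentStatus:
    nearest_neighbor_dist: float  # distance to the closest other agent
    neighbors_in_range: int       # number of agents within sensing range
    recent_progress: float        # distance gained toward the goal recently


def choose_planner(status: AgentStatus,
                   risk_dist: float = 0.5,
                   stuck_progress: float = 0.05) -> str:
    """Return "FMP" for stuck, simple, or high-risk cases, else "RL".

    All three tests are assumed heuristics for illustration only.
    """
    if status.recent_progress < stuck_progress:
        return "FMP"  # stuck: almost no progress toward the goal
    if status.neighbors_in_range == 0:
        return "FMP"  # simple: nobody nearby to negotiate with
    if status.nearest_neighbor_dist < risk_dist:
        return "FMP"  # high-risk: another agent is dangerously close
    return "RL"       # normal case: use the learned policy


normal = choose_planner(AgentStatus(2.0, 3, 1.0))
risky = choose_planner(AgentStatus(0.2, 3, 1.0))
```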


Plato Dialogue System: A Flexible Conversational AI Research Platform

arXiv.org Artificial Intelligence

As the field of Spoken Dialogue Systems and Conversational AI grows, so does the need for tools and environments that abstract away implementation details in order to expedite the development process, lower the barrier of entry to the field, and offer a common test-bed for new ideas. In this paper, we present Plato, a flexible Conversational AI platform written in Python that supports any kind of conversational agent architecture, from standard architectures to architectures with jointly-trained components, single- or multi-party interactions, and offline or online training of any conversational agent component. Plato has been designed to be easy to understand and debug and is agnostic to the underlying learning frameworks that train each component.


Lipschitz Lifelong Reinforcement Learning

arXiv.org Artificial Intelligence

We consider the problem of knowledge transfer when an agent is facing a series of Reinforcement Learning (RL) tasks. We introduce a novel metric between Markov Decision Processes (MDPs) and establish that close MDPs have close optimal value functions. Formally, the optimal value functions are Lipschitz continuous with respect to the task space. These theoretical results lead us to a value transfer method for Lifelong RL, which we use to build a PAC-MDP algorithm with improved convergence rate. We illustrate the benefits of the method in Lifelong RL experiments.


Major AI breakthrough unlocks secrets of human brain

#artificialintelligence

Researchers at Google-owned DeepMind discovered that a recent development in computer science regarding reinforcement learning could be applied to how the brain's dopamine system works. The research, published in the scientific journal Nature, has implications for better understanding mental health, as well as for learning and motivation disorders. It found evidence that something referred to as "distributional reinforcement learning" used in AI algorithms actually mimics the dopamine reward system within the brain. The technique allows the brain to represent a probability distribution over future rewards rather than focussing on actions that result in immediate rewards. "We found that dopamine neurons in the brain were each tuned to different levels of pessimism or optimism," the researchers explained in a blog post describing their discovery.
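
The "pessimism or optimism" tuning mentioned in the quote can be illustrated with a toy example: several value predictors learn from the same rewards but weight positive and negative surprises differently. This is only a sketch under simplifying assumptions (a single state, a made-up two-outcome reward, hypothetical names and learning rate), not the model used in the paper.

```python
import random


def train_predictors(rewards, taus, lr=0.02):
    """Learn one reward prediction per asymmetry level tau in (0, 1).

    A predictor with high tau scales up positive prediction errors
    (optimistic); one with low tau scales up negative errors (pessimistic).
    """
    values = [0.0 for _ in taus]
    for r in rewards:
        for i, tau in enumerate(taus):
            delta = r - values[i]                  # reward prediction error
            scale = tau if delta > 0 else 1.0 - tau
            values[i] += lr * scale * delta
    return values


# Toy reward: 0 or 10 with equal probability (an assumption for the demo).
rng = random.Random(0)
rewards = [rng.choice([0.0, 10.0]) for _ in range(5000)]
pessimist, neutral, optimist = train_predictors(rewards, taus=[0.1, 0.5, 0.9])
```

Taken together, the spread of learned values carries information about the shape of the reward distribution, which a single average prediction would lose.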


SEERL: Sample Efficient Ensemble Reinforcement Learning

#artificialintelligence

Ensemble learning is a very prevalent method employed in machine learning. The relative success of ensemble methods is attributed to their ability to tackle a wide range of instances and complex problems that require different low-level approaches. However, ensemble methods are less popular in reinforcement learning owing to the high sample complexity and computational expense involved. We present a new training and evaluation framework for model-free algorithms that use ensembles of policies obtained from a single training instance. These policies are diverse in nature and are learned through directed perturbation of the model parameters at regular intervals.
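
The excerpt does not say how the ensemble's policies are combined at evaluation time. One common choice for discrete actions is majority voting, sketched below as an assumption; the `ensemble_action` function and the toy policies are hypothetical and not taken from SEERL.

```python
from collections import Counter


def ensemble_action(policies, state):
    """Pick the action chosen by the most policies (majority vote)."""
    votes = [policy(state) for policy in policies]
    return Counter(votes).most_common(1)[0][0]


# Three toy policies standing in for snapshots from one training run.
policies = [
    lambda s: 1 if s > 0 else 0,
    lambda s: 1 if s > -1 else 0,
    lambda s: 0,
]
action = ensemble_action(policies, 0.5)
```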