AITopics

Genre: Research Report > New Finding (0.75)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.41)

#artificialintelligence, Jan-18-2020, 00:23:06 GMT

Reinforcement Learning -- The Fellowship of Las Vegas

I wanted to name this Adventures in Reinforcement Learning. Then I realized that it was probably the lamest name I could ever create. Wikipedia can explain it better. Then why do you even have a blog post? Well, I had to take a graduate-level AI class to understand Reinforcement Learning well enough to try playing around with examples I found online and tweak them to my interests, ultimately creating something in an hour while listening to RetroWave.

probability, reinforcement learning, slot machine, (9 more...)

Country:

North America > United States > Nevada > Clark County > Las Vegas (0.42)
Europe > Monaco (0.05)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.84)

arXiv.org Machine Learning, Jan-18-2020

cube2net: Efficient Query-Specific Network Construction with Data Cube Organization

Yang, Carl, Liu, Mengxiong, He, Frank, Peng, Jian, Han, Jiawei

Networks are widely used to model objects with interactions and have enabled various downstream applications. However, in the real world, network mining is often done on particular query sets of objects, which does not require the construction and computation of networks including all objects in the datasets. In this work, for the first time, we propose to address the problem of query-specific network construction, to break the efficiency bottlenecks of existing network mining algorithms and facilitate various downstream tasks. To deal with real-world massive networks with complex attributes, we propose to leverage the well-developed data cube technology to organize network objects w.r.t. their essential attributes. An efficient reinforcement learning algorithm is then developed to automatically explore the data cube structures and construct the optimal query-specific networks. With extensive experiments of two classic network mining tasks on different real-world large datasets, we show that our proposed cube2net pipeline is general, and much more effective and efficient in query-specific network construction, compared with other methods without the leverage of data cube or reinforcement learning.

artificial intelligence, machine learning, reinforcement learning, (13 more...)

2002.00841

Country:

North America > United States > Illinois > Champaign County > Urbana (0.14)
North America > United States > Nevada (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)

Genre: Research Report (1.00)

Industry: Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.94)

Vargas, Juan, Andjelic, Lazar, Farimani, Amir Barati

Effects of sparse rewards of different magnitudes in the speed of learning of model-based actor critic methods

arXiv.org Machine Learning, Jan-18-2020

Actor critic methods with sparse rewards in model-based deep reinforcement learning typically require a deterministic binary reward function that reflects only two possible outcomes: whether, at each step, the goal has been achieved or not. Our hypothesis is that we can influence an agent to learn faster by applying an external environmental pressure during training, which adversely impacts its ability to get higher rewards. As such, we deviate from the classical paradigm of sparse rewards and add a uniformly sampled reward value to the baseline reward to show that (1) sample efficiency of the training process can be correlated to the adversity experienced during training, (2) it is possible to achieve higher performance in less time and with fewer resources, (3) we can reduce the performance variability experienced seed over seed, (4) there is a maximum point after which more pressure will not generate better results, and (5) random positive incentives have an adverse effect when using a negative reward strategy, making an agent under those conditions learn poorly and more slowly. These results have been shown to be valid for Deep Deterministic Policy Gradients using Hindsight Experience Replay in a well-known MuJoCo environment, but we argue that they could be generalized to other methods and environments as well.
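The reward modification this abstract describes, a binary sparse reward with a uniformly sampled value added to it, might be sketched as follows. This is a minimal illustration and not the authors' code: the function name `pressured_sparse_reward`, the 0 / -1 baseline convention, and the `low` / `high` bounds are all assumptions made for the example.

```python
import random


def pressured_sparse_reward(goal_reached, low, high, rng=random):
    """Binary sparse reward plus a uniformly sampled offset.

    Assumptions (not stated in the abstract): the baseline is 0 when the
    goal is reached and -1 otherwise, and the offset is drawn from
    U(low, high).
    """
    baseline = 0.0 if goal_reached else -1.0
    offset = rng.uniform(low, high)
    return baseline + offset


# A negative offset range acts as extra "pressure" on every step.
rng = random.Random(0)
rewards = [pressured_sparse_reward(False, -0.5, 0.0, rng) for _ in range(5)]
```

With a non-positive range such as (-0.5, 0.0), every failed step is rewarded somewhere between -1.5 and -1.0, so the agent never receives a random positive incentive on top of a negative baseline.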

bonus reward, experiment, reward function, (15 more...)

arXiv.org Machine Learning

2001.06725

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.05)
North America > Puerto Rico > San Juan > San Juan (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

arXiv.org Artificial Intelligence, Jan-18-2020

Graph Ordering: Towards the Optimal by Learning

Zhao, Kangfei, Rong, Yu, Yu, Jeffrey Xu, Huang, Junzhou, Zhang, Hao

Graph representation learning has achieved remarkable success in many graph-based applications, such as node classification, link prediction, and community detection. These models are usually designed to preserve the vertex information at different granularity and reduce the problems in discrete space to some machine learning tasks in continuous space. However, regardless of the fruitful progress, for some kinds of graph applications, such as graph compression and edge partition, it is very hard to reduce them to some graph representation learning tasks. Moreover, these problems are closely related to reformulating a global layout for a specific graph, which is an important NP-hard combinatorial optimization problem: graph ordering. In this paper, we propose to attack the graph ordering problem behind such applications by a novel learning approach. Distinguished from greedy algorithms based on predefined heuristics, we propose a neural network model: Deep Order Network (DON) to capture the hidden locality structure from partial vertex order sets. Supervised by sampled partial order, DON has the ability to infer unseen combinations. Furthermore, to alleviate the combinatorial explosion in the training space of DON and to make partial vertex order sampling efficient, we employ a reinforcement learning model: the Policy Network, to adjust the partial order sampling probabilities during the training phase of DON automatically. To this end, the Policy Network can improve the training efficiency and guide DON to evolve towards a more effective model automatically. Comprehensive experiments on both synthetic and real data validate that DON-RL outperforms the current state-of-the-art heuristic algorithm consistently. Two case studies on graph compression and edge partitioning demonstrate the potential power of DON-RL in real applications.

graph, policy network, vertex, (15 more...)

2001.06631

Country:

Asia > China > Hong Kong (0.04)
North America > United States > District of Columbia > Washington (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
Europe > Hungary (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
(2 more...)

Semnani, Samaneh Hosseini, Liu, Hugh, Everett, Michael, de Ruiter, Anton, How, Jonathan P.

Multi-agent Motion Planning for Dense and Dynamic Environments via Deep Reinforcement Learning

arXiv.org Artificial Intelligence, Jan-18-2020

Abstract: This paper introduces a hybrid algorithm of deep reinforcement learning (RL) and force-based motion planning (FMP) to solve the distributed motion planning problem in dense and dynamic environments. Individually, RL and FMP algorithms each have their own limitations. FMP is not able to produce time-optimal paths and existing RL solutions are not able to produce collision-free paths in dense environments. Therefore, we first tried improving the performance of recent RL approaches by introducing a new reward function that not only eliminates the requirement of a supervised learning (SL) pre-training step but also decreases the chance of collision in crowded environments. That improved things, but there were still a lot of failure cases. So, we developed a hybrid approach to leverage the simpler FMP approach in stuck, simple and high-risk cases, and continue using RL for normal cases in which FMP cannot produce an optimal path. Also, we extend the GA3C-CADRL algorithm to 3D environments. Simulation results show that the proposed algorithm outperforms both deep RL and FMP algorithms and produces up to 50% more successful scenarios than deep RL and up to 75% less extra time to reach the goal than FMP.

Index Terms: Motion planning, distributed algorithms, collision avoidance, deep learning, reinforcement learning, trajectory optimization, hybrid control.

Introduction: Multi-agent motion planning has recently attracted much interest in the research community and has many applications including robot navigation among pedestrians, self-driving cars, and drone shows.

agent, algorithm, scenario, (14 more...)

2001.06627

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > California (0.04)
(2 more...)

Genre: Research Report > New Finding (0.34)

Industry:

Transportation (0.90)
Information Technology (0.66)

Technology:

Information Technology > Artificial Intelligence > Robots > Robot Planning & Action (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Papangelis, Alexandros, Namazifar, Mahdi, Khatri, Chandra, Wang, Yi-Chia, Molino, Piero, Tur, Gokhan

Plato Dialogue System: A Flexible Conversational AI Research Platform

arXiv.org Artificial Intelligence, Jan-17-2020

As the field of Spoken Dialogue Systems and Conversational AI grows, so does the need for tools and environments that abstract away implementation details in order to expedite the development process, lower the barrier of entry to the field, and offer a common test-bed for new ideas. In this paper, we present Plato, a flexible Conversational AI platform written in Python that supports any kind of conversational agent architecture, from standard architectures to architectures with jointly-trained components, single- or multi-party interactions, and offline or online training of any conversational agent component. Plato has been designed to be easy to understand and debug and is agnostic to the underlying learning frameworks that train each component.

agent, conversational agent, plato, (16 more...)

2001.06463

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
Europe > France (0.04)
(7 more...)

Genre: Research Report (0.64)

Industry: Education > Educational Setting > Online (0.69)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Lecarpentier, Erwan, Abel, David, Asadi, Kavosh, Jinnai, Yuu, Rachelson, Emmanuel, Littman, Michael L.

Lipschitz Lifelong Reinforcement Learning

arXiv.org Artificial Intelligence, Jan-17-2020

We consider the problem of knowledge transfer when an agent is facing a series of Reinforcement Learning (RL) tasks. We introduce a novel metric between Markov Decision Processes and establish that close MDPs have close optimal value functions. Formally, the optimal value functions are Lipschitz continuous with respect to the tasks space. These theoretical results lead us to a value transfer method for Lifelong RL, which we use to build a PAC-MDP algorithm with improved convergence rate. We illustrate the benefits of the method in Lifelong RL experiments.
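Read informally, the Lipschitz statement in this abstract has the generic shape below. The symbols are assumptions chosen for illustration: $d$ stands for the metric between MDPs, $L$ for a Lipschitz constant, and $V^*_M$ for the optimal value function of task $M$. The abstract itself does not give the exact metric or constant, so this is only the textbook form of such a bound.

```latex
\[
  \bigl| V^*_{M}(s) - V^*_{M'}(s) \bigr| \;\le\; L \, d(M, M')
  \qquad \text{for every state } s .
\]
```

A small distance between two tasks therefore bounds how far apart their optimal values can be, which is what makes carrying value estimates from one task over to the next plausible.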

algorithm, mdp, probability, (13 more...)

2001.05411

Country:

North America > United States > Rhode Island > Providence County > Providence (0.04)
Europe > France > Occitanie > Haute-Garonne > Toulouse (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

#artificialintelligence, Jan-16-2020, 23:14:32 GMT

Major AI breakthrough unlocks secrets of human brain

Researchers at Google-owned DeepMind discovered that a recent development in computer science regarding reinforcement learning could be applied to how the brain's dopamine system works. The research, published in the scientific journal Nature, has implications for better understanding mental health, as well as for learning and motivation disorders. It found evidence that something referred to as "distributional reinforcement learning" used in AI algorithms actually mimics the dopamine reward system within the brain. The technique allows the brain to represent a probability distribution over future rewards rather than focussing on actions that result in immediate rewards. "We found that dopamine neurons in the brain were each tuned to different levels of pessimism or optimism," the researchers explained in a blog post describing their discovery.
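One simple way to picture units "tuned to different levels of pessimism or optimism" is a value estimate that scales positive and negative prediction errors by different learning rates. The sketch below is an illustrative toy, not DeepMind's model: the function `learn_value`, the learning rates, and the coin-flip reward stream are all assumptions made for the example.

```python
import random


def learn_value(rewards, alpha_pos, alpha_neg, value=0.0):
    """Update a single value estimate with asymmetric learning rates.

    Positive prediction errors are scaled by alpha_pos and negative ones by
    alpha_neg, so alpha_pos > alpha_neg yields an "optimistic" estimate and
    alpha_pos < alpha_neg a "pessimistic" one.
    """
    for r in rewards:
        error = r - value
        rate = alpha_pos if error > 0 else alpha_neg
        value += rate * error
    return value


rng = random.Random(0)
# Hypothetical reward stream: 0 or 1 with equal probability.
rewards = [float(rng.random() < 0.5) for _ in range(20000)]

pessimist = learn_value(rewards, alpha_pos=0.002, alpha_neg=0.008)
neutral = learn_value(rewards, alpha_pos=0.005, alpha_neg=0.005)
optimist = learn_value(rewards, alpha_pos=0.008, alpha_neg=0.002)
```

All three units see the same rewards, yet they settle at different values: the pessimist below the mean, the optimist above it. Taken together, a population of such units carries information about the spread of rewards and not only their average.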

brain, breakthrough, reinforcement, (9 more...)

Genre: Research Report (0.39)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.52)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.56)

#artificialintelligence, Jan-16-2020, 18:51:37 GMT

SEERL: Sample Efficient Ensemble Reinforcement Learning

Ensemble learning is a very prevalent method employed in machine learning. The relative success of ensemble methods is attributed to their ability to tackle a wide range of instances and complex problems that require different low-level approaches. However, ensemble methods are relatively less popular in reinforcement learning owing to the high sample complexity and computational expense involved. We present a new training and evaluation framework for model-free algorithms that use ensembles of policies obtained from a single training instance. These policies are diverse in nature and are learned through directed perturbation of the model parameters at regular intervals.

ensemble method, sample efficient ensemble reinforcement learning, seerl

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.65)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.64)