AITopics

1912.11077

Country: North America > United States > New Jersey > Middlesex County > Piscataway (0.04)

Genre: Research Report > New Finding (0.68)

Industry: Leisure & Entertainment > Games > Computer Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Games (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)

arXiv.org Artificial IntelligenceDec-23-2019

Towards Practical Multi-Object Manipulation using Relational Reinforcement Learning

Li, Richard, Jabri, Allan, Darrell, Trevor, Agrawal, Pulkit

Learning robotic manipulation tasks using reinforcement learning with sparse rewards is currently impractical due to the outrageous data requirements. Many practical tasks require manipulation of multiple objects, and the complexity of such tasks increases with the number of objects. Learning from a curriculum of increasingly complex tasks appears to be a natural solution, but unfortunately, does not work for many scenarios. We hypothesize that the inability of the state-of-the-art algorithms to effectively utilize a task curriculum stems from the absence of inductive biases for transferring knowledge from simpler to complex tasks. We show that graph-based relational architectures overcome this limitation and enable learning of complex tasks when provided with a simple curriculum of tasks with increasing numbers of objects. We demonstrate the utility of our framework on a simulated block stacking task. Starting from scratch, our agent learns to stack six blocks into a tower. Despite using step-wise sparse rewards, our method is orders of magnitude more data-efficient and outperforms the existing state-of-the-art method that utilizes human demonstrations. Furthermore, the learned policy exhibits zero-shot generalization, successfully stacking blocks into taller towers and previously unseen configurations such as pyramids, without any further training.

curriculum, learning, reinforcement learning, (13 more...)

1912.11032

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > California > Alameda County > Berkeley (0.04)

Genre: Research Report (1.00)

Industry: Government > Regional Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

#artificialintelligenceDec-22-2019, 12:27:33 GMT

OpenAI Open Sources Safety Gym to Improve Safety in Reinforcement Learning Agents

Safety is one of the emerging concerns in deep learning systems. In the context of deep learning systems, safety is related to building agents that respect safety dynamics in a given environment. In many cases such as supervised learning, safety is modeled as part of the training datasets. However, other methods such as reinforcement learning require agents to master the dynamics of the environments by experimenting with it which introduces its own set of safety concerns. To address some of these challenges, OpenAI has recently open sourced Safety Gym, a suite of environments and tools for measuring progress towards reinforcement learning agents that respect safety constraints while training.

agent, reinforcement, safety gym, (12 more...)

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.64)

arXiv.org Machine LearningDec-22-2019

Finite-Time Analysis and Restarting Scheme for Linear Two-Time-Scale Stochastic Approximation

Doan, Thinh T.

Motivated by their broad applications in reinforcement learning, we study the linear two-time-scale stochastic approximation, an iterative method using two different step sizes for finding the solutions of a system of two equations. Our main focus is to characterize the finite-time complexity of this method under time-varying step sizes and Markovian noise. In particular, we show that the mean square errors of the variables generated by the method converge to zero at a sublinear rate $\mathcal{O}(k^{2/3})$, where $k$ is the number of iterations. We then improve the performance of this method by considering the restarting scheme, where we restart the algorithm after a predetermined number of iterations. We show that using this restarting method the complexity of the algorithm under time-varying step sizes is as good as the one using constant step sizes, but still achieving an exact converge to the desired solution. Moreover, the restarting scheme also helps to prevent the step sizes from getting too small, which is useful for the practical implementation of the linear two-time-scale stochastic approximation.

assumption 2, convergence, step size, (15 more...)

1912.10583

Country:

North America > United States > California > Los Angeles County > Long Beach (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

arXiv.org Artificial IntelligenceDec-22-2019

Direct and indirect reinforcement learning

Guan, Yang, Li, Shengbo Eben, Duan, Jingliang, Li, Jie, Ren, Yangang, Cheng, Bo

Reinforcement learning (RL) algorithms have been successfully applied to a range of challenging sequential decision making and control tasks. In this paper, we classify RL into direct and indirect methods according to how they seek optimal policy of the Markov Decision Process (MDP) problem. The former solves optimal policy by directly maximizing an objective function using gradient descent method, in which the objective function is usually the expectation of accumulative future rewards. The latter indirectly finds the optimal policy by solving the Bellman equation, which is the sufficient and necessary condition from Bellman's principle of optimality. We take vanilla policy gradient and approximate policy iteration to study their internal relationship, and reveal that both direct and indirect methods can be unified in actor-critic architecture and are equivalent if we always choose stationary state distribution of current policy as initial state distribution of MDP. Finally, we classify the current mainstream RL algorithms and compare the differences between other criteria including value-based and policy-based, model-based and model-free.

algorithm, gradient, state distribution, (12 more...)

1912.106

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
Asia > Middle East > Jordan (0.04)
Asia > China > Beijing > Beijing (0.04)
North America > United States > Massachusetts > Middlesex County > Belmont (0.04)

Genre: Research Report (0.40)

Industry: Leisure & Entertainment > Games > Computer Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.48)

Tan, Tian, Xiong, Zhihan, Dwaracherla, Vikranth R.

Parameterized Indexed Value Function for Efficient Exploration in Reinforcement Learning

arXiv.org Artificial IntelligenceDec-22-2019

It is well known that quantifying uncertainty in the action-value estimates is crucial for efficient exploration in reinforcement learning. Ensemble sampling offers a relatively computationally tractable way of doing this using randomized value functions. However, it still requires a huge amount of computational resources for complex problems. In this paper, we present an alternative, computationally efficient way to induce exploration using index sampling. We use an indexed value function to represent uncertainty in our action-value estimates. We first present an algorithm to learn parameterized indexed value function through a distributional version of temporal difference in a tabular setting and prove its regret bound. Then, in a computational point of view, we propose a dual-network architecture, Parameterized Indexed Networks (PINs), comprising one mean network and one uncertainty network to learn the indexed value function. Finally, we show the efficacy of PINs through computational experiments.

algorithm 1, exploration, value function, (14 more...)

1912.10577

Country: North America > United States > California > Santa Clara County > Palo Alto (0.04)

Genre: Research Report > New Finding (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

#artificialintelligenceDec-21-2019, 22:44:17 GMT

The 10 Algorithms Data Scientist must have to Know.

Let's say I am given an Excel sheet with data about various fruits and I have to tell which look like Apples. What I will do is ask a question "Which fruits are red and round?" and divide all fruits which answer yes and no to the question. Now, All Red and Round fruits might not be apples and all apples won't be red and round. So I will ask a question "Which fruits have red or yellow color hints on them? " on red and round fruits and will ask "Which fruits are green and round?" on not red and round fruits. Based on these questions I can tell with considerable accuracy which are apples. This cascade of questions is what a decision tree is. However, this is a decision tree based on my intuition.

algorithm, sklearn, stable module, (12 more...)

#artificialintelligence

Country:

North America > United States > California > Santa Clara County > Palo Alto (0.05)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
Asia > Middle East > Qatar (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.73)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.70)

Sun, Yanchao, Huang, Furong

Can Agents Learn by Analogy? An Inferable Model for PAC Reinforcement Learning

arXiv.org Machine LearningDec-21-2019

Model-based reinforcement learning algorithms make decisions by building and utilizing a model of the environment. However, none of the existing algorithms attempts to infer the dynamics of any state-action pair from known state-action pairs before meeting it for sufficient times. We propose a new model-based method called Greedy Inference Model (GIM) that infers the unknown dynamics from known dynamics based on the internal spectral properties of the environment. In other words, GIM can "learn by analogy". We further introduce a new exploration strategy which ensures that the agent rapidly and evenly visits unknown state-action pairs. GIM is much more computationally efficient than state-of-the-art model-based algorithms, as the number of dynamic programming operations is independent of the environment size. Lower sample complexity could also be achieved under mild conditions compared against methods without inferring. Experimental results demonstrate the effectiveness and efficiency of GIM in a variety of real-world tasks.

artificial intelligence, state-action pair, upstream oil & gas, (17 more...)

1912.10329

Country:

North America > United States > Maryland (0.14)
Europe > United Kingdom > England (0.14)

Genre:

Research Report > Promising Solution (0.48)
Research Report > New Finding (0.48)

Industry: Energy > Oil & Gas > Upstream (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.68)

Roy, Arghyadip, Borkar, Vivek, Karandikar, Abhay, Chaporkar, Prasanna

Online Reinforcement Learning of Optimal Threshold Policies for Markov Decision Processes

arXiv.org Machine LearningDec-21-2019

Markov Decision Process (MDP) problems can be solved using Dynamic Programming (DP) methods which suffer from the curse of dimensionality and the curse of modeling. To overcome these issues, Reinforcement Learning (RL) methods are adopted in practice. In this paper, we aim to obtain the optimal admission control policy in a system where different classes of customers are present. Using DP techniques, we prove that it is optimal to admit the $i$ th class of customers only upto a threshold $\tau(i)$ which is a non-increasing function of $i$. Contrary to traditional RL algorithms which do not take into account the structural properties of the optimal policy while learning, we propose a structure-aware learning algorithm which exploits the threshold structure of the optimal policy. We prove the asymptotic convergence of the proposed algorithm to the optimal policy. Due to the reduction in the policy space, the structure-aware learning algorithm provides remarkable improvements in storage and computational complexities over classical RL algorithms. Simulation results also establish the gain in the convergence rate of the proposed algorithm over other RL algorithms. The techniques presented in the paper can be applied to any general MDP problem covering various applications such as inventory management, financial planning and communication networking.

algorithm, optimal policy, value function, (14 more...)

1912.10325

Country:

North America > United States > Massachusetts > Middlesex County > Belmont (0.04)
North America > United States > Illinois (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > India > Maharashtra > Mumbai (0.04)

Genre: Research Report (0.64)

Industry: Energy (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.71)

arXiv.org Machine LearningDec-21-2019

Exploring TD error as a heuristic for $\sigma$ selection in Q($\sigma$, $\lambda$)

Nan, Abhishek

In the landscape of TD algorithms, the Q( σ,λ) algorithm is an algorithm with the ability to perform a multi-step backup in an online manner while also successfully unifying the concepts of sampling with using the expectation across all actions for a state. Selecting the value of σ can be based on characteristics of the current state rather than having a constant value or being time based. This project explores the viability of such a TD-error based scheme. Introduction While having different dimensions of generalizability in an algorithm can serve as a powerful tool, in most cases it comes with the associated burden of having to manually select values along these dimensions, commonly referred to as hyper-parameter selection. In case of learning algorithms, an ideal algorithm would be completely general, even to the point that they do not need a fixed set of hyper-parameters for which they perform optimally for a given problem. In the context of Q( σ,λ), the introduction of the σ parameter gives us flexibility in terms of adjusting the proportion of sampling and expectation we want in our updates. But at the same time, while σ does serve as a hyper-parameter, atypically a constant value of σ was found to not have the best performance by De Asis, Hernandez-Garcia, Holland and Sutton (2018). They used a Dynamic Decay σ scheme for n-step Q( σ) where they reduced the value of σ after every episode by a factor of 0.95.

algorithm, experiment, td error, (14 more...)

1912.10316

Country: North America > Canada > Alberta (0.14)

Genre:

Research Report > New Finding (0.68)
Research Report > Experimental Study (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.68)