AITopics

1802.03753

Country: Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)

Genre: Research Report > New Finding (0.47)

Industry: Leisure & Entertainment > Games (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

#artificialintelligenceFeb-10-2018, 06:31:35 GMT

[D] What could scientists learn from learned solutions? • r/MachineLearning

If an algorithm is set to learning some policy about how to interact with the world to achieve a specific task, what is there to be learned from the algorithm's solution? For example, have new physical or biological principles governing the robustness of - or tradeoffs in locomotion strategies been learned from analysis of the learned movement patterns of the Deepmind walkers? I'm a biology PhD student and I've been wondering how my field could take advantage of advances in machine learning to move biology forward. It's one thing to be able to make predictions, but it seems to me that reinforcement learning approaches offer the potential for machines to act as scientists themselves.

deep learning, machinelearning, scientist learn, (4 more...)

#artificialintelligence

Industry: Media > News (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.32)

arXiv.org Machine LearningFeb-10-2018

Beyond the One Step Greedy Approach in Reinforcement Learning

Efroni, Yonathan, Dalal, Gal, Scherrer, Bruno, Mannor, Shie

The famous Policy Iteration algorithm alternates between policy improvement and policy evaluation. Implementations of this algorithm with several variants of the latter evaluation stage, e.g, $n$-step and trace-based returns, have been analyzed in previous works. However, the case of multiple-step lookahead policy improvement, despite the recent increase in empirical evidence of its strength, has to our knowledge not been carefully analyzed yet. In this work, we introduce the first such analysis. Namely, we formulate variants of multiple-step policy improvement, derive new algorithms using these definitions and prove their convergence. Moreover, we show that recent prominent Reinforcement Learning algorithms are, in fact, instances of our framework. We thus shed light on their empirical success and give a recipe for deriving new algorithms for future study.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

1802.03654

Country: Europe > Spain (0.28)

Genre: Research Report (0.50)

Industry: Leisure & Entertainment > Games (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Zhang, Xiaoqin, Ma, Huimin

Pretraining Deep Actor-Critic Reinforcement Learning Algorithms With Expert Demonstrations

Pretraining with expert demonstrations have been found useful in speeding up the training process of deep reinforcement learning algorithms since less online simulation data is required. Some people use supervised learning to speed up the process of feature learning, others pretrain the policies by imitating expert demonstrations. However, these methods are unstable and not suitable for actor-critic reinforcement learning algorithms. Also, some existing methods rely on the global optimum assumption, which is not true in most scenarios. In this paper, we employ expert demonstrations in a actor-critic reinforcement learning framework, and meanwhile ensure that the performance is not affected by the fact that expert demonstrations are not global optimal. We theoretically derive a method for computing policy gradients and value estimators with only expert demonstrations. Our method is theoretically plausible for actor-critic reinforcement learning algorithms that pretrains both policy and value functions. We apply our method to two of the typical actor-critic reinforcement learning algorithms, DDPG and ACER, and demonstrate with experiments that our method not only outperforms the RL algorithms without pretraining process, but also is more simulation efficient.

algorithm, expert demonstration, gradient, (13 more...)

1801.10459

Country: Asia > China (0.04)

Genre: Research Report (0.40)

Industry: Leisure & Entertainment > Games (0.70)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Nachum, Ofir, Chow, Yinlam, Ghavamzadeh, Mohammad

Path Consistency Learning in Tsallis Entropy Regularized MDPs

We study the sparse entropy-regularized reinforcement learning (ERL) problem in which the entropy term is a special form of the Tsallis entropy. The optimal policy of this formulation is sparse, i.e.,~at each state, it has non-zero probability for only a small number of actions. This addresses the main drawback of the standard Shannon entropy-regularized RL (soft ERL) formulation, in which the optimal policy is softmax, and thus, may assign a non-negligible probability mass to non-optimal actions. This problem is aggravated as the number of actions is increased. In this paper, we follow the work of Nachum et al. (2017) in the soft ERL setting, and propose a class of novel path consistency learning (PCL) algorithms, called {\em sparse PCL}, for the sparse ERL problem that can work with both on-policy and off-policy data. We first derive a {\em sparse consistency} equation that specifies a relationship between the optimal value function and policy of the sparse ERL along any system trajectory. Crucially, a weak form of the converse is also true, and we quantify the sub-optimality of a policy which satisfies sparse consistency, and show that as we increase the number of actions, this sub-optimality is better than that of the soft ERL optimal policy. We then use this result to derive the sparse PCL algorithms. We empirically compare sparse PCL with its soft counterpart, and show its advantage, especially in problems with a large number of actions.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

1802.03501

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Mankowitz, Daniel J., Mann, Timothy A., Bacon, Pierre-Luc, Precup, Doina, Mannor, Shie

Learning Robust Options

Robust reinforcement learning aims to produce policies that have strong guarantees even in the face of environments/transition models whose parameters have strong uncertainty. Existing work uses value-based methods and the usual primitive action setting. In this paper, we propose robust methods for learning temporally abstract actions, in the framework of options. We present a Robust Options Policy Iteration (ROPI) algorithm with convergence guarantees, which learns options that are robust to model uncertainty. We utilize ROPI to learn robust options with the Robust Options Deep Q Network (RO-DQN) that solves multiple tasks and mitigates model misspecification due to model uncertainty. We present experimental results which suggest that policy iteration with linear features may have an inherent form of robustness when using coarse feature representations. In addition, we present experimental results which demonstrate that robustness helps policy iteration implemented on top of deep neural networks to generalize over a much broader range of dynamics than non-robust policy iteration.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

1802.03236

Country: North America > Canada (0.28)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)

A Unified Approach for Multi-step Temporal-Difference Learning with Eligibility Traces in Reinforcement Learning

Yang, Long, Shi, Minhao, Zheng, Qian, Meng, Wenjia, Pan, Gang

Recently, a new multi-step temporal learning algorithm, called $Q(\sigma)$, unifies $n$-step Tree-Backup (when $\sigma=0$) and $n$-step Sarsa (when $\sigma=1$) by introducing a sampling parameter $\sigma$. However, similar to other multi-step temporal-difference learning algorithms, $Q(\sigma)$ needs much memory consumption and computation time. Eligibility trace is an important mechanism to transform the off-line updates into efficient on-line ones which consume less memory and computation time. In this paper, we further develop the original $Q(\sigma)$, combine it with eligibility traces and propose a new algorithm, called $Q(\sigma ,\lambda)$, in which $\lambda$ is trace-decay parameter. This idea unifies Sarsa$(\lambda)$ (when $\sigma =1$) and $Q^{\pi}(\lambda)$ (when $\sigma =0$). Furthermore, we give an upper error bound of $Q(\sigma ,\lambda)$ policy evaluation algorithm. We prove that $Q(\sigma,\lambda)$ control algorithm can converge to the optimal value function exponentially. We also empirically compare it with conventional temporal-difference learning methods. Results show that, with an intermediate value of $\sigma$, $Q(\sigma ,\lambda)$ creates a mixture of the existing algorithms that can learn the optimal value significantly faster than the extreme end ($\sigma=0$, or $1$).

artificial intelligence, machine learning, reinforcement learning, (15 more...)

1802.03171

Country:

Europe (0.28)
North America > United States > Massachusetts (0.14)

Genre: Research Report > New Finding (0.49)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

AAAI ConferencesFeb-8-2018

Diverse Exploration for Fast and Safe Policy Improvement

Cohen, Andrew (Binghamton University ) | Yu, Lei (Binghamton University) | Wright, Robert (Yantai University)

We study an important yet under-addressed problem of quickly and safely improving policies in online reinforcement learning domains. As its solution, we propose a novel exploration strategy - diverse exploration (DE), which learns and deploys a diverse set of safe policies to explore the environment. We provide DE theory explaining why diversity in behavior policies enables effective exploration without sacrificing exploitation. Our empirical study shows that an online policy improvement algorithm framework implementing the DE strategy can achieve both fast policy improvement and safe online performance.

iteration, optimization problem, upstream oil & gas, (20 more...)

AAAI Conferences

Thirty-Second AAAI Conference on Artificial Intelligence

Genre: Research Report (0.47)

Industry: Energy > Oil & Gas > Upstream (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.49)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.46)

AAAI ConferencesFeb-8-2018

Feature Engineering for Predictive Modeling Using Reinforcement Learning

Khurana, Udayan (IBM Research AI) | Samulowitz, Horst (IBM Research AI) | Turaga, Deepak (IBM Research AI)

Feature engineering is a crucial step in the process of predictive modeling. It involves the transformation of given feature space, typically using mathematical functions, with the objective of reducing the modeling error for a given target. However, there is no well-defined basis for performing effective feature engineering. It involves domain knowledge, intuition, and most of all, a lengthy process of trial and error. The human attention involved in overseeing this process significantly influences the cost of model generation. We present a new framework to automate feature engineering. It is based on performance driven exploration of a transformation graph, which systematically and compactly captures the space of given options. A highly efficient exploration strategy is derived through reinforcement learning on past examples.

inductive learning, transformation, upstream oil & gas, (18 more...)

AAAI Conferences

Thirty-Second AAAI Conference on Artificial Intelligence

Country: Europe (0.14)

Industry: Energy > Oil & Gas > Upstream (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.68)

AAAI ConferencesFeb-8-2018

BBQ-Networks: Efficient Exploration in Deep Reinforcement Learning for Task-Oriented Dialogue Systems

We present a new algorithm that significantly improves the efficiency of exploration for deep Q-learning agents in dialogue systems. Our agents explore via Thompson sampling, drawing Monte Carlo samples from a Bayes-by-Backprop neural network. Our algorithm learns much faster than common exploration strategies such as ε-greedy, Boltzmann, bootstrapping, and intrinsic-reward-based ones. Additionally, we show that spiking the replay buffer with experiences from just a few successful episodes can make Q-learning feasible when it might otherwise fail.

artificial intelligence, exploration, upstream oil & gas, (18 more...)

AAAI Conferences

Thirty-Second AAAI Conference on Artificial Intelligence

Country:

North America > United States > Washington > King County (0.28)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)

Genre: Research Report > New Finding (0.46)

Industry:

Leisure & Entertainment > Games (0.46)
Energy > Oil & Gas > Upstream (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)