AITopics

1906.09624

Country:

North America > United States > Illinois > Cook County > Chicago (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)

Lin, Kaixiang, Zhou, Jiayu

Ranking Policy Gradient

arXiv.org Artificial Intelligence Jun-23-2019

Sample inefficiency is a long-standing problem in reinforcement learning (RL). State-of-the-art methods use a value function to derive the policy, which usually requires an extensive search over the state-action space and is one reason for the inefficiency. Toward sample-efficient RL, we propose ranking policy gradient (RPG), a policy gradient method that learns the optimal ranking of a set of discrete actions. To accelerate the learning of policy gradient methods, we describe a novel off-policy learning framework and establish the equivalence between maximizing the lower bound of return and imitating a near-optimal policy without accessing any oracles. These results lead to a general sample-efficient off-policy learning framework, which accelerates learning and reduces variance. Furthermore, the sample complexity of RPG does not depend on the dimension of the state space, which makes RPG applicable to large-scale problems. We conduct extensive experiments showing that, when combined with the off-policy learning framework, RPG substantially reduces the sample complexity compared to the state of the art.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

1906.09674

Country: North America > United States > Michigan (0.28)

Genre: Research Report (0.82)

Industry: Leisure & Entertainment > Games (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)

#artificialintelligence Jun-22-2019, 09:14:38 GMT

Coherent transport of quantum states by deep reinforcement learning

Some problems in physics are solved as a result of the discovery of an ansatz solution, namely a successful test guess, but unfortunately there is no general method to generate one. Recently, machine learning has increasingly proved to be a viable tool for modeling hidden features and effective rules in complex systems. Among the classes of machine learning algorithms, deep reinforcement learning (DRL) [1] is providing some of the most spectacular results due to its ability to identify strategies for achieving a goal in a complex space of solutions without prior knowledge of the system [2,3,4,5,6,7]. Contrary to supervised learning, which has already been applied to quantum systems, such as in the determination of high-fidelity gates and the optimization of quantum memories by dynamic decoupling [8], DRL has only very recently been proposed for the control of quantum systems [9,10,11,12,13,14,15,16], along with a strictly quantum reinforcement learning implementation [14,17]. To show the power of DRL, we apply DRL to the problem of coherent transport by adiabatic passage (CTAP) where an electron (encoding the quantum state) is transferred through an array of quantum dots.

artificial intelligence, machine learning, reinforcement learning, (13 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

#artificialintelligence Jun-22-2019, 00:07:33 GMT

Python Programming Tutorials

Deep Q Networks (DQNs) are the deep learning/neural network versions of Q-Learning. With DQNs, instead of a Q table to look up values, you have a model that you run inference on (make predictions from), and rather than updating the Q table, you fit (train) your model. The DQN neural network model is a regression model, which typically outputs a value for each of our possible actions. These values are continuous floats, and they are directly our Q values. As we engage with the environment, we will do a .predict() to figure out our next move (or move randomly).
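The predict-then-fit loop described above can be sketched roughly as follows. This is a minimal illustration, not the tutorial's code: `TinyQModel` is a hypothetical linear stand-in for the neural network, and `choose_action`, `q_update`, `epsilon`, and `gamma` are names assumed here for the example.

```python
import numpy as np


class TinyQModel:
    """Hypothetical stand-in for a DQN: a linear map from state to one Q value per action."""

    def __init__(self, n_features, n_actions, lr=0.1):
        self.W = np.zeros((n_features, n_actions))
        self.lr = lr

    def predict(self, states):
        # Regression output: one continuous Q value per action, per state.
        return states @ self.W

    def fit(self, states, targets):
        # One gradient step on the mean squared error between predictions and targets.
        error = self.predict(states) - targets
        self.W -= self.lr * states.T @ error / len(states)


def choose_action(model, state, epsilon, rng):
    """Epsilon-greedy: move randomly with probability epsilon, else take the argmax Q."""
    if rng.random() < epsilon:
        return int(rng.integers(model.W.shape[1]))
    q_values = model.predict(state[None, :])[0]
    return int(np.argmax(q_values))


def q_update(model, state, action, reward, next_state, done, gamma=0.99):
    """Instead of writing into a Q table, build a target vector and fit the model to it."""
    target = model.predict(state[None, :])  # current Q values, shape (1, n_actions)
    future = 0.0 if done else gamma * np.max(model.predict(next_state[None, :])[0])
    target[0, action] = reward + future  # only the taken action's target changes
    model.fit(state[None, :], target)
```

A real DQN would replace `TinyQModel` with a neural network and usually adds a replay memory and a separate target model; those are left out here to keep the sketch short.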

artificial intelligence, machine learning, reinforcement learning, (13 more...)

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.37)

arXiv.org Artificial Intelligence Jun-22-2019

Explainable Knowledge Graph-based Recommendation via Deep Reinforcement Learning

Song, Weiping, Duan, Zhijian, Yang, Ziqing, Zhu, Hao, Zhang, Ming, Tang, Jian

This paper studies recommender systems with knowledge graphs, which can effectively address the problems of data sparsity and cold start. Recently, a variety of methods have been developed for this problem, which generally try to learn effective representations of users and items and then match items to users according to their representations. Though these methods have been shown to be quite effective, they lack good explanations, which are critical to recommender systems. In this paper, we take a different path and propose generating recommendations by finding meaningful paths from users to items. Specifically, we formulate the problem as a sequential decision process, where the target user is defined as the initial state, and the walks on the graphs are defined as actions. We shape the rewards according to existing state-of-the-art methods and then train a policy function with policy gradient methods. Experimental results on three real-world datasets show that our proposed method not only provides effective recommendations but also offers good explanations.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

1906.09506

Country:

North America > United States > New York > New York County > New York City (0.07)
North America > Canada > Quebec > Montreal (0.04)
Europe > Switzerland > Geneva > Geneva (0.04)
Asia > China > Beijing > Beijing (0.04)

Genre:

Research Report > New Finding (0.46)
Research Report > Promising Solution (0.34)

Industry:

Media (0.46)
Leisure & Entertainment (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Vertes, Eszter, Sahani, Maneesh

A neurally plausible model learns successor representations in partially observable environments

arXiv.org Machine Learning Jun-22-2019

Animals need to devise strategies to maximize returns while interacting with their environment based on incoming noisy sensory observations. Task-relevant states, such as the agent's location within an environment or the presence of a predator, are often not directly observable but must be inferred using available sensory information. Successor representations (SR) have been proposed as a middle-ground between model-based and model-free reinforcement learning strategies, allowing for fast value computation and rapid adaptation to changes in the reward function or goal locations. Indeed, recent studies suggest that features of neural responses are consistent with the SR framework. However, it is not clear how such representations might be learned and computed in partially observed, noisy environments. Here, we introduce a neurally plausible model using distributional successor features, which builds on the distributed distributional code for the representation and computation of uncertainty, and which allows for efficient value function computation in partially observed environments via the successor representation. We show that distributional successor features can support reinforcement learning in noisy environments in which direct learning of successful policies is infeasible.
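As background for the claim about fast value computation and rapid adaptation to reward changes, here is a minimal sketch of the classical tabular successor representation in a fully observed setting. It is not the paper's distributional model; the three-state chain, `gamma`, and the function names are assumptions made for illustration.

```python
import numpy as np


def successor_representation(P, gamma):
    """Closed-form SR for a fixed policy: M = (I - gamma * P)^-1.

    M[s, s'] is the expected discounted number of visits to s' starting from s.
    """
    n = P.shape[0]
    return np.linalg.inv(np.eye(n) - gamma * P)


def state_values(M, r):
    """Values follow from the SR by one matrix-vector product: V = M r."""
    return M @ r


# Hypothetical 3-state chain: 0 -> 1 -> 2, with state 2 absorbing.
P = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [0.0, 0.0, 1.0]])
gamma = 0.5
M = successor_representation(P, gamma)

r_old = np.array([0.0, 0.0, 1.0])  # reward at the absorbing state
r_new = np.array([0.0, 1.0, 0.0])  # reward moved to the middle state

v_old = state_values(M, r_old)
v_new = state_values(M, r_new)  # M is reused; only the reward vector changed
```

Because `M` depends only on the dynamics under the policy, moving the reward requires no relearning of `M`, which is the sense in which the SR sits between model-based and model-free approaches.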

artificial intelligence, machine learning, reinforcement learning, (18 more...)

arXiv.org Machine Learning

1906.0948

Genre: Research Report (1.00)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.47)

Kuang, Nikki Lijing, Leung, Clement H. C.

Leveraging Reinforcement Learning Techniques for Effective Policy Adoption and Validation

Rewards and punishments in different forms are pervasive and present in a wide variety of decision-making scenarios. By observing the outcome of a sufficient number of repeated trials, one would gradually learn the value and usefulness of a particular policy or strategy. However, in a given environment, the outcomes resulting from different trials are subject to chance influence and variations. In learning about the usefulness of a given policy, significant costs are involved in systematically undertaking the sequential trials; therefore, in most learning episodes, one would wish to keep the cost within bounds by adopting learning stopping rules. In this paper, we examine the deployment of different stopping strategies in given learning environments which vary from highly stringent for mission-critical operations to highly tolerant for non-mission-critical operations, and emphasis is placed on the former with particular application to aviation safety. In policy evaluation, two sequential phases of learning are identified, and we describe the outcome variations using a probabilistic model, with closed-form expressions obtained for the key measures of performance. Decision rules that map the trial observations to policy choices are also formulated. In addition, simulation experiments are performed, which corroborate the validity of the theoretical results.

artificial intelligence, machine learning, reinforcement learning, (13 more...)

1906.0934

Country:

Oceania > Australia > South Australia > Adelaide (0.04)
Oceania > Australia > Queensland > Brisbane (0.04)
North America > United States > California > San Diego County > San Diego (0.04)
(3 more...)

Genre: Research Report (0.64)

Industry:

Transportation > Air (0.35)
Education (0.35)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Asri, Layla El, Trischler, Adam

A Study of State Aliasing in Structured Prediction with RNNs

End-to-end reinforcement learning agents learn a state representation and a policy at the same time. Recurrent neural networks (RNNs) have been trained successfully as reinforcement learning agents in settings like dialogue that require structured prediction. In this paper, we investigate the representations learned by RNN-based agents when trained with both policy gradient and value-based methods. We show through extensive experiments and analysis that, when trained with policy gradient, recurrent neural networks often fail to learn a state representation that leads to an optimal policy in settings where the same action should be taken at different states. To explain this failure, we highlight the problem of state aliasing, which entails conflating two or more distinct states in the representation space. We demonstrate that state aliasing occurs when several states share the same optimal action and the agent is trained via policy gradient. We characterize this phenomenon through experiments on a simple maze setting and a more complex text-based game, and make recommendations for training RNNs with reinforcement learning.

artificial intelligence, machine learning, reinforcement learning, (20 more...)

1906.0931

Country: North America > Canada > Quebec > Montreal (0.04)

Genre: Research Report (1.00)

Industry: Leisure & Entertainment > Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Miryoosefi, Sobhan, Brantley, Kianté, Daumé, Hal III, Dudik, Miroslav, Schapire, Robert

Reinforcement Learning with Convex Constraints

In standard reinforcement learning (RL), a learning agent seeks to optimize the overall reward. However, many key aspects of a desired behavior are more naturally expressed as constraints. For instance, the designer may want to limit the use of unsafe actions, increase the diversity of trajectories to enable exploration, or approximate expert trajectories when rewards are sparse. In this paper, we propose an algorithmic scheme that can handle a wide class of constraints in RL tasks, specifically, any constraints that require expected values of some vector measurements (such as the use of an action) to lie in a convex set. This captures previously studied constraints (such as safety and proximity to an expert), but also enables new classes of constraints (such as diversity). Our approach comes with rigorous theoretical guarantees and only relies on the ability to approximately solve standard RL tasks. As a result, it can be easily adapted to work with any model-free or model-based RL algorithm. In our experiments, we show that it matches previous algorithms that enforce safety via constraints, but can also enforce new properties that these algorithms cannot incorporate, such as diversity.
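To make the idea of "expected values of some vector measurements lying in a convex set" concrete, here is a small sketch. It is not the paper's algorithm: it only estimates the expected measurement vector from sampled trajectories and measures how far it lies from a convex set. The box-shaped set, the discounted-sum convention, and all function names are assumptions chosen for illustration.

```python
import numpy as np


def expected_measurement(trajectories, gamma=1.0):
    """Average, over trajectories, of the discounted sum of per-step measurement vectors."""
    totals = []
    for traj in trajectories:
        traj = np.asarray(traj, dtype=float)      # shape (T, d)
        discounts = gamma ** np.arange(len(traj))  # shape (T,)
        totals.append(discounts @ traj)            # shape (d,)
    return np.mean(totals, axis=0)


def project_onto_box(z, low, high):
    """Euclidean projection onto a box, one simple example of a convex set."""
    return np.clip(z, low, high)


def constraint_violation(z, low, high):
    """Distance from z to the box; zero means the constraint is satisfied."""
    return float(np.linalg.norm(z - project_onto_box(z, low, high)))


# Each step records [reward, unsafe_action_used] (hypothetical measurements).
trajectories = [
    [[1.0, 0.0], [1.0, 1.0]],
    [[1.0, 1.0], [1.0, 1.0]],
]
z = expected_measurement(trajectories)

# Require total reward >= 1 and at most 1 unsafe action in expectation.
low = np.array([1.0, 0.0])
high = np.array([np.inf, 1.0])
violation = constraint_violation(z, low, high)
```

Here the policy earns enough reward but uses unsafe actions too often on average, so the violation is positive; a constrained method would aim to drive this distance to zero.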

constraint, machine learning, reinforcement learning, (17 more...)

1906.09323

Country:

North America > United States > Maryland (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Petangoda, Janith C., Pascual-Diaz, Sergio, Adam, Vincent, Vrancx, Peter, Grau-Moya, Jordi

Disentangled Skill Embeddings for Reinforcement Learning

We propose a novel framework for multi-task reinforcement learning (MTRL). Using a variational inference formulation, we learn policies that generalize across both changing dynamics and goals. The resulting policies are parametrized by shared parameters that allow for transfer between different dynamics and goal conditions, and by task-specific latent-space embeddings that allow for specialization to particular tasks. We show how the latent spaces enable generalization to unseen dynamics and goal conditions. Additionally, policies equipped with such embeddings serve as a space of skills (or options) for hierarchical reinforcement learning. Since we can change task dynamics and goals independently, we name our framework Disentangled Skill Embeddings (DSE).

machine learning, reinforcement learning, trajectory, (14 more...)

1906.09223

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)