AITopics

1906.07073

Country: North America > United States > Massachusetts (0.28)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

Usama, Muhammad, Chang, Dong Eui

Learning-Driven Exploration for Reinforcement Learning

arXiv.org Machine Learning, Jun-17-2019

Deep reinforcement learning algorithms have been shown to learn complex skills using only high-dimensional observations and scalar reward. Effective and intelligent exploration, however, remains an unresolved problem for reinforcement learning. Most contemporary reinforcement learning relies on simple heuristic strategies such as ε-greedy exploration or adding Gaussian noise to actions. These heuristics cannot intelligently distinguish well-explored regions of the state space from unexplored ones, which can lead to inefficient use of training time. We introduce entropy-based exploration (EBE), which enables an agent to efficiently explore the unexplored regions of the state space. EBE quantifies the agent's learning in a state using only state-dependent action values and adapts its exploration accordingly, i.e., it explores more in the unexplored regions of the state space. We perform experiments on many environments, including a simple linear environment, a simplified version of the Breakout game, and multiple first-person shooter (FPS) games on the VizDoom platform. We demonstrate that EBE enables efficient exploration that ultimately results in faster learning without having to tune hyperparameters.
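The abstract does not give EBE's formula. A minimal sketch, assuming the agent's "learning in a state" is measured by the entropy of a softmax over the action values Q(s, ·), normalized to [0, 1], and that this number is used directly as the probability of taking a random action. The function names and the temperature parameter are hypothetical, not the authors' code.

```python
import math
import random
from typing import Sequence


def normalized_entropy(q_values: Sequence[float], temperature: float = 1.0) -> float:
    """Entropy of softmax(Q(s, .) / temperature), divided by log(|A|) so it lies in [0, 1]."""
    scaled = [q / temperature for q in q_values]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]  # subtract the max for numerical stability
    total = sum(exps)
    probs = [e / total for e in exps]
    h = -sum(p * math.log(p) for p in probs if p > 0.0)
    return h / math.log(len(q_values))  # requires at least two actions


def entropy_based_action(q_values: Sequence[float], rng: random.Random) -> int:
    """Assumed rule: take a uniformly random action with probability = normalized entropy."""
    if rng.random() < normalized_entropy(q_values):
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])  # greedy action
```

Under this assumption, a state whose action values are still nearly equal gets an exploration probability close to 1, while a state with one clearly dominant action is handled almost greedily, with no ε schedule to tune.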

artificial intelligence, machine learning, reinforcement learning

1906.0689

Country: North America > United States (0.68)

Genre: Research Report (0.64)

Industry:

Education (0.66)
Leisure & Entertainment > Games > Computer Games (0.54)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Azarafrooz, Ari, Brock, John

Hierarchical Soft Actor-Critic: Adversarial Exploration via Mutual Information Optimization

arXiv.org Artificial Intelligence, Jun-17-2019

We describe a novel extension of soft actor-critic to hierarchical Deep Q-Network (HDQN) architectures using a mutual information metric. The proposed extension provides a suitable framework for encouraging exploration in such hierarchical networks. A natural use of this framework is an adversarial setting, in which the meta-controller and the controller play a minimax game over the mutual information objective but cooperate on maximizing expected rewards.
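The abstract builds on soft actor-critic without restating it. As background only, the sketch below computes the entropy-regularized ("soft") state value for discrete actions, V(s) = Σ_a π(a|s)[Q(s,a) − α log π(a|s)], under the Boltzmann policy π(a|s) ∝ exp(Q(s,a)/α), where it equals α log Σ_a exp(Q(s,a)/α). It does not implement the paper's mutual-information extension; the Q-values and the temperature α are illustrative.

```python
import math
from typing import List, Sequence


def soft_policy(q_values: Sequence[float], alpha: float) -> List[float]:
    """Boltzmann policy pi(a|s) proportional to exp(Q(s, a) / alpha)."""
    m = max(q_values)
    exps = [math.exp((q - m) / alpha) for q in q_values]
    z = sum(exps)
    return [e / z for e in exps]


def soft_value(q_values: Sequence[float], alpha: float) -> float:
    """Entropy-regularized value E_pi[Q(s, a) - alpha * log pi(a|s)]."""
    pi = soft_policy(q_values, alpha)
    return sum(p * (q - alpha * math.log(p)) for p, q in zip(pi, q_values) if p > 0.0)


def log_sum_exp_value(q_values: Sequence[float], alpha: float) -> float:
    """Closed form alpha * log sum_a exp(Q(s, a) / alpha), shifted by the max for stability."""
    m = max(q_values)
    return m + alpha * math.log(sum(math.exp((q - m) / alpha) for q in q_values))
```

The two functions agree because log π(a|s) = Q(s,a)/α − log Z, so every term Q(s,a) − α log π(a|s) equals α log Z; the entropy bonus is what keeps the soft value above max_a Q(s,a).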

artificial intelligence, machine learning, reinforcement learning

1906.07122

Country: North America > United States (0.15)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.49)

arXiv.org Artificial Intelligence, Jun-16-2019

MoËT: Interpretable and Verifiable Reinforcement Learning via Mixture of Expert Trees

Vasic, Marko, Petrovic, Andrija, Wang, Kaiyuan, Nikolic, Mladen, Singh, Rishabh, Khurshid, Sarfraz

Deep Reinforcement Learning (DRL) has led to many recent breakthroughs on complex control tasks, such as defeating the best human player in the game of Go. However, decisions made by a DRL agent are not explainable, which hinders its applicability in safety-critical settings. Viper, a recently proposed technique, constructs a decision tree policy by mimicking the DRL agent. Decision trees are interpretable because each action can be traced back to the path of decision rules that led to it. However, a single global decision tree approximating the DRL policy has significant limitations with respect to the geometry of its decision boundaries. We propose MoËT, a more expressive, yet still interpretable, model based on Mixture of Experts, consisting of a gating function that partitions the state space and multiple decision tree experts that specialize on different partitions. We propose a training procedure that supports non-differentiable decision tree experts and integrate it into the imitation learning procedure of Viper. We evaluate our algorithm on four OpenAI gym environments and show that the policy constructed in this way performs better and mimics the DRL agent more closely, reducing mispredictions and increasing the reward. We also show that MoËT policies are amenable to verification using off-the-shelf automated theorem provers such as Z3.
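The abstract does not specify how the gate or the trees are parameterized, and the training procedure for non-differentiable experts is not reproduced here. A minimal inference-only sketch, assuming a linear softmax gate and hard routing to the most probable expert, with two hand-written depth-two trees standing in for learned experts. Every name below is hypothetical.

```python
import math
from typing import Callable, List, Sequence

State = Sequence[float]


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]


class MixtureOfTreeExperts:
    """Hypothetical sketch: a linear softmax gate routes each state to one tree expert."""

    def __init__(self, gate_weights: List[List[float]], experts: List[Callable[[State], int]]):
        assert len(gate_weights) == len(experts)
        self.gate_weights = gate_weights  # one weight vector per expert, bias stored last
        self.experts = experts

    def gate(self, state: State) -> List[float]:
        scores = [sum(w * x for w, x in zip(ws[:-1], state)) + ws[-1] for ws in self.gate_weights]
        return softmax(scores)

    def act(self, state: State) -> int:
        # Hard gating: the most probable expert decides, so each action traces back to
        # one gate region plus one root-to-leaf path in that expert's tree.
        probs = self.gate(state)
        k = max(range(len(probs)), key=lambda i: probs[i])
        return self.experts[k](state)


def tree_left_region(state: State) -> int:
    # Depth-2 decision tree written as explicit rules.
    if state[1] > 0.0:
        return 1
    return 0 if state[0] < -0.5 else 1


def tree_right_region(state: State) -> int:
    return 0 if state[1] > 0.5 else 2


# The gate sends x < 0 to the first expert and x > 0 to the second.
model = MixtureOfTreeExperts([[-1.0, 0.0, 0.0], [1.0, 0.0, 0.0]],
                             [tree_left_region, tree_right_region])
```

Because both the gate regions and the tree splits are linear comparisons on the state, a policy of this shape can in principle be written out as logical constraints, which is the property the abstract relies on for verification with a solver such as Z3.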

artificial intelligence, machine learning, reinforcement learning

1906.06717

Country:

Europe > Serbia > Central Serbia > Belgrade (0.04)
Asia > Middle East > Jordan (0.04)
North America > United States > Texas > Travis County > Austin (0.04)

Genre:

Research Report (0.64)
Overview (0.46)

Industry: Leisure & Entertainment > Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.91)

arXiv.org Artificial Intelligence, Jun-16-2019

On Value Functions and the Agent-Environment Boundary

Jiang, Nan

When function approximation is deployed in reinforcement learning (RL), the same problem may be formulated in different ways, often by treating a pre-processing step either as part of the environment or as part of the agent. As a consequence, fundamental concepts in RL, such as (optimal) value functions, are not uniquely defined: they depend on where we draw this agent-environment boundary, which causes problems in theoretical analyses that provide optimality guarantees. We address this issue via a simple and novel boundary-invariant analysis of Fitted Q-Iteration, a representative RL algorithm, in which the assumptions and the guarantees are invariant to the choice of boundary. We also discuss closely related issues: state resetting and Monte-Carlo Tree Search, deterministic vs. stochastic systems, imitation learning, and the verifiability of theoretical assumptions from data.
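Fitted Q-Iteration repeatedly regresses Q onto one-step Bellman targets computed from a fixed batch of transitions. A minimal sketch, assuming a lookup table as the "regressor" (the least-squares fit of a table is simply the per-(s, a) mean of its targets) and a hypothetical three-state chain as data; the paper's boundary-invariant analysis is not reproduced here.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Transition = Tuple[int, int, float, int, bool]  # (s, a, r, s_next, terminal)


def fitted_q_iteration(data: List[Transition], n_actions: int, gamma: float,
                       iterations: int) -> Dict[Tuple[int, int], float]:
    """FQI with a tabular regressor that averages the regression targets per (s, a)."""
    q: Dict[Tuple[int, int], float] = defaultdict(float)
    for _ in range(iterations):
        sums: Dict[Tuple[int, int], float] = defaultdict(float)
        counts: Dict[Tuple[int, int], int] = defaultdict(int)
        for s, a, r, s_next, terminal in data:
            # Regression target: one-step Bellman backup using the previous Q estimate.
            bootstrap = 0.0 if terminal else max(q[(s_next, b)] for b in range(n_actions))
            sums[(s, a)] += r + gamma * bootstrap
            counts[(s, a)] += 1
        # "Fit" step: the least-squares lookup table is the mean target of each (s, a).
        q = defaultdict(float, {k: sums[k] / counts[k] for k in sums})
    return q


# Hypothetical chain: action 1 moves right, action 0 stays; entering state 2 pays 1 and ends.
batch = [
    (0, 0, 0.0, 0, False),
    (0, 1, 0.0, 1, False),
    (1, 0, 0.0, 1, False),
    (1, 1, 1.0, 2, True),
]
q_hat = fitted_q_iteration(batch, n_actions=2, gamma=0.9, iterations=50)
```

With a richer function class the only change is the fit step, which is exactly where the choice of features (and hence of the agent-environment boundary the paper discusses) enters.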

artificial intelligence, machine learning, reinforcement learning

1905.13341

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > New Jersey (0.04)
North America > United States > Massachusetts (0.04)
North America > United States > Illinois > Champaign County > Urbana (0.04)

Genre: Research Report (0.50)

Industry: Leisure & Entertainment (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Diddigi, Raghuram Bharadwaj, Kamanchi, Chandramouli, Bhatnagar, Shalabh

Solution of Two-Player Zero-Sum Game by Successive Relaxation

arXiv.org Machine Learning, Jun-16-2019

We consider the problem of two-player zero-sum games. In this setting, two agents work against each other. Both agents observe the same state, and the agents' objective is to compute a strategy profile that maximizes their rewards. However, the reward of the second agent is the negative of the reward obtained by the first agent. Therefore, the objective of the second agent is to minimize the total reward obtained by the first agent. In the literature, this problem is formulated as a min-max Markov game. The solution of this game, i.e., the max-min reward of the first player starting from a given state, is called the equilibrium value of that state. In this work, we compute the solution of the two-player zero-sum game using the technique of successive relaxation. Successive relaxation has been applied successfully in the literature to derive a faster value iteration algorithm for Markov Decision Processes. We extend the concept of successive relaxation to two-player zero-sum games. We prove that, under a special structure, this technique computes the optimal solution faster than existing techniques in the literature. We then derive a generalized minimax Q-learning algorithm that computes the optimal policy when the model information is not known. Finally, we prove the convergence of the proposed generalized minimax Q-learning algorithm.
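In the MDP setting, successive relaxation replaces the update V ← T V with V ← w·T V + (1 − w)·V. A minimal sketch on a made-up two-state game, under two stated simplifications. First, the stage game at each state is solved in pure strategies (max over player 1's action of the min over player 2's action); the equilibrium value generally requires mixed strategies and a linear program per state. Second, the weight w = 1.5 is chosen to satisfy w ≤ 1/(1 − γ·p_min), where p_min = 0.5 is the smallest self-transition probability in this toy model. Under that bound every coefficient in the update stays non-negative, so the operator is a sup-norm contraction with modulus 1 − w(1 − γ) = 0.85 instead of γ = 0.9, and it has the same fixed point as w = 1.

```python
from typing import List, Tuple

Dist = List[float]


def stage_value(s: int, V: List[float], P: List[List[List[Dist]]],
                R: List[List[List[float]]], gamma: float) -> float:
    """Pure-strategy max-min of r + gamma * E[V(next)] at state s (a simplification)."""
    n_a, n_b = len(R[s]), len(R[s][0])
    return max(
        min(R[s][a][b] + gamma * sum(p * v for p, v in zip(P[s][a][b], V)) for b in range(n_b))
        for a in range(n_a)
    )


def relaxed_value_iteration(P: List[List[List[Dist]]], R: List[List[List[float]]],
                            gamma: float, w: float, tol: float = 1e-10,
                            max_iter: int = 10_000) -> Tuple[List[float], int]:
    """Iterate V <- w * T(V) + (1 - w) * V; w = 1 is ordinary value iteration."""
    V = [0.0] * len(R)
    for k in range(1, max_iter + 1):
        V_new = [w * stage_value(s, V, P, R, gamma) + (1.0 - w) * V[s] for s in range(len(R))]
        if max(abs(x - y) for x, y in zip(V_new, V)) < tol:
            return V_new, k
        V = V_new
    return V, max_iter


# Made-up game: P[s][a][b] is a next-state distribution, R[s][a][b] is player 1's reward.
# Every self-transition probability is at least 0.5.
P = [
    [[[0.6, 0.4], [0.5, 0.5]], [[0.7, 0.3], [0.5, 0.5]]],  # state 0
    [[[0.5, 0.5], [0.4, 0.6]], [[0.3, 0.7], [0.5, 0.5]]],  # state 1
]
R = [
    [[1.0, 0.0], [0.5, 2.0]],
    [[0.0, 1.0], [2.0, 0.5]],
]
V_plain, iters_plain = relaxed_value_iteration(P, R, gamma=0.9, w=1.0)
V_relaxed, iters_relaxed = relaxed_value_iteration(P, R, gamma=0.9, w=1.5)
```

Both runs reach the same values, and the relaxed run needs fewer sweeps because of its smaller contraction modulus; this toy comparison illustrates the idea only and is not the paper's structural condition or proof.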

artificial intelligence, machine learning, reinforcement learning

1906.06659

Country:

Asia > India > Karnataka > Bengaluru (0.05)
North America > United States > Massachusetts > Middlesex County > Belmont (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Foglino, Francesco, Leonetti, Matteo, Sagratella, Simone, Seccia, Ruggiero

A gray-box approach for curriculum learning

arXiv.org Artificial Intelligence, Jun-16-2019

Curriculum learning is often employed in deep reinforcement learning to let the agent progress more quickly toward better behaviors. Numerical methods for curriculum learning in the literature provide only initial heuristic solutions, with little to no guarantee on their quality. We define a new gray-box function that, by incorporating a suitable scheduling problem, can be used effectively to reformulate the curriculum learning problem. We propose several efficient numerical methods to address this gray-box reformulation. Preliminary numerical results on a benchmark task from the curriculum learning literature show the viability of the proposed approach.

curriculum, machine learning, reinforcement learning

doi: 10.1007/978-3-030-21803-4_72

1906.06812

Genre:

Instructional Material > Course Syllabus & Notes (0.68)
Research Report (0.50)

Industry: Education (0.70)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.71)

Nie, Xinkun, Brunskill, Emma, Wager, Stefan

Learning When-to-Treat Policies

arXiv.org Machine Learning, Jun-15-2019

Any solution to the "policy learning" problem needs to deal with numerous difficulties, including how to incorporate robustness to potential selection bias as well as fairness constraints articulated by stakeholders, and there have been several notable advances that address these difficulties over the past few years. One limitation of this line of work, however, is that the results cited above all focus on a static setting where a decision-maker only sees each subject once and immediately decides how to treat the subject. In contrast, many problems of applied interest involve a dynamic component whereby the decision-maker makes a series of decisions based on time-varying covariates. In medicine, if a patient has a disease for which all known cures are invasive and have serious side effects, their doctor may choose to monitor disease progression for some time before prescribing one of these invasive treatments. Meanwhile, a health inspector needs to not only choose which restaurants to inspect, but also when to carry out these inspections.

machine learning, policy class, reinforcement learning

1905.09751

Country:

North America > United States > California > Santa Clara County > Palo Alto (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)
Health & Medicine > Epidemiology (0.66)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.46)

arXiv.org Machine Learning, Jun-15-2019

Reinforcement Learning Driven Heuristic Optimization

Cai, Qingpeng, Hang, Will, Mirhoseini, Azalia, Tucker, George, Wang, Jingtao, Wei, Wei

Heuristic algorithms such as simulated annealing, Concorde, and METIS are effective and widely used approaches to find solutions to combinatorial optimization problems. However, they are limited by the high sample complexity required to reach a reasonable solution from a cold-start. In this paper, we introduce a novel framework to generate better initial solutions for heuristic algorithms using reinforcement learning (RL), named RLHO. We augment the ability of heuristic algorithms to greedily improve upon an existing initial solution generated by RL, and demonstrate novel results where RL is able to leverage the performance of heuristics as a learning signal to generate better initialization. We apply this framework to Proximal Policy Optimization (PPO) and Simulated Annealing (SA). We conduct a series of experiments on the well-known NP-complete bin packing problem, and show that the RLHO method outperforms our baselines. We show that on the bin packing problem, RL can learn to help heuristics perform even better, allowing us to combine the best parts of both approaches.
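The heuristic half of RLHO can be sketched as simulated annealing on bin packing, started from a given initial assignment. In RLHO that initial assignment would come from a PPO policy; here a deliberately poor "one item per bin" cold start stands in for it. The cost function (bins used, a small tie-breaking bonus for fuller bins, and a large overflow penalty), the cooling schedule, and all names are assumptions for illustration; one plausible reading of "performance of heuristics as a learning signal" is the improvement from the initial cost to `best_cost`.

```python
import math
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def bin_loads(assign: Sequence[int], sizes: Sequence[int]) -> Dict[int, int]:
    """Total item size placed in each non-empty bin."""
    load: Dict[int, int] = defaultdict(int)
    for item, b in enumerate(assign):
        load[b] += sizes[item]
    return load


def packing_cost(assign: Sequence[int], sizes: Sequence[int], capacity: int) -> float:
    """Bins used, minus a small bonus (< 1 when feasible) for fuller bins, plus an overflow penalty."""
    load = bin_loads(assign, sizes)
    overflow = sum(max(0, l - capacity) for l in load.values())
    fill_bonus = sum((l / capacity) ** 2 for l in load.values()) / (len(sizes) + 1)
    return len(load) - fill_bonus + 100.0 * overflow


def simulated_annealing(init: Sequence[int], sizes: Sequence[int], capacity: int,
                        steps: int = 5000, t0: float = 1.0, cooling: float = 0.999,
                        seed: int = 0) -> Tuple[List[int], float]:
    rng = random.Random(seed)
    n = len(sizes)
    current = list(init)
    cur_cost = packing_cost(current, sizes, capacity)
    best, best_cost = list(current), cur_cost
    t = t0
    for _ in range(steps):
        cand = list(current)
        cand[rng.randrange(n)] = rng.randrange(n)  # move one item to a random bin id in [0, n)
        c = packing_cost(cand, sizes, capacity)
        # Metropolis rule: always accept non-worse moves, accept worse ones with prob exp(-delta / t).
        if c <= cur_cost or rng.random() < math.exp((cur_cost - c) / t):
            current, cur_cost = cand, c
            if c < best_cost:
                best, best_cost = list(cand), c
        t *= cooling
    return best, best_cost


sizes = [5, 7, 5, 2, 4, 2, 5, 1, 6]
cold_start = list(range(len(sizes)))  # placeholder for an RL-generated initial solution
best, best_cost = simulated_annealing(cold_start, sizes, capacity=10)
```

Integer sizes keep the overflow check exact, so any packing that exceeds capacity costs at least 100 and can never be reported as the best solution.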

artificial intelligence, machine learning, reinforcement learning

1906.06639

Country:

North America > United States > California > Santa Clara County > Mountain View (0.14)
North America > United States > Alaska > Anchorage Municipality > Anchorage (0.05)
Asia > China > Beijing > Beijing (0.05)

Genre: Research Report (0.83)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.57)

Badreddine, Samy, Spranger, Michael

Injecting Prior Knowledge for Transfer Learning into Reinforcement Learning Algorithms using Logic Tensor Networks

arXiv.org Machine Learning, Jun-15-2019

Humans' ability to solve complex tasks is aided by priors on the semantics of objects and events in their environment. This paper investigates the use of similar prior knowledge for transfer learning in reinforcement learning agents. In particular, the paper proposes using a first-order-logic language grounded in deep neural networks to represent facts about objects and their semantics in the real world. Facts are provided as background knowledge prior to learning a policy for how to act in the world. The priors are injected alongside the conventional input in a single-agent architecture. As a proof of concept, the paper tests the system in simple experiments that show the importance of symbolic abstraction and flexible fact derivation. The paper shows that the proposed system can learn to take advantage of both the symbolic layer and the image layer in a single decision-selection module.
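Logic Tensor Networks ground predicates as differentiable functions that map object features to truth values in [0, 1], and combine them with fuzzy-logic connectives. A minimal sketch of that idea, assuming hand-set weights (in LTN they are learned), the product t-norm for "and", Reichenbach implication, and the mean as one simple aggregator for "for all". The objects, predicates, and the `symbolic_features` helper (a hypothetical way of placing truth values next to the image input) are all made up for illustration.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]
Predicate = Callable[[Vector], float]


def predicate(weights: Vector, bias: float) -> Predicate:
    """Ground a unary predicate as a logistic function of an object's feature vector."""
    def truth(x: Vector) -> float:
        z = sum(w * xi for w, xi in zip(weights, x)) + bias
        return 1.0 / (1.0 + math.exp(-z))
    return truth


def fuzzy_and(a: float, b: float) -> float:
    return a * b  # product t-norm


def fuzzy_implies(a: float, b: float) -> float:
    return 1.0 - a + a * b  # Reichenbach implication


def forall(values: List[float]) -> float:
    return sum(values) / len(values)  # mean aggregator (one of several possible choices)


def symbolic_features(obj: Vector, predicates: List[Predicate]) -> List[float]:
    """Hypothetical symbolic layer: truth values a policy network could take as extra input."""
    return [p(obj) for p in predicates]


# Made-up objects with features [redness, roundness].
is_red = predicate([10.0, 0.0], -5.0)
is_round = predicate([0.0, 10.0], -5.0)
objects = [[1.0, 0.2], [0.0, 0.9]]
axiom_truth = forall([fuzzy_implies(is_round(o), is_red(o)) for o in objects])  # "all round things are red"
```

Because every operator is differentiable, the truth of such axioms can be maximized by gradient descent when the predicate weights are learnable, which is what allows background facts to shape the grounding.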

artificial intelligence, machine learning, reinforcement learning

1906.06576

Genre: Research Report (1.00)

Industry: Leisure & Entertainment > Games > Computer Games (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)