Goto

Collaborating Authors

 Reinforcement Learning


An Interaction Framework for Studying Co-Creative AI

arXiv.org Artificial Intelligence

Machine learning has been applied to a number of creative, design-oriented tasks. However, it remains unclear how to best empower human users with these machine learning approaches, particularly those users without technical expertise. In this paper we propose a general framework for turn-based interaction between human users and AI agents designed to support human creativity, called {co-creative systems}. The framework can be used to better understand the space of possible designs of co-creative systems and reveal future research directions. We demonstrate how to apply this framework in conjunction with a pair of recent human subject studies, comparing between the four human-AI systems employed in these studies and generating hypotheses towards future studies.


Explaining Reinforcement Learning to Mere Mortals: An Empirical Study

arXiv.org Artificial Intelligence

We present a user study to investigate the impact of explanations on non-experts' understanding of reinforcement learning (RL) agents. We investigate both a common RL visualization, saliency maps (the focus of attention), and a more recent explanation type, reward-decomposition bars (predictions of future types of rewards). We designed a 124 participant, four-treatment experiment to compare participants' mental models of an RL agent in a simple Real-Time Strategy (RTS) game. Our results show that the combination of both saliency and reward bars were needed to achieve a statistically significant improvement in mental model score over the control. In addition, our qualitative analysis of the data reveals a number of effects for further study.


Deep Hierarchical Reinforcement Learning Based Recommendations via Multi-goals Abstraction

arXiv.org Artificial Intelligence

The recommender system is an important form of intelligent application, which assists users to alleviate from information redundancy. Among the metrics used to evaluate a recommender system, the metric of conversion has become more and more important. The majority of existing recommender systems perform poorly on the metric of conversion due to its extremely sparse feedback signal. To tackle this challenge, we propose a deep hierarchical reinforcement learning based recommendation framework, which consists of two components, i.e., high-level agent and low-level agent. The high-level agent catches long-term sparse conversion signals, and automatically sets abstract goals for low-level agent, while the low-level agent follows the abstract goals and interacts with real-time environment. To solve the inherent problem in hierarchical reinforcement learning, we propose a novel deep hierarchical reinforcement learning algorithm via multi-goals abstraction (HRL-MG). Our proposed algorithm contains three characteristics: 1) the high-level agent generates multiple goals to guide the low-level agent in different stages, which reduces the difficulty of approaching high-level goals; 2) different goals share the same state encoder parameters, which increases the update frequency of the high-level agent and thus accelerates the convergence of our proposed algorithm; 3) an appreciate benefit assignment function is designed to allocate rewards in each goal so as to coordinate different goals in a consistent direction. We evaluate our proposed algorithm based on a real-world e-commerce dataset and validate its effectiveness.


Macro Action Reinforcement Learning with Sequence Disentanglement using Variational Autoencoder

arXiv.org Artificial Intelligence

One problem in the application of reinforcement learning to real-world problems is the curse of dimensionality on the action space. Macro actions, a sequence of primitive actions, have been studied to diminish the dimensionality of the action space with regard to the time axis. However, previous studies relied on humans defining macro actions or assumed macro actions as repetitions of the same primitive actions. We present Factorized Macro Action Reinforcement Learning (FaMARL) which autonomously learns disentangled factor representation of a sequence of actions to generate macro actions that can be directly applied to general reinforcement learning algorithms. FaMARL exhibits higher scores than other reinforcement learning algorithms on environments that require an extensive amount of search.


Distributed off-Policy Actor-Critic Reinforcement Learning with Policy Consensus

arXiv.org Artificial Intelligence

In this paper, we propose a distributed off-policy actor critic method to solve multi-agent reinforcement learning problems. Specifically, we assume that all agents keep local estimates of the global optimal policy parameter and update their local value function estimates independently. Then, we introduce an additional consensus step to let all the agents asymptotically achieve agreement on the global optimal policy function. The convergence analysis of the proposed algorithm is provided and the effectiveness of the proposed algorithm is validated using a distributed resource allocation example. Compared to relevant distributed actor critic methods, here the agents do not share information about their local tasks, but instead they coordinate to estimate the global policy function.


Interpretable Reinforcement Learning via Differentiable Decision Trees

arXiv.org Machine Learning

Decision trees are ubiquitous in machine learning for their ease of use and interpretability; however, they are not typically implemented in reinforcement learning because they cannot be updated via stochastic gradient descent. Traditional applications of decision trees for reinforcement learning have focused instead on making commitments to decision boundaries as the tree is grown one layer at a time. We overcome this critical limitation by allowing for a gradient update over the entire tree structure that improves sample complexity when a tree is fuzzy and interpretability when sharp. We offer three key contributions towards this goal. First, we motivate the need for policy gradient-based learning by examining the theoretical properties of gradient descent over differentiable decision trees. Second, we introduce a regularization framework that yields interpretability via sparsity in the tree structure. Third, we demonstrate the ability to construct a decision tree via policy gradient in canonical reinforcement learning domains and supervised learning benchmarks.


DQN with model-based exploration: efficient learning on environments with sparse rewards

arXiv.org Machine Learning

We propose Deep Q-Networks (DQN) with model-based exploration, an algorithm combining both model-free and model-based approaches that explores better and learns environments with sparse rewards more efficiently. DQN is a generalpurpose, model-free algorithm and has been proven to perform well in a variety of tasks including Atari 2600 games since it's first proposed by Minh et el[1]. However, like many other reinforcement learning (RL) algorithms, DQN suffers from poor sample efficiency when rewards are sparse in an environment. As a result, most of the transitions stored in the replay memory have no informative reward signal, and provide limited value to the convergence and training of the Q-Network. However, one insight is that these transitions can be used to learn the dynamics of the environment as a supervised learning problem. The transitions also provide information of the distribution of visited states. Our algorithm utilizes these two observations to perform a one-step planning during exploration to pick an action that leads to states least likely to be seen, thus improving the performance of exploration. We demonstrate our agent's performance in two classic environments with sparse rewards in OpenAI gym[2]: Mountain Car and Lunar Lander.


Improving Safety in Reinforcement Learning Using Model-Based Architectures and Human Intervention

arXiv.org Artificial Intelligence

Recent progress in AI and Reinforcement learning has shown great success in solving complex problems with high dimensional state spaces. However, most of these successes have been primarily in simulated environments where failure is of little or no consequence. Most real-world applications, however, require training solutions that are safe to operate as catastrophic failures are inadmissible especially when there is human interaction involved. Currently, Safe RL systems use human oversight during training and exploration in order to make sure the RL agent does not go into a catastrophic state. These methods require a large amount of human labor and it is very difficult to scale up. We present a hybrid method for reducing the human intervention time by combining model-based approaches and training a supervised learner to improve sample efficiency while also ensuring safety. We evaluate these methods on various grid-world environments using both standard and visual representations and show that our approach achieves better performance in terms of sample efficiency, number of catastrophic states reached as well as overall task performance compared to traditional model-free approaches


Towards Characterizing Divergence in Deep Q-Learning

arXiv.org Artificial Intelligence

The most common failure algorithms for control, employs three techniques mode is divergence, where the Q-function approximator collectively known as the'deadly triad' in learns to ascribe unrealistically high values to state-action reinforcement learning: bootstrapping, off-policy pairs, in turn destroying the quality of the greedy control learning, and function approximation. Prior work policy derived from Q (van Hasselt et al., 2018). Divergence has demonstrated that together these can lead to in DQL is often attributed to three components common divergence in Q-learning algorithms, but the conditions to all DQL algorithms, which are collectively considered under which divergence occurs are not the'deadly triad' of reinforcement learning (Sutton, 1988; well-understood. In this note, we give a simple Sutton & Barto, 2018): analysis based on a linear approximation to the Q-value updates, which we believe provides insight - function approximation, in this case the use of deep into divergence under the deadly triad. The neural networks, central point in our analysis is to consider when the leading order approximation to the deep-Q - off-policy learning, the use of data collected on one update is or is not a contraction in the sup norm.


Artificial Intelligence : from Research to Application ; the Upper-Rhine Artificial Intelligence Symposium (UR-AI 2019)

arXiv.org Artificial Intelligence

The TriRhenaTech alliance universities and their partners presented their competences in the field of artificial intelligence and their cross-border cooperations with the industry at the tri-national conference 'Artificial Intelligence : from Research to Application' on March 13th, 2019 in Offenburg. The TriRhenaTech alliance is a network of universities in the Upper Rhine Trinational Metropolitan Region comprising of the German universities of applied sciences in Furtwangen, Kaiserslautern, Karlsruhe, and Offenburg, the Baden-Wuerttemberg Cooperative State University Loerrach, the French university network Alsace Tech (comprised of 14 'grandes \'ecoles' in the fields of engineering, architecture and management) and the University of Applied Sciences and Arts Northwestern Switzerland. The alliance's common goal is to reinforce the transfer of knowledge, research, and technology, as well as the cross-border mobility of students.