AITopics

Machine learning has been applied to a number of creative, design-oriented tasks. However, it remains unclear how best to empower human users with these machine learning approaches, particularly users without technical expertise. In this paper we propose a general framework for turn-based interaction between human users and AI agents designed to support human creativity, called co-creative systems. The framework can be used to better understand the space of possible designs of co-creative systems and to reveal future research directions. We demonstrate how to apply this framework in conjunction with a pair of recent human subject studies, comparing the four human-AI systems employed in these studies and generating hypotheses for future studies.

artificial intelligence, machine learning, reinforcement learning, (14 more...)

1903.09709

Country: North America > United States (0.29)

Genre: Research Report > New Finding (0.49)

Industry: Leisure & Entertainment (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.72)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.56)
Information Technology > Artificial Intelligence > Cognitive Science > Creativity & Intelligence (0.49)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.47)

Explaining Reinforcement Learning to Mere Mortals: An Empirical Study

Anderson, Andrew, Dodge, Jonathan, Sadarangani, Amrita, Juozapaitis, Zoe, Newman, Evan, Irvine, Jed, Chattopadhyay, Souti, Fern, Alan, Burnett, Margaret

We present a user study to investigate the impact of explanations on non-experts' understanding of reinforcement learning (RL) agents. We investigate both a common RL visualization, saliency maps (the focus of attention), and a more recent explanation type, reward-decomposition bars (predictions of future types of rewards). We designed a 124-participant, four-treatment experiment to compare participants' mental models of an RL agent in a simple Real-Time Strategy (RTS) game. Our results show that the combination of saliency maps and reward bars was needed to achieve a statistically significant improvement in mental model score over the control. In addition, our qualitative analysis of the data reveals a number of effects for further study.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

1903.09708

Genre: Research Report > New Finding (1.00)

Industry:

Leisure & Entertainment > Games (0.69)
Health & Medicine (0.47)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Deep Hierarchical Reinforcement Learning Based Recommendations via Multi-goals Abstraction

Zhao, Dongyang, Zhang, Liang, Zhang, Bo, Zheng, Lizhou, Bao, Yongjun, Yan, Weipeng

The recommender system is an important form of intelligent application, which helps users cope with information redundancy. Among the metrics used to evaluate a recommender system, conversion has become increasingly important. The majority of existing recommender systems perform poorly on conversion because its feedback signal is extremely sparse. To tackle this challenge, we propose a deep hierarchical reinforcement learning based recommendation framework, which consists of two components: a high-level agent and a low-level agent. The high-level agent captures long-term sparse conversion signals and automatically sets abstract goals for the low-level agent, while the low-level agent follows the abstract goals and interacts with the real-time environment. To solve the inherent problem in hierarchical reinforcement learning, we propose a novel deep hierarchical reinforcement learning algorithm via multi-goals abstraction (HRL-MG). Our proposed algorithm has three characteristics: 1) the high-level agent generates multiple goals to guide the low-level agent in different stages, which reduces the difficulty of approaching high-level goals; 2) different goals share the same state encoder parameters, which increases the update frequency of the high-level agent and thus accelerates the convergence of our proposed algorithm; 3) an appropriate benefit assignment function is designed to allocate rewards in each goal so as to coordinate different goals in a consistent direction. We evaluate our proposed algorithm on a real-world e-commerce dataset and validate its effectiveness.

artificial intelligence, machine learning, reinforcement learning, (14 more...)

1903.09374

Country: North America > United States > Alaska (0.16)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Heecheol, Kim, Yamada, Masanori, Miyoshi, Kosuke, Yamakawa, Hiroshi

Macro Action Reinforcement Learning with Sequence Disentanglement using Variational Autoencoder

One problem in the application of reinforcement learning to real-world problems is the curse of dimensionality on the action space. Macro actions, sequences of primitive actions, have been studied as a way to reduce the dimensionality of the action space along the time axis. However, previous studies relied on humans to define macro actions or assumed that macro actions are repetitions of the same primitive action. We present Factorized Macro Action Reinforcement Learning (FaMARL), which autonomously learns a disentangled factor representation of a sequence of actions to generate macro actions that can be directly applied to general reinforcement learning algorithms. FaMARL exhibits higher scores than other reinforcement learning algorithms on environments that require an extensive amount of search.

machine learning, macro action, reinforcement learning, (16 more...)

1903.09366

Genre: Research Report (0.50)

Industry: Leisure & Entertainment > Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Zhang, Yan, Zavlanos, Michael M.

Distributed off-Policy Actor-Critic Reinforcement Learning with Policy Consensus

arXiv.org Artificial Intelligence, Mar-21-2019

In this paper, we propose a distributed off-policy actor-critic method to solve multi-agent reinforcement learning problems. Specifically, we assume that all agents keep local estimates of the global optimal policy parameter and update their local value function estimates independently. Then, we introduce an additional consensus step to let all the agents asymptotically achieve agreement on the global optimal policy function. We provide a convergence analysis of the proposed algorithm and validate its effectiveness using a distributed resource allocation example. Compared to relevant distributed actor-critic methods, here the agents do not share information about their local tasks, but instead coordinate to estimate the global policy function.
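A consensus step of the kind mentioned in this abstract can be sketched as repeated weighted averaging of each agent's local policy parameters. This is only an illustration of consensus averaging, not the paper's algorithm: the mixing matrix `W`, the three agents, and their parameter vectors are hypothetical, and the local actor-critic updates are omitted.

```python
def consensus_step(params, weights):
    """One consensus round: agent i replaces its parameter vector with a
    weighted average of all agents' vectors, using row i of `weights`."""
    n = len(params)
    dim = len(params[0])
    return [
        [sum(weights[i][j] * params[j][k] for j in range(n)) for k in range(dim)]
        for i in range(n)
    ]

# Hypothetical doubly stochastic mixing matrix for three agents (an assumption).
W = [
    [0.50, 0.25, 0.25],
    [0.25, 0.50, 0.25],
    [0.25, 0.25, 0.50],
]

# Hypothetical local estimates of a two-dimensional policy parameter.
theta = [[1.0, 0.0], [0.0, 2.0], [2.0, 4.0]]

# Repeated averaging drives every agent toward the network-wide mean.
for _ in range(50):
    theta = consensus_step(theta, W)
```

Because the rows and columns of `W` each sum to one, the average of the local estimates is preserved at every round, so the agents agree on the mean of their initial vectors.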

artificial intelligence, machine learning, reinforcement learning, (16 more...)

1903.09255

Country: North America > United States > North Carolina > Durham County > Durham (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Rodriguez, Ivan Dario Jimenez, Killian, Taylor, Son, Sung-Hyun, Gombolay, Matthew

Interpretable Reinforcement Learning via Differentiable Decision Trees

arXiv.org Machine Learning, Mar-21-2019

Decision trees are ubiquitous in machine learning for their ease of use and interpretability; however, they are not typically implemented in reinforcement learning because they cannot be updated via stochastic gradient descent. Traditional applications of decision trees for reinforcement learning have focused instead on making commitments to decision boundaries as the tree is grown one layer at a time. We overcome this critical limitation by allowing for a gradient update over the entire tree structure that improves sample complexity when a tree is fuzzy and interpretability when sharp. We offer three key contributions towards this goal. First, we motivate the need for policy gradient-based learning by examining the theoretical properties of gradient descent over differentiable decision trees. Second, we introduce a regularization framework that yields interpretability via sparsity in the tree structure. Third, we demonstrate the ability to construct a decision tree via policy gradient in canonical reinforcement learning domains and supervised learning benchmarks.
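The contrast between a "fuzzy" and a "sharp" tree can be illustrated with a depth-1 soft decision node, where a sigmoid gate replaces the hard threshold test so that the output is differentiable in the node parameters. The sketch below assumes a common sigmoid-gated formulation; the function names, the `steepness` parameter, and the numbers are illustrative and not taken from the paper.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def soft_tree(x, w, b, steepness, leaf_true, leaf_false):
    """Depth-1 soft decision tree: blend two leaf values with a sigmoid gate.

    The gate is a smooth stand-in for the crisp test `dot(w, x) > b`.
    """
    gate = sigmoid(steepness * (sum(wi * xi for wi, xi in zip(w, x)) - b))
    return gate * leaf_true + (1.0 - gate) * leaf_false

x = [0.8, 0.1]  # hypothetical input features

# Low steepness: a fuzzy node whose output varies smoothly with w and b.
fuzzy = soft_tree(x, w=[1.0, 0.0], b=0.5, steepness=1.0,
                  leaf_true=1.0, leaf_false=0.0)

# High steepness: the gate saturates and the node behaves like `x[0] > 0.5`.
sharp = soft_tree(x, w=[1.0, 0.0], b=0.5, steepness=100.0,
                  leaf_true=1.0, leaf_false=0.0)
```

With a low steepness the gradient with respect to `w` and `b` is informative, which is what makes gradient-based updates possible; with a high steepness the node reads as an ordinary threshold rule.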

artificial intelligence, machine learning, reinforcement learning, (14 more...)

1903.09338

Country: North America > United States (0.28)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

Gou, Stephen Zhen, Liu, Yuyang

DQN with model-based exploration: efficient learning on environments with sparse rewards

arXiv.org Machine Learning, Mar-21-2019

We propose Deep Q-Networks (DQN) with model-based exploration, an algorithm combining both model-free and model-based approaches that explores better and learns environments with sparse rewards more efficiently. DQN is a general-purpose, model-free algorithm and has been shown to perform well in a variety of tasks, including Atari 2600 games, since it was first proposed by Mnih et al. [1]. However, like many other reinforcement learning (RL) algorithms, DQN suffers from poor sample efficiency when rewards are sparse in an environment. As a result, most of the transitions stored in the replay memory have no informative reward signal and provide limited value to the convergence and training of the Q-Network. However, one insight is that these transitions can be used to learn the dynamics of the environment as a supervised learning problem. The transitions also provide information about the distribution of visited states. Our algorithm uses these two observations to perform one-step planning during exploration, picking the action that leads to the states least likely to have been seen, thus improving exploration. We demonstrate our agent's performance in two classic environments with sparse rewards in OpenAI Gym [2]: Mountain Car and Lunar Lander.
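The one-step planning idea can be sketched as follows: predict the next state for each candidate action with a dynamics model, score each predicted state under an estimate of the visited-state distribution, and take the action with the lowest score. The toy dynamics function and the count-based density below are stand-in assumptions; the abstract does not specify how either component is modeled.

```python
from collections import Counter

def pick_exploratory_action(state, actions, predict_next, visit_density):
    """Return the action whose predicted next state has been visited least."""
    return min(actions, key=lambda a: visit_density(predict_next(state, a)))

# Hypothetical history of visited states on a 1-D integer line.
visited = Counter([0, 0, 0, 1, 1, -1, -1, -1])
total_visits = sum(visited.values())

def predict_next(state, action):
    # Stand-in for a learned dynamics model: the action shifts the state.
    return state + action

def visit_density(state):
    # Stand-in for a learned density: empirical visit frequency.
    return visited[state] / total_visits

action = pick_exploratory_action(0, [-1, 0, 1], predict_next, visit_density)
```

Here state `1` has been visited twice while states `-1` and `0` have been visited three times each, so the chosen action is `1`.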

artificial intelligence, machine learning, reinforcement learning, (18 more...)

1903.09295

Country: North America > Canada > Ontario > Toronto (0.29)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.35)

Prakash, Bharat, Khatwani, Mohit, Waytowich, Nicholas, Mohsenin, Tinoosh

Improving Safety in Reinforcement Learning Using Model-Based Architectures and Human Intervention

arXiv.org Artificial Intelligence, Mar-21-2019

Recent progress in AI and reinforcement learning has shown great success in solving complex problems with high-dimensional state spaces. However, most of these successes have been in simulated environments where failure is of little or no consequence. Most real-world applications, however, require training solutions that are safe to operate, as catastrophic failures are inadmissible, especially when human interaction is involved. Currently, Safe RL systems use human oversight during training and exploration to make sure the RL agent does not enter a catastrophic state. These methods require a large amount of human labor and are very difficult to scale up. We present a hybrid method for reducing human intervention time by combining model-based approaches and training a supervised learner to improve sample efficiency while also ensuring safety. We evaluate these methods on various grid-world environments using both standard and visual representations and show that our approach achieves better performance than traditional model-free approaches in terms of sample efficiency, number of catastrophic states reached, and overall task performance.

artificial intelligence, machine learning, reinforcement learning, (14 more...)

1903.09328

Country: North America > United States > Maryland (0.28)

Genre: Research Report (0.50)

Industry: Government (0.95)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Achiam, Joshua, Knight, Ethan, Abbeel, Pieter

Towards Characterizing Divergence in Deep Q-Learning

arXiv.org Artificial Intelligence, Mar-21-2019

... algorithms for control, employs three techniques collectively known as the 'deadly triad' in reinforcement learning: bootstrapping, off-policy learning, and function approximation. Prior work has demonstrated that together these can lead to divergence in Q-learning algorithms, but the conditions under which divergence occurs are not well-understood. In this note, we give a simple analysis based on a linear approximation to the Q-value updates, which we believe provides insight into divergence under the deadly triad. The central point in our analysis is to consider when the leading order approximation to the deep-Q update is or is not a contraction in the sup norm.

The most common failure mode is divergence, where the Q-function approximator learns to ascribe unrealistically high values to state-action pairs, in turn destroying the quality of the greedy control policy derived from Q (van Hasselt et al., 2018). Divergence in DQL is often attributed to three components common to all DQL algorithms, which are collectively considered the 'deadly triad' of reinforcement learning (Sutton, 1988; Sutton & Barto, 2018):

- function approximation, in this case the use of deep neural networks,
- off-policy learning, the use of data collected on one ...
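For intuition about what a contraction in the sup norm means, the tabular case gives a simple check: the exact Bellman optimality backup shrinks the largest difference between any two Q-tables by at least the discount factor. The sketch below uses a hypothetical two-state, two-action deterministic MDP and does not reproduce the paper's analysis of deep-Q updates, where function approximation can break this property.

```python
GAMMA = 0.9

# Hypothetical deterministic MDP: NEXT[s][a] is the successor state and
# REWARD[s][a] the reward for taking action a in state s.
NEXT = [[0, 1], [0, 1]]
REWARD = [[0.0, 1.0], [2.0, 0.0]]

def bellman_backup(q):
    """Exact tabular Bellman optimality backup."""
    return [
        [REWARD[s][a] + GAMMA * max(q[NEXT[s][a]]) for a in range(2)]
        for s in range(2)
    ]

def sup_norm_diff(q1, q2):
    """Largest absolute entrywise difference between two Q-tables."""
    return max(abs(q1[s][a] - q2[s][a]) for s in range(2) for a in range(2))

# Two arbitrary Q-tables.
q1 = [[0.0, 5.0], [-3.0, 1.0]]
q2 = [[2.0, -1.0], [4.0, 0.5]]

before = sup_norm_diff(q1, q2)
after = sup_norm_diff(bellman_backup(q1), bellman_backup(q2))

# Contraction: the gap after one backup is at most GAMMA times the gap before.
is_contraction = after <= GAMMA * before
```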

artificial intelligence, machine learning, reinforcement learning, (15 more...)

1903.08894

Genre: Research Report (0.84)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Christ, Andreas, Quint, Franz

Artificial Intelligence : from Research to Application ; the Upper-Rhine Artificial Intelligence Symposium (UR-AI 2019)

arXiv.org Artificial Intelligence, Mar-20-2019

The TriRhenaTech alliance universities and their partners presented their competences in the field of artificial intelligence and their cross-border cooperation with industry at the tri-national conference 'Artificial Intelligence : from Research to Application' on March 13th, 2019 in Offenburg. The TriRhenaTech alliance is a network of universities in the Upper Rhine Trinational Metropolitan Region comprising the German universities of applied sciences in Furtwangen, Kaiserslautern, Karlsruhe, and Offenburg, the Baden-Wuerttemberg Cooperative State University Loerrach, the French university network Alsace Tech (comprising 14 'grandes écoles' in the fields of engineering, architecture and management) and the University of Applied Sciences and Arts Northwestern Switzerland. The alliance's common goal is to reinforce the transfer of knowledge, research, and technology, as well as the cross-border mobility of students.

deep learning, neural network, upstream oil & gas, (28 more...)

1903.08495

Country:

Europe > Switzerland (0.34)
North America > United States > New York (0.27)
Europe > Germany > Baden-Württemberg > Karlsruhe Region > Karlsruhe (0.24)
(10 more...)

Genre:

Research Report > New Finding (1.00)
Overview (1.00)
Instructional Material > Course Syllabus & Notes (1.00)

Industry:

Law (1.00)
Information Technology > Services (1.00)
Health & Medicine > Diagnostic Medicine (1.00)
(10 more...)

Technology:

Information Technology > Software > Programming Languages (1.00)
Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Data Science > Data Mining (1.00)
(15 more...)