 Reinforcement Learning


Now hiring AI futurists: It's time for artificial intelligence to take a seat in the C-Suite - ZDNet

#artificialintelligence

Machine learning, task automation and robotics are already widely used in business. These and other AI technologies are about to multiply, and we look at how organizations can best take advantage of them. COVID-19 disruption has left enterprises with no choice but to reassess digital transformation investments and roadmaps. While less important projects are delayed, transformation projects involving AI and automation are receiving a lot of attention right now. In just the last 60 days, the adoption of varying levels of AI technologies across the enterprise surged with an incredible sense of urgency.


A reinforcement learning based decision support system in textile manufacturing process

arXiv.org Artificial Intelligence

This paper introduces a reinforcement learning based decision support system for the textile manufacturing process. A solution optimization problem of color fading ozonation is discussed and set up as a Markov Decision Process (MDP) in terms of the tuple {S, A, P, R}. Q-learning is used to train an agent through interaction with the resulting environment by accumulating the reward R. The application results indicate that the proposed MDP model expresses the optimization problem of the textile manufacturing process discussed in this paper well; the use of reinforcement learning to support decision making in this sector is therefore shown to be applicable, with promising prospects.
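As a rough illustration of the setup described in this abstract, the sketch below runs tabular Q-learning on a toy MDP. It is not the paper's ozonation model: the 5-state chain environment, the `chain_step` transition function, and all hyperparameters (`alpha`, `gamma`, `epsilon`, episode counts) are assumptions chosen only to show how an agent accumulates reward R while interacting with an environment given by states S, actions A, and transitions P.

```python
import random


def q_learning(n_states, n_actions, step, episodes=500, alpha=0.1,
               gamma=0.9, epsilon=0.1, max_steps=50, seed=0):
    """Tabular Q-learning; `step(s, a)` returns (next_state, reward, done)."""
    rng = random.Random(seed)
    q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        s = 0  # every episode starts in state 0 (an assumption of this toy setup)
        for _ in range(max_steps):
            if rng.random() < epsilon:
                # Explore: pick a uniformly random action.
                a = rng.randrange(n_actions)
            else:
                # Exploit: pick a greedy action, breaking ties at random.
                best = max(q[s])
                a = rng.choice([i for i, v in enumerate(q[s]) if v == best])
            s_next, r, done = step(s, a)
            # Q-learning target: reward plus discounted best next-state value.
            target = r if done else r + gamma * max(q[s_next])
            q[s][a] += alpha * (target - q[s][a])
            s = s_next
            if done:
                break
    return q


def chain_step(s, a):
    """Hypothetical 5-state chain: action 1 moves right, action 0 moves left.

    Reaching state 4 ends the episode with reward 1; all other steps give 0.
    """
    s_next = min(s + 1, 4) if a == 1 else max(s - 1, 0)
    done = s_next == 4
    return s_next, (1.0 if done else 0.0), done


q_table = q_learning(n_states=5, n_actions=2, step=chain_step)
# Greedy policy for the non-terminal states 0..3.
policy = [max(range(2), key=lambda a: q_table[s][a]) for s in range(4)]
```

In this toy chain the learned greedy policy should prefer moving right in every non-terminal state, since that is the only route to the rewarding state.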


Reinforcement Learning for Variable Selection in a Branch and Bound Algorithm

arXiv.org Machine Learning

Mixed integer linear programs are commonly solved by Branch and Bound algorithms. A key factor in the efficiency of the most successful commercial solvers is their fine-tuned heuristics. In this paper, we leverage patterns in real-world instances to learn from scratch a new branching strategy optimised for a given problem and compare it with a commercial solver. We propose FMSTS, a novel Reinforcement Learning approach specifically designed for this task. The strength of our method lies in the consistency between a local value function and a global metric of interest. In addition, we provide insights for adapting known RL techniques to the Branch and Bound setting, and present a new neural network architecture inspired by the literature. To our knowledge, it is the first time Reinforcement Learning has been used to fully optimise the branching strategy. Computational experiments show that our method is appropriate and able to generalise well to new instances.


Deep Reinforcement Learning for High Level Character Control

arXiv.org Machine Learning

In this paper, we propose the use of traditional animations, heuristic behavior and reinforcement learning in the creation of intelligent characters for computational media. The traditional animation and heuristics give artistic control over the behavior, while the reinforcement learning adds generalization. The use case presented is a dog character with a high-level controller in a 3D environment which is built around the desired behaviors to be learned, such as fetching an item. As the development of the environment is key to learning, further analysis is conducted of how to build those learning environments, the effects of environment and agent modeling choices, training procedures and generalization of the learned behavior. This analysis builds insight into the aforementioned factors and may serve as a guide in the development of environments in general.


What You Need to Know About Deep Reinforcement Learning - KDnuggets

#artificialintelligence

It is useful, for the forthcoming discussion, to have a better understanding of some key terms used in RL. Agent: A software/hardware mechanism which takes certain actions depending on its interaction with the surrounding environment; for example, a drone making a delivery, or Super Mario navigating a video game. The algorithm is the agent. Action: An action is one of all the possible moves the agent can make. An action is almost self-explanatory, but it should be noted that agents usually choose from a list of discrete possible actions.
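The two terms above can be made concrete with a small sketch. Everything here is hypothetical and chosen for illustration: the `ACTIONS` tuple, the `RandomAgent` class, and the one-dimensional "position" environment inside `run_episode` do not come from the article. The point is only that the agent is the piece of code that picks one action from a discrete list each time it observes the environment.

```python
import random

# Hypothetical discrete action list for a simple side-scrolling character.
ACTIONS = ("left", "right", "jump")


class RandomAgent:
    """An agent that picks uniformly from a fixed list of discrete actions."""

    def __init__(self, actions, seed=0):
        self.actions = tuple(actions)
        self._rng = random.Random(seed)

    def act(self, observation):
        # A learning agent would use the observation; this one ignores it.
        return self._rng.choice(self.actions)


def run_episode(agent, n_steps=10):
    """Let the agent interact with a toy environment for n_steps steps."""
    position = 0  # the environment state, which the agent observes
    history = []
    for _ in range(n_steps):
        action = agent.act(position)
        if action == "right":
            position += 1
        elif action == "left":
            position -= 1
        # "jump" leaves the position unchanged in this toy environment.
        history.append((action, position))
    return history


history = run_episode(RandomAgent(ACTIONS))
```

A reinforcement learning agent would replace the random choice in `act` with a policy that is improved from the rewards it receives.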


Prototypical Q Networks for Automatic Conversational Diagnosis and Few-Shot New Disease Adaption

arXiv.org Artificial Intelligence

Spoken dialog systems have seen applications in many domains, including medicine, where they are used for automatic conversational diagnosis. State-of-the-art dialog managers are usually driven by deep reinforcement learning models, such as deep Q networks (DQNs), which learn by interacting with a simulator to explore the entire action space since real conversations are limited. However, the DQN-based automatic diagnosis models do not achieve satisfactory performance when adapted to new, unseen diseases with only a few training samples. In this work, we propose Prototypical Q Networks (ProtoQN) as the dialog manager for automatic diagnosis systems. The model calculates prototype embeddings with real conversations between doctors and patients, learning from them and simulator-augmented dialogs more efficiently. We create both supervised and few-shot learning tasks with the Muzhi corpus. Experiments show that ProtoQN significantly outperforms the baseline DQN model in both supervised and few-shot learning scenarios, and achieves state-of-the-art few-shot learning performance.


Privileged Information Dropout in Reinforcement Learning

arXiv.org Artificial Intelligence

Using privileged information during training can improve the sample efficiency and performance of machine learning systems. This paradigm has been applied to reinforcement learning (RL), primarily in the form of distillation or auxiliary tasks, and less commonly in the form of augmenting the inputs of agents. In this work, we investigate Privileged Information Dropout (PI-Dropout) for achieving the latter, which can be applied equally to value-based and policy-based RL algorithms. Within a simple partially-observed environment, we demonstrate that PI-Dropout outperforms alternatives for leveraging privileged information, including distillation and auxiliary tasks, and can successfully utilise different types of privileged information. Finally, we analyse its effect on the learned representations.


Safe Learning for Near Optimal Scheduling

arXiv.org Artificial Intelligence

In this paper, we investigate the combination of synthesis techniques and learning techniques to obtain safe and near optimal schedulers for a preemptible task scheduling problem. We study both model-based learning techniques with PAC guarantees and model-free learning techniques based on shielded deep Q-learning. The new learning algorithms have been implemented to conduct experimental evaluations.


Experience Augmentation: Boosting and Accelerating Off-Policy Multi-Agent Reinforcement Learning

arXiv.org Artificial Intelligence

Exploration of the high-dimensional state-action space is one of the biggest challenges in Reinforcement Learning (RL), especially in the multi-agent domain. We present a novel technique called Experience Augmentation, which enables time-efficient and boosted learning based on a fast, fair and thorough exploration of the environment. It can be combined with arbitrary off-policy MARL algorithms and is applicable to either homogeneous or heterogeneous environments. We demonstrate our approach by combining it with MADDPG and verifying the performance in two homogeneous and one heterogeneous environments. In the best performing scenario, MADDPG with experience augmentation reaches the convergence reward of vanilla MADDPG in 1/4 of the real time, and its convergence beats the original model by a significant margin. Our ablation studies show that experience augmentation is a crucial ingredient which accelerates the training process and boosts the convergence.


A Survey of Reinforcement Learning Algorithms for Dynamically Varying Environments

arXiv.org Artificial Intelligence

Reinforcement learning (RL) algorithms find applications in inventory control, recommender systems, vehicular traffic management, cloud computing and robotics. The real-world complications of many tasks arising in these domains make them difficult to solve with the basic assumptions underlying classical RL algorithms. RL agents in these applications often need to react and adapt to changing operating conditions. A significant part of research on single-agent RL techniques focuses on developing algorithms for the case where the underlying assumption of a stationary environment model is relaxed. This paper provides a survey of RL methods developed for handling dynamically varying environment models. The goal of methods not limited by the stationarity assumption is to help autonomous agents adapt to varying operating conditions. This is possible either by minimizing the rewards lost during learning by the RL agent or by finding a suitable policy for the RL agent which leads to efficient operation of the underlying system. A representative collection of these algorithms is discussed in detail in this work, along with their categorization and their relative merits and demerits. Additionally, we review works which are tailored to application domains. Finally, we discuss future enhancements for this field.