Reinforcement Learning
Divide-and-Conquer Adversarial Learning for High-Resolution Image and Video Enhancement
Huang, Zhiwu, Paudel, Danda Pani, Li, Guanju, Wu, Jiqing, Timofte, Radu, Van Gool, Luc
This paper introduces a divide-and-conquer inspired adversarial learning (DACAL) approach for photo enhancement. The key idea is to decompose the photo enhancement process into multiple hierarchical sub-problems, which can be conquered more effectively from the bottom up. On the top level, we propose a perception-based division to learn the additive and multiplicative components required to translate a low-quality image or video into its high-quality counterpart. On the intermediate level, we use a frequency-based division with a generative adversarial network (GAN) to weakly supervise the photo enhancement process. On the lower level, we design a dimension-based division that enables the GAN model to be trained by better approximating the distribution distance on multiple independent one-dimensional data. While considering all three hierarchies, we develop multiscale and recurrent training approaches to optimize the image and video enhancement process in a weakly-supervised manner. Both quantitative and qualitative results clearly demonstrate that the proposed DACAL achieves state-of-the-art performance for high-resolution image and video enhancement. Despite the many technological advances in mobile cameras, captured images often still come with limited dynamic range, undesirable color rendition, and unsatisfactory texture sharpness. Among many possible causes, low-light environments and under/overexposed regions usually introduce a severe lack of texture detail and low dynamic range coverage, respectively. Another critical issue is the amplification (during the enhancement process) of noise in the dark and/or texture-less regions, where the enhancement may not even be necessary.
Deep Reinforcement Learning Based Power control for Wireless Multicast Systems
Raghu, Ramkumar, Upadhyaya, Pratheek, Panju, Mahadesh, Aggarwal, Vaneet, Sharma, Vinod
Ramkumar Raghu (1), Pratheek Upadhyaya (1), Mahadesh Panju (1), Vaneet Aggarwal (1, 2), and Vinod Sharma (1); (1) Indian Institute of Science, Bangalore, India. Abstract: We consider a multicast scheme recently proposed for a wireless downlink in [1]. It was shown earlier that power control can significantly improve its performance. However, for this system, obtaining the optimal power control is intractable because of a very large state space. Therefore, in this paper we use deep reinforcement learning, approximating the Q-function with a deep neural network. We show that optimal power control can be learnt for reasonably large systems via this approach. The average power constraint is enforced via a Lagrange multiplier, which is also learnt. Finally, we demonstrate that a slight modification of the learning algorithm allows the optimal control to track time-varying system statistics. I. Introduction: Wireless networks are constantly being refined to cater for seamless delivery of huge amounts of data to end users. With increased user-generated content and the proliferation of social networking sites, almost 78% of mobile data traffic is expected to be due to mobile videos [2]. Also, the traffic requested for this content is riddled with redundant requests [3]. Thus, multicasting is a natural way to address these requests. A multicast queue with network coding is studied in [4], [5] with an infinite library of files.
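The idea of learning a Lagrange multiplier alongside the Q-function can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes a tabular Q-function in place of the deep network, and the function names, step sizes, and the projected dual-ascent update are illustrative choices, not taken from the paper.

```python
def penalized_reward(reward, power, lam):
    """Lagrangian reward: the task reward minus the power cost priced by lam."""
    return reward - lam * power


def update_multiplier(lam, avg_power, power_budget, step_size):
    """Projected dual ascent (assumed form): raise lam when the average power
    exceeds the budget, lower it otherwise, and never let it drop below zero."""
    return max(0.0, lam + step_size * (avg_power - power_budget))


def q_update(q, state, action, reward, power, next_state, lam,
             alpha=0.1, gamma=0.9):
    """One tabular Q-learning step on the penalized reward.

    q is a dict of dicts: q[state][action] -> value. The table stands in
    for the deep Q-network mentioned in the abstract.
    """
    target = penalized_reward(reward, power, lam) + gamma * max(q[next_state].values())
    q[state][action] += alpha * (target - q[state][action])
    return q[state][action]
```

In this sketch the multiplier acts as a price on transmit power: a larger value makes high-power actions less attractive to the Q-learner, which is one common way to handle an average-cost constraint.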
Attention-based Curiosity-driven Exploration in Deep Reinforcement Learning
Reizinger, Patrik, Szemenyei, Márton
Reinforcement learning enables training an agent via interaction with the environment. However, in the majority of real-world scenarios, the extrinsic feedback is sparse or insufficient, so intrinsic reward formulations are needed to train the agent successfully. This work investigates and extends the paradigm of curiosity-driven exploration. First, a probabilistic approach is taken to exploit the advantages of the attention mechanism, which has been applied successfully in other domains of deep learning. Combining them, we propose new methods, such as AttA2C, an extension of the Actor-Critic framework. Second, another curiosity-based approach, the Intrinsic Curiosity Module (ICM), is extended. The proposed model uses attention to emphasize features for the dynamic models within ICM; moreover, we modify the loss function, resulting in a new curiosity formulation, which we call rational curiosity. The corresponding implementation can be found at https://github.com/rpatrik96/AttA2C/.
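For orientation, the baseline curiosity signal that ICM-style methods build on can be sketched as the prediction error of a forward model in feature space. This is not the attention-based or "rational curiosity" variant proposed in the paper; the function names and the `scale` and `beta` parameters are illustrative assumptions.

```python
def intrinsic_reward(predicted_next_features, next_features, scale=1.0):
    """Curiosity bonus: half the scaled squared error between the forward
    model's predicted next-state features and the observed ones."""
    error = sum((p - o) ** 2
                for p, o in zip(predicted_next_features, next_features))
    return 0.5 * scale * error


def total_reward(extrinsic, intrinsic, beta=0.2):
    """Mix the (possibly sparse) extrinsic reward with the curiosity bonus.
    beta is a hypothetical weighting coefficient."""
    return extrinsic + beta * intrinsic
```

Transitions the forward model predicts poorly yield a larger bonus, which pushes the agent toward parts of the environment it has not yet learned to model.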
Partially Detected Intelligent Traffic Signal Control: Environmental Adaptation
Zhang, Rusheng, Leteurtre, Romain, Striner, Benjamin, Alanazi, Ammar, Alghafis, Abdullah, Tonguz, Ozan K.
Partially Detected Intelligent Traffic Signal Control (PD-ITSC) systems that can optimize traffic signals based on limited detected information could be a cost-efficient solution for mitigating traffic congestion in the future. In this paper, we focus on a particular problem in PD-ITSC: adaptation to changing environments. To this end, we investigate different reinforcement learning algorithms, including Q-Learning, Proximal Policy Optimization (PPO), Advantage Actor-Critic (A2C), and Actor-Critic with Kronecker-Factored Trust-Region (ACKTR). Our findings suggest that RL algorithms can find optimal strategies under partial vehicle detection; however, policy-based algorithms can adapt to changing environments more efficiently than value-based algorithms. We use these findings to draw conclusions about the value of different models for PD-ITSC systems.
Learning to Design Games: Strategic Environments in Reinforcement Learning
Zhang, Haifeng, Wang, Jun, Zhou, Zhiming, Zhang, Weinan, Wen, Ying, Yu, Yong, Li, Wenxin
In typical reinforcement learning (RL), the environment is assumed to be given, and the goal of learning is to identify an optimal policy for the agent taking actions through its interactions with the environment. In this paper, we extend this setting by considering that the environment is not given, but is controllable and learnable through its interaction with the agent at the same time. This extension is motivated by environment design scenarios in the real world, including game design, shopping space design and traffic signal design. Theoretically, we find a dual Markov decision process (MDP) w.r.t. the environment to that w.r.t. the agent, and derive a policy gradient solution for optimizing the parametrized environment. Furthermore, discontinuous environments are addressed by a proposed general generative framework. Our experiments on a Maze game design task show the effectiveness of the proposed algorithms in generating diverse and challenging Mazes against various agent settings.
Sample Complexity of Reinforcement Learning using Linearly Combined Model Ensembles
Modi, Aditya, Jiang, Nan, Tewari, Ambuj, Singh, Satinder
Reinforcement learning (RL) methods have been shown to be capable of learning intelligent behavior in rich domains. However, this has largely been done in simulated domains without adequate focus on the process of building the simulator. In this paper, we consider a setting where we have access to an ensemble of pre-trained and possibly inaccurate simulators (models). We approximate the real environment using a state-dependent linear combination of the ensemble, where the coefficients are determined by the given state features and some unknown parameters. Our proposed algorithm provably learns a near-optimal policy with a sample complexity polynomial in the number of unknown parameters, and incurs no dependence on the size of the state (or action) space. As an extension, we also consider the more challenging problem of model selection, where the state features are unknown and can be chosen from a large candidate set. We provide exponential lower bounds that illustrate the fundamental hardness of this problem, and develop a provably efficient algorithm under additional natural assumptions.
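A state-dependent combination of simulators can be sketched as follows. The abstract only says that the coefficients depend on state features and unknown parameters; the linear map from features to weights, the toy models, and all names below are assumptions made for illustration.

```python
def mixture_weights(features, params):
    """One weight per model: w_k = sum_j params[k][j] * features[j].
    Assumes features and params are chosen so the weights are non-negative
    and sum to one (for example, features on the simplex and an identity map)."""
    return [sum(p * f for p, f in zip(row, features)) for row in params]


def combined_transition(models, weights, state, action):
    """Mix the ensemble's next-state distributions with the given weights.
    Each model maps (state, action) to a dict {next_state: probability}."""
    combined = {}
    for model, weight in zip(models, weights):
        for next_state, prob in model(state, action).items():
            combined[next_state] = combined.get(next_state, 0.0) + weight * prob
    return combined


# Two hypothetical pre-trained simulators over a tiny state space.
def model_a(state, action):
    return {"s1": 1.0}


def model_b(state, action):
    return {"s1": 0.5, "s2": 0.5}


weights = mixture_weights([0.25, 0.75], [[1.0, 0.0], [0.0, 1.0]])
approx = combined_transition([model_a, model_b], weights, "s0", "a")
```

Only the entries of `params` need to be learned in this sketch, so the number of unknowns depends on the number of models and features, not on the size of the state space.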
Robust Domain Randomization for Reinforcement Learning
Slaoui, Reda Bahi, Clements, William R., Foerster, Jakob N., Toth, Sébastien
Producing agents that can generalize to a wide range of environments is a significant challenge in reinforcement learning. One method for overcoming this issue is domain randomization, whereby at the start of each training episode some parameters of the environment are randomized so that the agent is exposed to many possible variations. However, domain randomization is highly inefficient and may lead to policies with high variance across domains. In this work, we formalize the domain randomization problem, and show that minimizing the policy's Lipschitz constant with respect to the randomization parameters leads to low variance in the learned policies. We propose a method where the agent only needs to be trained on one variation of the environment, and its learned state representations are regularized during training to minimize this constant. We conduct experiments that demonstrate that our technique leads to more efficient and robust learning than standard domain randomization, while achieving equal generalization scores.
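One way to picture the regularization is as a penalty on how far the learned representation of a state moves when the environment's randomized parameters change. The sketch below assumes a squared-distance penalty added to the usual RL loss; the function names and the `weight` coefficient are hypothetical, and the paper's exact loss may differ.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def regularized_loss(rl_loss, features_reference, features_randomized, weight):
    """RL loss plus a penalty on the gap between the representation of a state
    in the reference environment and the representation of the same state
    rendered in a randomized variation."""
    return rl_loss + weight * squared_distance(features_reference,
                                               features_randomized)
```

Driving the penalty toward zero makes the representation, and hence the policy built on it, insensitive to the randomized parameters, which is the intuition behind keeping the Lipschitz constant small.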
Reinforcement Learning with Structured Hierarchical Grammar Representations of Actions
Christodoulou, Petros, Lange, Robert Tjarko, Shafti, Ali, Faisal, A. Aldo
From a young age, humans learn to use grammatical principles to hierarchically combine words into sentences. Action grammars are the parallel idea: that there is an underlying set of rules (a "grammar") that governs how we hierarchically combine actions to form new, more complex actions. We introduce the Action Grammar Reinforcement Learning (AG-RL) framework, which leverages the concept of action grammars to consistently improve the sample efficiency of Reinforcement Learning agents. AG-RL works by using a grammar inference algorithm to infer the "action grammar" of an agent midway through training. The agent's action space is then augmented with macro-actions identified by the grammar. We apply this framework to Double Deep Q-Learning (AG-DDQN) and a discrete action version of Soft Actor-Critic (AG-SAC) and find that it improves performance in 8 out of 8 tested Atari games (median +31%, max +668%) and 19 out of 20 tested Atari games (median +96%, maximum +3,756%) respectively without substantive hyperparameter tuning. We also show that AG-SAC beats the model-free state-of-the-art for sample efficiency in 17 out of the 20 tested Atari games (median +62%, maximum +13,140%), again without substantive hyperparameter tuning.
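The step of augmenting the action space with macro-actions can be sketched with a deliberately simplified stand-in. Instead of a grammar inference algorithm, the code below counts frequent fixed-length action subsequences (n-grams) in recorded episodes; that substitution, and every name and threshold here, is an assumption for illustration only.

```python
from collections import Counter


def frequent_macros(action_sequences, length, min_count):
    """Return action subsequences of the given length that occur at least
    min_count times across the recorded episodes (an n-gram count, used here
    as a stand-in for grammar inference)."""
    counts = Counter()
    for seq in action_sequences:
        for i in range(len(seq) - length + 1):
            counts[tuple(seq[i:i + length])] += 1
    return sorted(macro for macro, count in counts.items() if count >= min_count)


def augment_action_space(primitive_actions, macros):
    """Represent every action as a tuple of primitives: the original actions
    become 1-tuples and the macro-actions are appended after them."""
    return [(a,) for a in primitive_actions] + list(macros)


episodes = [[0, 1, 0, 1, 2], [0, 1, 2]]
macros = frequent_macros(episodes, length=2, min_count=2)
actions = augment_action_space([0, 1, 2], macros)
```

Selecting a macro-action then means executing its primitives in order, so the agent can commit to a multi-step behaviour with a single decision.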
Learning Preferences by Looking at the World
It would be great if we could all have household robots do our chores for us. Chores are tasks that we want done to make our houses cater more to our preferences; they are a way in which we want our house to be different from the way it currently is. However, most "different" states are not very desirable: Surely our robot wouldn't be so dumb as to go around breaking stuff when we ask it to clean our house? Unfortunately, AI systems trained with reinforcement learning only optimize features specified in the reward function and are indifferent to anything we might've inadvertently left out. Generally, it is easy to get the reward wrong by forgetting to include preferences for things that should stay the same, since we are so used to having these preferences satisfied, and there are so many of them.
How Deep Reinforcement Learning Can Make Factories Efficient & Dispatch Products Faster
Manufacturing and production systems have a lot of catching up to do with the world of software. The manufacturing ecosystem has seen many upgrades and much innovation, but it still lags in terms of software application. With the onslaught of artificial intelligence, new opportunities have opened for the sector to leverage new technology and improve productivity. Over the past decade, deep learning has driven most of the innovation in AI. Deep learning systems have found applications in a variety of fields such as healthcare, aviation, agriculture and many others.