An Iterative Polishing Framework based on Quality Aware Masked Language Model for Chinese Poetry Generation
Deng, Liming, Wang, Jie, Liang, Hangming, Chen, Hui, Xie, Zhiqiang, Zhuang, Bojin, Wang, Shaojun, Xiao, Jing
Owing to its unique literary and aesthetic characteristics, automatic generation of Chinese poetry remains a challenge in artificial intelligence and can hardly be realized straightforwardly by end-to-end methods. In this paper, we propose a novel iterative polishing framework for high-quality Chinese poetry generation. In the first stage, an encoder-decoder structure is used to generate a poem draft. Afterwards, our proposed Quality-Aware Masked Language Model (QA-MLM) polishes the draft towards higher quality in terms of linguistics and literalness. Based on a multi-task learning scheme, QA-MLM determines from the poem draft whether polishing is needed. Furthermore, QA-MLM localizes improper characters in the draft and substitutes them with newly predicted ones. Benefiting from the masked language model structure, QA-MLM incorporates global context information into the polishing process, which yields more appropriate polishing results than unidirectional sequential decoding. Moreover, the iterative polishing process terminates automatically when QA-MLM regards the processed poem as a qualified one. Both human and automatic evaluations have been conducted, and the results demonstrate that our approach effectively improves the performance of the encoder-decoder structure.
Learning Domain-Independent Planning Heuristics with Hypergraph Networks
Shen, William, Trevizan, Felipe, Thiébaux, Sylvie
We present the first approach capable of learning domain-independent planning heuristics entirely from scratch. The heuristics we learn map the hypergraph representation of the delete-relaxation of the planning problem at hand, to a cost estimate that approximates that of the least-cost path from the current state to the goal through the hypergraph. We generalise Graph Networks to obtain a new framework for learning over hypergraphs, which we specialise to learn planning heuristics by training over state/value pairs obtained from optimal cost plans. Our experiments show that the resulting architecture, STRIPS-HGNs, is capable of learning heuristics that are competitive with existing delete-relaxation heuristics including LM-cut. We show that the heuristics we learn are able to generalise across different problems and domains, including to domains that were not seen during training.
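To make the delete-relaxation setting concrete, the following is a minimal sketch of a classical hand-crafted delete-relaxation heuristic (h_max), not the learned STRIPS-HGN heuristic. It treats each action as a hyperedge from its preconditions to its add effects and ignores delete effects. The `Action` type, the proposition names, and the toy problem are illustrative assumptions, not taken from the paper.

```python
from typing import FrozenSet, Iterable, NamedTuple


class Action(NamedTuple):
    """A delete-relaxed STRIPS action: a hyperedge from preconditions to add effects."""
    name: str
    preconditions: FrozenSet[str]
    add_effects: FrozenSet[str]
    cost: float = 1.0


def h_max(state: Iterable[str], goal: Iterable[str], actions: Iterable[Action]) -> float:
    """Estimate the cost to reach `goal` from `state`, ignoring delete effects.

    The cost of a proposition is the cheapest way to achieve it, where an
    action costs its own cost plus the most expensive of its preconditions.
    """
    actions = list(actions)
    cost = {p: 0.0 for p in state}  # propositions true in the state are free
    changed = True
    while changed:  # iterate to a fixed point
        changed = False
        for action in actions:
            if not all(p in cost for p in action.preconditions):
                continue  # some precondition is not yet reachable
            pre_cost = max((cost[p] for p in action.preconditions), default=0.0)
            new_cost = pre_cost + action.cost
            for q in action.add_effects:
                if new_cost < cost.get(q, float("inf")):
                    cost[q] = new_cost
                    changed = True
    goal = list(goal)
    if any(g not in cost for g in goal):
        return float("inf")  # goal unreachable even in the relaxation
    return max((cost[g] for g in goal), default=0.0)


# Hypothetical toy problem: pick up a package at A, move to B, drop it there.
toy_actions = [
    Action("pick", frozenset({"at_a"}), frozenset({"holding"})),
    Action("move", frozenset({"at_a"}), frozenset({"at_b"})),
    Action("drop", frozenset({"holding", "at_b"}), frozenset({"delivered"})),
]
estimate = h_max({"at_a"}, {"delivered"}, toy_actions)
```

In this toy problem the estimate is 2, whereas the cheapest relaxed plan uses all three actions. A learned heuristic over the same hypergraph would aim to approximate the optimal cost more closely.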
Procedural Content Generation: From Automatically Generating Game Levels to Increasing Generality in Machine Learning
Risi, Sebastian, Togelius, Julian
The idea behind procedural content generation (PCG) in games is to create content automatically, using algorithms, instead of relying on user-designed content. While PCG approaches have traditionally focused on creating content for video games, they are now being applied to all kinds of virtual environments, thereby enabling training of machine learning systems that are significantly more general. For example, PCG's ability to generate never-ending streams of new levels has allowed DeepMind's Capture the Flag agent to reach beyond human-level performance. Additionally, PCG-inspired methods such as domain randomization enabled OpenAI's robot arm to learn to manipulate objects with unprecedented dexterity. Level generation in 2D arcade games has also illuminated some shortcomings of standard deep RL methods, suggesting potential ways to train more general policies. This Review looks at key aspects of PCG approaches, including their ability to (1) enable new video games (such as No Man's Sky), (2) create open-ended learning environments, (3) combat overfitting in supervised and reinforcement learning tasks, and (4) create better benchmarks that could ultimately spur the development of better learning algorithms. We hope this article can introduce the broader machine learning community to PCG, which we believe will be a critical tool in creating a more general machine intelligence.
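As a rough illustration of a "never-ending stream of new levels", the sketch below generates grid levels from integer seeds. It is a deliberately naive generator written for this example. The function names, tile symbols, and wall probability are assumptions, and the levels are not checked for solvability.

```python
import random
from typing import Iterator, List


def generate_level(seed: int, width: int = 8, height: int = 8,
                   wall_prob: float = 0.2) -> List[str]:
    """Return a grid level as a list of strings; the same seed gives the same level."""
    if width < 2 or height < 2:
        raise ValueError("level must be at least 2x2")
    rng = random.Random(seed)  # private generator, so levels are reproducible
    grid = [["#" if rng.random() < wall_prob else "." for _ in range(width)]
            for _ in range(height)]
    grid[0][0] = "S"                    # start tile
    grid[height - 1][width - 1] = "G"   # goal tile
    return ["".join(row) for row in grid]


def level_stream(start_seed: int = 0) -> Iterator[List[str]]:
    """Yield an unbounded sequence of levels, one per consecutive seed."""
    seed = start_seed
    while True:
        yield generate_level(seed)
        seed += 1
```

A training loop could draw a fresh level from `level_stream()` for every episode, so that an agent rarely sees the same layout twice.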
Distributed Soft Actor-Critic with Multivariate Reward Representation and Knowledge Distillation
In this paper, we describe the NeurIPS 2019 Learning to Move - Walk Around challenge physics-based environment and present our solution to this competition, which scored 1303.727 mean reward points and took 3rd place. Our method combines recent advances from both continuous- and discrete-action space reinforcement learning, such as Soft Actor-Critic and Recurrent Experience Replay in Distributed Reinforcement Learning. We trained our agent in two stages: to move somewhere in the first stage and to follow the target velocity field in the second stage. We also introduce a novel Q-function split technique, which we believe facilitates the task of training an agent, allows pretraining the critic and reusing it for solving harder problems, and mitigates reward shaping design efforts.
Class Teaching for Inverse Reinforcement Learners
Lopes, Manuel, Melo, Francisco
In this paper we propose the first machine teaching algorithm for multiple inverse reinforcement learners. Specifically, our contributions are: (i) we formally introduce the problem of teaching a sequential task to a heterogeneous group of learners; (ii) we identify conditions under which it is possible to conduct such teaching using the same demonstration for all learners; and (iii) we propose and evaluate a simple algorithm that computes one or more demonstrations ensuring that all agents in a heterogeneous class learn a task description that is compatible with the target task. Our analysis shows that, contrary to other teaching problems, teaching a heterogeneous class with a single demonstration may not be possible as the differences between agents increase. We also showcase the advantages of our proposed machine teaching approach against several possible alternatives.
Transflow Learning: Repurposing Flow Models Without Retraining
Gambardella, Andrew, Baydin, Atılım Güneş, Torr, Philip H. S.
It is well known that deep generative models have a rich latent space, and that it is possible to smoothly manipulate their outputs by traversing this latent space. Recently, architectures have emerged that allow for more complex manipulations, such as making an image look as though it were from a different class, or painted in a certain style. These methods typically require large amounts of training in order to learn a single class of manipulations. We present Transflow Learning, a method for transforming a pre-trained generative model so that its outputs more closely resemble data that we provide afterwards. In contrast to previous methods, Transflow Learning does not require any training at all, and instead warps the probability distribution from which we sample latent vectors using Bayesian inference. Transflow Learning can be used to solve a wide variety of tasks, such as neural style transfer and few-shot classification.
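To give a flavour of how Bayesian inference can warp a latent sampling distribution without any training, here is a generic conjugate-Gaussian sketch. It assumes a standard normal prior on each latent dimension's mean and a known observation variance `noise_var`. Both are assumptions made for illustration, and the abstract does not specify the procedure used by Transflow Learning.

```python
from typing import List, Sequence, Tuple


def gaussian_posterior(latents: Sequence[Sequence[float]],
                       noise_var: float = 1.0) -> Tuple[List[float], List[float]]:
    """Posterior over a latent mean, per dimension.

    Prior:      mu ~ N(0, 1)
    Likelihood: z_i ~ N(mu, noise_var), for each provided latent vector z_i
    Returns (posterior_mean, posterior_variance) as lists of length dim.
    """
    if not latents:
        raise ValueError("need at least one latent vector")
    if noise_var <= 0:
        raise ValueError("noise_var must be positive")
    n = len(latents)
    dim = len(latents[0])
    precision = 1.0 + n / noise_var   # prior precision plus data precision
    post_var = 1.0 / precision
    post_mean = []
    for d in range(dim):
        total = sum(z[d] for z in latents)
        post_mean.append(post_var * total / noise_var)
    return post_mean, [post_var] * dim


# Hypothetical latent encodings of two provided examples.
mean, var = gaussian_posterior([[1.0, 2.0], [3.0, 4.0]])
```

Sampling latent vectors from the resulting Gaussian instead of the prior and decoding them with the unchanged pre-trained model would shift its outputs towards the provided examples.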
Adversarially Robust Low Dimensional Representations
Awasthi, Pranjal, Chatziafratis, Vaggos, Chen, Xue, Vijayaraghavan, Aravindan
Adversarial or test time robustness measures the susceptibility of a machine learning system to small perturbations made to the input at test time. This has attracted much interest on the empirical side, since many existing ML systems perform poorly under imperceptible adversarial perturbations to the test inputs. On the other hand, our theoretical understanding of this phenomenon is limited, and has mostly focused on supervised learning tasks. In this work we study the problem of computing adversarially robust representations of data. We formulate a natural extension of Principal Component Analysis (PCA) where the goal is to find a low dimensional subspace to represent the given data with minimum projection error, and that is in addition robust to small perturbations measured in $\ell_q$ norm (say $q=\infty$). Unlike PCA which is solvable in polynomial time, our formulation is computationally intractable to optimize as it captures the well-studied sparse PCA objective. We show the following algorithmic and statistical results:
- Polynomial time algorithms in the worst-case that achieve constant factor approximations to the objective while only violating the robustness constraint by a constant factor.
- We prove that our formulation (and algorithms) also enjoy significant statistical benefits in terms of sample complexity over standard PCA on account of a "regularization effect", that is formalized using the well-studied spiked covariance model.
- Surprisingly, we show that our algorithmic techniques can also be made robust to corruptions in the training data, in addition to yielding representations that are robust at test time! Here an adversary is allowed to corrupt potentially every data point up to a specified amount in the $\ell_q$ norm. We further apply these techniques for mean estimation and clustering under adversarial corruptions to the training data.
Short Term Prediction of Parking Area states Using Real Time Data and Machine Learning Techniques
Provoost, Jesper, Wismans, Luc, Van der Drift, Sander, Kamilaris, Andreas, Van Keulen, Maurice
Public road authorities and private mobility service providers need information derived from the current and predicted traffic states to act upon the daily urban system and its spatial and temporal dynamics. In this research, a real-time parking area state (occupancy, in- and outflux) prediction model (up to 60 minutes ahead) has been developed using publicly available historic and real-time data sources. Based on a case study in a real-life scenario in the city of Arnhem, a Neural Network-based approach outperforms a Random Forest-based one on all assessed performance measures, although the differences are small. Both outperform a naive seasonal random walk model. Although performance degrades with increasing prediction horizon, the model shows a performance gain of over 150% at a prediction horizon of 60 minutes compared with the naive model. Furthermore, it is shown that predicting the in- and outflux is a far more difficult task (i.e. performance gains of 30%), which needs more training data, not based exclusively on occupancy rate. However, the performance of predicting in- and outflux is less sensitive to the prediction horizon. In addition, it is shown that real-time information on the current occupancy rate is the independent variable with the highest contribution to the performance, although time, traffic flow and weather variables also deliver a significant contribution. During real-time deployment, the model performs three times better than the naive model on average. As a result, it can provide valuable information for proactive traffic management as well as mobility service providers.
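For readers unfamiliar with the baseline, a seasonal random walk (seasonal naive) forecast predicts that a future value equals the value observed one season earlier. The sketch below is a minimal version of that idea. The season length, the sample occupancy values, and the function name are assumptions for illustration, since the abstract does not state the seasonality used.

```python
from typing import Sequence


def seasonal_naive_forecast(history: Sequence[float], season_length: int,
                            horizon: int) -> float:
    """Forecast the value `horizon` steps after the last observation.

    The forecast is the most recent observation that lies a whole number
    of seasons before the target time step.
    """
    if season_length <= 0 or horizon <= 0:
        raise ValueError("season_length and horizon must be positive")
    if len(history) < season_length:
        raise ValueError("need at least one full season of history")
    target = len(history) - 1 + horizon  # index of the step being predicted
    while target >= len(history):        # step back one season at a time
        target -= season_length
    return history[target]


# Hypothetical occupancy counts with a season of 4 time steps.
occupancy = [10, 20, 30, 40, 12, 22, 32, 42]
one_step = seasonal_naive_forecast(occupancy, season_length=4, horizon=1)
```

Here the one-step forecast is 12, the value observed four steps before the target. A learned model is only useful to the extent that it beats this kind of baseline at the same horizon.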
DIFAR: Deep Image Formation and Retouching
Moran, Sean, Slabaugh, Gregory
Given (a) poorly exposed image, DIFAR (c) produces an image with pleasing contrast and colour better matching the groundtruth (d) compared to the state-of-the-art DeepUPE model [42] (b).
Abstract
We present a novel neural network architecture for the image signal processing (ISP) pipeline. In a camera system, the ISP is a critical component that forms a high quality RGB image from RAW camera sensor data. Typical ISP pipelines sequentially apply a complex set of traditional image processing modules, such as demosaicing, denoising, tone mapping, etc. We introduce a new deep network that replaces all these modules, dubbed Deep Image Formation And Retouching (DIFAR). DIFAR introduces a multi-scale context-aware pixel-level block for local de-noising/demosaicing operations and a retouching block for global refinement of image colour, luminance and saturation. DIFAR can also be trained for RGB to RGB image enhancement. DIFAR is parameter-efficient and outperforms recently proposed deep learning approaches in both objective and perceptual metrics, setting new state-of-the-art performance on multiple datasets including Samsung S7 [38] and MIT-Adobe 5k [6].
1. Introduction
Image quality is of fundamental importance in any imaging system, including DSLR and smartphone cameras. At the imaging sensor, RAW data is normally captured on a color filter array (such as the well-known Bayer pattern) where at each pixel, only a red, green, or blue color is available. This mosaiced RAW data suffers from noise, vignetting, lack of white balance, and many other defects and additionally has a high dynamic range. The camera's image signal processing (ISP) pipeline is responsible for forming a high quality RGB image with minimal noise, pleasing colors, sharp detail, and good contrast from the degraded RAW data.
In most cases, the ISP is realised as a modular sequence of traditional image signal processing algorithms (Figure 2) each responsible for a single well-defined image operation (e.g.
VIABLE: Fast Adaptation via Backpropagating Learned Loss
Feng, Leo, Zintgraf, Luisa, Peng, Bei, Whiteson, Shimon
In few-shot learning, typically, the loss function which is applied at test time is the one we are ultimately interested in minimising, such as the mean-squared-error loss for a regression problem. However, given that we have few samples at test time, we argue that the loss function that we are interested in minimising is not necessarily the loss function most suitable for computing gradients in a few-shot setting. We propose VIABLE, a generic meta-learning extension that builds on existing meta-gradient-based methods by learning a differentiable loss function, replacing the pre-defined inner-loop loss function in performing task-specific updates. We show that learning a loss function capable of leveraging relational information between samples reduces underfitting, and significantly improves performance and sample efficiency on a simple regression task. Furthermore, we show VIABLE is scalable by evaluating on the Mini-Imagenet dataset.