Oceania
Continual learning with hypernetworks
von Oswald, Johannes, Henning, Christian, Sacramento, João, Grewe, Benjamin F.
Artificial neural networks suffer from catastrophic forgetting when they are sequentially trained on multiple tasks. To overcome this problem, we present a novel approach based on task-conditioned hypernetworks, i.e., networks that generate the weights of a target model based on task identity. Continual learning (CL) is less difficult for this class of models thanks to a simple key observation: instead of relying on recalling the input-output relations of all previously seen data, task-conditioned hypernetworks only require rehearsing previous weight realizations, which can be maintained in memory using a simple regularizer. Besides achieving good performance on standard CL benchmarks, additional experiments on long task sequences reveal that task-conditioned hypernetworks display an unprecedented capacity to retain previous memories. Notably, such long memory lifetimes are achieved in a compressive regime, when the number of trainable weights is comparable to or smaller than the target network size. We provide insight into the structure of low-dimensional task embedding spaces (the input space of the hypernetwork) and show that task-conditioned hypernetworks demonstrate transfer learning properties. Finally, forward information transfer is further supported by empirical results on a challenging CL benchmark based on the CIFAR-10/100 image datasets.
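The idea of rehearsing previous weight realizations can be illustrated with a minimal sketch. It assumes a linear hypernetwork, made-up sizes, and a plain squared-distance penalty; the names `hypernet`, `output_regularizer`, `EMB_DIM`, and `TARGET_SIZE` are hypothetical, and the regularizer actually used in the paper may differ in its details.

```python
import numpy as np

rng = np.random.default_rng(0)
EMB_DIM, TARGET_SIZE = 4, 10  # hypothetical sizes, chosen for illustration


def hypernet(params, embedding):
    """Linear hypernetwork (an assumption): task embedding -> flat target weights."""
    W, b = params
    return W @ embedding + b


def output_regularizer(params, embeddings, stored_outputs):
    """Mean squared distance between the weights the hypernetwork generates now
    for earlier tasks and the weights it generated when those tasks were learned."""
    if not stored_outputs:
        return 0.0
    diffs = [
        np.sum((hypernet(params, e) - w_old) ** 2)
        for e, w_old in zip(embeddings, stored_outputs)
    ]
    return float(np.mean(diffs))


params = (rng.normal(size=(TARGET_SIZE, EMB_DIM)), np.zeros(TARGET_SIZE))
old_embeddings = [rng.normal(size=EMB_DIM) for _ in range(2)]

# Snapshot the generated weights of earlier tasks before training on a new one.
stored = [hypernet(params, e).copy() for e in old_embeddings]
reg_before = output_regularizer(params, old_embeddings, stored)

# Simulate parameter drift caused by training on a new task.
drifted = (params[0] + 0.1, params[1])
reg_after = output_regularizer(drifted, old_embeddings, stored)
```

In this sketch the penalty is zero before any drift and grows as the hypernetwork's outputs for old task embeddings move away from their snapshots, so adding it to the new task's loss discourages forgetting without storing old input data.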
Hybrid Machine Learning Forecasts for the FIFA Women's World Cup 2019
Groll, Andreas, Ley, Christophe, Schauberger, Gunther, Van Eetvelde, Hans, Zeileis, Achim
In this work, we combine two different ranking methods together with several other predictors in a joint random forest approach for the scores of soccer matches. The first ranking method is based on the bookmaker consensus; the second estimates adequate ability parameters that best reflect the current strength of the teams. The proposed combined approach is then applied to the data from the two previous FIFA Women's World Cups, 2011 and 2015. Finally, based on the resulting estimates, the FIFA Women's World Cup 2019 is simulated repeatedly and winning probabilities are obtained for all teams. The model clearly favors the defending champion, the USA, over the host, France.
Neural Network-based Object Classification by Known and Unknown Features (Based on Text Queries)
Artemov, A., Bolokhov, I., Kem, D., Khasenevich, I.
The article presents a method that improves the quality of classification of objects described by a combination of known and unknown features. The method is based on a modernized Informational Neurobayesian Approach that takes unknown features into account. The proposed method was developed and trained on 1500 text queries of Promobot users in Russian to classify them into 20 categories (classes). As a result, the method made it possible to completely solve the problem of misclassification for queries combining known and unknown features of the model. The theoretical substantiation of the method is provided by a formulated and proved theorem, On the Model with Limited Knowledge. It states that, in conditions of limited data, an equal number of equally unknown features of an object cannot have different significance for the classification problem. Keywords: Informational Neurobayesian Approach, Neural Networks, Unknown Features, Machine Learning, NLP. ... describes a car, a race car, or an excavator. Unknown words may bring us closer to or farther from these categories. For example, "super-fast" identifies a race car, and "a dipper" resembles an excavator.
A Fast-Optimal Guaranteed Algorithm For Learning Sub-Interval Relationships in Time Series
Agrawal, Saurabh, Verma, Saurabh, Karpatne, Anuj, Liess, Stefan, Chatterjee, Snigdhansu, Kumar, Vipin
Traditional approaches focus on finding relationships between two entire time series; however, many interesting relationships exist in small sub-intervals of time and remain feeble during other sub-intervals. We define the notion of a sub-interval relationship (SIR) to capture such interactions that are prominent only in certain sub-intervals of time. To that end, we propose a fast-optimal guaranteed algorithm to find the most interesting SIR in a pair of time series. Lastly, we demonstrate the utility of our method in the climate science domain on a real-world dataset, along with its scalability, and obtain useful domain insights.
Prototype Propagation Networks (PPN) for Weakly-supervised Few-shot Learning on Category Graph
Liu, Lu, Zhou, Tianyi, Long, Guodong, Jiang, Jing, Yao, Lina, Zhang, Chengqi
A variety of machine learning applications expect to achieve rapid learning from a limited amount of labeled data. However, the success of most current models is the result of heavy training on big data. Meta-learning addresses this problem by extracting common knowledge across different tasks that can be quickly adapted to new tasks. However, meta-learning methods do not fully explore weakly-supervised information, which is usually free or cheap to collect. In this paper, we show that weakly-labeled data can significantly improve the performance of meta-learning on few-shot classification. We propose the prototype propagation network (PPN), trained on few-shot tasks together with data annotated with coarse labels. Given a category graph of the targeted fine-classes and some weakly-labeled coarse-classes, PPN learns an attention mechanism which propagates the prototype of one class to another on the graph, so that the K-nearest neighbor (KNN) classifier defined on the propagated prototypes results in high accuracy across different few-shot tasks. The training tasks are generated by subgraph sampling, and the training objective is obtained by accumulating the level-wise classification loss on the subgraph. The resulting graph of prototypes can be continually re-used and updated for new tasks and classes. We also introduce two practical test/inference settings which differ according to whether the test task can leverage any weakly-supervised information as in training. On two benchmarks, PPN significantly outperforms most recent few-shot learning methods in different settings, even when they are also allowed to train on weakly-labeled data.
DOM Pizza Checker
Me! I am a world-first smart scanner that checks the quality of every Domino's pizza before it goes out the door. I've done my research and I know that nowadays it's ALL about looking good, so every pizza must be #nofilter Insta-worthy and meet our extremely high Quality Guarantee. With me in every store across Australia and New Zealand, product quality and consistency is about to go THROUGH THE ROOF! I sit above the cut bench – that's the area every pizza goes before being cut, boxed and delivered (not something in your gym class). I take a picture of the pizza and can recognise, analyse and grade pizzas based on pizza type, correct toppings and even whether the cheese is evenly spread!
Memorized Sparse Backpropagation
Zhang, Zhiyuan, Yang, Pengcheng, Ren, Xuancheng, Sun, Xu
Neural network learning is typically slow since backpropagation needs to compute full gradients and backpropagate them across multiple layers. Despite the success of existing work in accelerating propagation through sparseness, the relevant theoretical characteristics remain unexplored, and we empirically find that these methods suffer from the loss of information contained in unpropagated gradients. To tackle these problems, in this work, we present a unified sparse backpropagation framework and provide a detailed analysis of its theoretical characteristics. Analysis reveals that when applied to a multilayer perceptron, our framework essentially performs gradient descent using an estimated gradient similar enough to the true gradient, resulting in convergence in probability under certain conditions. Furthermore, a simple yet effective algorithm named memorized sparse backpropagation (MSBP) is proposed to remedy the problem of information loss by storing unpropagated gradients in memory for the next learning step. The experiments demonstrate that the proposed MSBP is able to effectively alleviate the information loss in traditional sparse backpropagation while achieving comparable acceleration.
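The memory mechanism described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: it assumes sparsification keeps the k largest-magnitude entries of a gradient vector, and the function name `memorized_topk` is hypothetical.

```python
import numpy as np


def memorized_topk(grad, memory, k):
    """Propagate only the k largest-magnitude entries of (grad + memory);
    the remaining entries are kept in memory for the next step.
    Top-k selection is an assumption made for this sketch."""
    total = grad + memory
    keep = np.argsort(np.abs(total))[-k:]  # indices of the k largest magnitudes
    sparse = np.zeros_like(total)
    sparse[keep] = total[keep]
    new_memory = total - sparse  # unpropagated part, carried forward
    return sparse, new_memory


# Step 1: two large entries are propagated, two small ones are memorized.
grad1 = np.array([0.5, -0.1, 0.3, 0.05])
sparse1, mem1 = memorized_topk(grad1, np.zeros(4), k=2)

# Step 2: the memorized entries add to the new gradient and are now propagated.
grad2 = np.array([0.0, -0.1, 0.0, 0.05])
sparse2, mem2 = memorized_topk(grad2, mem1, k=2)
```

The point of the sketch is that `sparse + new_memory` always equals the accumulated gradient, so small entries are delayed rather than discarded.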
Patch Learning
There have been different strategies to improve the performance of a machine learning model, e.g., increasing the depth, width, and/or nonlinearity of the model, and using ensemble learning to aggregate multiple base/weak learners in parallel or in series. This paper proposes a novel strategy called patch learning (PL) for this problem. It consists of three steps: 1) train an initial global model using all training data; 2) identify from the initial global model the patches which contribute the most to the learning error, and train a (local) patch model for each such patch; and 3) update the global model using training data that do not fall into any patch. To use a PL model, we first determine if the input falls into any patch. If yes, then the corresponding patch model is used to compute the output. Otherwise, the global model is used. We explain in detail how PL can be implemented using fuzzy systems. Five regression problems on 1D/2D/3D curve fitting, nonlinear system identification, and chaotic time-series prediction verified its effectiveness. To our knowledge, the PL idea has not appeared in the literature before, and it opens up a promising new line of research in machine learning.
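The three steps and the prediction rule can be sketched for 1D regression. Note the substitutions: the paper implements PL with fuzzy systems, whereas this sketch uses polynomial fits for both global and patch models and treats equal-width bins as candidate patches. The names `fit_patch_learning` and `predict`, and the toy data, are hypothetical.

```python
import numpy as np


def fit_patch_learning(x, y, n_bins=10, n_patches=1, degree=1):
    # Step 1: initial global model trained on all data.
    global_coef = np.polyfit(x, y, degree)
    sq_err = (np.polyval(global_coef, x) - y) ** 2

    # Step 2: candidate patches are equal-width bins (an assumption);
    # pick the bins with the largest summed squared error.
    edges = np.linspace(x.min(), x.max(), n_bins + 1)
    bin_idx = np.clip(np.digitize(x, edges) - 1, 0, n_bins - 1)
    bin_err = np.bincount(bin_idx, weights=sq_err, minlength=n_bins)
    worst = np.argsort(bin_err)[-n_patches:]

    patches = []
    in_patch = np.zeros(x.shape, dtype=bool)
    for b in worst:
        mask = bin_idx == b
        patch_coef = np.polyfit(x[mask], y[mask], degree)  # local patch model
        patches.append((edges[b], edges[b + 1], patch_coef))
        in_patch |= mask

    # Step 3: update the global model using only data outside every patch.
    global_coef = np.polyfit(x[~in_patch], y[~in_patch], degree)
    return global_coef, patches


def predict(model, x):
    global_coef, patches = model
    y = np.polyval(global_coef, x)  # default: global model
    for lo, hi, coef in patches:
        mask = (x >= lo) & (x <= hi)
        y[mask] = np.polyval(coef, x[mask])  # inside a patch: patch model
    return y


# Toy data: a line with a localized offset that a single global line cannot fit.
x = np.linspace(0.0, 1.0, 200)
y = 2.0 * x + np.where((x >= 0.4) & (x < 0.5), 1.0, 0.0)

model = fit_patch_learning(x, y)
pl_err = np.max(np.abs(predict(model, x) - y))
base_err = np.max(np.abs(np.polyval(np.polyfit(x, y, 1), x) - y))
```

On this toy data the patch model absorbs the localized offset, so `pl_err` is far smaller than the error of the single global fit, `base_err`.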
The Principle of Unchanged Optimality in Reinforcement Learning Generalization
Several recent papers have examined generalization in reinforcement learning (RL), by proposing new environments or ways to add noise to existing environments, then benchmarking algorithms and model architectures on those environments. We discuss subtle conceptual properties of RL benchmarks that are not required in supervised learning (SL), and also properties that an RL benchmark should possess. Chief among them is one we call the principle of unchanged optimality: there should exist a single $\pi$ that is optimal across all train and test tasks. In this work, we argue why this principle is important, and ways it can be broken or satisfied due to subtle choices in state representation or model architecture. We conclude by discussing challenges and future lines of research in theoretically analyzing generalization benchmarks.
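One way to write the principle down, as a sketch: let $\mathcal{M}_{\text{train}}$ and $\mathcal{M}_{\text{test}}$ be the sets of train and test tasks, and let $J_M(\pi)$ be the expected return of policy $\pi$ on task $M$. These symbols are notation assumed here for illustration; they do not appear in the abstract.

```latex
\exists\, \pi^{*} \ \text{such that} \quad
\forall M \in \mathcal{M}_{\text{train}} \cup \mathcal{M}_{\text{test}} : \quad
J_M(\pi^{*}) = \max_{\pi} J_M(\pi)
```

Under this reading, the principle is violated whenever the optimal policies of two tasks must disagree on some input that the policy cannot tell apart.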
Facebook patents high-tech drone that uses kites to stay in the air for long periods of time
Facebook has patented a high-tech drone that uses a unique apparatus to stay afloat. The filing, titled 'Dual-kite aerial vehicle,' describes an unmanned aerial vehicle that is attached to two kites and can be flown at different altitudes. The kites allow the drone to remain in the air for an extended period of time 'while consuming little or no fuel,' according to the patent. The drone is attached to the two kites via a tether, and each kite is able to maintain flight at a different altitude.