Goto

Collaborating Authors

 Country


HiLLoC: Lossless Image Compression with Hierarchical Latent Variable Models

arXiv.org Machine Learning

We make the following striking observation: fully convolutional VAE models trained on 32x32 ImageNet can generalize well, not just to 64x64 but also to far larger photographs, with no changes to the model. We use this property, applying fully convolutional models to lossless compression, demonstrating a method to scale the VAE-based 'Bits-Back with ANS' algorithm for lossless compression to large color photographs, and achieving state of the art for compression of full size ImageNet images. We release Craystack, an open source library for convenient prototyping of lossless compression using probabilistic models, along with full implementations of all of our compression results.


Second-order Information in First-order Optimization Methods

arXiv.org Machine Learning

In this paper, we try to uncover the second-order essence of several first-order optimization methods. For Nesterov Accelerated Gradient, we rigorously prove that the algorithm makes use of the difference between past and current gradients, thus approximates the Hessian and accelerates the training. For adaptive methods, we related Adam and Adagrad to a powerful technique in computation statistics---Natural Gradient Descent. These adaptive methods can in fact be treated as relaxations of NGD with only a slight difference lying in the square root of the denominator in the update rules. Skeptical about the effect of such difference, we design a new algorithm---AdaSqrt, which removes the square root in the denominator and scales the learning rate by sqrt(T). Surprisingly, our new algorithm is comparable to various first-order methods(such as SGD and Adam) on MNIST and even beats Adam on CIFAR-10! This phenomenon casts doubt on the convention view that the square root is crucial and training without it will lead to terrible performance. As far as we have concerned, so long as the algorithm tries to explore second or even higher information of the loss surface, then proper scaling of the learning rate alone will guarantee fast training and good generalization performance. To the best of our knowledge, this is the first paper that seriously considers the necessity of square root among all adaptive methods. We believe that our work can shed light on the importance of higher-order information and inspire the design of more powerful algorithms in the future.


Dependable Neural Networks for Safety Critical Tasks

arXiv.org Machine Learning

Neural Networks are being integrated into safety critical systems, e.g., perception systems for autonomous vehicles, which require trained networks to perform safely in novel scenarios. It is challenging to verify neural networks because their decisions are not explainable, they cannot be exhaustively tested, and finite test samples cannot capture the variation across all operating conditions. Existing work seeks to train models robust to new scenarios via domain adaptation, style transfer, or few-shot learning. But these techniques fail to predict how a trained model will perform when the operating conditions differ from the testing conditions. We propose a metric, Machine Learning (ML) Dependability, that measures the network's probability of success in specified operating conditions which need not be the testing conditions. In addition, we propose the metrics Task Undependability and Harmful Undependability to distinguish network failures by their consequences. We evaluate the performance of a Neural Network agent trained using Reinforcement Learning in a simulated robot manipulation task. Our results demonstrate that we can accurately predict the ML Dependability, Task Undependability, and Harmful Undependability for operating conditions that are significantly different from the testing conditions. Finally, we design a Safety Function, using harmful failures identified during testing, that reduces harmful failures, in one example, by a factor of 700 while maintaining a high probability of success.


Meta-Graph: Few shot Link Prediction via Meta Learning

arXiv.org Machine Learning

We consider the task of few shot link prediction, where the goal is to predict missing edges across multiple graphs using only a small sample of known edges. We show that current link prediction methods are generally ill-equipped to handle this task--as they cannot effectively transfer knowledge between graphs in a multi-graph setting and are unable to effectively learn from very sparse data. To address this challenge, we introduce a new gradient-based meta learning framework, Meta-Graph, that leverages higher-order gradients along with a learned graph signature function that conditionally generates a graph neural network initialization. Using a novel set of few shot link prediction benchmarks, we show that Meta-Graph enables not only fast adaptation but also better final convergence and can effectively learn using only a small sample of true edges. Given a graph representing known relationships between a set of nodes, the goal of link prediction is to learn from the graph and infer novel or previously unknown relationships (Liben-Nowell & Kleinberg, 2003). For instance, in a social network we may use link prediction to power a friendship recommendation system (Aiello et al., 2012), or in the case of biological network data we might use link prediction to infer possible relationships between drugs, proteins, and diseases (Zitnik & Leskovec, 2017). However, despite its popularity, previous work on link prediction generally focuses only on one particular problem setting: it generally assumes that link prediction is to be performed on a single large graph and that this graph is relatively complete, i.e., that at least 50% of the true edges are observed during training (e.g., see Grover & Leskovec, 2016; Kipf & Welling, 2016b; Liben-Nowell & Kleinberg, 2003; L u & Zhou, 2011).


Lightweight and Unobtrusive Privacy Preservation for Remote Inference via Edge Data Obfuscation

arXiv.org Machine Learning

The growing momentum of instrumenting the Internet of Things (IoT) with advanced machine learning techniques such as deep neural networks (DNNs) faces two practical challenges of limited compute power of edge devices and the need of protecting the confidentiality of the DNNs. The remote inference scheme that executes the DNNs on the server-class or cloud backend can address the above two challenges. However, it brings the concern of leaking the privacy of the IoT devices' users to the curious backend since the user-generated/related data is to be transmitted to the backend. This work develops a lightweight and unobtrusive approach to obfuscate the data before being transmitted to the backend for remote inference. In this approach, the edge device only needs to execute a small-scale neural network, incurring light compute overhead. Moreover, the edge device does not need to inform the backend on whether the data is obfuscated, making the protection unobtrusive. We apply the approach to three case studies of free spoken digit recognition, handwritten digit recognition, and American sign language recognition. The evaluation results obtained from the case studies show that our approach prevents the backend from obtaining the raw forms of the inference data while maintaining the DNN's inference accuracy at the backend.


Explainability and Adversarial Robustness for RNNs

arXiv.org Machine Learning

Recurrent Neural Networks (RNNs) yield attractive properties for constructing Intrusion Detection Systems (IDSs) for network data. With the rise of ubiquitous Machine Learning (ML) systems, malicious actors have been catching up quickly to find new ways to exploit ML vulnerabilities for profit. Recently developed adversarial ML techniques focus on computer vision and their applicability to network traffic is not straightforward: Network packets expose fewer features than an image, are sequential and impose several constraints on their features. We show that despite these completely different characteristics, adversarial samples can be generated reliably for RNNs. To understand a classifier's potential for misclassification, we extend existing explainability techniques and propose new ones, suitable particularly for sequential data. Applying them shows that already the first packets of a communication flow are of crucial importance and are likely to be targeted by attackers. Feature importance methods show that even relatively unimportant features can be effectively abused to generate adversarial samples. Since traditional evaluation metrics such as accuracy are not sufficient for quantifying the adversarial threat, we propose the Adversarial Robustness Score (ARS) for comparing IDSs, capturing a common notion of adversarial robustness, and show that an adversarial training procedure can significantly and successfully reduce the attack surface.


Prediction of Physical Load Level by Machine Learning Analysis of Heart Activity after Exercises

arXiv.org Machine Learning

The assessment of energy expenditure in real life is of great importance for monitoring the current physical state of people, especially in work, sport, elderly care, health care, and everyday life even. This work reports about application of some machine learning methods (linear regression, linear discriminant analysis, k-nearest neighbors, decision tree, random forest, Gaussian naive Bayes, support-vector machine) for monitoring energy expenditures in athletes. The classification problem was to predict the known level of the in-exercise loads (in three categories by calories) by the heart rate activity features measured during the short period of time (1 minute only) after training, i.e by features of the post-exercise load. The results obtained shown that the post-exercise heart activity features preserve the information of the in-exercise training loads and allow us to predict their actual in-exercise levels. The best performance can be obtained by the random forest classifier with all 8 heart rate features (micro-averaged area under curve value AUCmicro = 0.87 and macro-averaged one AUCmacro = 0.88) and the k-nearest neighbors classifier with 4 most important heart rate features (AUCmicro = 0.91 and AUCmacro = 0.89). The limitations and perspectives of the ML methods used are outlined, and some practical advices are proposed as to their improvement and implementation for the better prediction of in-exercise energy expenditures.


Background Hardly Matters: Understanding Personality Attribution in Deep Residual Networks

arXiv.org Machine Learning

Perceived personality traits attributed to an individual do not have to correspond to their actual personality traits and may be determined in part by the context in which one encounters a person. These apparent traits determine, to a large extent, how other people will behave towards them. Deep neural networks are increasingly being used to perform automated personality attribution (e.g., job interviews). It is important that we understand the driving factors behind the predictions, in humans and in deep neural networks. This paper explicitly studies the effect of the image background on apparent personality prediction while addressing two important confounds present in existing literature; overlapping data splits and including facial information in the background. Surprisingly, we found no evidence that background information improves model predictions for apparent personality traits. In fact, when background is explicitly added to the input, a decrease in performance was measured across all models.


SCR-Apriori for Mining `Sets of Contrasting Rules'

arXiv.org Machine Learning

--In this paper, we propose an efficient algorithm for mining novel'Set of Contrasting Rules'-pattern (SCR-pattern), which consists of several association rules. This pattern is of high interest due to the guaranteed quality of the rules forming it and its ability to discover useful knowledge. However, SCR-pattern has no efficient mining algorithm. We propose SCR-Apriori algorithm, which results in the same set of SCR-patterns as the state-of-the-art approache, but is less computationally expensive. We also show experimentally that by incorporating the knowledge about the pattern structure into Apriori algorithm, SCR-Apriori can significantly prune the search space of frequent itemsets to be analysed. I NTRODUCTION Association rules learning is a popular technique in data mining [1]. However, it is known that finding rules of high quality is not always an easy task [2]. This issue is even more significant in domains where the reliability of the obtained knowledge is required to be high (for example, in medicine). Also, association rules mining techniques usually generate a huge number of rules that have to be analysed by a human in order to choose meaningful and useful ones [3].


A Survey on Distributed Machine Learning

arXiv.org Machine Learning

The demand for artificial intelligence has grown significantly over the last decade and this growth has been fueled by advances in machine learning techniques and the ability to leverage hardware acceleration. However, in order to increase the quality of predictions and render machine learning solutions feasible for more complex applications, a substantial amount of training data is required. Although small machine learning models can be trained with modest amounts of data, the input for training larger models such as neural networks grows exponentially with the number of parameters. Since the demand for processing training data has outpaced the increase in computation power of computing machinery, there is a need for distributing the machine learning workload across multiple machines, and turning the centralized into a distributed system. These distributed systems present new challenges, first and foremost the efficient parallelization of the training process and the creation of a coherent model. This article provides an extensive overview of the current state-of-the-art in the field by outlining the challenges and opportunities of distributed machine learning over conventional (centralized) machine learning, discussing the techniques used for distributed machine learning, and providing an overview of the systems that are available.