AITopics

Deep neural networks frequently contain far more weights, represented at a higher precision, than are required for the specific task which they are trained to perform. Consequently, they can often be compressed using techniques such as weight pruning and quantization that reduce both model size and inference time without appreciable loss in accuracy. Compressing models before they are deployed can therefore result in significantly more efficient systems. However, while the results are desirable, finding the best compression strategy for a given neural network, target platform, and optimization objective often requires extensive experimentation. Moreover, finding optimal hyperparameters for a given compression strategy typically results in even more expensive, frequently manual, trial-and-error exploration. In this paper, we introduce a programmable system for model compression called Condensa. Users programmatically compose simple operators, in Python, to build complex compression strategies. Given a strategy and a user-provided objective, such as minimization of running time, Condensa uses a novel sample-efficient constrained Bayesian optimization algorithm to automatically infer desirable sparsity ratios. Our experiments on three real-world image classification and language modeling tasks demonstrate memory footprint reductions of up to 65x and runtime throughput improvements of up to 2.22x using at most 10 samples per search. We have released a reference implementation of Condensa at https://github.com/NVlabs/condensa.

arxiv preprint arxiv, pruning, sparsity ratio, (15 more...)

1911.02497

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > Canada > Alberta (0.14)
North America > United States > Utah (0.04)

Genre: Research Report (0.64)

Industry: Government > Regional Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

OpenML-Python: an extensible Python API for OpenML

Feurer, Matthias, van Rijn, Jan N., Kadra, Arlind, Gijsbers, Pieter, Mallik, Neeratyoy, Ravi, Sahithya, Müller, Andreas, Vanschoren, Joaquin, Hutter, Frank

University of Freiburg, Freiburg & Bosch Center for Artificial Intelligence, Germany Abstract OpenML is an online platform for open science collaboration in machine learning, used to share datasets and results of machine learning experiments. In this paper we introduce OpenML-Python, a client API for Python, opening up the OpenML platform for a wide range of Python-based tools. It provides easy access to all datasets, tasks and experiments on OpenML from within Python. It also provides functionality to conduct machine learning experiments, upload the results to OpenML, and reproduce results which are stored on OpenML. Furthermore, it comes with a scikit-learn plugin and a plugin mechanism to easily integrate other machine learning libraries written in Python into the OpenML ecosystem.

experiment, openml-python, university, (11 more...)

1911.0249

Country:

Europe > Germany > Baden-Württemberg > Freiburg (0.49)
Europe > Netherlands > North Brabant > Eindhoven (0.07)
Europe > Netherlands > South Holland > Leiden (0.05)
North America > United States > New York (0.04)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Balayn, Agathe, Bozzon, Alessandro

Designing Evaluations of Machine Learning Models for Subjective Inference: The Case of Sentence Toxicity

Machine Learning (ML) is increasingly applied in real-life scenarios, raising concerns about bias in automatic decision making. We focus on bias as a notion of opinion exclusion, that stems from the direct application of traditional ML pipelines to infer subjective properties. We argue that such ML systems should be evaluated with subjectivity and bias in mind. Considering the lack of evaluation standards yet to create evaluation benchmarks, we propose an initial list of specifications to define prior to creating evaluation datasets, in order to later accurately evaluate the biases. With the example of a sentence toxicity inference system, we illustrate how the specifications support the analysis of biases related to subjectivity. We highlight difficulties in instantiating these specifications and list future work for the crowdsourcing community to help the creation of appropriate evaluation datasets.

dataset, evaluation, subjectivity, (16 more...)

1911.02471

Country:

North America > United States > Hawaii (0.04)
North America > United States > California > San Diego County > San Diego (0.04)
Europe > United Kingdom > Scotland > City of Glasgow > Glasgow (0.04)
(3 more...)

Genre: Research Report (0.50)

Industry: Information Technology (0.68)

Technology:

Information Technology > Communications > Social Media > Crowdsourcing (0.72)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

Lucas, James, Tucker, George, Grosse, Roger, Norouzi, Mohammad

Don't Blame the ELBO! A Linear VAE Perspective on Posterior Collapse

Posterior collapse in Variational Autoencoders (VAEs) arises when the variational posterior distribution closely matches the prior for a subset of latent variables. This paper presents a simple and intuitive explanation for posterior collapse through the analysis of linear VAEs and their direct correspondence with Probabilistic PCA (pPCA). We explain how posterior collapse may occur in pPCA due to local maxima in the log marginal likelihood. Unexpectedly, we prove that the ELBO objective for the linear VAE does not introduce additional spurious local maxima relative to log marginal likelihood. We show further that training a linear VAE with exact variational inference recovers an identifiable global maximum corresponding to the principal component directions. Empirically, we find that our linear analysis is predictive even for high-capacity, non-linear VAEs and helps explain the relationship between the observation noise, local maxima, and posterior collapse in deep Gaussian VAEs.

posterior collapse, stationary point, variational distribution, (12 more...)

1911.02469

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > United States > California > San Diego County > San Diego (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.93)

Balayn, Agathe, Bozzon, Alessandro, Szlavik, Zoltan

Unfairness towards subjective opinions in Machine Learning

Despite the high interest for Machine Learning (ML) in academia and industry, many issues related to the application of ML to real-life problems are yet to be addressed. Here we put forward one limitation which arises from a lack of adaptation of ML models and datasets to specific applications. We formalise a new notion of unfairness as exclusion of opinions. We propose ways to quantify this unfairness, and aid understanding its causes through visualisation. These insights into the functioning of ML-based systems hint at methods to mitigate unfairness.

annotation, annotator, unfairness, (11 more...)

1911.02455

Country:

Europe > Netherlands > South Holland > Delft (0.05)
Europe > Netherlands > North Holland > Amsterdam (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.47)

Yang, Zhaohui, Chen, Mingzhe, Saad, Walid, Hong, Choong Seon, Shikh-Bahaei, Mohammad

Energy Efficient Federated Learning Over Wireless Communication Networks

In this paper, the problem of energy efficient transmission and computation resource allocation for federated learning (FL) over wireless communication networks is investigated. In the considered model, each user exploits limited local computational resources to train a local FL model with its collected data and, then, sends the trained FL model parameters to a base station (BS) which aggregates the local FL model and broadcasts it back to all of the users. Since FL involves an exchange of a learning model between users and the BS, both computation and communication latencies are determined by the learning accuracy level. Meanwhile, due to the limited energy budget of the wireless users, both local computation energy and transmission energy must be considered during the FL process. This joint learning and communication problem is formulated as an optimization problem whose goal is to minimize a weighted sum of the completion time of FL, local computation energy, and transmission energy of all users, that captures the tradeoff of latency and energy consumption for FL. To solve this problem, an iterative algorithm is proposed where, at every step, closed-form solutions for time allocation, bandwidth allocation, power control, computation frequency, and learning accuracy are derived. For the special case that only minimizes the completion time, a bisection-based algorithm is proposed to obtain the optimal solution. Numerical results show that the proposed algorithms can reduce up to 25.6% delay and 37.6% energy consumption compared to conventional FL methods.

completion time, log 2, optimal solution, (16 more...)

1911.02417

Country:

North America > United States > Virginia (0.04)
North America > United States > Hawaii > Honolulu County > Honolulu (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(4 more...)

Genre: Research Report > New Finding (0.34)

Industry: Information Technology (0.93)

Technology:

Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.70)

Searching to Exploit Memorization Effect in Learning from Corrupted Labels

Yang, Hansi, Yao, Quanming, Han, Bo, Niu, Gang

Sample-selection approaches, which attempt to pick up clean instances from the noisy training data set, have become one promising direction to robust learning from corrupted labels. These methods all build on the memorization effect, which means deep networks learn easy patterns first and then gradually over-fit the training data set. In this paper, we show how to properly select instances so that the training process can benefit the most from the memorization effect is a hard problem. Specifically, memorization can heavily depend on many factors, e.g., data set and network architecture. Nonetheless, there still exist general patterns of how memorization can occur. These facts motivate us to exploit memorization by automated machine learning (AutoML) techniques. First, we design an expressive but compact search space based on observed general patterns. Then, we propose to use the natural gradient-based search algorithm to efficiently search through space. Finally, extensive experiments on both synthetic data sets and benchmark data sets demonstrate that the proposed method can not only be much efficient than existing AutoML algorithms but can also achieve much better performance than the state-of-the-art approaches for learning from corrupted labels.

algorithm, memorization effect, search space, (13 more...)

1911.02377

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
Asia > China > Hong Kong (0.04)
Asia > China > Beijing > Beijing (0.04)

Genre: Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning > Rote Learning (1.00)

Munin, Evgenii, Blais, Antoine, Couellan, Nicolas

Convolutional Neural Network for Multipath Detection in GNSS Receivers

Global Navigation Satellite System (GNSS) signals are subject to different kinds of events causing significant errors in positioning. This work explores the application of Machine Learning (ML) methods of anomaly detection applied to GNSS receiver signals. More specifically, our study focuses on multipath contamination, using samples of the correlator output signal. The GPS L1 C/A signal data is used and sourced directly from the correlator output. To extract the important features and patterns from such data, we use deep convolutional neural networks (CNN), which have proven to be efficient in image analysis in particular. To take advantage of CNN, the correlator output signal is mapped as a 2D input image and fed to the convolutional layers of a neural network. The network automatically extracts the relevant features from the input samples and proceeds with the multipath detection. We train the CNN using synthetic signals. To optimize the model architecture with respect to the GNSS correlator complexity, the evaluation of the CNN performance is done as a function of the number of correlator output points.

convolutional layer, correlator output, detection, (14 more...)

1911.02347

Country:

Europe > France > Occitanie > Haute-Garonne > Toulouse (0.05)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Genre: Research Report (1.00)

Industry: Information Technology (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Klopfenstein, Quentin, Vaiter, Samuel

Linear Support Vector Regression with Linear Constraints

This paper studies the addition of linear constraints to the Support Vector Regression (SVR) when the kernel is linear. Adding those constraints into the problem allows to add prior knowledge on the estimator obtained, such as finding probability vector or monotone data. We propose a generalization of the Sequential Minimal Optimization (SMO) algorithm for solving the optimization problem with linear constraints and prove its convergence. Then, practical performances of this estimator are shown on simulated and real datasets with different settings: non negative regression, regression onto the simplex for biomedical data and isotonic regression for weather forecast.

algorithm, estimator, optimization problem, (14 more...)

1911.02306

Country:

North America > United States > New York > New York County > New York City (0.04)
Europe > France > Bourgogne-Franche-Comté > Côte-d'Or > Dijon (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.67)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Therapeutic Area > Oncology (0.92)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (1.00)

Ghasemipour, Seyed Kamyar Seyed, Zemel, Richard, Gu, Shixiang

A Divergence Minimization Perspective on Imitation Learning Methods

In many settings, it is desirable to learn decision-making and control policies through learning or bootstrapping from expert demonstrations. The most common approaches under this Imitation Learning (IL) framework are Behavioural Cloning (BC), and Inverse Reinforcement Learning (IRL). Recent methods for IRL have demonstrated the capacity to learn effective policies with access to a very limited set of demonstrations, a scenario in which BC methods often fail. Unfortunately, due to multiple factors of variation, directly comparing these methods does not provide adequate intuition for understanding this difference in performance. In this work, we present a unified probabilistic perspective on IL algorithms based on divergence minimization. We present $f$-MAX, an $f$-divergence generalization of AIRL [Fu et al., 2018], a state-of-the-art IRL method. $f$-MAX enables us to relate prior IRL methods such as GAIL [Ho & Ermon, 2016] and AIRL [Fu et al., 2018], and understand their algorithmic properties. Through the lens of divergence minimization we tease apart the differences between BC and successful IRL approaches, and empirically evaluate these nuances on simulated high-dimensional continuous control domains. Our findings conclusively identify that IRL's state-marginal matching objective contributes most to its superior performance. Lastly, we apply our new understanding of IL methods to the problem of state-marginal matching, where we demonstrate that in simulated arm pushing environments we can teach agents a diverse range of behaviours using simply hand-specified state distributions and no reward functions or expert demonstrations. For datasets and reproducing results please refer to https://github.com/KamyarGh/rl_swiss/blob/master/reproducing/fmax_paper.md .

demonstration, exp, expert demonstration, (14 more...)

1911.02256

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > United States > Illinois > Cook County > Chicago (0.04)
North America > United States > California > Santa Clara County > Mountain View (0.04)
(2 more...)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)