AITopics

1901.09515

Country:

North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.14)
North America > United States > Connecticut > New Haven County > New Haven (0.04)
North America > Canada > Quebec (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.48)

Industry:

Health & Medicine (0.93)
Transportation > Air (0.84)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.93)

Mishchenko, Konstantin, Hanzely, Filip, Richtárik, Peter

99% of Parallel Optimization is Inevitably a Waste of Time

arXiv.org Machine LearningJan-27-2019

It is well known that many optimization methods, including SGD, SAGA, and Accelerated SGD for over-parameterized models, do not scale linearly in the parallel setting. In this paper, we present a new version of block coordinate descent that solves this issue for a number of methods. The core idea is to make the sampling of coordinate blocks on each parallel unit independent of the others. Surprisingly, we prove that the optimal number of blocks to be updated by each of $n$ units in every iteration is equal to $m/n$, where $m$ is the total number of blocks. As an illustration, this means that when $n=100$ parallel units are used, $99\%$ of work is a waste of time. We demonstrate that with $m/n$ blocks used by each unit the iteration complexity often remains the same. Among other applications which we mention, this fact can be exploited in the setting of distributed optimization to break the communication bottleneck. Our claims are justified by numerical experiments which demonstrate almost a perfect match with our theory on a number of datasets.

dataset, parallel optimization, tau 0, (12 more...)

1901.09437

Country:

Europe > United Kingdom > Scotland > City of Edinburgh > Edinburgh (0.04)
Europe > Russia > Central Federal District > Moscow Oblast > Moscow (0.04)
Asia > Russia (0.04)
(2 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.66)

Moreau, Thomas, Gramfort, Alexandre

Distributed Convolutional Dictionary Learning (DiCoDiLe): Pattern Discovery in Large Images and Signals

arXiv.org Machine LearningJan-26-2019

Convolutional dictionary learning (CDL) estimates shift invariant basis adapted to multidimensional data. CDL has proven useful for image denoising or inpainting, as well as for pattern discovery on multivariate signals. As estimated patterns can be positioned anywhere in signals or images, optimization techniques face the difficulty of working in extremely high dimensions with millions of pixels or time samples, contrarily to standard patch-based dictionary learning. To address this optimization problem, this work proposes a distributed and asynchronous algorithm, employing locally greedy coordinate descent and an asynchronous locking mechanism that does not require a central server. This algorithm can be used to distribute the computation on a number of workers which scales linearly with the encoded signal's size. Experiments confirm the scaling properties which allows us to learn patterns on large scales images from the Hubble Space Telescope.

algorithm, complexity, dicodile-z, (11 more...)

1901.09235

Country:

Oceania > Australia > Victoria > Melbourne (0.04)
North America > United States > Texas > Dallas County > Dallas (0.04)
North America > United States > Oregon > Multnomah County > Portland (0.04)
(6 more...)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (0.70)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)

Meyer, Raphael Arkady, Honorio, Jean

On the Statistical Efficiency of Optimal Kernel Sum Classifiers

arXiv.org Machine LearningJan-25-2019

We propose a novel combination of optimization tools with learning theory bounds in order to analyze the sample complexity of optimal kernel sum classifiers. This contrasts the typical learning theoretic results which hold for all (potentially suboptimal) classifiers. Our work also justifies assumptions made in prior work on multiple kernel learning. As a byproduct of our analysis, we also provide a new form of Rademacher complexity for hypothesis classes containing only optimal classifiers.

kernel, log 2, rademacher complexity, (12 more...)

1901.09087

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > New York > New York County > New York City (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.49)

Zantedeschi, Valentina, Bellet, Aurélien, Tommasi, Marc

Communication-Efficient and Decentralized Multi-Task Boosting while Learning the Collaboration Graph

arXiv.org Machine LearningJan-25-2019

In the era of big data, the classical paradigm is to build huge data centers to collect and process users' data. This centralized access to resources and datasets simplifies some procedures, such as building predictive models with machine learning, but also comes with important drawbacks. From the company point of view, the need to gather and analyze the data centrally induces high infrastructure costs. As the server represents a single point of entry, it must also be secure enough to prevent attacks that could put the entire user database in jeopardy. On the user end, disadvantages include limited control over one's personal data as well as possible privacy risks, which may come from the aforementioned attacks but also from potentially loose data governance policies on the part of the companies.

agent, algorithm, graph, (15 more...)

1901.0846

Country:

Asia > Middle East > Jordan (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > France > Hauts-de-France > Pas-de-Calais (0.04)
Europe > France > Hauts-de-France > Nord > Lille (0.04)

Genre: Research Report (0.50)

Industry:

Education (0.92)
Information Technology > Security & Privacy (0.68)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)
(2 more...)

Lezcano-Casado, Mario, Martínez-Rubio, David

Cheap Orthogonal Constraints in Neural Networks: A Simple Parametrization of the Orthogonal and Unitary Group

arXiv.org Machine LearningJan-25-2019

We introduce a novel approach to perform first-order optimization with orthogonal and unitary constraints. This approach is based on a parametrization stemming from Lie group theory through the exponential map. The parametrization transforms the constrained optimization problem into an unconstrained one over a Euclidean space, for which common first-order optimization methods can be used. The theoretical results presented are general enough to cover the special orthogonal group, the unitary group and, in general, any connected compact Lie group. We discuss how this and other parametrizations can be computed efficiently through an implementation trick, making numerically complex parametrizations usable at a negligible runtime cost in neural networks. In particular, we apply our results to RNNs with orthogonal recurrent weights, yielding a new architecture called expRNN. We demonstrate how our method constitutes a more robust approach to optimization with orthogonal constraints, showing faster, accurate, and more stable convergence in several tasks designed to test RNNs.

lie group, matrix, parametrization, (14 more...)

1901.08428

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > Sweden > Stockholm > Stockholm (0.04)

Genre:

Research Report > New Finding (0.34)
Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Learning to compress and search visual data in large-scale systems

Ferdowsi, Sohrab

The problem of high-dimensional and large-scale representation of visual data is addressed from an unsupervised learning perspective. The emphasis is put on discrete representations, where the description length can be measured in bits and hence the model capacity can be controlled. The algorithmic infrastructure is developed based on the synthesis and analysis prior models whose rate-distortion properties, as well as capacity vs. sample complexity trade-offs are carefully optimized. These models are then extended to multi-layers, namely the RRQ and the ML-STC frameworks, where the latter is further evolved as a powerful deep neural network architecture with fast and sample-efficient training and discrete representations. For the developed algorithms, three important applications are developed. First, the problem of large-scale similarity search in retrieval systems is addressed, where a double-stage solution is proposed leading to faster query times and shorter database storage. Second, the problem of learned image compression is targeted, where the proposed models can capture more redundancies from the training images than the conventional compression codecs. Finally, the proposed algorithms are used to solve ill-posed inverse problems. In particular, the problems of image denoising and compressive sensing are addressed with promising results.

application, deep learning, neural network, (23 more...)

1901.08437

Country:

North America > United States (0.92)
Europe > Switzerland > Geneva > Geneva (0.14)
Oceania > Australia (0.13)
Asia > Middle East > Iran (0.13)

Genre: Research Report > New Finding (1.00)

Industry:

Energy > Oil & Gas (0.67)
Education (0.67)
Information Technology > Security & Privacy (0.67)
Health & Medicine (0.67)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
(3 more...)

Albuquerque, Isabela, Monteiro, João, Doan, Thang, Considine, Breandan, Falk, Tiago, Mitliagkas, Ioannis

Multi-objective training of Generative Adversarial Networks with multiple discriminators

Recent literature has demonstrated promising results for training Generative Adversarial Networks by employing a set of discriminators, in contrast to the traditional game involving one generator against a single adversary. Such methods perform single-objective optimization on some simple consolidation of the losses, e.g. an arithmetic average. In this work, we revisit the multiple-discriminator setting by framing the simultaneous minimization of losses provided by different models as a multi-objective optimization problem. Specifically, we evaluate the performance of multiple gradient descent and the hypervolume maximization algorithm on a number of different datasets. Moreover, we argue that the previously proposed methods and hypervolume maximization can all be seen as variations of multiple gradient descent in which the update direction can be computed efficiently. Our results indicate that hypervolume maximization presents a better compromise between sample quality and computational cost than previous methods.

discriminator, multi-objective training, multiple discriminator, (13 more...)

1901.0868

Country:

North America > Canada > Quebec > Montreal (0.14)
North America > United States > New Mexico > Bernalillo County > Albuquerque (0.04)

Genre: Research Report > New Finding (0.88)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

SAGA with Arbitrary Sampling

Qian, Xu, Qu, Zheng, Richtárik, Peter

We study the problem of minimizing the average ofa very large number of smooth functions, which is of key importance in training supervised learning models. One of the most celebrated methods in this context is the SAGA algorithm of Defazio et al. (2014). Despite years of research onthe topic, a general-purpose version of SAGA--one that would include arbitrary importance samplingand minibatching schemes--does not exist. We remedy this situation and propose a general and flexible variant of SAGA following the arbitrary sampling paradigm. We perform an iteration complexity analysis of the method, largely possible due to the construction of new stochastic Lyapunov functions. We establish linear convergencerates in the smooth and strongly convex regime, and under a quadratic functional growth condition (i.e., in a regime not assuming strong convexity). Our rates match those of the primal-dual method Quartz (Qu et al., 2015) for which an arbitrary sampling analysis is available, which makes a significant step towards closing the gap in our understanding of complexity of primal anddual methods for finite sum problems.

complexity, saga, saga-as, (15 more...)

1901.08669

Country:

Asia > China > Hong Kong (0.04)
Europe > United Kingdom > Scotland > City of Edinburgh > Edinburgh (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(4 more...)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)

Kim, Jungtaek, Choi, Seungjin

On Local Optimizers of Acquisition Functions in Bayesian Optimization

Bayesian optimization is a sample-efficient method for finding a global optimum of an expensive-to-evaluate black-box function. A global solution is found by accumulating a pair of query point and corresponding function value, repeating these two procedures: (i) learning a surrogate model for the objective function using the data observed so far; (ii) the maximization of an acquisition function to determine where next to query the objective function. Convergence guarantees are only valid when the global optimizer of the acquisition function is found and selected as the next query point. In practice, however, local optimizers of acquisition functions are also used, since searching the exact optimizer of the acquisition function is often a non-trivial or time-consuming task. In this paper we present an analysis on the behavior of local optimizers of acquisition functions, in terms of instantaneous regrets over global optimizers. We also present the performance analysis when multi-started local optimizers are used to find the maximum of the acquisition function. Numerical experiments confirm the validity of our theoretical analysis.

acquisition function, local optimizer, optimizer, (12 more...)

1901.0835

Country:

Asia > South Korea > Gyeongsangbuk-do > Pohang (0.04)
North America > Canada > Quebec > Montreal (0.04)
Asia > Middle East > Israel > Haifa District > Haifa (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)