Jaggi, Martin
Global linear convergence of Newton's method without strong-convexity or Lipschitz gradients
Karimireddy, Sai Praneeth, Stich, Sebastian U., Jaggi, Martin
We show that Newton's method converges globally at a linear rate for objective functions whose Hessians are stable. This class of problems includes many functions which are not strongly convex, such as logistic regression. Our linear convergence result (i) is affine-invariant, (ii) holds even if an approximate Hessian is used, and (iii) holds even if the subproblems are only solved approximately. Thus we theoretically demonstrate the superiority of Newton's method over first-order methods, which would only achieve a sublinear $O(1/t^2)$ rate under similar conditions.
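To make the setting concrete, here is a minimal numpy sketch of damped Newton steps on unregularized logistic regression, a non-strongly-convex objective of the kind covered by this result. The fixed damping factor and the least-squares solve are illustrative stand-ins, not the paper's affine-invariant step-size rule or inexact subproblem solver.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def damped_newton_logreg(X, y, steps=20, gamma=0.5):
    """Damped Newton iterations for logistic regression with labels y in {0, 1}.
    Illustrative only: gamma is a fixed damping factor, and the Newton system
    is solved via least squares since the Hessian need not be invertible."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(steps):
        p = sigmoid(X @ w)                       # predicted probabilities
        grad = X.T @ (p - y) / n                 # gradient of the logistic loss
        H = (X.T * (p * (1 - p))) @ X / n        # Hessian X^T diag(p(1-p)) X / n
        direction = np.linalg.lstsq(H, grad, rcond=None)[0]
        w -= gamma * direction                   # damped Newton step
    return w
```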
End-to-End DNN Training with Block Floating Point Arithmetic
Drumond, Mario, Lin, Tao, Jaggi, Martin, Falsafi, Babak
DNNs are ubiquitous datacenter workloads, requiring orders of magnitude more computing power from servers than traditional workloads. As such, datacenter operators are forced to adopt domain-specific accelerators that employ half-precision floating-point (FP) numeric representations to improve arithmetic density. Unfortunately, even these representations are not dense enough, and are, therefore, sub-optimal for DNNs. We propose a hybrid approach that employs dense block floating-point (BFP) arithmetic for dot product computations and FP arithmetic elsewhere. While using BFP improves the performance of the dot product operations that make up most DNN computations, allowing values to float freely between dot product operations leads to a better choice of tensor exponents when converting values back to BFP. We show that models trained with hybrid BFP-FP arithmetic either match or outperform their FP32 counterparts, leading to more compact models and denser arithmetic in computing platforms.
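As a rough illustration of the BFP idea (one shared exponent per block, fixed-point mantissas, FP accumulation), the following numpy sketch quantizes two vectors and computes their dot product. Block size, mantissa width, and rounding mode are hypothetical choices, not the accelerator datapath evaluated in the paper.

```python
import numpy as np

def to_bfp(block, mantissa_bits=8):
    """Quantize a 1-D block to block floating point: one shared exponent for the
    whole block, integer mantissas with `mantissa_bits` bits (sign included)."""
    shared_exp = np.floor(np.log2(np.max(np.abs(block)) + 1e-30))
    lsb = 2.0 ** (shared_exp - (mantissa_bits - 1))          # value of one mantissa step
    mantissas = np.clip(np.round(block / lsb),
                        -(2 ** (mantissa_bits - 1)),
                        2 ** (mantissa_bits - 1) - 1)
    return mantissas, lsb

def bfp_dot(a, b, mantissa_bits=8):
    """Dot product with both operands in BFP; accumulation stays in floating point."""
    ma, sa = to_bfp(a, mantissa_bits)
    mb, sb = to_bfp(b, mantissa_bits)
    return (ma @ mb) * sa * sb

# example: compare against the FP32 dot product
a = np.random.randn(64).astype(np.float32)
b = np.random.randn(64).astype(np.float32)
print(bfp_dot(a, b), float(a @ b))
```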
Revisiting First-Order Convex Optimization Over Linear Spaces
Locatello, Francesco, Raj, Anant, Reddy, Sai Praneeth, Rätsch, Gunnar, Schölkopf, Bernhard, Stich, Sebastian U., Jaggi, Martin
Two popular examples of first-order optimization methods over linear spaces are coordinate descent and matching pursuit algorithms, with their randomized variants. While the former targets the optimization by moving along coordinates, the latter considers a generalized notion of directions. Exploiting the connection between the two algorithms, we present a unified analysis of both, providing affine invariant sublinear $\mathcal{O}(1/t)$ rates on smooth objectives and linear convergence on strongly convex objectives. As a byproduct of our affine invariant analysis of matching pursuit, our rates for steepest coordinate descent are the tightest known. Furthermore, we show the first accelerated convergence rate $\mathcal{O}(1/t^2)$ for matching pursuit on convex objectives.
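For intuition on the unified view, here is a minimal sketch of one generalized matching-pursuit step over the linear span of an atom set; with the standard basis as atoms it reduces to steepest coordinate descent. The fixed step size is an illustrative stand-in for the smoothness-based step of the analysis.

```python
import numpy as np

def mp_step(x, grad_f, atoms, lr=0.1):
    """One generalized matching-pursuit update: pick the atom most aligned with
    the gradient and move against it. `atoms` is a (num_atoms, d) array of
    (ideally normalized) directions; atoms = np.eye(d) gives steepest
    coordinate descent."""
    g = grad_f(x)
    scores = atoms @ g                        # <atom_i, grad f(x)>
    i = np.argmax(np.abs(scores))             # most correlated atom
    return x - lr * scores[i] * atoms[i]      # step along the chosen direction
```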
Safe Adaptive Importance Sampling
Stich, Sebastian U., Raj, Anant, Jaggi, Martin
Importance sampling has become an indispensable strategy to speed up optimization algorithms for large-scale applications. Improved adaptive variants -- using importance values defined by the complete gradient information, which changes during optimization -- enjoy favorable theoretical properties, but are typically computationally infeasible. In this paper we propose an efficient approximation of gradient-based sampling, which is based on safe bounds on the gradient. The proposed sampling distribution is (i) provably the \emph{best sampling} with respect to the given bounds, (ii) always better than uniform sampling and fixed importance sampling, and (iii) can efficiently be computed -- in many applications at negligible extra cost. The proposed sampling scheme is generic and can easily be integrated into existing algorithms. In particular, we show that coordinate descent (CD) and stochastic gradient descent (SGD) can enjoy a significant speed-up under the novel scheme. The proven efficiency of the proposed sampling is verified by extensive numerical testing.
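A simplified sketch of bound-based importance sampling for picking the next CD/SGD coordinate: here the distribution is simply proportional to upper bounds on the per-coordinate gradient magnitudes, whereas the paper's safe scheme additionally optimizes the distribution against both lower and upper bounds.

```python
import numpy as np

def sampling_from_bounds(upper_bounds):
    """Sampling distribution proportional to upper bounds on the per-coordinate
    gradient magnitudes (simplified proxy for the safe scheme)."""
    p = np.maximum(np.asarray(upper_bounds, dtype=float), 1e-12)
    return p / p.sum()

# usage: draw the coordinate for the next CD/SGD step
rng = np.random.default_rng(0)
bounds = np.array([0.5, 2.0, 0.1, 1.2])       # hypothetical gradient bounds
probs = sampling_from_bounds(bounds)
i = rng.choice(len(probs), p=probs)
```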
Efficient Use of Limited-Memory Accelerators for Linear Learning on Heterogeneous Systems
Dünner, Celestine, Parnell, Thomas, Jaggi, Martin
We propose a generic algorithmic building block to accelerate training of machine learning models on heterogeneous compute systems. Our scheme makes it possible to efficiently employ compute accelerators such as GPUs and FPGAs for the training of large-scale machine learning models when the training data exceeds their memory capacity. It also adapts to any system's memory hierarchy in terms of size and processing speed. Our technique is built upon novel theoretical insights regarding primal-dual coordinate methods, and uses duality gap information to dynamically decide which part of the data should be made available for fast processing. To illustrate the power of our approach, we demonstrate its performance for training generalized linear models on a large-scale dataset exceeding the memory size of a modern GPU, showing an order-of-magnitude speedup over existing approaches.
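A schematic of the selection rule suggested by this idea: rank coordinates (or data columns) by their current duality-gap contribution and keep the top ones within the accelerator's memory budget. The function below is an illustrative sketch, not the paper's implementation.

```python
import numpy as np

def select_for_accelerator(gap_contributions, memory_budget):
    """Return the indices of the coordinates with the largest duality-gap
    contributions, up to the accelerator's memory budget (number of columns
    that fit). Illustrative selection rule only."""
    order = np.argsort(np.asarray(gap_contributions))[::-1]   # largest gap first
    return order[:memory_budget]
```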
Greedy Algorithms for Cone Constrained Optimization with Convergence Guarantees
Locatello, Francesco, Tschannen, Michael, Rätsch, Gunnar, Jaggi, Martin
Greedy optimization methods such as Matching Pursuit (MP) and Frank-Wolfe (FW) algorithms have regained popularity in recent years due to their simplicity, effectiveness and theoretical guarantees. MP and FW address optimization over the linear span and the convex hull of a set of atoms, respectively. In this paper, we consider the intermediate case of optimization over the convex cone, parametrized as the conic hull of a generic atom set, leading to the first principled definitions of non-negative MP algorithms for which we give explicit convergence rates and demonstrate excellent empirical performance. In particular, we derive sublinear ($\mathcal{O}(1/t)$) convergence on general smooth and convex objectives, and linear convergence ($\mathcal{O}(e^{-t})$) on strongly convex objectives, in both cases for general sets of atoms. Furthermore, we establish a clear correspondence of our algorithms to known algorithms from the MP and FW literature. Our novel algorithms and analyses target general atom sets and general objective functions, and hence are directly applicable to a large variety of learning settings.
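For concreteness, a minimal sketch of one non-negative matching-pursuit step over the conic hull of an atom set; the fixed 1/L step size and the absence of corrective or away steps are simplifications relative to the algorithm variants analyzed in the paper.

```python
import numpy as np

def nonneg_mp_step(x, grad_f, atoms, L=1.0):
    """One non-negative matching-pursuit step: choose the atom with the largest
    positive alignment with the negative gradient and take a non-negative step,
    so the iterate stays in the conic hull. L is a smoothness constant."""
    g = grad_f(x)
    scores = atoms @ (-g)                     # alignment with the descent direction
    i = np.argmax(scores)
    if scores[i] <= 0:                        # no admissible descent atom
        return x
    return x + (scores[i] / L) * atoms[i]
```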
Unsupervised robust nonparametric learning of hidden community properties
Langovoy, Mikhail A., Gotmare, Akhilesh, Jaggi, Martin, Sra, Suvrit
We consider learning of fundamental properties of communities in large noisy networks, in the prototypical situation where the nodes or users are split into two classes, e.g., according to their opinions or preferences on a topic. We propose a nonparametric, unsupervised, and scalable graph scan procedure that is, in addition, robust against a class of powerful adversaries. In our setup, one of the communities can fall under the influence of a strong and knowledgeable adversarial leader, who knows the full network structure, has unlimited computational resources and can completely foresee our planned actions on the network. We prove strong consistency of our results in a setup with minimal assumptions. In particular, the learning procedure estimates the baseline activity of normal users asymptotically correctly with probability 1; the only assumption being the existence of a single implicit community of asymptotically negligible logarithmic size. We provide experiments on real and synthetic data to illustrate the performance of our method, including examples with adversaries.
Unsupervised Learning of Sentence Embeddings using Compositional n-Gram Features
Pagliardini, Matteo, Gupta, Prakhar, Jaggi, Martin
The recent tremendous success of unsupervised word embeddings in a multitude of applications raises the obvious question of whether similar methods could be derived to improve embeddings (i.e. semantic representations) of word sequences as well. We present a simple but efficient unsupervised objective to train distributed representations of sentences. Our method outperforms the state-of-the-art unsupervised models on most benchmark tasks, highlighting the robustness of the produced general-purpose sentence embeddings.
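The compositional n-gram idea can be sketched as averaging the embeddings of a sentence's unigrams and bigrams; the lookup tables below are hypothetical, and the unsupervised training objective that learns them is not shown.

```python
import numpy as np

def sentence_embedding(tokens, unigram_vecs, bigram_vecs):
    """Average the embeddings of a sentence's words and bigrams. Both lookup
    tables map strings to vectors; unknown n-grams are skipped."""
    parts = [unigram_vecs[t] for t in tokens if t in unigram_vecs]
    bigrams = (" ".join(pair) for pair in zip(tokens, tokens[1:]))
    parts += [bigram_vecs[b] for b in bigrams if b in bigram_vecs]
    return np.mean(parts, axis=0) if parts else None
```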