Sparse Variational Inference: Bayesian Coresets from Scratch

Trevor Campbell, Boyan Beronov

Neural Information Processing Systems

This perspective leads to a novel construction via greedy optimization, and also provides a unifying information-geometric view of present and past methods. The proposed Riemannian coreset construction algorithm is fully automated, requiring no problem-specific inputs aside from the probabilistic model and dataset.
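The flavor of such constructions can be sketched in the simpler "Hilbert coreset" setting: project each data point's log-likelihood onto a finite set of parameter samples, then greedily choose a small weighted subset whose combined vector matches the full-data log-likelihood. This is a simplification of the paper's Riemannian construction, and for brevity the sketch also drops the nonnegativity constraint real coresets impose on weights; all names here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(5)

# Toy model: x ~ N(theta, 1). Project each point's log-likelihood onto S
# random parameter samples, giving one vector per data point.
n, S = 500, 50
x = rng.normal(1.0, 1.0, size=n)
thetas = rng.normal(1.0, 1.0, size=S)            # samples near the posterior
L = -0.5 * (x[:, None] - thetas[None, :]) ** 2   # (n, S) log-likelihood vectors
target = L.sum(axis=0)                           # full-data log-likelihood vector

def greedy_coreset(L, target, m=10):
    """Greedily pick m points and reweight them to approximate `target`."""
    active, resid, w = [], target.copy(), None
    for _ in range(m):
        scores = L @ resid                       # alignment with the residual
        scores[active] = -np.inf                 # don't pick a point twice
        active.append(int(np.argmax(scores)))
        A = L[active].T                          # (S, |active|) design matrix
        w, *_ = np.linalg.lstsq(A, target, rcond=None)
        resid = target - A @ w
    return active, w, resid

active, w, resid = greedy_coreset(L, target)
rel_err = np.linalg.norm(resid) / np.linalg.norm(target)
```

With only a handful of weighted points the residual of the full-data log-likelihood vector becomes negligible, which is the sparsity the abstract refers to.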




Near-Optimality of Contrastive Divergence Algorithms

Glaser, Pierre, Huang, Kevin Han, Gretton, Arthur

arXiv.org Machine Learning

We perform a non-asymptotic analysis of the contrastive divergence (CD) algorithm, a training method for unnormalized models. While prior work has established that (for exponential family distributions) the CD iterates asymptotically converge at an $O(n^{-1 / 3})$ rate to the true parameter of the data distribution, we show, under some regularity assumptions, that CD can achieve the parametric rate $O(n^{-1 / 2})$. Our analysis provides results for various data batching schemes, including the fully online and minibatch ones. We additionally show that CD can be near-optimal, in the sense that its asymptotic variance is close to the Cramér-Rao lower bound.
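As a toy illustration of the minibatch CD scheme the analysis covers (not the paper's own code), here is CD-k for a one-dimensional Gaussian exponential family: the parameter update contrasts the minibatch's sufficient statistics with those of negative samples obtained by running k MCMC steps started at the data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy exponential family: p_theta(x) ∝ exp(theta * x - x**2 / 2),
# i.e. N(theta, 1) with sufficient statistic T(x) = x.
def mh_step(x, theta, step=0.5):
    """One Metropolis-Hastings step targeting the current model."""
    prop = x + rng.normal(0.0, step, size=x.shape)
    log_ratio = (theta * prop - prop ** 2 / 2) - (theta * x - x ** 2 / 2)
    accept = np.log(rng.uniform(size=x.shape)) < log_ratio
    return np.where(accept, prop, x)

def cd_k(data, k=1, lr=0.05, iters=4000, batch=32):
    theta, trace = 0.0, []
    for _ in range(iters):
        xb = rng.choice(data, size=batch)       # minibatch scheme
        neg = xb.copy()
        for _ in range(k):                      # k MCMC steps from the data
            neg = mh_step(neg, theta)
        theta += lr * (xb.mean() - neg.mean())  # estimate of E_data[T] - E_model[T]
        trace.append(theta)
    return np.mean(trace[iters // 2:])          # average out the update noise

data = rng.normal(1.5, 1.0, size=5000)          # true parameter: 1.5
theta_hat = cd_k(data)
```

Because a chain started at the data distribution is left invariant when the model matches it, the true parameter is a fixed point of the update; the rates in the paper quantify how fast the iterates reach it.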


Dynamic Importance Sampling for Anytime Bounds of the Partition Function

Qi Lou, Rina Dechter, Alexander T. Ihler

Neural Information Processing Systems

Computing the partition function is a key inference task in many graphical models. In this paper, we propose a dynamic importance sampling scheme that provides anytime finite-sample bounds for the partition function. Our algorithm balances the advantages of the three major inference strategies, heuristic search, variational bounds, and Monte Carlo methods, blending sampling with search to refine a variationally defined proposal. Our algorithm combines and generalizes recent work on anytime search [16] and probabilistic bounds [15] of the partition function. By using an intelligently chosen weighted average over the samples, we construct an unbiased estimator of the partition function with strong finite-sample confidence intervals that inherit both the rapid early improvement rate of sampling and the long-term benefits of an improved proposal from search. This gives significantly improved anytime behavior, and more flexible trade-offs between memory, time, and solution quality. We demonstrate the effectiveness of our approach empirically on real-world problem instances taken from recent UAI competitions.
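The unbiased importance-sampling estimator at the core of such schemes can be checked on a tiny model where the partition function is computable exactly by enumeration. This sketch uses a fixed uniform proposal rather than the paper's search-refined variational proposal; the model and all names are illustrative.

```python
import itertools, math
import numpy as np

rng = np.random.default_rng(1)

# Tiny pairwise model on n spins: f(x) = exp(sum_i w[i] * x[i] * x[i+1]).
n, w = 8, 0.4 * np.ones(7)

def log_f(x):
    return float(np.sum(w * x[:-1] * x[1:]))

# Exact partition function by enumeration (feasible for n = 8).
states = np.array(list(itertools.product([-1, 1], repeat=n)))
lf = np.array([log_f(s) for s in states])
logZ_exact = math.log(np.sum(np.exp(lf)))

# Importance sampling with a uniform proposal q(x) = 2**-n:
# Z = E_q[f(x) / q(x)], so the mean of f(x) * 2**n is unbiased for Z.
m = 20000
idx = rng.integers(0, len(states), size=m)
logw = lf[idx] + n * math.log(2)
logZ_hat = float(np.log(np.mean(np.exp(logw))))
```

The dynamic scheme in the paper improves on this by reweighting samples as search sharpens the proposal, which tightens the finite-sample confidence intervals over time.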


Softmax Attention with Constant Cost per Token

Heinsen, Franz A.

arXiv.org Artificial Intelligence

We propose a simple modification to the conventional attention mechanism applied by Transformers: Instead of quantifying pairwise query-key similarity with scaled dot-products, we quantify it with the logarithms of scaled dot-products of exponentials. Our modification linearizes attention with exponential kernel feature maps, whose corresponding feature function is infinite dimensional. We show that our modification is expressible as a composition of log-sums of exponentials, with a latent space of constant size, enabling application with constant time and space complexity per token. We implement our modification, verify that it works in practice, and conclude that it is a promising alternative to conventional attention.
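The key identity is that when similarity is the log of a dot-product of exponentials, exp(sim(q, k)) = exp(q) · exp(k), so softmax attention factors like linear attention with feature map exp(·), and causal outputs can be carried in running sums of constant size. A minimal numerical check of that equivalence (computed in plain space; the paper works in log space via log-sum-exp for numerical stability, which this sketch omits):

```python
import numpy as np

rng = np.random.default_rng(2)
T, d, dv = 6, 4, 3
Q = rng.normal(size=(T, d))
K = rng.normal(size=(T, d))
V = rng.normal(size=(T, dv))

# exp(sim(q, k)) = exp(q) . exp(k): linear attention with feature map exp(.).
phi_q, phi_k = np.exp(Q), np.exp(K)

# Quadratic-time causal attention with these unnormalized weights.
out_quad = np.zeros((T, dv))
for t in range(T):
    wgt = phi_q[t] @ phi_k[: t + 1].T
    out_quad[t] = (wgt / wgt.sum()) @ V[: t + 1]

# Constant cost per token: carry S = sum_j phi(k_j) v_j^T and z = sum_j phi(k_j).
S = np.zeros((d, dv))
z = np.zeros(d)
out_rec = np.zeros((T, dv))
for t in range(T):
    S += np.outer(phi_k[t], V[t])
    z += phi_k[t]
    out_rec[t] = (phi_q[t] @ S) / (phi_q[t] @ z)
```

Both loops produce the same outputs, but the recurrent form touches only a fixed-size state per token.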


Top 10 Emerging Artificial Intelligence Startups in Israel

#artificialintelligence

Artificial intelligence (AI) has become ubiquitous across industry verticals. From boardroom discussions to trending news topics, artificial intelligence has captured the attention of tech enthusiasts worldwide. With organizations reaping the benefits of its applications, the discipline has largely lived up to its hype. As the tech race between the USA, China, the European Union, and other prominent nations escalates, Israel too aims to lead. Some surveys rank Israel among the top five countries in the world for AI solutions.


Finding the Bug in the Haystack with Machine Learning: Logz.io Exceptions in Kibana

#artificialintelligence

Logz.io is releasing its AI-powered Exceptions, a revamped version of our Application Insights, fully embedded in your Kibana Discover experience, to speed up troubleshooting and help you find bugs in the log haystack. How many of your production issues stem from bugs in code you deployed? The introduction of agile software methodology and its "release early, release often" mentality has exacerbated the problem, with more frequent code releases at earlier stages. How do you hunt down these bugs in production? How do you ensure that your deployed code hasn't caused any issues?


Information geometry for approximate Bayesian computation

Spiliopoulos, Konstantinos

arXiv.org Machine Learning

The goal of this paper is to explore the basic Approximate Bayesian Computation (ABC) algorithm through the lens of information theory. ABC is a widely used algorithm in cases where the likelihood of the data is intractable or hard to work with, but one can simulate from it. We use relative entropy ideas to analyze the behavior of the algorithm as a function of the thresholding parameter and of the size of the data. Relative entropy here is data driven, as it depends on the values of the observed statistics. We allow a different thresholding parameter for each direction (i.e., for each observed statistic) and compute the weighted effect on each direction. The latter allows one to find important directions via sensitivity analysis, leading to potentially larger acceptance regions, which in turn brings down the computational cost of the algorithm for the same level of accuracy. In addition, we investigate the bias of the estimators for generic observables as a function of both the thresholding parameters and the size of the data. Our analysis provides error bounds on performance for positive tolerances and finite sample sizes. Simulation studies complement and illustrate the theoretical results.
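The basic ABC rejection algorithm the paper analyzes, with a separate tolerance per observed statistic, can be sketched on a toy Gaussian-mean problem (the model, prior, and tolerances here are illustrative assumptions, not the paper's setup):

```python
import numpy as np

rng = np.random.default_rng(3)

# Observed data from N(mu = 2, sigma = 1); infer mu under a N(0, 5^2) prior.
y_obs = rng.normal(2.0, 1.0, size=200)
s_obs = np.array([y_obs.mean(), y_obs.std()])   # observed summary statistics

# Direction-wise thresholds: one tolerance per summary statistic.
eps = np.array([0.1, 0.2])

accepted = []
for _ in range(20000):
    mu = rng.normal(0.0, 5.0)                   # draw from the prior
    y_sim = rng.normal(mu, 1.0, size=200)       # simulate (likelihood-free)
    s_sim = np.array([y_sim.mean(), y_sim.std()])
    if np.all(np.abs(s_sim - s_obs) < eps):     # accept per direction
        accepted.append(mu)

posterior_mean = float(np.mean(accepted))
```

Loosening the tolerance in a direction the posterior is insensitive to enlarges the acceptance region, and hence cuts cost, with little loss of accuracy, which is the trade-off the paper quantifies.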


Regularized Loss Minimizers with Local Data Perturbation: Consistency and Data Irrecoverability

Li, Zitao, Honorio, Jean

arXiv.org Machine Learning

We show that there are several regularized loss minimization problems that can use locally perturbed data with theoretical guarantees of generalization, i.e., loss consistency. Our results quantitatively connect the convergence rates of the learning problems to the impossibility, for any adversary, of recovering the original data from perturbed observations. To this end, we introduce a new concept of data irrecoverability, and show that the well-studied concept of data privacy implies data irrecoverability.
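A minimal illustration of the setting (a hypothetical ridge-regression instance, not the paper's construction): each response is released with added zero-mean noise, so an adversary cannot recover any individual value exactly, yet the regularized loss minimizer trained on the perturbed data still converges to the true parameter as the sample size grows.

```python
import numpy as np

rng = np.random.default_rng(4)

n, d = 20000, 3
w_star = np.array([1.0, -2.0, 0.5])
X = rng.normal(size=(n, d))
y = X @ w_star + 0.1 * rng.normal(size=n)

# Local perturbation: responses released with zero-mean noise of std 1,
# masking individual y_i while leaving the regression target unbiased.
y_pert = y + 1.0 * rng.normal(size=n)

# Regularized loss minimizer (ridge) fit on the perturbed data.
lam = 1.0
w_hat = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y_pert)
```

The perturbation inflates the effective noise level, slowing the convergence rate, and it is exactly this rate-versus-irrecoverability trade-off that the paper formalizes.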