Sparse Variational Inference: Bayesian Coresets from Scratch
Trevor Campbell, Boyan Beronov
This perspective leads to a novel construction via greedy optimization, and also provides a unifying information-geometric view of present and past methods. The proposed Riemannian coreset construction algorithm is fully automated, requiring no problem-specific inputs aside from the probabilistic model and dataset.
Near-Optimality of Contrastive Divergence Algorithms
Pierre Glaser, Kevin Han Huang, Arthur Gretton
We perform a non-asymptotic analysis of the contrastive divergence (CD) algorithm, a training method for unnormalized models. While prior work has established that (for exponential family distributions) the CD iterates asymptotically converge at an $O(n^{-1 / 3})$ rate to the true parameter of the data distribution, we show, under some regularity assumptions, that CD can achieve the parametric rate $O(n^{-1 / 2})$. Our analysis provides results for various data batching schemes, including the fully online and minibatch ones. We additionally show that CD can be near-optimal, in the sense that its asymptotic variance is close to the Cramér-Rao lower bound.
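The fully online scheme mentioned above can be illustrated with a minimal sketch (a toy construction for illustration, not the paper's setup): CD-1 for the mean of an unnormalized Gaussian, using a single Langevin step started from each incoming data point as the MCMC kernel.

```python
import numpy as np

# Toy CD-1: fit mu in p(x; mu) ∝ exp(-(x - mu)^2 / 2), fully online.
rng = np.random.default_rng(0)
data = rng.normal(2.0, 1.0, size=5000)  # true parameter mu* = 2.0

mu, step, eta = 0.0, 0.5, 0.05
trace = []
for x in data:
    # One Langevin step initialised at the data point (the CD-1 kernel):
    # x' = x + step * d/dx log p~(x) + sqrt(2*step) * noise.
    x_model = x - step * (x - mu) + np.sqrt(2.0 * step) * rng.normal()
    # CD update: d/dmu log p~(x) - d/dmu log p~(x_model) = x - x_model.
    mu += eta * (x - x_model)
    trace.append(mu)

mu_hat = float(np.mean(trace[len(trace) // 2:]))  # average late iterates
print(mu_hat)  # close to the true mean 2.0
```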
Dynamic Importance Sampling for Anytime Bounds of the Partition Function
Qi Lou, Rina Dechter, Alexander T. Ihler
Computing the partition function is a key inference task in many graphical models. In this paper, we propose a dynamic importance sampling scheme that provides anytime finite-sample bounds for the partition function. Our algorithm balances the advantages of the three major inference strategies (heuristic search, variational bounds, and Monte Carlo methods), blending sampling with search to refine a variationally defined proposal. Our algorithm combines and generalizes recent work on anytime search [16] and probabilistic bounds [15] of the partition function. By using an intelligently chosen weighted average over the samples, we construct an unbiased estimator of the partition function with strong finite-sample confidence intervals that inherit both the rapid early improvement rate of sampling and the long-term benefits of an improved proposal from search. This gives significantly improved anytime behavior, and more flexible trade-offs between memory, time, and solution quality. We demonstrate the effectiveness of our approach empirically on real-world problem instances taken from recent UAI competitions.
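As a point of reference for the estimator family described above, here is a minimal importance-sampling sketch (an assumed toy model and fixed proposal, not the paper's search-refined scheme): an unbiased estimate of the partition function of a tiny pairwise binary model, checked against exact enumeration.

```python
import numpy as np

# Tiny pairwise binary model with unnormalised density f(x1, x2).
rng = np.random.default_rng(1)
th1, th2, w = 0.5, -0.3, 1.0

def f(x1, x2):
    return np.exp(th1 * x1 + th2 * x2 + w * x1 * x2)

# Exact partition function by enumerating the 4 states, for reference.
Z = sum(f(a, b) for a in (0, 1) for b in (0, 1))

# Uniform proposal q(x) = 1/4; the estimator mean(f(x) / q(x)) is unbiased.
n = 20000
xs = rng.integers(0, 2, size=(n, 2))
weights = np.array([f(a, b) for a, b in xs]) * 4.0
Z_hat = float(weights.mean())
print(Z, Z_hat)  # Z_hat is close to Z for large n
```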
Softmax Attention with Constant Cost per Token
We propose a simple modification to the conventional attention mechanism applied by Transformers: Instead of quantifying pairwise query-key similarity with scaled dot-products, we quantify it with the logarithms of scaled dot-products of exponentials. Our modification linearizes attention with exponential kernel feature maps, whose corresponding feature function is infinite dimensional. We show that our modification is expressible as a composition of log-sums of exponentials, with a latent space of constant size, enabling application with constant time and space complexity per token. We implement our modification, verify that it works in practice, and conclude that it is a promising alternative to conventional attention.
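The constant-size latent state referred to above can be illustrated with the generic linear-attention recurrence (a sketch with an assumed elementwise-exponential feature map, not the paper's exact log-sum-exp formulation): a d-by-d matrix and a d-vector summarize the whole prefix, so each token costs O(d^2) regardless of sequence length.

```python
import numpy as np

rng = np.random.default_rng(2)
T, d = 6, 4
Q, K, V = rng.normal(size=(3, T, d))

def phi(x):
    return np.exp(x)  # assumed elementwise feature map for this sketch

# Constant-size running state: S accumulates phi(k) v^T, z accumulates phi(k).
S = np.zeros((d, d))
z = np.zeros(d)
ys = []
for t in range(T):
    S += np.outer(phi(K[t]), V[t])
    z += phi(K[t])
    # y_t = (sum_s phi(k_s) v_s^T)^T phi(q_t) / (sum_s phi(k_s))^T phi(q_t)
    ys.append(S.T @ phi(Q[t]) / (z @ phi(Q[t])))
Y_linear = np.array(ys)

# Reference: the same feature-map attention computed non-recurrently
# with a causal mask; both routes give identical outputs.
A = phi(Q) @ phi(K).T
A *= np.tril(np.ones((T, T)))
Y_ref = (A / A.sum(axis=1, keepdims=True)) @ V
```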
Top 10 Emerging Artificial Intelligence Startups in Israel
Artificial intelligence (AI) has become ubiquitous across industry verticals. From boardroom discussions to trending news topics, artificial intelligence has captured the attention of tech enthusiasts worldwide. With organizations cashing in on the benefits of its applications, the discipline has lived up to its hype. While the tech race between the USA, China, the European Union, and other prominent nations escalates, Israel too aims to lead. Some surveys have found that Israel ranks among the top 5 countries in the world for AI solutions.
Finding the Bug in the Haystack with Machine Learning: Logz.io Exceptions in Kibana
Logz.io is releasing its AI-powered Exceptions, a revamped version of our Application Insights, fully embedded in your Kibana Discover experience, to speed up troubleshooting and help you find bugs in the log haystack. How many of your production issues stem from bugs in code you deployed? The introduction of agile software methodology and its "release early, release often" mentality has exacerbated the problem, with more frequent code releases at earlier stages. How do you hunt down these bugs in production? How do you ensure that your deployed code hasn't caused any issues?
Information geometry for approximate Bayesian computation
The goal of this paper is to explore the basic Approximate Bayesian Computation (ABC) algorithm via the lens of information theory. ABC is a widely used algorithm in cases where the likelihood of the data is hard to work with or intractable, but one can simulate from it. We use relative entropy ideas to analyze the behavior of the algorithm as a function of the thresholding parameter and of the size of the data. Relative entropy here is data driven, as it depends on the values of the observed statistics. We allow a different thresholding parameter for each direction (i.e., for each observed statistic) and compute the weighted effect on each direction. The latter allows one to find important directions via sensitivity analysis, leading to potentially larger acceptance regions, which in turn brings the computational cost of the algorithm down for the same level of accuracy. In addition, we also investigate the bias of the estimators for generic observables as a function of both the thresholding parameters and the size of the data. Our analysis provides error bounds on performance for positive tolerances and finite sample sizes. Simulation studies complement and illustrate the theoretical results.
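The basic ABC rejection algorithm with a separate tolerance per observed statistic can be sketched as follows (a toy Gaussian-mean example with assumed tolerances, not the paper's analysis):

```python
import numpy as np

# Toy model: data ~ N(mu, 1) with a uniform prior on mu;
# summaries are the sample mean and sample variance.
rng = np.random.default_rng(3)
obs = rng.normal(2.0, 1.0, size=200)
s_obs = np.array([obs.mean(), obs.var()])

eps = np.array([0.1, 0.3])  # per-direction thresholds (assumed values)
accepted = []
for _ in range(20000):
    mu = rng.uniform(-5, 5)               # draw a parameter from the prior
    sim = rng.normal(mu, 1.0, size=200)   # simulate a dataset
    s = np.array([sim.mean(), sim.var()])
    if np.all(np.abs(s - s_obs) <= eps):  # accept if close in every direction
        accepted.append(mu)

mu_post = float(np.mean(accepted))  # posterior mean estimate
print(len(accepted), mu_post)       # mu_post is close to the true mean 2.0
```

Larger tolerances in unimportant directions enlarge the acceptance region (and cut simulation cost) without moving the posterior estimate much, which is the sensitivity-analysis trade-off the abstract describes.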
Regularized Loss Minimizers with Local Data Perturbation: Consistency and Data Irrecoverability
We show that there are several regularized loss minimization problems that can use locally perturbed data with theoretical guarantees of generalization, i.e., loss consistency. Our results quantitatively connect the convergence rates of the learning problems to the impossibility, for any adversary, of recovering the original data from perturbed observations. To this end, we introduce a new concept of data irrecoverability, and show that the well-studied concept of data privacy implies data irrecoverability.
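A minimal sketch of the setting (assumed model, noise scale, and regularizer, not the paper's construction): fit a regularized loss minimizer (ridge regression) on locally perturbed features and check that its test loss stays close to the clean fit.

```python
import numpy as np

rng = np.random.default_rng(4)
n, d = 2000, 5
w_true = rng.normal(size=d)
X = rng.normal(size=(n, d))
y = X @ w_true + 0.1 * rng.normal(size=n)

def ridge(X, y, lam=1.0):
    # Closed-form regularized loss minimizer: argmin ||Xw - y||^2 + lam ||w||^2
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

# Local perturbation: additive Gaussian noise on the features.
X_priv = X + 0.1 * rng.normal(size=X.shape)
w_clean, w_priv = ridge(X, y), ridge(X_priv, y)

X_test = rng.normal(size=(500, d))
y_test = X_test @ w_true
mse_clean = float(np.mean((X_test @ w_clean - y_test) ** 2))
mse_priv = float(np.mean((X_test @ w_priv - y_test) ** 2))
print(mse_clean, mse_priv)  # the perturbed fit loses little test accuracy
```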