 random sample


Optimizing Computational-Statistical Runtime for Wasserstein Distance Estimation

arXiv.org Machine Learning

Squared Wasserstein distance is a frequently used tool to measure discrepancy between probability distributions. This distance is typically computed between empirical measures of size $n$ from two underlying random samples. Unfortunately, even in lower dimensional Euclidean space problems $\left( d \in \{2,3\} \right)$, algorithms for Wasserstein distance computation with approximate or exact precision guarantees scale poorly in the runtime as a function of $n$ and the desired precision. In response, we consider the computational-statistical runtime, where the goal is to estimate from samples the Wasserstein distance between potentially smooth measures up to $\varepsilon$-additive error in expectation with respect to the sampling; we allow $O(1)$ computational cost for collecting a sample. Towards this, we develop a Sample-Sketch-Solve paradigm where we introduce a regular Cartesian grid sketch of the samples. We show that (especially under $\alpha$-Hölder smooth distributions) this can compress the data without increasing asymptotic error, and also regularizes the structure which enables faster exact algorithms. Ultimately, we approximate $W_2^2(P,Q)$ within $\varepsilon$ error in $\varepsilon^{-\max(2,\frac{d+1+o(1)}{1+\alpha})}$ time for $0 < \alpha < 1$ Hölder smooth distributions $P,Q$ on $(0,1)^{d}$; an optimal $\Theta(\varepsilon^{-2})$ for $\alpha > 1/2$ when $d=2$ and nearly optimal as $\alpha \to 1$ when $d = 3$.
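The "sketch" step of the Sample-Sketch-Solve paradigm can be illustrated with a minimal Python sketch. This is not the paper's algorithm: the function name `grid_sketch`, the resolution parameter `k`, and the choice to snap each sample to its cell center are assumptions made for illustration only. It shows how $n$ samples on the unit cube collapse into at most $k^d$ weighted grid points, each sample moving by at most $\sqrt{d}/(2k)$.

```python
import math
import random
from collections import Counter

def grid_sketch(samples, k):
    """Snap points in [0,1]^d to the centers of a regular grid with k cells per axis.

    Returns a dict mapping each occupied cell center (a tuple) to its
    empirical mass. `k` is a hypothetical resolution parameter.
    """
    n = len(samples)
    counts = Counter()
    for x in samples:
        # Cell index along each axis, clamped so a coordinate of 1.0 stays in range.
        cell = tuple(min(int(c * k), k - 1) for c in x)
        counts[cell] += 1
    return {tuple((i + 0.5) / k for i in cell): m / n for cell, m in counts.items()}

random.seed(0)
d, n, k = 2, 10_000, 8
samples = [tuple(random.random() for _ in range(d)) for _ in range(n)]
sketch = grid_sketch(samples, k)

# Largest possible distance from a point to the center of its cell.
bound = math.sqrt(d) / (2 * k)
```

The compressed measure `sketch` has a support size that depends on `k` rather than on `n`, which is the property a downstream exact solver could exploit.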



Multimodal C4: An Open, Billion-scale Corpus of Images Interleaved with Text

Neural Information Processing Systems

This format not only enables few-shot learning via interleaving independent supervised (image, text) examples, but also more complex prompts involving interaction between images, e.g., "What do image A and image B have in common?" To support this interface, pretraining occurs over web corpora that similarly contain interleaved images+text. To date, however, large-scale data of this form have not been publicly available. We release Multimodal C4 (mmc4), an augmentation of the popular text-only c4 corpus with images interleaved. We use a linear assignment algorithm to place images into longer bodies of text using CLIP features [24], a process that we show outperforms alternatives.
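To make the linear assignment step concrete, here is a toy Python sketch. It is not the mmc4 pipeline: the similarity values are made up and stand in for image-sentence similarities computed from CLIP features, and the brute-force search is a stand-in for a proper solver (for example `scipy.optimize.linear_sum_assignment`). It assumes there are at least as many sentences as images and that each sentence receives at most one image.

```python
from itertools import permutations

def assign_images(sim):
    """Exact linear assignment by brute force (tiny inputs only).

    sim[i][j] is a hypothetical similarity between image i and sentence j.
    Each image is placed at a distinct sentence so that the total
    similarity is maximized.
    """
    n_images, n_sentences = len(sim), len(sim[0])
    best = max(
        permutations(range(n_sentences), n_images),
        key=lambda p: sum(sim[i][j] for i, j in enumerate(p)),
    )
    return list(best)

# Two images, four sentences; both images individually prefer sentence 0.
sim = [
    [0.90, 0.80, 0.10, 0.00],
    [0.85, 0.20, 0.10, 0.00],
]
assignment = assign_images(sim)  # -> [1, 0]
```

Picking the best sentence for each image independently would place both images at sentence 0. The joint assignment instead moves image 0 to sentence 1, since that costs little similarity and frees sentence 0 for image 1.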


Baselines

Neural Information Processing Systems

As shown in the main text, under the assumption that the influence network is unbiased, our factor baselines are indeed valid control variates. We prove this result below, repeating the statement itself for posterity and providing a supplementary lemma on control variates as a restatement of known results. Let $X$, $Y$ and $Z$ be random variables where the law of $X$ conditional on $Z$ is denoted $P_\theta(X|Z)$, and $Y$ is independent of $X$ conditioned on $Z$; i.e. Then, we have that $E[Y \nabla_\theta \ln P_\theta(X)] = 0$. Proof. Factor baselines are valid control variates if $G_\Sigma$ is true to the MDP (i.e.
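The identity in the lemma is the standard score-function argument, and it can be written out as a short derivation. This is a sketch and not the authors' proof: it assumes that $X$ is discrete, that the gradient may be exchanged with the sum, and that the score is taken with respect to the conditional law $P_\theta(X \mid Z)$.

$$
\begin{aligned}
E\big[Y\,\nabla_\theta \ln P_\theta(X \mid Z)\big]
&= E_Z\Big[E[Y \mid Z]\;E\big[\nabla_\theta \ln P_\theta(X \mid Z)\,\big|\,Z\big]\Big] \\
&= E_Z\Big[E[Y \mid Z]\sum_x P_\theta(x \mid Z)\,\nabla_\theta \ln P_\theta(x \mid Z)\Big] \\
&= E_Z\Big[E[Y \mid Z]\;\nabla_\theta \sum_x P_\theta(x \mid Z)\Big]
 = E_Z\big[E[Y \mid Z]\;\nabla_\theta 1\big] = 0.
\end{aligned}
$$

The first equality uses the tower property together with the conditional independence of $Y$ and $X$ given $Z$; the remaining steps use $P \nabla \ln P = \nabla P$ and the fact that conditional probabilities sum to one.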



1cdf14d1e3699d61d237cf76ce1c2dca-Supplemental.pdf

Neural Information Processing Systems

We follow [21] and implement our image compression models as "VQGANs". More specifically, we use the official implementation provided at https://github.com/CompVis/ For FFHQ, we train such a compression model from scratch. See Tab. 4 for an overview. As some of the codebook entries remain unused after training, we shrink the codebook to its effective size when training a generative model on top of it.
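Shrinking a codebook to its effective size can be sketched as follows. This is a hypothetical illustration, not the official implementation: the function `shrink_codebook` and the data layout (a list of code vectors plus lists of token indices) are assumptions. The idea is to keep only the entries that actually occur in the encoded data and to renumber the tokens accordingly.

```python
def shrink_codebook(codebook, token_maps):
    """Drop unused codebook entries and renumber tokens (hypothetical helper).

    codebook:   list of code vectors, indexed by token id.
    token_maps: list of token-id sequences produced by the compression model.
    Returns the reduced codebook and the token sequences in the new numbering.
    """
    # Token ids that occur at least once, in ascending order.
    used = sorted({t for tokens in token_maps for t in tokens})
    # Map each old id to its position in the reduced codebook.
    remap = {old: new for new, old in enumerate(used)}
    small = [codebook[i] for i in used]
    remapped = [[remap[t] for t in tokens] for tokens in token_maps]
    return small, remapped

# Toy data: six codebook entries, of which only ids 0, 2 and 5 are used.
codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.0], [0.0, 2.0]]
token_maps = [[0, 2, 2, 5], [5, 0, 0, 2]]
small, remapped = shrink_codebook(codebook, token_maps)
```

Decoding is unchanged by the renumbering: `small[remapped[i][j]]` is the same vector as `codebook[token_maps[i][j]]`, while a generative model trained on the tokens only needs a vocabulary of size `len(small)`.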