Low-Rank Optimal Transport through Factor Relaxation with Latent Coupling Peter Halmos, Julian Gold, and Benjamin J. Raphael

Neural Information Processing Systems

Optimal transport (OT) is a general framework for finding a minimum-cost transport plan, or coupling, between probability distributions, and has many applications in machine learning. A key challenge in applying OT to massive datasets is the quadratic scaling of the coupling matrix with the size of the dataset. Forrow et al. (2019) introduced a factored coupling for the k-Wasserstein barycenter problem, which Scetbon et al. (2021) adapted to solve the primal low-rank OT problem. We derive an alternative parameterization of the low-rank problem based on the latent coupling (LC) factorization previously introduced by Lin et al. (2021), which generalizes Forrow et al. (2019). The LC factorization has multiple advantages for low-rank OT, including a decoupling of the problem into three OT problems as well as greater flexibility and interpretability. We leverage these advantages to derive a new algorithm, Factor Relaxation with Latent Coupling (FRLC), which uses coordinate mirror descent to compute the LC factorization. FRLC handles multiple OT objectives (Wasserstein, Gromov-Wasserstein, Fused Gromov-Wasserstein) and marginal constraints (balanced, unbalanced, and semi-relaxed) with linear space complexity. We provide theoretical results on FRLC and demonstrate superior performance on diverse applications, including graph clustering and spatial transcriptomics, while also illustrating its interpretability.
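
To make the linear space complexity concrete, here is a minimal NumPy sketch, not FRLC itself: the factor shapes and the form P = Q diag(1/gQ) T diag(1/gR) R^T are our reading of the LC factorization from the abstract. It shows how the implied coupling can be applied to a vector without ever materializing the n-by-m matrix.

```python
import numpy as np

def apply_lc_coupling(Q, T, R, v):
    """Apply the coupling P = Q diag(1/gQ) T diag(1/gR) R^T to a vector v
    without forming the n-by-m matrix P.  Q is (n, r), T is (r, r), R is
    (m, r); gQ and gR are the inner marginals (column sums) of Q and R.
    Cost is O((n + m) r + r^2) time and memory.
    """
    gQ = Q.sum(axis=0)
    gR = R.sum(axis=0)
    x = R.T @ v          # (r,)
    x = T @ (x / gR)     # (r,)
    return Q @ (x / gQ)  # (n,)

# Toy check against the dense coupling (small sizes only).
rng = np.random.default_rng(0)
n, m, r = 500, 400, 5
Q = rng.random((n, r)); Q /= Q.sum()
R = rng.random((m, r)); R /= R.sum()
T = np.outer(Q.sum(axis=0), R.sum(axis=0))  # simplest valid latent coupling
v = rng.random(m)
P = Q @ np.diag(1 / Q.sum(axis=0)) @ T @ np.diag(1 / R.sum(axis=0)) @ R.T
assert np.allclose(P @ v, apply_lc_coupling(Q, T, R, v))
```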


BAN: Detecting Backdoors Activated by Adversarial Neuron Noise

Neural Information Processing Systems

Backdoor attacks on deep learning represent a recent threat that has gained significant attention in the research community. Backdoor defenses are mainly based on backdoor inversion, which has been shown to be generic, model-agnostic, and applicable to practical threat scenarios. State-of-the-art backdoor inversion recovers a mask in the feature space to locate prominent backdoor features, where benign and backdoor features can be disentangled. However, it suffers from high computational overhead, and we also find that it overly relies on prominent backdoor features that are highly distinguishable from benign features. To tackle these shortcomings, this paper improves backdoor feature inversion for backdoor detection by incorporating extra neuron-activation information. In particular, we adversarially increase the loss of backdoored models with respect to the weights to activate the backdoor effect, based on which we can easily differentiate backdoored and clean models. Experimental results demonstrate that our defense, BAN, is 1.37x (on CIFAR-10) and 5.11x (on ImageNet200) more efficient, with an average 9.99% higher detection success rate, than the state-of-the-art defense BTI-DBF. Our code and trained models are publicly available at https://github.com/xiaoyunxxy/ban.
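
The detection signal the abstract describes can be sketched in a few lines of PyTorch. This is a hedged illustration, not the released BAN code; the function name, the single-batch evaluation, and the step size eps are our assumptions.

```python
import copy
import torch

def weight_noise_loss_gap(model, x, y, loss_fn, eps=0.01):
    """Perturb the weights one signed-gradient step *uphill* in the loss on
    clean inputs (x, y) and report how much the loss increases.  The premise
    from the abstract: such adversarial neuron noise activates the backdoor
    effect, so backdoored models show a markedly larger jump than clean ones.
    """
    model = copy.deepcopy(model).eval()   # never touch the caller's weights
    with torch.no_grad():
        base_loss = loss_fn(model(x), y).item()

    params = [p for p in model.parameters() if p.requires_grad]
    grads = torch.autograd.grad(loss_fn(model(x), y), params)
    with torch.no_grad():
        for p, g in zip(params, grads):
            p.add_(eps * g.sign())        # w <- w + eps * sign(dL/dw)
        noisy_loss = loss_fn(model(x), y).item()
    return noisy_loss - base_loss         # larger gap -> more suspicious
```

A threshold on this gap, calibrated on known-clean models, would then separate backdoored from clean models in this simplified setting.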


Graph Clustering: Block-models and model free results

Neural Information Processing Systems

Clustering graphs under the Stochastic Block Model (SBM) and its extensions is well studied. Guarantees of correctness exist under the assumption that the data are sampled from a model. In this paper, we propose a framework in which we obtain "correctness" guarantees without assuming the data come from a model. The guarantees we obtain depend instead on statistics of the data that can be checked. We also show that this framework ties in with the existing model-based framework, so that we can exploit results from model-based recovery as well as strengthen existing results in that area of research.
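
As a toy illustration of "guarantees that depend on checkable statistics of the data," the sketch below clusters a graph spectrally and reports an observable quantity (the eigengap of the normalized Laplacian) instead of assuming an SBM. The paper's actual statistics and guarantees are different and stronger; this only shows the shape of the idea.

```python
import numpy as np
from scipy.sparse.csgraph import laplacian
from sklearn.cluster import KMeans

def spectral_cluster_with_certificate(A, k):
    """Cluster the graph with adjacency matrix A into k groups and return,
    alongside the labels, a statistic computed from the data itself.  A
    model-free framework conditions its correctness guarantee on such
    observable quantities rather than on a generative assumption.
    """
    L = laplacian(A, normed=True)
    eigvals, eigvecs = np.linalg.eigh(L)          # ascending eigenvalues
    labels = KMeans(n_clusters=k, n_init=10).fit_predict(eigvecs[:, :k])
    gap = eigvals[k] - eigvals[k - 1]             # checkable: large gap
    return labels, gap                            # suggests well-separated clusters
```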


Two Time-scale Off-Policy TD Learning: Non-asymptotic Analysis over Markovian Samples

Neural Information Processing Systems

Gradient-based temporal difference (GTD) algorithms are widely used in off-policy learning scenarios. Among them, the two time-scale TD with gradient correction (TDC) algorithm has been shown to have superior performance. In contrast to previous studies that characterized the non-asymptotic convergence rate of TDC only under independent and identically distributed (i.i.d.) data samples, we provide the first non-asymptotic convergence analysis for two time-scale TDC under non-i.i.d. Markovian samples.
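
For reference, the TDC iteration being analyzed, in its standard linear-function-approximation form due to Sutton et al., can be sketched as below; the stepsizes alpha < beta realize the two time scales, and the concrete values are illustrative only.

```python
import numpy as np

def tdc_step(theta, w, phi, phi_next, reward, gamma, alpha, beta):
    """One TDC update with linear function approximation.
    theta: slow iterate (value-function weights), stepsize alpha.
    w:     fast iterate (gradient-correction weights), stepsize beta > alpha.
    phi, phi_next: feature vectors of the current and next state.
    """
    delta = reward + gamma * phi_next @ theta - phi @ theta    # TD error
    theta = theta + alpha * (delta * phi - gamma * (phi @ w) * phi_next)
    w = w + beta * (delta - phi @ w) * phi
    return theta, w
```

The paper's contribution is the finite-sample analysis of these coupled updates when (phi, reward, phi_next) come from a single Markovian trajectory rather than from i.i.d. draws.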


e7c573c14a09b84f6b7782ce3965f335-AuthorFeedback.pdf

Neural Information Processing Systems

Reviewer 1:
Q: Compare and highlight the new challenges of analyzing TDC relative to other GTD algorithms in [30, 34].
A: Compared to conventional optimization with a stagewise stepsize, here we need to handle the bias induced by non-i.i.d. sampling.
Q: How can the theoretical guarantees be affected in the non-asymptotic analysis of actor-critic and gradient Q-learning?
Q: How would worst-case errors predicted by the bound compare to errors observed empirically in experiments?
A: It is a good idea to plot theoretical errors and compare them with empirical bounds.


Towards Hardware-Aware Tractable Learning of Probabilistic Models Laura I. Galindez Olascoaga and Marian Verhelst

Neural Information Processing Systems

Smart portable applications increasingly rely on edge computing due to privacy and latency concerns. However, guaranteeing always-on functionality comes with two major challenges: heavily resource-constrained hardware and dynamic application conditions. Probabilistic models present an ideal solution to these challenges: they are robust to missing data, allow for joint predictions, and have small data needs. In addition, ongoing efforts in the field of tractable learning have resulted in probabilistic models with strict inference-efficiency guarantees. However, current notions of tractability are often limited to model complexity, disregarding the hardware's specifications and constraints. We propose a novel resource-aware cost metric that takes the hardware's properties into account when determining whether an inference task can be efficiently deployed. We use this metric to evaluate the performance-versus-resource trade-off relevant to the application of interest, and we propose a strategy that selects the device settings that optimally meet users' requirements.
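
A hypothetical version of such a resource-aware cost metric is sketched below. The dictionary keys, the weights, and the energy numbers are all illustrative assumptions, not the paper's actual device model; the point is only that hardware properties enter the cost alongside model complexity and accuracy.

```python
def resource_aware_cost(n_mults, n_adds, device, accuracy,
                        energy_weight=1.0, accuracy_weight=1.0):
    """Toy resource-aware cost: weight a circuit's operation counts by the
    target device's per-operation energy, then trade the result off against
    task accuracy.  Lower is better.  All names here are hypothetical.
    """
    energy = (n_mults * device["energy_per_mult"]
              + n_adds * device["energy_per_add"])   # joules per inference
    return energy_weight * energy - accuracy_weight * accuracy

# Toy usage: compare two candidate circuits on a low-power edge device.
edge_device = {"energy_per_mult": 3.1e-12,   # illustrative numbers, not
               "energy_per_add": 0.1e-12}    # measured hardware data
small = resource_aware_cost(2e4, 2e4, edge_device, accuracy=0.91)
large = resource_aware_cost(2e6, 2e6, edge_device, accuracy=0.93)
```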


e77910ebb93b511588557806310f78f1-AuthorFeedback.pdf

Neural Information Processing Systems

We would like to thank the reviewers for their thoughtful comments and valuable suggestions. We will clarify this point in the paper. Our algorithms are agnostic to the leaf distributions used. As such, there is memory overhead but no computational overhead. We will include this breakdown in the paper.


Artemis: Towards Referential Understanding in Complex Videos Xi Tang

Neural Information Processing Systems

Videos carry rich visual information including object description, action, interaction, etc., but existing multimodal large language models (MLLMs) fall short in referential understanding scenarios such as video-based referring. In this paper, we present Artemis, an MLLM that pushes video-based referential understanding to a finer level. Given a video, Artemis receives a natural-language question with a bounding box in any video frame and describes the referred target in the entire video. The key to achieving this goal lies in extracting compact, target-specific video features, where we set a solid baseline by tracking and selecting spatiotemporal features from the video. We train Artemis on the newly established VideoRef45K dataset with 45K video-QA pairs and design a computationally efficient, three-stage training procedure. Results are promising both quantitatively and qualitatively. Additionally, we show that Artemis can be integrated with video grounding and text summarization tools to understand more complex scenarios.
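
A hedged sketch of the "compact, target-specific video features" idea, using torchvision's roi_align: given a tracked box per frame, crop a fixed-size region from each frame's feature map and pool it. Artemis's actual tracking and feature-selection pipeline is more involved, and the shapes and names here are our assumptions.

```python
import torch
from torchvision.ops import roi_align

def target_features(frame_feats, boxes, out_size=7):
    """Extract one descriptor per frame for a tracked target.

    frame_feats: (T, C, H, W) float backbone features, one map per frame
    boxes:       (T, 4) float tracked box per frame, in feature-map
                 coordinates (x1, y1, x2, y2)
    returns:     (T, C) target descriptor per frame (spatially averaged)
    """
    T = frame_feats.shape[0]
    # roi_align expects rows of (batch_index, x1, y1, x2, y2).
    idx = torch.arange(T, dtype=frame_feats.dtype).unsqueeze(1)
    rois = torch.cat([idx, boxes], dim=1)                      # (T, 5)
    crops = roi_align(frame_feats, rois, output_size=out_size) # (T, C, s, s)
    return crops.mean(dim=(2, 3))                              # average pool
```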


Copula-like Variational Inference

Neural Information Processing Systems

This paper considers a new family of variational distributions motivated by Sklar's theorem. The family is based on new copula-like densities on the hypercube with non-uniform marginals that can be sampled efficiently, i.e., with complexity linear in the dimension d of the state space. The proposed variational densities can then be seen as arising from these copula-like densities used as base distributions on the hypercube, with Gaussian quantile functions and sparse rotation matrices as normalizing flows. The latter correspond to a rotation of the marginals with complexity O(d log d). We provide empirical evidence that this variational family can approximate non-Gaussian posteriors and can be beneficial compared to Gaussian approximations. Our method performs largely comparably to state-of-the-art variational approximations on standard regression and classification benchmarks for Bayesian neural networks.
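
The sampling path the abstract describes (a copula-like sample on the hypercube, a marginal-wise Gaussian quantile, then a cheap structured rotation) can be sketched as follows. The butterfly pattern of Givens rotations below is one standard way to get an O(d log d) rotation; the paper's sparse rotations may be constructed differently, and the uniform base sample stands in for the actual copula-like density.

```python
import numpy as np
from scipy.stats import norm

def butterfly_rotate(x, thetas):
    """Apply log2(d) layers of disjoint Givens rotations (a butterfly
    pattern): an orthogonal map computed in O(d log d) operations.

    x:      (d,) with d a power of two
    thetas: (log2 d, d // 2) rotation angles, one per pair per layer
    """
    d = x.size
    x = x.copy()
    for layer, angles in enumerate(thetas):
        stride = 2 ** layer
        # pair indices (i, i + stride) within blocks of size 2 * stride
        i = np.arange(d).reshape(-1, 2 * stride)[:, :stride].ravel()
        j = i + stride
        c, s = np.cos(angles), np.sin(angles)
        xi, xj = x[i], x[j]
        x[i], x[j] = c * xi - s * xj, s * xi + c * xj
    return x

# Sketch of the sampling path, under the assumptions stated above.
rng = np.random.default_rng(0)
d = 8
u = rng.random(d)                      # stand-in for a copula-like sample in [0, 1]^d
z = norm.ppf(u)                        # Gaussian quantile function, marginal-wise
thetas = rng.uniform(0, 2 * np.pi, size=(int(np.log2(d)), d // 2))
sample = butterfly_rotate(z, thetas)   # rotate the marginals in O(d log d)
```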