

Conjugated Semantic Pool Improves OOD Detection with Pre-trained Vision-Language Models

Neural Information Processing Systems

A straightforward pipeline for zero-shot out-of-distribution (OOD) detection involves selecting potential OOD labels from an extensive semantic pool and then leveraging a pre-trained vision-language model to perform classification over both in-distribution (ID) and OOD labels. In this paper, we theorize that enhancing performance requires expanding the semantic pool while increasing the expected probability that the selected OOD labels are activated by OOD samples and keeping the mutual dependence among those activations low. A natural way to expand the pool is to adopt a larger lexicon; however, this inevitably introduces numerous synonyms and uncommon words that fail to meet the above requirements, indicating that viable expansion strategies must go beyond merely selecting words from a lexicon. Since OOD detection aims to correctly classify input images into ID/OOD class groups, we can "make up" OOD label candidates that are not standard class names but are nonetheless beneficial for the process. Observing that the original semantic pool consists of unmodified specific class names, we correspondingly construct a conjugated semantic pool (CSP) of modified superclass names, each serving as a cluster center for samples that share similar properties across different categories. Consistent with our theory, expanding the OOD label candidates with the CSP satisfies the requirements and outperforms existing works by 7.89% in FPR95.
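As a rough illustration of this pipeline (not the paper's exact method), the sketch below scores an image by taking a softmax over the joint ID + OOD label set and summing the probability mass assigned to the ID labels. The random arrays stand in for real vision-language embeddings, and the temperature value is an assumption.

import numpy as np

def id_score(image_emb, id_text_embs, ood_text_embs, temperature=0.01):
    """Score an image as in-distribution: softmax over ID + OOD labels,
    then sum the probability mass assigned to the ID labels."""
    text_embs = np.concatenate([id_text_embs, ood_text_embs], axis=0)
    # Cosine similarities between the image and every label embedding.
    sims = text_embs @ image_emb / (
        np.linalg.norm(text_embs, axis=1) * np.linalg.norm(image_emb))
    probs = np.exp(sims / temperature)
    probs /= probs.sum()
    return probs[: len(id_text_embs)].sum()  # mass on ID labels

# Placeholder embeddings standing in for a real vision-language encoder.
rng = np.random.default_rng(0)
image_emb = rng.normal(size=512)
id_text_embs = rng.normal(size=(10, 512))   # e.g. ID class names
ood_text_embs = rng.normal(size=(50, 512))  # e.g. CSP label candidates
print(id_score(image_emb, id_text_embs, ood_text_embs))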


Optimal Rates for Vector-Valued Spectral Regularization Learning Algorithms

Neural Information Processing Systems

We study theoretical properties of a broad class of regularized algorithms with vector-valued output. These spectral algorithms include kernel ridge regression, kernel principal component regression and various implementations of gradient descent.
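For concreteness, here is a minimal sketch of one such spectral algorithm, vector-valued kernel ridge regression, using the standard closed form alpha = (K + n*lambda*I)^{-1} Y applied column-wise; the RBF kernel and the regularization strength are illustrative choices, not prescriptions from the paper.

import numpy as np

def rbf_kernel(X, Z, gamma=1.0):
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def krr_fit_predict(X, Y, X_test, lam=1e-2, gamma=1.0):
    """Vector-valued kernel ridge regression with a scalar kernel:
    each output coordinate shares the same kernel, so the closed form
    f(x) = k(x, X) (K + n*lam*I)^{-1} Y applies column by column."""
    n = X.shape[0]
    K = rbf_kernel(X, X, gamma)
    alpha = np.linalg.solve(K + n * lam * np.eye(n), Y)  # (n, output_dim)
    return rbf_kernel(X_test, X, gamma) @ alpha

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(100, 2))
Y = np.stack([np.sin(X[:, 0]), np.cos(X[:, 1])], axis=1)  # 2-dim output
print(krr_fit_predict(X, Y, X[:5], lam=1e-3))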


Debiased Bayesian inference for average treatment effects

Neural Information Processing Systems

Bayesian approaches have become increasingly popular in causal inference problems due to their conceptual simplicity, excellent performance and in-built uncertainty quantification ('posterior credible sets'). We investigate Bayesian inference for average treatment effects from observational data, which is a challenging problem due to the missing counterfactuals and selection bias. Working in the standard potential outcomes framework, we propose a data-driven modification to an arbitrary (nonparametric) prior based on the propensity score that corrects for the first-order posterior bias, thereby improving performance. We illustrate our method for Gaussian process (GP) priors using (semi-)synthetic data. Our experiments demonstrate significant improvement in both estimation accuracy and uncertainty quantification compared to the unmodified GP, rendering our approach highly competitive with the state-of-the-art.
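The paper corrects the prior itself; as a rough frequentist analogue of the same first-order propensity-score correction, the sketch below computes the doubly-robust AIPW estimate of the ATE on synthetic confounded data. The outcome-model stand-ins mu0 and mu1 and the data-generating process are assumptions for illustration only.

import numpy as np

def aipw_ate(y, t, mu0, mu1, e):
    """Doubly-robust (AIPW) ATE estimate: plug-in outcome models mu0, mu1
    corrected by propensity-weighted residuals, i.e. the same first-order
    bias term a propensity-based prior modification targets."""
    correction = t * (y - mu1) / e - (1 - t) * (y - mu0) / (1 - e)
    return np.mean(mu1 - mu0 + correction)

# Synthetic data with confounding through x.
rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=n)
e = 1 / (1 + np.exp(-x))               # true propensity score
t = rng.binomial(1, e)
y = 2.0 * t + x + rng.normal(size=n)   # true ATE = 2.0

mu0, mu1 = x, x + 2.0                  # stand-ins for fitted outcome models
print(aipw_ate(y, t, mu0, mu1, e))     # close to 2.0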



Stand-Alone Self-Attention in Vision Models

Neural Information Processing Systems

Convolutions are a fundamental building block of modern computer vision systems. Recent approaches have argued for going beyond convolutions in order to capture long-range dependencies. These efforts focus on augmenting convolutional models with content-based interactions, such as self-attention and non-local means, to achieve gains on a number of vision tasks. The natural question that arises is whether attention can be a stand-alone primitive for vision models instead of serving as just an augmentation on top of convolutions. In developing and testing a pure self-attention vision model, we verify that self-attention can indeed be an effective stand-alone layer.
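A minimal sketch of such a layer is given below: 1x1 convolutions produce queries, keys, and values, and each position attends over all spatial locations. This simplification omits the local attention windows and relative position embeddings used in the paper, so it is an illustration of the idea rather than the paper's layer.

import torch
import torch.nn as nn

class SpatialSelfAttention(nn.Module):
    """Minimal stand-in for a stand-alone self-attention vision layer:
    1x1 convolutions produce queries/keys/values, and attention is
    computed over all spatial positions."""
    def __init__(self, channels, heads=4):
        super().__init__()
        assert channels % heads == 0
        self.heads = heads
        self.qkv = nn.Conv2d(channels, 3 * channels, kernel_size=1)
        self.proj = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x):
        b, c, h, w = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=1)
        # Reshape to (b, heads, h*w, c//heads) for dot-product attention.
        def split(t):
            return t.view(b, self.heads, c // self.heads, h * w).transpose(-2, -1)
        q, k, v = split(q), split(k), split(v)
        attn = torch.softmax(q @ k.transpose(-2, -1) / (c // self.heads) ** 0.5, dim=-1)
        out = (attn @ v).transpose(-2, -1).reshape(b, c, h, w)
        return self.proj(out)

layer = SpatialSelfAttention(channels=64)
print(layer(torch.randn(2, 64, 16, 16)).shape)  # torch.Size([2, 64, 16, 16])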


Error Correction Output Codes for Robust Neural Networks against Weight-errors: A Neural Tangent Kernel Point of View

Neural Information Processing Systems

Error-correcting output coding (ECOC) is a classic method that encodes a multi-class classification problem into a set of binary classifiers, used in decision trees and neural networks. Among ECOCs, the one-hot code has become the default choice in modern deep neural networks (DNNs) due to its simplicity in decision making. However, it suffers from a significant limitation in its ability to achieve high robust accuracy, particularly in the presence of weight errors. While recent studies have experimentally demonstrated that non-one-hot ECOCs with multi-bit error-correction ability could be a better solution, there is a notable absence of theoretical foundations that elucidate the relationship between codeword design, weight-error magnitude, and network characteristics so as to provide robustness guarantees. This work bridges that gap through the lens of the neural tangent kernel (NTK).
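To make the contrast with one-hot codes concrete, the toy sketch below decodes by nearest codeword: multi-bit codewords tolerate a few sign flips of the kind weight errors induce, whereas a one-hot code has no redundancy. The random codebook is a stand-in for a properly designed code that maximizes pairwise Hamming distance.

import numpy as np

def random_codebook(n_classes, code_len):
    """Random +/-1 codebook as a toy stand-in for a designed ECOC."""
    rng = np.random.default_rng(0)
    return rng.choice([-1.0, 1.0], size=(n_classes, code_len))

def ecoc_decode(logits, codebook):
    """Assign each sample to the class whose codeword is closest (in
    Hamming distance) to the sign pattern of the code-bit outputs."""
    bits = np.sign(logits)                                   # (n, code_len)
    dists = (bits[:, None, :] != codebook[None, :, :]).sum(-1)
    return dists.argmin(axis=1)

codebook = random_codebook(n_classes=10, code_len=15)
# Perturb the true codewords, mimicking noisy network outputs.
noisy_logits = codebook[[3, 7]] + np.random.default_rng(1).normal(0, 0.8, (2, 15))
print(ecoc_decode(noisy_logits, codebook))                   # likely [3, 7]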


Diffusion Model with Cross Attention as an Inductive Bias for Disentanglement

Neural Information Processing Systems

Disentangled representation learning strives to extract the intrinsic factors within observed data. Factoring these representations in an unsupervised manner is notably challenging and usually requires tailored loss functions or specific structural designs. In this paper, we introduce a new perspective and framework, demonstrating that a diffusion model with cross-attention can itself serve as a powerful inductive bias for learning disentangled representations. We propose to encode an image into a set of concept tokens and treat them as the condition of a latent diffusion model for image reconstruction, where cross-attention over the concept tokens bridges the encoder and the U-Net of the diffusion model. Our analysis shows that the diffusion process inherently imposes time-varying information bottlenecks.
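A hypothetical sketch of the conditioning mechanism: U-Net feature-map positions act as queries over a small set of concept tokens via cross-attention. The dimensions and the module structure here are illustrative assumptions, not the paper's exact architecture.

import torch
import torch.nn as nn

class ConceptCrossAttention(nn.Module):
    """Each spatial location of a U-Net feature map attends over the
    concept tokens produced by an image encoder."""
    def __init__(self, feat_dim, token_dim, n_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(feat_dim, n_heads,
                                          kdim=token_dim, vdim=token_dim,
                                          batch_first=True)

    def forward(self, feats, concept_tokens):
        # feats: (b, c, h, w) U-Net features; tokens: (b, n_tokens, token_dim)
        b, c, h, w = feats.shape
        q = feats.flatten(2).transpose(1, 2)          # (b, h*w, c)
        out, _ = self.attn(q, concept_tokens, concept_tokens)
        return out.transpose(1, 2).view(b, c, h, w)

layer = ConceptCrossAttention(feat_dim=64, token_dim=32)
feats = torch.randn(2, 64, 8, 8)
tokens = torch.randn(2, 6, 32)      # e.g. 6 concept tokens per image
print(layer(feats, tokens).shape)   # torch.Size([2, 64, 8, 8])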


Optimal Pricing in Repeated Posted-Price Auctions with Different Patience of the Seller and the Buyer

Neural Information Processing Systems

We study revenue-optimizing pricing algorithms for repeated posted-price auctions in which a seller interacts with a single strategic buyer holding a fixed private valuation. When the participants discount their cumulative utilities unequally, we show that constant pricing at the Myerson price is no longer optimal. For the case of a more patient seller, we propose a novel multidimensional optimization functional, a generalization of the one used to determine Myerson's price. This functional allows us to find the optimal algorithm and to boost the revenue of the optimal static pricing via an efficient low-dimensional approximation. Numerical experiments are provided to support our results.
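As background for the baseline being generalized, the sketch below computes the Myerson (optimal static) price by maximizing p * (1 - F(p)) on a grid; the uniform valuation distribution is an illustrative assumption.

import numpy as np

def myerson_price(cdf, grid):
    """Optimal static posted price against a buyer whose valuation has
    CDF `cdf`: maximize expected revenue p * (1 - F(p)). With unequal
    discounting, constant pricing at this price loses optimality."""
    revenue = grid * (1.0 - cdf(grid))
    return grid[np.argmax(revenue)]

grid = np.linspace(0.0, 1.0, 10001)
uniform_cdf = lambda p: np.clip(p, 0.0, 1.0)   # valuation ~ U[0, 1]
print(myerson_price(uniform_cdf, grid))        # 0.5, revenue 0.25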

