
MetaUAS: Universal Anomaly Segmentation with One-Prompt Meta-Learning

Neural Information Processing Systems

Zero- and few-shot visual anomaly segmentation relies on powerful vision-language models that detect unseen anomalies using manually designed textual prompts. However, visual representations are inherently independent of language. In this paper, we explore the potential of a pure visual foundation model as an alternative to the widely used vision-language models for universal visual anomaly segmentation. We present a novel paradigm that unifies anomaly segmentation into change segmentation. This paradigm enables us to leverage large-scale synthetic image pairs, featuring object-level and local-region changes, derived from existing image datasets and independent of target anomaly datasets.
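
The synthesis pipeline itself is not spelled out in this abstract, but the core idea of turning ordinary images into change-segmentation training pairs can be illustrated with a minimal sketch, assuming simple rectangular local-region changes; the function name and parameters below are illustrative, not MetaUAS's actual code.

```python
import numpy as np

def make_change_pair(img_a, img_b, min_frac=0.05, max_frac=0.25, rng=None):
    """Build a (prompt, query, change-mask) triple from two ordinary images.

    A rectangular patch of img_b is pasted into img_a, simulating a
    local-region change; the pasted area becomes the change mask.
    img_a and img_b are (H, W, C) arrays of the same size.
    """
    rng = np.random.default_rng() if rng is None else rng
    h, w = img_a.shape[:2]
    ph = int(h * rng.uniform(min_frac, max_frac))
    pw = int(w * rng.uniform(min_frac, max_frac))
    y = rng.integers(0, h - ph)
    x = rng.integers(0, w - pw)
    query = img_a.copy()
    query[y:y + ph, x:x + pw] = img_b[y:y + ph, x:x + pw]  # inject the change
    mask = np.zeros((h, w), dtype=np.uint8)
    mask[y:y + ph, x:x + pw] = 1                            # change-mask label
    return img_a, query, mask

# usage: prompt, query, mask = make_change_pair(image1, image2)
```

A full pipeline would also synthesize object-level changes (e.g., pasting segmented objects) and photometric variations, but the prompt/query/mask interface stays the same.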


Appendix A Discussion of Related Works; Appendix B Proofs of Theorems and Analysis of Bounds; B.1 Proof of Theorem 4.2

Neural Information Processing Systems

Comparison to Ye et al. (2021). Ye et al. (2021) derive a generalization bound for the domain generalization problem in terms of variation across features. In comparison, we address the problem of learning with an unforeseen adversary and define unforeseen adversarial generalizability; our generalization bound using variation is an instantiation of our generalizability framework.

Comparison to Laidlaw et al. (2021). In their proposed algorithm, perceptual adversarial training (PAT), Laidlaw et al. (2021) combine standard adversarial training with adversarial examples generated via their LPIPS-bounded attack method. In terms of the terminology introduced in our paper, Laidlaw et al. (2021) improve the choice of source threat model while using an existing learning algorithm (adversarial training). Meanwhile, our work takes the perspective of keeping the source threat model fixed and improving the learning algorithm. This allows us to combine our approach with various source threat models, including the attacks used by Laidlaw et al. (2021) in PAT (see Appendix E.10).


Harmonizing Stochasticity and Determinism: Scene-responsive Diverse Human Motion Prediction

Neural Information Processing Systems

Diverse human motion prediction (HMP) is a fundamental task in computer vision that has recently attracted considerable interest. Prior methods primarily focus on the stochastic nature of human motion while neglecting the specific impact of the external environment, leading to pronounced artifacts in predictions when applied to real-world scenarios. To fill this gap, this work introduces a novel task: predicting diverse human motion within real-world 3D scenes. In contrast to prior works, it requires harmonizing the deterministic constraints imposed by the surrounding 3D scene with the stochastic aspects of human motion. For this purpose, we propose DiMoP3D, a diverse motion prediction framework with 3D scene awareness, which leverages the 3D point cloud and the observed sequence to generate diverse and high-fidelity predictions.


Uncertainty-based Offline Variational Bayesian Reinforcement Learning for Robustness under Diverse Data Corruptions
Rui Yang, Bin Li

Neural Information Processing Systems

Real-world offline datasets are often subject to data corruption (such as noise or adversarial perturbations) caused by sensor failures or malicious attacks. Despite advances in robust offline reinforcement learning (RL), existing methods struggle to learn robust agents under the high uncertainty caused by diverse corrupted data (i.e., corrupted states, actions, rewards, and dynamics), leading to performance degradation in clean environments. To tackle this problem, we propose TRACER, a novel robust variational Bayesian inference method for offline RL. It is the first to introduce Bayesian inference to capture uncertainty from offline data, providing robustness against all types of data corruptions.
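
TRACER's actual mechanism is variational Bayesian inference over offline data; as a rough, hedged illustration of the underlying uncertainty-weighting idea only, the sketch below substitutes a simple Q-ensemble for the Bayesian posterior and down-weights transitions whose value estimates disagree. All class and function names are hypothetical, not from the paper.

```python
import torch
import torch.nn as nn

class QEnsemble(nn.Module):
    """Ensemble of Q-networks used as a crude stand-in for a posterior over Q."""
    def __init__(self, obs_dim, act_dim, n_heads=5, hidden=256):
        super().__init__()
        self.heads = nn.ModuleList([
            nn.Sequential(nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
                          nn.Linear(hidden, 1))
            for _ in range(n_heads)
        ])

    def forward(self, obs, act):
        x = torch.cat([obs, act], dim=-1)
        qs = torch.stack([h(x) for h in self.heads], dim=0)  # (n_heads, B, 1)
        return qs.mean(0), qs.std(0)                         # value and epistemic spread

def weighted_td_loss(q_ens, target_q, batch, gamma=0.99):
    """Down-weight transitions whose Q estimates disagree (likely corrupted)."""
    obs, act, rew, next_obs, next_act, done = batch
    with torch.no_grad():
        next_q, _ = target_q(next_obs, next_act)
        target = rew + gamma * (1.0 - done) * next_q
    q, unc = q_ens(obs, act)
    w = 1.0 / (1.0 + unc)            # higher uncertainty -> smaller weight
    return (w * (q - target) ** 2).mean()
```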


The Sample-Communication Complexity Trade-off in Federated Q-Learning

Neural Information Processing Systems

We consider the problem of Federated Q-learning, where M agents aim to collaboratively learn the optimal Q-function of an unknown infinite horizon Markov Decision Process with finite state and action spaces. We investigate the trade-off between sample and communication complexity for the widely used class of intermittent communication algorithms.
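
As a concrete reference point for the intermittent communication setting, here is a minimal tabular sketch (not from the paper): each of M agents performs local synchronous Q-learning updates from its own generative-model samples, and agents exchange information only every sync_every iterations by averaging their Q-tables. The function and parameter names are illustrative.

```python
import numpy as np

def federated_q_learning(P, R, gamma=0.9, n_agents=4, T=20000, sync_every=500,
                         lr=0.1, seed=0):
    """Tabular federated Q-learning with intermittent communication.

    P: (S, A, S) transition tensor, R: (S, A) rewards. Each agent runs local
    synchronous Q-learning on its own samples; every `sync_every` steps all
    agents average their Q-tables (the only communication rounds).
    """
    rng = np.random.default_rng(seed)
    S, A, _ = P.shape
    Q = np.zeros((n_agents, S, A))
    for t in range(1, T + 1):
        for m in range(n_agents):
            # one generative-model sample per (s, a) pair (synchronous update)
            next_states = np.array([[rng.choice(S, p=P[s, a]) for a in range(A)]
                                    for s in range(S)])
            target = R + gamma * Q[m][next_states].max(axis=-1)
            Q[m] += lr * (target - Q[m])
        if t % sync_every == 0:          # intermittent communication round
            Q[:] = Q.mean(axis=0)        # server averages and broadcasts
    return Q.mean(axis=0)
```

Raising sync_every reduces communication complexity at the cost of letting local Q-tables drift between rounds, which is exactly the trade-off the paper characterizes.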


An exactly solvable model for emergence and scaling laws in the multitask sparse parity problem

Neural Information Processing Systems

Deep learning models can exhibit what appears to be a sudden ability to solve a new problem as training time, training data, or model size increases, a phenomenon known as emergence. In this paper, we present a framework where each new ability (a skill) is represented as a basis function. We solve a simple multi-linear model in this skill basis, finding analytic expressions for the emergence of new skills, as well as for scaling laws of the loss with training time, data size, model size, and optimal compute. We compare our detailed calculations to direct simulations of a two-layer neural network trained on multitask sparse parity, where the tasks in the dataset are distributed according to a power law. Our simple model captures, using a single fit parameter, the sigmoidal emergence of multiple new skills as training time, data size, or model size increases in the neural network.
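
The multitask sparse parity setup can be made concrete with a short data generator, assuming the standard construction (a one-hot task code concatenated with random bits, with the label equal to the parity of a hidden k-bit subset per task) and power-law task frequencies; the exact constants used in the paper may differ.

```python
import numpy as np

def sample_multitask_sparse_parity(n_samples, n_tasks=32, n_bits=64, k=3,
                                   alpha=1.5, seed=0):
    """Generate multitask sparse parity data with power-law task frequencies.

    Each task t owns a hidden subset of k bit positions; a sample from task t
    is labeled with the parity (XOR) of those k bits. Inputs are a one-hot
    task code concatenated with n_bits random bits, and tasks are drawn with
    probability proportional to (t + 1)^(-alpha).
    """
    rng = np.random.default_rng(seed)
    subsets = [rng.choice(n_bits, size=k, replace=False) for _ in range(n_tasks)]
    probs = np.arange(1, n_tasks + 1) ** -alpha
    probs /= probs.sum()
    tasks = rng.choice(n_tasks, size=n_samples, p=probs)
    bits = rng.integers(0, 2, size=(n_samples, n_bits))
    labels = np.array([bits[i, subsets[t]].sum() % 2 for i, t in enumerate(tasks)])
    task_onehot = np.eye(n_tasks, dtype=int)[tasks]
    X = np.concatenate([task_onehot, bits], axis=1)
    return X, labels
```

Because rare tasks receive few samples, each skill (task) is acquired at a different point in training, which is what produces the staggered sigmoidal emergence curves the paper analyzes.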



Epipolar-Free 3D Gaussian Splatting for Generalizable Novel View Synthesis
Zhiyuan Min, Yawei Luo, Jianwen Sun, Yi Yang

Neural Information Processing Systems

Generalizable 3D Gaussian splatting (3DGS) can reconstruct new scenes from sparse-view observations in a feed-forward inference manner, eliminating the scene-specific retraining required by conventional 3DGS. However, existing methods rely heavily on epipolar priors, which can be unreliable in complex real-world scenes, particularly in non-overlapping and occluded regions. In this paper, we propose eFreeSplat, an efficient feed-forward 3DGS-based model for generalizable novel view synthesis that operates independently of epipolar line constraints. To enhance multi-view feature extraction with 3D perception, we employ a self-supervised Vision Transformer (ViT) with cross-view completion pre-training on large-scale datasets. Additionally, we introduce an Iterative Cross-view Gaussians Alignment method to ensure consistent depth scales across different views. eFreeSplat represents an innovative approach to generalizable novel view synthesis. Unlike existing pure geometry-free methods, eFreeSplat focuses on achieving epipolar-free feature matching and encoding by providing 3D priors through cross-view pre-training. We evaluate eFreeSplat on wide-baseline novel view synthesis tasks using the RealEstate10K and ACID datasets. Extensive experiments demonstrate that eFreeSplat surpasses state-of-the-art baselines that rely on epipolar priors, achieving superior geometry reconstruction and novel view synthesis quality.
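
The Iterative Cross-view Gaussians Alignment procedure is not detailed in this abstract; the toy sketch below only illustrates the underlying notion of iteratively aligning depth scales between views, assuming depths of matched pixels are available in both views and using a robust median scale estimate. It is not eFreeSplat's actual algorithm, and all names are illustrative.

```python
import numpy as np

def align_depth_scales(depths_ref, depths_src, n_iter=5, inlier_frac=0.8):
    """Iteratively align the depth scale of a source view to a reference view.

    Repeatedly estimates a robust (median) scale from the current inlier set
    and re-selects inliers by relative error. Returns the scale to apply to
    the source depths so the two views share a consistent depth scale.
    """
    mask = np.ones_like(depths_ref, dtype=bool)
    scale = 1.0
    for _ in range(n_iter):
        scale = np.median(depths_ref[mask] / depths_src[mask])
        err = np.abs(scale * depths_src - depths_ref) / depths_ref
        thresh = np.quantile(err, inlier_frac)   # keep the best matches as inliers
        mask = err <= thresh
    return scale
```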


Text-DiFuse: An Interactive Multi-Modal Image Fusion Framework based on Text-modulated Diffusion Model

Neural Information Processing Systems

Existing multi-modal image fusion methods fail to address the compound degradations present in source images, resulting in fused images plagued by noise, color bias, improper exposure, etc. Additionally, these methods often overlook the specificity of foreground objects, weakening the salience of objects of interest within the fused images. To address these challenges, this study proposes a novel interactive multi-modal image fusion framework based on a text-modulated diffusion model, called Text-DiFuse.


A Framework of CWAEE

Neural Information Processing Systems

For a better understanding of our method, we give the framework of CWAEE. We use the outputs of the one-vs-rest classifiers to detect known and unknown classes in unlabeled data. Then, the class-wise adaptive threshold is calculated with a two-component beta mixture model (BMM), which models the score distributions of known and unknown classes in an unsupervised way. The entire process of detecting known and unknown classes is summarized in Figure 5.

For Domain Generalization, it is important to exploit inter-domain information, which includes domain-dependent styles and domain-invariant semantics.
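
The excerpt does not give CWAEE's exact estimator, but fitting a two-component beta mixture to calibrated scores and reading off a class-wise threshold can be sketched as below, assuming scores lie in (0, 1); the EM with moment-matched M-step, the function names, and the 0.5 posterior cut-off are illustrative choices, not necessarily the paper's.

```python
import numpy as np
from scipy.stats import beta

def fit_beta_mixture(scores, n_iter=50, eps=1e-4):
    """Fit a two-component beta mixture to scores in (0, 1) with EM.

    The M-step uses method-of-moments updates weighted by the current
    responsibilities. Returns per-component (a, b) and mixture weights.
    """
    s = np.clip(scores, eps, 1 - eps)
    resp = np.stack([(s <= np.median(s)).astype(float),
                     (s > np.median(s)).astype(float)], axis=1)
    weights = resp.mean(axis=0)
    params = [(1.0, 1.0), (1.0, 1.0)]
    for _ in range(n_iter):
        new_params = []
        for k in range(2):                       # M-step per component
            w = resp[:, k] + eps
            m = np.average(s, weights=w)
            v = np.average((s - m) ** 2, weights=w) + eps
            common = max(m * (1 - m) / v - 1, eps)
            new_params.append((m * common, (1 - m) * common))
        params = new_params
        like = np.stack([weights[k] * beta.pdf(s, *params[k]) for k in range(2)],
                        axis=1)                  # E-step: responsibilities
        resp = like / like.sum(axis=1, keepdims=True)
        weights = resp.mean(axis=0)
    return params, weights

def adaptive_threshold(params, weights, grid=1000):
    """Return the score where the higher-mean ('known') component becomes
    more probable than the other one under the fitted mixture."""
    known = int(np.argmax([a / (a + b) for a, b in params]))
    t = np.linspace(1e-3, 1 - 1e-3, grid)
    post = np.stack([weights[k] * beta.pdf(t, *params[k]) for k in range(2)], axis=1)
    post /= post.sum(axis=1, keepdims=True)
    return t[np.argmax(post[:, known] >= 0.5)]

# usage (per class c): params, weights = fit_beta_mixture(scores_c)
#                      tau_c = adaptive_threshold(params, weights)
```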