Goto

Collaborating Authors

 computer vision


Physen-Noise2Noise: Physics-Guided Self-Supervised Defocus Deblurring with Bias Correction under Low-Light Conditions

arXiv.org Machine Learning

Low-light, long-exposure defocus deblurring remains a challenging problem due to the simultaneous presence of severe blur and complex biased noise. Existing methods typically rely on simplified noise assumptions, which limits their effectiveness under realistic imaging conditions. In this work, we propose Physen-Noise2Noise, a self-supervised deblurring framework guided by the physical model of defocus imaging, which leverages noisy multi-frame observations without requiring clean reference images. Unlike conventional Noise2Noise-based approaches that assume zero-mean noise, we derive a frequency-domain constraint inherent to the defocus imaging process and incorporate it into the learning framework via a learnable noise bias parameter. In addition, a multi-frame noisy initialization strategy is introduced to suppress complex biased noise prior to deblurring, providing a more stable starting point for reconstruction. This formulation explicitly models biased noise and enables joint bias correction and high-frequency detail recovery during training. Furthermore, we develop a pretrain-finetune variant to enhance robustness and generalization under challenging noise conditions. Extensive experiments on both simulation and real-world datasets demonstrate that the proposed method consistently outperforms state-of-the-art self-supervised approaches for defocus deblurring in the presence of complex biased noise.


Consistency Regularised Gradient Flows for Inverse Problems

arXiv.org Machine Learning

Vision-Language Latent Diffusion Models (LDMs) (Rombach et al., 2022) provide powerful generative priors for inverse problems. However, existing LDM-based inverse solvers typically require a large number of neural function evaluations (NFEs) and backpropagation through large pretrained components, leading to substantial computational costs and, in some cases, degraded reconstruction quality. We propose a unified Euclidean-Wasserstein-2 gradient-flow framework that jointly performs posterior sampling and prompt optimization in the latent space through a single flow that aligns the prior and posterior with the observed data. Combined with few-step latent text-to-image models, this formulation enables low-NFE inference without backpropagation through autoencoders. Experiments across several canonical imaging inverse problems show that our method achieves state-of-the-art performance with significantly reduced computational cost.


High-Rank Matrix Completion and Clustering under Self-Expressive Models

Neural Information Processing Systems

We propose efficient algorithms for simultaneous clustering and completion of incomplete high-dimensional data that lie in a union of low-dimensional subspaces. We cast the problem as finding a completion of the data matrix so that each point can be reconstructed as a linear or affine combination of a few data points. Since the problem is NP-hard, we propose a lifting framework and reformulate the problem as a group-sparse recovery of each incomplete data point in a dictionary built using incomplete data, subject to rank-one constraints. To solve the problem efficiently, we propose a rank pursuit algorithm and a convex relaxation. The solution of our algorithms recover missing entries and provides a similarity matrix for clustering. Our algorithms can deal with both low-rank and high-rank matrices, does not suffer from initialization, does not need to know dimensions of subspaces and can work with a small number of data points. By extensive experiments on synthetic data and real problems of video motion segmentation and completion of motion capture data, we show that when the data matrix is low-rank, our algorithm performs on par with or better than low-rank matrix completion methods, while for high-rank data matrices, our method significantly outperforms existing algorithms.


Align Your Prompts: Test-Time Prompting with Distribution Alignment for Zero-Shot Generalization

Neural Information Processing Systems

The promising zero-shot generalization of vision-language models such as CLIP has led to their adoption using prompt learning for numerous downstream tasks. Previous works have shown test-time prompt tuning using entropy minimization to adapt text prompts for unseen domains. While effective, this overlooks the key cause for performance degradation to unseen domains - distribution shift. In this work, we explicitly handle this problem by aligning the out-of-distribution (OOD) test sample statistics to those of the source data using prompt tuning. We use a single test sample to adapt multi-modal prompts at test time by minimizing the feature distribution shift to bridge the gap in the test domain. Evaluating against the domain generalization benchmark, our method improves zero-shot top1 accuracy beyond existing prompt-learning techniques, with a 3.08%improvement over the baseline MaPLe. In cross-dataset generalization with unseen categories across 10 datasets, our method improves consistently across all datasets compared to the existing state-of-the-art.




Unsupervised Image Denoising with Score Function

Neural Information Processing Systems

Though achieving excellent performance in some cases, current unsupervised learning methods for single image denoising usually have constraints in applications. In this paper, we propose a new approach which is more general and applicable to complicated noise models. Utilizing the property of score function, the gradient of logarithmic probability, we define a solving system for denoising. Once the score function of noisy images has been estimated, the denoised result can be obtained through the solving system. Our approach can be applied to multiple noise models, such as the mixture of multiplicative and additive noise combined with structured correlation. Experimental results show that our method is comparable when the noise model is simple, and has good performance in complicated cases where other methods are not applicable or perform poorly.



Probabilistic Attention for Interactive Segmentation

Neural Information Processing Systems

We provide a probabilistic interpretation of attention and show that the standard dotproduct attention in transformers is a special case of Maximum APosteriori (MAP) inference. The proposed approach suggests the use of Expectation Maximization algorithms for online adaptation of key and value model parameters. This approach is useful for cases in which external agents, e.g., annotators, provide inference-time information about the correct values of some tokens, e.g., the semantic category of some pixels, and we need for this new information to propagate to other tokens in a principled manner. We illustrate the approach on an interactive semantic segmentation task in which annotators and models collaborate online to improve annotation efficiency. Using standard benchmarks, we observe that key adaptation boosts model performance ( 10% mIoU) in the low feedback regime and value propagation improves model responsiveness in the high feedback regime.


MBW: Multi-view Bootstrapping in the Wild

Neural Information Processing Systems

Labeling articulated objects in unconstrained settings has a wide variety of applications including entertainment, neuroscience, psychology, ethology, and many fields of medicine. Large offline labeled datasets do not exist for all but the most common articulated object categories (e.g., humans). Hand labeling these landmarks within a video sequence is a laborious task. Learned landmark detectors can help, but can be error-prone when trained from only a few examples. Multi-camera systems that train fine-grained detectors have shown significant promise in detecting such errors, allowing for self-supervised solutions that only need a small percentage of the video sequence to be hand-labeled.