Goto

Collaborating Authors

 Industry


ASingle-Loop Gradient Algorithm for Pessimistic Bilevel Optimization via Smooth Approximation

Neural Information Processing Systems

Bilevel optimization has garnered significant attention in the machine learning community recently, particularly regarding the development of efficient numerical methods. While substantial progress has been made in developing efficient algorithms for optimistic bilevel optimization, the study of methods for solving Pessimistic Bilevel Optimization (PBO) remains relatively less explored, especially the design of fully first-order, single-loop gradient-based algorithms. This paper aims to bridge this research gap. We first propose a novel smooth approximation to the PBO problem, using penalization and regularization techniques. Building upon this approximation, we then propose SiPBA (Single-loop Pessimistic Bilevel Algorithm), a new gradient-based method specifically designed for PBO which avoids second-order derivative information or inner-loop iterations for subproblem solving. We provide theoretical validation for the proposed smooth approximation scheme and establish theoretical convergence for the algorithm SiPBA. Numerical experiments on synthetic examples and practical applications demonstrate the effectiveness and efficiency of SiPBA.


Minimal Semantic Sufficiency Meets Unsupervised Domain Generalization

Neural Information Processing Systems

The generalization ability of deep learning has been extensively studied in supervised settings, yet it remains less explored in unsupervised scenarios. Recently, the Unsupervised Domain Generalization (UDG) task has been proposed to enhance the generalization of models trained with prevalent unsupervised learning techniques, such as Self-Supervised Learning (SSL). UDG confronts the challenge of distinguishing semantics from variations without category labels. Although some recent methods have employed domain labels to tackle this issue, such domain labels are often unavailable in real-world contexts. In this paper, we address these limitations by formalizing UDG as the task of learning a Minimal Sufficient Semantic Representation: a representation that (i) preserves all semantic information shared across augmented views (sufficiency), and (ii) maximally removes information irrelevant to semantics (minimality). We theoretically ground these objectives from the perspective of information theory, demonstrating that optimizing representations to achieve sufficiency and minimality directly reduces out-of-distribution risk. Practically, we implement this optimization through Minimal-Sufficient UDG (MSUDG), a learnable model by integrating (a) an InfoNCE-based objective to achieve sufficiency; (b) two complementary components to promote minimality: a novel semantic-variation disentanglement loss and a reconstruction-based mechanism for capturing adequate variation. Empirically, MS-UDG sets a new state-of-the-art on popular unsupervised domain-generalization benchmarks, consistently outperforming existing SSL and UDG methods, without category or domain labels during representation learning.


Probing Knowledge Holes in Unlearned LLMs

Neural Information Processing Systems

Machine unlearning has emerged as a prevalent technical solution for selectively removing unwanted knowledge absorbed during pre-training, without requiring full retraining. While recent unlearning techniques can effectively remove undesirable content without severely compromising performance on standard benchmarks, we find that they may inadvertently create "knowledge holes"--unintended losses of benign knowledge that standard benchmarks fail to capture. To probe where unlearned models reveal knowledge holes, we propose a test case generation framework that explores both immediate neighbors of unlearned content and broader areas of potential failures. Our evaluation demonstrates significant hidden costs of unlearning: up to 98.7% of the test cases yield irrelevant or nonsensical responses from unlearned models, despite being answerable by the pretrained model.


Dit Hub: AModular Framework for Incremental Open-Vocabulary Object Detection

Neural Information Processing Systems

Open-Vocabulary object detectors can generalize to an unrestricted set of categories through simple textual prompting. However, adapting these models to rare classes or reinforcing their abilities on multiple specialized domains remains essential. While recent methods rely on monolithic adaptation strategies with a single set of weights, we embrace modular deep learning. We introduce DitHub, a framework designed to build and maintain a library of efficient adaptation modules. Inspired by Version Control Systems, DitHub manages expert modules as branches that can be fetched and merged as needed. This modular approach allows us to conduct an in-depth exploration of the compositional properties of adaptation modules, marking the first such study in Object Detection. Our method achieves state-of-theart performance on the ODinW-13 benchmark and ODinW-O, a newly introduced benchmark designed to assess class reappearance.


DINO Mitigating Gradient Dilution via Category Queries for Vast Vocabulary Object Detection

Neural Information Processing Systems

With the exponential growth of data, traditional object detection methods are increasingly struggling to handle vast vocabulary object detection tasks effectively. We analyze two key limitations of classification-based detectors: positive gradient dilution, where rare positive categories receive insufficient learning signals, and hard negative gradient dilution, where discriminative gradients are overwhelmed by numerous easy negatives. To address these challenges, we propose CQ-DINO, a category query-based object detection framework that reformulates classification as a contrastive task between object queries and learnable category queries. Our method introduces image-guided query selection, which reduces the negative space by adaptively retrieving the top-K relevant categories per image via cross-attention, thereby rebalancing gradient distributions and facilitating implicit hard example mining. Furthermore, CQ-DINO flexibly integrates explicit hierarchical category relationships in structured datasets (e.g., V3Det) or learns implicit category correlations via self-attention in generic datasets (e.g., COCO). Experiments demonstrate that CQ-DINO achieves superior performance on the challenging V3Det benchmark (surpassing previous methods by 2.1% AP) while maintaining competitiveness on COCO. Our work provides a scalable solution for real-world detection systems requiring wide category coverage.


Glowing 'mother orb' spotted launching swarms of mystery objects near secret US site, new Pentagon files reveal

Daily Mail - Science & tech

Former Olympian seen in handcuffs as Trump threatens'years in jail' and more arrests after vandals SABOTAGE Reflecting Pool with'corrosive and destructive chemicals' Keir Starmer'will announce as early as Monday that he is quitting as Prime Minister' after spending weekend locked in tense talks about his future with his wife Victoria at Chequers Mortifying truth about Clavicular's'botched' nose job: Infertile influencer's'trans' admission to friends... as insider reveals what's said behind closed doors - and twisted secrets that'll leave fans floored Giorgia Meloni rips'senseless' attacks from Trump as Italian Prime Minister refuses to back down amid G7 feud Inside America's new fattest town: Burgers are the size of your head, gyms lie empty and custom mobility scooters carry 800lb loads... as we investigate why Ozempic just DOESN'T work Call me cynical, but the real reason Gruesome Twosome Harry and Meghan are returning to the UK is just so obvious... and highly humiliating: MAUREEN CALLAHAN Stingy fast food giant named America's favorite restaurant AGAIN... and experts think they know why I lost 50lb without jabs using this easy but overlooked method. But I still felt dowdy - until I discovered these expert anti-ageing fashion and beauty tips. Embattled Alexi Lalas makes controversial World Cup declaration amid tension with Fox colleagues: 'Makes you look like a weak poser' No one can see the real reason Jelly Roll divorced Bunnie XO. Angelina Jolie's son Pax, 22, surfaces in LA after bombshell revelation about his relationship to Brad Pitt Scientists propose radical new theory of consciousness - and claim it doesn't depend on flesh and blood Candace Owens hits out at nasty rumors claiming she was DEAD... as fellow MAGA influencer claims her account was hacked Blake Lively runs errands in frumpy outfit after reconciling with ex-BFF Taylor Swift... miles away from reported'bachelorette party' Grace Kelly's lookalike granddaughter, 27, wows in bikini snaps...as she packs on the PDA during beach getaway TV star mom, 46, who appeared on'quitting everything to change your life' show died in fire at luxury Caribbean beach resort that sent 1,700 tourists running for their lives The four mistakes that led to bungee tragedy on Skeleton Bridge: FRED KELLY saw the scene for himself, now he retraces the prelude to disaster. So was it really an accident?


Dr. RAW: Towards General High-Level Vision from RAW with Efficient Task Conditioning

Neural Information Processing Systems

We introduce Dr. RAW, a unified and tuning-efficient framework for high-level computer vision tasks directly operating on camera RAW data. Unlike previous approaches that optimize image signal processing (ISP) pipelines and fully finetune networks for each task, Dr. RAW achieves state-of-the-art performance with minimal parameter updates and frozen backbone weights. At the input stage, we apply lightweight pre-processing steps, including sensor and illumination mapping, along with re-mosaicing, to mitigate data inconsistencies stemming from sensor variations and lighting conditions. At the network level, we introduce task-specific adaptation through two modules: Sensor Prior Prompts (SPP) and task-specific Low-Rank Adaptation (LoRA). SPP injects sensor-aware conditioning into the network via learnable prompts derived from RAW pixel distribution priors, while LoRA enables efficient task-specific tuning by updating only low-rank matrices in key backbone layers. Despite minimal tuning, Dr. RAW delivers superior results across four RAW-based tasks (object detection, semantic segmentation, instance segmentation, and pose estimation) on nine datasets encompassing various light conditions.


Federated Continual Learning via Orchestrating Multi-Scale Expertise

Neural Information Processing Systems

Federated continual learning (FCL) aims to maintain the model's performance on old tasks (i.e., stability) while enhancing its ability to acquire knowledge from current tasks (i.e., plasticity). With the development of pre-trained models (PTMs), fine-tuning PTMs on clients has become a promising approach to leveraging their extensive knowledge in FCL. In this paper, we propose MultiFCL, a novel FCL framework that fine-tunes PTMs to adapt to FCL while preserving their strong generalization capabilities. Specifically, to ensure the stability, MultiFCL introduces lightweight adapters for task adaption, which are subsequently frozen to prevent catastrophic forgetting. Moreover, by utilizing the semantic features of old tasks, MultiFCL performs multi-modal initialization of new task class prototypes. To enhance the plasticity, MultiFCL employs a multi-expert training mechanism that integrates multi-scale feature learning with multi-teacher dynamic self-distillation.


VisualSync: Multi-Camera Synchronization via Cross-View Object Motion

Neural Information Processing Systems

Today, people can easily record memorable moments, ranging from concerts, sports events, lectures, family gatherings, and birthday parties with multiple consumer cameras. However, synchronizing these cross-camera streams remains challenging. Existing methods assume controlled settings, specific targets, manual correction, or costly hardware. We present VisualSync, an optimization framework based on multi-view dynamics that aligns unposed, unsynchronized videos at millisecond accuracy. Our key insight is that any moving 3D point, when co-visible in two cameras, obeys epipolar constraints once properly synchronized.


Topology-Aware Learning of Tubular Manifolds via SE(3)-Equivariant Network on Ball B-Spline Curve

Neural Information Processing Systems

Tubular-like system shape analysis is quite difficult in geometry and topology, while it is widely used in plants and organs analysis in practice. However, traditional discrete representations such as voxels and point clouds often require substantial storage and may lead to the loss of fine-grained geometric and topological details. To address these challenges, we propose SE(3)-BBSCformerGCN, a novel framework for learning shape-aware representations from continuous tubular topological manifolds with equivariance to rotations and translations. Our approach leverages Ball B-Spline Curve (BBSC) to define tubular manifolds and its functional space. We provide a formal mathematical definition and analysis of the resulting manifolds and the BBSC functional space, and incorporate an equivariant mapping that preserves geometric and topological stability. Compared to the point cloud and voxel based representations, our manifold-based formulation significantly reduces data complexity while preserving geometric attributes together with topological features.