Goto

Collaborating Authors

 China


UniGist: Towards General and Hardware-aligned Sequence-level Long Context Compression

Neural Information Processing Systems

Large language models are increasingly capable of handling long-context inputs, but the memory overhead of key-value (KV) cache remains a major bottleneck for general-purpose deployment. While various compression strategies have been explored, sequence-level compression, which drops the full KV caches for certain tokens, is particularly challenging as it can lead to the loss of important contextual information. To address this, we introduce UniGist, a sequence-level long-context compression framework that efficiently preserves context information by replacing raw tokens with special compression tokens (gists) in a fine-grained manner. We adopt a chunk-free training strategy and design an efficient kernel with a gist shift trick, enabling optimized GPU training. Our scheme also supports flexible inference by allowing the actual removal of compressed tokens, resulting in realtime memory savings. Experiments across multiple long-context tasks demonstrate that UniGist significantly improves compression quality, with especially strong performance in detail-recalling tasks and long-range dependency modeling.


AdaMSS: Adaptive Multi-Subspace Approach for Parameter-Efficient Fine-Tuning

Neural Information Processing Systems

In this paper, we propose AdaMSS, an adaptive multi-subspace approach for parameter-efficient fine-tuning of large models. Unlike traditional parameterefficient fine-tuning methods that operate within a large single subspace of the network weights, AdaMSS leverages subspace segmentation to obtain multiple smaller subspaces and adaptively reduces the number of trainable parameters during training, ultimately updating only those associated with a small subset of subspaces most relevant to the target downstream task. By using the lowest-rank representation, AdaMSS achieves more compact expressiveness and finer tuning of the model parameters. Theoretical analyses demonstrate that AdaMSS has better generalization guarantee than LoRA, PiSSA, and other single-subspace low-rankbased methods. Extensive experiments across image classification, natural language understanding, and natural language generation tasks show that AdaMSS achieves comparable performance to full fine-tuning and outperforms other parameterefficient fine-tuning methods in most cases, all while requiring fewer trainable parameters. Notably, on the ViT-Large model, AdaMSS achieves 4.7% higher average accuracy than LoRA across seven tasks, using just 15.4% of the trainable parameters. On RoBERTa-Large, AdaMSS outperforms PiSSA by 7% in average accuracy across six tasks while reducing the number of trainable parameters by approximately 94.4%. These results demonstrate the effectiveness of AdaMSS in parameter-efficient fine-tuning. The code for AdaMSS is available at https: //github.com/jzheng20/AdaMSS.


SteerConf: Steering LLMs for Confidence Elicitation

Neural Information Processing Systems

Large Language Models (LLMs) exhibit impressive performance across diverse domains but often suffer from overconfidence, limiting their reliability in critical applications. We propose SteerConf, a novel framework that systematically steers LLMs' confidence scores to improve their calibration and reliability. SteerConf introduces three key components: (1) a steering prompt strategy that guides LLMs to produce confidence scores in specified directions (e.g., conservative or optimistic) by leveraging prompts with varying steering levels; (2) a steered confidence consistency measure that quantifies alignment across multiple steered confidences to enhance calibration; and (3) a steered confidence calibration method that aggregates confidence scores using consistency measures and applies linear quantization for answer selection. SteerConf operates without additional training or fine-tuning, making it broadly applicable to existing LLMs. Experiments on seven benchmarks spanning professional knowledge, common sense, ethics, and reasoning tasks, using advanced LLM models (GPT-3.5, LLaMA 3, GPT-4), demonstrate that SteerConf significantly outperforms existing methods, often by a significant margin. Our findings highlight the potential of steering the confidence of LLMs to enhance their reliability for safer deployment in real-world applications.


ReinAD: Towards Real-world Industrial Anomaly Detection with a Comprehensive Contrastive Dataset

Neural Information Processing Systems

Recent years have witnessed significant advancements in industrial anomaly detection (IAD) thanks to existing anomaly detection datasets. However, the large performance gap between these benchmarks and real industrial practice reveals critical limitations in existing datasets. We argue that the mismatch between current datasets and real industrial scenarios becomes the primary barrier to practical IAD deployment. To this end, we propose ReinAD dataset, a comprehensive contrastive dataset towards Real-world industrial Anomaly Detection. Our dataset prioritizes three critical real-world requirements: 1) Contrast-based anomaly definition that is essential for industrial practice, 2) Fine-grained unaligned image pairs reflecting real inspections, and 3) Large-scale data from active production lines spanning multiple industrial categories. Based on our dataset, we introduce the ReinADNet. It takes both normal reference and test images as inputs, achieving anomaly detection through normal-anomaly comparison. To address the fine-grained and unaligned properties of real industrial scenes, our method integrates pyramidal similarity aggregation for comprehensive anomaly characterization and globallocal feature fusion for spatial misalignment tolerance. Our method outperforms all baselines on the ReinAD dataset (e.g., 64.5% v.s.


Last-Iterate Convergence of Smooth Regret Matching + Variants in Learning Nash Equilibria

Neural Information Processing Systems

Regret Matching+ (RM+) variants are widely used to build superhuman Poker AIs, yet few studies investigate their last-iterate convergence in learning a Nash equilibrium (NE). Although their last-iterate convergence is established for games satisfying the Minty Variational Inequality (MVI), no studies have demonstrated that these algorithms achieve such convergence in the broader class of games satisfying the weak MVI. A key challenge in proving last-iterate convergence for RM+ variants in games satisfying the weak MVI is that even if the game's loss gradient satisfies the weak MVI, RM+ variants operate on a transformed loss feedback which does not satisfy the weak MVI. To provide last-iterate convergence for RM+ variants, we introduce a concise yet novel proof paradigm that involves: (i) transforming an RM+ variant into an Online Mirror Descent (OMD) instance that updates within the original strategy space of the game to recover the weak MVI, and (ii) showing last-iterate convergence by proving the distance between accumulated regrets converges to zero via the recovered weak MVI of the feedback. Inspired by our proof paradigm, we propose Smooth Optimistic Gradient Based RM+ (SOGRM+) and show that it achieves last-iterate and finite-time best-iterate convergence in learning an NE of games satisfying the weak MVI, the weakest condition among all known RM+ variants. Experiments show that SOGRM+ significantly outperforms other algorithms. Our code is available at https://github.



UniZyme: AUnified Protein Cleavage Site Predictor Enhanced with Enzyme Active-Site Knowledge

Neural Information Processing Systems

Enzyme-catalyzed protein cleavage is essential for many biological functions. Accurate prediction of cleavage sites can facilitate various applications such as drug development, enzyme design, and a deeper understanding of biological mechanisms. However, most existing models are restricted to an individual enzyme, which neglects shared knowledge of enzymes and fails to generalize to novel enzymes. Thus, we introduce a unified protein cleavage site predictor named UniZyme, which can generalize across diverse enzymes. To enhance the enzyme encoding for the protein cleavage site prediction, UniZyme employs a novel biochemically-informed model architecture along with active-site knowledge of proteolytic enzymes. Extensive experiments demonstrate that UniZyme achieves high accuracy in predicting cleavage sites across a range of proteolytic enzymes, including unseen enzymes. The code is available in https://github.com/Ao-LiChen/UniZyme.


Learning from Disjoint Views: AContrastive Prototype Matching Network for Fully Incomplete Multi-View Clustering

Neural Information Processing Systems

Multi-view clustering aims to enhance clustering performance by leveraging information from diverse sources. However, its practical application is often hindered by a barrier: the lack of correspondences across views. This paper focuses on the understudied problem of fully incomplete multi-view clustering (FIMC), a scenario where existing methods fail due to their reliance on partial alignment. To address this problem, we introduce the Contrastive Prototype Matching Network (CPMN), a novel framework that establishes a new paradigm for cross-view alignment based on matching high-level categorical structures. Instead of aligning individual instances, CPMN performs a more robust cluster prototype alignment. CPMN first employs a correspondence-free graph contrastive learning approach, leveraging mutual k-nearest neighbors (MNN) to uncover intrinsic data structures and establish initial prototypes from entirely unpaired views. Building on the prototypes, we introduce a cross-view prototype graph matching stage to resolve category misalignment and forge a unified clustering structure. Finally, guided by this alignment, we devise a prototype-aware contrastive learning mechanism to promote semantic consistency, replacing the reliance on the initial MNN-based structural similarity. Extensive experiments on benchmark datasets demonstrate that our method significantly outperforms various baselines and ablation variants, validating its effectiveness.


Stability and Sharper Risk Bounds with Convergence Rate O(1/n2)

Neural Information Processing Systems

Prior work (Klochkov & Zhivotovskiy, 2021) establishes at most O(log(n)/n) excess risk bounds via algorithmic stability for strongly-convex learners with high probability. We show that under the similar common assumptions -- PolyakLojasiewicz condition, smoothness, and Lipschitz continous for losses -- rates of O log2(n)/n2 are at most achievable. To our knowledge, our analysis also provides the tightest high-probability bounds for gradient-based generalization gaps in nonconvex settings.


MoBA: Mixture of Block Attention for Long-Context LLMs

Neural Information Processing Systems

Scaling the effective context length is essential for advancing large language models (LLMs) toward artificial general intelligence (AGI). However, the quadratic increase in computational complexity inherent in traditional attention mechanisms presents a prohibitive overhead. Existing approaches either impose strongly biased structures, such as sink or window attention which are task-specific, or radically modify the attention mechanism into linear approximations, whose performance in complex reasoning tasks remains inadequately explored. In this work, we propose a solution that adheres to the "less structure" principle, allowing the model to determine where to attend autonomously, rather than introducing predefined biases. We introduce Mixture of Block Attention (MoBA), an innovative approach that applies the principles of Mixture of Experts (MoE) to the attention mechanism. This novel architecture demonstrates superior performance on long-context tasks while offering a key advantage: the ability to seamlessly transition between full and sparse attention, enhancing efficiency without the risk of compromising performance. MoBA has already been deployed to handle actual production workloads with long-context requirements, demonstrating significant advancements in efficient attention computation for LLMs. Our code is available at https://github.com/MoonshotAI/MoBA.