Goto

Collaborating Authors

 slm


Unlocking for Data Analysis Code Generation via Non Parametric Knowledge Distillation

Neural Information Processing Systems

Knowledge distillation from Large Language Models (LLMs) to locally hosted Small Language Models (SLMs) provides advantages for Data Analysis Code Generation (DACG) such as privacy protection. However, achieving effective distillation without resource-intensive training is challenging. This paper investigates whether LLMs can distill knowledge to SLMs through In-Context Learning (ICL), a training-free method for rapid task adaptation. We present the DARGO: Distillation and Adaptive Reasoning-Guided Orchestration framework, which facilitates automatic knowledge distillation from LLMs to SLMs. DARGO consists of three phases: exploration through an Model Orchestration Interface (MOI), Memory Collection of successful trajectories, and Knoweldge-driven Inference. We evaluate DARGO on three challenging DACG benchmarks (WIKITQ, TABMWP, and BIRD-SQL), each with in-domain training sets that enable detailed analysis of knowledge distillation effectiveness. DARGO demonstrates a substantial relative performance improvement of 27.5% on average for the student SLMs. To further observe generalization capabilities, we evaluate the DARGO across different teacher-student model combinations, knowledge transfer scenarios, and unified memory approaches for more advanced, test-only data analysis tasks. Our findings contribute a novel perspective on distillation methods that enhance performance for SLMs while avoiding intensive fine-tuning.


SmallKV: Small Model Assisted Compensation of KV Cache Compression for Efficient LLMInference

Neural Information Processing Systems

KV cache eviction has emerged as an effective solution to alleviate resource constraints faced by LLMs in long-context scenarios. However, existing token-level eviction methods often overlook two critical aspects: (1) their irreversible eviction strategy fails to adapt to dynamic attention patterns during decoding (the saliency shift problem), and (2) they treat both marginally important tokens and truly unimportant tokens equally, despite the collective significance of marginal tokens to model performance (the marginal information over-compression problem). To address these issues, we design two compensation mechanisms based on the high similarity of attention matrices between LLMs of different scales. We propose SmallKV, a small model assisted compensation method for KV cache compression. SmallKV can maintain attention matching between different-scale LLMs to: 1) assist the larger model in perceiving globally important information of attention; and 2) use the smaller model's attention scores to approximate those of marginal tokens in the larger model. Extensive experiments on benchmarks including GSM8K, BBH, MT-Bench, and LongBench demonstrate the effectiveness of SmallKV. Moreover, efficiency evaluations show that SmallKV achieves 1.75 2.56 times higher throughput than baseline methods, highlighting its potential for efficient and performant LLM inference in resource constrained environments.


ShortListing Model: AStreamlined Simplex Diffusion for Discrete Variable Generation

Neural Information Processing Systems

Generative modeling of discrete variables is challenging yet crucial for applications in natural language processing and biological sequence design. We introduce the Shortlisting Model (SLM), a novel simplex-based diffusion model inspired by progressive candidate pruning. SLM operates on simplex centroids, reducing generation complexity and enhancing scalability. Additionally, SLM incorporates a flexible implementation of classifier-free guidance, enhancing unconditional generation performance. Extensive experiments on DNA promoter and enhancer design, protein design, character-level and large-vocabulary language modeling demonstrate the competitive performance and strong potential of SLM.


Unlocking SLM Potential for Data Analysis Code Generation via Non-Parametric Knowledge Distillation

Neural Information Processing Systems

Knowledge distillation from Large Language Models (LLMs) to locally hosted Small Language Models (SLMs) provides advantages for Data Analysis Code Generation (DACG) such as privacy protection. However, achieving effective distillation without resource-intensive training is challenging. This paper investigates whether LLMs can distill knowledge to SLMs through In-Context Learning (ICL), a training-free method for rapid task adaptation. We present the DarGO: Distillation and Adaptive Reasoning-Guided Orchestration framework, which facilitates automatic knowledge distillation from LLMs to SLMs. DarGO consists of three phases: exploration through an Model Orchestration Interface (MOI), Memory Collection of successful trajectories, and Knoweldge-driven Inference. We evaluate DarGO on three challenging DACG benchmarks (WikiTQ, TabMWP, and Bird-SQL), each with in-domain training sets that enable detailed analysis of knowledge distillation effectiveness. DarGO demonstrates a substantial relative performance improvement of 27.5\% on average for the student SLMs. To further observe generalization capabilities, we evaluate the \method across different teacher-student model combinations, knowledge transfer scenarios, and unified memory approaches for more advanced, test-only data analysis tasks. Our findings contribute a novel perspective on distillation methods that enhance high performance for SLMs while avoiding intensive fine-tuning.


ShortListing Model: A Streamlined Simplex Diffusion for Discrete Variable Generation

Neural Information Processing Systems

Generative modeling of discrete variables is challenging yet crucial for applications in natural language processing and biological sequence design. We introduce the Shortlisting Model (SLM), a novel simplex-based diffusion model inspired by progressive candidate pruning. SLM operates on simplex centroids, reducing generation complexity and enhancing scalability. Additionally, SLM incorporates a flexible implementation of classifier-free guidance, enhancing unconditional generation performance. Extensive experiments on DNA promoter and enhancer design, protein design, character-level and large-vocabulary language modeling demonstrate the competitive performance and strong potential of SLM.




SLM: A Smoothed First-Order Lagrangian Method for Structured Constrained Nonconvex Optimization

Neural Information Processing Systems

Functional constrained optimization (FCO) has emerged as a powerful tool for solving various machine learning problems. However, with the rapid increase in applications of neural networks in recent years, it has become apparent that both the objective and constraints often involve nonconvex functions, which poses significant challenges in obtaining high-quality solutions. In this work, we focus on a class of nonconvex FCO problems with nonconvex constraints, where the two optimization variables are nonlinearly coupled in the inequality constraint. Leveraging the primal-dual optimization framework, we propose a smoothed first-order Lagrangian method (SLM) for solving this class of problems. We establish the theoretical convergence guarantees of SLM to the Karush-Kuhn-Tucker (KKT) solutions through quantifying dual error bounds. By establishing connections between this structured FCO and equilibrium-constrained nonconvex problems (also known as bilevel optimization), we apply the proposed SLM to tackle bilevel optimization oriented problems where the lower-level problem is nonconvex. Numerical results obtained from both toy examples and hyper-data cleaning problems demonstrate the superiority of SLM compared to benchmark methods.