AITopics | proxy model

Antidistillation Sampling

Neural Information Processing SystemsJun-21-2026, 14:09:33 GMT

Frontier models that generate extended reasoning traces inadvertently produce token sequences that can facilitate model distillation. Recognizing this vulnerability, model owners may seek sampling strategies that limit the effectiveness of distillation without compromising model performance. Antidistillation sampling provides exactly this capability.

large language model, machine learning, natural language, (19 more...)

Neural Information Processing Systems

Genre:

Research Report > Experimental Study (1.00)
Overview (0.67)

Industry:

Education (1.00)
Information Technology > Security & Privacy (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.94)

Add feedback

FedRAM: Federated Reweighting and Aggregation for Multi-Task Learning

Neural Information Processing SystemsJun-19-2026, 22:03:36 GMT

Federated Multi-Task Learning (FL-MTL) enables clients with heterogeneous data to collaboratively train models capable of handling multiple downstream tasks. However, FL-MTL faces key challenges, including statistical heterogeneity, task interference, and the need to balance local learning with global knowledge sharing. Traditional methods like FedAvg struggle in such settings due to the lack of explicit mechanisms to address these issues. In this paper, we propose FedRAM, a threestep framework that progressively updates two scalar hyperparameters: the task importance weight and the client aggregation coefficient. FedRAM introduces a reference-proxy-agent strategy, where the proxy model serves as an intermediate between the local reference model and the global agent model. This design reduces the need for repeated local training while preserving local performance. Extensive experiments on six real-world FL-MTL benchmarks show that FedRAM improves performance by at least 3% over the most baseline on both in-domain and outof-domain tasks, while reducing computational cost by 15 . These results make FedRAM a robust and practical solution for large-scale FL-MTL applications. The code is available at https://github.com/wwffvv/FedRAM.

artificial intelligence, fedram, machine learning, (18 more...)

Neural Information Processing Systems

Genre:

Research Report > Experimental Study (1.00)
Overview (0.67)

Industry: Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.48)

Add feedback

Multi-order Orchestrated Curriculum Distillation for Model-Heterogeneous Federated Graph Learning

Neural Information Processing SystemsJun-17-2026, 23:20:29 GMT

Federated Graph Learning (FGL) has been shown to be particularly effective in enabling collaborative training of Graph Neural Networks (GNNs) in decentralized settings. Model-heterogeneous FGL further enhances practical applicability by accommodating client preferences for diverse model architectures. However, existing model-heterogeneous approaches primarily target Euclidean data and fail to account for a crucial aspect of graph-structured data: topological relationships. To address this limitation, we propose TRUST, a novel knowledge distillation-based modelheterogeneous FGL framework. Specifically, we propose Progressive Curriculum Node Scheduler to progressively introduce challenging nodes based on learning difficulty. In Adaptive Curriculum Distillation Modulator, we propose an adaptive temperature modulator that dynamically adjusts knowledge distillation temperature to accommodate varying client capabilities and graph complexity. Moreover, we leverage Wasserstein-Driven Affinity Distillation to enable models to capture crossclass structural relationships through optimal transport. Extensive experiments on multiple graph benchmarks and model-heterogeneous settings show that TRUST outperforms existing methods, achieving an average 3.6% performance gain, particularly under moderate heterogeneity conditions.

artificial intelligence, machine learning, survey article, (19 more...)

Neural Information Processing Systems

Country:

Asia > China (0.28)
North America > United States (0.28)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)
Overview (0.93)

Industry:

Education (0.93)
Information Technology (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Transforming Gaps into Gains: Bridging Model and Data Heterogeneity in Federated Learning via Knowledge Weak-Aware Zones

Neural Information Processing SystemsJun-17-2026, 12:02:13 GMT

Heterogeneous federated learning enables collaborative training across clients under dual heterogeneity of models and data, posing challenges for effective knowledge transfer. Federated mutual learning employs proxy models to bridge cross-model knowledge exchange; however, existing methods remain limited to direct alignment between the outputs of private and proxy models, ignoring the deep discrepancies in representation and decision spaces between them. Such cognitive biases cause knowledge to be transferred only at shallow levels and trigger performance bottlenecks. To address this, this paper proposes FedKWAZ to identify and exploit Knowledge Weak-Aware Zones (KWAZ)--spatial zones of deep knowledge misalignment between private and proxy models, further refined into Semantic Weak-Aware Zones and Decision Weak-Aware Zones, which characterize cognitive misalignments in representation and decision spaces as focal targets for enhanced bidirectional distillation. FedKWAZ designs a Hierarchical Adaptive Patch Mixing (HAPM) mechanism to generate multiple mixed samples and employs a Knowledge Discrepancy Perceptron (KDP) to select the samples exhibiting the largest representation and decision discrepancies, thereby mining critical KWAZ. These modules are integrated into a two-stage mutual learning framework, achieving global class-level representation-decision consistency alignment and local KWAZguided refinement, structurally bridging cognitive biases across heterogeneous mutual learning models. Experimental results on multiple datasets and model configurations demonstrate the superior performance of FedKWAZ.

artificial intelligence, deep learning, machine learning, (18 more...)

Neural Information Processing Systems

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Education (1.00)
Health & Medicine (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Nemotron-CLIMB: CLustering-based Iterative Data Mixture Bootstrapping for Language Model Pre-training

Neural Information Processing SystemsJun-15-2026, 17:02:43 GMT

Pre-training datasets are typically collected from web content and lack inherent domain divisions. For instance, widely used datasets like Common Crawl do not include explicit domain labels, while manually curating labeled datasets such as The Pile is labor-intensive. Consequently, identifying an optimal pre-training data mixture remains a challenging problem, despite its significant benefits for pre-training performance. To address these challenges, we propose CLusteringbased Iterative Data Mixture Bootstrapping (Nemotron-CLIMB), an automated framework that discovers, evaluates, and refines data mixtures in a pre-training setting. Specifically, Nemotron-CLIMB embeds and clusters large-scale datasets in a semantic space and then iteratively searches for optimal mixtures using a smaller proxy model and a predictor. This strategy enables effective domain adaptation without relying solely on curated data. When continuously trained on 400B tokens with this mixture, our 1B model exceeds the state-of-the-art Llama-3.2-1B by 2.0%. Moreover, we observe that optimizing for a specific domain (e.g., Social Sciences) yields a 5% improvement over random sampling.

large language model, machine learning, natural language, (16 more...)

Neural Information Processing Systems

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.93)

Industry:

Government (0.67)
Health & Medicine > Public Health (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.84)

Add feedback

d601a9b708cacfad167f6c6c45647a18-Paper-Conference.pdf

Neural Information Processing SystemsApr-29-2026, 22:06:12 GMT

artificial intelligence, generator, machine learning, (19 more...)

Neural Information Processing Systems

Genre: Research Report > Promising Solution (0.46)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

24b43fb034a10d78bec71274033b4096-Supplemental.pdf

Neural Information Processing SystemsApr-25-2026, 03:47:47 GMT

artificial intelligence, machine learning, optimization problem, (17 more...)

Neural Information Processing Systems

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Education (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Add feedback

RoMA: Robust Model Adaptation for Offline Model-based Optimization

Neural Information Processing SystemsApr-25-2026, 03:47:43 GMT

We consider the problem of searching an input maximizing a black-box objective function given a static dataset of input-output queries. A popular approach tosolving this problem is maintaining a proxy model, e.g., a deep neural network (DNN), that approximates the true objective function. Here, the main challenge is how to avoid adversarially optimized inputs during the search, i.e., the inputs where the DNN highly overestimates the true objective function. To handle the issue, we propose a new framework, coined robust model adaptation (RoMA), based on gradient-based optimization of inputs over the DNN. Specifically, it consists of two steps: (a) a pre-training strategy to robustly train the proxy model and (b) a novel adaptation procedure of the proxy model to have robust estimates for aspecific set of candidate solutions. At a high level, our scheme utilizes the local smoothness prior to overcome the brittleness of the DNN. Experiments under various tasks show the effectiveness of RoMA compared with previous methods, obtaining state-of-the-art results, e.g., RoMA outperforms all at 4 out of 6 tasks and achieves runner-up results at the remaining tasks.

artificial intelligence, deep learning, machine learning, (16 more...)

Neural Information Processing Systems

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Add feedback