student
Single-Teacher View Augmentation: Boosting Knowledge Distillation via Angular Diversity
Knowledge Distillation (KD) aims to train a lightweight student model by transferring knowledge from a large, high-capacity teacher. Recent studies have shown that leveraging diverse teacher perspectives can significantly improve distillation performance; however, achieving such diversity typically requires multiple teacher networks, leading to high computational costs. In this work, we propose a novel cost-efficient knowledge augmentation method for KD that generates diverse multi-views by attaching multiple branches to a single teacher. To ensure meaningful semantic variation across multi-views, we introduce two angular diversity objectives: 1) constrained inter-angle diversify loss, which maximizes angles between augmented views while preserving proximity to the original teacher output, and 2) intra-angle diversify loss, which encourages an even distribution of views around the original output. The ensembled knowledge from these angularly diverse views, along with the original teacher, is distilled into the student. We further demonstrate theoretically that our objectives increase the diversity among ensemble members and thereby reduce the upper bound of the ensemble's expected loss, leading to more effective distillation. Experimental results show that our method surpasses an existing knowledge augmentation method across diverse configurations. Moreover, the proposed method is compatible with other KD frameworks in a plug-and-play fashion, providing consistent improvements in generalization performance.
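The abstract does not give formulas for the two angular objectives, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes each view and the teacher output are plain vectors, uses mean pairwise cosine similarity as the quantity to minimize, and models "proximity to the original teacher output" as a hinge penalty with a hypothetical threshold `min_cos`. All function names are illustrative.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)  # epsilon guards against zero vectors


def mean_pairwise_cosine(vectors):
    """Average cosine similarity over all pairs; needs at least two vectors."""
    pairs = list(combinations(vectors, 2))
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)


def inter_angle_loss(views, teacher, min_cos=0.9):
    """Assumed form: push views apart while keeping each close to the teacher."""
    # Lower pairwise cosine between views means larger angles between them.
    diversity = mean_pairwise_cosine(views)
    # Hinge penalty (an assumption) once a view drifts below min_cos to the teacher.
    proximity = sum(max(0.0, min_cos - cosine(v, teacher)) for v in views) / len(views)
    return diversity + proximity


def intra_angle_loss(views, teacher):
    """Assumed form: spread the view offsets evenly around the teacher output."""
    offsets = [[a - b for a, b in zip(v, teacher)] for v in views]
    return mean_pairwise_cosine(offsets)
```

Under these assumptions, two views that sit on opposite sides of the teacher output score lower on both terms than two identical views, which matches the stated goal of angular diversity around the original output.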
PLD: A Choice-Theoretic List-Wise Knowledge Distillation
Knowledge distillation is a model compression technique in which a compact "student" network is trained to replicate the predictive behavior of a larger "teacher" network. In logit-based knowledge distillation, it has become the de facto approach to augment cross-entropy with a distillation term. Typically, this term is either a KL divergence that matches marginal probabilities or a correlation-based loss that captures intra- and inter-class relationships. In every case, it acts as an additional term to cross-entropy, with its own weight that must be carefully tuned. In this paper, we adopt a choice-theoretic perspective and recast knowledge distillation under the Plackett-Luce model by interpreting teacher logits as "worth" scores. We introduce Plackett-Luce Distillation (PLD), a weighted list-wise ranking loss. In PLD, the teacher model transfers knowledge of its full ranking of classes, weighting each ranked choice by its own confidence.
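To make the list-wise idea concrete, here is a minimal sketch of a confidence-weighted Plackett-Luce loss. It is an illustration under stated assumptions rather than the paper's exact objective: the ranking is taken to be the classes sorted by teacher logit, and each ranked choice is weighted by the teacher's softmax probability for that class. The function names are hypothetical.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return [x - log_sum for x in logits]


def softmax(logits):
    return [math.exp(v) for v in log_softmax(logits)]


def weighted_plackett_luce_loss(student_logits, teacher_logits):
    """Negative weighted log-likelihood of the teacher's ranking under the student."""
    # Teacher ranking: class indices from highest to lowest teacher logit.
    order = sorted(range(len(teacher_logits)),
                   key=lambda i: teacher_logits[i], reverse=True)
    # Assumed weighting: teacher softmax confidence of the class chosen at each step.
    teacher_probs = softmax(teacher_logits)
    loss = 0.0
    for step, cls in enumerate(order):
        # Plackett-Luce: the class at this rank is chosen among those not yet ranked.
        remaining = [student_logits[i] for i in order[step:]]
        log_choice_prob = log_softmax(remaining)[0]  # position 0 is cls itself
        loss -= teacher_probs[cls] * log_choice_prob
    return loss
```

A student whose logits reproduce the teacher's ordering receives a smaller loss than one that reverses it, and the final step always contributes zero because only one class remains to choose from.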
Carvalho resigns as LAUSD superintendent amid federal investigation
Alberto Carvalho, who resigned Sunday as LAUSD superintendent, addresses students at an elementary school in 2023. Alberto Carvalho resigned Sunday night.
Improving Task-Specific Multimodal Sentiment Analysis with General MLLMs via Prompting
Multimodal Sentiment Analysis (MSA) aims to predict sentiment from diverse data types, such as video, audio, and language. Recent progress in Multimodal Large Language Models (MLLMs) has demonstrated impressive performance across various tasks. However, in MSA, the increase in computational costs does not always correspond to a significant improvement in performance, raising concerns about the cost-effectiveness of applying MLLMs to MSA. This paper introduces the MLLM-Guided Multimodal Sentiment Learning Framework (MMSLF). It improves the performance of task-specific MSA models by leveraging the generalized knowledge of MLLMs through a teacher-student framework, rather than directly using MLLMs for sentiment prediction. First, the proposed teacher, built upon a powerful MLLM (e.g., GPT-4o-mini), guides the student model to align multimodal representations through MLLM-generated context-aware prompts. Then, knowledge distillation enables the student to mimic the teacher's predictions, thus allowing it to predict sentiment independently without relying on the context-aware prompts. Extensive experiments on the SIMS, MOSI, and MOSEI datasets demonstrate that our framework enables task-specific models to achieve state-of-the-art performance across most metrics. This also provides new insights into the application of general MLLMs for improving MSA.
Antidistillation Sampling
Frontier models that generate extended reasoning traces inadvertently produce token sequences that can facilitate model distillation. Recognizing this vulnerability, model owners may seek sampling strategies that limit the effectiveness of distillation without compromising model performance. Antidistillation sampling provides exactly this capability.
Unlocking for Data Analysis Code Generation via Non Parametric Knowledge Distillation
Knowledge distillation from Large Language Models (LLMs) to locally hosted Small Language Models (SLMs) provides advantages for Data Analysis Code Generation (DACG) such as privacy protection. However, achieving effective distillation without resource-intensive training is challenging. This paper investigates whether LLMs can distill knowledge to SLMs through In-Context Learning (ICL), a training-free method for rapid task adaptation. We present the DARGO: Distillation and Adaptive Reasoning-Guided Orchestration framework, which facilitates automatic knowledge distillation from LLMs to SLMs. DARGO consists of three phases: exploration through a Model Orchestration Interface (MOI), Memory Collection of successful trajectories, and Knowledge-driven Inference. We evaluate DARGO on three challenging DACG benchmarks (WIKITQ, TABMWP, and BIRD-SQL), each with in-domain training sets that enable detailed analysis of knowledge distillation effectiveness. DARGO demonstrates a substantial relative performance improvement of 27.5% on average for the student SLMs. To further observe generalization capabilities, we evaluate DARGO across different teacher-student model combinations, knowledge transfer scenarios, and unified memory approaches for more advanced, test-only data analysis tasks. Our findings contribute a novel perspective on distillation methods that enhance performance for SLMs while avoiding intensive fine-tuning.
Reinforcement Learning Teachers of Test Time Scaling
Training reasoning language models (LMs) with reinforcement learning (RL) for one-hot correctness inherently relies on the LM being able to explore and solve its task with some chance at initialization. Furthermore, a key use case of reasoning LMs is to act as teachers for distilling new students and cold-starting future RL iterations rather than being deployed themselves. From these considerations, we introduce a new framework that avoids RL's exploration challenge by training a new class of Reinforcement-Learned Teachers (RLTs) focused on yielding the most effective downstream distillation. RLTs are prompted with both the question and solution to each problem, and tasked to simply "connect-the-dots" with detailed explanations tailored for their students. We train RLTs with dense rewards obtained by feeding each explanation to the student and testing its understanding of the problem's solution. In practice, the raw outputs of a 7B RLT provide higher final performance on competition and graduate-level tasks than existing distillation and cold-starting pipelines that collect and postprocess the reasoning traces of orders of magnitude larger LMs. Furthermore, RLTs maintain their effectiveness when training larger students and when applied zero-shot to out-of-distribution tasks, unlocking new levels of efficiency and re-usability for the RL reasoning framework.
'I was taken from school and trained to fly UFOs with my mind,' claims child genius
A former gifted child has come forward with claims that he was removed from public school and secretly trained to develop psychic abilities for military and UFO-related applications.
Knowledge Starts with Practice: Knowledge-Aware Exercise Generative Recommendation with Adaptive Multi-Agent Cooperation
Adaptive learning, which requires an in-depth understanding of students' learning processes and rational planning of learning resources, plays a crucial role in intelligent education. However, effectively modeling these two processes and seamlessly integrating them poses significant implementation challenges for adaptive learning. As core learning resources, exercises have the potential to diagnose students' knowledge states during the learning process and provide personalized learning recommendations to strengthen students' knowledge, thereby serving as a bridge to boost student-oriented adaptive learning. Therefore, we introduce a novel task called Knowledge-aware Exercise Generative Recommendation (KEGR). It aims to dynamically infer students' knowledge states from their past exercise responses and generate customized new exercises. To achieve KEGR, we propose an adaptive multi-agent cooperation framework, called ExeGen, inspired by the excellent reasoning and generative capabilities of LLM-based AI agents. Specifically, ExeGen coordinates four specialized agents for supervision, knowledge state perception, exercise generation, and quality refinement through an adaptive loop workflow pipeline. More importantly, we devise two enhancement mechanisms in ExeGen: 1) A human-simulated knowledge perception mechanism mimics students' cognitive processes and generates interpretable knowledge state descriptions via demonstration-based In-Context Learning (ICL). In this mechanism, a dual-matching strategy is further designed to retrieve highly relevant demonstrations for reliable ICL reasoning.