Not enough data to create a plot.
Try a different view from the menu above.
RhinoInsight: Improving Deep Research through Control Mechanisms for Model Behavior and Context
Lei, Yu, Si, Shuzheng, Wang, Wei, Wu, Yifei, Chen, Gang, Qi, Fanchao, Sun, Maosong
Large language models are evolving from single-turn responders into tool-using agents capable of sustained reasoning and decision-making for deep research. Prevailing systems adopt a linear pipeline of plan to search to write to a report, which suffers from error accumulation and context rot due to the lack of explicit control over both model behavior and context. We introduce RhinoInsight, a deep research framework that adds two control mechanisms to enhance robustness, traceability, and overall quality without parameter updates. First, a Verifiable Checklist module transforms user requirements into traceable and verifiable sub-goals, incorporates human or LLM critics for refinement, and compiles a hierarchical outline to anchor subsequent actions and prevent non-executable planning. Second, an Evidence Audit module structures search content, iteratively updates the outline, and prunes noisy context, while a critic ranks and binds high-quality evidence to drafted content to ensure verifiability and reduce hallucinations. Our experiments demonstrate that RhinoInsight achieves state-of-the-art performance on deep research tasks while remaining competitive on deep search tasks.
MergeVLA: Cross-Skill Model Merging Toward a Generalist Vision-Language-Action Agent
Fu, Yuxia, Zhang, Zhizhen, Zhang, Yuqi, Wang, Zijian, Huang, Zi, Luo, Yadan
Recent Vision-Language-Action (VLA) models reformulate vision-language models by tuning them with millions of robotic demonstrations. While they perform well when fine-tuned for a single embodiment or task family, extending them to multi-skill settings remains challenging: directly merging VLA experts trained on different tasks results in near-zero success rates. This raises a fundamental question: what prevents VLAs from mastering multiple skills within one model? With an empirical decomposition of learnable parameters during VLA fine-tuning, we identify two key sources of non-mergeability: (1) Finetuning drives LoRA adapters in the VLM backbone toward divergent, task-specific directions beyond the capacity of existing merging methods to unify. (2) Action experts develop inter-block dependencies through self-attention feedback, causing task information to spread across layers and preventing modular recombination. To address these challenges, we present MergeVLA, a merging-oriented VLA architecture that preserves mergeability by design. MergeVLA introduces sparsely activated LoRA adapters via task masks to retain consistent parameters and reduce irreconcilable conflicts in the VLM. Its action expert replaces self-attention with cross-attention-only blocks to keep specialization localized and composable. When the task is unknown, it uses a test-time task router to adaptively select the appropriate task mask and expert head from the initial observation, enabling unsupervised task inference. Across LIBERO, LIBERO-Plus, RoboTwin, and multi-task experiments on the real SO101 robotic arm, MergeVLA achieves performance comparable to or even exceeding individually finetuned experts, demonstrating robust generalization across tasks, embodiments, and environments.
- Europe > Austria > Vienna (0.14)
- North America > United States > Tennessee > Davidson County > Nashville (0.04)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- (11 more...)
Nemotron-Flash: Towards Latency-Optimal Hybrid Small Language Models
Fu, Yonggan, Dong, Xin, Diao, Shizhe, Van keirsbilck, Matthijs, Ye, Hanrong, Byeon, Wonmin, Karnati, Yashaswi, Liebenwein, Lucas, Zhang, Hannah, Binder, Nikolaus, Khadkevich, Maksim, Keller, Alexander, Kautz, Jan, Lin, Yingyan Celine, Molchanov, Pavlo
Efficient deployment of small language models (SLMs) is essential for numerous real-world applications with stringent latency constraints. While previous work on SLM design has primarily focused on reducing the number of parameters to achieve parameter-optimal SLMs, parameter efficiency does not necessarily translate into proportional real-device speed-ups. This work aims to identify the key determinants of SLMs' real-device latency and offer generalizable principles and methodologies for SLM design and training when real-device latency is the primary consideration. Specifically, we identify two central architectural factors: depth-width ratios and operator choices. The former is crucial for small-batch-size latency, while the latter affects both latency and large-batch-size throughput. In light of this, we first study latency-optimal depth-width ratios, with the key finding that although deep-thin models generally achieve better accuracy under the same parameter budget, they may not lie on the accuracy-latency trade-off frontier. Next, we explore emerging efficient attention alternatives to evaluate their potential as candidate building operators. Using the identified promising operators, we construct an evolutionary search framework to automatically discover latency-optimal combinations of these operators within hybrid SLMs, thereby advancing the accuracy-latency frontier. In addition to architectural improvements, we further enhance SLM training using a weight normalization technique that enables more effective weight updates and improves final convergence. Combining these methods, we introduce a new family of hybrid SLMs, called Nemotron-Flash, which significantly advances the accuracy-efficiency frontier of state-of-the-art SLMs, e.g., achieving over +5.5% average accuracy, 1.3x/1.9x lower latency, and 18.7x/45.6x higher throughput compared to Qwen3-1.7B/0.6B, respectively.
- North America > United States (0.14)
- Asia > Middle East > Jordan (0.04)
MIST: Mutual Information Via Supervised Training
Gritsai, German, Richards, Megan, Méloux, Maxime, Cho, Kyunghyun, Peyrard, Maxime
We propose a fully data-driven approach to designing mutual information (MI) estimators. Since any MI estimator is a function of the observed sample from two random variables, we parameterize this function with a neural network (MIST) and train it end-to-end to predict MI values. Training is performed on a large meta-dataset of 625,000 synthetic joint distributions with known ground-truth MI. To handle variable sample sizes and dimensions, we employ a two-dimensional attention scheme ensuring permutation invariance across input samples. To quantify uncertainty, we optimize a quantile regression loss, enabling the estimator to approximate the sampling distribution of MI rather than return a single point estimate. This research program departs from prior work by taking a fully empirical route, trading universal theoretical guarantees for flexibility and efficiency. Empirically, the learned estimators largely outperform classical baselines across sample sizes and dimensions, including on joint distributions unseen during training. The resulting quantile-based intervals are well-calibrated and more reliable than bootstrap-based confidence intervals, while inference is orders of magnitude faster than existing neural baselines. Beyond immediate empirical gains, this framework yields trainable, fully differentiable estimators that can be embedded into larger learning pipelines. Moreover, exploiting MI's invariance to invertible transformations, meta-datasets can be adapted to arbitrary data modalities via normalizing flows, enabling flexible training for diverse target meta-distributions.
- North America > United States (0.28)
- Europe > France > Auvergne-Rhône-Alpes > Isère > Grenoble (0.04)
- Asia > Middle East > Jordan (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Large Language Models Require Curated Context for Reliable Political Fact-Checking -- Even with Reasoning and Web Search
DeVerna, Matthew R., Yang, Kai-Cheng, Yan, Harry Yaojun, Menczer, Filippo
Large language models (LLMs) have raised hopes for automated end-to-end fact-checking, but prior studies report mixed results. As mainstream chatbots increasingly ship with reasoning capabilities and web search tools -- and millions of users already rely on them for verification -- rigorous evaluation is urgent. We evaluate 15 recent LLMs from OpenAI, Google, Meta, and DeepSeek on more than 6,000 claims fact-checked by PolitiFact, comparing standard models with reasoning- and web-search variants. Standard models perform poorly, reasoning offers minimal benefits, and web search provides only moderate gains, despite fact-checks being available on the web. In contrast, a curated RAG system using PolitiFact summaries improved macro F1 by 233% on average across model variants. These findings suggest that giving models access to curated high-quality context is a promising path for automated fact-checking.
- North America > United States > Texas (0.04)
- North America > United States > New York > Broome County > Binghamton (0.04)
- North America > United States > Indiana (0.04)
- (3 more...)
- Research Report > New Finding (0.66)
- Research Report > Experimental Study (0.48)
- Media > News (1.00)
- Government > Regional Government > North America Government > United States Government (0.46)
Re-Key-Free, Risky-Free: Adaptable Model Usage Control
Wang, Zihan, Ma, Zhongkui, Feng, Xinguo, Yan, Chuan, Liu, Dongge, Sun, Ruoxi, Wang, Derui, Xue, Minhui, Bai, Guangdong
Deep neural networks (DNNs) have become valuable intellectual property of model owners, due to the substantial resources required for their development. To protect these assets in the deployed environment, recent research has proposed model usage control mechanisms to ensure models cannot be used without proper authorization. These methods typically lock the utility of the model by embedding an access key into its parameters. However, they often assume static deployment, and largely fail to withstand continual post-deployment model updates, such as fine-tuning or task-specific adaptation. In this paper, we propose ADALOC, to endow key-based model usage control with adaptability during model evolution. It strategically selects a subset of weights as an intrinsic access key, which enables all model updates to be confined to this key throughout the evolution lifecycle. ADALOC enables using the access key to restore the keyed model to the latest authorized states without redistributing the entire network (i.e., adaptation), and frees the model owner from full re-keying after each model update (i.e., lock preservation). We establish a formal foundation to underpin ADALOC, providing crucial bounds such as the errors introduced by updates restricted to the access key. Experiments on standard benchmarks, such as CIFAR-100, Caltech-256, and Flowers-102, and modern architectures, including ResNet, DenseNet, and ConvNeXt, demonstrate that ADALOC achieves high accuracy under significant updates while retaining robust protections. Specifically, authorized usages consistently achieve strong task-specific performance, while unauthorized usage accuracy drops to near-random guessing levels (e.g., 1.01% on CIFAR-100), compared to up to 87.01% without ADALOC. This shows that ADALOC can offer a practical solution for adaptive and protected DNN deployment in evolving real-world scenarios.
- Oceania > Australia > Queensland (0.04)
- North America > United States > California (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > China > Hong Kong (0.04)
- Research Report > New Finding (0.46)
- Research Report > Promising Solution (0.46)
Reinforcement Learning for Self-Healing Material Systems
Chatterjee, Maitreyi, Agarwal, Devansh, Chatterjee, Biplab
The transition to autonomous material systems necessitates adaptive control methodologies to maximize structural longevity. This study frames the self-healing process as a Reinforcement Learning (RL) problem within a Markov Decision Process (MDP), enabling agents to autonomously derive optimal policies that efficiently balance structural integrity maintenance against finite resource consumption. A comparative evaluation of discrete-action (Q-learning, DQN) and continuous-action (TD3) agents in a stochastic simulation environment revealed that RL controllers significantly outperform heuristic baselines, achieving near-complete material recovery. Crucially, the TD3 agent utilizing continuous dosage control demonstrated superior convergence speed and stability, underscoring the necessity of fine-grained, proportional actuation in dynamic self-healing applications.
- Asia > India > West Bengal > Kolkata (0.06)
- North America > United States > New York > Tompkins County > Ithaca (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- North America > Canada > Alberta > Census Division No. 11 > Edmonton Metropolitan Region > Edmonton (0.04)
MoodBench 1.0: An Evaluation Benchmark for Emotional Companionship Dialogue Systems
Jing, Haifeng, Hou, Yujie, Liu, Junfei, Xie, Rui, Xu, alan, Ma, Jinlong, Deng, Qichun
With the rapid development of Large Language Models, dialogue systems are shifting from information tools to emotional companions, heralding the era of Emotional Companionship Dialogue Systems (ECDs) that provide personalized emotional support for users. However, the field lacks clear definitions and systematic evaluation standards for ECDs. To address this, we first propose a definition of ECDs with formal descriptions. Then, based on this theory and the design principle of "Ability Layer-Task Layer (three level)-Data Layer-Method Layer", we design and implement the first ECD evaluation benchmark - MoodBench 1.0. Through extensive evaluations of 30 mainstream models, we demonstrate that MoodBench 1.0 has excellent discriminant validity and can effectively quantify the differences in emotional companionship abilities among models. Furthermore, the results reveal current models' shortcomings in deep emotional companionship, guiding future technological optimization and significantly aiding developers in enhancing ECDs' user experience.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Asia > China > Shandong Province > Qingdao (0.04)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- (21 more...)
Uncertainty-Aware Dual-Student Knowledge Distillation for Efficient Image Classification
Gore, Aakash, Dey, Anoushka, Mishra, Aryan
Department of Electrical Engineering Indian Institute of T echnology Bombay 21D070002 aakash.gore@iitb.ac.in Abstract--Knowledge distillation has emerged as a powerful technique for model compression, enabling the transfer of knowledge from large teacher networks to compact student models. However, traditional knowledge distillation methods treat all teacher predictions equally, regardless of the teacher's confidence in those predictions. This paper proposes an uncertainty-aware dual-student knowledge distillation framework that leverages teacher prediction uncertainty to selectively guide student learning. We introduce a peer-learning mechanism where two heterogeneous student architectures, specifically ResNet-18 and MobileNetV2, learn collaboratively from both the teacher network and each other . Experimental results on ImageNet-100 demonstrate that our approach achieves superior performance compared to baseline knowledge distillation methods, with ResNet-18 achieving 83.84% top-1 accuracy and MobileNetV2 achieving 81.46% top-1 accuracy, representing improvements of 2.04% and 0.92% respectively over traditional single-student distillation approaches. Deep neural networks have achieved remarkable success across various computer vision tasks, but their deployment on resource-constrained devices remains challenging due to high computational and memory requirements. This technique has become increasingly important as the demand for deploying sophisticated machine learning models on edge devices, mobile platforms, and embedded systems continues to grow. Traditional knowledge distillation approaches use a weighted combination of hard labels derived from ground truth annotations and soft labels generated by teacher predictions to train student networks.
Pre-Filtering Code Suggestions using Developer Behavioral Telemetry to Optimize LLM-Assisted Programming
Awad, Mohammad Nour Al, Ivanov, Sergey, Tikhonova, Olga
Abstract--Large Language Models (LLMs) are increasingly integrated into code editors to provide AI-powered code suggestions. Y et many of these suggestions are ignored, resulting in wasted computation, increased latency, and unnecessary interruptions. We introduce a lightweight pre-filtering model that predicts the likelihood of suggestion acceptance before invoking the LLM, using only real-time developer telemetry such as typing speed, file navigation, and editing activity. Deployed in a production-grade Visual Studio Code plugin over four months of naturalistic use, our approach nearly doubled acceptance rates (18.4% 34.2%) while suppressing 35% of low-value LLM calls. These findings demonstrate that behavioral signals alone can meaningfully improve both user experience and system efficiency in LLM-assisted programming, highlighting the value of timing-aware, privacy-preserving adaptation mechanisms. The filter operates solely on pre-invocation editor telemetry and never inspects code or prompts. Large Language Models (LLMs) have rapidly transformed the landscape of software development by enabling intelligent code completions, refactorings, and in-editor conversations. These capabilities are increasingly integrated into modern development environments, particularly through plugins for popular IDEs such as Visual Studio Code. However, despite their power, LLM-driven code suggestions often fail to align with developer intent in real-time, leading to low acceptance rates, disrupted workflows, and wasted computational resources [1].
- North America > United States (0.05)
- Europe > Russia > Northwestern Federal District > Leningrad Oblast > Saint Petersburg (0.04)
- Asia > Russia (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)