Energy
Winning the Pruning Gamble: A Unified Approach to Joint Sample and Token Pruning for Efficient Supervised Fine-Tuning
Wang, Shaobo, Wang, Jiaming, Zhang, Jiajun, Wang, Cong, Min, Yue, Wen, Zichen, Huang, Fei, Jiang, Huiqiang, Lin, Junyang, Liu, Dayiheng, Zhang, Linfeng
As supervised fine-tuning (SFT) evolves from a lightweight post-training step into a compute-intensive phase rivaling mid-training in scale, data efficiency has become critical for aligning large language models (LLMs) under tight budgets. Existing data pruning methods suffer from a fragmented design: they operate either at the sample level or the token level in isolation, failing to jointly optimize both dimensions. This disconnect leads to significant inefficiencies: high-value samples may still contain redundant tokens, while token-level pruning often discards crucial instructional or corrective signals embedded in individual examples. To address this bottleneck, we introduce the Error-Uncertainty (EU) Plane, a diagnostic framework that jointly characterizes the heterogeneous utility of training data across samples and tokens. Guided by this insight, we propose Quadrant-based Tuning (Q-Tuning), a unified framework that strategically coordinates sample pruning and token pruning. Q-Tuning employs a two-stage strategy: first, it performs sample-level triage to retain examples rich in informative misconceptions or calibration signals; second, it applies an asymmetric token-pruning policy, using a context-aware scoring mechanism to trim less salient tokens exclusively from misconception samples while preserving calibration samples in their entirety. Our method sets a new state of the art across five diverse benchmarks. Remarkably, on SmolLM2-1.7B, Q-Tuning achieves a +38% average improvement over the full-data SFT baseline using only 12.5% of the original training data. As the first dynamic pruning approach to consistently outperform full-data training, Q-Tuning provides a practical and scalable blueprint for maximizing data utilization in budget-constrained LLM SFT.
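The two-stage strategy described in this abstract can be illustrated with a small sketch. It is not the authors' implementation. The abstract does not say how the quadrants of the EU Plane are defined, so the mapping used here (misconception = high error with low uncertainty, calibration = low error with high uncertainty) is an assumption, as are the thresholds, the `keep_ratio` parameter, and the precomputed `token_scores` that stand in for the context-aware scoring mechanism.

```python
import math


def quadrant(error, uncertainty, err_thr, unc_thr):
    """Place a sample on the Error-Uncertainty plane (quadrant names are assumed)."""
    if error >= err_thr and uncertainty < unc_thr:
        return "misconception"  # assumed: confidently wrong
    if error < err_thr and uncertainty >= unc_thr:
        return "calibration"    # assumed: right but unsure
    if error >= err_thr:
        return "noisy"          # assumed: high error, high uncertainty
    return "redundant"          # assumed: low error, low uncertainty


def q_tuning_step(samples, err_thr, unc_thr, keep_ratio):
    """Return (sample_index, kept_token_indices) for the retained samples.

    Each sample is a dict with 'error', 'uncertainty', and 'token_scores',
    where a higher token score is assumed to mean a more salient token.
    """
    kept = []
    for i, s in enumerate(samples):
        q = quadrant(s["error"], s["uncertainty"], err_thr, unc_thr)
        n = len(s["token_scores"])
        if q == "calibration":
            # Stage 2, asymmetric policy: calibration samples stay whole.
            kept.append((i, list(range(n))))
        elif q == "misconception":
            # Stage 2: keep only the top-scoring tokens, in original order.
            k = max(1, math.ceil(keep_ratio * n))
            top = sorted(range(n), key=lambda j: s["token_scores"][j], reverse=True)[:k]
            kept.append((i, sorted(top)))
        # Stage 1: samples in the other two quadrants are dropped entirely.
    return kept
```

In a training loop, the kept token indices would typically become a loss mask, so that pruned tokens still provide context but contribute no gradient. That choice is also an assumption rather than something the abstract states.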
DexFlyWheel: A Scalable and Self-improving Data Generation Framework for Dexterous Manipulation
Zhu, Kefei, Bai, Fengshuo, Xiang, YuanHao, Cai, Yishuai, Chen, Xinglin, Li, Ruochong, Wang, Xingtao, Dong, Hao, Yang, Yaodong, Fan, Xiaopeng, Chen, Yuanpei
Dexterous manipulation is critical for advancing robot capabilities in real-world applications, yet diverse and high-quality datasets remain scarce. Existing data collection methods rely on human teleoperation, require significant human engineering, or generate data with limited diversity, which restricts their scalability and generalization. In this paper, we introduce DexFlyWheel, a scalable data generation framework that employs a self-improving cycle to continuously enrich data diversity. Starting from an efficient warm-up on seed demonstrations, DexFlyWheel expands the dataset through iterative cycles. Each cycle follows a closed-loop pipeline that integrates Imitation Learning (IL), residual Reinforcement Learning (RL), rollout trajectory collection, and data augmentation. Specifically, IL extracts human-like behaviors from demonstrations, and residual RL enhances policy generalization. The learned policy is then used to generate trajectories in simulation, which are further augmented across diverse environments and spatial configurations before being fed back into the next cycle. Over successive iterations, a self-improving data flywheel effect emerges, producing datasets that cover diverse scenarios and thereby scaling policy performance. Experimental results demonstrate that DexFlyWheel generates over 2,000 diverse demonstrations across four challenging tasks. Policies trained on our dataset achieve an average success rate of 81.9% on the challenge test sets and successfully transfer to the real world through a digital twin, achieving a 78.3% success rate on dual-arm lift tasks.
From What to Why: A Multi-Agent System for Evidence-based Chemical Reaction Condition Reasoning
Yang, Cheng, Lu, Jiaxuan, Wan, Haiyuan, Yu, Junchi, Qin, Feiwei
Reaction condition recommendation selects suitable condition parameters for chemical reactions, a task pivotal to accelerating chemical science. With the rapid development of large language models (LLMs), there is growing interest in leveraging their reasoning and planning capabilities for reaction condition recommendation. Despite their success, existing methods rarely explain the rationale behind the recommended reaction conditions, limiting their utility in high-stakes scientific workflows. In this work, we propose ChemMAS, a multi-agent system that reframes condition prediction as an evidence-based reasoning task. ChemMAS decomposes the task into mechanistic grounding, multi-channel recall, constraint-aware agentic debate, and rationale aggregation. Each decision is backed by interpretable justifications grounded in chemical knowledge and retrieved precedents. Experiments show that ChemMAS achieves 20-35% gains over domain-specific baselines and outperforms general-purpose LLMs by 10-15% in Top-1 accuracy, while offering falsifiable, human-trustable rationales, which establishes a new paradigm for explainable AI in scientific discovery.
Color-Pair Guided Robust Zero-Shot 6D Pose Estimation and Tracking of Cluttered Objects on Edge Devices
Yang, Xingjian, Banerjee, Ashis G.
Robust 6D pose estimation of novel objects under challenging illumination remains a significant challenge, often requiring a trade-off between accurate initial pose estimation and efficient real-time tracking. We present a unified framework explicitly designed for efficient execution on edge devices, which synergizes a robust initial estimation module with a fast motion-based tracker. The key to our approach is a shared, lighting-invariant color-pair feature representation that forms a consistent foundation for both stages. For initial estimation, this feature facilitates robust registration between the live RGB-D view and the object's 3D mesh. Extensive experiments on benchmark datasets demonstrate that our integrated approach is both effective and robust, providing competitive pose estimation accuracy while maintaining high-fidelity tracking even through abrupt pose changes. Estimation of an object's six-degree-of-freedom (6D) pose, which involves determining its 3D rotation and 3D translation relative to a camera, is a fundamental task in computer vision and robotics [1]. Accurate 6D pose information is crucial for a variety of applications, ranging from robotic manipulation and grasping in industrial and household environments to immersive experiences in augmented and mixed reality. The ability of an autonomous system to precisely locate and determine the orientation of objects is a key prerequisite for meaningful physical interaction. Furthermore, in dynamic scenarios, this capability must extend beyond single-frame estimation to continuous, real-time tracking, providing the temporal coherence necessary for tasks such as closed-loop robotic control. Historically, pose estimation has focused on instance-level methods, which require costly, object-specific training and thus cannot generalize to new objects. While category-level approaches can handle unseen instances within a known class, they still fail to address entirely novel categories.
DRIK: Distribution-Robust Inductive Kriging without Information Leakage
Yang, Chen, Zhao, Changhao, Wang, Chen, Fan, Jiansheng
Inductive kriging supports high-resolution spatio-temporal estimation with sparse sensor networks, but conventional training-evaluation setups often suffer from information leakage and poor out-of-distribution (OOD) generalization. We find that the common 2×2 spatio-temporal split allows test data to influence model selection through early stopping, obscuring the true OOD characteristics of inductive kriging. To address this issue, we propose a 3×3 partition that cleanly separates training, validation, and test sets, eliminating leakage and better reflecting real-world applications. Building on this redefined setting, we introduce DRIK, a Distribution-Robust Inductive Kriging approach designed with the intrinsic properties of inductive kriging in mind to explicitly enhance OOD generalization, employing a three-tier strategy at the node, edge, and subgraph levels. DRIK perturbs node coordinates to capture continuous spatial relationships, drops edges to reduce ambiguity in information flow and increase topological diversity, and adds pseudo-labeled subgraphs to strengthen domain generalization. Experiments on six diverse spatio-temporal datasets show that DRIK consistently outperforms existing methods, achieving up to 12.48% lower MAE while maintaining strong scalability. Sensors are widely used to monitor traffic flow (Kong et al., 2024), air quality (Yu et al., 2025), and solar energy production (Jebli et al., 2021), among other applications. However, their high deployment costs often limit sensor density and prevent comprehensive coverage of large areas (Liang et al., 2019; Seo et al., 2017). Inductive kriging provides a promising solution by estimating values at unsensed locations using data from existing sensors (Wu et al., 2021a; Zheng et al., 2023; Xu et al., 2025). Kriging models can generate high-resolution spatio-temporal estimates, improving accuracy while reducing the deployment and maintenance demands of large-scale sensor networks.
The standard training and evaluation protocol for inductive kriging (Wu et al., 2021a) generally involves three steps, as shown in Figure 1 (a): (1) The complete spatio-temporal dataset X ∈ R [...] This produces a 2×2 partition, with the final training and test sets drawn from diagonally opposite sections. A key limitation of this approach stems from the widespread use of early stopping during model training (Zheng et al., 2023).
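A leakage-free three-way split of this kind can be sketched as follows. It is an illustration only: the text says the proposed partition separates training, validation, and test sets, but not how the blocks are chosen, so pairing node group i with time period i (the diagonal of the grid) is an assumption extrapolated from the diagonal layout of the conventional split. The function name, the percentage defaults, and the random node assignment are likewise hypothetical.

```python
import random


def diagonal_partition(num_nodes, num_steps, percents=(60, 20, 20), seed=0):
    """Split nodes and time steps into three groups each, paired diagonally.

    Returns a dict mapping 'train', 'val', and 'test' to the node ids and
    time-step range of one diagonal block of the 3x3 grid.
    """
    rng = random.Random(seed)
    nodes = list(range(num_nodes))
    rng.shuffle(nodes)  # assumed: nodes are assigned to splits at random

    # Integer arithmetic avoids floating-point rounding at the cut points.
    first, second = percents[0], percents[0] + percents[1]
    n1, n2 = num_nodes * first // 100, num_nodes * second // 100
    t1, t2 = num_steps * first // 100, num_steps * second // 100

    node_groups = [nodes[:n1], nodes[n1:n2], nodes[n2:]]
    # Time is split chronologically, so later blocks lie in the future.
    time_groups = [range(0, t1), range(t1, t2), range(t2, num_steps)]

    names = ("train", "val", "test")
    return {
        name: {"nodes": sorted(node_groups[i]), "steps": time_groups[i]}
        for i, name in enumerate(names)
    }
```

With a split like this, early stopping can monitor the validation block alone, so neither test nodes nor test time steps influence model selection.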
Zero-shot Whole-Body Manipulation with a Large-Scale Soft Robotic Torso via Guided Reinforcement Learning
Johnson, Curtis C., Alessi, Carlo, Falotico, Egidio, Killpack, Marc D.
Whole-body manipulation is a powerful yet underexplored approach that enables robots to interact with large, heavy, or awkward objects using more than just their end-effectors. Soft robots, with their inherent passive compliance, are particularly well-suited for such contact-rich manipulation tasks, but their uncertainties in kinematics and dynamics pose significant challenges for simulation and control. In this work, we address this challenge with a simulation that can run up to 350x real time on a single thread in MuJoCo and provide a detailed analysis of the critical tradeoffs between speed and accuracy for this simulation. Using this framework, we demonstrate a successful zero-shot sim-to-real transfer of a learned whole-body manipulation policy, achieving an 88% success rate on the Baloo hardware platform. We show that guiding RL with a simple motion primitive is critical to this success where standard reward shaping methods struggled to produce a stable and successful policy for whole-body manipulation. Furthermore, our analysis reveals that the learned policy does not simply mimic the motion primitive. It exhibits beneficial reactive behavior, such as re-grasping and perturbation recovery. We analyze and contrast this learned policy against an open-loop baseline to show that the policy can also exhibit aggressive over-corrections under perturbation. To our knowledge, this is the first demonstration of forceful, six-DoF whole-body manipulation using two continuum soft arms on a large-scale platform (10 kg payloads), with zero-shot policy transfer.
PHASE: Physics-Integrated, Heterogeneity-Aware Surrogates for Scientific Simulations
Gao, Dawei, Wang, Dali, Gu, Zhuowei, Cao, Qinglei, Wang, Xiao, Thornton, Peter, Ricciuto, Dan, Feng, Yunhe
Large-scale numerical simulations underpin modern scientific discovery but remain constrained by prohibitive computational costs. AI surrogates offer acceleration, yet adoption in mission-critical settings is limited by concerns over physical plausibility, trustworthiness, and the fusion of heterogeneous data. We introduce PHASE, a modular deep-learning framework for physics-integrated, heterogeneity-aware surrogates in scientific simulations. PHASE combines data-type-aware encoders for heterogeneous inputs with multi-level physics-based constraints that promote consistency from local dynamics to global system behavior. Using only the first 20 simulation years, PHASE infers a near-equilibrium state that otherwise requires more than 1,200 years of integration, yielding an effective reduction in required integration length by at least 60×. The framework is enabled by a pipeline for fusing heterogeneous scientific data and demonstrates strong generalization to higher spatial resolutions with minimal fine-tuning. These results indicate that PHASE captures governing physical regularities rather than surface correlations, enabling practical, physically consistent acceleration of land-surface modeling and other complex scientific workflows. Numerical simulations, mainly grounded in domain knowledge and partial differential equations (PDEs), are fundamental pillars of modern scientific discovery, driving advances in fields from climate modeling to materials design (Hao et al., 2024; Koehler et al., 2024; Danabasoglu et al., 2020; Pathak et al., 2020; Reichstein et al., 2019).
Scaling LLM Test-Time Compute with Mobile NPU on Smartphones
Hao, Zixu, Wei, Jianyu, Wang, Tuowei, Huang, Minxing, Jiang, Huiqiang, Jiang, Shiqi, Cao, Ting, Ren, Ju
Deploying Large Language Models (LLMs) on mobile devices faces the challenge of insufficient performance in smaller models and excessive resource consumption in larger ones. This paper highlights that mobile Neural Processing Units (NPUs) have underutilized computational resources, particularly their matrix multiplication units, during typical LLM inference. To leverage this wasted compute capacity, we propose applying parallel test-time scaling techniques on mobile NPUs to enhance the performance of smaller LLMs. However, this approach confronts inherent NPU challenges, including inadequate hardware support for fine-grained quantization and low efficiency in general-purpose computations. To overcome these, we introduce two key techniques: a hardware-aware tile quantization scheme that aligns group quantization with NPU memory access patterns, and efficient LUT-based replacements for complex operations such as Softmax and dequantization. We design and implement an end-to-end inference system that leverages the NPU's compute capability to support test-time scaling on Qualcomm Snapdragon platforms. Experiments show our approach brings significant speedups: up to 19.0× for mixed-precision GEMM and 2.2× for Softmax. More importantly, we demonstrate that smaller models using test-time scaling can match or exceed the accuracy of larger models, achieving a new performance-cost Pareto frontier.
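The idea of replacing Softmax's exponential with a lookup table (LUT) can be shown with a generic sketch. It is not the paper's NPU kernel, whose design the abstract does not describe. The sketch assumes int8 logits that share a single quantization scale, so that after subtracting the row maximum every difference is an integer in 0..255 and can index a 256-entry table directly. The function names are hypothetical.

```python
import math


def build_exp_lut(scale):
    """Precompute exp(-d * scale) for every integer difference d in 0..255."""
    return [math.exp(-d * scale) for d in range(256)]


def lut_softmax(q_logits, lut):
    """Softmax over int8 logits (-128..127) that share one quantization scale.

    The exponential is never evaluated here: each weight is a table lookup
    indexed by the distance to the row maximum.
    """
    q_max = max(q_logits)
    # q_max - q lies in 0..255 for int8 inputs, so it is a valid table index.
    weights = [lut[q_max - q] for q in q_logits]
    total = sum(weights)
    return [w / total for w in weights]


# Usage: build the table once per scale, then reuse it for every row.
lut = build_exp_lut(scale=0.1)
probs = lut_softmax([-128, 0, 127], lut)
```

Subtracting the maximum keeps every table entry in (0, 1], which is also the usual numerically stable form of Softmax, so the lookup version matches a floating-point reference up to rounding.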
MELCOT: A Hybrid Learning Architecture with Marginal Preservation for Matrix-Valued Regression
Tran, Khang, Cao, Hieu, Pham, Thinh, Diep, Nghiem, Cao, Tri, Nguyen, Binh
Regression is essential across many domains but remains challenging in high-dimensional settings, where existing methods often lose spatial structure or demand heavy storage. In this work, we address the problem of matrix-valued regression, where each sample is naturally represented as a matrix. We propose MELCOT, a hybrid model that integrates a classical machine learning-based Marginal Estimation (ME) block with a deep learning-based Learnable-Cost Optimal Transport (LCOT) block. The ME block estimates data marginals to preserve spatial information, while the LCOT block learns complex global features. This design enables MELCOT to inherit the strengths of both classical and deep learning methods. Extensive experiments across diverse datasets and domains demonstrate that MELCOT consistently outperforms all baselines while remaining highly efficient.
GUARD: Toward a Compromise between Traditional Control and Learning for Safe Robot Systems
Gaus, Johannes A., Yoon, Junheon, Baek, Woo-Jeong, Choi, Seungwon, Park, Suhan, Park, Jaeheung
This paper presents the framework GUARD (Guided robot control via Uncertainty attribution and probAbilistic kernel optimization for Risk-aware Decision making) that combines traditional control with an uncertainty-aware perception technique using active learning with real-time capability for safe robot collision avoidance. By doing so, this manuscript addresses the central challenge in robotics of finding a reasonable compromise between traditional methods and learning algorithms to foster the development of safe, yet efficient and flexible applications. By unifying a reactive model predictive contouring control (RMPCC) with an Iterative Closest Point (ICP) algorithm that enables the attribution of uncertainty sources online using active learning with real-time capability via a probabilistic kernel optimization technique, GUARD inherently handles the ambiguity of the term safety in the robotics literature. Experimental studies indicate the high performance of GUARD, thereby highlighting the relevance and need to broaden its applicability in the future. Developing safe and flexible robot applications is a central, yet open issue in robotics.