Education
FT-NCFM: An Influence-Aware Data Distillation Framework for Efficient VLA Models
Chen, Kewei, Long, Yayu, Li, Shuai, Shang, Mingsheng
The powerful generalization of Vision-Language-Action (VLA) models is bottlenecked by their heavy reliance on massive, redundant, and unevenly valued datasets, hindering their widespread application. Existing model-centric optimization paths, such as model compression (which often leads to performance degradation) or policy distillation (whose products are model-dependent and lack generality), fail to fundamentally address this data-level challenge. To this end, this paper introduces FT-NCFM, a fundamentally different, data-centric generative data distillation framework. Our framework employs a self-contained Fact-Tracing (FT) engine that combines causal attribution with programmatic contrastive verification to assess the intrinsic value of samples. Guided by these assessments, an adversarial NCFM process synthesizes a model-agnostic, information-dense, and reusable data asset. Experimental results on several mainstream VLA benchmarks show that models trained on just 5% of our distilled coreset achieve a success rate of 85-90% compared with training on the full dataset, while reducing training time by over 80%. Our work demonstrates that intelligent data distillation is a highly promising new path for building efficient, high-performance VLA models.
FlipVQA-Miner: Cross-Page Visual Question-Answer Mining from Textbooks
Wong, Zhen Hao, Deng, Jingwen, Liang, Hao, He, Runming, Shen, Chengyu, Zhang, Wentao
The development of Large Language Models (LLMs) increasingly depends on high-quality supervised data, yet existing instruction-tuning and RL datasets remain costly to curate and often rely on synthetic samples that introduce hallucination and limited diversity. At the same time, textbooks and exercise materials contain abundant, high-quality human-authored Question-Answer(QA) content that remains underexploited due to the difficulty of transforming raw PDFs into AI-ready supervision. Although modern OCR and vision-language models can accurately parse document structure, their outputs lack the semantic alignment required for training. We propose an automated pipeline that extracts well-formed QA and visual-QA (VQA) pairs from educational documents by combining layout-aware OCR with LLM-based semantic parsing. Experiments across diverse document types show that the method produces accurate, aligned, and low-noise QA/VQA pairs. This approach enables scalable use of real-world educational content and provides a practical alternative to synthetic data generation for improving reasoning-oriented LLM training. All code and data-processing pipelines are open-sourced at https://github.com/OpenDCAI/DataFlow.
MagBotSim: Physics-Based Simulation and Reinforcement Learning Environments for Magnetic Robotics
Bergmann, Lara, Grothues, Cedric, Neumann, Klaus
Magnetic levitation is about to revolutionize in-machine material flow in industrial automation. Such systems are flexibly configurable and can include a large number of independently actuated shuttles (movers) that dynamically rebalance production capacity. Beyond their capabilities for dynamic transportation, these systems possess the inherent yet unexploited potential to perform manipulation. By merging the fields of transportation and manipulation into a coordinated swarm of magnetic robots (MagBots), we enable manufacturing systems to achieve significantly higher efficiency, adaptability, and compactness. To support the development of intelligent algorithms for magnetic levitation systems, we introduce MagBotSim (Magnetic Robotics Simulation): a physics-based simulation for magnetic levitation systems. By framing magnetic levitation systems as robot swarms and providing a dedicated simulation, this work lays the foundation for next generation manufacturing systems powered by Magnetic Robotics. MagBotSim's documentation, videos, experiments, and code are available at: https://ubi-coro.github.io/MagBotSim/
Labels Matter More Than Models: Quantifying the Benefit of Supervised Time Series Anomaly Detection
Zhong, Zhijie, Yu, Zhiwen, Yang, Kaixiang, Chen, C. L. Philip
Abstract--Time series anomaly detection (TSAD) is a critical data mining task often constrained by label scarcity. Consequently, current research predominantly focuses on Unsupervised Time-series Anomaly Detection (UT AD), relying on complex architectures to model normal data distributions. However, this approach often overlooks the significant performance gains available from limited anomaly labels achievable in practical scenarios. This paper challenges the premise that architectural complexity is the optimal path for TSAD. We conduct the first methodical comparison between supervised and unsupervised paradigms and introduce STAND, a streamlined supervised baseline. Extensive experiments on five public datasets demonstrate that: (1) Labels matter more than models: under a limited labeling budget, simple supervised models significantly outperform complex state-of-the-art unsupervised methods; (2) Supervision yields higher returns: the performance gain from minimal supervision far exceeds that from architectural innovations; and (3) Practicality: STAND exhibits superior prediction consistency and anomaly localization compared to unsupervised counterparts. These findings advocate for a data-centric shift in TSAD research, emphasizing label utilization over purely algorithmic complexity. The code is publicly available at https://github.com/EmorZz1G/ST IME series anomaly detection (TSAD) is a crucial and challenging task in time series data mining, with broad applications in fields such as industrial system monitoring, cybersecurity, and health surveillance [1, 2, 3, 4]. Due to the scarcity of anomaly samples and the high cost of labeling in TSAD, unsupervised time series anomaly detection (UT AD) methods have garnered significant attention in recent years [5, 3, 6, 7]. Typically, unsupervised methods assume that the training time series data primarily consists of normal samples.
CoSP: Reconfigurable Multi-State Metamaterial Inverse Design via Contrastive Pretrained Large Language Model
Yang, Shujie, Zhao, Xuzhe, Zhang, Yuqi, Tang, Yansong, Dong, Kaichen
Metamaterials, known for their ability to manipulate light at subwavelength scales, face significant design challenges due to their complex and sophisticated structures. Consequently, deep learning has emerged as a powerful tool to streamline their design process. Reconfigurable multi-state metamaterials (RMMs) with adjustable parameters can switch their optical characteristics between different states upon external stimulation, leading to numerous applications. However, existing deep learning-based inverse design methods fall short in considering reconfigurability with multi-state switching. To address this challenge, we propose CoSP, an intelligent inverse design method based on contrastive pretrained large language model (LLM). By performing contrastive pretraining on multi-state spectrum, a well-trained spectrum encoder capable of understanding the spectrum is obtained, and it subsequently interacts with a pretrained LLM. This approach allows the model to preserve its linguistic capabilities while also comprehending Maxwell's Equations, enabling it to describe material structures with target optical properties in natural language. Our experiments demonstrate that CoSP can design corresponding thin-film metamaterial structures for arbitrary multi-state, multi-band optical responses, showing great potentials in the intelligent design of RMMs for versatile applications.
SpectralTrain: A Universal Framework for Hyperspectral Image Classification
Zhou, Meihua, Yu, Liping, Cai, Jiawei, Fung, Wai Kin, Hu, Ruiguo, Zhao, Jiarui, Liu, Wenzhuo, Wan, Nan
Hyperspectral image (HSI) classification typically involves large-scale data and computationally intensive training, which limits the practical deployment of deep learning models in real-world remote sensing tasks. This study introduces SpectralTrain, a universal, architecture-agnostic training framework that enhances learning efficiency by integrating curriculum learning (CL) with principal component analysis (PCA)-based spectral downsampling. By gradually introducing spectral complexity while preserving essential information, SpectralTrain enables efficient learning of spectral -- spatial patterns at significantly reduced computational costs. The framework is independent of specific architectures, optimizers, or loss functions and is compatible with both classical and state-of-the-art (SOTA) models. Extensive experiments on three benchmark datasets -- Indian Pines, Salinas-A, and the newly introduced CloudPatch-7 -- demonstrate strong generalization across spatial scales, spectral characteristics, and application domains. The results indicate consistent reductions in training time by 2-7x speedups with small-to-moderate accuracy deltas depending on backbone. Its application to cloud classification further reveals potential in climate-related remote sensing, emphasizing training strategy optimization as an effective complement to architectural design in HSI models. Code is available at https://github.com/mh-zhou/SpectralTrain.
Artificial Intelligence and Accounting Research: A Framework and Agenda
Stratopoulos, Theophanis C., Wang, Victor Xiaoqi
Recent advances in artificial intelligence, particularly generative AI (GenAI) and large language models (LLMs), are fundamentally transforming accounting research, creating both opportunities and competitive threats for scholars. This paper proposes a framework that classifies AI-accounting research along two dimensions: research focus (accounting-centric versus AI-centric) and methodological approach (AI-based versus traditional methods). We apply this framework to papers from the IJAIS special issue and recent AI-accounting research published in leading accounting journals to map existing studies and identify research opportunities. Using this same framework, we analyze how accounting researchers can leverage their expertise through strategic positioning and collaboration, revealing where accounting scholars' strengths create the most value. We further examine how GenAI and LLMs transform the research process itself, comparing the capabilities of human researchers and AI agents across the entire research workflow. This analysis reveals that while GenAI democratizes certain research capabilities, it simultaneously intensifies competition by raising expectations for higher-order contributions where human judgment, creativity, and theoretical depth remain valuable. These shifts call for reforming doctoral education to cultivate comparative advantages while building AI fluency.
Agent0: Unleashing Self-Evolving Agents from Zero Data via Tool-Integrated Reasoning
Xia, Peng, Zeng, Kaide, Liu, Jiaqi, Qin, Can, Wu, Fang, Zhou, Yiyang, Xiong, Caiming, Yao, Huaxiu
Large Language Model (LLM) Agents, often trained with Reinforcement Learning (RL), are constrained by a dependency on human-curated data, limiting scalability and tethering AI to human knowledge. Existing self-evolution frameworks offer an alternative but are typically restricted by the model's inherent capabilities and single-round interactions, hindering the development of complex curricula involving tool use or dynamic reasoning. We introduce Agent0, a fully autonomous framework that evolves high-performing agents without external data through multi-step co-evolution and seamless tool integration. Agent0 establishes a symbiotic competition between two agents initialized from the same base LLM: a curriculum agent that proposes increasingly challenging frontier tasks, and an executor agent that learns to solve them. We integrate external tools to enhance the executor's problem-solving capacity; this improvement, in turn, pressures the curriculum agent to construct more complex, tool-aware tasks. Through this iterative process, Agent0 establishes a self-reinforcing cycle that continuously produces high-quality curricula. Empirically, Agent0 substantially boosts reasoning capabilities, improving the Qwen3-8B-Base model by 18% on mathematical reasoning and 24% on general reasoning benchmarks. Code is available at https://github.com/aiming-lab/Agent0.
JudgeBoard: Benchmarking and Enhancing Small Language Models for Reasoning Evaluation
Bi, Zhenyu, Srivastava, Gaurav, Li, Yang, Lu, Meng, Roy, Swastik, Ziyadi, Morteza, Wang, Xuan
While small language models (SLMs) have shown promise on various reasoning tasks, their ability to judge the correctness of answers remains unclear compared to large language models (LLMs). Prior work on LLM-as-a-judge frameworks typically relies on comparing candidate answers against ground-truth labels or other candidate answers using predefined metrics like entailment. However, this approach is inherently indirect and difficult to fully automate, offering limited support for fine-grained and scalable evaluation of reasoning outputs. In this work, we propose JudgeBoard, a novel evaluation pipeline that directly queries models to assess the correctness of candidate answers without requiring extra answer comparisons. We focus on two core reasoning domains: mathematical reasoning and science/commonsense reasoning, and construct task-specific evaluation leaderboards using both accuracy-based ranking and an Elo-based rating system across five benchmark datasets, enabling consistent model comparison as judges rather than comparators. To improve judgment performance in lightweight models, we propose MAJ (Multi-Agent Judging), a novel multi-agent evaluation framework that leverages multiple interacting SLMs with distinct reasoning profiles to approximate LLM-level judgment accuracy through collaborative deliberation. Experimental results reveal a significant performance gap between SLMs and LLMs in isolated judging tasks. However, our MAJ framework substantially improves the reliability and consistency of SLMs. On the MATH dataset, MAJ using smaller-sized models as backbones performs comparatively well or even better than their larger-sized counterparts. Our findings highlight that multi-agent SLM systems can potentially match or exceed LLM performance in judgment tasks, with implications for scalable and efficient assessment.
What Really Counts? Examining Step and Token Level Attribution in Multilingual CoT Reasoning
Ferrao, Jeremias, Basar, Ezgi, Islam, Khondoker Ittehadul, Hassani, Mahrokh
This study investigates the attribution patterns underlying Chain-of-Thought (CoT) reasoning in multilingual LLMs. While prior works demonstrate the role of CoT prompting in improving task performance, there are concerns regarding the faithfulness and interpretability of the generated reasoning chains. To assess these properties across languages, we applied two complementary attribution methods--ContextCite for step-level attribution and Inseq for token-level attribution--to the Qwen2.5 1.5B-Instruct model using the MGSM benchmark. Our experimental results highlight key findings such as: (1) attribution scores excessively emphasize the final reasoning step, particularly in incorrect generations; (2) structured CoT prompting significantly improves accuracy primarily for high-resource Latin-script languages; and (3) controlled perturbations via negation and distractor sentences reduce model accuracy and attribution coherence. These findings highlight the limitations of CoT prompting, particularly in terms of multilingual robustness and interpretive transparency.