Problem Solving
Graph Learning
Xia, Feng, Peng, Ciyuan, Ren, Jing, Febrinanto, Falih Gozi, Luo, Renqiang, Saikrishna, Vidya, Yu, Shuo, Kong, Xiangjie
Graph learning has rapidly evolved into a critical subfield of machine learning and artificial intelligence (AI). Its development began with early graph-theoretic methods, gaining significant momentum with the advent of graph neural networks (GNNs). Over the past decade, progress in scalable architectures, dynamic graph modeling, multimodal learning, generative AI, explainable AI (XAI), and responsible AI has broadened the applicability of graph learning to various challenging environments. Graph learning is significant due to its ability to model complex, non-Euclidean relationships that traditional machine learning struggles to capture, thus better supporting real-world applications ranging from drug discovery and fraud detection to recommender systems and scientific reasoning. However, challenges like scalability, generalization, heterogeneity, interpretability, and trustworthiness must be addressed to unlock its full potential. This survey provides a comprehensive introduction to graph learning, focusing on key dimensions including scalable, temporal, multimodal, generative, explainable, and responsible graph learning. We review state-of-the-art techniques for efficiently handling large-scale graphs, capturing dynamic temporal dependencies, integrating heterogeneous data modalities, generating novel graph samples, and enhancing interpretability to foster trust and transparency. We also explore ethical considerations, such as privacy and fairness, to ensure responsible deployment of graph learning models. Additionally, we identify and discuss emerging topics, highlighting recent integration of graph learning and other AI paradigms and offering insights into future directions. This survey serves as a valuable resource for researchers and practitioners seeking to navigate the rapidly evolving landscape of graph learning.
Optimizing Anytime Reasoning via Budget Relative Policy Optimization
Qi, Penghui, Liu, Zichen, Pang, Tianyu, Du, Chao, Lee, Wee Sun, Lin, Min
Scaling test-time compute is crucial for enhancing the reasoning capabilities of large language models (LLMs). Existing approaches typically employ reinforcement learning (RL) to maximize a verifiable reward obtained at the end of reasoning traces. However, such methods optimize only the final performance under a large and fixed token budget, which hinders efficiency in both training and deployment. In this work, we present a novel framework, AnytimeReasoner, to optimize anytime reasoning performance, which aims to improve token efficiency and the flexibility of reasoning under varying token budget constraints. To achieve this, we truncate the complete thinking process to fit within token budgets sampled from a prior distribution, compelling the model to summarize the optimal answer from each truncated thinking trace for verification. This introduces verifiable dense rewards into the reasoning process, facilitating more effective credit assignment in RL optimization. We then optimize the thinking and summary policies in a decoupled manner to maximize the cumulative reward. Additionally, we introduce a novel variance reduction technique, Budget Relative Policy Optimization (BRPO), to enhance the robustness and efficiency of the learning process when reinforcing the thinking policy. Empirical results on mathematical reasoning tasks demonstrate that our method consistently outperforms GRPO across all thinking budgets under various prior distributions, enhancing both training and token efficiency.
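To make the dense-reward idea concrete, here is a minimal sketch, assuming a discrete prior over token budgets and treating the summary policy and the verifier as plain callables. The names `anytime_reward`, `summarize` and `verify` are hypothetical, the prior-weighted sum is only one reading of "cumulative reward", and BRPO's variance reduction is not shown.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def anytime_reward(
    thinking: Sequence[str],
    budget_prior: Dict[int, float],
    summarize: Callable[[Sequence[str]], str],
    verify: Callable[[str], float],
) -> Tuple[float, List[float]]:
    """Score one thinking trace at every budget in a (hypothetical) prior.

    Returns the prior-weighted total and the per-budget rewards, i.e. the
    dense reward signal obtained by truncating the trace at each budget.
    """
    per_budget: List[float] = []
    total = 0.0
    for budget, prob in sorted(budget_prior.items()):
        truncated = thinking[:budget]   # keep only the first `budget` thinking tokens
        answer = summarize(truncated)   # summary policy answers from the partial trace
        reward = verify(answer)         # verifiable reward, e.g. 1.0 if correct
        per_budget.append(reward)
        total += prob * reward
    return total, per_budget


# Toy usage: the "answer" is simply the last thinking token seen so far.
trace = ["2", "+", "2", "=", "4"]
total, per_budget = anytime_reward(
    trace,
    budget_prior={2: 0.25, 5: 0.75},
    summarize=lambda toks: toks[-1] if toks else "",
    verify=lambda ans: 1.0 if ans == "4" else 0.0,
)
print(total, per_budget)  # 0.75 [0.0, 1.0]
```

Because every budget yields its own verified reward, a trace that reaches the right answer early is credited at the smaller budgets too, which is what makes the signal denser than a single end-of-trace reward.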
TimeSearch-R: Adaptive Temporal Search for Long-Form Video Understanding via Self-Verification Reinforcement Learning
Pan, Junwen, Zhang, Qizhe, Zhang, Rui, Lu, Ming, Wan, Xin, Zhang, Yuan, Liu, Chang, She, Qi
Temporal search aims to identify a minimal set of relevant frames from tens of thousands of frames based on a given query, serving as a foundation for accurate long-form video understanding. Existing works attempt to progressively narrow the search space. However, these approaches typically rely on a hand-crafted search process, lacking end-to-end optimization for learning optimal search strategies. In this paper, we propose TimeSearch-R, which reformulates temporal search as interleaved text-video thinking, seamlessly integrating searching video clips into the reasoning process through reinforcement learning (RL). However, applying RL training methods, such as Group Relative Policy Optimization (GRPO), to video reasoning can result in unsupervised intermediate search decisions. This leads to insufficient exploration of the video content and inconsistent logical reasoning. To address these issues, we introduce GRPO with Completeness Self-Verification (GRPO-CSV), which gathers searched video frames from the interleaved reasoning process and utilizes the same policy model to verify the adequacy of searched frames, thereby improving the completeness of video reasoning. Additionally, we construct datasets specifically designed for the supervised fine-tuning (SFT) cold start and RL training of GRPO-CSV, filtering out samples with weak temporal dependencies to enhance task difficulty and improve temporal search capabilities. Extensive experiments demonstrate that TimeSearch-R achieves significant improvements on temporal search benchmarks such as Haystack-LVBench and Haystack-Ego4D, as well as long-form video understanding benchmarks like VideoMME and MLVU. Notably, TimeSearch-R establishes a new state of the art on LongVideoBench, with a 4.1% improvement over the base model Qwen2.5-VL and a 2.0% improvement over the advanced video reasoning model Video-R1. Our code is available at https://github.com/Time-Search/TimeSearch-R.
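The group-relative part of GRPO can be sketched in a few lines: each rollout's reward is normalized against the other rollouts for the same query. The completeness gate below is a hypothetical illustration of how a self-verification verdict might feed into the reward; it is an assumption, not the authors' GRPO-CSV reward, and `completeness_gated_rewards` is an invented name.

```python
import statistics
from typing import List, Sequence


def group_relative_advantages(rewards: Sequence[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style advantages: normalize each rollout's reward within its group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


def completeness_gated_rewards(
    answer_rewards: Sequence[float], frames_complete: Sequence[bool]
) -> List[float]:
    """Hypothetical gate: keep an answer's reward only if self-verification
    judged the searched frames sufficient to answer the query."""
    return [r if ok else 0.0 for r, ok in zip(answer_rewards, frames_complete)]


# Four rollouts for one query; the third answered correctly, but the
# self-verification step found its searched frames incomplete.
rewards = completeness_gated_rewards([1.0, 0.0, 1.0, 1.0], [True, True, False, True])
advantages = group_relative_advantages(rewards)
print(rewards)  # [1.0, 0.0, 0.0, 1.0]
```

When every rollout in a group gets the same reward, all advantages are zero, so such groups contribute no gradient; this is why intermediate search decisions can go effectively unsupervised without an extra signal.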
iFlyBot-VLM Technical Report
Nie, Xin, Cheng, Zhiyuan, Zhang, Yuan, Ji, Chao, Wu, Jiajia, Zhang, Yuhan, Pan, Jia
We introduce iFlyBot-VLM, a general-purpose Vision-Language Model (VLM) designed to advance the field of Embodied Intelligence. The central objective of iFlyBot-VLM is to bridge the cross-modal semantic gap between high-dimensional environmental perception and low-level robotic motion control. To this end, the model abstracts complex visual and spatial information into a body-agnostic and transferable Operational Language, thereby enabling seamless perception-action closed-loop coordination across diverse robotic platforms. The architecture of iFlyBot-VLM is systematically designed to realize four key functional capabilities essential for embodied intelligence: 1) Spatial Understanding and Metric Reasoning; 2) Interactive Target Grounding; 3) Action Abstraction and Control Parameter Generation; 4) Task Planning and Skill Sequencing. We envision iFlyBot-VLM as a scalable and generalizable foundation model for embodied AI, facilitating the progression from specialized task-oriented systems toward generalist, cognitively capable agents. We conducted evaluations on 10 mainstream embodied-intelligence VLM benchmarks, such as Blink and Where2Place, and achieved top performance while preserving the model's general capabilities. We will publicly release both the training data and model weights to foster further research and development in the field of Embodied Intelligence.
You Need Reasoning to Learn Reasoning: The Limitations of Label-Free RL in Weak Base Models
Roy, Shuvendu, Hajimirsadeghi, Hossein, Zhai, Mengyao, Samei, Golnoosh
Recent advances in large language models have demonstrated the promise of unsupervised reinforcement learning (RL) methods for enhancing reasoning capabilities without external supervision. However, the generalizability of these label-free RL approaches to smaller base models with limited reasoning capabilities remains unexplored. In this work, we systematically investigate the performance of label-free RL methods across different model sizes and reasoning strengths, from 0.5B to 7B parameters. Our empirical analysis reveals critical limitations: label-free RL is highly dependent on the base model's pre-existing reasoning capability, with performance often degrading below baseline levels for weaker models. We find that smaller models fail to generate sufficiently long or diverse chain-of-thought reasoning to enable effective self-reflection, and that training data difficulty plays a crucial role in determining success. To address these challenges, we propose a simple yet effective method for label-free RL that uses curriculum learning to progressively introduce harder problems and masks no-majority rollouts during training. Additionally, we introduce a data curation pipeline to generate samples with predefined difficulty. Our approach demonstrates consistent improvements across all model sizes and reasoning capabilities, providing a path toward more robust unsupervised RL that can bootstrap reasoning abilities in resource-constrained models. Our code is available at https://github.com/BorealisAI/CuMa
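The two training ingredients can be illustrated with a minimal sketch, assuming the label-free reward is a majority-vote pseudo-label over a group of rollouts (a common choice in label-free RL, not confirmed here as the authors' exact reward) and that each problem already carries a difficulty score. `majority_pseudo_rewards` and `curriculum` are hypothetical names.

```python
from collections import Counter
from typing import List, Optional, Sequence, Tuple


def majority_pseudo_rewards(answers: Sequence[str]) -> Optional[List[float]]:
    """Label-free reward: 1.0 for rollouts that match the majority answer.

    Returns None when no answer wins a strict majority, signalling that this
    group of rollouts should be masked out of the policy update.
    """
    if not answers:
        return None
    top_answer, top_count = Counter(answers).most_common(1)[0]
    if 2 * top_count <= len(answers):
        return None  # no strict majority: mask the whole group
    return [1.0 if a == top_answer else 0.0 for a in answers]


def curriculum(problems: Sequence[Tuple[str, float]]) -> List[str]:
    """Order problems from easiest to hardest by a precomputed difficulty score."""
    return [name for name, _ in sorted(problems, key=lambda p: p[1])]


print(majority_pseudo_rewards(["4", "4", "5"]))       # [1.0, 1.0, 0.0]
print(majority_pseudo_rewards(["4", "5", "4", "5"]))  # None
print(curriculum([("hard", 0.9), ("easy", 0.1), ("mid", 0.5)]))  # ['easy', 'mid', 'hard']
```

Masking matters most for weak models: when their rollouts disagree, any "majority" is close to noise, and rewarding it would reinforce arbitrary answers.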
Learning to reason about rare diseases through retrieval-augmented agents
Kim, Ha Young, Li, Jun, Solana, Ana Beatriz, Pirkl, Carolin M., Wiestler, Benedikt, Schnabel, Julia A., Bercea, Cosmin I.
Rare diseases represent the long tail of medical imaging, where AI models often fail due to the scarcity of representative training data. In clinical workflows, radiologists frequently consult case reports and literature when confronted with unfamiliar findings. Following this line of reasoning, we introduce RADAR, Retrieval Augmented Diagnostic Reasoning Agents, an agentic system for rare disease detection in brain MRI. Our approach gives AI agents access to external medical knowledge by embedding both case reports and literature using sentence transformers and indexing them with FAISS to enable efficient similarity search. The agent retrieves clinically relevant evidence to guide diagnostic decision making on unseen diseases, without the need for additional training. Designed as a model-agnostic reasoning module, RADAR can be seamlessly integrated with diverse large language models, consistently improving their rare pathology recognition and interpretability. On the NOVA dataset comprising 280 distinct rare diseases, RADAR achieves up to a 10.2% performance gain, with the strongest improvements observed for open-source models such as DeepSeek. Beyond accuracy, the retrieved examples provide interpretable, literature-grounded explanations, highlighting retrieval-augmented reasoning as a powerful paradigm for low-prevalence conditions in medical imaging.
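The retrieval step can be pictured as a cosine-similarity search over embedded documents. This pure-Python sketch stands in for the sentence-transformer embeddings and FAISS index the abstract mentions: the toy 2-D vectors and the `top_k_cosine` helper are hypothetical, and a real system would embed text with a sentence-transformer model and would typically search unit-normalized vectors with an inner-product FAISS index.

```python
import math
from typing import List, Sequence, Tuple


def normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so that dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def top_k_cosine(
    query: Sequence[float], corpus: Sequence[Sequence[float]], k: int
) -> List[Tuple[int, float]]:
    """Return (document index, cosine similarity) for the k most similar documents."""
    q = normalize(query)
    scored = [
        (idx, sum(a * b for a, b in zip(q, normalize(doc))))
        for idx, doc in enumerate(corpus)
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Hypothetical embeddings of three case reports and one query finding.
case_reports = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
hits = top_k_cosine([1.0, 0.1], case_reports, k=2)
print([idx for idx, _ in hits])  # [0, 2]
```

The retrieved indices would then map back to case-report text that is placed in the agent's context, so the model can cite the evidence it relied on.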
Every Activation Boosted: Scaling General Reasoner to 1 Trillion Open Language Foundation
Ling Team, Li, Ang, Liu, Ben, Hu, Binbin, Li, Bing, Zeng, Bingwei, Ye, Borui, Tang, Caizhi, Tian, Changxin, Huang, Chao, Zhang, Chao, Qian, Chen, Ju, Chenchen, Li, Chenchen, Tang, Chengfu, Fu, Chilin, Ren, Chunshao, Wu, Chunwei, Zhang, Cong, Peng, Cunyin, Xu, Dafeng, Wang, Daixin, Zhang, Dalong, Jin, Dingnan, Zhu, Dingyuan, Hu, Dongke, Zhao, Fangzheng, Wu, Feifan, Zhu, Feng, Wang, Gangshan, Zhang, Haitao, Zhao, Hailin, Zhang, Hanxiao, Wang, Hanzi, Qian, Hao, Yu, Haoyi, Zhang, Heng, Zhang, Hongliang, Luan, Hongzhi, Dong, Huirong, Li, Huizhong, Li, Jia, Liu, Jia, Zhu, Jialong, Sha, Jian, Wei, Jianping, Yang, Jiaolong, Ma, Jieyue, Wu, Jiewei, Huang, Jinjing, Tian, Jingyun, Zhang, Jingyuan, Sun, Jinquan, Tu, Juanhui, Liu, Jun, Xu, Jun, Zhou, Jun, Ou, Junjie, Fang, Junpeng, Zhang, Kaihong, Hu, Kaiqin, Shi, Ke, Tang, Kun, Chen, Kunlong, Mei, Lanyin, Liang, Lei, Xu, Lei, Zhang, Libo, Ju, Lin, Yuan, Lin, Zhong, Ling, Ma, Lintao, Liu, Lu, Yu, Lu, Cai, Lun, Zhu, Meiqi, Li, Mengying, Chen, Min, Xue, Minghao, Cai, Minghong, Yin, Mingming, Jiang, Peijie, Zhao, Peilong, Liu, Pingping, Zhao, Qian, Cui, Qing, Huang, Qingxiang, Yang, Qingyuan, Yu, Quankun, Wei, Shaowei, Lian, Shijie, Zheng, Shoujian, Song, Shun, Zhang, Shungen, Zhang, Shuo, Li, Siyuan, Liu, Song, Guo, Ting, Zhao, Tong, Gu, Wanli, Wu, Weichang, Han, Weiguang, Fang, Wenjing, Wang, Wubin, Shu, Xiang, Shi, Xiao, Lan, Xiaoshun, Zhang, Xiaolu, Sun, Xiaqing, Zhao, Xin, Lu, Xingyu, Xu, Xiong, Wang, Xudong, Wang, Xudong, Yang, Xuemin, Yang, Yajie, Xiang, Yang, Li, Yanzhe, Zhang, Yi, Wang, Yilong, Li, Yingxue, Guo, Yongzhen, Fu, Yuzhuo, Wang, Yuanyuan, Yang, Yue, Yu, Yue, Deng, Yufeng, Zhang, Yun, Yu, Yunfei, Zhang, Yuqi, He, Yuxiao, Gui, Zengke, Huan, Zhaoxin, Wang, Zhaoyang, Zhu, Zhibo, Wang, Zhihao, Zhang, Zhiqiang, Wang, Zhoufei, Zeng, Zihang, Liu, Ziqi, Xuan, Zitao, Tang, Zuoli
We introduce Ling 2.0, a series of reasoning-oriented language foundation models built upon the principle that every activation boosts reasoning capability. Designed to scale from tens of billions to one trillion parameters under a unified Mixture-of-Experts (MoE) paradigm, Ling 2.0 emphasizes high sparsity, cross-scale consistency, and efficiency guided by empirical scaling laws. The series includes three non-thinking (instruct) models, Ling-mini-2.0, Ling-flash-2.0, and Ling-1T, ranging from 16B to 1T total parameters and achieving up to 7-fold active-compute efficiency compared with dense counterparts. Ling 2.0 integrates coordinated innovations across model architecture, pre-training, post-training, and infrastructure: a high-sparsity MoE with multi-token prediction (MTP) for efficient reasoning, reasoning-oriented data and mid-training CoT activation, reinforcement-based fine-tuning (DFT, Evo-CoT), and full-scale FP8 training with fine-grained heterogeneous pipelines. At the trillion scale, Ling-1T establishes a new Pareto frontier of reasoning accuracy versus computational efficiency, demonstrating that sparse activation, when properly aligned with reasoning objectives, enables scalable and efficient intelligence. Collectively, Ling 2.0 provides a coherent, open, and efficient foundation for advancing future reasoning and thinking models, including the Ring series built upon the same base.
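Sparsity in an MoE comes from routing each token to only a few experts. The sketch below is a generic top-k softmax router, not Ling 2.0's actual routing scheme, and the 256-expert, top-8 numbers in the usage line are illustrative assumptions rather than the models' configuration. `active_expert_fraction` counts only routed-expert parameters, so it does not reproduce the whole-model 7-fold figure quoted above.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(router_logits: Sequence[float], k: int) -> List[Tuple[int, float]]:
    """Pick the k highest-probability experts and renormalize their gate weights."""
    probs = softmax(router_logits)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    mass = sum(probs[i] for i in chosen)
    return [(i, probs[i] / mass) for i in chosen]


def active_expert_fraction(num_experts: int, k: int) -> float:
    """Share of routed-expert parameters used per token.

    Ignores attention, embeddings and any shared experts, which are always active.
    """
    return k / num_experts


routes = top_k_route([2.0, 0.0, 1.0, -1.0], k=2)
print([i for i, _ in routes])          # [0, 2]
print(active_expert_fraction(256, 8))  # 0.03125 (illustrative numbers)
```

Each token's output is the gate-weighted sum of its chosen experts' outputs, so per-token compute scales with k rather than with the total number of experts.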
HugAgent: Benchmarking LLMs for Simulation of Individualized Human Reasoning
Li, Chance Jiajie, Mo, Zhenze, Tang, Yuhan, Qu, Ao, Wu, Jiayi, Zhao, Kaiya Ivy, Gan, Yulu, Fan, Jie, Yu, Jiangbo, Jiang, Hang, Liang, Paul Pu, Zhao, Jinhua, Pastor, Luis Alberto Alonso, Larson, Kent
Simulating human reasoning in open-ended tasks has long been a central aspiration in AI and cognitive science. While large language models now approximate human responses at scale, they remain tuned to population-level consensus, often erasing the individuality of reasoning styles and belief trajectories. To advance the vision of more human-like reasoning in machines, we introduce HugAgent (Human-Grounded Agent Benchmark), which rethinks human reasoning simulation along three dimensions: (i) from averaged to individualized reasoning, (ii) from behavioral mimicry to cognitive alignment, and (iii) from vignette-based to open-ended data. The benchmark evaluates whether a model can predict a specific person's behavioral responses and the underlying reasoning dynamics in out-of-distribution scenarios, given partial evidence of their prior views. HugAgent adopts a dual-track design: a human track that automates and scales the think-aloud method to collect ecologically valid human reasoning data, and a synthetic track for further scalability and systematic stress testing. This architecture enables low-cost, extensible expansion to new tasks and populations. Experiments with state-of-the-art language models reveal persistent adaptation gaps, positioning HugAgent as the first extensible benchmark for aligning machine reasoning with the individuality of human thought. The benchmark, along with its complete data collection pipeline and companion chatbot, is open-sourced as HugAgent (https://anonymous.4open.science/r/HugAgent) and TraceYourThinking (https://anonymous.4open.science/r/trace-your-thinking).
Amortized Latent Steering: Low-Cost Alternative to Test-Time Optimization
Egbuna, Nathan, Gaur, Saatvik, Dev, Sunishchal, Panda, Ashwinee, Chaudhary, Maheep
Test-time optimization remains impractical at scale due to prohibitive inference costs: techniques like iterative refinement and multi-step verification can require $10-100\times$ more compute per query than standard decoding. Latent space test-time optimization methods like LatentSeek offer a more direct approach by steering hidden representations, but still demand expensive per-query optimization loops with multiple backward passes. We propose Amortized Latent Steering (ALS), which collapses this iterative optimization into a single offline-computed vector applied at constant cost during inference. ALS computes the mean difference between hidden states from successful versus unsuccessful generations, then uses this direction to calibrate the model's hidden representations: when decoding drifts away from the success manifold, ALS nudges activations back toward it. On the GSM8K and MATH-500 benchmarks, ALS achieves a $2-5\times$ speedup over iterative methods while matching or surpassing greedy Chain-of-Thought (CoT) and Self-Consistency baselines, yielding up to a 101% improvement in the efficiency-accuracy trade-off. These results show that much of latent optimization's benefit can be captured offline, making sophisticated reasoning techniques viable for production deployment. Code is available at https://github.com/negbuna/ALS.
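The offline and online halves of ALS can be sketched with plain lists standing in for hidden states. The difference-of-means direction follows the abstract; the drift test (cosine below a threshold) and the step size `alpha` are assumptions made for illustration, and the helper names are hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]


def mean_vector(vectors: Sequence[Sequence[float]]) -> Vector:
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def steering_direction(success: Sequence[Sequence[float]],
                       failure: Sequence[Sequence[float]]) -> Vector:
    """Offline step: mean hidden state of successful generations minus unsuccessful ones."""
    return [s - f for s, f in zip(mean_vector(success), mean_vector(failure))]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity, defined as 0.0 if either vector is zero."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0 or nb == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def steer(hidden: Sequence[float], direction: Sequence[float],
          alpha: float = 0.5, threshold: float = 0.0) -> Vector:
    """Online step at constant cost (assumed drift rule): if the hidden state
    points away from the success direction, add `alpha * direction`."""
    if cosine(hidden, direction) < threshold:
        return [h + alpha * d for h, d in zip(hidden, direction)]
    return list(hidden)


direction = steering_direction([[1.0, 0.0], [3.0, 0.0]], [[0.0, 1.0], [0.0, 3.0]])
print(direction)                     # [2.0, -2.0]
print(steer([0.0, 1.0], direction))  # [1.0, 0.0] (drifted, so nudged)
print(steer([1.0, 0.0], direction))  # [1.0, 0.0] (aligned, unchanged)
```

The inference cost is one dot product and at most one vector addition per steered position, with no backward passes, which is the source of the speedup over per-query optimization loops.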
How do Transformers Learn Implicit Reasoning?
Ye, Jiaran, Yao, Zijun, Huang, Zhidian, Pan, Liangming, Liu, Jinxin, Bai, Yushi, Xin, Amy, Liu, Weichuan, Che, Xiaoyin, Hou, Lei, Li, Juanzi
Recent work suggests that large language models (LLMs) can perform multi-hop reasoning implicitly (producing correct answers without explicitly verbalizing intermediate steps), but the underlying mechanisms remain poorly understood. In this paper, we study how such implicit reasoning emerges by training transformers from scratch in a controlled symbolic environment. Our analysis reveals a three-stage developmental trajectory: early memorization, followed by in-distribution generalization, and eventually cross-distribution generalization. We find that training with atomic triples is not necessary but accelerates learning, and that second-hop generalization relies on query-level exposure to specific compositional structures. To interpret these behaviors, we introduce two diagnostic tools: cross-query semantic patching, which identifies semantically reusable intermediate representations, and a cosine-based representational lens, which reveals that successful reasoning correlates with cosine-based clustering in hidden space. This clustering phenomenon in turn provides a coherent explanation for the behavioral dynamics observed across training, linking representational structure to reasoning capability. These findings provide new insights into the interpretability of implicit multi-hop reasoning in LLMs, helping to clarify how complex reasoning processes unfold internally and offering pathways to enhance the transparency of such models.
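One way to quantify "cosine-based clustering" is to compare the average cosine similarity of hidden states that share an intermediate (bridge) entity with that of states that do not. The sketch below is a hypothetical stand-in for the authors' representational lens: `clustering_gap` and the toy 2-D vectors are assumptions for illustration only.

```python
import itertools
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity, defined as 0.0 if either vector is zero."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0 or nb == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def clustering_gap(reps: Sequence[Sequence[float]], labels: Sequence[str]) -> float:
    """Mean cosine similarity of same-label pairs minus that of different-label pairs.

    A large positive gap means hidden states sharing an intermediate entity
    form tight, well-separated clusters.
    """
    intra, inter = [], []
    for i, j in itertools.combinations(range(len(reps)), 2):
        sim = cosine(reps[i], reps[j])
        (intra if labels[i] == labels[j] else inter).append(sim)
    if not intra or not inter:
        raise ValueError("need at least one same-label and one different-label pair")
    return sum(intra) / len(intra) - sum(inter) / len(inter)


reps = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [0.1, 1.0]]
print(clustering_gap(reps, ["a", "a", "b", "b"]))  # large positive gap: labels match clusters
print(clustering_gap(reps, ["a", "b", "a", "b"]))  # negative gap: labels cut across clusters
```

Tracking such a score across training checkpoints is one way to relate the emergence of clustered representations to the memorization and generalization stages described above.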