AITopics

Technology: Information Technology > Artificial Intelligence (0.40)

Neural Information Processing Systems, Feb-13-2026, 08:19:52 GMT

ReFIR: Grounding Large Restoration Models with Retrieval Augmentation

Recently, diffusion models [10, 11] have emerged as a promising alternative, delivering noteworthy results in real-world image restoration [12, 13, 14].

artificial intelligence, machine learning, natural language, (20 more...)

Country: Asia > China > Guangdong Province > Shenzhen (0.04)

Genre: Research Report (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)

Neural Information Processing Systems, Feb-12-2026, 04:38:21 GMT

Learning Reward Machines for Partially Observable Reinforcement Learning

RL agents learn policies from experience.

artificial intelligence, machine learning, reinforcement learning, (19 more...)

Country:

North America > Canada > Ontario > Toronto (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Asia > Middle East > Jordan (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Neural Information Processing Systems, Feb-12-2026, 04:38:08 GMT

532435c44bec236b471a47a88d63513d-AuthorFeedback.pdf

abstract level, lrm, observable rl, (16 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.31)

arXiv.org Artificial Intelligence, Dec-12-2025

When Models Reason in Your Language: Controlling Thinking Language Comes at the Cost of Accuracy

Qi, Jirui, Chen, Shan, Xiong, Zidi, Fernández, Raquel, Bitterman, Danielle S., Bisazza, Arianna

Recent Large Reasoning Models (LRMs) with thinking traces have shown strong performance on English reasoning tasks. However, their ability to think in other languages is less studied. This capability is as important as answer accuracy for real-world applications because users may find the reasoning trace useful for oversight only when it is expressed in their own language. We comprehensively evaluate two leading families of LRMs on our XReasoning benchmark and find that even the most advanced models often revert to English or produce fragmented reasoning in other languages, revealing a substantial gap in multilingual reasoning. Prompt-based interventions that force models to reason in the user's language improve readability and oversight but reduce answer accuracy, exposing an important trade-off. We further show that targeted post-training on just 100 examples mitigates this mismatch, though some accuracy loss remains. Our results highlight the limited multilingual reasoning capabilities of current LRMs and outline directions for future work. Code and data are available at https://github.com/Betswish/mCoT-XReasoning.

accuracy, large language model, machine learning, (17 more...)

doi: 10.18653/v1/2025.findings-emnlp.1103

2505.22888

Country:

Asia (0.93)
Europe > Bulgaria (0.28)

Genre: Research Report > New Finding (0.48)

Industry: Health & Medicine > Therapeutic Area > Oncology (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.96)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.88)
Information Technology > Artificial Intelligence > Cognitive Science (0.87)

arXiv.org Artificial Intelligence, Dec-2-2025

Beyond SFT: Reinforcement Learning for Safer Large Reasoning Models with Better Reasoning Ability

Jia, Jinghan, Baracaldo, Nathalie, Liu, Sijia

Large reasoning models (LRMs) extend large language models by generating explicit chain-of-thought (CoT) reasoning, significantly improving mathematical and logical problem solving. However, this explicit reasoning process also introduces new safety risks, as unsafe behaviors often emerge within intermediate reasoning trajectories, even when final answers appear harmless. Existing safety alignment approaches primarily rely on supervised fine-tuning (SFT) over safety-oriented long CoT datasets. While intuitive, we find that SFT produces inconsistent safety improvements, degrades reasoning ability, and generalizes poorly across model families. These limitations suggest that purely supervised approaches are insufficient for robust safety alignment in LRMs. To address this, we investigate reinforcement learning (RL) as a complementary optimization framework for LRM safety training. Unlike SFT, RL directly optimizes model policies with reward feedback, enabling more adaptive and stable alignment. Extensive experiments across multiple model families and benchmarks show that RL achieves stronger and more consistent safety gains while maintaining reasoning competence. Further analysis of reflection dynamics and token-level entropy reveals that RL suppresses unsafe exploratory reasoning while preserving reflective depth, leading to safer and more reliable reasoning processes.

arxiv preprint arxiv, large language model, machine learning, (20 more...)

2512.01848

Genre: Research Report > New Finding (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)
(2 more...)

arXiv.org Artificial Intelligence, Dec-2-2025

Probing the "Psyche" of Large Reasoning Models: Understanding Through a Human Lens

Chen, Yuxiang, Wu, Zuohan, Wang, Ziwei, Yu, Xiangning, Li, Xujia, Yang, Linyi, Yang, Mengyue, Wang, Jun, Chen, Lei

Large reasoning models (LRMs) have garnered significant attention from researchers owing to their exceptional capability in addressing complex tasks. Motivated by the observed human-like behaviors in their reasoning processes, this paper introduces a comprehensive taxonomy to characterize atomic reasoning steps and probe the "psyche" of LRM intelligence. Specifically, it comprises five groups and seventeen categories derived from human mental processes, thereby grounding the understanding of LRMs in an interdisciplinary perspective. The taxonomy is then applied for an in-depth understanding of current LRMs, resulting in a distinct labeled dataset that comprises 277,534 atomic reasoning steps. Using this resource, we analyze contemporary LRMs and distill several actionable takeaways for improving training and post-training of reasoning models. Notably, our analysis reveals that prevailing post-answer "double-checks" (self-monitoring evaluations) are largely superficial and rarely yield substantive revisions. Thus, incentivizing comprehensive multi-step reflection, rather than simple self-monitoring, may offer a more effective path forward. To complement the taxonomy, an automatic annotation framework, named CAPO, is proposed to leverage large language models (LLMs) for generating the taxonomy-based annotations. Experimental results demonstrate that CAPO achieves higher consistency with human experts compared to baselines, facilitating a scalable and comprehensive analysis of LRMs from a human cognitive perspective. Together, the taxonomy, CAPO, and the derived insights provide a principled, scalable path toward understanding and advancing LRM reasoning.

large language model, machine learning, natural language, (19 more...)

2512.00729

Country:

Asia > China (0.94)
Europe > United Kingdom > England (0.28)
North America > United States > California (0.28)

Genre: Research Report > New Finding (0.48)

Industry: Education (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

arXiv.org Artificial Intelligence, Nov-26-2025

FlagEval Findings Report: A Preliminary Evaluation of Large Reasoning Models on Automatically Verifiable Textual and Visual Questions

Qin, Bowen, Yue, Chen, Yin, Fang, Wang, Hui, Yao, JG, Liu, Jiakang, Zheng, Jing-Shu, Chen, Miguel Hu, Xuan, Richeng, Meng, Shibei, Zhou, Shiqi, Dai, Teng, Ren, Tong-Shuai, Cui, Wei, Yang, Xi, Du, Xialin, Xu, Xiaojing, Sun, Xue, Li, Xuejing, Liu, Yaming, Liu, Yesheng, Liu, Ying, Lin, Yonghua, Zhao, Yu, Zhang, Yunduo, Luo, Yuwen, He, Zheqi, He, Zhiyuan, Wang, Zhongyuan

We conduct a moderate-scale contamination-free (to some extent) evaluation of current large reasoning models (LRMs) with some preliminary findings. We also release ROME, our evaluation benchmark for vision language models intended to test reasoning from visual clues. We attach links to the benchmark, evaluation data, and other updates on this website: https://flageval-baai.github.io/LRM-Eval/

large language model, machine learning, natural language, (18 more...)

2509.17177

Country:

Europe (1.00)
North America > United States (0.67)

Genre: Research Report > New Finding (1.00)

Industry:

Information Technology (1.00)
Health & Medicine (0.93)
Media > Television (0.67)
(2 more...)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(3 more...)

arXiv.org Artificial Intelligence, Nov-18-2025

Reuse, Don't Recompute: Efficient Large Reasoning Model Inference via Memory Orchestration

Patel, Daivik, Patel, Shrenik

Large reasoning models (LRMs) achieve strong accuracy through test-time scaling, generating longer chains of thought or sampling multiple solutions, but at steep costs in tokens and latency. We argue that memory is a core ingredient for efficient reasoning: when evidence already exists, models should think less by reusing structured memory instead of recomputing derivations. We present ENGRAM-R, an inference-time memory layer that integrates typed retrieval with compact fact card representations and explicit citation control. On the LoCoMo benchmark, ENGRAM-R reduces input tokens by 85% and reasoning tokens by 75% compared to full context while maintaining high accuracy. On a multi-hop slice of the LongMemEval benchmark, it achieves similar efficiency with substantial accuracy gains. These results show that memory is not only critical for long-horizon correctness but also a practical lever for efficient reasoning under tight compute, memory, and latency budgets.

large language model, machine learning, natural language, (18 more...)

2511.12987

Genre: Research Report > New Finding (0.48)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.70)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.68)

arXiv.org Artificial Intelligence, Nov-18-2025

SafeKey: Amplifying Aha-Moment Insights for Safety Reasoning

Zhou, Kaiwen, Zhao, Xuandong, Liu, Gaowen, Srinivasa, Jayanth, Feng, Aosong, Song, Dawn, Wang, Xin Eric

Large Reasoning Models (LRMs) introduce a new generation paradigm of explicitly reasoning before answering, leading to remarkable improvements in complex tasks. However, they pose significant safety risks when faced with harmful queries and adversarial attacks. While supervised fine-tuning (SFT), the mainstream recent safety approach for LRMs, improves safety performance, we find that SFT-aligned models struggle to generalize to unseen jailbreak prompts. After thorough investigation of LRMs' generation, we identify a safety aha moment that can activate safety reasoning and lead to a safe response. This aha moment typically appears in the 'key sentence', which follows the model's query understanding process and can indicate whether the model will proceed safely. Based on these insights, we propose SafeKey, which includes two complementary objectives to better activate the safety aha moment in the key sentence: (1) a Dual-Path Safety Head to enhance the safety signal in the model's internal representations before the key sentence, and (2) a Query-Mask Modeling objective to improve the model's attention to its query understanding, which contains important safety hints. Experiments across multiple safety benchmarks demonstrate that our methods significantly improve safety generalization to a wide range of jailbreak attacks and out-of-distribution harmful prompts, lowering the average harmfulness rate by 9.6%, while maintaining general abilities. Our analysis reveals how SafeKey enhances safety by reshaping internal attention and improving the quality of hidden representations.

arxiv preprint arxiv, large language model, machine learning, (18 more...)

2505.16186

Genre: Research Report (1.00)

Industry:

Banking & Finance (0.94)
Information Technology > Security & Privacy (0.88)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.96)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)