Problem Solving
Long Chain-of-Thought Reasoning Across Languages
Barua, Josh, Eisape, Seun, Yin, Kayo, Suhr, Alane
While large reasoning models have shown remarkable ability to generate long chains-of-thought (CoTs) in English, we still lack understanding of how these long-form reasoning abilities transfer to the vast majority of the world's languages. In this work, we systematically investigate four key stages of model development (scaling, pretraining, post-training, and inference) to understand how long CoT capabilities extend beyond English. We compare two reasoning settings across nine non-English target languages: En-CoT, where models process target-language inputs but reason in English; and Target-CoT, where models both process inputs and generate long CoTs in the target language. We find that scaling reasoning model size improves multilingual task performance in En-CoT, but Target-CoT performance lags behind. This gap widens for tasks requiring long, multi-step CoTs such as mathematical reasoning. Shifting to pretraining, we find that adding a specialized reasoning stage enhances En-CoT performance but degrades Target-CoT, whereas broad multilingual pretraining improves both modes simultaneously. Given the scarcity of high-quality reasoning traces in languages other than English, we explore synthetic data curation approaches for post-training. We demonstrate that fine-tuning on reasoning traces automatically translated from gold English traces outperforms fine-tuning on target-language traces distilled from large reasoning models. Finally, we report disparities in inference efficiency between languages and uncover language-specific failure modes in CoTs. We release models, datasets, and code to foster further research.
Play to Generalize: Learning to Reason Through Game Play
Xie, Yunfei, Ma, Yinsong, Lan, Shiyi, Yuille, Alan, Xiao, Junfei, Wei, Chen
Developing reasoning capabilities in multimodal large language models (MLLMs) remains challenging. Motivated by literature suggesting that gameplay promotes transferable reasoning skills, we propose a novel post-training method, Visual Game Learning (ViGaL), where MLLMs develop generalizable reasoning skills through playing arcade-like games. Specifically, we show that training a 7B-parameter MLLM via reinforcement learning (RL) on simple games like Snake significantly enhances the downstream performance on multimodal math benchmarks like MathVista, on multi-discipline questions like MMMU and on 3D spatial reasoning benchmarks like VSI-Bench, without seeing any worked solutions, equations, or diagrams during RL. Remarkably, our model outperforms specialist models post-trained on benchmark-oriented multimodal reasoning data, while preserving the model's performance on general visual benchmarks, a challenge where specialist models often fall short. Our findings suggest that multimodal reasoning can emerge from gameplay, pointing to a promising strategy of designing surrogate tasks for RL post-training.
Reports of the Workshops Held at the 2025 AAAI Conference on Artificial Intelligence
The Workshop Program of the Association for the Advancement of Artificial Intelligence's 39th Conference on Artificial Intelligence (AAAI-25) was held in Philadelphia, Pennsylvania, from February 25 to March 4, 2025. TIKA is envisioned to create an open knowledge resource and serve as a hub for research, education and training on knowledge representation and knowledge engineering. Over 50 AI researchers convened at the workshop over two days. The discussions focused on different aspects of creating an open knowledge resource including foundational knowledge, automated reasoning, knowledge curation, education on knowledge axiomatization, and evaluation of outcomes. The opening discussion confirmed that the idea of curated knowledge, that is, knowledge captured in an expressive formal language that can be explicitly examined and verified by humans, is compelling. It must, however, be situated in the modern context of AI. Such a resource should address the limitations of existing generative ...