AITopics | thinker

Collaborating Authors

thinker

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Learning to Think from Multiple Thinkers

Joshi, Nirmit, Magen, Roey, Srebro, Nathan, Tsilivis, Nikolaos, Vardi, Gal

arXiv.org Machine LearningApr-28-2026

We study learning with Chain-of-Thought (CoT) supervision from multiple thinkers, all of whom provide correct but possibly systematically different solutions, e.g., step-by-step solutions to math problems written by different thinkers, or step-by-step execution traces of different programs solving the same problem. We consider classes that are computationally easy to learn using CoT supervision from a single thinker, but hard to learn with only end-result supervision, i.e., without CoT (Joshi et al. 2025). We establish that, under cryptographic assumptions, learning can be hard from CoT supervision provided by two or a few different thinkers, in passive data-collection settings. On the other hand, we provide a generic computationally efficient active learning algorithm that learns with a small amount of CoT data per thinker that is completely independent of the target accuracy $\varepsilon$, a moderate number of thinkers that scales as $\log \frac{1}{\varepsilon}\log \log \frac{1}{\varepsilon}$, and sufficient passive end-result data that scales as $\frac{1}{\varepsilon}\cdot poly\log\frac{1}{\varepsilon}$.

large language model, machine learning, thinker, (19 more...)

arXiv.org Machine Learning

2604.24737

Country:

North America > United States (0.45)
Africa (0.27)

Genre:

Workflow (1.00)
Research Report > New Finding (0.45)

Industry: Education (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (0.93)

Add feedback

Thinker: Learning to Plan and Act

Neural Information Processing SystemsDec-24-2025, 23:56:42 GMT

We propose the Thinker algorithm, a novel approach that enables reinforcement learning agents to autonomously interact with and utilize a learned world model. The Thinker algorithm wraps the environment with a world model and introduces new actions designed for interacting with the world model. These model-interaction actions enable agents to perform planning by proposing alternative plans to the world model before selecting a final action to execute in the environment. This approach eliminates the need for handcrafted planning algorithms by enabling the agent to learn how to plan autonomously and allows for easy interpretation of the agent's plan with visualization. We demonstrate the algorithm's effectiveness through experimental results in the game of Sokoban and the Atari 2600 benchmark, where the Thinker algorithm achieves state-of-the-art performance and competitive results, respectively. Visualizations of agents trained with the Thinker algorithm demonstrate that they have learned to plan effectively with the world model to select better actions. Thinker is the first work showing that an RL agent can learn to plan with a learned world model in complex environments.

agent, thinker, world model, (5 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.98)

Add feedback

Decouple to Generalize: Context-First Self-Evolving Learning for Data-Scarce Vision-Language Reasoning

Li, Tingyu, Sun, Zheng, Wei, Jingxuan, Li, Siyuan, He, Conghui, Wu, Lijun, Tan, Cheng

arXiv.org Artificial IntelligenceDec-9-2025

Recent vision-language models (VLMs) achieve remarkable reasoning through reinforcement learning (RL), which provides a feasible solution for realizing continuous self-evolving large vision-language models (LVLMs) in the era of experience. However, RL for VLMs requires abundant high-quality multimodal data, especially challenging in specialized domains like chemistry, earth sciences, and multimodal mathematics. Existing strategies such as synthetic data and self-rewarding mechanisms suffer from limited distributions and alignment difficulties, ultimately causing reward hacking: models exploit high-reward patterns, collapsing policy entropy and destabilizing training. We propose DoGe (Decouple to Generalize), a dual-decoupling framework that guides models to first learn from context rather than problem solving by refocusing on the problem context scenarios overlooked by synthetic data methods. By decoupling learning process into dual components (Thinker and Solver), we reasonably quantify the reward signals of this process and propose a two-stage RL post-training approach from freely exploring context to practically solving tasks. Second, to increase the diversity of training data, DoGe constructs an evolving curriculum learning pipeline: an expanded native domain knowledge corpus and an iteratively evolving seed problems pool. Experiments show that our method consistently outperforms the baseline across various benchmarks, providing a scalable pathway for realizing self-evolving LVLMs.

arxiv preprint arxiv, large language model, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2512.06835

Country: North America > United States (0.28)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Thinker: Training LLMs in Hierarchical Thinking for Deep Search via Multi-Turn Interaction

Xu, Jun, Du, Xinkai, Ao, Yu, Zhao, Peilong, Li, Yang, Zhong, Ling, Yuan, Lin, Bo, Zhongpu, Wang, Xiaorui, Sun, Mengshu, Gui, Zhengke, Zhang, Dalong, Wang, Zhaoyang, Wang, Qiwei, Hou, Yangyang, Yin, Zhiying, Wang, Haofen, Chen, Huajun, Liang, Lei, Zhou, Jun

arXiv.org Artificial IntelligenceNov-17-2025

Efficient retrieval of external knowledge bases and web pages is crucial for enhancing the reasoning abilities of LLMs. Previous works on training LLMs to leverage external retrievers for solving complex problems have predominantly employed end-to-end reinforcement learning. However, these approaches neglect supervision over the reasoning process, making it difficult to guarantee logical coherence and rigor. To address these limitations, we propose Thinker, a hierarchical thinking model for deep search through multi-turn interaction, making the reasoning process supervisable and verifiable. It decomposes complex problems into independently solvable sub-problems, each dually represented in both natural language and an equivalent logical function to support knowledge base and web searches. Concurrently, dependencies between sub-problems are passed as parameters via these logical functions, enhancing the logical coherence of the problem-solving process. To avoid unnecessary external searches, we perform knowledge boundary determination to check if a sub-problem is within the LLM's intrinsic knowledge, allowing it to answer directly. Experimental results indicate that with as few as several hundred training samples, the performance of Thinker is competitive with established baselines. Furthermore, when scaled to the full training set, Thinker significantly outperforms these methods across various datasets and model sizes. The source code is available at https://github.com/OpenSPG/KAG-Thinker.

computational linguistic, large language model, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2511.07943

Country:

Europe (1.00)
North America > United States (0.67)
Asia > Middle East > UAE (0.46)

Genre:

Research Report (0.81)
Workflow (0.68)

Industry:

Leisure & Entertainment (1.00)
Information Technology > Security & Privacy (1.00)
Health & Medicine (1.00)
Media > Film (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (1.00)
(2 more...)

Add feedback

VocalNet-M2: Advancing Low-Latency Spoken Language Modeling via Integrated Multi-Codebook Tokenization and Multi-Token Prediction

Wang, Yuhao, Cheng, Ziyang, Liu, Heyang, Wu, Ronghua, Gu, Qunshan, Wang, Yanfeng, Wang, Yu

arXiv.org Artificial IntelligenceNov-14-2025

Current end-to-end spoken language models (SLMs) have made notable progress, yet they still encounter considerable response latency. This delay primarily arises from the autoregressive generation of speech tokens and the reliance on complex flow-matching models for speech synthesis. To overcome this, we introduce VocalNet-M2, a novel low-latency SLM that integrates a multi-codebook tokenizer and a multi-token prediction (MTP) strategy. Our model directly generates multi-codebook speech tokens, thus eliminating the need for a latency-inducing flow-matching model. Furthermore, our MTP strategy enhances generation efficiency and improves overall performance. Extensive experiments demonstrate that VocalNet-M2 achieves a substantial reduction in first chunk latency (from approximately 725ms to 350ms) while maintaining competitive performance across mainstream SLMs. This work also provides a comprehensive comparison of single-codebook and multi-codebook strategies, offering valuable insights for developing efficient and high-performance SLMs for real-time interactive applications.

arxiv preprint arxiv, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2511.10232

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Speech (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

MDD-Thinker: Towards Large Reasoning Models for Major Depressive Disorder Diagnosis

Sha, Yuyang, Pan, Hongxin, Luo, Gang, Shi, Caijuan, Wang, Jing, Li, Kefeng

arXiv.org Artificial IntelligenceSep-30-2025

Background Major depressive disorder (MDD) is a leading cause of global disability, yet current diagnostic approaches often rely on subjective assessments and lack the ability to integrate multimodal clinical information. Large language models (LLMs) hold promise for enhancing diagnostic accuracy through advanced reasoning but face challenges in interpretability, hallucination, and reliance on synthetic data. Methods We developed MDD-Thinker, an LLM-based diagnostic framework that integrates supervised fine-tuning (SFT) with reinforcement learning (RL) to strengthen reasoning ability and interpretability. Using the UK Biobank dataset, we generated 40,000 reasoning samples, supplemented with 10,000 samples from publicly available mental health datasets. The model was fine-tuned on these reasoning corpora, and its diagnostic and reasoning performance was evaluated against machine learning, deep learning, and state-of-the-art LLM baselines. Findings MDD-Thinker achieved an accuracy of 0.8268 and F1-score of 0.8081, significantly outperforming traditional baselines such as SVM and MLP, as well as general-purpose LLMs. Incorporating both SFT and RL yielded the greatest improvements, with relative gains of 29.0% in accuracy, 38.1% in F1-score, and 34.8% in AUC. Moreover, the model demonstrated comparable reasoning performance compared to much larger LLMs, while maintaining computational efficiency. Interpretation This study presents the first reasoning-enhanced LLM framework for MDD diagnosis trained on large-scale real-world clinical data. By integrating SFT and RL, MDD-Thinker balances accuracy, interpretability, and efficiency, offering a scalable approach for intelligent psychiatric diagnostics. These findings suggest that reasoning-oriented LLMs can provide clinically reliable support for MDD detection and may inform broader applications in mental health care.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2509.24217

Country: Asia > China (0.68)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Health & Medicine > Therapeutic Area > Psychiatry/Psychology > Mental Health (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

ChatGPT produces more "lazy" thinkers: Evidence of cognitive engagement decline

Georgiou, Georgios P.

arXiv.org Artificial IntelligenceJul-2-2025

Despite the increasing use of large language models (LLMs) in education, concerns have emerged about their potential to reduce deep thinking and active learning. This study investigates the impact of generative artificial intelligence (AI) tools, specifically ChatGPT, on the cognitive engagement of students during academic writing tasks. The study employed an experimental design with participants randomly assigned to either an AI-assisted (ChatGPT) or a non-assisted (control) condition. Participants completed a structured argumentative writing task followed by a cognitive engagement scale (CES), the CES-AI, developed to assess mental effort, attention, deep processing, and strategic thinking. The results revealed significantly lower cognitive engagement scores in the ChatGPT group compared to the control group. These findings suggest that AI assistance may lead to cognitive offloading. The study contributes to the growing body of literature on the psychological implications of AI in education and raises important questions about the integration of such tools into academic practice. It calls for pedagogical strategies that promote active, reflective engagement with AI-generated content to avoid compromising self-regulated learning and deep cognitive involvement of students.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2507.00181

Country:

Europe > Middle East > Cyprus > Nicosia > Nicosia (0.05)
Europe > Switzerland (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Education > Educational Setting (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.50)

Add feedback

Group Think: Multiple Concurrent Reasoning Agents Collaborating at Token Level Granularity

Hsu, Chan-Jan, Buffelli, Davide, McGowan, Jamie, Liao, Feng-Ting, Chen, Yi-Chang, Vakili, Sattar, Shiu, Da-shan

arXiv.org Artificial IntelligenceMay-19-2025

Recent advances in large language models (LLMs) have demonstrated the power of reasoning through self-generated chains of thought. Multiple reasoning agents can collaborate to raise joint reasoning quality above individual outcomes. However, such agents typically interact in a turn-based manner, trading increased latency for improved quality. In this paper, we propose Group Think--a single LLM that acts as multiple concurrent reasoning agents, or thinkers. With shared visibility into each other's partial generation progress, Group Think introduces a new concurrent-reasoning paradigm in which multiple reasoning trajectories adapt dynamically to one another at the token level. For example, a reasoning thread may shift its generation mid-sentence upon detecting that another thread is better positioned to continue. This fine-grained, token-level collaboration enables Group Think to reduce redundant reasoning and improve quality while achieving significantly lower latency. Moreover, its concurrent nature allows for efficient utilization of idle computational resources, making it especially suitable for edge inference, where very small batch size often underutilizes local~GPUs. We give a simple and generalizable modification that enables any existing LLM to perform Group Think on a local GPU. We also present an evaluation strategy to benchmark reasoning latency and empirically demonstrate latency improvements using open-source LLMs that were not explicitly trained for Group Think. We hope this work paves the way for future LLMs to exhibit more sophisticated and more efficient collaborative behavior for higher quality generation.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2505.11107

Country:

North America > United States (0.04)
Europe (0.04)
Asia > Thailand > Bangkok > Bangkok (0.04)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Qwen2.5-Omni Technical Report

Xu, Jin, Guo, Zhifang, He, Jinzheng, Hu, Hangrui, He, Ting, Bai, Shuai, Chen, Keqin, Wang, Jialin, Fan, Yang, Dang, Kai, Zhang, Bin, Wang, Xiong, Chu, Yunfei, Lin, Junyang

arXiv.org Artificial IntelligenceMar-26-2025

In this report, we present Qwen2.5-Omni, an end-to-end multimodal model designed to perceive diverse modalities, including text, images, audio, and video, while simultaneously generating text and natural speech responses in a streaming manner. To enable the streaming of multimodal information inputs, both audio and visual encoders utilize a block-wise processing approach. To synchronize the timestamps of video inputs with audio, we organize the audio and video sequentially in an interleaved manner and propose a novel position embedding approach, named TMRoPE(Time-aligned Multimodal RoPE). To concurrently generate text and speech while avoiding interference between the two modalities, we propose \textbf{Thinker-Talker} architecture. In this framework, Thinker functions as a large language model tasked with text generation, while Talker is a dual-track autoregressive model that directly utilizes the hidden representations from the Thinker to produce audio tokens as output. Both the Thinker and Talker models are designed to be trained and inferred in an end-to-end manner. For decoding audio tokens in a streaming manner, we introduce a sliding-window DiT that restricts the receptive field, aiming to reduce the initial package delay. Qwen2.5-Omni is comparable with the similarly sized Qwen2.5-VL and outperforms Qwen2-Audio. Furthermore, Qwen2.5-Omni achieves state-of-the-art performance on multimodal benchmarks like Omni-Bench. Notably, Qwen2.5-Omni's performance in end-to-end speech instruction following is comparable to its capabilities with text inputs, as evidenced by benchmarks such as MMLU and GSM8K. As for speech generation, Qwen2.5-Omni's streaming Talker outperforms most existing streaming and non-streaming alternatives in robustness and naturalness.

large language model, machine learning, qwen2, (18 more...)

arXiv.org Artificial Intelligence

2503.20215

Country: