AITopics

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.35)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.35)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.35)

Neural Information Processing SystemsJun-14-2026, 06:12:31 GMT

On Learning Verifiers and Implications to Chain-of-Thought Reasoning

Chain-of-Thought reasoning has emerged as a powerful approach for solving complex mathematical and logical problems. However, it can often veer off track through incorrect or unsubstantiated inferences. Formal mathematical reasoning, which can be checked with a formal verifier, is one approach to addressing this issue. However, currently LLMs are simply not good enough to solve complex problems in a formal way, and even just formalizing an informal problem statement can be challenging. Motivated by this fact, in this work we consider the problem of learning reliable verifiers for sequential reasoning, including natural language Chain-of-Thought reasoning. That is, given a problem statement and step-by-step solution in natural language, the aim of the verifier is to output [Yes] if the reasoning steps in the solution are all valid, and [No] otherwise. In this work we give a formal PAC-learning framework for studying this problem. We propose and analyze several natural verification goals, at different levels of strength, in this framework. We provide sample complexity upper-bounds for learning verifiers satisfying these goals, as well as lower-bound and impossibility results for learning other natural verification objectives without additional assumptions.

artificial intelligence, machine learning, natural language, (7 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language (0.83)
Information Technology > Artificial Intelligence > Machine Learning (0.77)

Neural Information Processing SystemsApr-30-2026, 01:33:09 GMT

e0af79ad53a336b4c4b4f7e2a68eb609-Paper-Conference.pdf

Humans have a powerful and mysterious capacity to reason. Working through a set of mental steps enables us to make inferences we would not be capable of making directly even though we get no additional data from the world. Similarly, when large language models generate intermediate steps (a chain of thought) before answering a question, they often produce better answers than they would directly. We investigate why and how chain-of-thought reasoning is useful in language models, testing the hypothesis that reasoning is effective when training data consists of overlapping local clusters of variables that influence each other strongly. These training conditions enable the chaining of accurate local inferences to estimate relationships between variables that were not seen together in training.

large language model, machine learning, natural language, (19 more...)

Country: North America > United States > California > Santa Clara County (0.14)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.88)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.34)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.34)

Neural Information Processing SystemsFeb-17-2026, 13:58:13 GMT

e0af79ad53a336b4c4b4f7e2a68eb609-Paper-Conference.pdf

Humans have a powerful and mysterious capacity to reason.

large language model, machine learning, natural language, (19 more...)

Country:

North America > United States > California > Santa Clara County > Stanford (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
Europe > France (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.34)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.34)

Neural Information Processing SystemsFeb-11-2026, 06:26:21 GMT

46c2a9a6f2b2be68682013eb1173c801-Paper-Conference.pdf

make wood pickaxe requirement, player observation step 32 8, specific interaction requirement, (16 more...)

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)

Genre:

Research Report (0.87)
Workflow (0.68)

Industry:

Materials > Metals & Mining > Iron (1.00)
Materials > Metals & Mining > Coal (1.00)
Health & Medicine > Consumer Health (1.00)
(5 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.45)

Neural Information Processing SystemsDec-26-2025, 01:25:58 GMT

Deductive Verification of Chain-of-Thought Reasoning

While CoT allows models to produce more comprehensive reasoning processes, its emphasis on intermediate reasoning steps can inadvertently introduce hallucinations and accumulated errors, thereby limiting models' ability to solve complex reasoning tasks. Inspired by how humans engage in careful and meticulous deductive logical reasoning processes to solve tasks, we seek to enable language models to perform explicit and rigorous deductive reasoning, and also ensure the trustworthiness of their reasoning process through self-verification. However, directly verifying the validity of an entire deductive reasoning process is challenging, even with advanced models like ChatGPT. In light of this, we propose to decompose a reasoning verification process into a series of step-by-step subprocesses, each only receiving their necessary context and premises. To facilitate this procedure, we propose Natural Program, a natural language-based deductive reasoning format. Our approach enables models to generate precise reasoning steps where subsequent steps are more rigorously grounded on prior steps. It also empowers language models to carry out reasoning self-verification in a step-by-step manner. By integrating this verification process into each deductive reasoning stage, we significantly enhance the rigor and trustfulness of generated reasoning steps. Along this process, we also improve the answer correctness on complex reasoning tasks.

chain-of-thought reasoning, deductive verification, reasoning process, (7 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)

arXiv.org Artificial IntelligenceDec-10-2025

MM-CoT:A Benchmark for Probing Visual Chain-of-Thought Reasoning in Multimodal Models

Zhang, Jusheng, Cai, Kaitong, Guo, Xiaoyang, Liu, Sidi, Lv, Qinhan, Chen, Ruiqi, Yang, Jing, Fan, Yijia, Sun, Xiaofei, Wang, Jian, Chen, Ziliang, Lin, Liang, Wang, Keze

The ability to perform Chain-of-Thought (CoT) reasoning marks a major milestone for multimodal models (MMs), enabling them to solve complex visual reasoning problems. Y et a critical question remains: is such reasoning genuinely grounded in visual evidence and logically coherent? Existing benchmarks emphasize generation but neglect verification, i.e., the capacity to assess whether a reasoning chain is both visually consistent and logically valid. T o fill this gap, we introduce MM-CoT, a diagnostic benchmark specifically designed to probe the visual grounding and logical coherence of CoT reasoning in MMs. Instead of generating free-form explanations, models must select the sole event chain that satisfies two orthogonal constraints: (i) visual consistency, ensuring all steps are anchored in observable evidence, and (ii) logical coherence, ensuring causal and commonsense validity. Adversarial distractors are engineered to violate one of these constraints, exposing distinct reasoning failures. W e evaluate leading vision-language models on MM-CoT and find that even the most advanced systems struggle, i.e., revealing a sharp discrepancy between generative fluency and true reasoning fidelity. MM-CoT shows low correlation with existing benchmarks, confirming that it measures a unique combination of visual grounding and logical reasoning. This benchmark provides a foundation for developing future models that reason not just plausibly, but faithfully and coherently within the visual world.

large language model, machine learning, natural language, (18 more...)

2512.08228

Country:

Europe > Austria (0.28)
Asia > Middle East > UAE (0.28)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
(2 more...)

arXiv.org Artificial IntelligenceDec-1-2025

From Perception to Reasoning: Deep Thinking Empowers Multimodal Large Language Models

Zhu, Wenxin, Chen, Andong, Song, Yuchen, Chen, Kehai, Zhu, Conghui, Chen, Ziyan, Zhao, Tiejun

With the remarkable success of Multimodal Large Language Models (MLLMs) in perception tasks, enhancing their complex reasoning capabilities has emerged as a critical research focus. Existing models still suffer from challenges such as opaque reasoning paths and insufficient generalization ability. Chain-of-Thought (CoT) reasoning, which has demonstrated significant efficacy in language models by enhancing reasoning transparency and output interpretability, holds promise for improving model reasoning capabilities when extended to the multimodal domain. This paper provides a systematic review centered on "Multimodal Chain-of-Thought" (MCoT). First, it analyzes the background and theoretical motivations for its inception from the perspectives of technical evolution and task demands. Then, it introduces mainstream MCoT methods from three aspects: CoT paradigms, the post-training stage, and the inference stage, while also analyzing their underlying mechanisms. Furthermore, the paper summarizes existing evaluation benchmarks and metrics, and discusses the application scenarios of MCoT. Finally, it analyzes the challenges currently facing MCoT and provides an outlook on its future research directions.

arxiv preprint arxiv, large language model, machine learning, (19 more...)

2511.12861

Country:

Asia > China (0.28)
Europe > Switzerland (0.27)

Genre:

Workflow (1.00)
Overview (1.00)
Research Report > New Finding (0.34)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine (1.00)
Education (0.92)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
(2 more...)

arXiv.org Artificial IntelligenceNov-14-2025

Answering Students' Questions on Course Forums Using Multiple Chain-of-Thought Reasoning and Finetuning RAG-Enabled LLM

Wang, Neo, Singh, Sonit

Abstract--The course forums are increasingly significant and play vital role in facilitating student discussions and answering their questions related to the course. It provides a platform for students to post their questions related to the content and admin issues related to the course. However, there are several challenges due to the increase in the number of students enrolled in the course. The primary challenge is that students' queries cannot be responded immediately and the instructors have to face lots of repetitive questions. T o mitigate these issues, we propose a question answering system based on large language model with retrieval augmented generation (RAG) method. This work focuses on designing a question answering system with open source Large Language Model (LLM) and fine-tuning it on the relevant course dataset. T o further improve the performance, we use a local knowledge base and applied RAG method to retrieve relevant documents relevant to students' queries, where the local knowledge base contains all the course content. T o mitigate the hallucination of LLMs, We also integrate it with multi chain-of-thought reasoning to overcome the challenge of hallucination in LLMs. The experimental results demonstrate that the fine-tuned LLM with RAG method has a strong performance on question answering task. In large university courses, online student forums (such as Moodle and Ed forum) play a crucial role in facilitating student discussions and resolving academic queries. In the beginning, it is possible for course staff to respond to queries in a timely manner. However, with a high volume of posts, many questions become repetitive, leading to delays in response times and an increased burden on instructors.

large language model, machine learning, natural language, (21 more...)

2511.09831

Country: North America > United States > Minnesota (0.28)

Genre: Research Report > New Finding (1.00)

Industry: Education > Educational Setting > Online (0.87)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Jain, Lovenya, Mousavi, Pooneh, Ravanelli, Mirco, Subakan, Cem

Investigating Faithfulness in Large Audio Language Models

arXiv.org Artificial IntelligenceOct-15-2025

Faithfulness measures whether chain-of-thought (CoT) representations accurately reflect a model's decision process and can be used as reliable explanations. Prior work has shown that CoTs from text-based LLMs are often unfaithful. This question has not been explored for large audio-language models (LALMs), where faithfulness is critical for safety-sensitive applications. Reasoning in LALMs is also more challenging, as models must first extract relevant clues from audio before reasoning over them. In this paper, we investigate the faithfulness of CoTs produced by several LALMs by applying targeted interventions, including paraphrasing, filler token injection, early answering, and introducing mistakes, on two challenging reasoning datasets: SAKURA and MMAR. After going through the aforementioned interventions across several datasets and tasks, our experiments suggest that, LALMs generally produce CoTs that appear to be faithful to their underlying decision processes.

artificial intelligence, large language model, natural language, (18 more...)

2509.22363

Country: North America > Canada (0.14)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.55)