AITopics

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Neural Information Processing Systems Feb-18-2026, 11:32:25 GMT

Self-Discover: Large Language Models Self-Compose Reasoning Structures

Pei Zhou

Full BBH results are in Appendix C Table 3.

large language model, machine learning, natural language, (16 more...)

Country:

North America > United States > California (0.14)
North America > Canada > Ontario > Toronto (0.04)

Genre: Research Report > Experimental Study (0.93)

Industry: Law (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.97)

Neural Information Processing Systems Feb-18-2026, 05:20:37 GMT

Buffer of Thoughts: Thought-Augmented Reasoning with Large Language Models

Ling Yang, Zhaochen Yu

BoT has the potential to surpass the Llama3-70B model.

large language model, machine learning, natural language, (18 more...)

Country: Asia > China > Beijing > Beijing (0.04)

Genre: Research Report > Experimental Study (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.50)

arXiv.org Artificial Intelligence Nov-18-2025

You Don't Need Pre-built Graphs for RAG: Retrieval Augmented Generation with Adaptive Reasoning Structures

Chen, Shengyuan, Zhou, Chuang, Yuan, Zheng, Zhang, Qinggang, Cui, Zeyang, Chen, Hao, Xiao, Yilin, Cao, Jiannong, Huang, Xiao

Large language models (LLMs) often suffer from hallucination, generating factually incorrect statements when handling questions beyond their knowledge and perception. Retrieval-augmented generation (RAG) addresses this by retrieving query-relevant contexts from knowledge bases to support LLM reasoning. Recent advances leverage pre-constructed graphs to capture the relational connections among distributed documents, showing remarkable performance in complex tasks. However, existing Graph-based RAG (GraphRAG) methods rely on a costly process to transform the corpus into a graph, introducing overwhelming token cost and update latency. Moreover, real-world queries vary in type and complexity, requiring different logic structures for accurate reasoning. The pre-built graph may not align with these required structures, resulting in ineffective knowledge retrieval. To this end, we propose a Logic-aware Retrieval-Augmented Generation framework (LogicRAG) that dynamically extracts reasoning structures at inference time to guide adaptive retrieval without any pre-built graph. LogicRAG begins by decomposing the input query into a set of subproblems and constructing a directed acyclic graph (DAG) to model the logical dependencies among them. To support coherent multi-step reasoning, LogicRAG then linearizes the graph using topological sort, so that subproblems can be addressed in a logically consistent order. Besides, LogicRAG applies graph pruning to reduce redundant retrieval and uses context pruning to filter irrelevant context, significantly reducing the overall token cost. Extensive experiments demonstrate that LogicRAG achieves both superior performance and efficiency compared to state-of-the-art baselines.
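
The linearization step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the `linearize` helper and the example subproblems are hypothetical, and the only library call used is `graphlib.TopologicalSorter` from the Python standard library (3.9+). It shows how a dependency DAG over subproblems yields an order in which every subproblem follows its prerequisites.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def linearize(dependencies: dict[str, set[str]]) -> list[str]:
    """Order subproblems so each one comes after everything it depends on.

    `dependencies` maps a subproblem to the set of subproblems that must be
    answered first. Raises graphlib.CycleError if the graph is not a DAG.
    """
    return list(TopologicalSorter(dependencies).static_order())


# Hypothetical decomposition of a multi-hop question (illustrative only).
subproblems = {
    "find the director of the film": set(),
    "find the director's birthplace": {"find the director of the film"},
    "find the population of that city": {"find the director's birthplace"},
}

order = linearize(subproblems)
for step, question in enumerate(order, start=1):
    print(step, question)
```

In a retrieval loop, each subproblem in `order` would be answered in turn, with earlier answers available as context for later ones; the pruning steps mentioned in the abstract are not modeled here.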

large language model, machine learning, natural language, (18 more...)

2508.06105

Country:

Asia > Russia (0.47)
Europe > Russia (0.29)

Genre: Research Report (0.50)

Industry: Government (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Neural Information Processing Systems Oct-10-2025, 19:38:52 GMT

Self-Discover: Large Language Models Self-Compose Reasoning Structures

Pei Zhou

Full BBH results are in Appendix C Table 3.

arxiv preprint arxiv, language model, reasoning structure, (12 more...)

Country:

North America > United States > California (0.14)
North America > Canada > Ontario > Toronto (0.04)

Genre: Research Report > Experimental Study (0.93)

Industry: Law (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.97)

Neural Information Processing Systems Oct-10-2025, 16:59:35 GMT

Buffer of Thoughts: Thought-Augmented Reasoning with Large Language Models

Ling Yang, Zhaochen Yu

BoT has the potential to surpass the Llama3-70B model.

information, language model, reasoning, (14 more...)

Country: Asia > China > Beijing > Beijing (0.04)

Genre: Research Report > Experimental Study (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.50)

Shetty, Pranam, Upadhayaya, Abhisek, Shah, Parth Mitesh, Jagabathula, Srikanth, Nayak, Shilpi, Fee, Anna Joo

Advanced Financial Reasoning at Scale: A Comprehensive Evaluation of Large Language Models on CFA Level III

arXiv.org Artificial Intelligence Sep-23-2025

As financial institutions increasingly adopt Large Language Models (LLMs), rigorous domain-specific evaluation becomes critical for responsible deployment. For advanced financial reasoning, the Chartered Financial Analyst (CFA) Level III exam is widely considered the gold standard. In this paper, we present a comprehensive benchmark evaluating 23 state-of-the-art LLMs on mock CFA Level III exams, which require answering challenging multiple choice and essay questions. We evaluate reasoning and non-reasoning models, both proprietary and open source, using three prompting strategies: zero-shot, chain-of-thought, and self-discover. We find that frontier reasoning models, such as o4-mini, Gemini 2.5 Pro, and Claude Opus 4, using chain-of-thought prompting demonstrate strong capabilities, successfully passing the mock Level III exams. While there is little to separate the frontier models on multiple choice questions, only a few models excel at the complex essay questions, which require analysis, synthesis, and strategic thinking. These results demonstrate significant progress in the financial reasoning capabilities of LLMs, which previously [13] could clear Level I and Level II exams but struggled with the Level III exam, particularly the essay questions.

large language model, machine learning, natural language, (18 more...)

2507.02954

Genre: Research Report > New Finding (0.66)

Industry:

Education > Assessment & Standards > Student Performance (0.77)
Banking & Finance > Trading (0.68)
Banking & Finance > Financial Services (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Jung, Jeesu, Jung, Sangkeun

Reasoning Steps as Curriculum: Using Depth of Thought as a Difficulty Signal for Tuning LLMs

arXiv.org Artificial Intelligence Aug-27-2025

Curriculum learning for training LLMs requires a difficulty signal that aligns with reasoning while remaining scalable and interpretable. We propose a simple premise: tasks that demand deeper depth of thought for humans should also be harder for models. Accordingly, we define difficulty as depth of thought (DoT) and operationalize it by counting the discrete steps in a teacher model's reasoning trace (e.g., Chain-of-Thought). We then train with a shallow to deep curriculum ordered by this DoT and outline how to derive, validate, and schedule it at scale. Our position yields three testable hypotheses: (i) DoT correlates with conventional difficulty on reasoning benchmarks, (ii) DoT-ordered curricula outperform length- or judge-scored curricula under matched budgets, and (iii) the difficulty is robust across teacher models given light formatting controls. We propose an evaluation framework and discuss threats to validity (teacher style, length confounds) alongside practical mitigations. Taken together, we aim to move toward cognitively grounded, interpretable curricula for reasoning-centric training.
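
A minimal sketch of the shallow-to-deep ordering described above, under a loudly stated assumption: each non-empty line of a teacher trace counts as one reasoning step. The abstract does not specify how steps are segmented, so `depth_of_thought`, `shallow_to_deep`, and the example traces are hypothetical illustrations, not the authors' procedure.

```python
def depth_of_thought(trace: str) -> int:
    """Count reasoning steps, assuming one step per non-empty line."""
    return sum(1 for line in trace.splitlines() if line.strip())


def shallow_to_deep(examples: list[dict]) -> list[dict]:
    """Sort training examples by ascending DoT (stable for ties)."""
    return sorted(examples, key=lambda ex: depth_of_thought(ex["trace"]))


# Hypothetical teacher traces (illustrative only, not from the paper).
examples = [
    {"id": "b", "trace": "Add 2 and 3.\nMultiply by 4.\nSubtract 1."},
    {"id": "a", "trace": "Add 2 and 3."},
    {"id": "c", "trace": "Add 2 and 3.\n\nMultiply by 4."},
]

curriculum = shallow_to_deep(examples)
print([ex["id"] for ex in curriculum])
```

Counting lines is sensitive to teacher formatting, which is the kind of style confound the abstract lists among its threats to validity; a real pipeline would need the formatting controls the authors mention.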

curriculum, large language model, natural language, (15 more...)

2508.18279

Genre: Research Report (0.40)

Industry: Education (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.66)

arXiv.org Artificial Intelligence Aug-20-2025

Explicit v.s. Implicit Memory: Exploring Multi-hop Complex Reasoning Over Personalized Information

Zhang, Zeyu, Zhang, Yang, Tan, Haoran, Li, Rui, Chen, Xu

In large language model-based agents, memory serves as a critical capability for achieving personalization by storing and utilizing users' information. Although some previous studies have adopted memory to implement user personalization, they typically focus on preference alignment and simple question-answering. However, in the real world, complex tasks often require multi-hop reasoning on a large amount of user information, which poses significant challenges for current memory approaches. To address this limitation, we propose the multi-hop personalized reasoning task to explore how different memory mechanisms perform in multi-hop reasoning over personalized information. We explicitly define this task and construct a dataset along with a unified evaluation framework. Then, we implement various explicit and implicit memory methods and conduct comprehensive experiments. We evaluate their performance on this task from multiple perspectives and analyze their strengths and weaknesses. Besides, we explore hybrid approaches that combine both paradigms and propose the HybridMem method to address their limitations. We demonstrate the effectiveness of our proposed model through extensive experiments. To benefit the research community, we release this project at https://github.com/nuster1128/MPR.

information, large language model, natural language, (15 more...)

2508.1325

Country:

North America > United States (0.48)
Asia (0.46)

Genre: Research Report > New Finding (0.46)

Industry: Information Technology (0.54)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.93)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.70)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.66)

arXiv.org Artificial Intelligence Aug-13-2025

STELAR-VISION: Self-Topology-Aware Efficient Learning for Aligned Reasoning in Vision

Li, Chen, Zhang, Han, Yang, Zhantao, Chen, Fangyi, Wang, Zihan, Bolimera, Anudeepsekhar, Savvides, Marios

Vision-language models (VLMs) have made significant strides in reasoning, yet they often struggle with complex multimodal tasks and tend to generate overly verbose outputs. A key limitation is their reliance on chain-of-thought (CoT) reasoning, despite many tasks benefiting from alternative topologies like trees or graphs. To address this, we introduce STELAR-Vision, a training framework for topology-aware reasoning. At its core is TopoAug, a synthetic data pipeline that enriches training with diverse topological structures. Using supervised fine-tuning and reinforcement learning, we post-train Qwen2VL models with both accuracy and efficiency in mind. Additionally, we propose Frugal Learning, which reduces output length with minimal accuracy loss. On MATH-V and VLM-S2H, STELAR-Vision improves accuracy by 9.7% over its base model and surpasses the larger Qwen2VL-72B-Instruct by 7.3%. On five out-of-distribution benchmarks, it outperforms Phi-4-Multimodal-Instruct by up to 28.4% and LLaMA-3.2-11B-Vision-Instruct by up to 13.2%, demonstrating strong generalization. Compared to Chain-Only training, our approach achieves 4.3% higher overall accuracy on in-distribution datasets and consistently outperforms across all OOD benchmarks. We have released datasets, and code will be available.

large language model, machine learning, natural language, (18 more...)

2508.08688

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)