AITopics

2506.04723

Country: North America > United States (0.67)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Education > Educational Setting (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

arXiv.org Artificial IntelligenceOct-27-2025

Towards Comprehensive Scene Understanding: Integrating First and Third-Person Views for LVLMs

Lee, Insu, Park, Wooje, Jang, Jaeyun, Noh, Minyoung, Shim, Kyuhong, Shim, Byonghyo

Large vision-language models (LVLMs) are increasingly deployed in interactive applications such as virtual and augmented reality, where a first-person (egocentric) view captured by head-mounted cameras serves as key input. While this view offers fine-grained cues about user attention and hand-object interactions, its narrow field of view and lack of global context often lead to failures on spatially or contextually demanding queries. To address this, we introduce a framework that augments egocentric inputs with third-person (exocentric) views, providing complementary information such as global scene layout and object visibility to LVLMs. We present E3VQA, the first benchmark for multi-view question answering with 4K high-quality question-answer pairs grounded in synchronized ego-exo image pairs. Additionally, we propose M3CoT, a training-free prompting technique that constructs a unified scene representation by integrating scene graphs from three complementary perspectives. M3CoT enables LVLMs to reason more effectively across views, yielding consistent performance gains (4.84% for GPT-4o and 5.94% for Gemini 2.0 Flash) over a recent CoT baseline. Our extensive evaluation reveals key strengths and limitations of LVLMs in multi-view reasoning and highlights the value of leveraging both egocentric and exocentric inputs. The dataset and source code are available at https://github.com/Leeinsu1/Towards-Comprehensive-Scene-Understanding.

large language model, machine learning, natural language, (18 more...)

2505.21955

Genre: Research Report > New Finding (0.45)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.93)
(2 more...)

arXiv.org Artificial IntelligenceOct-27-2025

Reinforced Latent Reasoning for LLM-based Recommendation

Zhang, Yang, Xu, Wenxin, Zhao, Xiaoyan, Wang, Wenjie, Feng, Fuli, He, Xiangnan, Chua, Tat-Seng

Large Language Models (LLMs) have demonstrated impressive reasoning capabilities in complex problem-solving tasks, sparking growing interest in their application to preference reasoning in recommendation systems. Existing methods typically rely on fine-tuning with explicit chain-of-thought (CoT) data. However, these methods face significant practical limitations due to (1) the difficulty of obtaining high-quality CoT data in recommendation and (2) the high inference latency caused by generating CoT reasoning. In this work, we explore an alternative approach that shifts from explicit CoT reasoning to compact, information-dense latent reasoning. This approach eliminates the need for explicit CoT generation and improves inference efficiency, as few latent tokens can effectively capture the entire reasoning process. Building on this idea, we propose \textit{\underline{R}einforced \underline{Latent} \underline{R}easoning for \underline{R}ecommendation} (LatentR$^3$), a novel end-to-end training framework that leverages reinforcement learning (RL) to optimize latent reasoning without relying on any CoT data. LatentR$^3$ adopts a two-stage training strategy: first, supervised fine-tuning to initialize the latent reasoning module, followed by pure RL training to encourage exploration through a rule-based reward design. Our RL implementation is based on a modified GRPO algorithm, which reduces computational overhead during training and introduces continuous reward signals for more efficient learning. Extensive experiments demonstrate that LatentR$^3$ enables effective latent reasoning without any direct supervision of the reasoning process, significantly improving performance when integrated with different LLM-based recommendation methods. Our codes are available at https://github.com/xuwenxinedu/R3.

large language model, machine learning, natural language, (18 more...)

2505.19092

Country:

Asia (0.67)
North America > United States (0.46)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Agiollo, Andrea, Omicini, Andrea

Integrating Machine Learning into Belief-Desire-Intention Agents: Current Advances and Open Challenges

Thanks to the remarkable human-like capabilities of machine learning (ML) models in perceptual and cognitive tasks, frameworks integrating ML within rational agent architectures are gaining traction. Yet, the landscape remains fragmented and incoherent, often focusing on embedding ML into generic agent containers while overlooking the expressive power of rational architectures--such as Belief-Desire-Intention (BDI) agents. This paper presents a fine-grained systematisation of existing approaches, using the BDI paradigm as a reference. Our analysis illustrates the fast-evolving literature on rational agents enhanced by ML, and identifies key research opportunities and open challenges for designing effective rational ML agents.

large language model, machine learning, reinforcement learning, (23 more...)

2510.20641

Country:

Asia (0.67)
North America > United States (0.46)
Europe > Austria (0.28)
North America > Mexico (0.28)

Genre:

Overview (1.00)
Research Report (0.82)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Belief Revision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
(5 more...)

Prover Agent: An Agent-Based Framework for Formal Mathematical Proofs

Baba, Kaito, Liu, Chaoran, Kurita, Shuhei, Sannai, Akiyoshi

We present Prover Agent, a novel AI agent for automated theorem proving that integrates large language models (LLMs) with a formal proof assistant, Lean. Prover Agent coordinates an informal reasoning LLM, a formal prover model, and feedback from Lean while also generating auxiliary lemmas. These auxiliary lemmas are not limited to subgoals in the formal proof but can also include special cases or potentially useful facts derived from the assumptions, which help in discovering a viable proof strategy. It achieves an 88.1% success rate on the MiniF2F benchmark, establishing a new state-of-the-art among methods using small language models (SLMs) with a much lower sample budget than previous approaches. We also present theoretical analyses and case studies that illustrate how these generated lemmas contribute to solving challenging problems. Our code is publicly available at: https://github.com/kAIto47802/Prover-Agent.

large language model, machine learning, natural language, (16 more...)

2506.19923

Country: Asia > Japan > Honshū (0.46)

Genre: Research Report > New Finding (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)

HauntAttack: When Attack Follows Reasoning as a Shadow

Ma, Jingyuan, Li, Rui, Li, Zheng, Liu, Junfeng, Xia, Heming, Sha, Lei, Sui, Zhifang

Emerging Large Reasoning Models (LRMs) consistently excel in mathematical and reasoning tasks, showcasing remarkable capabilities. However, the enhancement of reasoning abilities and the exposure of internal reasoning processes introduce new safety vulnerabilities. A critical question arises: when reasoning becomes intertwined with harmfulness, will LRMs become more vulnerable to jailbreaks in reasoning mode? To investigate this, we introduce HauntAttack, a novel and general-purpose black-box adversarial attack framework that systematically embeds harmful instructions into reasoning questions. Specifically, we modify key reasoning conditions in existing questions with harmful instructions, thereby constructing a reasoning pathway that guides the model step by step toward unsafe outputs. We evaluate HauntAttack on 11 LRMs and observe an average attack success rate of 70\%, achieving up to 12 percentage points of absolute improvement over the strongest prior baseline. Our further analysis reveals that even advanced safety-aligned models remain highly susceptible to reasoning-based attacks, offering insights into the urgent challenge of balancing reasoning capability and safety in future model development.

information, large language model, machine learning, (22 more...)

2506.07031

Country:

Europe > Austria (0.28)
North America > United States (0.28)

Genre:

Research Report > New Finding (1.00)
Overview (0.93)

Industry:

Law Enforcement & Public Safety (1.00)
Information Technology > Security & Privacy (1.00)
Health & Medicine (1.00)
Government > Military (0.88)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
(3 more...)

From Masks to Worlds: A Hitchhiker's Guide to World Models

Bai, Jinbin, Lei, Yu, Wu, Hecong, Zhu, Yuchen, Li, Shufan, Xin, Yi, Li, Xiangtai, Tao, Molei, Grover, Aditya, Yang, Ming-Hsuan

This is not a typical survey of world models; it is a guide for those who want to build worlds. We do not aim to catalog every paper that has ever mentioned a "world model". Instead, we follow one clear road: from early masked models that unified representation learning across modalities, to unified architectures that share a single paradigm, then to interactive generative models that close the action-perception loop, and finally to memory-augmented systems that sustain consistent worlds over time. We bypass loosely related branches to focus on the core: the generative heart, the interactive loop, and the memory system. We show that this is the most promising path towards true world models. The term world model has been used to describe many different ideas: learned environment simulators for reinforcement learning (Ha & Schmidhuber, 2018; Hafner et al., 2019), agents that integrate learned models with planning (Schrittwieser et al., 2020), and large language models that simulate entire societies (Park et al., 2023). Y et despite hundreds of related works, there is no clear consensus on how to actually build a true world model. In this paper, we take a stance: the path is much narrower than it appears. A true world model is not a monolithic entity but a system synthesized from three core subsystems: a generative heart that produces world states, an interactive loop that closes the action-perception cycle in real time, and a persistent memory system that sustains coherence over long horizons. The history of the field can be understood as an evolutionary journey from first mastering these components in isolation to now integrating them. Most works focus on optimizing narrow tasks and drift away from the generative, interactive, and persistent nature required for a true world model. To make this perspective concrete, we chart the historical evolution of world models as a sequence of five stages, shown in Figure 1. It begins with Stage I: Mask-based Models, which established a universal, token-based pretraining paradigm across modalities.

arxiv preprint arxiv, large language model, machine learning, (17 more...)

2510.20668

Country: Asia (0.28)

Genre: Research Report (0.40)

Industry: Leisure & Entertainment > Games > Computer Games (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
(2 more...)

What Defines Good Reasoning in LLMs? Dissecting Reasoning Steps with Multi-Aspect Evaluation

Do, Heejin, Hwang, Jaehui, Han, Dongyoon, Oh, Seong Joon, Yun, Sangdoo

Evaluating large language models (LLMs) on final-answer correctness is the dominant paradigm. This approach, however, provides a coarse signal for model improvement and overlooks the quality of the underlying reasoning process. We argue that a more granular evaluation of reasoning offers a more effective path to building robust models. We decompose reasoning quality into two dimensions: relevance and coherence. Relevance measures if a step is grounded in the problem; coherence measures if it follows logically from prior steps. To measure these aspects reliably, we introduce causal stepwise evaluation (CaSE). This method assesses each reasoning step using only its preceding context, which avoids hindsight bias. We validate CaSE against human judgments on our new expert-annotated benchmarks, MRa-GSM8K and MRa-MATH. More importantly, we show that curating training data with CaSE-evaluated relevance and coherence directly improves final task performance. Our work provides a scalable framework for analyzing, debugging, and improving LLM reasoning, demonstrating the practical value of moving beyond validity checks.

large language model, machine learning, natural language, (18 more...)

2510.20603

Country:

North America > Mexico (0.28)
Europe > Austria > Vienna (0.14)

Genre: Research Report > New Finding (0.68)

Industry: Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

Maathuis, Clara, Cools, Kasper

Collateral Damage Assessment Model for AI System Target Engagement in Military Operations

Abstract--In an era where AI (Artificial Intelligence) systems play an increasing role in the battlefield, ensuring responsible targeting demands rigorous assessment of potential collateral effects. In this context, a novel collateral damage assessment model for target engagement of AI systems in military operations is introduced. Its layered structure captures the categories and architectural components of the AI systems to be engaged together with corresponding engaging vectors and contextual aspects. At the same time, spreading, severity, likelihood, and evaluation metrics are considered in order to provide a clear representation enhanced by transparent reasoning mechanisms. Further, the model is demonstrated and evaluated through instantiation which serves as a basis for further dedicated efforts that aim at building responsible and trustworthy intelligent systems for assessing the effects produced by engaging AI systems in military operations.

ai system, artificial intelligence, expert system, (13 more...)

2510.20337

Country:

Europe (0.68)
Asia > Middle East (0.28)

Genre: Research Report (0.50)

Industry: Government > Military > Cyberwarfare (0.47)

Technology:

Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.89)
Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (0.69)

UI-Ins: Enhancing GUI Grounding with Multi-Perspective Instruction-as-Reasoning

Chen, Liangyu, Zhou, Hanzhang, Cai, Chenglin, Zhang, Jianan, Tong, Panrong, Kong, Quyu, Zhang, Xu, Liu, Chen, Liu, Yuqi, Wang, Wenxuan, Wang, Yue, Jin, Qin, Hoi, Steven

GUI grounding, which maps natural-language instructions to actionable UI elements, is a core capability of GUI agents. Prior works largely treats instructions as a static proxy for user intent, overlooking the impact of instruction diversity and quality on grounding performance. Through a careful investigation of existing grounding datasets, we find a 23.3% flaw rate in their instructions and show that inference-time exploitation of instruction diversity yields up to a substantial 76% relative performance improvement. In this paper, we introduce the Instruction-as-Reasoning paradigm, treating instructions as dynamic analytical pathways that offer distinct perspectives and enabling the model to select the most effective pathway during reasoning. To achieve this, we propose a two-stage training framework: supervised fine-tuning (SFT) on synthesized, diverse instructions to instill multi-perspective reasoning, followed by reinforcement learning (RL) to optimize pathway selection and composition. Our resulting models, UI-Ins-7B and UI-Ins-32B, achieve state-of-the-art results on five challenging grounding benchmarks and exhibit emergent reasoning, selectively composing and synthesizing novel instruction pathways at inference. In particular, UI-Ins-32B attains the best grounding accuracy, scoring 87.3% on UI-I2E-Bench, 57.0% on ScreenSpot-Pro, and 84.9% on MMBench-GUI L2. Furthermore, our model demonstrates strong agentic potential, achieving a 74.1% success rate on AndroidWorld using UI-Ins-7B as the executor. Our in-depth analysis reveals additional insights such as how reasoning can be formulated to enhance rather than hinder grounding performance, and how our method mitigates policy collapse in the SFT+RL framework. All code and model checkpoints will be publicly released in https://github.com/alibaba/UI-Ins.

large language model, machine learning, natural language, (18 more...)

2510.20286

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Graphics (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
(2 more...)