observation
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.05)
- Asia > Singapore (0.04)
- North America > Canada (0.04)
- Research Report > Experimental Study (1.00)
- Research Report > New Finding (0.67)
- North America > United States (0.28)
- Europe > Poland > Lublin Province > Lublin (0.04)
- Europe > France (0.04)
- Asia > China > Shanghai > Shanghai (0.04)
- Oceania > Australia > New South Wales > Sydney (0.04)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- Information Technology (0.92)
- Leisure & Entertainment > Games (0.67)
- Europe > Italy (0.04)
- North America > United States > Texas (0.04)
- Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)
- Asia > Japan > Honshū > Kansai > Kyoto Prefecture > Kyoto (0.04)
- Overview (0.46)
- Research Report > New Finding (0.46)
- Leisure & Entertainment > Games > Chess (0.50)
- Leisure & Entertainment > Games > Backgammon (0.47)
- Leisure & Entertainment > Games > Go (0.46)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Large Reasoning Models in Agent Scenarios: Exploring the Necessity of Reasoning Capabilities
Zhou, Xueyang, Tie, Guiyao, Zhang, Guowen, Wang, Weidong, Zuo, Zhigang, Wu, Di, Chu, Duanfeng, Zhou, Pan, Sun, Lichao, Gong, Neil Zhenqiang
The rise of Large Reasoning Models (LRMs) signifies a paradigm shift toward advanced computational reasoning. Yet this progress disrupts traditional agent frameworks, which have been anchored by execution-oriented Large Language Models (LLMs). To explore this transformation, we propose the LaRMA framework, encompassing nine tasks across Tool Usage, Plan Design, and Problem Solving, assessed with three top LLMs (e.g., Claude3.5-sonnet) and five leading LRMs (e.g., DeepSeek-R1). Our findings address four research questions: LRMs surpass LLMs in reasoning-intensive tasks like Plan Design, leveraging iterative reflection for superior outcomes; LLMs excel in execution-driven tasks such as Tool Usage, prioritizing efficiency; hybrid LLM-LRM configurations, pairing LLMs as actors with LRMs as reflectors, optimize agent performance by blending execution speed with reasoning depth; and LRMs' enhanced reasoning incurs higher computational costs, prolonged processing, and behavioral challenges, including overthinking and fact-ignoring tendencies. This study fosters deeper inquiry into LRMs' balance of deep thinking and overthinking, laying a critical foundation for future agent design advancements.
- North America > Mexico > Gulf of Mexico (0.28)
- South America > Suriname > North Atlantic Ocean (0.14)
- North America > United States > Colorado (0.04)
- Health & Medicine (0.67)
- Leisure & Entertainment (0.67)
- Consumer Products & Services (0.46)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)
Diffusion Models for Inverse Problems in the Exponential Family
Micheli, Alessandro, Monod, Mélodie, Bhatt, Samir
Diffusion models have emerged as powerful tools for solving inverse problems, yet prior work has primarily focused on observations with Gaussian measurement noise, restricting their use in real-world scenarios. This limitation persists due to the intractability of the likelihood score, which until now has only been approximated in the simpler case of Gaussian likelihoods. In this work, we extend diffusion models to handle inverse problems where the observations follow a distribution from the exponential family, such as a Poisson or a Binomial distribution. By leveraging the conjugacy properties of exponential family distributions, we introduce the evidence trick, a method that provides a tractable approximation to the likelihood score. In our experiments, we demonstrate that our methodology effectively performs Bayesian inference on spatially inhomogeneous Poisson processes with intensities as intricate as ImageNet images. Furthermore, we demonstrate the real-world impact of our methodology by showing that it performs competitively with the current state-of-the-art in predicting malaria prevalence estimates in Sub-Saharan Africa.
- Africa > Sub-Saharan Africa (0.25)
- North America > United States > New York > New York County > Manhattan (0.04)
- Africa > Nigeria (0.04)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.66)
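The conjugacy property mentioned in the abstract above can be illustrated with the simplest Poisson case. This is a minimal sketch, not the paper's evidence trick: it assumes a Gamma prior (rate parametrisation) on a single Poisson intensity, and all function names are hypothetical. It shows that the posterior stays in the Gamma family and that the evidence (marginal likelihood) has a closed form, which equals prior times likelihood divided by posterior at any intensity value.

```python
import math


def gamma_poisson_posterior(alpha, beta, counts):
    """Gamma(alpha, beta) prior + Poisson counts -> Gamma posterior parameters."""
    return alpha + sum(counts), beta + len(counts)


def log_gamma_pdf(lam, alpha, beta):
    """Log density of Gamma(alpha, beta) (rate parametrisation) at lam > 0."""
    return (alpha * math.log(beta) - math.lgamma(alpha)
            + (alpha - 1.0) * math.log(lam) - beta * lam)


def log_poisson_pmf(y, lam):
    """Log probability of count y under Poisson(lam)."""
    return y * math.log(lam) - lam - math.lgamma(y + 1)


def log_evidence(alpha, beta, y):
    """Closed-form log marginal likelihood of one count y (negative binomial)."""
    return (math.lgamma(alpha + y) - math.lgamma(alpha) - math.lgamma(y + 1)
            + alpha * math.log(beta / (beta + 1.0))
            - y * math.log(beta + 1.0))


def log_evidence_via_bayes(alpha, beta, y, lam):
    """Same quantity via log prior + log likelihood - log posterior at any lam."""
    post_alpha, post_beta = gamma_poisson_posterior(alpha, beta, [y])
    return (log_gamma_pdf(lam, alpha, beta) + log_poisson_pmf(y, lam)
            - log_gamma_pdf(lam, post_alpha, post_beta))


if __name__ == "__main__":
    print(gamma_poisson_posterior(2.0, 1.0, [3, 5, 4]))
    print(log_evidence(2.0, 1.0, 3), log_evidence_via_bayes(2.0, 1.0, 3, lam=0.7))
```

The two evidence computations agree for any positive `lam`, which is the property that makes the marginal likelihood tractable in conjugate exponential-family models.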
Immersion for AI: Immersive Learning with Artificial Intelligence
This work reflects upon what Immersion can mean from the perspective of an Artificial Intelligence (AI). Applying the lens of immersive learning theory, it seeks to understand whether this new perspective supports ways for AI participation in cognitive ecologies. By treating AI as a participant rather than a tool, it explores what other participants (humans and other AIs) need to consider in environments where AI can meaningfully engage and contribute to the cognitive ecology, and what the implications are for designing such learning environments. Drawing from the three conceptual dimensions of immersion - System, Narrative, and Agency - this work reinterprets AIs in immersive learning contexts. It outlines practical implications for designing learning environments where AIs are surrounded by external digital services, can interpret a narrative of origins, changes, and structural developments in data, and dynamically respond, making operational and tactical decisions that shape human-AI collaboration. Finally, this work suggests how these insights might influence the future of AI training, proposing that immersive learning theory can inform the development of AIs capable of evolving beyond static models. This paper paves the way for understanding AI as an immersive learner and participant in evolving human-AI cognitive ecosystems.
- Europe > France > Île-de-France > Paris > Paris (0.04)
- South America (0.04)
- North America > United States > New York (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.72)
- Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)
QLASS: Boosting Language Agent Inference via Q-Guided Stepwise Search
Lin, Zongyu, Tang, Yao, Yao, Xingcheng, Yin, Da, Hu, Ziniu, Sun, Yizhou, Chang, Kai-Wei
Language agents have become a promising solution to complex interactive tasks. One of the key ingredients in the success of language agents is the reward model on the trajectory of the agentic workflow, which provides valuable guidance during training or inference. However, due to the lack of annotations of intermediate interactions, most existing works use an outcome reward model to optimize policies across entire trajectories. This may lead to sub-optimal policies and hinder the overall performance. To address this, we propose QLASS (Q-guided Language Agent Stepwise Search), which automatically generates annotations by estimating Q-values in a stepwise manner for open language agents. By introducing a reasoning tree and performing process reward modeling, QLASS provides effective intermediate guidance for each step. With the stepwise guidance, we propose a Q-guided generation strategy to enable language agents to better adapt to long-term value, resulting in significant performance improvement during model inference on complex interactive agent tasks. Notably, even with almost half the annotated data, QLASS retains strong performance, demonstrating its efficiency in handling limited supervision. We also empirically demonstrate that QLASS can lead to more effective decision making through qualitative analysis. We will release our code and data.
- Asia > Thailand > Bangkok > Bangkok (0.04)
- Asia > China > Shanghai > Shanghai (0.04)
- North America > United States > California > Los Angeles County > Los Angeles (0.04)
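The idea in the QLASS abstract of turning outcome rewards into stepwise Q-value annotations over a reasoning tree can be sketched as follows. This is an illustrative assumption rather than the authors' procedure: it uses a standard Bellman-style backup (a node's Q-value is its own reward plus the discounted best child value), and the `Node` class, the discount `gamma`, and the toy tree are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    action: str
    reward: float = 0.0  # outcome reward; non-zero only at leaves in this toy
    children: List["Node"] = field(default_factory=list)
    q: float = 0.0       # filled in by backup_q


def backup_q(node: Node, gamma: float = 0.9) -> float:
    """Propagate leaf rewards upward: Q = reward + gamma * max child Q."""
    if not node.children:
        node.q = node.reward
    else:
        node.q = node.reward + gamma * max(backup_q(c, gamma) for c in node.children)
    return node.q


def stepwise_annotations(node: Node, prefix: Tuple[str, ...] = ()):
    """List (action path, Q-value) for every step below the given node."""
    out = []
    for child in node.children:
        path = prefix + (child.action,)
        out.append((path, child.q))
        out.extend(stepwise_annotations(child, path))
    return out


def build_toy_tree() -> Node:
    """Two first steps; only one branch can reach the full outcome reward."""
    good = Node("search", children=[Node("click_match", reward=1.0),
                                    Node("click_other", reward=0.0)])
    weak = Node("buy_first_result", reward=0.5)
    return Node("root", children=[good, weak])


if __name__ == "__main__":
    root = build_toy_tree()
    backup_q(root)
    for path, q in stepwise_annotations(root):
        print(" > ".join(path), round(q, 3))
```

Each intermediate step now carries a value even though only leaves were rewarded, which is the kind of per-step signal a process reward model could be trained on.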
Advancing Trustworthy AI for Sustainable Development: Recommendations for Standardising AI Incident Reporting
Agarwal, Avinash, Nene, Manisha J
The growing use of AI technologies has led to a rising number of AI incidents, posing risks and causing harm to individuals, organizations, and society. This study recognizes and addresses the lack of standardized protocols for reliably and comprehensively gathering such incident data, which is crucial for preventing future incidents and developing mitigating strategies. Specifically, this study analyses existing open-access AI-incident databases through a systematic methodology and identifies nine gaps in current AI incident reporting practices. Further, it proposes nine actionable recommendations to enhance standardization efforts to address these gaps. Ensuring the trustworthiness of enabling technologies such as AI is necessary for sustainable digital transformation. Our research promotes the development of standards to prevent future AI incidents and promote trustworthy AI, thereby helping to achieve the UN Sustainable Development Goals. Through international cooperation, stakeholders can unlock the transformative potential of AI, enabling a sustainable and inclusive future for all.
- North America > United States > Virginia (0.04)
- North America > Canada (0.04)
- Asia > India > NCT > New Delhi (0.04)
- Law (1.00)
- Transportation (0.95)
- Government (0.94)
Enhancing Decision-Making for LLM Agents via Step-Level Q-Value Models
Zhai, Yuanzhao, Yang, Tingkai, Xu, Kele, Dawei, Feng, Yang, Cheng, Ding, Bo, Wang, Huaimin
Agents significantly enhance the capabilities of standalone Large Language Models (LLMs) by perceiving environments, making decisions, and executing actions. However, LLM agents still face challenges in tasks that require multiple decision-making steps. Estimating the value of actions in specific tasks is difficult when intermediate actions are neither appropriately rewarded nor penalized. In this paper, we propose leveraging a task-relevant Q-value model to guide action selection. Specifically, we first collect decision-making trajectories annotated with step-level Q values via Monte Carlo Tree Search (MCTS) and construct preference data. We then use another LLM to fit these preferences through step-level Direct Preference Optimization (DPO), which serves as the Q-value model. During inference, at each decision-making step, LLM agents select the action with the highest Q value before interacting with the environment. We apply our method to various open-source and API-based LLM agents, demonstrating that Q-value models significantly improve their performance. Notably, the performance of the agent built with Phi-3-mini-4k-instruct improved by 103% on WebShop and 75% on HotPotQA when enhanced with Q-value models, even surpassing GPT-4o-mini. Additionally, Q-value models offer several advantages, such as generalization to different LLM agents and seamless integration with existing prompting strategies.
- North America > United States (0.14)
- Asia > Myanmar > Tanintharyi Region > Dawei (0.04)
- Asia > China > Hunan Province > Changsha (0.04)
- Leisure & Entertainment (1.00)
- Education (0.68)
- Media > Television (0.47)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)
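The inference-time rule in the step-level Q-value abstract above (pick the candidate action with the highest Q value before acting) can be sketched in a few lines. This is a sketch under stated assumptions: the Q-value is taken to be the usual DPO implicit reward, beta times the log-probability gap between the tuned model and a reference model, which may differ from the paper's exact scoring. The function names, the `beta` value, and the toy log-probability table are hypothetical.

```python
from typing import Callable, Dict, Sequence, Tuple


def implicit_q(logp_tuned: float, logp_ref: float, beta: float = 0.1) -> float:
    """DPO-style implicit reward used here as a stand-in step-level Q-value."""
    return beta * (logp_tuned - logp_ref)


def select_action(state: str,
                  candidates: Sequence[str],
                  q_fn: Callable[[str, str], float]) -> str:
    """Return the candidate with the highest Q-value (first one wins on ties)."""
    return max(candidates, key=lambda action: q_fn(state, action))


# Hypothetical (tuned, reference) log-probabilities for three candidate actions.
TOY_LOGPS: Dict[str, Tuple[float, float]] = {
    "search[red shoes]": (-1.2, -2.0),
    "click[buy now]": (-3.0, -1.5),
    "search[shoes]": (-2.0, -2.1),
}


def toy_q(state: str, action: str) -> float:
    """Look up toy log-probabilities; a real Q-value model would score (state, action)."""
    tuned, ref = TOY_LOGPS[action]
    return implicit_q(tuned, ref)


if __name__ == "__main__":
    best = select_action("instruction: buy red shoes", list(TOY_LOGPS), toy_q)
    print(best)
```

Because selection only needs a score per candidate, the same `select_action` wrapper could sit on top of any agent that can propose several actions per step.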