AITopics

Agent modeling is a critical component in developing effective policies within multi-agent systems, as it enables agents to form beliefs about the behaviors, intentions, and competencies of others. Many existing approaches assume access to other agents' episodic trajectories, a condition often unrealistic in real-world applications. Consequently, a practical agent modeling approach must learn a robust representation of the policies of the other agents based only on the local trajectory of the controlled agent. In this paper, we propose \texttt{TransAM}, a novel transformer-based agent modeling approach to encode local trajectories into an embedding space that effectively captures the policies of other agents. We evaluate the performance of the proposed method in cooperative, competitive, and mixed multi-agent environments. Extensive experimental results demonstrate that our approach generates strong policy representations, improves agent modeling, and leads to higher episodic returns.

artificial intelligence, deep learning, machine learning, (13 more...)

2508.02826

Country: North America > United States > California (0.28)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Clinically Grounded Agent-based Report Evaluation: An Interpretable Metric for Radiology Report Generation

Dua, Radhika, Joon, Young, Kwon, null, Dogra, Siddhant, Freedman, Daniel, Ruan, Diana, Nashawaty, Motaz, Rigau, Danielle, Alber, Daniel Alexander, Zhang, Kang, Cho, Kyunghyun, Oermann, Eric Karl

Radiological imaging is central to diagnosis, treatment planning, and clinical decision-making. Vision-language foundation models have spurred interest in automated radiology report generation (RRG), but safe deployment requires reliable clinical evaluation of generated reports. Existing metrics often rely on surface-level similarity or behave as black boxes, lacking interpretability. We introduce ICARE (Interpretable and Clinically-grounded Agent-based Report Evaluation), an interpretable evaluation framework leveraging large language model agents and dynamic multiple-choice question answering (MCQA). Two agents, each with either the ground-truth or generated report, generate clinically meaningful questions and quiz each other. Agreement on answers captures preservation and consistency of findings, serving as interpretable proxies for clinical precision and recall. By linking scores to question-answer pairs, ICARE enables transparent, and interpretable assessment. Clinician studies show ICARE aligns significantly more with expert judgment than prior metrics. Perturbation analyses confirm sensitivity to clinical content and reproducibility, while model comparisons reveal interpretable error patterns.

large language model, machine learning, question answering, (17 more...)

2508.02808

Country: North America > United States (0.14)

Genre:

Research Report > Experimental Study (0.67)
Research Report > New Finding (0.46)

Industry:

Health & Medicine > Nuclear Medicine (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.71)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.67)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.49)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.34)

The Architecture of Trust: A Framework for AI-Augmented Real Estate Valuation in the Era of Structured Data

Teikari, Petteri, Jarrell, Mike, Azh, Maryam, Pesola, Harri

The Uniform Appraisal Dataset (UAD) 3.6's mandatory 2026 implementation transforms residential property valuation from narrative reporting to structured, machine-readable formats. This paper provides the first comprehensive analysis of this regulatory shift alongside concurrent AI advances in computer vision, natural language processing, and autonomous systems. We develop a three-layer framework for AI-augmented valuation addressing technical implementation and institutional trust requirements. Our analysis reveals how regulatory standardization converging with AI capabilities enables fundamental market restructuring with profound implications for professional practice, efficiency, and systemic risk. We make four key contributions: (1) documenting institutional failures including inter-appraiser variability and systematic biases undermining valuation reliability; (2) developing an architectural framework spanning physical data acquisition, semantic understanding, and cognitive reasoning that integrates emerging technologies while maintaining professional oversight; (3) addressing trust requirements for high-stakes financial applications including regulatory compliance, algorithmic fairness, and uncertainty quantification; (4) proposing evaluation methodologies beyond generic AI benchmarks toward domain-specific protocols. Our findings indicate successful transformation requires not merely technological sophistication but careful human-AI collaboration, creating systems that augment rather than replace professional expertise while addressing historical biases and information asymmetries in real estate markets.

data mining, machine learning, natural language, (23 more...)

2508.02765

Country: North America > United States (1.00)

Genre: Research Report > New Finding (0.65)

Industry:

Banking & Finance > Real Estate (1.00)
Banking & Finance > Loans > Mortgages (1.00)
Government > Regional Government > North America Government > United States Government (0.67)
Transportation > Ground > Road (0.67)

Technology:

Information Technology > Data Science > Data Quality (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
(7 more...)

Sienicki, Mikołaj, Sienicki, Krzysztof

Beyond the Wavefunction: Qualia Abstraction Language Mechanics and the Grammar of Awareness

We propose a formal reconstruction of quantum mechanics grounded not in external mathematical abstractions, but in the structured dynamics of subjective experience. The Qualia Abstraction Language (QAL) models physical systems as evolving streams of introspective units, structured sequences of modality, shape, and functional effect, rather than as state vectors in Hilbert space. This approach reimagines core quantum concepts: superposition becomes a form of structured ambiguity; collapse is reframed as an introspective contraction; and entanglement is modeled as semantic resonance across streams of qualia. Drawing on insights from nominalist philosophy and oversight theoretic limits in AI, we argue that the observer paradox in quantum mechanics reflects not an ontological lacuna, but a linguistic one: the absence of a formal vocabulary for modeling first person structure. QAL introduces such a vocabulary, providing a morphodynamic framework that embeds the observer within the system and replaces abstract projection with endogenous transformation. We analyze the alignment of QAL with endophysical approaches, contrast it with standard interpretations of quantum theory, and explore its implications for a post Platonist, introspectively grounded physics.

coherence, machine learning, natural language, (20 more...)

2508.02755

Country: Europe (0.93)

Genre: Research Report (0.41)

Industry: Health & Medicine > Therapeutic Area (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.67)

Efficient Agents: Building Effective Agents While Reducing Cost

Wang, Ningning, Hu, Xavier, Liu, Pai, Zhu, He, Hou, Yue, Huang, Heyuan, Zhang, Shengyu, Yang, Jian, Liu, Jiaheng, Zhang, Ge, Zhang, Changwang, Wang, Jun, Jiang, Yuchen Eleanor, Zhou, Wangchunshu

The remarkable capabilities of Large Language Model (LLM)-driven agents have enabled sophisticated systems to tackle complex, multi-step tasks, but their escalating costs threaten scalability and accessibility. This work presents the first systematic study of the efficiency-effectiveness trade-off in modern agent systems, addressing the critical need for cost-effective designs without sacrificing performance. We investigate three key questions: (1) How much complexity do agentic tasks inherently require? (2) When do additional modules yield diminishing returns? (3) How much efficiency can be gained through the design of efficient agent frameworks? Through an empirical analysis on the GAIA benchmark, we evaluate the impact of LLM backbone selection, agent framework designs, and test-time scaling strategies. Using the cost-of-pass metric, we quantify the efficiency-performance trade-off across these dimensions. Our findings inform the development of Efficient Agents , a novel agent framework that has an optimal complexity to task requirements. Efficient Agents retains 96.7% of the performance of OWL, one leading open-source agent framework, while reducing operational costs from $0.398 to $0.228, resulting in a 28.4% improvement in cost-of-pass. Our work provides actionable insights for designing efficient, high-performing agent systems, advancing the accessibility and sustainability of AI-driven solutions.

efficiency, large language model, machine learning, (17 more...)

2508.02694

Country: North America (0.28)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Heitzig, Jobst, Potham, Ram

Model-Based Soft Maximization of Suitable Metrics of Long-Term Human Power

Power is a key concept in AI safety: power-seeking as an instrumental goal, sudden or gradual disempowerment of humans, power balance in human-AI interaction and international AI governance. At the same time, power as the ability to pursue diverse goals is essential for wellbeing. This paper explores the idea of promoting both safety and wellbeing by forcing AI agents explicitly to empower humans and to manage the power balance between humans and AI agents in a desirable way. Using a principled, partially axiomatic approach, we design a parametrizable and decomposable objective function that represents an inequality- and risk-averse long-term aggregate of human power. It takes into account humans' bounded rationality and social norms, and, crucially, considers a wide variety of possible human goals. We derive algorithms for computing that metric by backward induction or approximating it via a form of multi-agent reinforcement learning from a given world model. We exemplify the consequences of (softly) maximizing this metric in a variety of paradigmatic situations and describe what instrumental sub-goals it will likely imply. Our cautious assessment is that softly maximizing suitable aggregate metrics of human power might constitute a beneficial objective for agentic AI systems that is safer than direct utility-based objectives.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

2508.00159

Genre: Research Report > New Finding (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Yang, Xiuyu, Tan, Shuhan, Krähenbühl, Philipp

Long-term Traffic Simulation with Interleaved Autoregressive Motion and Scenario Generation

Prior models and benchmarks focus on closed-loop motion simulation for initial agents in a scene. This is problematic for long-term simulation. Agents enter and exit the scene as the ego vehicle enters new regions. W e propose InfGen, a unified next-token prediction model that performs interleaved closed-loop motion simulation and scene generation. InfGen automatically switches between closed-loop motion simulation and scene generation mode. It enables stable long-term rollout simulation. InfGen performs at the state-of-the-art in short-term (9s) traffic simulation, and significantly outperforms all other methods in long-term (30s) simulation.

artificial intelligence, machine learning, natural language, (19 more...)

2506.17213

Genre: Research Report (0.81)

Industry:

Energy (0.75)
Transportation (0.46)

Technology:

Information Technology > Modeling & Simulation (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
(2 more...)

AudioGenie: A Training-Free Multi-Agent Framework for Diverse Multimodality-to-Multiaudio Generation

Rong, Yan, Wang, Jinting, Lei, Guangzhi, Yang, Shan, Liu, Li

Multimodality-to-Multiaudio (MM2MA) generation faces significant challenges in synthesizing diverse and contextually aligned audio types (e.g., sound effects, speech, music, and songs) from multimodal inputs (e.g., video, text, images), owing to the scarcity of high-quality paired datasets and the lack of robust multi-task learning frameworks. Recently, multi-agent system shows great potential in tackling the above issues. However, directly applying it to MM2MA task presents three critical challenges: (1) inadequate fine-grained understanding of multimodal inputs (especially for video), (2) the inability of single models to handle diverse audio events, and (3) the absence of self-correction mechanisms for reliable outputs. To this end, we propose AudioGenie, a novel training-free multi-agent system featuring a dual-layer architecture with a generation team and a supervisor team. For the generation team, a fine-grained task decomposition and an adaptive Mixture-of-Experts (MoE) collaborative entity are designed for detailed comprehensive multimodal understanding and dynamic model selection, and a trial-and-error iterative refinement module is designed for self-correction. The supervisor team ensures temporal-spatial consistency and verifies outputs through feedback loops. Moreover, we build MA-Bench, the first benchmark for MM2MA tasks, comprising 198 annotated videos with multi-type audios. Experiments demonstrate that our AudioGenie achieves state-of-the-art (SOTA) or comparable performance across 9 metrics in 8 tasks. User study further validates the effectiveness of our method in terms of quality, accuracy, alignment, and aesthetic. The project website with audio samples can be found at https://audiogenie.github.io/.

artificial intelligence, arxiv preprint arxiv, machine learning, (15 more...)

2505.22053

Country: Asia > China > Guangdong Province (0.14)

Genre: Research Report (0.82)

Industry:

Media (0.94)
Leisure & Entertainment (0.94)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Communications of the ACMAug-5-2025, 13:28:00 GMT

Differentiable Economics: Strategic Behavior, Mechanisms, and Machine Learning

Economists have developed different types of models describing the interaction of agents in markets. Early models in general equilibrium theory describe agents taking prices as given and do not consider the incentives of agents to manipulate prices strategically. With appropriate convexity assumptions on the preferences, such models can be cast as convex optimization problems for which efficient algorithms are known to find a competitive equilibrium. Price-taking behavior might be a reasonable approximation of agent behavior in large markets, but it does not adequately capture the incentives and strategies that agents have in smaller markets or in other strategic settings. Modern models in economics, such as those used for modeling auctions, oligopoly competition, or contests, are based on game theory, with the Nash equilibrium as the central solution concept.

artificial intelligence, economics, machine learning, (16 more...)

Communications of the ACM

Industry: Leisure & Entertainment > Games (0.36)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.69)

Mendonça, Fábio, Mostafa, Sheikh Shanawaz, Freitas, Diogo, Morgado-Dias, Fernando, Ravelo-García, Antonio G.

Multiple Time Series Fusion Based on LSTM An Application to CAP A Phase Classification Using EEG

arXiv.org Artificial IntelligenceAug-5-2025

Biomedical decision making involves multiple signal processing, either from different sensors or from different channels. In both cases, information fusion plays a significant role. A deep learning based electroencephalogram channels' feature level fusion is carried out in this work for the electroencephalogram cyclic alternating pattern A phase classification. Channel selection, fusion, and classification procedures were optimized by two optimization algorithms, namely, Genetic Algorithm and Particle Swarm Optimization. The developed methodologies were evaluated by fusing the information from multiple electroencephalogram channels for patients with nocturnal frontal lobe epilepsy and patients without any neurological disorder, which was significantly more challenging when compared to other state of the art works. Results showed that both optimization algorithms selected a comparable structure with similar feature level fusion, consisting of three electroencephalogram channels, which is in line with the CAP protocol to ensure multiple channels' arousals for CAP detection. Moreover, the two optimized models reached an area under the receiver operating characteristic curve of 0.82, with average accuracy ranging from 77% to 79%, a result which is in the upper range of the specialist agreement. The proposed approach is still in the upper range of the best state of the art works despite a difficult dataset, and has the advantage of providing a fully automatic analysis without requiring any manual procedure. Ultimately, the models revealed to be noise resistant and resilient to multiple channel loss.

artificial intelligence, evolutionary algorithm, machine learning, (16 more...)

doi: 10.3390/ijerph191710892

2112.11218

Country:

Europe > Portugal > Madeira > Funchal (0.04)
North America > United States > California (0.04)
North America > United States > Massachusetts (0.04)
(19 more...)

Genre: Research Report > New Finding (0.68)

Industry: Health & Medicine > Therapeutic Area > Neurology (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Information Fusion (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (1.00)