 Agents


Choosing What Game to Play without Selecting Equilibria: Inferring Safe (Pareto) Improvements in Binary Constraint Structures

arXiv.org Artificial Intelligence

We consider a setting in which a principal gets to choose which game from some given set is played by a group of agents. The principal would like to choose a game that favors one of the players, the social preferences of the players, or the principal's own preferences. Unfortunately, given the potential multiplicity of equilibria, it is conceptually unclear how to tell which of even any two games is better. Oesterheld et al. (2022) propose that we use assumptions about outcome correspondence -- i.e., about how the outcomes of different games relate -- to allow comparisons in some cases. For example, it seems reasonable to assume that isomorphic games are played isomorphically. From such assumptions we can sometimes deduce that the outcome of one game G' is guaranteed to be better than the outcome of another game G, even if we do not have beliefs about how each of G and G' will be played individually. Following Oesterheld et al., we then call G' a safe improvement on G. In this paper, we study how to derive safe improvement relations. We first show that if we are given a set of games and arbitrary assumptions about outcome correspondence between these games, deriving safe improvement relations is co-NP-complete. We then study the (in)completeness of a natural set of inference rules for outcome correspondence. We show that in general the inference rules are incomplete. However, we also show that under natural, generally applicable assumptions about outcome correspondence the rules are complete.


MarketGen: A Scalable Simulation Platform with Auto-Generated Embodied Supermarket Environments

arXiv.org Artificial Intelligence

The development of embodied agents for complex commercial environments is hindered by a critical gap in existing robotics datasets and benchmarks, which primarily focus on household or tabletop settings with short-horizon tasks. To address this limitation, we introduce MarketGen, a scalable simulation platform with automatic scene generation for complex supermarket environments. MarketGen features a novel agent-based Procedural Content Generation (PCG) framework. It uniquely supports multi-modal inputs (text and reference images) and integrates real-world design principles to automatically generate complete, structured, and realistic supermarkets. We also provide an extensive and diverse 3D asset library with a total of 1100+ supermarket goods and parameterized facilities assets. Building on this generative foundation, we propose a novel benchmark for assessing supermarket agents, featuring two daily tasks in a supermarket: (1) Checkout Unloading: long-horizon tabletop tasks for cashier agents, and (2) In-Aisle Item Collection: complex mobile manipulation tasks for salesperson agents. We validate our platform and benchmark through extensive experiments, including the deployment of a modular agent system and successful sim-to-real transfer. MarketGen provides a comprehensive framework to accelerate research in embodied AI for complex commercial applications.


Resilient Charging Infrastructure via Decentralized Coordination of Electric Vehicles at Scale

arXiv.org Artificial Intelligence

The rapid adoption of electric vehicles (EVs) introduces major challenges for decentralized charging control. Existing decentralized approaches efficiently coordinate a large number of EVs to select charging stations while reducing energy costs, preventing power peaks, and preserving driver privacy. These situations create competition for limited charging slots, resulting in long queues and reduced driver comfort. To address these limitations, we propose a novel collective learning-based coordination framework that allows EVs to balance individual comfort on their selections against system-wide efficiency, i.e., the overall queues across all stations. In the framework, EVs are recommended adaptive charging behaviors that shift priority between comfort and efficiency, achieving Pareto-optimal trade-offs under varying station capacities and dynamic spatiotemporal EV distribution. Experiments using real-world data from EVs and charging stations show that the proposed approach outperforms baseline methods, significantly reducing travel and queuing time. The results reveal that, under uncertain charging conditions, EV drivers that behave selfishly or altruistically at the right moments achieve shorter waiting times than those maintaining moderate behavior throughout. Our findings under high fractions of station outages and adversarial EVs further demonstrate improved resilience and trustworthiness of decentralized EV charging infrastructure.
Electric vehicles (EVs) are becoming a preferred option in intelligent transportation systems due to their energy efficiency and reduced emissions, critical in addressing environmental concerns and fuel shortages. According to recent global market reports, EV sales are projected to surpass 17 million units in 2024 (over 20% market share), with over 20 million expected in 2025 [1].
As governments expand public charging infrastructure to meet soaring demand, centralized charging management faces limitations in scalability, cost, and resilience (e.g., single points of failure) [2], [3]. A promising alternative lies in decentralized charging control among EVs. It aims to allow EVs to manage their charging based on local conditions, user preference and grid/station needs without a central authority.
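The comfort-versus-efficiency trade-off described above can be illustrated with a toy weighted cost. This is only a sketch under stated assumptions, not the paper's collective learning method: the function `choose_station`, the weight `beta`, and the cost terms (individual discomfort and station queue length) are hypothetical names introduced here for illustration.

```python
def choose_station(options, beta):
    """Pick a station by a weighted sum of individual and system-wide cost.

    options: list of (station_id, discomfort, queue_length) tuples
             (hypothetical inputs, e.g. travel minutes and queued vehicles).
    beta:    0.0 = fully selfish (minimize own discomfort),
             1.0 = fully altruistic (minimize station queue).
    """
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")

    def cost(option):
        _, discomfort, queue_length = option
        return (1.0 - beta) * discomfort + beta * queue_length

    # Return only the identifier of the cheapest option.
    return min(options, key=cost)[0]


# Illustrative (made-up) options: station A is close but crowded,
# station B is farther away but nearly empty.
options = [("A", 5.0, 30.0), ("B", 15.0, 5.0)]
selfish_choice = choose_station(options, beta=0.0)
altruistic_choice = choose_station(options, beta=1.0)
```

With these made-up numbers, a selfish EV picks the nearby station and an altruistic EV picks the emptier one; shifting `beta` over time would correspond, loosely, to the adaptive behavior the abstract mentions.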


Chatty-KG: A Multi-Agent AI System for On-Demand Conversational Question Answering over Knowledge Graphs

arXiv.org Artificial Intelligence

Conversational Question Answering over Knowledge Graphs (KGs) combines the factual grounding of KG-based QA with the interactive nature of dialogue systems. KGs are widely used in enterprise and domain applications to provide structured, evolving, and reliable knowledge. Large language models (LLMs) enable natural and context-aware conversations, but lack direct access to private and dynamic KGs. Retrieval-augmented generation (RAG) systems can retrieve graph content but often serialize structure, struggle with multi-turn context, and require heavy indexing. Traditional KGQA systems preserve structure but typically support only single-turn QA, incur high latency, and struggle with coreference and context tracking. To address these limitations, we propose Chatty-KG, a modular multi-agent system for conversational QA over KGs. Chatty-KG combines RAG-style retrieval with structured execution by generating SPARQL queries through task-specialized LLM agents. These agents collaborate for contextual interpretation, dialogue tracking, entity and relation linking, and efficient query planning, enabling accurate and low-latency translation of natural questions into executable queries. Experiments on large and diverse KGs show that Chatty-KG significantly outperforms state-of-the-art baselines in both single-turn and multi-turn settings, achieving higher F1 and P@1 scores. Its modular design preserves dialogue coherence and supports evolving KGs without fine-tuning or pre-processing. Evaluations with commercial (e.g., GPT-4o, Gemini-2.0) and open-weight (e.g., Phi-4, Gemma 3) LLMs confirm broad compatibility and stable performance. Overall, Chatty-KG unifies conversational flexibility with structured KG grounding, offering a scalable and extensible approach for reliable multi-turn KGQA.


OpenApps: Simulating Environment Variations to Measure UI-Agent Reliability

arXiv.org Artificial Intelligence

Reliability is key to realizing the promise of autonomous UI-Agents, multimodal agents that directly interact with apps in the same manner as humans, as users must be able to trust an agent to complete a given task. Current evaluations rely on fixed environments, often clones of existing apps, which are limited in that they can only shed light on whether or how often an agent can complete a task within a specific environment. When deployed, however, agents are likely to encounter variations in app design and content that can affect an agent's ability to complete a task. To address this blind spot of measuring agent reliability across app variations, we develop OpenApps, a light-weight open-source ecosystem with six apps (messenger, calendar, maps, etc.) that are configurable in appearance and content. OpenApps requires just a single CPU to run, enabling easy generation and deployment of thousands of versions of each app. Specifically, we run more than 10,000 independent evaluations to study reliability across seven leading multimodal agents. We find that while standard reliability within a fixed app is relatively stable, reliability can vary drastically when measured across app variations. Task success rates for many agents can fluctuate by more than 50% across app variations. For example, Kimi-VL-3B's average success across all tasks fluctuates from 63% to just 4% across app versions. We also find agent behaviors such as looping or hallucinating actions can differ drastically depending on the environment configuration. These initial findings highlight the importance of measuring reliability along this new dimension of app variations. OpenApps is available at https://facebookresearch.github.io/OpenApps/
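The notion of success rates fluctuating across app variations can be made concrete with a small calculation. The sketch below is an assumption-laden illustration, not OpenApps code: `reliability_summary`, the variation names, and the run outcomes are all hypothetical, and "fluctuation" is taken here to mean the gap between the best and worst per-variation success rate.

```python
from statistics import mean


def success_rate(outcomes):
    """Fraction of successful runs; outcomes is a list of booleans."""
    return sum(outcomes) / len(outcomes)


def reliability_summary(results):
    """Summarize task success across app variations.

    results: dict mapping a variation name to a list of per-run booleans.
    """
    rates = {name: success_rate(runs) for name, runs in results.items()}
    values = list(rates.values())
    return {
        "per_variation": rates,
        "mean": mean(values),
        # Assumed definition: best-case minus worst-case success rate.
        "fluctuation": max(values) - min(values),
    }


# Made-up outcomes for one agent on one task under three app variations.
results = {
    "default": [True, True, True, False],
    "dark_theme": [True, False, False, False],
    "long_content": [False, False, False, False],
}
summary = reliability_summary(results)
```

A fixed-environment evaluation would report only one of the per-variation rates; the spread between them is the extra dimension the abstract argues for measuring.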


Data-Driven Methods and AI in Engineering Design: A Systematic Literature Review Focusing on Challenges and Opportunities

arXiv.org Artificial Intelligence

The increasing availability of data and advancements in computational intelligence have accelerated the adoption of data-driven methods (DDMs) in product development. However, their integration into product development remains fragmented. This fragmentation stems from uncertainty, particularly the lack of clarity on what types of DDMs to use and when to employ them across the product development lifecycle. To address this, a necessary first step is to investigate the usage of DDMs in engineering design by identifying which methods are being used, at which development stages, and for what application. This paper presents a PRISMA systematic literature review. The V-model as a product development framework was adopted and simplified into four stages: system design, system implementation, system integration, and validation. A structured search across Scopus, Web of Science, and IEEE Xplore (2014-2024) retrieved 1,689 records. After screening, 114 publications underwent full-text analysis. Findings show that machine learning (ML) and statistical methods dominate current practice, whereas deep learning (DL), though still less common, exhibits a clear upward trend in adoption. Additionally, supervised learning, clustering, regression analysis, and surrogate modeling are prevalent in the system design, implementation, and integration stages, but contributions to validation remain limited. Key challenges in existing applications include limited model interpretability, poor cross-stage traceability, and insufficient validation under real-world conditions. Additionally, the review highlights key limitations and opportunities such as the need for interpretable hybrid models. This review is a first step toward design-stage guidelines; a follow-up synthesis should map computer science algorithms to engineering design problems and activities.


Learning from Risk: LLM-Guided Generation of Safety-Critical Scenarios with Prior Knowledge

arXiv.org Artificial Intelligence

Autonomous driving faces critical challenges in rare long-tail events and complex multi-agent interactions, which are scarce in real-world data yet essential for robust safety validation. This paper presents a high-fidelity scenario generation framework that integrates a conditional variational autoencoder (CVAE) with a large language model (LLM). The CVAE encodes historical trajectories and map information from large-scale naturalistic datasets to learn latent traffic structures, enabling the generation of physically consistent base scenarios. This knowledge-driven optimization balances realism with controllability, ensuring that generated scenarios remain both plausible and risk-sensitive. Extensive experiments in CARLA and SMARTS demonstrate that our framework substantially increases the coverage of high-risk and long-tail events, improves consistency between simulated and real-world traffic distributions, and exposes autonomous driving systems to interactions that are significantly more challenging than those produced by existing rule- or data-driven methods. These results establish a new pathway for safety validation, enabling principled stress-testing of autonomous systems under rare but consequential events.
Introduction: The safety and reliability of autonomous driving depend on rigorous validation under diverse test conditions, especially in high-risk, highly interactive, and safety-critical scenarios (Wang et al., 2021; Hossain, 2025). Yet such events are extremely scarce in real-world datasets, creating a persistent gap between development testing and deployment needs. Simulation-based methods provide an effective alternative by generating large numbers of rare and adversarial environments, thereby alleviating data scarcity and enabling controlled safety evaluation (Huang et al., 2020).
To address these challenges, this paper proposes a risk knowledge-guided traffic scene generation framework that integrates a Conditional Variational Autoencoder (CVAE) with a Large Language Model (LLM). Unlike prior works that merely sample or replay specific risky cases, the proposed framework establishes a general and controllable pipeline for synthesizing diverse safety-critical scenarios under varying risk conditions. The CVAE learns latent spatiotemporal representations from real-world trajectories and maps to generate physically coherent base scenes, while the LLM acts as a knowledge-driven controller that interprets scene semantics, analyzes multi-agent risk interactions, and dynamically adjusts optimization objectives to guide the generation toward desired levels of behavioral complexity and risk exposure.


Learning Multi-Access Point Coordination in Agentic AI Wi-Fi with Large Language Models

arXiv.org Artificial Intelligence

Multi-access point coordination (MAPC) is a key technology for enhancing throughput in next-generation Wi-Fi within dense overlapping basic service sets. However, existing MAPC protocols rely on static, protocol-defined rules, which limits their ability to adapt to dynamic network conditions such as varying interference levels and topologies. To address this limitation, we propose a novel Agentic AI Wi-Fi framework where each access point, modeled as an autonomous large language model agent, collaboratively reasons about the network state and negotiates adaptive coordination strategies in real time. This dynamic collaboration is achieved through a cognitive workflow that enables the agents to engage in natural language dialogue, leveraging integrated memory, reflection, and tool use to ground their decisions in past experience and environmental feedback. Comprehensive simulation results demonstrate that our agentic framework successfully learns to adapt to diverse and dynamic network environments, significantly outperforming the state-of-the-art spatial reuse baseline and validating its potential as a robust and intelligent solution for future wireless networks.
The upcoming IEEE 802.11bn standard, or Wi-Fi 8, introduces multi-access point coordination (MAPC) as a key mechanism to enhance performance in dense Wi-Fi deployments [1]. Specifically, MAPC enables neighboring access points (APs) in overlapping basic service sets (OBSS) to jointly manage radio resources, thereby mitigating the adverse impact of co-channel interference and boosting network throughput.


Reasoning With a Star: A Heliophysics Dataset and Benchmark for Agentic Scientific Reasoning

arXiv.org Artificial Intelligence

Scientific reasoning through Large Language Models in heliophysics involves more than just recalling facts: it requires incorporating physical assumptions, maintaining consistent units, and providing clear scientific formats through coordinated approaches. To address these challenges, we present Reasoning With a Star, a newly contributed heliophysics dataset applicable to reasoning; we also provide an initial benchmarking approach. Our data are constructed from National Aeronautics and Space Administration & University Corporation for Atmospheric Research Living With a Star summer school problem sets and compiled into a readily consumable question-and-answer structure with question contexts, reasoning steps, expected answer type, ground-truth targets, format hints, and metadata. A programmatic grader checks the predictions using unit-aware numerical tolerance, symbolic equivalence, and schema validation. We benchmark a single-shot baseline and four multi-agent patterns, finding that decomposing workflows through systems engineering principles outperforms direct prompting on problems requiring deductive reasoning rather than pure inductive recall.
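The idea of grading with unit-aware numerical tolerance can be sketched as follows. This is a minimal illustration, not the benchmark's actual grader: `grade_numeric`, the tiny `TO_SI` and `DIMENSION` tables, and the default 1% relative tolerance are assumptions made here, and symbolic equivalence and schema validation are not covered.

```python
import math

# Hypothetical conversion factors to SI base units (illustrative only).
TO_SI = {"m": 1.0, "km": 1.0e3, "s": 1.0, "min": 60.0}
# Hypothetical dimension labels, used to reject mismatched quantities.
DIMENSION = {"m": "length", "km": "length", "s": "time", "min": "time"}


def grade_numeric(pred_value, pred_unit, true_value, true_unit, rel_tol=1e-2):
    """Return True if the prediction matches the target within rel_tol.

    Both values are converted to SI before comparison, so 1.5 km and
    1500 m count as the same answer. Unknown units or mismatched
    dimensions (e.g. metres vs seconds) are graded as incorrect.
    """
    if pred_unit not in TO_SI or true_unit not in TO_SI:
        return False
    if DIMENSION[pred_unit] != DIMENSION[true_unit]:
        return False
    pred_si = pred_value * TO_SI[pred_unit]
    true_si = true_value * TO_SI[true_unit]
    return math.isclose(pred_si, true_si, rel_tol=rel_tol)


same_length = grade_numeric(1.5, "km", 1500.0, "m")
wrong_dimension = grade_numeric(1.0, "m", 1.0, "s")
```

Comparing in a common unit system keeps a correct answer from being marked wrong only because it was reported in different units, which is the failure mode a plain string or float comparison would have.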


Intelligent Agents with Emotional Intelligence: Current Trends, Challenges, and Future Prospects

arXiv.org Artificial Intelligence

Developing intelligent agents that possess human-level intelligence is a key goal in the field of human-computer interaction (HCI) and general artificial intelligence [2]. A crucial aspect of achieving this goal is the incorporation of emotional intelligence, which is essential for human cognition and social interaction, into these intelligent agents. Emotional intelligence encompasses three interrelated capabilities: 1) emotion understanding, which involves accurately detecting and understanding affective signals, such as recognizing individuals' emotional states during interactions; 2) emotion elicitation and experiences, which refers to interpreting the causes, context, and implications of emotions for both the individual and the interaction; and 3) emotion expression, which encompasses the capacity to generate, modulate, and convey appropriate emotional responses in a socially meaningful manner. Affective Computing, coined by Rosalind Picard [1], emerged as a discipline dedicated to equipping machines with emotional intelligence, enabling them to recognize, interpret, and respond to human emotions. By embedding emotional intelligence into intelligent agents, affective computing facilitates more naturalistic, adaptive, and socially competent interactions, which in turn enhances user trust, engagement, and satisfaction [209]. Such emotionally intelligent systems not only improve usability but also enable advanced functionalities, including personalized assistance, empathetic dialogue, and context-aware decision-making. In Figure 1, an overview of the emotional intelligence capabilities in intelligent agents is presented. The process of emotional intelligence begins with analyzing the emotional aspects of the user input, enabling the agent to identify the user's affective state during interactions [259], [306]. The next step is affective cognition, where the agent evaluates the observed emotional events using cognitive mental states to ensure accurate interpretation.