AITopics

This paper explores the fair allocation of indivisible items in a multidimensional setting, motivated by the need to address fairness in complex environments where agents assess bundles according to multiple criteria. Such multidimensional settings are not merely of theoretical interest but are central to many real-world applications. For example, cloud computing resources are evaluated based on multiple criteria such as CPU cores, memory, and network bandwidth. In such cases, traditional one dimensional fairness notions fail to capture fairness across multiple attributes. To address these challenges, we study two relaxed variants of envy-freeness: weak simultaneously envy-free up to c goods (weak sEFc) and strong simultaneously envy-free up to c goods (strong sEFc), which accommodate the multidimensionality of agents' preferences. Under the weak notion, for every pair of agents and for each dimension, any perceived envy can be eliminated by removing, if necessary, a different set of goods from the envied agent's allocation. In contrast, the strong version requires selecting a single set of goods whose removal from the envied bundle simultaneously eliminates envy in every dimension. We provide upper and lower bounds on the relaxation parameter c that guarantee the existence of weak or strong sEFc allocations, where these bounds are independent of the total number of items. In addition, we present algorithms for checking whether a weak or strong sEFc allocation exists. Moreover, we establish NP-hardness results for checking the existence of weak sEF1 and strong sEF1 allocations.

agent, allocation, artificial intelligence, (14 more...)

2506.21727

Country: Asia > Japan (0.28)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.69)

ScienceBoard: Evaluating Multimodal Autonomous Agents in Realistic Scientific Workflows

Sun, Qiushi, Liu, Zhoumianze, Ma, Chang, Ding, Zichen, Xu, Fangzhi, Yin, Zhangyue, Zhao, Haiteng, Wu, Zhenyu, Cheng, Kanzhi, Liu, Zhaoyang, Wang, Jianing, Li, Qintong, Tang, Xiangru, Xie, Tianbao, Feng, Xiachong, Li, Xiang, Kao, Ben, Wang, Wenhai, Qi, Biqing, Kong, Lingpeng, Wu, Zhiyong

Large Language Models (LLMs) have extended their impact beyond Natural Language Processing, substantially fostering the development of interdisciplinary research. Recently, various LLM-based agents have been developed to assist scientific discovery progress across multiple aspects and domains. Among these, computer-using agents, capable of interacting with operating systems as humans do, are paving the way to automated scientific problem-solving and addressing routines in researchers' workflows. Recognizing the transformative potential of these agents, we introduce ScienceBoard, which encompasses two complementary contributions: (i) a realistic, multi-domain environment featuring dynamic and visually rich scientific workflows with integrated professional software, where agents can autonomously interact via different interfaces to accelerate complex research tasks and experiments; and (ii) a challenging benchmark of 169 high-quality, rigorously validated real-world tasks curated by humans, spanning scientific-discovery workflows in domains such as biochemistry, astronomy, and geoinformatics. Extensive evaluations of agents with state-of-the-art backbones (e.g., GPT-4o, Claude 3.7, UI-TARS) show that, despite some promising results, they still fall short of reliably assisting scientists in complex workflows, achieving only a 15% overall success rate. In-depth analysis further provides valuable insights for addressing current agent limitations and more effective design principles, paving the way to build more capable agents for scientific discovery. Our code, environment, and benchmark are at https://qiushisun.github.io/ScienceBoard-Home/.

large language model, machine learning, natural language, (21 more...)

2505.19897

Country: Asia > China (0.46)

Genre:

Workflow (1.00)
Research Report > New Finding (0.46)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

$C^3$-Bench: The Things Real Disturbing LLM based Agent in Multi-Tasking

Yu, Peijie, Yang, Yifan, Li, Jinjian, Zhang, Zelong, Wang, Haorui, Feng, Xiao, Zhang, Feng

Agents based on large language models leverage tools to modify environments, revolutionizing how AI interacts with the physical world. Unlike traditional NLP tasks that rely solely on historical dialogue for responses, these agents must consider more complex factors, such as inter-tool relationships, environmental feedback and previous decisions, when making choices. Current research typically evaluates agents via multi-turn dialogues. However, it overlooks the influence of these critical factors on agent behavior. To bridge this gap, we present an open-source and high-quality benchmark $C^3$-Bench. This benchmark integrates attack concepts and applies univariate analysis to pinpoint key elements affecting agent robustness. In concrete, we design three challenges: navigate complex tool relationships, handle critical hidden information and manage dynamic decision paths. Complementing these challenges, we introduce fine-grained metrics, innovative data collection algorithms and reproducible evaluation methods. Extensive experiments are conducted on 49 mainstream agents, encompassing general fast-thinking, slow-thinking and domain-specific models. We observe that agents have significant shortcomings in handling tool dependencies, long context information dependencies and frequent policy-type switching. In essence, $C^3$-Bench aims to expose model vulnerabilities through these challenges and drive research into the interpretability of agent performance. The benchmark is publicly available at https://github.com/TencentHunyuan/C3-Benchmark.

information, large language model, machine learning, (21 more...)

2505.18746

Country:

Africa (0.69)
North America > United States > California (0.14)

Genre:

Research Report (0.81)
Overview (0.67)

Industry:

Information Technology > Security & Privacy (1.00)
Transportation > Air (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

TrajTok: Technical Report for 2025 Waymo Open Sim Agents Challenge

Zhang, Zhiyuan, Jia, Xiaosong, Chen, Guanyu, Li, Qifeng, Yan, Junchi

In this technical report, we introduce TrajTok, a trajectory tokenizer for discrete next-token-prediction based behavior generation models, which combines data-driven and rule-based methods with better coverage, symmetry and robustness, along with a spatial-aware label smoothing method for cross-entropy loss. We adopt the tokenizer and loss for the SMART model and reach a superior performance with realism score of 0.7852 on the Waymo Open Sim Agents Challenge 2025. We will open-source the code in the future.

artificial intelligence, machine learning, trajectory, (14 more...)

2506.21618

Genre: Research Report (0.65)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Rule-Based Reasoning (0.56)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.49)

Bouamra, Yasmine, Yun, Bruno, Poisson, Alexandre, Armetta, Frédéric

SysTemp: A Multi-Agent System for Template-Based Generation of SysML v2

The automatic generation of SysML v2 models represents a major challenge in the engineering of complex systems, particularly due to the scarcity of learning corpora and complex syntax. We present SysTemp, a system aimed at facilitating and improving the creation of SysML v2 models from natural language specifications. It is based on a multi-agent system, including a template generator that structures the generation process. We discuss the advantages and challenges of this system through an evaluation, highlighting its potential to improve the quality of the generations in SysML v2 modeling.

artificial intelligence, machine learning, requirement, (18 more...)

2506.21608

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

WIREDJun-29-2025, 10:00:00 GMT

I Let AI Agents Plan My Vacation--and It Wasn't Terrible

The worst part of travel is the planning: the faff of finding and booking transport, accommodation, restaurant reservations--the list can feel endless. To help, the latest wave of AI agents, such as OpenAI's Operator and Anthropic's Computer Use claim they can take these dreary, cumbersome tasks from befuddled travelers and do it all for you. But exactly how good are they are digging out the good stuff? What better way to find out than deciding on a last-minute weekend away. I tasked Operator, which is available to ChatGPT Pro subscribers, with booking me something budget-friendly, with good food and art, and told it that I'd prefer to travel by train.

ai agent plan, chatgpt, vacation, (4 more...)

WIRED

Country: Europe > Belgium > Flanders > West Flanders > Bruges (0.12)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.64)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.64)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.64)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.63)

arXiv.org Artificial IntelligenceJun-27-2025

xChemAgents: Agentic AI for Explainable Quantum Chemistry

Polat, Can, Tuncel, Mehmet, Kurban, Mustafa, Serpedin, Erchin, Kurban, Hasan

Recent progress in multimodal graph neural networks has demonstrated that augmenting atomic XYZ geometries with textual chemical descriptors can enhance predictive accuracy across a range of electronic and thermodynamic properties. However, naively appending large sets of heterogeneous descriptors often degrades performance on tasks sensitive to molecular shape or symmetry, and undermines interpretability. xChemAgents proposes a cooperative agent framework that injects physics-aware reasoning into multimodal property prediction. xChemAgents comprises two language-model-based agents: a Selector, which adaptively identifies a sparse, weighted subset of descriptors relevant to each target, and provides a natural language rationale; and a Validator, which enforces physical constraints such as unit consistency and scaling laws through iterative dialogue. On standard benchmark datasets, xChemAgents achieves up to a 22% reduction in mean absolute error over the state-of-the-art baselines, while producing faithful, human-interpretable explanations. Experiment results highlight the potential of cooperative, self-verifying agents to enhance both accuracy and transparency in foundation-model-driven materials science. The implementation and accompanying dataset are available at https://github.com/KurbanIntelligenceLab/xChemAgents.

descriptor, machine learning, natural language, (15 more...)

2505.20574

Country:

Asia > Middle East > Republic of Türkiye > Ankara Province > Ankara (0.04)
Europe > Middle East > Republic of Türkiye > Istanbul Province > Istanbul (0.04)
Asia > Middle East > Republic of Türkiye > Istanbul Province > Istanbul (0.04)
(2 more...)

Genre:

Research Report (0.82)
Workflow (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.89)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.68)

arXiv.org Artificial IntelligenceJun-27-2025

Agent-RewardBench: Towards a Unified Benchmark for Reward Modeling across Perception, Planning, and Safety in Real-World Multimodal Agents

Men, Tianyi, Jin, Zhuoran, Cao, Pengfei, Chen, Yubo, Liu, Kang, Zhao, Jun

As Multimodal Large Language Models (MLLMs) advance, multimodal agents show promise in real-world tasks like web navigation and embodied intelligence. However, due to limitations in a lack of external feedback, these agents struggle with self-correction and generalization. A promising approach is to use reward models as external feedback, but there is no clear on how to select reward models for agents. Thus, there is an urgent need to build a reward bench targeted at agents. To address these challenges, we propose Agent-RewardBench, a benchmark designed to evaluate reward modeling ability in MLLMs. The benchmark is characterized by three key features: (1) Multiple dimensions and real-world agent scenarios evaluation. It covers perception, planning, and safety with 7 scenarios; (2) Step-level reward evaluation. It allows for the assessment of agent capabilities at the individual steps of a task, providing a more granular view of performance during the planning process; and (3) Appropriately difficulty and high-quality. We carefully sample from 10 diverse models, difficulty control to maintain task challenges, and manual verification to ensure the integrity of the data. Experiments demonstrate that even state-of-the-art multimodal models show limited performance, highlighting the need for specialized training in agent reward modeling. Code is available at github.

arxiv preprint arxiv, large language model, machine learning, (21 more...)

2506.21252

Country:

Asia > China > Beijing > Beijing (0.04)
Asia > Myanmar > Tanintharyi Region > Dawei (0.04)
Asia > China > Shanghai > Shanghai (0.04)
Asia > China > Jiangsu Province > Changzhou (0.04)

Genre: Research Report (1.00)

Industry:

Transportation (0.69)
Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)

arXiv.org Artificial IntelligenceJun-27-2025

Artificial Delegates Resolve Fairness Issues in Perpetual Voting with Partial Turnout

Shah, Apurva, Abels, Axel, Nowé, Ann, Lenaerts, Tom

Perpetual voting considers sequences of decis ions made by the same electorate, where fairness must be evaluated over time rather than perdecision [16]. A centralchallenge in this setting is ensuring adequaterepresentation for voters who are repeatedly in the minority. Traditional a ggregation rules, such as majority voting or Borda count, fail in this regard: they offer no guarantees of long-term fai rness or cumulative influence. In response, methods such as Perpetual Phragmén [17] and Perpetual Consensus [16] hav e been proposed to distribute influence more equitably over time. However, they rely on full knowledge of all voters ' approval sets, implicitly requiring consistent voter participation, a condition which can be hard to satisfy in real-world contexts. Real-world elections face various practical constraints-- including scheduling conflicts, limited resources, and restricted information access--that inevitably prevent vote rs from participating consistently.

artificial intelligence, machine learning, voter, (12 more...)

doi: 10.1145/3715928.3737482

2506.21186

Country:

Europe > Belgium > Brussels-Capital Region > Brussels (0.05)
North America > United States (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report > New Finding (0.93)

Industry: Government > Voting & Elections (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.46)

arXiv.org Machine LearningJun-27-2025

Homogenization of Multi-agent Learning Dynamics in Finite-state Markov Games

Kerzreho, Yann

This paper introduces a new approach for approximating the learning dynamics of multiple reinforcement learning (RL) agents interacting in a finite-state Markov game. The idea is to rescale the learning process by simultaneously reducing the learning rate and increasing the update frequency, effectively treating the agent's parameters as a slow-evolving variable influenced by the fast-mixing game state. Under mild assumptions-ergodicity of the state process and continuity of the updates-we prove the convergence of this rescaled process to an ordinary differential equation (ODE). This ODE provides a tractable, deterministic approximation of the agent's learning dynamics. An implementation of the framework is available at\,: https://github.com/yannKerzreho/MarkovGameApproximation

artificial intelligence, machine learning, reinforcement learning, (16 more...)

arXiv.org Machine Learning

2506.21079

Country: Europe > France (0.04)

Genre: Research Report (0.40)

Industry: Leisure & Entertainment > Games (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.85)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.68)