AITopics

2506.08403

Country:

Asia > China (0.28)
Europe > Austria (0.28)

Genre: Research Report (1.00)

Industry: Banking & Finance (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Artificial IntelligenceJun-12-2025

SAFEFLOW: A Principled Protocol for Trustworthy and Transactional Autonomous Agent Systems

Li, Peiran, Zou, Xinkai, Wu, Zhuohang, Li, Ruifeng, Xing, Shuo, Zheng, Hanwen, Hu, Zhikai, Wang, Yuping, Li, Haoxi, Yuan, Qin, Zhang, Yingmo, Tu, Zhengzhong

Recent advances in large language models (LLMs) and vision-language models (VLMs) have enabled powerful autonomous agents capable of complex reasoning and multi-modal tool use. Despite their growing capabilities, today's agent frameworks remain fragile, lacking principled mechanisms for secure information flow, reliability, and multi-agent coordination. In this work, we introduce SAFEFLOW, a new protocol-level framework for building trustworthy LLM/VLM-based agents. SAFEFLOW enforces fine-grained information flow control (IFC), precisely tracking provenance, integrity, and confidentiality of all the data exchanged between agents, tools, users, and environments. By constraining LLM reasoning to respect these security labels, SAFEFLOW prevents untrusted or adversarial inputs from contaminating high-integrity decisions. To ensure robustness in concurrent multi-agent settings, SAFEFLOW introduces transactional execution, conflict resolution, and secure scheduling over shared state, preserving global consistency across agents. We further introduce mechanisms, including write-ahead logging, rollback, and secure caches, that further enhance resilience against runtime errors and policy violations. To validate the performances, we built SAFEFLOWBENCH, a comprehensive benchmark suite designed to evaluate agent reliability under adversarial, noisy, and concurrent operational conditions. Extensive experiments demonstrate that agents built with SAFEFLOW maintain impressive task performance and security guarantees even in hostile environments, substantially outperforming state-of-the-art. Together, SAFEFLOW and SAFEFLOWBENCH lay the groundwork for principled, robust, and secure agent ecosystems, advancing the frontier of reliable autonomy.

large language model, machine learning, natural language, (16 more...)

2506.07564

Country: North America > United States (0.92)

Genre: Research Report > New Finding (0.67)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.46)

New ScientistJun-11-2025, 14:00:09 GMT

Can we stop big tech from controlling the internet with AI agents?

What does the future of the internet look like? If AI firms get their way, the once-open web could be fractured into digital silos dominated by commercial AI models, leaving hobbyists and small businesses behind. To prevent this, a team of grassroots researchers is planning to fight back and ensure an open approach to AI. At the heart of this battle is the concept of an AI "agent", a piece of software that browses the web and interacts with websites according to the instructions of a human user – for example, planning and booking a holiday. Many people see agents as the next evolution of services like ChatGPT, but getting them to work is proving tricky.

artificial intelligence, machine learning, natural language, (16 more...)

New Scientist

Country: Europe > United Kingdom > England > Staffordshire (0.05)

Industry: Information Technology > Services (0.49)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.92)
Information Technology > Communications > Networks (0.78)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.36)

arXiv.org Machine LearningJun-11-2025

Learning to Lead: Incentivizing Strategic Agents in the Dark

Wu, Yuchen, Zhong, Xinyi, Yang, Zhuoran

The principal-agent model (Ross, 1973; Grossman and Hart, 1992; Smith, 2004; Laffont and Martimort, 2009) is a fundamental framework for understanding decision-making processes with misaligned incentives and information asymmetry, with wide applications across various disciplines such as economics, finance, and computer science (Ratliff et al., 2018; Kamenica, 2012). In this model, the principal represents an entity such as a service provider, a policy maker, or a firm, whose objective is to maximize certain system-level outcomes, such as revenue, social welfare, or efficiency. On the other hand, an agent, who could be a customer, an employee, or an individual participant, aims to optimize his utility based on his private preferences or information, which is not directly observable by the principal. To induce the optimal outcomes, the principal designs and commits to a mechanism, which could be a contract, an incentive scheme, or a policy, that aligns the agent's incentives with the principal's objectives. The optimal mechanism and the agent's optimal strategy against it constitute the equilibrium of the principal-agent model, in certain settings also known as the Stackelberg equilibrium (Stackelberg, 1934, 2010).

artificial intelligence, data mining, machine learning, (22 more...)

arXiv.org Machine Learning

2506.08438

Country:

Europe > Kosovo > District of Gjilan > Kamenica (0.24)
Asia > Middle East > Jordan (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(2 more...)

Genre: Research Report (0.49)

Industry:

Leisure & Entertainment > Games (1.00)
Health & Medicine (0.67)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science > Data Mining > Big Data (0.67)

Atomic-to-Compositional Generalization for Mobile Agents with A New Benchmark and Scheduling System

Guo, Yuan, Miao, Tingjia, Wu, Zheng, Cheng, Pengzhou, Zhou, Ming, Zhang, Zhuosheng

Autonomous agents powered by multimodal large language models have been developed to facilitate task execution on mobile devices. However, prior work has predominantly focused on atomic tasks -- such as shot-chain execution tasks and single-screen grounding tasks -- while overlooking the generalization to compositional tasks, which are indispensable for real-world applications. This work introduces UI-NEXUS, a comprehensive benchmark designed to evaluate mobile agents on three categories of compositional operations: Simple Concatenation, Context Transition, and Deep Dive. UI-NEXUS supports interactive evaluation in 20 fully controllable local utility app environments, as well as 30 online Chinese and English service apps. It comprises 100 interactive task templates with an average optimal step count of 14.05. Experimental results across a range of mobile agents with agentic workflow or agent-as-a-model show that UI-NEXUS presents significant challenges. Specifically, existing agents generally struggle to balance performance and efficiency, exhibiting representative failure modes such as under-execution, over-execution, and attention drift, causing visible atomic-to-compositional generalization gap. Inspired by these findings, we propose AGENT-NEXUS, a lightweight and efficient scheduling system to tackle compositional mobile tasks. AGENT-NEXUS extrapolates the abilities of existing mobile agents by dynamically decomposing long-horizon tasks to a series of self-contained atomic subtasks. AGENT-NEXUS achieves 24% to 40% task success rate improvement for existing mobile agents on compositional operation tasks within the UI-NEXUS benchmark without significantly sacrificing inference overhead. The demo video, dataset, and code are available on the project page at https://ui-nexus.github.io.

agent, artificial intelligence, arxiv preprint arxiv, (15 more...)

2506.08972

Genre:

Workflow (0.89)
Research Report > New Finding (0.46)

Industry: Information Technology (0.46)

Technology:

Information Technology > Communications > Mobile (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)

Ranjan, Yash, Sengupta, Rahul, Rangarajan, Anand, Ranka, Sanjay

Evaluating Generative Vehicle Trajectory Models for Traffic Intersection Dynamics

Traffic Intersections are vital to urban road networks as they regulate the movement of people and goods. However, they are regions of conflicting trajectories and are prone to accidents. Deep Generative models of traffic dynamics at signalized intersections can greatly help traffic authorities better understand the efficiency and safety aspects. At present, models are evaluated on computational metrics that primarily look at trajectory reconstruction errors. They are not evaluated online in a `live' microsimulation scenario. Further, these metrics do not adequately consider traffic engineering-specific concerns such as red-light violations, unallowed stoppage, etc. In this work, we provide a comprehensive analytics tool to train, run, and evaluate models with metrics that give better insights into model performance from a traffic engineering point of view. We train a state-of-the-art multi-vehicle trajectory forecasting model on a large dataset collected by running a calibrated scenario of a real-world urban intersection. We then evaluate the performance of the prediction models, online in a microsimulator, under unseen traffic conditions. We show that despite using ideally-behaved trajectories as input, and achieving low trajectory reconstruction errors, the generated trajectories show behaviors that break traffic rules. We introduce new metrics to evaluate such undesired behaviors and present our results.

artificial intelligence, machine learning, vehicle, (18 more...)

2506.08963

Country: North America > United States (0.46)

Genre: Research Report > New Finding (0.34)

Industry:

Transportation > Infrastructure & Services (1.00)
Transportation > Ground > Road (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Robots (0.94)

Ciocanel, Maria-Veronica, Nardini, John T., Flores, Kevin B., Rutter, Erica M., Sindi, Suzanne S., Volkening, Alexandria

Enhancing generalizability of model discovery across parameter space with multi-experiment equation learning (ME-EQL)

Agent-based modeling (ABM) is a powerful tool for understanding self-organizing biological systems, but it is computationally intensive and often not analytically tractable. Equation learning (EQL) methods can derive continuum models from ABM data, but they typically require extensive simulations for each parameter set, raising concerns about generalizability. In this work, we extend EQL to Multi-experiment equation learning (ME-EQL) by introducing two methods: one-at-a-time ME-EQL (OAT ME-EQL), which learns individual models for each parameter set and connects them via interpolation, and embedded structure ME-EQL (ES ME-EQL), which builds a unified model library across parameters. We demonstrate these methods using a birth--death mean-field model and an on-lattice agent-based model of birth, death, and migration with spatial structure. Our results show that both methods significantly reduce the relative error in recovering parameters from agent-based simulations, with OAT ME-EQL offering better generalizability across parameter space. Our findings highlight the potential of equation learning from multiple experiments to enhance the generalizability and interpretability of learned models for complex biological systems.

artificial intelligence, machine learning, me-eql, (17 more...)

2506.08916

Country: North America > United States (1.00)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine > Therapeutic Area (0.67)
Health & Medicine > Pharmaceuticals & Biotechnology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Ballotta, Luca, Vékássy, Áron, Gil, Stephanie, Yemini, Michal

Confidence Boosts Trust-Based Resilience in Cooperative Multi-Robot Systems

Wireless communication-based multi-robot systems open the door to cyberattacks that can disrupt safety and performance of collaborative robots. The physical channel supporting inter-robot communication offers an attractive opportunity to decouple the detection of malicious robots from task-relevant data exchange between legitimate robots. Yet, trustworthiness indications coming from physical channels are uncertain and must be handled with this in mind. In this paper, we propose a resilient protocol for multi-robot operation wherein a parameter λt accounts for how confident a robot is about the legitimacy of nearby robots that the physical channel indicates. Analytical results prove that our protocol achieves resilient coordination with arbitrarily many malicious robots under mild assumptions. Tuning λt allows a designer to trade between near-optimal inter-robot coordination and quick task execution; see Fig. 1. This is a fundamental performance tradeoff and must be carefully evaluated based on the task at hand. The effectiveness of our approach is numerically verified with experiments involving platoons of autonomous cars where some vehicles are maliciously spoofed.

artificial intelligence, consensus, robot, (15 more...)

2506.08807

Genre: Research Report (0.84)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Robots > Robots in the Workplace (0.61)

Margadji, Christos, Pattinson, Sebastian W.

Hybrid Reasoning for Perception, Explanation, and Autonomous Action in Manufacturing

Industrial processes must be robust and adaptable, as environments and tasks are often unpredictable, while operational errors remain costly and difficult to detect. AI-based control systems offer a path forward, yet typically depend on supervised learning with extensive labelled datasets, which limits their ability to generalize across variable and data-scarce industrial settings. Foundation models could enable broader reasoning and knowledge integration, but rarely deliver the quantitative precision demanded by engineering applications. Here, we introduceControl and Interpretation of Production via Hybrid Expertise and Reasoning (CIPHER): a vision-language-action (VLA) model framework aiming to replicate human-like reasoning for industrial control, instantiated in a commercial-grade 3D printer. It integrates a process expert, a regression model enabling quantitative characterization of system states required for engineering tasks. CIPHER also incorporates retrieval-augmented generation to access external expert knowledge and support physics-informed, chain-of-thought reasoning. This hybrid architecture exhibits strong generalization to out-of-distribution tasks. It interprets visual or textual inputs from process monitoring, explains its decisions, and autonomously generates precise machine instructions, without requiring explicit annotations. CIPHER thus lays the foundations for autonomous systems that act with precision, reason with context, and communicate decisions transparently, supporting safe and trusted deployment in industrial settings.

large language model, machine learning, natural language, (22 more...)

2506.08462

Country: Europe > United Kingdom (0.28)

Genre: Research Report (1.00)

Industry:

Machinery > Industrial Machinery (0.68)
Information Technology (0.67)
Leisure & Entertainment (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.89)

Yuan, Yurun, Xie, Tengyang

Reinforce LLM Reasoning through Multi-Agent Reflection

Leveraging more test-time computation has proven to be an effective way to boost the reasoning capabilities of large language models (LLMs). Among various methods, the verify-and-improve paradigm stands out for enabling dynamic solution exploration and feedback incorporation. However, existing approaches often suffer from restricted feedback spaces and lack of coordinated training of different parties, leading to suboptimal performance. To address this, we model this multi-turn refinement process as a Markov Decision Process and introduce DPSDP (Direct Policy Search by Dynamic Programming), a reinforcement learning algorithm that trains an actor-critic LLM system to iteratively refine answers via direct preference learning on self-generated data. Theoretically, DPSDP can match the performance of any policy within the training distribution. Empirically, we instantiate DPSDP with various base models and show improvements on both in- and out-of-distribution benchmarks. For example, on benchmark MATH 500, majority voting over five refinement steps increases first-turn accuracy from 58.2% to 63.2% with Ministral-based models. An ablation study further confirms the benefits of multi-agent collaboration and out-of-distribution generalization.

large language model, machine learning, natural language, (16 more...)

2506.08379

Country: North America > United States > Wisconsin (0.27)

Genre: Research Report > New Finding (0.93)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.45)