AITopics

2505.18397

Country: North America > United States (0.93)

Genre: Research Report (1.00)

Industry: Information Technology > Security & Privacy (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

arXiv.org Artificial Intelligence, Aug-26-2025

X-Teaming: Multi-Turn Jailbreaks and Defenses with Adaptive Multi-Agents

Rahman, Salman, Jiang, Liwei, Shiffer, James, Liu, Genglin, Issaka, Sheriff, Parvez, Md Rizwan, Palangi, Hamid, Chang, Kai-Wei, Choi, Yejin, Gabriel, Saadia

Multi-turn interactions with language models (LMs) pose critical safety risks, as harmful intent can be strategically spread across exchanges. Yet, the vast majority of prior work has focused on single-turn safety, while adaptability and diversity remain among the key challenges of multi-turn red-teaming. To address these challenges, we present X-Teaming, a scalable framework that systematically explores how seemingly harmless interactions escalate into harmful outcomes and generates corresponding attack scenarios. X-Teaming employs collaborative agents for planning, attack optimization, and verification, achieving state-of-the-art multi-turn jailbreak effectiveness and diversity with success rates up to 98.1% across representative leading open-weight and closed-source models. In particular, X-Teaming achieves a 96.2% attack success rate against the latest Claude 3.7 Sonnet model, which has been considered nearly immune to single-turn attacks. Building on X-Teaming, we introduce XGuard-Train, an open-source multi-turn safety training dataset that is 20x larger than the previous best resource, comprising 30K interactive jailbreaks, designed to enable robust multi-turn safety alignment for LMs. Our work offers essential tools and insights for mitigating sophisticated conversational attacks, advancing the multi-turn safety of LMs.

large language model, machine learning, natural language, (18 more...)

2504.13203

Country:

Asia > Middle East (0.45)
North America > United States > California (0.27)

Genre: Research Report (1.00)

Industry:

Media (1.00)
Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
Information Technology > Security & Privacy (1.00)
(5 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Swarming Without an Anchor (SWA): Robot Swarms Adapt Better to Localization Dropouts Than a Single Robot

Horyna, Jiri, Jung, Roland, Weiss, Stephan, Ferrante, Eliseo, Saska, Martin

In this paper, we present the Swarming Without an Anchor (SWA) approach to state estimation in swarms of Unmanned Aerial Vehicles (UAVs) experiencing ego-localization dropout, where individual agents are laterally stabilized using relative information only. We propose to fuse decentralized state estimation with robust mutual perception and onboard sensor data to maintain accurate state awareness despite intermittent localization failures. Thus, the relative information used to estimate the lateral state of UAVs enables the identification of the unambiguous state of UAVs with respect to the local constellation. The resulting behavior reaches velocity consensus, as this task can be referred to as the double integrator synchronization problem. All disturbances and performance degradations except a uniform translation drift of the swarm as a whole are attenuated, which opens new opportunities for using tight cooperation to increase the reliability and resilience of multi-UAV systems. Simulations and real-world experiments validate the effectiveness of our approach, demonstrating its capability to sustain cohesive swarm behavior in challenging conditions of unreliable or unavailable primary localization. UAV swarms enhance mission capabilities by leveraging cooperative behavior to perform tasks more efficiently than single UAVs [1]-[7].
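
The velocity-consensus behavior mentioned in the SWA abstract can be illustrated with a textbook double-integrator model. The sketch below is not the SWA estimator or controller; it is a minimal example in which each agent accelerates toward the velocities of its neighbors using relative velocity differences only. The function name, ring topology, gain, time step, and initial values are all assumptions chosen for illustration.

```python
def simulate_double_integrators(positions, velocities, neighbors,
                                gain=1.0, dt=0.01, steps=2000):
    """Euler simulation of 1-D double integrators with velocity coupling.

    Each agent i applies the acceleration
        u_i = gain * sum_j (v_j - v_i)   over its neighbors j,
    which uses only relative velocities (no absolute reference).
    """
    p, v = list(positions), list(velocities)
    for _ in range(steps):
        accel = [gain * sum(v[j] - v[i] for j in neighbors[i])
                 for i in range(len(v))]
        p = [pi + dt * vi for pi, vi in zip(p, v)]   # integrate position
        v = [vi + dt * ai for vi, ai in zip(v, accel)]  # integrate velocity
    return p, v


# Hypothetical 4-agent swarm on an undirected ring graph.
initial_p = [0.0, 1.0, 2.0, 3.0]
initial_v = [1.0, -0.5, 0.3, 2.0]
ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}

final_p, final_v = simulate_double_integrators(initial_p, initial_v, ring)
mean_v = sum(initial_v) / len(initial_v)
print(final_v, mean_v)
```

With a symmetric neighbor graph the accelerations sum to zero, so the average velocity is preserved: the individual velocities agree with each other, but the swarm as a whole keeps drifting at the common mean velocity. This mirrors the abstract's remark that a uniform translation drift of the whole swarm is the one disturbance that relative information cannot attenuate.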

artificial intelligence, ieee robotic and automation letter, state estimation, (13 more...)

doi: 10.1109/LRA.2025.3562786

2508.1646

Country:

Europe (0.68)
Asia > Middle East > UAE (0.46)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)

AgentScope 1.0: A Developer-Centric Framework for Building Agentic Applications

Gao, Dawei, Li, Zitao, Xie, Yuexiang, Kuang, Weirui, Yao, Liuyi, Qian, Bingchen, Ma, Zhijian, Cui, Yue, Luo, Haohao, Li, Shen, Yi, Lu, Yu, Yi, He, Shiqi, Luo, Zhiling, Zhou, Wenmeng, Zhang, Zhicheng, He, Xuguang, Chen, Ziqian, Liao, Weikai, Kushnazarov, Farruh Isakulovich, Li, Yaliang, Ding, Bolin, Zhou, Jingren

Driven by rapid advancements of Large Language Models (LLMs), agents are empowered to combine intrinsic knowledge with dynamic tool use, greatly enhancing their capacity to address real-world tasks. In line with such an evolution, AgentScope introduces major improvements in a new version (1.0), towards comprehensively supporting flexible and efficient tool-based agent-environment interactions for building agentic applications. Specifically, we abstract foundational components essential for agentic applications and provide unified interfaces and extensible modules, enabling developers to easily leverage the latest progress, such as new models and MCPs. Furthermore, we ground agent behaviors in the ReAct paradigm and offer advanced agent-level infrastructure based on a systematic asynchronous design, which enriches both human-agent and agent-agent interaction patterns while improving execution efficiency. Building on this foundation, we integrate several built-in agents tailored to specific practical scenarios. AgentScope also includes robust engineering support for developer-friendly experiences. We provide a scalable evaluation module with a visual studio interface, making the development of long-trajectory agentic applications more manageable and easier to trace. In addition, AgentScope offers a runtime sandbox to ensure safe agent execution and facilitates rapid deployment in production environments. With these enhancements, AgentScope provides a practical foundation for building scalable, adaptive, and effective agentic applications.

large language model, machine learning, natural language, (20 more...)

2508.16279

Country: Asia (0.28)

Genre:

Workflow (1.00)
Research Report (0.64)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (1.00)
(2 more...)

Limit-Computable Grains of Truth for Arbitrary Computable Extensive-Form (Un)Known Games

Wyeth, Cole, Hutter, Marcus, Leike, Jan, Taylor, Jessica

A Bayesian player acting in an infinite multi-player game learns to predict the other players' strategies if his prior assigns positive probability to their play (or contains a grain of truth). Kalai and Lehrer's classic grain of truth problem is to find a reasonably large class of strategies that contains the Bayes-optimal policies with respect to this class, allowing mutually-consistent beliefs about strategy choice that obey the rules of Bayesian inference. Only small classes are known to have a grain of truth and the literature contains several related impossibility results. In this paper we present a formal and general solution to the full grain of truth problem: we construct a class of strategies wide enough to contain all computable strategies as well as Bayes-optimal strategies for every reasonable prior over the class. When the "environment" is a known repeated stage game, we show convergence in the sense of [KL93a] and [KL93b]. When the environment is unknown, agents using Thompson sampling converge to play $\varepsilon$-Nash equilibria in arbitrary unknown computable multi-agent environments. Finally, we include an application to self-predictive policies that avoid planning. While these results use computability theory only as a conceptual tool to solve a classic game theory problem, we show that our solution can naturally be computationally approximated arbitrarily closely.

artificial intelligence, bayesian inference, machine learning, (18 more...)

2508.16245

Country:

North America > United States (0.28)
Europe (0.27)

Genre:

Workflow (0.67)
Research Report (0.63)

Industry: Leisure & Entertainment > Games (1.00)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.66)

IR-Agent: Expert-Inspired LLM Agents for Structure Elucidation from Infrared Spectra

Noh, Heewoong, Lee, Namkyeong, Na, Gyoung S., Kim, Kibum, Park, Chanyoung

Spectral analysis provides crucial clues for the elucidation of unknown materials. Among various techniques, infrared spectroscopy (IR) plays an important role in laboratory settings due to its high accessibility and low cost. However, existing approaches often fail to reflect expert analytical processes and lack flexibility in incorporating diverse types of chemical knowledge, which is essential in real-world analytical scenarios. In this paper, we propose IR-Agent, a novel multi-agent framework for molecular structure elucidation from IR spectra. The framework is designed to emulate expert-driven IR analysis procedures and is inherently extensible. Each agent specializes in a specific aspect of IR interpretation, and their complementary roles enable integrated reasoning, thereby improving the overall accuracy of structure elucidation. Through extensive experiments, we demonstrate that IR-Agent not only improves baseline performance on experimental IR spectra but also shows strong adaptability to various forms of chemical information.

information, large language model, machine learning, (15 more...)

2508.16112

Genre: Research Report (1.00)

Industry: Materials > Chemicals > Commodity Chemicals (0.47)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

GPLight+: A Genetic Programming Method for Learning Symmetric Traffic Signal Control Policy

Liao, Xiao-Cheng, Mei, Yi, Zhang, Mengjie

Recently, learning-based approaches have achieved significant success in automatically devising effective traffic signal control strategies. In particular, as a powerful evolutionary machine learning approach, Genetic Programming (GP) is utilized to evolve human-understandable phase urgency functions to measure the urgency of activating a green light for a specific phase. However, current GP-based methods are unable to treat the common traffic features of different traffic signal phases consistently. To address this issue, we propose to use a symmetric phase urgency function to calculate the phase urgency for a specific phase based on the current road conditions. This is represented as an aggregation of two shared subtrees, each representing the urgency of a turn movement in the phase. We then propose a GP method to evolve the symmetric phase urgency function. We evaluate our proposed method on the well-known CityFlow traffic simulator, based on multiple public real-world datasets. The experimental results show that the proposed symmetric urgency function representation can significantly improve the performance of the learned traffic signal control policies over the traditional GP representation on a wide range of scenarios. Further analysis shows that the proposed method can evolve effective, human-understandable and easily deployable traffic signal control policies. Traffic signals, located at signalized intersections, manage traffic flow in various directions, thereby significantly contributing to the improvement of both transportation efficiency and road safety [1]. Poorly designed traffic signal plans result in commuters wasting valuable time on the roads. The majority of existing traffic signal control systems do not operate based on decisions tailored to the dynamic traffic conditions. For instance, the Sydney Coordinated Adaptive Traffic System [2], which relies on a predetermined cycle time plan, remains extensively utilized in real signalized intersections worldwide. The emergence of Deep Reinforcement Learning (DRL) as a solution to the Traffic Signal Control (TSC) problem is driven by advancements in deep learning [3] and the increasing accessibility of transportation infrastructure components such as surveillance cameras, road sensors, and the internet of vehicles [4]. This trend is exemplified by recent research efforts [5]-[7].
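
The symmetric phase urgency idea in the GPLight+ abstract can be sketched as follows: one shared function scores each of the two turn movements in a phase, and the two scores are aggregated into the phase urgency. This is only an illustration under stated assumptions. The feature names (`queue`, `approaching`), the hand-written `movement_urgency` formula standing in for an evolved subtree, the use of a sum as the aggregation, and the phase names are all hypothetical and are not taken from the paper.

```python
from typing import Callable, Dict, Tuple

Movement = Dict[str, float]          # traffic features of one turn movement
Phase = Tuple[Movement, Movement]    # a phase serves two turn movements


def movement_urgency(m: Movement) -> float:
    """Hypothetical stand-in for a shared, evolved GP subtree."""
    return m["queue"] + 0.5 * m["approaching"]


def phase_urgency(phase: Phase,
                  subtree: Callable[[Movement], float] = movement_urgency) -> float:
    """Apply the same subtree to both movements and aggregate (assumed: sum)."""
    first, second = phase
    return subtree(first) + subtree(second)


def select_phase(phases: Dict[str, Phase]) -> str:
    """Activate the phase with the highest urgency."""
    return max(phases, key=lambda name: phase_urgency(phases[name]))


# Hypothetical intersection state with two candidate phases.
phases = {
    "NS_through": ({"queue": 4.0, "approaching": 2.0},
                   {"queue": 1.0, "approaching": 0.0}),
    "EW_left": ({"queue": 2.0, "approaching": 6.0},
                {"queue": 3.0, "approaching": 2.0}),
}
chosen = select_phase(phases)
print(chosen, {name: phase_urgency(p) for name, p in phases.items()})
```

Because both movements pass through the same function and the aggregation is commutative, swapping the two movements of a phase leaves its urgency unchanged, which is the consistency property the abstract attributes to the symmetric representation.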

evolutionary algorithm, machine learning, reinforcement learning, (16 more...)

doi: 10.1109/TEVC.2025.3578575

2508.1609

Country:

Asia (0.28)
North America > United States (0.28)
Oceania (0.28)
Europe (0.28)

Genre: Research Report > New Finding (0.66)

Industry:

Transportation > Infrastructure & Services (1.00)
Transportation > Ground > Road (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)

ASIC-Agent: An Autonomous Multi-Agent System for ASIC Design with Benchmark Evaluation

Allam, Ahmed, Mansour, Youssef, Shalan, Mohamed

Large Language Models (LLMs) have demonstrated remarkable capabilities in Register Transfer Level (RTL) design, enabling high-quality code generation from natural language descriptions. However, LLMs alone face significant limitations in real-world hardware design workflows, including the inability to execute code, lack of debugging capabilities, and absence of long-term memory. To address these challenges, we present ASIC-Agent, an autonomous system designed specifically for digital ASIC design tasks. ASIC-Agent enhances base LLMs with a multi-agent architecture incorporating specialized sub-agents for RTL generation, verification, OpenLane hardening, and Caravel chip integration, all operating within a comprehensive sandbox environment with access to essential hardware design tools. The system leverages a vector database containing documentation, API references, error knowledge, and curated insights from the open-source silicon community. To evaluate ASIC-Agent's performance, we introduce ASIC-Agent-Bench, the first benchmark specifically designed to assess agentic systems in hardware design tasks. We evaluate ASIC-Agent with various base LLMs, providing quantitative comparisons and qualitative insights into agent behavior across different design scenarios. Our results demonstrate that ASIC-Agent, when powered by Claude 4 Sonnet, successfully automates a broad range of ASIC design tasks spanning varying levels of complexity, showing the potential of significantly accelerating the ASIC design workflow.

large language model, machine learning, multiplier, (21 more...)

doi: 10.1109/ICLAD65226.2025.00033

2508.1594

Genre: Research Report > New Finding (0.68)

Industry: Semiconductors & Electronics (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

DeepMEL: A Multi-Agent Collaboration Framework for Multimodal Entity Linking

Wang, Fang, Yan, Tianwei, Yang, Zonghao, Hu, Minghao, Zhang, Jun, Luo, Zhunchen, Bai, Xiaoying

Entity linking is a fundamental task in knowledge graph (KG) construction Hofer et al. (2024), aiming to link mentions to their corresponding entities in a target knowledge base (KB). It is widely applied in downstream natural language processing (NLP) tasks, such as Question & Answering Systems Sequeda et al. (2024) and intelligent recommendation systems Chaudhari et al. (2017). Recently, the explosive growth of multimodal data on the Internet has raised challenges, as the quality of online information is often inconsistent, many mentions are ambiguous, and contextual information is frequently incomplete. Under such conditions, relying solely on a single modality (such as pure text) is often insufficient to accurately resolve reference ambiguity Gan et al. (2021). Integrating textual and visual modalities can significantly improve the precision and efficiency of disambiguation Gella et al. (2017). Consequently, multimodal entity linking, which involves combining textual and visual information to link real-world mentions to corresponding entities in a multimodal knowledge graph (MMKG), has become a critical research task. For example, as shown in Figure 1, the mention of "Apple" may be difficult to disambiguate, as it could refer to various entities, such as Apple Inc. or the apple (fruit). However, by considering both textual and visual information, it becomes easier and clearer to accurately link the mention of "Apple" to the entity "apple (fruit of the apple tree)." Currently, multimodal entity linking models are primarily based on deep learning frameworks, utilizing cross-attention mechanisms Lu and Elhamifar (2024) and visual feature encoding techniques Mokssit et al. (2023) to achieve the fusion of textual mentions and visual information.

large language model, machine learning, question answering, (22 more...)

2508.15876

Country:

Europe (1.00)
Asia (1.00)
South America (0.67)
North America > United States > California (0.46)

Genre: Research Report (1.00)

Industry: Information Technology (0.87)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
(2 more...)

A Functionality-Grounded Benchmark for Evaluating Web Agents in E-commerce Domains

Zhang, Xianren, Prasad, Shreyas, Wang, Di, Zeng, Qiuhai, Wang, Suhang, Yan, Wenbo, Hans, Mat

Web agents have shown great promise in performing many tasks on e-commerce websites. To assess their capabilities, several benchmarks have been introduced. However, current benchmarks in the e-commerce domain face two major problems. First, they primarily focus on product search tasks (e.g., Find an Apple Watch), failing to capture the broader range of functionalities offered by real-world e-commerce platforms such as Amazon, including account management and gift card operations. Second, existing benchmarks typically evaluate whether the agent completes the user query, but ignore the potential risks involved. In practice, web agents can make unintended changes that negatively impact the user account or status. For instance, an agent might purchase the wrong item, delete a saved address, or incorrectly configure an auto-reload setting. To address these gaps, we propose a new benchmark called Amazon-Bench. To generate user queries that cover a broad range of tasks, we propose a data generation pipeline that leverages webpage content and interactive elements (e.g., buttons, check boxes) to create diverse, functionality-grounded user queries covering tasks such as address management, wish list management, and brand store following. To improve the agent evaluation, we propose an automated evaluation framework that assesses both the performance and the safety of web agents. We systematically evaluate different agents, finding that current agents struggle with complex queries and pose safety risks. These results highlight the need for developing more robust and reliable web agents.

large language model, machine learning, natural language, (19 more...)

2508.15832

Country: North America > United States (0.93)

Genre:

Research Report (0.64)
Workflow (0.46)

Industry: Information Technology > Services > e-Commerce Services (1.00)

Technology:

Information Technology > e-Commerce (1.00)
Information Technology > Communications > Web (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
(3 more...)