Federation of Agents: A Semantics-Aware Communication Fabric for Large-Scale Agentic AI
Giusti, Lorenzo, Werner, Ole Anton, Taiello, Riccardo, Costa, Matilde Carvalho, Tosun, Emre, Protani, Andrea, Molina, Marc, de Almeida, Rodrigo Lopes, Cacace, Paolo, Santos, Diogo Reis, Serio, Luigi
We present Federation of Agents (FoA), a distributed orchestration framework that transforms static multi-agent coordination into dynamic, capability-driven collaboration. FoA introduces Versioned Capability Vectors (VCVs): machine-readable profiles that make agent capabilities searchable through semantic embeddings, enabling agents to advertise their capabilities, cost, and limitations. Our architecture combines three key innovations: (1) semantic routing that matches tasks to agents over sharded HNSW indices while enforcing operational constraints through cost-biased optimization, (2) dynamic task decomposition where compatible agents collaboratively break down complex tasks into DAGs of subtasks through consensus-based merging, and (3) smart clustering that groups agents working on similar subtasks into collaborative channels for k-round refinement before synthesis. Built on top of MQTT's publish-subscribe semantics for scalable message passing, FoA achieves sub-linear complexity through hierarchical capability matching and efficient index maintenance. Evaluation on HealthBench shows 13x improvements over single-model baselines, with clustering-enhanced collaboration particularly effective for complex reasoning tasks requiring multiple perspectives. The system scales horizontally while maintaining consistent performance, demonstrating that semantic orchestration with structured collaboration can unlock the collective intelligence of heterogeneous federations of AI agents.
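The idea of cost-biased semantic routing in the FoA abstract above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the scoring rule (cosine similarity minus a weighted cost), the `cost_weight` parameter, the agent dictionary layout, and the brute-force scan that stands in for the sharded HNSW indices. It is not the authors' implementation.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def route(task_vec, agents, cost_weight=0.1, top_k=1):
    """Rank agents by similarity to the task, penalised by advertised cost.

    Hypothetical scoring rule: score = cosine(task, agent) - cost_weight * cost.
    A linear scan is used here in place of an approximate nearest-neighbour index.
    """
    scored = [
        (cosine(task_vec, agent["embedding"]) - cost_weight * agent["cost"], agent["name"])
        for agent in agents
    ]
    scored.sort(reverse=True)
    return [name for _, name in scored[:top_k]]

# Toy capability profiles (hypothetical values).
agents = [
    {"name": "A", "embedding": [1.0, 0.0], "cost": 5.0},
    {"name": "B", "embedding": [0.9, 0.1], "cost": 1.0},
    {"name": "C", "embedding": [0.0, 1.0], "cost": 0.0},
]
task = [1.0, 0.0]

best_ignoring_cost = route(task, agents, cost_weight=0.0)
best_with_cost = route(task, agents, cost_weight=0.1)
```

With the cost term switched off the closest agent wins; with a non-zero `cost_weight` a slightly less similar but much cheaper agent is preferred.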
Probing Gender Bias in Multilingual LLMs: A Case Study of Stereotypes in Persian
Kalhor, Ghazal, Bahrak, Behnam
Multilingual Large Language Models (LLMs) are increasingly used worldwide, making it essential to ensure they are free from gender bias to prevent representational harm. While prior studies have examined such biases in high-resource languages, low-resource languages remain understudied. In this paper, we propose a template-based probing methodology, validated against real-world data, to uncover gender stereotypes in LLMs. As part of this framework, we introduce the Domain-Specific Gender Skew Index (DS-GSI), a metric that quantifies deviations from gender parity. We evaluate four prominent models, GPT-4o mini, DeepSeek R1, Gemini 2.0 Flash, and Qwen QwQ 32B, across four semantic domains, focusing on Persian, a low-resource language with distinct linguistic features. Our results show that all models exhibit gender stereotypes, with greater disparities in Persian than in English across all domains. Among these, sports reflect the most rigid gender biases. This study underscores the need for inclusive NLP practices and provides a framework for assessing bias in other low-resource languages.
EchoBench: Benchmarking Sycophancy in Medical Large Vision-Language Models
Yuan, Botai, Zhou, Yutian, Wang, Yingjie, Huo, Fushuo, Jing, Yongcheng, Shen, Li, Wei, Ying, Shen, Zhiqi, Liu, Ziwei, Zhang, Tianwei, Yang, Jie, Tao, Dacheng
Recent benchmarks for medical Large Vision-Language Models (LVLMs) emphasize leaderboard accuracy, overlooking reliability and safety. We study sycophancy -- models' tendency to uncritically echo user-provided information -- in high-stakes clinical settings. We introduce EchoBench, a benchmark to systematically evaluate sycophancy in medical LVLMs. It contains 2,122 images across 18 departments and 20 modalities with 90 prompts that simulate biased inputs from patients, medical students, and physicians. We evaluate medical-specific, open-source, and proprietary LVLMs. All exhibit substantial sycophancy; the best proprietary model (Claude 3.7 Sonnet) still shows 45.98% sycophancy, and GPT-4.1 reaches 59.15%. Many medical-specific models exceed 95% sycophancy despite only moderate accuracy. Fine-grained analyses by bias type, department, perceptual granularity, and modality identify factors that increase susceptibility. We further show that higher data quality/diversity and stronger domain knowledge reduce sycophancy without harming unbiased accuracy. EchoBench also serves as a testbed for mitigation: simple prompt-level interventions (negative prompting, one-shot, few-shot) produce consistent reductions and motivate training- and decoding-time strategies. Our findings highlight the need for robust evaluation beyond accuracy and provide actionable guidance toward safer, more trustworthy medical LVLMs.
Formal Verification of Minimax Algorithms
Wesselink, Wieger, Huizing, Kees, van de Wetering, Huub
Using the Dafny verification system, we formally verify a range of minimax search algorithms, including variations with alpha-beta pruning and transposition tables. For depth-limited search with transposition tables, we introduce a witness-based correctness criterion and apply it to two representative algorithms. All verification artifacts, including proofs and Python implementations, are publicly available.
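For readers unfamiliar with the algorithms named in the abstract above, here is a minimal Python sketch of plain minimax and its alpha-beta variant on a small explicit game tree. The tree encoding (nested lists with integer leaves) and the function names are assumptions chosen for illustration; this is not the authors' verified code, and it omits depth limits and transposition tables.

```python
def minimax(node, maximizing):
    """Exhaustive minimax: leaves are ints, inner nodes are lists of children."""
    if isinstance(node, int):
        return node
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)

def alphabeta(node, maximizing, alpha=float("-inf"), beta=float("inf")):
    """Minimax with alpha-beta pruning; returns the same root value as minimax."""
    if isinstance(node, int):
        return node
    if maximizing:
        best = float("-inf")
        for child in node:
            best = max(best, alphabeta(child, False, alpha, beta))
            alpha = max(alpha, best)
            if alpha >= beta:  # the minimizing ancestor will never allow this branch
                break
        return best
    best = float("inf")
    for child in node:
        best = min(best, alphabeta(child, True, alpha, beta))
        beta = min(beta, best)
        if alpha >= beta:  # the maximizing ancestor will never allow this branch
            break
    return best

# Root is a max node with three min-node children.
example_tree = [[3, 5], [2, 9], [0, 7]]
```

The property one would want a verifier to establish is that `alphabeta` agrees with `minimax` at the root for every tree, even though it may skip subtrees.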
Less is More: The Effectiveness of Compact Typological Language Representations
Ng, York Hay, Hoang, Phuong Hanh, Lee, En-Shiun Annie
Linguistic feature datasets such as URIEL+ are valuable for modelling cross-lingual relationships, but their high dimensionality and sparsity, especially for low-resource languages, limit the effectiveness of distance metrics. We propose a pipeline to optimize the URIEL+ typological feature space by combining feature selection and imputation, producing compact yet interpretable typological representations. We evaluate these feature subsets on linguistic distance alignment and downstream tasks, demonstrating that reduced-size representations of language typology can yield more informative distance metrics and improve performance in multilingual NLP applications.
KSDiff: Keyframe-Augmented Speech-Aware Dual-Path Diffusion for Facial Animation
Lyu, Tianle, Zhao, Junchuan, Wang, Ye
Audio-driven facial animation has made significant progress in multimedia applications, with diffusion models showing strong potential for talking-face synthesis. However, most existing works treat speech features as a monolithic representation and fail to capture their fine-grained roles in driving different facial motions, while also overlooking the importance of modeling keyframes with intense dynamics. To address these limitations, we propose KSDiff, a Keyframe-Augmented Speech-Aware Dual-Path Diffusion framework. Specifically, the raw audio and transcript are processed by a Dual-Path Speech Encoder (DPSE) to disentangle expression-related and head-pose-related features, while an autoregressive Keyframe Establishment Learning (KEL) module predicts the most salient motion frames. These components are integrated into a Dual-path Motion generator to synthesize coherent and realistic facial motions. Extensive experiments on HDTF and VoxCeleb demonstrate that KSDiff achieves state-of-the-art performance, with improvements in both lip synchronization accuracy and head-pose naturalness. Our results highlight the effectiveness of combining speech disentanglement with keyframe-aware diffusion for talking-head generation.
Probability Signature: Bridging Data Semantics and Embedding Structure in Language Models
In recent years, deep neural network-based large language models (LLMs) have demonstrated remarkable performance (Comanici et al., 2025; OpenAI et al., 2024; DeepSeek-AI et al., 2025). The development of these models has largely followed what Richard Sutton termed "the bitter lesson": that the most effective approach to improving AI performance has historically been to leverage greater computational resources, larger models, and more data, rather than incorporating human knowledge or specialized architectures (Sutton, 2019). This trend has been formalized through scaling laws, which quantify the relationship between model performance and factors such as model size, dataset size, and computational budget through power law relationships (Kaplan et al., 2020). While these scaling laws provide valuable quantitative predictions for model performance, they also reveal a concerning limitation: the power law relationship suggests that achieving further significant improvements may require prohibitively large increases in model and data size, making continued scaling increasingly impractical and resource-intensive. One promising approach to address these limitations is to develop a deeper understanding of the underlying mechanisms that drive transformer models' success in natural language processing (NLP). The No Free Lunch theorem establishes that no single algorithm can perform optimally across all problem domains, highlighting the fundamental importance of understanding both the characteristics of the data and the properties of the algorithms that process it (Wolpert & Macready, 1997).
Discrete Diffusion for Reflective Vision-Language-Action Models in Autonomous Driving
Li, Pengxiang, Zheng, Yinan, Wang, Yue, Wang, Huimin, Zhao, Hang, Liu, Jingjing, Zhan, Xianyuan, Zhan, Kun, Lang, Xianpeng
End-to-End (E2E) solutions have emerged as a mainstream approach for autonomous driving systems, with Vision-Language-Action (VLA) models representing a new paradigm that leverages pre-trained multimodal knowledge from Vision-Language Models (VLMs) to interpret and interact with complex real-world environments. However, these methods remain constrained by the limitations of imitation learning, which struggles to inherently encode physical rules during training. Existing approaches often rely on complex rule-based post-refinement, employ reinforcement learning that remains largely limited to simulation, or utilize diffusion guidance that requires computationally expensive gradient calculations. To address these challenges, we introduce ReflectDrive, a novel learning-based framework that integrates a reflection mechanism for safe trajectory generation via discrete diffusion. We first discretize the two-dimensional driving space to construct an action codebook, enabling the use of pre-trained Diffusion Language Models for planning tasks through fine-tuning. Central to our approach is a safety-aware reflection mechanism that performs iterative self-correction without gradient computation. Our method begins with goal-conditioned trajectory generation to model multi-modal driving behaviors. Based on this, we apply local search methods to identify unsafe tokens and determine feasible solutions, which then serve as safe anchors for inpainting-based regeneration. Evaluated on the NAVSIM benchmark, ReflectDrive demonstrates significant advantages in safety-critical trajectory generation, offering a scalable and reliable solution for autonomous driving systems.
From Pheromones to Policies: Reinforcement Learning for Engineered Biological Swarms
Vellinger, Aymeric, Antonic, Nemanja, Tuci, Elio
Swarm intelligence emerges from decentralised interactions among simple agents, enabling collective problem-solving. This study establishes a theoretical equivalence between pheromone-mediated aggregation in C. elegans and reinforcement learning (RL), demonstrating how stigmergic signals function as distributed reward mechanisms. We model engineered nematode swarms performing foraging tasks, showing that pheromone dynamics mathematically mirror cross-learning updates, a fundamental RL algorithm. Experimental validation with data from literature confirms that our model accurately replicates empirical C. elegans foraging patterns under static conditions. In dynamic environments, persistent pheromone trails create positive feedback loops that hinder adaptation by locking swarms into obsolete choices. Through computational experiments in multi-armed bandit scenarios, we reveal that introducing a minority of exploratory agents insensitive to pheromones restores collective plasticity, enabling rapid task switching. This behavioural heterogeneity balances exploration-exploitation trade-offs, implementing swarm-level extinction of outdated strategies. Our results demonstrate that stigmergic systems inherently encode distributed RL processes, where environmental signals act as external memory for collective credit assignment. By bridging synthetic biology with swarm robotics, this work advances programmable living systems capable of resilient decision-making in volatile environments.
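The cross-learning update mentioned in the abstract above is simple enough to sketch directly. In its standard form, after choosing action a and receiving a reward r in [0, 1], the probability of a moves toward 1 by r(1 - p_a) and every other probability shrinks by a factor (1 - r). The bandit harness below, including the `scale` parameter and the reward probabilities, is a hypothetical setup for illustration; it does not model pheromone dynamics or reproduce the paper's experiments.

```python
import random

def cross_learning_update(probs, action, reward):
    """One cross-learning step; reward must lie in [0, 1].

    The chosen action gains reward * (1 - p); all others lose reward * p,
    so the probabilities still sum to 1.
    """
    return [
        p + reward * (1.0 - p) if i == action else p - reward * p
        for i, p in enumerate(probs)
    ]

def run_bandit(reward_probs, steps, scale=0.1, seed=0):
    """Run cross learning on a Bernoulli multi-armed bandit (toy harness).

    `scale` (an assumed parameter) is the reward paid on success, which
    acts like a learning rate.
    """
    rng = random.Random(seed)
    n_arms = len(reward_probs)
    probs = [1.0 / n_arms] * n_arms
    for _ in range(steps):
        action = rng.choices(range(n_arms), weights=probs)[0]
        reward = scale if rng.random() < reward_probs[action] else 0.0
        probs = cross_learning_update(probs, action, reward)
    return probs

# Arm 0 always pays, arm 1 never does: the policy should concentrate on arm 0.
final_probs = run_bandit([1.0, 0.0], steps=200)
```

Because rewarded choices only ever reinforce themselves, a policy trained this way can stay committed to an arm after its payoff disappears, which is the lock-in effect the abstract describes for persistent pheromone trails.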
Hybrid Safety Verification of Multi-Agent Systems using $ψ$-Weighted CBFs and PAC Guarantees
Margapuri, Venkat, Kazanjian, Garik, Kosaraju, Naren
This study proposes a hybrid safety verification framework for closed-loop multi-agent systems under bounded stochastic disturbances. The proposed approach augments control barrier functions with a novel ψ-weighted formulation that encodes directional control alignment between agents into the safety constraints. Deterministic admissibility is combined with empirical validation via Monte Carlo rollouts, and a PAC-style guarantee is derived based on margin-aware safety violations to provide a probabilistic safety certificate. The results from the experiments conducted under different bounded stochastic disturbances validate the feasibility of the proposed approach. Safety within multi-agent systems is essential for real-world applications such as autonomous driving [1], [2] and robotic swarm deployments in agriculture [3], [4], manufacturing [5], [6], and search and rescue operations [7], where agents must navigate safely through their environment.
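To give a sense of how Monte Carlo rollouts can yield a PAC-style certificate, the sketch below applies the generic one-sided Hoeffding bound: with N independent rollouts and empirical violation rate p_hat, the true violation probability is at most p_hat + sqrt(ln(1/delta) / (2N)) with probability at least 1 - delta. This is a textbook bound used as a stand-in; the paper's margin-aware guarantee is not specified in the abstract and may differ. The function names and the list-of-booleans input are assumptions.

```python
import math

def hoeffding_upper_bound(rollout_violated, delta):
    """Upper confidence bound on the true violation probability.

    rollout_violated: list of booleans, one per independent rollout
                      (True means a safety violation was observed).
    delta: allowed failure probability of the certificate.
    """
    n = len(rollout_violated)
    p_hat = sum(rollout_violated) / n
    slack = math.sqrt(math.log(1.0 / delta) / (2.0 * n))
    return min(1.0, p_hat + slack)

def rollouts_needed(epsilon, delta):
    """Smallest N for which the Hoeffding slack is at most epsilon."""
    return math.ceil(math.log(1.0 / delta) / (2.0 * epsilon ** 2))

# 1000 rollouts with no observed violations, 95% confidence.
bound = hoeffding_upper_bound([False] * 1000, delta=0.05)
n_required = rollouts_needed(epsilon=0.05, delta=0.05)
```

Even with zero observed violations the bound stays strictly positive, which is why such certificates are probabilistic and why tightening epsilon requires quadratically more rollouts.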