 experimental run


Linear Mode Connectivity under Data Shifts for Deep Ensembles of Image Classifiers

Hepburn, C., Zielke, T., Raulf, A. P.

arXiv.org Artificial Intelligence

The phenomenon of linear mode connectivity (LMC) links several aspects of deep learning, including training stability under noisy stochastic gradients, the smoothness and generalization of local minima (basins), the similarity and functional diversity of sampled models, and architectural effects on data processing. In this work, we experimentally study LMC under data shifts and identify conditions that mitigate their impact. We interpret data shifts as an additional source of stochastic gradient noise, which can be reduced through small learning rates and large batch sizes. These parameters influence whether models converge to the same local minimum or to regions of the loss landscape with varying smoothness and generalization. Although models sampled via LMC tend to make similar errors more frequently than those converging to different basins, the benefit of LMC lies in balancing training efficiency against the gains achieved from larger, more diverse ensembles. Code and supplementary materials will be made publicly available at https://github.com/DLR-KI/LMC in due course. Mode connectivity refers to the phenomenon in which stochastic gradient descent (SGD) solutions, or modes, are connected via a path of low loss in a neural network's parameter space [1], [2]. Every solution along such a path therefore exhibits performance and generalization similar to those of the solutions between which the path is constructed. Moreover, such paths were shown to be embedded in a multi-dimensional manifold of low loss [3]. When a connecting path is linear, the phenomenon is referred to as linear mode connectivity (LMC) [4].
LMC has been investigated from different perspectives: (1) conditions affecting LMC [4], [5]; (2) connectivity of layers, features, or different types of solutions [6], [7], [8]; and (3) so-called "re-basin" approaches that "transport" a solution from one local minimum ... From a practical viewpoint, LMC is expected to improve ensemble methods, in particular in federated learning settings, as well as the robustness of fine-tuned models, distributed optimization, and model pruning [13], [9]. This work focuses on LMC from the perspective of data shifts [14], which are ever-present in real-world applications, in particular when training is performed on multiple training datasets separately and ensembles of models are employed.
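
As a rough illustration of what linear connectivity means in practice, the sketch below measures a loss barrier along the straight line between two parameter vectors. It is not the authors' code: the toy loss `double_well_loss`, the helper names, and the barrier convention (maximum path loss minus the linear interpolation of the endpoint losses, one common choice) are all assumptions made for this example.

```python
import numpy as np


def interpolate(theta_a, theta_b, alpha):
    """Point on the straight line between two parameter vectors."""
    return (1.0 - alpha) * theta_a + alpha * theta_b


def loss_barrier(loss_fn, theta_a, theta_b, num_points=11):
    """Largest excess of the path loss over the interpolated endpoint losses.

    A value close to zero suggests the two solutions are linearly connected
    under this (assumed) convention; a large value indicates a barrier.
    """
    alphas = np.linspace(0.0, 1.0, num_points)
    path_losses = np.array(
        [loss_fn(interpolate(theta_a, theta_b, a)) for a in alphas]
    )
    baseline = (1.0 - alphas) * path_losses[0] + alphas * path_losses[-1]
    return float(np.max(path_losses - baseline))


def double_well_loss(theta):
    """Hypothetical toy loss with minima at +1 and -1 in every coordinate."""
    return float(np.sum((theta ** 2 - 1.0) ** 2))


# Two solutions in different basins: the straight line crosses a ridge.
barrier_across_basins = loss_barrier(
    double_well_loss, np.array([-1.0, -1.0]), np.array([1.0, 1.0])
)

# Two nearby solutions in the same basin: no ridge on the straight line.
barrier_same_basin = loss_barrier(
    double_well_loss, np.array([0.9, 1.0]), np.array([1.1, 1.0])
)

print(barrier_across_basins, barrier_same_basin)
```

For real networks, `loss_fn` would evaluate the model on a held-out dataset with the interpolated weights loaded, which is where data shifts between the training sets of the two endpoints would enter.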


Your AI Bosses Are Still Prejudiced: The Emergence of Stereotypes in LLM-Based Multi-Agent Systems

Guo, Jingyu, Xu, Yingying

arXiv.org Artificial Intelligence

While stereotypes are well documented in human social interactions, AI systems are often presumed to be less susceptible to such biases. Previous studies have focused on biases inherited from training data, but whether stereotypes can emerge spontaneously in AI agent interactions merits further exploration. Through a novel experimental framework simulating workplace interactions with neutral initial conditions, we investigate the emergence and evolution of stereotypes in LLM-based multi-agent systems. Our findings reveal that (1) LLM-based AI agents develop stereotype-driven biases in their interactions despite beginning without predefined biases; (2) stereotype effects intensify with increased interaction rounds and decision-making power, particularly after introducing hierarchical structures; (3) these systems exhibit group effects analogous to human social behavior, including halo effects, confirmation bias, and role congruity; and (4) these stereotype patterns manifest consistently across different LLM architectures. Through comprehensive quantitative analysis, these findings suggest that stereotype formation in AI systems may arise as an emergent property of multi-agent interactions, rather than merely from training data biases. Our work underscores the need for future research to explore the underlying mechanisms of this phenomenon and develop strategies to mitigate its ethical impacts.


MANTRA: The Manifold Triangulations Assemblage

Ballester, Rubén, Röell, Ernst, Schmid, Daniel Bin, Alain, Mathieu, Escalera, Sergio, Casacuberta, Carles, Rieck, Bastian

arXiv.org Artificial Intelligence

The rising interest in leveraging higher-order interactions present in complex systems has led to a surge in more expressive models exploiting high-order structures in the data, especially in topological deep learning (TDL), which designs neural networks on high-order domains such as simplicial complexes. However, progress in this field is hindered by the scarcity of datasets for benchmarking these architectures. To address this gap, we introduce MANTRA, the first large-scale, diverse, and intrinsically high-order dataset for benchmarking high-order models, comprising over 43,000 and 249,000 triangulations of surfaces and three-dimensional manifolds, respectively. With MANTRA, we assess several graph- and simplicial complex-based models on three topological classification tasks. We demonstrate that while simplicial complex-based neural networks generally outperform their graph-based counterparts in capturing simple topological invariants, they also struggle, suggesting a rethink of TDL. Thus, MANTRA serves as a benchmark for assessing and advancing topological methods, leading the way for more effective high-order models. Success in machine learning is commonly measured by a model's ability to solve tasks on benchmark datasets. While researchers typically devote a large amount of time to building their models, less time is devoted to data and its curation. As a consequence, graph learning is facing some issues in terms of reproducibility and wrong assumptions, which serve as obstructions to progress. An example of this was recently observed while analyzing long-range features: additional hyperparameter tuning resolves performance differences between message-passing (MP) graph neural networks on one side and graph transformers on the other (Tönshoff et al., 2023). In a similar vein, earlier work pointed out the relevance of strong baselines, highlighting the fact that structural information is not exploited equally by all models (Errica et al., 2020).
Recently, new analyses even showed that for some benchmark datasets, as well as their associated tasks, graph information may be detrimental for the overall predictive performance (Bechler-Speicher et al., 2024).


The control architecture of a spherical robot for Minimally Invasive Surgery

Rus, Gabriela, Hajjar, Nadim Al, Tucan, Paul, Zima, Ionut, Vaida, Calin, Radu, Corina, Jucan, Daniel, Chablat, Damien, Pisla, Doina

arXiv.org Artificial Intelligence

Control systems used in Minimally Invasive Surgery (MIS) play a crucial role in ensuring precision and safety throughout procedures. This paper presents a control architecture developed for a robotic system designed for MIS operations. The modular structure of the control system allows for compatibility with a range of procedures in abdominal and thoracic regions. The proposed control system, employing the master-slave concept, is presented alongside the experimental model. Functional validation is obtained by performing a Siemens NX simulation and comparing the results with several experimental runs using the experimental model of the robot. With its compact size and stiffness, the system holds promise for integration with other robotic systems. Future efforts will be dedicated to exploring and optimizing this potential collaboration to enhance the overall capabilities of robotic-assisted surgery.


A Novel Bioinspired Neuromorphic Vision-based Tactile Sensor for Fast Tactile Perception

Faris, Omar, Awad, Mohammad I., Awad, Murana A., Zweiri, Yahya, Khalaf, Kinda

arXiv.org Artificial Intelligence

Tactile sensing represents a crucial technique that can enhance the performance of robotic manipulators in various tasks. This work presents a novel bioinspired neuromorphic vision-based tactile sensor that uses an event-based camera to quickly capture and convey information about the interactions between robotic manipulators and their environment. The camera in the sensor observes the deformation of a flexible skin manufactured from a cheap and accessible 3D-printed material, whereas a 3D-printed rigid casing houses the components of the sensor together. The sensor is tested in a grasping stage classification task involving several objects using a data-driven learning-based approach. The results show that the proposed approach enables the sensor to detect pressing and slip incidents within 2 ms. The fast tactile perception properties of the proposed sensor make it an ideal candidate for safe grasping of different objects in industries that involve high-speed pick-and-place operations.


Single file motion of robot swarms

Alonso-Llanes, Laciel, Garcimartín, Angel, Zuriguel, Iker

arXiv.org Artificial Intelligence

We present experimental results on the single file motion of a group of robots interacting with each other through position sensors. We successfully replicate the fundamental diagram typical of these systems, with a transition from free flow to congested traffic as the density of the system increases. In the latter scenario we also observe the characteristic stop-and-go waves. The unique advantages of this novel system, such as experimental stability and repeatability, allow for extended experimental runs, facilitating a comprehensive statistical analysis of the global dynamics. Above a certain density, we observe a divergence of the average jam duration and of the average number of robots involved in a jam. This discovery enables us to precisely identify another transition: from congested intermittent flow (for intermediate densities) to a totally congested scenario for high densities. Beyond this finding, the present work demonstrates the suitability of robot swarms to model complex behaviors in many-particle systems.


EEG and EMG dataset for the detection of errors introduced by an active orthosis device

Kueper, Niklas, Chari, Kartik, Bütefür, Judith, Habenicht, Julia, Kim, Su Kyoung, Rossol, Tobias, Tabie, Marc, Kirchner, Frank, Kirchner, Elsa Andrea

arXiv.org Artificial Intelligence

This paper presents a dataset containing recordings of the electroencephalogram (EEG) and the electromyogram (EMG) from eight subjects who were assisted in moving their right arm by an active orthosis device. The supported movements were elbow joint movements, i.e., flexion and extension of the right arm. While the orthosis was actively moving the subject's arm, some errors were deliberately introduced for a short duration of time. During this time, the orthosis moved in the opposite direction. In this paper, we explain the experimental setup and present some behavioral analyses across all subjects. Additionally, we present an average event-related potential analysis for one subject to offer insights into the data quality and the EEG activity caused by the error introduction. The dataset described herein is openly accessible. The aim of this study was to provide a dataset to the research community, particularly for the development of new methods in the asynchronous detection of erroneous events from the EEG. We are especially interested in the tactile and haptic-mediated recognition of errors, which has not yet been sufficiently investigated in the literature. We hope that the detailed description of the orthosis and the experiment will enable its reproduction and facilitate a systematic investigation of the influencing factors in the detection of erroneous behavior of assistive systems by a large community.


Unifying Distillation with Personalization in Federated Learning

Divi, Siddharth, Farrukh, Habiba, Celik, Berkay

arXiv.org Artificial Intelligence

Federated learning (FL) is a decentralized privacy-preserving learning technique in which clients learn a joint collaborative model through a central aggregator without sharing their data. In this setting, all clients learn a single common predictor (FedAvg), which does not generalize well on each client's local data due to the statistical data heterogeneity among clients. In this paper, we address this problem with PersFL, a discrete two-stage personalized learning algorithm. In the first stage, PersFL finds the optimal teacher model of each client during the FL training phase. In the second stage, PersFL distills the useful knowledge from optimal teachers into each user's local model. The teacher model provides each client with a rich, high-level representation that the client can easily adapt to its local model, which overcomes the statistical heterogeneity present across clients. We evaluate PersFL on the CIFAR-10 and MNIST datasets using three data-splitting strategies to control the diversity between clients' data distributions. We empirically show that PersFL outperforms FedAvg and three state-of-the-art personalization methods (pFedMe, Per-FedAvg, and FedPer) on the majority of data splits with minimal communication cost. Further, we study the performance of PersFL on different distillation objectives, how this performance is affected by the equitable notion of fairness among clients, and the number of required communication rounds. PersFL code is available at https://tinyurl.com/hdh5zhxs for public use and validation.
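
To make the second stage more concrete, the sketch below shows a generic temperature-scaled knowledge distillation loss between teacher and student logits. It is only an assumption about what a distillation objective can look like: the abstract does not specify PersFL's actual objectives, and the function names, the temperature value, and the KL-divergence form are chosen here purely for illustration.

```python
import numpy as np


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over the last axis with a temperature."""
    scaled = logits / temperature
    scaled = scaled - np.max(scaled, axis=-1, keepdims=True)
    exp = np.exp(scaled)
    return exp / np.sum(exp, axis=-1, keepdims=True)


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Mean KL(teacher || student) on temperature-softened distributions.

    The temperature**2 factor is the usual rescaling that keeps gradient
    magnitudes comparable across temperatures. This is a generic objective,
    not necessarily the one used by PersFL.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = np.sum(p_teacher * (np.log(p_teacher) - np.log(p_student)), axis=-1)
    return float(np.mean(kl) * temperature ** 2)


# Hypothetical logits for a batch of two samples and three classes.
teacher = np.array([[2.0, 0.5, -1.0], [0.1, 1.5, 0.3]])
student = np.array([[1.0, 1.0, 0.0], [0.0, 0.5, 0.5]])

loss_mismatch = distillation_loss(student, teacher)
loss_match = distillation_loss(teacher, teacher)
print(loss_mismatch, loss_match)
```

In a personalized setting, each client would minimize such a term (typically combined with a supervised loss on its local labels) using its selected teacher's outputs on local data.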