Bavaria
Robot Talk Episode 116 – Evolved behaviour for robot teams, with Tanja Kaiser
Claire chatted to Tanja Katharina Kaiser from the University of Technology Nuremberg about how applying evolutionary principles can help robot teams make better decisions. Tanja Katharina Kaiser is a senior researcher heading the Multi-Robot Systems Satellite Lab at the University of Technology Nuremberg (UTN) in Germany. She and her team focus on the development of adaptive multi-robot systems to solve complex real-world tasks using artificial intelligence. Tanja received her doctorate in robotics from the University of Lübeck in Germany in 2022. Before joining UTN, she held postdoctoral research positions at the Technical University of Dresden and the University of Konstanz.
'Dibling is the antidote to robotic, structured & predictable football'
"In a world and industry which is becoming more commercialised, over-sanitised, robotic, structured and predictable, Tyler's greatest strength is the opposite to all of that."

That's quite the sell for Southampton's 19-year-old midfield star Tyler Dibling, especially given his basic Premier League career numbers amount to 25 appearances, 1,540 minutes played, two goals and zero assists. But that gushing description from one senior source at the club, speaking to BBC Sport anonymously, hints at an emerging talent interesting a host of top clubs, and why there are some unsubstantiated reports of a £100m price tag on his head.

With the Saints facing an immediate relegation back to the Championship, Dibling's future is likely to be one of the summer's more interesting sagas, with Manchester United, Arsenal, Tottenham and Bayern Munich all reportedly chasing his signature. Another source close to the club suggested Southampton turned down previously unreported bids of £35m from Tottenham and £30m from RB Leipzig in January, with the club valuing Dibling at £55m at the start of the winter window.

Southampton have not commented on those rumours, but what is known is that Dibling is one of the lowest-paid players in Southampton's squad and has a deal that expires in 2027, after Southampton triggered a 12-month extension option. He signed his last contract in December 2023, when he had played just five minutes of senior football. The England Under-21 international has so far resisted the club's offers of a new deal in what has been a breakthrough season for him, despite a wretched campaign which could still see Southampton relegated with the Premier League's lowest-ever points total. His dribbles completed per game (2.34) and fouls won per game (2.57) place him in the top 10.

"He's the most fearless player I've ever worked with," former Saints Under-21 head coach Adam Asghar tells BBC Sport. "He's totally unique to anything I've seen before."
PROSPECT PTMs: Rich Labeled Tandem Mass Spectrometry Dataset of Modified Peptides for Machine Learning in Proteomics
Wassim Gabriel, Omar Shouman, Ayla Schroeder
Post-Translational Modifications (PTMs) are changes that occur in proteins after synthesis, influencing their structure, function, and cellular behavior. PTMs are essential in cell biology; they regulate protein function and stability, are involved in various cellular processes, and are linked to numerous diseases. A particularly interesting class of PTMs are chemical modifications, such as phosphorylation, introduced on amino acid side chains, because once present they can drastically alter the physicochemical properties of the peptides. One or more PTMs can be attached to each amino acid of the peptide sequence. The most commonly applied technique to detect PTMs on proteins is bottom-up Mass Spectrometry-based proteomics (MS), where proteins are digested into peptides and subsequently analyzed using Tandem Mass Spectrometry (MS/MS).
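Since one or more PTMs can sit on each residue, a machine-learning-ready representation of a modified peptide needs per-residue modification annotations. Below is a minimal sketch of such a data structure; the class name, the ProForma-like rendering, and the two example mass shifts are illustrative assumptions, not the PROSPECT PTMs schema.

```python
from dataclasses import dataclass, field

# Approximate monoisotopic mass shifts (Da) for two common modifications.
MOD_MASS = {"Phospho": 79.96633, "Oxidation": 15.99491}

@dataclass
class ModifiedPeptide:
    """A peptide sequence with per-residue PTM annotations (0-based positions)."""
    sequence: str
    mods: dict[int, list[str]] = field(default_factory=dict)

    def annotated(self) -> str:
        """Render the peptide in a ProForma-like 'AA[mod]' notation."""
        out = []
        for i, aa in enumerate(self.sequence):
            out.append(aa)
            for mod in self.mods.get(i, []):
                out.append(f"[{mod}]")
        return "".join(out)

    def total_mod_mass(self) -> float:
        """Sum of the mass shifts contributed by all annotated modifications."""
        return sum(MOD_MASS[m] for mods in self.mods.values() for m in mods)

# Example: a phosphorylated serine and an oxidised methionine on one peptide.
pep = ModifiedPeptide("AGSMK", mods={2: ["Phospho"], 3: ["Oxidation"]})
print(pep.annotated())        # AGS[Phospho]M[Oxidation]K
print(pep.total_mod_mass())   # ~95.96
```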
Evaluating alignment between humans and neural network representations in image-based learning tasks
Humans represent scenes and objects in rich feature spaces, carrying information that allows us to generalise about category memberships and abstract functions with few examples. What determines whether a neural network model generalises like a human? We tested how well the representations of 86 pretrained neural network models mapped to human learning trajectories across two tasks where humans had to learn continuous relationships and categories of natural images. In these tasks, both human participants and neural networks successfully identified the relevant stimulus features within a few trials, demonstrating effective generalisation. We found that while training dataset size was a core determinant of alignment with human choices, contrastive training with multi-modal data (text and imagery) was a common feature of currently publicly available models that predicted human generalisation. Intrinsic dimensionality of representations had different effects on alignment for different model types. Lastly, we tested three sets of human-aligned representations and found no consistent improvements in predictive accuracy compared to the baselines. In conclusion, pretrained neural networks can serve to extract representations for cognitive models, as they appear to capture some fundamental aspects of cognition that are transferable across tasks. Both our paradigms and modelling approach offer a novel way to quantify alignment between neural networks and humans and extend cognitive science into more naturalistic domains.
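The pipeline behind comparisons like this is conceptually simple: freeze a pretrained network, extract its representation of each stimulus image, and fit a lightweight readout that predicts human responses. A minimal sketch follows, using a torchvision ResNet-50 as a stand-in for the evaluated models; the specific backbone and the ridge readout are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np
import torch
from torchvision.models import resnet50, ResNet50_Weights
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import cross_val_score

# Frozen pretrained backbone used purely as a feature extractor.
weights = ResNet50_Weights.DEFAULT
backbone = resnet50(weights=weights)
backbone.fc = torch.nn.Identity()          # drop the classification head
backbone.eval()
preprocess = weights.transforms()

@torch.no_grad()
def extract_features(images):
    """images: list of PIL images -> (N, 2048) feature matrix."""
    batch = torch.stack([preprocess(img) for img in images])
    return backbone(batch).numpy()

def alignment_score(images, human_responses):
    """Fit a linear readout from frozen network features to human responses
    and report its cross-validated fit as a simple alignment measure."""
    X = extract_features(images)
    readout = RidgeCV(alphas=np.logspace(-3, 3, 13))
    return cross_val_score(readout, X, np.asarray(human_responses), cv=5).mean()
```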
Targeted Sequential Indirect Experiment Design
Niclas Dern, Jason Hartford
Technical University of Munich; Helmholtz Munich; Munich Center for Machine Learning (MCML)
Scientific hypotheses typically concern specific aspects of complex, imperfectly understood or entirely unknown mechanisms, such as the effect of gene expression levels on phenotypes or how microbial communities influence environmental health. Such queries are inherently causal (rather than purely associational), but in many settings experiments cannot be conducted directly on the target variables of interest and are instead indirect: they perturb the target variable but do not remove potential confounding factors. If, additionally, the resulting experimental measurements are multi-dimensional and the studied mechanisms nonlinear, the query of interest is generally not identified. We develop an adaptive strategy to design indirect experiments that optimally inform a targeted query about the ground-truth mechanism by sequentially narrowing the gap between an upper and a lower bound on the query. While the general formulation consists of a bi-level optimization procedure, we derive an efficiently estimable, analytical, kernel-based estimator of the bounds for the causal effect, a query of key interest, and demonstrate the efficacy of our approach in confounded, multivariate, nonlinear synthetic settings.
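At a high level, the adaptive loop alternates between estimating bounds on the target query from the data collected so far and choosing the next indirect perturbation that is expected to shrink the bound gap the most. The toy sketch below only illustrates that loop structure: the simulated confounded system, the crude bound bracket, and the brute-force candidate evaluation are all stand-in assumptions, not the paper's kernel-based estimator or acquisition rule.

```python
import numpy as np

rng = np.random.default_rng(0)

def run_indirect_experiment(z, n=200):
    """Toy confounded system (illustrative assumption): perturbation z shifts
    treatment T, a hidden confounder U affects both T and outcome Y, and we
    only observe (T, Y)."""
    u = rng.normal(size=n)                       # unobserved confounder
    t = 0.8 * z + u + rng.normal(scale=0.3, size=n)
    y = 1.5 * t + 2.0 * u + rng.normal(scale=0.3, size=n)
    return t, y

def bounds_from_data(datasets):
    """Crude stand-in for bound estimation: pool the data and bracket the
    effect of T on Y around the naive (confounded) regression slope."""
    t = np.concatenate([d[0] for d in datasets])
    y = np.concatenate([d[1] for d in datasets])
    slope = np.cov(t, y)[0, 1] / np.var(t)
    spread = 2.0 * np.std(y - slope * t) / np.std(t)
    return slope - spread, slope + spread        # (lower, upper)

# Greedy sequential design over a grid of candidate perturbation strengths.
# For illustration each candidate is cheaply simulated; in practice the
# expected gap reduction would be predicted from a model, not run directly.
candidates = np.linspace(-2.0, 2.0, 9)
datasets = []
for step in range(5):
    gaps = []
    for z in candidates:
        lo, hi = bounds_from_data(datasets + [run_indirect_experiment(z)])
        gaps.append(hi - lo)
    best = candidates[int(np.argmin(gaps))]
    datasets.append(run_indirect_experiment(best))
    lo, hi = bounds_from_data(datasets)
    print(f"step {step}: chose z={best:+.2f}, bounds=({lo:.2f}, {hi:.2f})")
```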
Quantifying Aleatoric Uncertainty of the Treatment Effect: A Novel Orthogonal Learner
LMU Munich & Munich Center for Machine Learning (MCML), Germany
Estimating causal quantities from observational data is crucial for understanding the safety and effectiveness of medical treatments. However, to make reliable inferences, medical practitioners require not only estimating averaged causal quantities, such as the conditional average treatment effect, but also understanding the randomness of the treatment effect as a random variable. This randomness is referred to as aleatoric uncertainty and is necessary for understanding the probability of benefit from treatment or quantiles of the treatment effect. Yet, the aleatoric uncertainty of the treatment effect has received surprisingly little attention in the causal machine learning community. To fill this gap, we aim to quantify the aleatoric uncertainty of the treatment effect at the covariate-conditional level, namely, the conditional distribution of the treatment effect (CDTE).
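In standard potential-outcomes notation, the quantities in question can be written as follows; this is a conventional formalisation of the terms used in the abstract, and the paper's exact estimand definitions may differ in detail.

```latex
% Individual treatment effect and its covariate-conditional distribution (CDTE)
\Delta = Y(1) - Y(0), \qquad
F_{\Delta \mid X}(\delta \mid x) = \mathbb{P}\big(Y(1) - Y(0) \le \delta \mid X = x\big)

% Quantities derived from the CDTE rather than from the CATE alone
\text{probability of benefit: } \mathbb{P}\big(Y(1) > Y(0) \mid X = x\big) = 1 - F_{\Delta \mid X}(0 \mid x),
\qquad
\text{quantiles: } q_{\alpha}(x) = F_{\Delta \mid X}^{-1}(\alpha \mid x)
```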
GatedxLSTM: A Multimodal Affective Computing Approach for Emotion Recognition in Conversations
Li, Yupei, Sun, Qiyang, Murthy, Sunil Munthumoduku Krishna, Alturki, Emran, Schuller, Björn W.
Abstract — Affective Computing (AC) is essential for advancing Artificial General Intelligence (AGI), with emotion recognition serving as a key component. However, human emotions are inherently dynamic, influenced not only by an individual's expressions but also by interactions with others, and single-modality approaches often fail to capture their full dynamics. Multimodal Emotion Recognition (MER) leverages multiple signals but traditionally relies on utterance-level analysis, overlooking the dynamic nature of emotions in conversations. Emotion Recognition in Conversation (ERC) addresses this limitation, yet existing methods struggle to align multimodal features and to explain why emotions evolve within dialogues. To bridge this gap, we propose GatedxLSTM, a novel speech-text multimodal ERC model that explicitly considers the voice and transcripts of both the speaker and their conversational partner(s) to identify the most influential sentences driving emotional shifts. By integrating Contrastive Language-Audio Pretraining (CLAP) for improved cross-modal alignment and employing a gating mechanism to emphasise emotionally impactful utterances, GatedxLSTM enhances both interpretability and performance. Experiments on the IEMOCAP dataset demonstrate that GatedxLSTM achieves state-of-the-art (SOTA) performance among open-source methods in four-class emotion classification. These results validate its effectiveness for ERC applications and provide an interpretability analysis from a psychological perspective.
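The core architectural idea, a learned gate that weights each utterance's fused speech-text embedding before a recurrent pass over the conversation, can be sketched as follows. This is a simplified illustration: a plain LSTM stands in for the xLSTM blocks, random tensors stand in for CLAP-style 512-d features, and all module names and sizes are assumptions rather than the authors' implementation.

```python
import torch
import torch.nn as nn

class GatedUtteranceEncoder(nn.Module):
    """Sketch of the gating idea: score each utterance's fused audio-text
    embedding, down-weight uninfluential utterances, then model the dialogue
    sequentially with a recurrent stack."""

    def __init__(self, embed_dim=512, hidden_dim=256, num_classes=4):
        super().__init__()
        self.fuse = nn.Linear(2 * embed_dim, embed_dim)   # concat(audio, text) -> fused
        self.gate = nn.Sequential(nn.Linear(embed_dim, 1), nn.Sigmoid())
        self.rnn = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.classifier = nn.Linear(hidden_dim, num_classes)

    def forward(self, audio_emb, text_emb):
        # audio_emb, text_emb: (batch, n_utterances, embed_dim) CLAP-style features
        fused = torch.tanh(self.fuse(torch.cat([audio_emb, text_emb], dim=-1)))
        weights = self.gate(fused)                 # (batch, n_utt, 1) per-utterance importance
        hidden, _ = self.rnn(fused * weights)      # gated sequence over the conversation
        logits = self.classifier(hidden)           # per-utterance 4-class emotion logits
        return logits, weights.squeeze(-1)         # gate values expose influential utterances

# Example with random features standing in for a 6-utterance dialogue, batch of 2.
model = GatedUtteranceEncoder()
logits, gates = model(torch.randn(2, 6, 512), torch.randn(2, 6, 512))
```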
Domain-incremental White Blood Cell Classification with Privacy-aware Continual Learning
Kumari, Pratibha, Bozorgpour, Afshin, Reisenbüchler, Daniel, Jost, Edgar, Crysandt, Martina, Matek, Christian, Merhof, Dorit
White blood cell (WBC) classification plays a vital role in hematology for diagnosing various medical conditions. However, it faces significant challenges due to domain shifts caused by variations in sample sources (e.g., blood or bone marrow) and differing imaging conditions across hospitals. Traditional deep learning models often suffer from catastrophic forgetting in such dynamic environments, while foundation models, though generally robust, experience performance degradation when the distribution of inference data differs from that of the training data. To address these challenges, we propose a generative replay-based Continual Learning (CL) strategy designed to prevent forgetting in foundation models for WBC classification. Our method employs lightweight generators to mimic past data with a synthetic latent representation to enable privacy-preserving replay. To showcase the effectiveness, we carry out extensive experiments on four datasets with different task orderings and four backbone models, including ResNet50, RetCCL, CTransPath, and UNI. Experimental results demonstrate that conventional fine-tuning methods degrade performance on previously learned tasks and struggle with domain shifts. In contrast, our continual learning strategy effectively mitigates catastrophic forgetting, preserving model performance across varying domains. This work presents a practical solution for maintaining reliable WBC classification in real-world clinical settings, where data distributions frequently evolve.
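The privacy-preserving part of the recipe is that replay happens in the frozen backbone's latent space: per task, a lightweight generator is fitted to the extracted feature vectors, and later tasks are trained on a mixture of current features and synthetic features sampled from the stored generators. A condensed sketch is given below; the per-class Gaussian generator, the logistic-regression head, and all names are illustrative assumptions, not the paper's exact components.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

class GaussianReplayGenerator:
    """Lightweight per-class Gaussian model of latent features for one task."""
    def fit(self, feats, labels):
        self.stats = {c: (feats[labels == c].mean(0), feats[labels == c].std(0) + 1e-6)
                      for c in np.unique(labels)}
        return self

    def sample(self, n_per_class):
        xs, ys = [], []
        for c, (mu, sd) in self.stats.items():
            xs.append(np.random.normal(mu, sd, size=(n_per_class, mu.shape[0])))
            ys.append(np.full(n_per_class, c))
        return np.vstack(xs), np.concatenate(ys)

def continual_train(task_stream):
    """task_stream yields (features, labels) already extracted by a frozen
    backbone (e.g. UNI); only synthetic latents are replayed across tasks."""
    generators, classifier = [], None
    for feats, labels in task_stream:
        replay = [g.sample(n_per_class=64) for g in generators]
        X = np.vstack([feats] + [x for x, _ in replay])
        y = np.concatenate([labels] + [y for _, y in replay])
        classifier = LogisticRegression(max_iter=2000).fit(X, y)  # stand-in head
        generators.append(GaussianReplayGenerator().fit(feats, labels))
    return classifier, generators
```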
HiRes-FusedMIM: A High-Resolution RGB-DSM Pre-trained Model for Building-Level Remote Sensing Applications
Mutreja, Guneet, Schuegraf, Philipp, Bittner, Ksenia
Recent advances in self-supervised learning have led to the development of foundation models that have significantly advanced performance in various computer vision tasks. However, despite their potential, these models often overlook the crucial role of high-resolution digital surface models (DSMs) in understanding urban environments, particularly for building-level analysis, which is essential for applications like digital twins. To address this gap, we introduce HiRes-FusedMIM, a novel pre-trained model specifically designed to leverage the rich information contained within high-resolution RGB and DSM data. HiRes-FusedMIM utilizes a dual-encoder simple masked image modeling (SimMIM) architecture with a multi-objective loss function that combines reconstruction and contrastive objectives, enabling it to learn powerful, joint representations from both modalities. We conducted a comprehensive evaluation of HiRes-FusedMIM on a diverse set of downstream tasks, including classification, semantic segmentation, and instance segmentation. Our results demonstrate that: 1) HiRes-FusedMIM outperforms previous state-of-the-art geospatial methods on several building-related datasets, including WHU Aerial and LoveDA, demonstrating its effectiveness in capturing and leveraging fine-grained building information; 2) Incorporating DSMs during pre-training consistently improves performance compared to using RGB data alone, highlighting the value of elevation information for building-level analysis; 3) The dual-encoder architecture of HiRes-FusedMIM, with separate encoders for RGB and DSM data, significantly outperforms a single-encoder model on the Vaihingen segmentation task, indicating the benefits of learning specialized representations for each modality. To facilitate further research and applications in this direction, we will publicly release the trained model weights.
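The training objective pairs two modality-specific encoders with a SimMIM-style masked-reconstruction loss per modality plus a contrastive term that pulls the RGB and DSM embeddings of the same tile together. The slimmed-down PyTorch sketch below illustrates that combination; the tiny convolutional encoders, projection sizes, and the InfoNCE-style contrastive term are illustrative assumptions rather than the released model's architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyEncoder(nn.Module):
    """Stand-in for a SimMIM backbone: encodes a masked image and exposes both
    a per-pixel reconstruction and a pooled embedding."""
    def __init__(self, in_ch):
        super().__init__()
        self.net = nn.Sequential(nn.Conv2d(in_ch, 64, 3, padding=1), nn.GELU(),
                                 nn.Conv2d(64, 64, 3, padding=1), nn.GELU())
        self.decode = nn.Conv2d(64, in_ch, 1)     # reconstruction head
        self.project = nn.Linear(64, 128)         # embedding head for the contrastive term

    def forward(self, x):
        feats = self.net(x)
        recon = self.decode(feats)
        emb = F.normalize(self.project(feats.mean(dim=(2, 3))), dim=-1)
        return recon, emb

def fusedmim_loss(rgb, dsm, mask, enc_rgb, enc_dsm, temperature=0.07):
    """Dual-encoder objective: masked reconstruction per modality + cross-modal contrast."""
    rec_rgb, emb_rgb = enc_rgb(rgb * (1 - mask))
    rec_dsm, emb_dsm = enc_dsm(dsm * (1 - mask))
    # SimMIM-style term: reconstruct only the masked pixels of each modality.
    l_rec = F.l1_loss(rec_rgb * mask, rgb * mask) + F.l1_loss(rec_dsm * mask, dsm * mask)
    # InfoNCE-style term: matching RGB/DSM tiles are positives within the batch.
    logits = emb_rgb @ emb_dsm.t() / temperature
    targets = torch.arange(rgb.size(0))
    l_con = 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))
    return l_rec + l_con

# Example: 4 RGB tiles, matching 1-channel DSM tiles, and a shared binary mask.
enc_rgb, enc_dsm = TinyEncoder(3), TinyEncoder(1)
rgb, dsm = torch.randn(4, 3, 32, 32), torch.randn(4, 1, 32, 32)
mask = (torch.rand(4, 1, 32, 32) > 0.5).float()
loss = fusedmim_loss(rgb, dsm, mask, enc_rgb, enc_dsm)
```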