
Collaborating Authors

 Groningen



Informed Learning for Estimating Drought Stress at Fine-Scale Resolution Enables Accurate Yield Prediction

Miranda, Miro, Charfuelan, Marcela, Toro, Matias Valdenegro, Dengel, Andreas

arXiv.org Artificial Intelligence

Water is essential for agricultural productivity. Assessing water shortages and reduced yield potential is a critical factor in decision-making for ensuring agricultural productivity and food security. Crop simulation models, which align with physical processes, offer intrinsic explainability but often perform poorly. Conversely, machine learning models for crop yield modeling are powerful and scalable, yet they commonly operate as black boxes and lack adherence to the physical principles of crop growth. This study bridges this gap by coupling the advantages of both worlds. We postulate that the crop yield is inherently defined by the water availability. Therefore, we formulate crop yield as a function of temporal water scarcity and predict both the crop drought stress and the sensitivity to water scarcity at fine-scale resolution. Sequentially modeling the crop yield response to water enables accurate yield prediction. To enforce physical consistency, a novel physics-informed loss function is proposed. We leverage multispectral satellite imagery, meteorological data, and fine-scale yield data. Further, to account for the uncertainty within the model, we build upon a deep ensemble approach. Our method surpasses state-of-the-art models like LSTM and Transformers in crop yield prediction with a coefficient of determination ($R^2$-score) of up to 0.82 while offering high explainability. This method offers decision support for industry, policymakers, and farmers in building a more resilient agriculture in times of changing climate conditions.
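The abstract mentions a deep ensemble for uncertainty but does not say how member outputs are combined. A minimal sketch of one common approach follows, assuming each ensemble member returns a Gaussian mean and variance for the yield and that members are weighted equally. The function name `combine_ensemble` and the example numbers are hypothetical and not taken from the paper.

```python
from statistics import fmean


def combine_ensemble(means, variances):
    """Combine per-member Gaussian predictions into one mean and variance.

    Assumption (not from the paper): the ensemble is treated as a
    uniformly weighted mixture of Gaussians, one per member.
    """
    if len(means) != len(variances) or not means:
        raise ValueError("need equally many means and variances, at least one")
    mix_mean = fmean(means)
    # Total variance = average member variance + spread of the member means.
    second_moment = fmean([v + m * m for m, v in zip(means, variances)])
    mix_var = second_moment - mix_mean ** 2
    # Guard against tiny negative values caused by floating-point rounding.
    return mix_mean, max(mix_var, 0.0)


# Hypothetical yield predictions from three ensemble members.
mean, var = combine_ensemble([6.8, 7.1, 7.4], [0.20, 0.25, 0.15])
```

Under this assumption the combined variance grows both when individual members are uncertain and when members disagree with each other.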


Personal Attribute Leakage in Federated Speech Models

Al-Ali, Hamdan, Ghavamipour, Ali Reza, Caselli, Tommaso, Turkmen, Fatih, Talat, Zeerak, Aldarmaki, Hanan

arXiv.org Artificial Intelligence

Federated learning is a common method for privacy-preserving training of machine learning models. In this paper, we analyze the vulnerability of ASR models to attribute inference attacks in the federated setting. We test a non-parametric white-box attack method under a passive threat model on three ASR models: Wav2Vec2, HuBERT, and Whisper. The attack operates solely on weight differentials without access to raw speech from target speakers. We demonstrate attack feasibility on sensitive demographic and clinical attributes: gender, age, accent, emotion, and dysarthria. Our findings indicate that attributes that are underrepresented or absent in the pre-training data are more vulnerable to such inference attacks. In particular, information about accents can be reliably inferred from all models. Our findings expose previously undocumented vulnerabilities in federated ASR models and offer insights towards improved security.


Personalized Sleep Prediction via Deep Adaptive Spatiotemporal Modeling and Sparse Data

Wang, Xueyi, Lamoth, C. J. C., Wilhelm, Elisabeth

arXiv.org Artificial Intelligence

A sleep forecast allows individuals and healthcare providers to anticipate and proactively address factors influencing restful sleep, ultimately improving mental and physical well-being. This work presents an adaptive spatial and temporal model (AdaST-Sleep) for predicting sleep scores. Our proposed model combines convolutional layers to capture spatial feature interactions between multiple features and recurrent neural network layers to handle longer-term temporal health-related data. A domain classifier is further integrated to generalize across different subjects. We conducted several experiments using five input window sizes (3, 5, 7, 9, 11 days) and five predicting window sizes (1, 3, 5, 7, 9 days). Our approach consistently outperformed four baseline models, achieving its lowest RMSE (0.282) with a seven-day input window and a one-day predicting window. Moreover, the method maintained strong performance even when forecasting multiple days into the future, demonstrating its versatility for real-world applications. Visual comparisons reveal that the model accurately tracks both the overall sleep score level and daily fluctuations. These findings prove that the proposed framework provides a robust and adaptable solution for personalized sleep forecasting using sparse data from commercial wearable devices and domain adaptation techniques.
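The input and predicting window sizes above describe a sliding-window setup over daily data. A minimal sketch of how such windows and the RMSE metric could be computed is given below. The helper names `make_windows` and `rmse` and the toy series are assumptions for illustration, not the authors' code, and the real model uses multiple features per day rather than a single score.

```python
import math


def make_windows(series, input_days, predict_days):
    """Split a daily series into (input window, target window) pairs."""
    pairs = []
    last_start = len(series) - input_days - predict_days
    for start in range(last_start + 1):
        split = start + input_days
        pairs.append((series[start:split], series[split:split + predict_days]))
    return pairs


def rmse(predicted, actual):
    """Root mean square error between two equally long sequences."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("sequences must be non-empty and equally long")
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(actual))


# Toy daily sleep scores (hypothetical values), seven-day input, one-day target.
scores = [0.71, 0.64, 0.80, 0.77, 0.69, 0.73, 0.75, 0.78, 0.70, 0.74]
pairs = make_windows(scores, input_days=7, predict_days=1)
```

Each pair holds seven consecutive days as input and the following day as the target, so a ten-day series yields three training pairs.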


Individualized and Interpretable Sleep Forecasting via a Two-Stage Adaptive Spatial-Temporal Model

Wang, Xueyi, Wilhelm, Elisabeth

arXiv.org Artificial Intelligence

Sleep quality significantly impacts well-being. Therefore, healthcare providers and individuals need accessible and reliable forecasting tools for preventive interventions. This paper introduces an interpretable, individualized two-stage adaptive spatial-temporal model for predicting sleep quality scores. Our proposed framework combines multi-scale convolutional layers to model spatial interactions across multiple input variables, recurrent layers and attention mechanisms to capture long-term temporal dependencies, and a two-stage domain adaptation strategy to enhance generalization. The first adaptation stage is applied during training to mitigate overfitting on the training set. In the second stage, a source-free test-time adaptation mechanism is employed to adapt the model to new users without requiring labels. We conducted various experiments with five input window sizes (3, 5, 7, 9, and 11 days) and five prediction window sizes (1, 3, 5, 7, and 9 days). Our model consistently outperformed time series forecasting baseline approaches, including Long Short-Term Memory (LSTM), Informer, PatchTST, and TimesNet. The best performance was achieved with a three-day input window and a one-day prediction window, yielding a root mean square error (RMSE) of 0.216. Furthermore, the model demonstrated good predictive performance even for longer forecasting horizons (e.g., with a 0.257 RMSE for a three-day prediction window), highlighting its practical utility for real-world applications. We also conducted an explainability analysis to examine how different features influence sleep quality. These findings proved that the proposed framework offers a robust, adaptive, and explainable solution for personalized sleep forecasting using sparse data from commercial wearable devices.


On the Generalisation of Koopman Representations for Chaotic System Control

Hjikakou, Kyriakos, Cartagena, Juan Diego Cardenas, Sabatelli, Matthia

arXiv.org Artificial Intelligence

This paper investigates the generalisability of Koopman-based representations for chaotic dynamical systems, focusing on their transferability across prediction and control tasks. Using the Lorenz system as a testbed, we propose a three-stage methodology: learning Koopman embeddings through autoencoding, pre-training a transformer on next-state prediction, and fine-tuning for safety-critical control. Our results show that Koopman embeddings outperform both standard and physics-informed PCA baselines, achieving accurate and data-efficient performance. Notably, fixing the pre-trained transformer weights during fine-tuning leads to no performance degradation, indicating that the learned representations capture reusable dynamical structure rather than task-specific patterns. These findings support the use of Koopman embeddings as a foundation for multi-task learning in physics-informed machine learning. A project page is available at https://kikisprdx.github.io/.
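For readers unfamiliar with the testbed, the Lorenz system can be simulated in a few lines. The sketch below uses the classical parameter values (sigma = 10, rho = 28, beta = 8/3) and a fixed-step fourth-order Runge-Kutta integrator. These choices, the step size, and the function names are assumptions for illustration, since the abstract does not state the paper's simulation settings.

```python
def lorenz(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Time derivative of the Lorenz system at the given (x, y, z) state."""
    x, y, z = state
    return (sigma * (y - x), x * (rho - z) - y, x * y - beta * z)


def rk4_step(state, dt):
    """Advance the state by one fourth-order Runge-Kutta step of size dt."""
    def shifted(base, slope, scale):
        return tuple(b + scale * s for b, s in zip(base, slope))

    k1 = lorenz(state)
    k2 = lorenz(shifted(state, k1, dt / 2))
    k3 = lorenz(shifted(state, k2, dt / 2))
    k4 = lorenz(shifted(state, k3, dt))
    return tuple(
        s + dt / 6 * (a + 2 * b + 2 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def simulate(initial, dt=0.01, steps=1000):
    """Return the trajectory as a list of states, including the initial one."""
    trajectory = [tuple(initial)]
    for _ in range(steps):
        trajectory.append(rk4_step(trajectory[-1], dt))
    return trajectory


# Two trajectories from nearly identical starting points (hypothetical values);
# in the chaotic regime such trajectories are expected to drift apart over time.
a = simulate((1.0, 1.0, 1.0))
b = simulate((1.0, 1.0, 1.0 + 1e-8))
```

Trajectories like these could serve as training data for next-state prediction, with consecutive states forming input-target pairs.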


High-resolution spatial memory requires grid-cell-like neural codes

Cotteret, Madison, Kymn, Christopher J., Greatorex, Hugh, Ziegler, Martin, Chicca, Elisabetta, Sommer, Friedrich T.

arXiv.org Artificial Intelligence

Continuous attractor networks (CANs) are widely used to model how the brain temporarily retains continuous behavioural variables via persistent recurrent activity, such as an animal's position in an environment. However, this memory mechanism is very sensitive to even small imperfections, such as noise or heterogeneity, which are both common in biological systems. Previous work has shown that discretising the continuum into a finite set of discrete attractor states provides robustness to these imperfections, but necessarily reduces the resolution of the represented variable, creating a dilemma between stability and resolution. We show that this stability-resolution dilemma is most severe for CANs using unimodal bump-like codes, as in traditional models. To overcome this, we investigate sparse binary distributed codes based on random feature embeddings, in which neurons have spatially-periodic receptive fields. We demonstrate theoretically and with simulations that such grid-cell-like codes enable CANs to achieve both high stability and high resolution simultaneously. The model extends to embedding arbitrary nonlinear manifolds into a CAN, such as spheres or tori, and generalises linear path integration to integration along freely-programmable on-manifold vector fields. Together, this work provides a theory of how the brain could robustly represent continuous variables with high resolution and perform flexible computations over task-relevant manifolds.


What's Missing in Vision-Language Models? Probing Their Struggles with Causal Order Reasoning

Weng, Zhaotian, Li, Haoxuan, Huang, Kuan-Hao, Zhao, Jieyu

arXiv.org Artificial Intelligence

Despite the impressive performance of vision-language models (VLMs) on downstream tasks, their ability to understand and reason about causal relationships in visual inputs remains unclear. Robust causal reasoning is fundamental to solving complex high-level reasoning tasks, yet existing benchmarks often include a mixture of reasoning questions, and VLMs can frequently exploit object recognition and activity identification as shortcuts to arrive at the correct answers, making it challenging to truly assess their causal reasoning abilities. To bridge this gap, we introduce VQA-Causal and VCR-Causal, two new benchmarks specifically designed to isolate and rigorously evaluate VLMs' causal reasoning abilities. Our findings reveal that while VLMs excel in object and activity recognition, they perform poorly on causal reasoning tasks, often only marginally surpassing random guessing. Further analysis suggests that this limitation stems from a severe lack of causal expressions in widely used training datasets, where causal relationships are rarely explicitly conveyed. We additionally explore fine-tuning strategies with hard negative cases, showing that targeted fine-tuning can improve a model's causal reasoning while maintaining generalization and downstream performance. Our study highlights a key gap in current VLMs and lays the groundwork for future work on causal understanding.


When Harry Meets Superman: The Role of The Interlocutor in Persona-Based Dialogue Generation

Occhipinti, Daniela, Guerini, Marco, Nissim, Malvina

arXiv.org Artificial Intelligence

Endowing dialogue agents with persona information has proven to significantly improve the consistency and diversity of their generations. While much focus has been placed on aligning dialogues with provided personas, the adaptation to the interlocutor's profile remains largely underexplored. In this work, we investigate three key aspects: (1) a model's ability to align responses with both the provided persona and the interlocutor's; (2) its robustness when dealing with familiar versus unfamiliar interlocutors and topics, and (3) the impact of additional fine-tuning on specific persona-based dialogues. We evaluate dialogues generated with diverse speaker pairings and topics, framing the evaluation as an author identification task and employing both LLM-as-a-judge and human evaluations. By systematically masking or disclosing information about the interlocutor, we assess its impact on dialogue generation. Results show that access to the interlocutor's persona improves the recognition of the target speaker, while masking it does the opposite. Although models generalise well across topics, they struggle with unfamiliar interlocutors. Finally, we found that in zero-shot settings, LLMs often copy biographical details, facilitating identification but trivialising the task.


GenSwarm: Scalable Multi-Robot Code-Policy Generation and Deployment via Language Models

Ji, Wenkang, Chen, Huaben, Chen, Mingyang, Zhu, Guobin, Xu, Lufeng, Groß, Roderich, Zhou, Rui, Cao, Ming, Zhao, Shiyu

arXiv.org Artificial Intelligence

The present paradigm of developing multi-robot systems follows a complex and labor-intensive process that involves steps like task analysis, algorithm design, code programming, simulation validation, and real-world deployment. This paradigm requires skilled professionals who are familiar with both theory and software/hardware implementation, incurring high costs in human resources. Moreover, it does not adapt well to dynamically changing tasks: the emergence of a new task requires the repetition of the complex process. Automatic generation and deployment of control policies for multi-robot systems is an appealing paradigm, as it promises substantial savings in terms of human effort and other resources [3-5]. However, this paradigm is nontrivial to realize, because a multi-robot system as a whole cannot be programmed directly; rather, a desired collective behavior can be achieved only by programming each individual robot, which relies on its locally available information. Previous methods for automatic development of multi-robot swarming are primarily based on optimization techniques [3, 5]. For instance, an objective function is first crafted to mathematically describe a desired task and then optimized to generate policies through methods such as evolutionary computation [5-7] or systematic search [8]. Despite their promise, these optimization methods face the common limitation of requiring manual crafting of objective functions.