AI and Agile Software Development: From Frustration to Success -- XP2025 Workshop Summary

Herda, Tomas, Pichler, Victoria, Zhang, Zheying, Abrahamsson, Pekka, Hanssen, Geir K.

arXiv.org Artificial Intelligence

The full-day workshop on AI and Agile at XP 2025 convened a diverse group of researchers and industry practitioners to address the practical challenges and opportunities of integrating Artificial Intelligence into Agile software development. Through interactive sessions, participants identified shared frustrations related to integrating AI into Agile Software Development practices, including challenges with tooling, governance, data quality, and critical skill gaps. These challenges were systematically prioritized and analyzed to uncover root causes. The workshop culminated in the collaborative development of a research roadmap that pinpoints actionable directions for future work, including both immediate solutions and ambitious long-term goals. The key outcome is a structured agenda designed to foster joint industry-academic efforts to move from identified frustrations to successful implementation.


CLEF: Clinically-Guided Contrastive Learning for Electrocardiogram Foundation Models

Shu, Yuxuan, Charlton, Peter H., Kawsar, Fahim, Hernesniemi, Jussi, Malekzadeh, Mohammad

arXiv.org Artificial Intelligence

The electrocardiogram (ECG) is a key diagnostic tool in cardiovascular health. Single-lead ECG recording is integrated into both clinical-grade and consumer wearables. While self-supervised pretraining of foundation models on unlabeled ECGs improves diagnostic performance, existing approaches do not incorporate domain knowledge from clinical metadata. We introduce a novel contrastive learning approach that utilizes an established clinical risk score to adaptively weight negative pairs: clinically-guided contrastive learning. It aligns the similarities of ECG embeddings with clinically meaningful differences between subjects, with an explicit mechanism to handle missing metadata. On 12-lead ECGs from 161K patients in the MIMIC-IV dataset, we pretrain single-lead ECG foundation models at three scales, collectively called CLEF, using only routinely collected metadata without requiring per-sample ECG annotations. We evaluate CLEF on 18 clinical classification and regression tasks across 7 held-out datasets, and benchmark against 5 foundation model baselines and 3 self-supervised algorithms. When pretrained on 12-lead ECG data and tested on lead-I data, CLEF outperforms self-supervised foundation model baselines: the medium-sized CLEF achieves average AUROC improvements of at least 2.6% in classification and average reductions in MAEs of at least 3.2% in regression. Comparing with existing self-supervised learning algorithms, CLEF improves the average AUROC by at least 1.8%. Moreover, when pretrained only on lead-I data for classification tasks, CLEF performs comparably to the state-of-the-art ECGFounder, which was trained in a supervised manner. Overall, CLEF enables more accurate and scalable single-lead ECG analysis, advancing remote health monitoring. Code and pretrained CLEF models are available at: github.com/Nokia-Bell-Labs/ecg-foundation-model.
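The adaptive weighting of negative pairs described above can be illustrated with a toy InfoNCE-style loss. This is a minimal sketch under our own assumptions (the function name, the `1 + gap` weighting form, and the missing-metadata fallback are ours, not the paper's exact formulation):

```python
import math

def clinically_weighted_nce(sim_pos, sim_negs, risk_gaps, tau=0.1):
    """Sketch of a clinically-guided contrastive loss: each negative's
    logit is scaled by a weight that grows with the clinical risk gap
    between the two subjects, so clinically dissimilar pairs are pushed
    apart harder. A missing gap (None) falls back to the neutral weight
    1.0, recovering plain InfoNCE for that pair."""
    num = math.exp(sim_pos / tau)
    den = num
    for s, gap in zip(sim_negs, risk_gaps):
        w = 1.0 if gap is None else 1.0 + gap  # adaptive negative weight
        den += w * math.exp(s / tau)
    return -math.log(num / den)
```

With all gaps missing this reduces to the standard contrastive objective; a large risk gap inflates that negative's contribution to the denominator and hence the loss.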


TreeCoder: Systematic Exploration and Optimisation of Decoding and Constraints for LLM Code Generation

Princis, Henrijs, Sharma, Arindam, David, Cristina

arXiv.org Artificial Intelligence

Large language models (LLMs) have shown remarkable ability to generate code, yet their outputs often violate syntactic or semantic constraints when guided only through natural language prompts. We introduce TreeCoder, the most general and flexible framework to date for exploring decoding strategies, constraints, and hyperparameters in LLMs, and use it in code generation to enforce correctness and structure during decoding rather than relying on prompt engineering. TreeCoder represents decoding as a tree search over candidate programs, where both decoding strategies and constraint functions - such as style, syntax, execution - are treated as first-class, optimisable components. This design enables systematic exploration and automatic tuning of decoding configurations using standard optimisation techniques. Experiments on the MBPP (Python) and SQL-Spider benchmarks show that TreeCoder consistently improves accuracy across open-source models such as CodeLlama, Mistral and DeepSeek, often outperforming their unconstrained baselines by considerable margins.
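The idea of treating decoding as a tree search with constraints as first-class components can be sketched as a small beam-style search. This is a schematic illustration only; the `expand` interface (yielding token/log-probability pairs from an LLM) and the constraint signature are hypothetical simplifications, not TreeCoder's actual API:

```python
import heapq

def tree_decode(expand, constraints, start="", beam=3, depth=4):
    """Illustrative sketch of decoding as tree search over candidate
    programs. `expand(prog)` yields (token, logprob) continuations; each
    constraint maps a candidate partial program to a score adjustment,
    and candidates hitting a hard constraint (-inf) are pruned."""
    frontier = [(0.0, start)]
    for _ in range(depth):
        nxt = []
        for score, prog in frontier:
            for tok, lp in expand(prog):
                cand = prog + tok
                s = score + lp + sum(c(cand) for c in constraints)
                if s != float("-inf"):      # prune constraint violations
                    nxt.append((s, cand))
        frontier = heapq.nlargest(beam, nxt)  # keep the best `beam` nodes
    return max(frontier)[1] if frontier else start
```

Because both the search width (`beam`, `depth`) and the constraint set are plain parameters, they can be tuned with standard optimisation techniques, which is the kind of configuration search the abstract describes.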


Adaptive Factor Graph-Based Tightly Coupled GNSS/IMU Fusion for Robust Positioning

Ahmadi, Elham, Olama, Alireza, Välisuo, Petri, Kuusniemi, Heidi

arXiv.org Artificial Intelligence

Reliable positioning in GNSS-challenged environments remains a critical challenge for navigation systems. Tightly coupled GNSS/IMU fusion improves robustness but remains vulnerable to non-Gaussian noise and outliers. We present a robust and adaptive factor graph-based fusion framework that directly integrates GNSS pseudorange measurements with IMU preintegration factors and incorporates the Barron loss, a general robust loss function that unifies several M-estimators through a single tunable parameter. By adaptively down-weighting unreliable GNSS measurements, our approach improves positioning resilience. The method is implemented in an extended GTSAM framework and evaluated on the UrbanNav dataset. The proposed solution reduces positioning errors by up to 41% relative to standard FGO, and achieves even larger improvements over extended Kalman filter (EKF) baselines in urban canyon environments. These results highlight the benefits of the Barron loss in enhancing the resilience of GNSS/IMU-based navigation in urban and signal-compromised environments.
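The Barron loss mentioned above has a simple closed form with a single shape parameter alpha that interpolates between classic M-estimators. A minimal sketch (parameter names ours; the alpha = 2 and alpha = 0 cases are limits of the general formula and must be handled explicitly):

```python
import math

def barron_loss(x, alpha, c=1.0):
    """General robust loss: alpha = 2 gives scaled L2, alpha = 1 the
    Charbonnier (pseudo-Huber) loss, alpha = 0 the Cauchy/Lorentzian
    loss, and alpha -> -inf the Welsch loss. For alpha < 2 the loss
    flattens for large residuals, which is what down-weights outliers."""
    z = (x / c) ** 2
    if abs(alpha - 2.0) < 1e-9:        # limit alpha -> 2: L2/2
        return 0.5 * z
    if abs(alpha) < 1e-9:              # limit alpha -> 0: Cauchy
        return math.log(0.5 * z + 1.0)
    if alpha == -math.inf:             # limit alpha -> -inf: Welsch
        return 1.0 - math.exp(-0.5 * z)
    b = abs(alpha - 2.0)
    return (b / alpha) * ((z / b + 1.0) ** (alpha / 2.0) - 1.0)
```

For negative alpha the loss is bounded, so a grossly wrong pseudorange contributes almost nothing to the factor graph objective, which is the down-weighting behaviour the abstract relies on.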


The Spheres Dataset: Multitrack Orchestral Recordings for Music Source Separation and Information Retrieval

Garcia-Martinez, Jaime, Diaz-Guerra, David, Anderson, John, Falcon-Perez, Ricardo, Cabañas-Molero, Pablo, Virtanen, Tuomas, Carabias-Orti, Julio J., Vera-Candeas, Pedro

arXiv.org Artificial Intelligence

This paper introduces The Spheres dataset, multitrack orchestral recordings designed to advance machine learning research in music source separation and related MIR tasks within the classical music domain. The dataset comprises over one hour of recordings of musical pieces performed by the Colibrì Ensemble at The Spheres recording studio, capturing two canonical works - Tchaikovsky's Romeo and Juliet and Mozart's Symphony No. 40 - along with chromatic scales and solo excerpts for each instrument. The recording setup employed 23 microphones, including close spot, main, and ambient microphones, enabling the creation of realistic stereo mixes with controlled bleeding and providing isolated stems for supervised training of source separation models. In addition, room impulse responses were estimated for each instrument position, offering valuable acoustic characterization of the recording space. We present the dataset structure, acoustic analysis, and baseline evaluations using X-UMX-based models for orchestral family separation and microphone debleeding. Results highlight both the potential and the challenges of source separation in complex orchestral scenarios, underscoring the dataset's value for benchmarking and for exploring new approaches to separation, localization, dereverberation, and immersive rendering of classical music.


Missing the human touch? A computational stylometry analysis of GPT-4 translations of online Chinese literature

Yao, Xiaofang, Kang, Yong-Bin, McCosker, Anthony

arXiv.org Artificial Intelligence

Existing research suggests that machine translations of literary texts remain unsatisfactory. Such quality assessment often relies on automated metrics and subjective human ratings, with little attention to the stylistic features of machine translation. Empirical evidence is also scant on whether the advent of AI will transform the literary translation landscape, with implications for other critical domains for translation, such as the creative industries more broadly. This pioneering study investigates the stylistic features of AI translations, specifically examining GPT-4's performance against human translations in a Chinese online literature task. Our computational stylometry analysis reveals that GPT-4 translations closely mirror human translations in lexical, syntactic and content features. As such, AI translations can in fact replicate the 'human touch' in literary translation style. The study provides critical insights into the implications of AI for literary translation in the posthuman era, where the line between machine and human translations may become increasingly blurry.


DeepCoT: Deep Continual Transformers for Real-Time Inference on Data Streams

Picón, Ginés Carreto, Zhou, Peng Yuan, Zhang, Qi, Iosifidis, Alexandros

arXiv.org Artificial Intelligence

Transformer-based models have dramatically increased their size and parameter count to tackle increasingly complex tasks. At the same time, there is a growing demand for low-latency inference on resource-constrained devices that achieves high performance. In particular, stream data inference is typically performed over a sliding temporal window, leading to highly redundant computations. The recent Continual Transformers have addressed this issue, but they can only be effectively used in shallow models, which limits their scope and generalization power. In this paper, we propose the Deep Continual Transformer (DeepCoT), a redundancy-free encoder-only model that can be applied over existing deep encoder architectures with minimal changes. In our experiments over audio, video, and text streams, we show that DeepCoTs retain comparable performance to their non-continual baselines while offering a linear computational cost for all Transformer layers, reducing the running time by up to two orders of magnitude compared to previous efficient models. Transformer models [1] have shown impressive performance for a wide range of classification and regression tasks [2], [3]. However, their size has grown significantly as new complex tasks have been targeted, resulting in slower inference speeds. This problem is especially critical in applications where low-latency models are required, making the use of deep Transformer models unfeasible. Some applications, such as robot perception, impose limitations on the hardware available to perform predictions, further increasing the latency. Cloud solutions are not always possible due to privacy or practical constraints such as network delay or reliability. Moreover, there is growing awareness of the high energy consumption required to run large Transformer-based models. Stream processing is one problem with exactly these characteristics.
Stream processing can be defined as the set of tasks in which new predictions are made by a model at specific intervals or on demand, given new data inputs. These models normally benefit from leveraging past information together with the present data and rely on a sliding temporal window formed by the n most recent data points.
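The redundancy that continual inference removes can be made concrete with a toy attention cache over a sliding window. This is a scalar-feature sketch of the general idea, not DeepCoT's architecture; the class and method names are ours:

```python
import math
from collections import deque

class ContinualAttention:
    """Toy sketch of redundancy-free attention over a sliding window:
    keys and values of past steps are cached, so each new input costs
    O(n) work instead of recomputing attention over the whole n-step
    window from scratch (O(n^2) per step for naive re-evaluation)."""

    def __init__(self, window):
        self.keys = deque(maxlen=window)   # oldest entry is evicted
        self.vals = deque(maxlen=window)   # automatically at capacity

    def step(self, q, k, v):
        """Ingest one new (key, value) pair and attend with query q."""
        self.keys.append(k)
        self.vals.append(v)
        scores = [math.exp(q * ki) for ki in self.keys]  # scalar "dot products"
        z = sum(scores)
        return sum(s * vi for s, vi in zip(scores, self.vals)) / z
```

Each call reuses the cached window rather than reprocessing all n inputs, which is the source of the linear per-step cost the abstract claims for full Transformer layers.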