Towards Explainable Personalized Recommendations by Learning from Users' Photos

Díez, Jorge, Pérez-Núñez, Pablo, Luaces, Oscar, Remeseiro, Beatriz, Bahamonde, Antonio

arXiv.org Artificial Intelligence

Explaining the output of a complex system, such as a Recommender System (RS), is becoming of utmost importance for both users and companies. In this paper we explore the idea that personalized explanations can be learned as recommendations themselves. There are plenty of online services where users can upload photos, in addition to rating items. We assume that users take these photos to reinforce or justify their opinions about the items. For this reason we try to predict which photo a user would take of an item, because that image is the argument that can best convince her of the qualities of the item. In this sense, an RS can explain its results and, therefore, increase its reliability. Furthermore, once we have a model to predict attractive images for users, we can estimate their distribution. The paper includes a formal framework that estimates the authorship probability for a given pair (user, photo). To illustrate the proposal, we use data gathered from TripAdvisor containing the reviews (with photos) of restaurants in six cities of different sizes. Keywords: Recommender Systems, Personalization, Explainability, Photo, Collaborative

1. Introduction

Explainable Artificial Intelligence (XAI) is becoming an important area of interest, since explainability is increasingly necessary to meet stakeholder demands. In particular, the General Data Protection Regulation (GDPR) [29] of the European Union demands transparency in systems that take decisions affecting people, making explanations more needed than ever. Additionally, explanations may help increase the trust of users in AI algorithms, since people rely not only on their efficacy but also on the degree to which they understand the process these algorithms follow. Recommender Systems are a case in point: since they provide suggestions to users, explainability plays an important role in them.
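The abstract mentions a framework that estimates the authorship probability for a (user, photo) pair. As a rough, hypothetical illustration (not the paper's actual model), one way to turn user-photo affinity scores into such a probability is to softmax-normalize a dot-product score over a candidate photo set; all vectors and names below are invented for the sketch:

```python
import math

def authorship_probability(user_vec, photo_vec, other_photo_vecs):
    """Toy estimate of P(author | user, photo): the photo's affinity score
    for this user, softmax-normalized over a set of candidate photos.
    Purely illustrative; not the paper's formal framework."""
    def score(u, p):
        return sum(ui * pi for ui, pi in zip(u, p))  # dot-product affinity
    scores = [score(user_vec, photo_vec)] + [score(user_vec, q) for q in other_photo_vecs]
    exps = [math.exp(s) for s in scores]
    return exps[0] / sum(exps)

# A user whose (made-up) taste vector aligns with the first photo
user = [1.0, 0.0]
photo = [0.9, 0.1]
others = [[-0.5, 0.8], [0.0, -1.0]]
p = authorship_probability(user, photo, others)  # well above chance (1/3)
```

The aligned photo ends up with the largest share of the probability mass, which is the behavior one would want before ranking candidate explanation images.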


Scaling Physical Reasoning with the PHYSICS Dataset

Zheng, Shenghe, Cheng, Qianjia, Yao, Junchi, Wu, Mengsong, He, Haonan, Ding, Ning, Cheng, Yu, Hu, Shuyue, Bai, Lei, Zhou, Dongzhan, Cui, Ganqu, Ye, Peng

arXiv.org Artificial Intelligence

Large Language Models (LLMs) have achieved remarkable progress on advanced reasoning tasks such as mathematics and coding competitions. Meanwhile, physics, despite being both reasoning-intensive and essential to real-world understanding, has received limited academic and industrial attention. This paper introduces PHYSICS, a dataset containing 16,568 high-quality physics problems spanning subjects and difficulty levels, to address this gap. Specifically, PHYSICS is curated with exercises from over 100 textbooks through a carefully designed pipeline for quality control. It covers five major physics domains: Mechanics, Electromagnetism, Thermodynamics, Optics, and Modern Physics. It also spans a wide range of difficulty levels, from high school to graduate-level physics courses. To utilize the data for improving and evaluating the model's physical reasoning capabilities, we split the dataset into training and test sets, and provide reasoning paths generated by powerful reasoning models for the training data to facilitate model training. In addition, for the evaluation part, we find that existing evaluation frameworks exhibit biases in aspects such as units, simplification, and precision in the physics domain. To balance efficiency and accuracy, we introduce a Rule+Model evaluation framework tailored to physics problems. Our evaluations on current state-of-the-art open-source and proprietary models highlight the limitations of current models in handling physics-related tasks. We hope that our dataset and evaluation methodology will jointly advance the development of LLMs in the field of physics. The code and data can be found at: https://github.com/Zhengsh123/PHYSICS.
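To make the Rule+Model idea concrete, here is a minimal sketch (my own illustration, not the PHYSICS implementation): a cheap rule stage normalizes a few units and compares numbers within a tolerance, and only answers it cannot parse are deferred to a model judge. The unit table and tolerance are invented for the example:

```python
def rule_check(pred, gold, rel_tol=1e-2):
    """Rule stage: compare numeric answers after stripping a few common units.
    Returns True/False, or None when the rule cannot decide.
    Unit table is illustrative only; longest suffixes listed first."""
    units = {"km/h": 1 / 3.6, "m/s": 1.0, "kJ": 1e3, "J": 1.0}
    def parse(ans):
        for u, factor in units.items():
            if ans.endswith(u):
                try:
                    return float(ans[:-len(u)].strip()) * factor
                except ValueError:
                    return None
        try:
            return float(ans)
        except ValueError:
            return None
    p, g = parse(pred), parse(gold)
    if p is None or g is None:
        return None  # rule cannot decide
    return abs(p - g) <= rel_tol * max(abs(g), 1e-12)

def evaluate(pred, gold, model_judge):
    """Rule first; fall back to a (stand-in) model judge when undecided."""
    verdict = rule_check(pred, gold)
    return model_judge(pred, gold) if verdict is None else verdict

# "36 km/h" and "10 m/s" agree once units are normalized,
# so the (dummy) model judge is never consulted
assert evaluate("36 km/h", "10 m/s", lambda p, g: False)
```

The design point is that unit and precision mismatches, which the abstract identifies as a bias in existing frameworks, are handled deterministically, keeping the expensive model stage for genuinely ambiguous answers.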


PRESOL: a web-based computational setting for feature-based flare forecasting

Curletto, Chiara, Massa, Paolo, Tagliafico, Valeria, Campi, Cristina, Benvenuto, Federico, Piana, Michele, Tacchino, Andrea

arXiv.org Artificial Intelligence

Solar flares are the most explosive phenomena in the solar system and the main trigger of the chain of events that starts with Coronal Mass Ejections and leads to geomagnetic storms with possible impacts on infrastructures at Earth. Data-driven solar flare forecasting relies on either deep learning approaches, which are operationally promising but have a low degree of explainability, or machine learning algorithms, which can provide information on the physical descriptors that most impact the prediction. This paper describes a web-based technological platform for the execution of a computational pipeline of feature-based machine learning methods that provide predictions of flare occurrence, feature ranking information, and assessment of the prediction performances.
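As a toy illustration of the "feature ranking" output such a pipeline can expose (not the PRESOL method; feature names and the correlation-based ranking are my own simplification), one can rank physical descriptors by the absolute correlation of each feature with the flare label:

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def rank_features(features, labels):
    """Rank descriptors by |correlation| with flare occurrence (toy proxy
    for the richer rankings a feature-based pipeline can produce)."""
    scored = {name: abs(pearson(vals, labels)) for name, vals in features.items()}
    return sorted(scored, key=scored.get, reverse=True)

# Invented descriptor values for four active regions (1 = flare occurred)
toy = {"total_flux": [1, 2, 3, 4], "helicity": [4, 1, 3, 2]}
order = rank_features(toy, [0, 0, 1, 1])  # "total_flux" ranks first here
```

Real pipelines use model-derived importances rather than raw correlations, but the interface, an ordered list of physical descriptors, is the same.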


Parametrized Quantum Circuit Learning for Quantum Chemical Applications

Jones, Grier M., Prasad, Viki Kumar, Fekl, Ulrich, Jacobsen, Hans-Arno

arXiv.org Artificial Intelligence

Despite numerous proposed applications, there remains limited exploration of datasets relevant to quantum chemistry. In this study, we investigate the potential benefits and limitations of parametrized quantum circuits (PQCs) on two chemically meaningful datasets: (1) the BSE49 dataset, containing bond separation energies for 49 different classes of chemical bonds, and (2) a dataset of water conformations, where coupled-cluster singles and doubles (CCSD) wavefunctions are predicted from lower-level electronic structure methods using the data-driven coupled-cluster (DDCC) approach. We construct a comprehensive set of 168 PQCs by combining 14 data encoding strategies with 12 variational ansätze, and evaluate their performance on circuits with 5 and 16 qubits. Our initial analysis examines the impact of circuit structure on model performance using state-vector simulations. We then explore how circuit depth and training set size influence model performance. Finally, we assess the performance of the best-performing PQCs on current quantum hardware, using both noisy simulations ("fake" backends) and real quantum devices. Our findings underscore the challenges of applying PQCs to chemically relevant problems that are straightforward for classical machine learning methods but remain non-trivial for quantum approaches.

1 Introduction

In recent years, machine learning (ML) has emerged as a popular tool in chemistry to reveal new patterns in data, provide new insights beyond simple models, accelerate computations, and analyze chemical space. For computational chemists, the primary goal of applying ML is often to circumvent the explicit calculation of molecular properties, which can be computationally expensive for large datasets.
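The abstract's core object, a data-encoding layer followed by a variational ansatz evaluated by state-vector simulation, can be sketched at its smallest possible scale. The single-qubit, single-parameter circuit below is my own minimal illustration (real PQCs in the study use 5 and 16 qubits and far richer encodings and ansätze):

```python
import math

def ry(theta, state):
    """Apply a single-qubit Ry rotation to a real-amplitude state (a0, a1)."""
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    a0, a1 = state
    return (c * a0 - s * a1, s * a0 + c * a1)

def pqc_expectation(x, theta):
    """Angle-encode feature x, apply a one-parameter ansatz, return <Z>.
    A toy stand-in for the encoding + ansatz circuits in the study."""
    state = (1.0, 0.0)        # |0>
    state = ry(x, state)      # data-encoding layer
    state = ry(theta, state)  # variational ansatz layer
    a0, a1 = state
    return a0 ** 2 - a1 ** 2  # expectation value of Pauli-Z

# With theta = -x the ansatz exactly undoes the encoding, so <Z> returns to 1
val = pqc_expectation(0.7, -0.7)
```

Training such a model means adjusting theta (here, one angle; in practice, many) so that the measured expectation values fit the target chemical property, e.g. a bond separation energy.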


Driving Accurate Allergen Prediction with Protein Language Models and Generalization-Focused Evaluation

Wong, Brian Shing-Hei, Kim, Joshua Mincheol, Fung, Sin-Hang, Xiong, Qing, Ao, Kelvin Fu-Kiu, Wei, Junkang, Wang, Ran, Wang, Dan Michelle, Zhou, Jingying, Feng, Bo, Cheng, Alfred Sze-Lok, Yip, Kevin Y., Tsui, Stephen Kwok-Wing, Cao, Qin

arXiv.org Artificial Intelligence

Allergens, typically proteins capable of triggering adverse immune responses, represent a significant public health challenge. To accurately identify allergen proteins, we introduce Applm (Allergen Prediction with Protein Language Models), a computational framework that leverages the 100-billion parameter xTrimoPGLM protein language model. We show that Applm consistently outperforms seven state-of-the-art methods in a diverse set of tasks that closely resemble difficult real-world scenarios. These include identifying novel allergens that lack similar examples in the training set, differentiating between allergens and non-allergens among homologs with high sequence similarity, and assessing functional consequences of mutations that create few changes to the protein sequences. Our analysis confirms that xTrimoPGLM, originally trained on one trillion tokens to capture general protein sequence characteristics, is crucial for Applm's performance by detecting important differences among protein sequences. In addition to providing Applm as open-source software, we also provide our carefully curated benchmark datasets to facilitate future research.
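One of the evaluation settings described above, separating allergens from non-allergens among homologs with high sequence similarity, depends on measuring how similar two protein sequences are. The sketch below is a deliberately crude illustration of that idea (ungapped identity; real benchmarks use proper alignment tools), with invented sequences:

```python
def percent_identity(seq_a, seq_b):
    """Crude ungapped percent identity between two protein sequences.
    Real pipelines would use an aligner; this is illustrative only."""
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / max(len(seq_a), len(seq_b))

def homolog_pairs(allergens, non_allergens, threshold=80.0):
    """Collect (allergen, non-allergen) pairs above an identity threshold --
    the hard, look-alike cases a benchmark like this isolates."""
    return [(a, n) for a in allergens for n in non_allergens
            if percent_identity(a, n) >= threshold]

# Toy sequences: one residue apart, so they form a hard homolog pair
pairs = homolog_pairs(["MKTA"], ["MKTG"], threshold=70.0)
```

The point of such pairs is that sequence similarity alone cannot separate the classes, which is where a large protein language model's learned representations are expected to help.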


Non-native Children's Automatic Speech Assessment Challenge (NOCASA)

Getman, Yaroslav, Grósz, Tamás, Kurimo, Mikko, Salvi, Giampiero

arXiv.org Artificial Intelligence

This paper presents the "Non-native Children's Automatic Speech Assessment" (NOCASA) challenge, a data competition held as part of the IEEE MLSP 2025 conference. NOCASA challenges participants to develop new systems that can assess single-word pronunciations of young second language (L2) learners as part of a gamified pronunciation training app. To achieve this, several issues must be addressed, most notably the limited nature of available training data and the highly unbalanced distribution among the pronunciation level categories. To expedite development, we provide pseudo-anonymized training data (TeflonNorL2) containing 10,334 recordings from 44 speakers attempting to pronounce 205 distinct Norwegian words, human-rated on a 1 to 5 scale (the number of stars that should be given in the game). In addition to the data, two already trained systems are released as official baselines: an SVM classifier trained on the ComParE_16 acoustic feature set and a multi-task wav2vec 2.0 model. The latter achieves the best performance on the challenge test set, with an unweighted average recall (UAR) of 36.37%.
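The challenge metric, unweighted average recall, is well suited to the unbalanced star-rating distribution the abstract describes, because it averages per-class recall so that rare pronunciation levels weigh as much as common ones. A small self-contained computation (toy labels, not challenge data):

```python
def unweighted_average_recall(y_true, y_pred):
    """UAR: the mean of per-class recalls, so every class counts equally
    regardless of how many examples it has."""
    classes = sorted(set(y_true))
    recalls = []
    for c in classes:
        idx = [i for i, t in enumerate(y_true) if t == c]
        recalls.append(sum(y_pred[i] == c for i in idx) / len(idx))
    return sum(recalls) / len(recalls)

# Imbalanced toy labels on the 1-5 star scale: a predictor that always
# outputs the majority class gets only 0.5 UAR despite 80% raw accuracy
truth = [1, 1, 1, 1, 5]
preds = [1, 1, 1, 1, 1]
uar = unweighted_average_recall(truth, preds)  # 0.5
```

This is why a degenerate majority-class baseline cannot score well here, and why the wav2vec 2.0 baseline's 36.37% UAR over five classes is meaningfully above the 20% chance level.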


AdaptHetero: Machine Learning Interpretation-Driven Subgroup Adaptation for EHR-Based Clinical Prediction

Liao, Ling, Aagaard, Eva

arXiv.org Machine Learning

However, the intrinsic complexity and heterogeneity of EHR data limit its effectiveness in guiding subgroup-specific modeling. We propose AdaptHetero, a novel MLI-driven framework that transforms interpretability insights into actionable guidance for tailoring model training and evaluation across subpopulations within individual hospital systems. Evaluated on three large-scale EHR datasets -- GOSSIS-1-eICU, WiDS, and MIMIC-IV -- AdaptHetero consistently identifies heterogeneous model behaviors in predicting ICU mortality, in-hospital death, and hidden hypoxemia. By integrating SHAP-based interpretation and unsupervised clustering, the framework enhances the identification of clinically meaningful subgroup-specific characteristics, leading to improved predictive performance and optimized clinical deployment.

Introduction

Machine learning interpretation (MLI) techniques are increasingly leveraged in the analysis of electronic health records (EHRs) to reveal latent clinical patterns and to support trustworthy, actionable decision-making in high-stakes healthcare settings.
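The combination of attribution-based interpretation and unsupervised clustering can be illustrated at its simplest: cluster patients on an interpretation-derived score to surface candidate subgroups. The tiny 1-D k-means below is my own sketch (AdaptHetero's actual pipeline uses SHAP values over many features, not a single invented score):

```python
def kmeans_1d(values, k=2, iters=20):
    """Tiny 1-D k-means (k=2): cluster patients by a single
    interpretation-derived score to expose candidate subgroups.
    Initialized at the min and max of the data."""
    centers = [min(values), max(values)][:k]
    for _ in range(iters):
        groups = [[] for _ in centers]
        for v in values:
            j = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            groups[j].append(v)
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers, groups

# Two clearly separated (made-up) attribution regimes: the clustering
# recovers them, suggesting two subpopulations the model treats differently
scores = [0.1, 0.2, 0.15, 2.0, 2.2, 1.9]
centers, groups = kmeans_1d(scores)
```

Once such subgroups are identified, the framework's idea is to tailor training and evaluation per subgroup rather than fitting one model to a heterogeneous population.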


Machine Learning-based Regional Cooling Demand Prediction with Optimised Dataset Partitioning

Zhang, Meng, Li, Zhihui, Yu, Zhibin

arXiv.org Artificial Intelligence

In the context of global warming, even relatively cooler countries like the UK are experiencing a rise in cooling demand, particularly in southern regions such as London. This growing demand, especially during the summer months, presents significant challenges for energy management systems. Accurately predicting cooling demand in urban domestic buildings is essential for maintaining energy efficiency. This study introduces a generalised framework for developing high-resolution Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) networks using physical model-based summer cooling demand data. To maximise the predictive capability and generalisation ability of the models under limited data scenarios, four distinct data partitioning strategies were implemented: extrapolation, month-based interpolation, global interpolation, and day-based interpolation. Bayesian Optimisation (BO) was then applied to fine-tune the hyper-parameters, substantially improving the framework's predictive accuracy. Results show that the day-based interpolation GRU model demonstrated the best performance due to its ability to retain both the data randomness and the time sequence continuity characteristics. This optimal model achieves a Root Mean Squared Error (RMSE) of 2.22%, a Mean Absolute Error (MAE) of 0.87%, and a coefficient of determination (R²) of 0.9386 on the test set. The generalisation ability of this framework was further evaluated by forecasting.
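Two of the partitioning strategies named above differ in a way that is easy to show in code. The sketch below is my own minimal rendering of the idea (the study's actual splits and ratios may differ): extrapolation holds out the last block of days, forcing the model beyond its training period, while day-based interpolation holds out randomly chosen whole days, interleaving train and test in time but keeping each day's sequence intact:

```python
import random

def extrapolation_split(days, test_frac=0.25):
    """Hold out the final block of days: the model must extrapolate in time."""
    n_test = max(1, int(len(days) * test_frac))
    return days[:-n_test], days[-n_test:]

def day_based_interpolation_split(days, test_frac=0.25, seed=0):
    """Hold out randomly chosen whole days: test days are interleaved with
    training days, but each day's time-sequence continuity is preserved."""
    rng = random.Random(seed)
    n_test = max(1, int(len(days) * test_frac))
    test_idx = set(rng.sample(range(len(days)), n_test))
    train = [d for i, d in enumerate(days) if i not in test_idx]
    test = [d for i, d in enumerate(days) if i in test_idx]
    return train, test

# 30 (made-up) summer days of demand profiles, identified by date
summer = [f"2023-07-{d:02d}" for d in range(1, 31)]
train_ex, test_ex = extrapolation_split(summer)
train_db, test_db = day_based_interpolation_split(summer)
```

The abstract's finding that the day-based interpolation GRU performs best is consistent with this structure: random whole-day holdouts expose the model to the full seasonal range while preserving within-day dynamics.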


Exploring Gender Disparities in Automatic Speech Recognition Technology

ElGhazaly, Hend, Mirheidari, Bahman, Moosavi, Nafise Sadat, Christensen, Heidi

arXiv.org Artificial Intelligence

This study investigates factors influencing Automatic Speech Recognition (ASR) systems' fairness and performance across genders, beyond the conventional examination of demographics. Using the LibriSpeech dataset and the Whisper small model, we analyze how performance varies across different gender representations in training data. Our findings suggest a complex interplay between the gender ratio in training data and ASR performance. Optimal fairness occurs at specific gender distributions rather than a simple 50-50 split. Furthermore, our findings suggest that factors like pitch variability can significantly affect ASR accuracy. This research contributes to a deeper understanding of biases in ASR systems, highlighting the importance of carefully curated training data in mitigating gender bias.
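A fairness analysis like the one described needs two ingredients: a per-utterance error metric (word error rate) and a gap statistic across gender subsets. The sketch below is a generic illustration with invented transcripts, not the study's evaluation code:

```python
def wer(ref, hyp):
    """Word error rate: Levenshtein distance over word tokens,
    divided by the reference length."""
    r, h = ref.split(), hyp.split()
    # DP table; first row/column hold the base insertion/deletion costs
    d = [[i + j if i * j == 0 else 0 for j in range(len(h) + 1)]
         for i in range(len(r) + 1)]
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + (r[i - 1] != h[j - 1]))  # sub
    return d[len(r)][len(h)] / len(r)

def gender_gap(samples):
    """Absolute difference between mean female and mean male WER."""
    by = {"F": [], "M": []}
    for gender, ref, hyp in samples:
        by[gender].append(wer(ref, hyp))
    return abs(sum(by["F"]) / len(by["F"]) - sum(by["M"]) / len(by["M"]))

# Toy evaluation set: one substitution error for the male utterance
data = [("F", "open the door", "open the door"),
        ("M", "close the window", "close a window")]
gap = gender_gap(data)
```

Sweeping such a gap statistic over models trained at different gender ratios is, in essence, how one locates the "optimal fairness" distributions the study reports.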