Eskisehir
Hybrid Deep Learning and Signal Processing for Arabic Dialect Recognition in Low-Resource Settings
Al-Shwayyat, Ghazal, Gerek, Omer Nezih
Arabic dialect recognition presents a significant challenge in speech technology due to the linguistic diversity of Arabic and the scarcity of large annotated datasets, particularly for underrepresented dialects. This research investigates hybrid modeling strategies that integrate classical signal processing techniques with deep learning architectures to address this problem in low-resource scenarios. Two hybrid models were developed and evaluated: (1) Mel-Frequency Cepstral Coefficients (MFCC) combined with a Convolutional Neural Network (CNN), and (2) Discrete Wavelet Transform (DWT) features combined with a Recurrent Neural Network (RNN). The models were trained on a dialect-filtered subset of the Common Voice Arabic dataset, with dialect labels assigned based on speaker metadata. Experimental results demonstrate that the MFCC + CNN architecture achieved superior performance, with an accuracy of 91.2% and strong precision, recall, and F1-scores, significantly outperforming the Wavelet + RNN configuration, which achieved an accuracy of 66.5%. These findings highlight the effectiveness of leveraging spectral features with convolutional models for Arabic dialect recognition, especially when working with limited labeled data. The study also identifies limitations related to dataset size, potential regional overlaps in labeling, and model optimization, providing a roadmap for future research. Recommendations for further improvement include the adoption of larger annotated corpora, integration of self-supervised learning techniques, and exploration of advanced neural architectures such as Transformers. Overall, this research establishes a strong baseline for future developments in Arabic dialect recognition within resource-constrained environments.
On the Tunability of Random Survival Forests Model for Predictive Maintenance
Yardımcı, Yigitcan, Cavus, Mustafa
This paper investigates the tunability of the Random Survival Forest (RSF) model in predictive maintenance, where accurate time-to-failure estimation is crucial. Although RSF is widely used due to its flexibility and ability to handle censored data, its performance is sensitive to hyperparameter configurations. However, systematic evaluations of RSF tunability remain limited, especially in predictive maintenance contexts. We introduce a three-level framework to quantify tunability: (1) a model-level metric measuring overall performance gain from tuning, (2) a hyperparameter-level metric assessing individual contributions, and (3) identification of optimal tuning ranges. These metrics are evaluated across multiple datasets using survival-specific criteria: the C-index for discrimination and the Brier score for calibration. Experiments on four CMAPSS dataset subsets, simulating aircraft engine degradation, reveal that hyperparameter tuning consistently improves model performance. On average, the C-index increased by 0.0547, while the Brier score decreased by 0.0199. These gains were consistent across all subsets. Moreover, ntree and mtry showed the highest average tunability, while nodesize offered stable improvements within the range of 10 to 30. In contrast, splitrule demonstrated negative tunability on average, indicating that improper tuning may reduce model performance. Our findings emphasize the practical importance of hyperparameter tuning in survival models and provide actionable insights for optimizing RSF in real-world predictive maintenance applications.
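To make the model-level and hyperparameter-level metrics above concrete, here is a minimal sketch that treats tunability as the gain of the best tuned configuration over a default one. The function names, the default scores, and the per-value scores are hypothetical and invented for illustration; the paper's exact definitions may differ.

```python
def model_tunability(default_score, tuned_scores, higher_is_better=True):
    """Gain of the best tuned configuration over the default configuration.

    A negative value means every tuned configuration was worse than the default.
    """
    if higher_is_better:
        return max(tuned_scores) - default_score
    # For loss-like metrics (e.g. Brier score), a decrease counts as a gain.
    return default_score - min(tuned_scores)


def hyperparameter_tunability(default_score, scores_by_value, higher_is_better=True):
    """Gain from tuning one hyperparameter while the others stay at defaults."""
    return model_tunability(
        default_score, list(scores_by_value.values()), higher_is_better
    )


# Made-up C-index values for three nodesize settings (illustration only).
default_c_index = 0.70
c_index_by_nodesize = {5: 0.71, 15: 0.74, 30: 0.73}
c_index_gain = hyperparameter_tunability(default_c_index, c_index_by_nodesize)

# Made-up Brier scores; lower is better, so the gain is default minus best.
brier_gain = model_tunability(0.20, [0.19, 0.18], higher_is_better=False)

# A hyperparameter whose tuned values all underperform the default
# yields negative tunability under this definition.
negative_gain = hyperparameter_tunability(0.70, {"a": 0.68, "b": 0.69})
```

Under this definition, a hyperparameter such as splitrule can show negative tunability whenever none of the explored values beats the default.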
Predictive Multiplicity in Survival Models: A Method for Quantifying Model Uncertainty in Predictive Maintenance Applications
In many applications, especially those involving prediction, models may yield near-optimal performance yet significantly disagree on individual-level outcomes. This phenomenon, known as predictive multiplicity, has been formally defined in binary, probabilistic, and multi-target classification, and undermines the reliability of predictive systems. However, its implications remain unexplored in the context of survival analysis, which involves estimating the time until a failure or similar event while properly handling censored data. We frame predictive multiplicity as a critical concern in survival-based models and introduce formal measures -- ambiguity, discrepancy, and obscurity -- to quantify it. This is particularly relevant for downstream tasks such as maintenance scheduling, where precise individual risk estimates are essential. Understanding and reporting predictive multiplicity helps build trust in models deployed in high-stakes environments. We apply our methodology to benchmark datasets from predictive maintenance, extending the notion of multiplicity to survival models. Our findings show that ambiguity steadily increases, reaching up to 40-45% of observations; discrepancy is lower but exhibits a similar trend; and obscurity remains mild and concentrated in a few models. These results demonstrate that multiple accurate survival models may yield conflicting estimations of failure risk and degradation progression for the same equipment. This highlights the need to explicitly measure and communicate predictive multiplicity to ensure reliable decision-making in process health management.
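The sketch below shows one way ambiguity and discrepancy might be computed for risk predictions from a set of near-optimal survival models. It adapts the usual classification definitions by flagging a conflict when two predicted risks differ by at least a threshold `delta`. The threshold, the function names, and the numbers are assumptions for illustration, not the paper's exact formulation, and obscurity is omitted.

```python
def conflicts(baseline, other, delta):
    """Flag observations whose predicted risk differs from the baseline by >= delta."""
    return [abs(b - o) >= delta for b, o in zip(baseline, other)]


def ambiguity(baseline, competitors, delta):
    """Share of observations that conflict with the baseline under ANY competing model."""
    flagged = [False] * len(baseline)
    for model in competitors:
        for i, is_conflict in enumerate(conflicts(baseline, model, delta)):
            flagged[i] = flagged[i] or is_conflict
    return sum(flagged) / len(baseline)


def discrepancy(baseline, competitors, delta):
    """Largest share of conflicting observations produced by a SINGLE competing model."""
    n = len(baseline)
    return max(sum(conflicts(baseline, m, delta)) / n for m in competitors)


# Hypothetical predicted failure risks for four units (illustration only).
baseline_risk = [0.10, 0.40, 0.80, 0.55]
competing_risks = [
    [0.12, 0.65, 0.78, 0.50],  # disagrees with the baseline on unit 1
    [0.35, 0.42, 0.79, 0.52],  # disagrees with the baseline on unit 0
]

amb = ambiguity(baseline_risk, competing_risks, delta=0.2)
disc = discrepancy(baseline_risk, competing_risks, delta=0.2)
```

Because ambiguity pools conflicts over all competing models while discrepancy takes the worst single model, ambiguity is never smaller than discrepancy in this formulation.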
The Role of Hyperparameters in Predictive Multiplicity
Cavus, Mustafa, Woźnica, Katarzyna, Biecek, Przemysław
This paper investigates the critical role of hyperparameters in predictive multiplicity, where different machine learning models trained on the same dataset yield divergent predictions for identical inputs. These inconsistencies can seriously impact high-stakes decisions such as credit assessments, hiring, and medical diagnoses. Focusing on six widely used models for tabular data - Elastic Net, Decision Tree, k-Nearest Neighbor, Support Vector Machine, Random Forests, and Extreme Gradient Boosting - we explore how hyperparameter tuning influences predictive multiplicity, as expressed by the distribution of prediction discrepancies across benchmark datasets. Key hyperparameters such as lambda in Elastic Net, gamma in Support Vector Machines, and alpha in Extreme Gradient Boosting play a crucial role in shaping predictive multiplicity, often compromising the stability of predictions within specific algorithms. Our experiments on 21 benchmark datasets reveal that tuning these hyperparameters leads to notable performance improvements but also increases prediction discrepancies, with Extreme Gradient Boosting exhibiting the highest discrepancy and substantial prediction instability. This highlights the trade-off between performance optimization and prediction consistency, raising concerns about the risk of arbitrary predictions. These findings provide insight into how hyperparameter optimization leads to predictive multiplicity. While predictive multiplicity allows prioritizing domain-specific objectives such as fairness and reduces reliance on a single model, it also complicates decision-making, potentially leading to arbitrary or unjustified outcomes.
Rashomon perspective for measuring uncertainty in the survival predictive maintenance models
Yardimci, Yigitcan, Cavus, Mustafa
The prediction of the Remaining Useful Life of aircraft engines is a critical area in high-reliability sectors such as aerospace and defense. Early failure predictions help ensure operational continuity, reduce maintenance costs, and prevent unexpected failures. Traditional regression models struggle with censored data, which can lead to biased predictions. Survival models, on the other hand, effectively handle censored data, improving predictive accuracy in maintenance processes. This paper introduces a novel approach based on the Rashomon perspective, which considers multiple models that achieve similar performance rather than relying on a single best model. This enables uncertainty quantification in survival probability predictions and enhances decision-making in predictive maintenance. The Rashomon survival curve was introduced to represent the range of survival probability estimates, providing insights into model agreement and uncertainty over time. The results on the CMAPSS dataset demonstrate that relying solely on a single model for RUL estimation may increase risk in some scenarios. The censoring levels significantly impact prediction uncertainty, with longer censoring times leading to greater variability in survival probabilities. These findings underscore the importance of incorporating model multiplicity in predictive maintenance frameworks to achieve more reliable and robust failure predictions. This paper contributes to uncertainty quantification in RUL prediction and highlights the Rashomon perspective as a powerful tool for predictive modeling.
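As a rough illustration of the Rashomon survival curve idea, the sketch below selects models whose score lies within a tolerance `epsilon` of the best one, then takes the pointwise minimum and maximum of their survival probabilities. The model names, scores, curves, and the value of `epsilon` are hypothetical; the paper's construction may differ in detail.

```python
def rashomon_set(scores, epsilon):
    """Names of models whose score is within epsilon of the best score."""
    best = max(scores.values())
    return [name for name, score in scores.items() if score >= best - epsilon]


def survival_envelope(curves):
    """Pointwise lower and upper bounds over survival curves on a shared time grid."""
    lower = [min(values) for values in zip(*curves)]
    upper = [max(values) for values in zip(*curves)]
    return lower, upper


# Hypothetical C-index per model and survival probabilities at four time points.
scores = {"rsf": 0.80, "cox": 0.78, "gbm": 0.79, "tree": 0.65}
curves = {
    "rsf": [1.0, 0.90, 0.70, 0.40],
    "cox": [1.0, 0.85, 0.60, 0.35],
    "gbm": [1.0, 0.92, 0.75, 0.50],
    "tree": [1.0, 0.60, 0.30, 0.10],
}

members = rashomon_set(scores, epsilon=0.05)
lower, upper = survival_envelope([curves[name] for name in members])

# The envelope width at each time point is a simple proxy for model disagreement.
width = [u - l for l, u in zip(lower, upper)]
```

In this toy example the envelope widens at later time points, which mirrors the abstract's observation that uncertainty in survival probabilities can grow over time.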
Decoding Drug Discovery: Exploring A-to-Z In silico Methods for Beginners
Rasul, Hezha O., Ghafour, Dlzar D., Aziz, Bakhtyar K., Hassan, Bryar A., Rashid, Tarik A., Kivrak, Arif
The drug development process is a critical challenge in the pharmaceutical industry due to its time-consuming nature and the need to discover new drug potentials to address various ailments. The initial step in drug development, drug target identification, often consumes considerable time. While valid, traditional methods such as in vivo and in vitro approaches are limited in their ability to analyze vast amounts of data efficiently, leading to wasteful outcomes. To expedite and streamline drug development, an increasing reliance on computer-aided drug design (CADD) approaches has emerged. These sophisticated in silico methods offer a promising avenue for efficiently identifying viable drug candidates, thus providing pharmaceutical firms with significant opportunities to uncover new prospective drug targets. The main goal of this work is to review in silico methods used in the drug development process with a focus on identifying therapeutic targets linked to specific diseases at the genetic or protein level. This article thoroughly discusses A-to-Z in silico techniques, which are essential for identifying the targets of bioactive compounds and their potential therapeutic effects. This review intends to improve drug discovery processes by illuminating the state of these cutting-edge approaches, thereby maximizing the effectiveness and duration of clinical trials for novel drug target investigation.
datadriftR: An R Package for Concept Drift Detection in Predictive Models
Predictive models often face performance degradation due to evolving data distributions, a phenomenon known as data drift. Among its forms, concept drift, where the relationship between explanatory variables and the response variable changes, is particularly challenging to detect and adapt to. Traditional drift detection methods often rely on metrics such as accuracy or variable distributions, which may fail to capture subtle but significant conceptual changes. This paper introduces drifter, an R package designed to detect concept drift, and proposes a novel method called Profile Drift Detection (PDD) that enables both drift detection and an enhanced understanding of the cause behind the drift by leveraging an explainable AI tool - Partial Dependence Profiles (PDPs). The PDD method, central to the package, quantifies changes in PDPs through novel metrics, ensuring sensitivity to shifts in the data stream without excessive computational costs. This approach aligns with MLOps practices, emphasizing model monitoring and adaptive retraining in dynamic environments. The experiments across synthetic and real-world datasets demonstrate that PDD outperforms existing methods by maintaining high accuracy while effectively balancing sensitivity and stability. The results highlight its capability to adaptively retrain models in dynamic environments, making it a robust tool for real-time applications. The paper concludes by discussing the advantages, limitations, and future extensions of the package for broader use cases.
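The idea of comparing Partial Dependence Profiles can be sketched as follows: compute the profile of one variable for an old and a new model, then measure how far apart the two profiles are. This is not the package's API. The function names, the toy models, and the root-mean-square distance are assumptions chosen for illustration, and the PDD metrics in the paper may be defined differently.

```python
import math


def partial_dependence(predict, rows, feature, grid):
    """Average prediction when `feature` is forced to each grid value in every row."""
    profile = []
    for value in grid:
        predictions = [predict({**row, feature: value}) for row in rows]
        profile.append(sum(predictions) / len(predictions))
    return profile


def profile_distance(profile_a, profile_b):
    """Root-mean-square difference between two profiles on the same grid."""
    squared = [(a - b) ** 2 for a, b in zip(profile_a, profile_b)]
    return math.sqrt(sum(squared) / len(squared))


# Toy models: the effect of x flips sign between the old and the new concept.
def old_model(row):
    return 2.0 * row["x"] + row["z"]


def new_model(row):
    return -1.0 * row["x"] + row["z"]


rows = [{"x": 0.0, "z": 1.0}, {"x": 1.0, "z": 3.0}]
grid = [0.0, 1.0, 2.0]

old_profile = partial_dependence(old_model, rows, "x", grid)
new_profile = partial_dependence(new_model, rows, "x", grid)
drift_score = profile_distance(old_profile, new_profile)
```

A monitoring loop could compare `drift_score` against a chosen threshold and trigger retraining when it is exceeded; the threshold itself would need to be calibrated per application.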
Investigating the Impact of Balancing, Filtering, and Complexity on Predictive Multiplicity: A Data-Centric Perspective
Cavus, Mustafa, Biecek, Przemyslaw
The Rashomon effect presents a significant challenge in model selection. It occurs when multiple models achieve similar performance on a dataset but produce different predictions, resulting in predictive multiplicity. This is especially problematic in high-stakes environments, where arbitrary model outcomes can have serious consequences. Traditional model selection methods prioritize accuracy and fail to address this issue. Factors such as class imbalance and irrelevant variables further complicate the situation, making it harder for models to provide trustworthy predictions. Data-centric AI approaches can mitigate these problems by prioritizing data optimization, particularly through preprocessing techniques. However, recent studies suggest preprocessing methods may inadvertently inflate predictive multiplicity. This paper investigates how data preprocessing techniques like balancing and filtering methods impact predictive multiplicity and model stability, considering the complexity of the data. We conduct the experiments on 21 real-world datasets, applying various balancing and filtering techniques, and assess the level of predictive multiplicity introduced by these methods by leveraging the Rashomon effect. Additionally, we examine how filtering techniques reduce redundancy and enhance model generalization. The findings provide insights into the relationship between balancing methods, data complexity, and predictive multiplicity, demonstrating how data-centric AI strategies can improve model performance.
Rashomon effect in Educational Research: Why More is Better Than One for Measuring the Importance of the Variables?
Kuzilek, Jakub, Çavuş, Mustafa
This study explores how the Rashomon effect influences variable importance in the context of student demographics used for academic outcomes prediction. Our research follows the way machine learning algorithms are employed in Educational Data Mining, focusing on highlighting the so-called Rashomon effect. The study uses the Rashomon set of simple-yet-accurate models trained using decision trees, random forests, LightGBM, and XGBoost algorithms with the Open University Learning Analytics Dataset. We found that the Rashomon set improves the predictive accuracy by 2-6%. Variable importance analysis revealed more consistent and reliable results for binary classification than multiclass classification, highlighting the complexity of predicting multiple outcomes. Key demographic variables imd_band and highest_education were identified as vital, but their importance varied across courses, especially in course DDD. These findings underscore the importance of model choice and the need for caution in generalizing results, as different models can lead to different variable importance rankings. The code for reproducing the experiments is available in the repository: https://anonymous.4open.science/r/JEDM_paper-DE9D.
DualCast: Disentangling Aperiodic Events from Traffic Series with a Dual-Branch Model
Su, Xinyu, Liu, Feng, Chang, Yanchuan, Tanin, Egemen, Sarvi, Majid, Qi, Jianzhong
Traffic forecasting is an important problem in the operation and optimisation of transportation systems. State-of-the-art solutions train machine learning models by minimising the mean forecasting errors on the training data. The trained models often favour periodic events instead of aperiodic ones in their prediction results, as periodic events often prevail in the training data. While offering critical optimisation opportunities, aperiodic events such as traffic incidents may be missed by the existing models. To address this issue, we propose DualCast -- a model framework to enhance the learning capability of traffic forecasting models, especially for aperiodic events. DualCast takes a dual-branch architecture to disentangle traffic signals into two types, one reflecting intrinsic spatial-temporal patterns and the other reflecting external environment contexts including aperiodic events. We further propose a cross-time attention mechanism to capture high-order spatial-temporal relationships from both periodic and aperiodic patterns. DualCast is versatile. We integrate it with recent traffic forecasting models, consistently reducing their forecasting errors by up to 9.6% on multiple real datasets.