Goto

Collaborating Authors

 validation scheme


A Set of Rules for Model Validation

arXiv.org Machine Learning

The validation of a data-driven model is the process of asses sing the model's ability to generalize to new, unseen data in the population o f interest. This paper proposes a set of general rules for model validation. T hese rules are designed to help practitioners create reliable validation plans and report their results transparently. While no validation scheme is flawle ss, these rules can help practitioners ensure their strategy is sufficient for pr actical use, openly discuss any limitations of their validation strategy, and r eport clear, comparable performance metrics. Keywords: Validation, Cross-validation 1. Introduction Model validation is a fundamental task in all modern data-dr iven systems, whether they fall under the broad categories of Statistics, Machine Learning (ML), Artificial Intelligence (AI), or more specialized fiel ds like chemometrics. Validation has become a major focus for regulatory and stand ardization bodies, with key reports and standards highlighting the growing con cern for ensuring the trustworthiness and reliability of data-driven models: NIST AI Risk Management Framework (AI RMF 1.0, 2023): Publi shed by the U.S. Department of Commerce, this framework provides management techniques to address the risks and ensure the trustwor thiness of AI systems, with validation as a core component. The EU AI Act of 2024, landmark piece of EU legislation that c ategorizes AI systems by risk level, where validation is not defined as a b est practice but a legal requirement within the conformity assessment. The ISO/IEC TS 4213:2022, by the International Organizati on for Standardization (ISO), describes approaches and methods to ens ure the rele-Email address: josecamacho@ugr.es The IEEE P2841 -2022 is a recommended practice for the fram ework and process for deep learning evaluation.


All Thresholds Barred: Direct Estimation of Call Density in Bioacoustic Data

arXiv.org Artificial Intelligence

Passive acoustic monitoring (PAM) studies generate thousands of hours of audio, which may be used to monitor specific animal populations, conduct broad biodiversity surveys, detect threats such as poachers, and more. Machine learning classifiers for species identification are increasingly being used to process the vast amount of audio generated by bioacoustic surveys, expediting analysis and increasing the utility of PAM as a management tool. In common practice, a threshold is applied to classifier output scores, and scores above the threshold are aggregated into a detection count. The choice of threshold produces biased counts of vocalizations, which are subject to false positive/negative rates that may vary across subsets of the dataset. In this work, we advocate for directly estimating call density: The proportion of detection windows containing the target vocalization, regardless of classifier score. Our approach targets a desirable ecological estimator and provides a more rigorous grounding for identifying the core problems caused by distribution shifts -- when the defining characteristics of the data distribution change -- and designing strategies to mitigate them. We propose a validation scheme for estimating call density in a body of data and obtain, through Bayesian reasoning, probability distributions of confidence scores for both the positive and negative classes. We use these distributions to predict site-level densities, which may be subject to distribution shifts. We test our proposed methods on a real-world study of Hawaiian birds and provide simulation results leveraging existing fully annotated datasets, demonstrating robustness to variations in call density and classifier model quality.


A Novel Approach for Machine Learning-based Load Balancing in High-speed Train System using Nested Cross Validation

arXiv.org Artificial Intelligence

Fifth-generation (5G) mobile communication networks have recently emerged in various fields, including highspeed trains. However, the dense deployment of 5G millimeter wave (mmWave) base stations (BSs) and the high speed of moving trains lead to frequent handovers (HOs), which can adversely affect the Quality-of-Service (QoS) of mobile users. As a result, HO optimization and resource allocation are essential considerations for managing mobility in high-speed train systems. In this paper, we model system performance of a high-speed train system with a novel machine learning (ML) approach that is nested cross validation scheme that prevents information leakage from model evaluation into the model parameter tuning, thereby avoiding overfitting and resulting in better generalization error. To this end, we employ ML methods for the high-speed train system scenario. Handover Margin (HOM) and Time-to-Trigger (TTT) values are used as features, and several KPIs are used as outputs, and several ML methods including Gradient Boosting Regression (GBR), Adaptive Boosting (AdaBoost), CatBoost Regression (CBR), Artificial Neural Network (ANN), Kernel Ridge Regression (KRR), Support Vector Regression (SVR), and k-Nearest Neighbor Regression (KNNR) are employed for the problem. Finally, performance comparisons of the cross validation schemes with the methods are made in terms of mean absolute error (MAE) and mean square error (MSE) metrics are made. As per obtained results, boosting methods, ABR, CBR, GBR, with nested cross validation scheme superiorly outperforms conventional cross validation scheme results with the same methods. On the other hand, SVR, KNRR, KRR, ANN with the nested scheme produce promising results for prediction of some KPIs with respect to their conventional scheme employment.


A Robust Machine Learning Approach for Path Loss Prediction in 5G Networks with Nested Cross Validation

arXiv.org Artificial Intelligence

The design and deployment of fifth-generation (5G) wireless networks pose significant challenges due to the increasing number of wireless devices. Path loss has a landmark importance in network performance optimization, and accurate prediction of the path loss, which characterizes the attenuation of signal power during transmission, is critical for effective network planning, coverage estimation, and optimization. In this sense, we utilize machine learning (ML) methods, which overcome conventional path loss prediction models drawbacks, for path loss prediction in a 5G network system to facilitate more accurate network planning, resource optimization, and performance improvement in wireless communication systems. To this end, we utilize a novel approach, nested cross validation scheme, with ML to prevent overfitting, thereby getting better generalization error and stable results for ML deployment. First, we acquire a publicly available dataset obtained through a comprehensive measurement campaign conducted in an urban macro-cell scenario located in Beijing, China. The dataset includes crucial information such as longitude, latitude, elevation, altitude, clutter height, and distance, which are utilized as essential features to predict the path loss in the 5G network system. We deploy Support Vector Regression (SVR), CatBoost Regression (CBR), eXtreme Gradient Boosting Regression (XGBR), Artificial Neural Network (ANN), and Random Forest (RF) methods to predict the path loss, and compare the prediction results in terms of Mean Absolute Error (MAE) and Mean Square Error (MSE). As per obtained results, XGBR outperforms the rest of the methods. It outperforms CBR with a slight performance differences by 0.4 % and 1 % in terms of MAE and MSE metrics, respectively. On the other hand, it outperforms the rest of the methods with clear performance differences.


Efficient implementations of echo state network cross-validation

arXiv.org Machine Learning

Background/introduction: Cross-validation is still uncommon in time series modeling. Echo State Networks (ESNs), as a prime example of Reservoir Computing (RC) models, are known for their fast and precise one-shot learning, that often benefit from good hyper-parameter tuning. This makes them ideal to change the status quo. Methods: We suggest several schemes for cross-validating ESNs and introduce an efficient algorithm for implementing them. This algorithm is presented as two levels of optimizations of doing $k$-fold cross-validation. Training an RC model typically consists of two stages: (i) running the reservoir with the data and (ii) computing the optimal readouts. The first level of our proposed optimization addresses the most computationally expensive part (i) and makes it remain constant irrespective of $k$. It dramatically reduces reservoir computations in any type of RC system and is enough if $k$ is small. The second level of optimization also makes the (ii) part remain constant irrespective of large $k$, as long as the dimension of the output is low. We discuss when the proposed validation schemes for ESNs could be beneficial, three options for producing the final model and empirically investigate them on six different real-world datasets, as well as do empirical computation time experiments. We provide the code in an online repository. Results: Proposed cross-validation schemes give better and more stable test performance in all the six different real-world datasets, three task types. Empirical run times confirm our complexity analysis. Conclusions: In most situations $k$-fold cross-validation of ESNs and many other RC models can be done for virtually the same time complexity as a simple single-split validation. Space complexity can also remain the same in all the cases. This enables cross-validation to become a standard practice in reservoir computing.


How time can ruin your most precious machine learning model

#artificialintelligence

Once, potential time dependent effects have been identified you should check if these effects are present in the data and if your data actually covers the necessary time spans to detect it. If the data depends on time there are three basic options to handle it. What experience regarding time have you made in your machine learning projects? Would you like to read a story about the various modelling & feature engineering techniques and validation schemes to include these time effects?


Efficient Cross-Validation of Echo State Networks

arXiv.org Machine Learning

Echo State Networks (ESNs) are known for their fast and precise one-shot learning of time series. But they often need good hyper-parameter tuning for best performance. For this good validation is key, but usually, a single validation split is used. In this rather practical contribution we suggest several schemes for cross-validating ESNs and introduce an efficient algorithm for implementing them. The component that dominates the time complexity of the already quite fast ESN training remains constant (does not scale up with $k$) in our proposed method of doing $k$-fold cross-validation. The component that does scale linearly with $k$ starts dominating only in some not very common situations. Thus in many situations $k$-fold cross-validation of ESNs can be done for virtually the same time complexity as a simple single split validation. Space complexity can also remain the same. We also discuss when the proposed validation schemes for ESNs could be beneficial and empirically investigate them on several different real-world datasets.


An Automated Text Categorization Framework based on Hyperparameter Optimization

arXiv.org Artificial Intelligence

A great variety of text tasks such as topic or spam identification, user profiling, and sentiment analysis can be posed as a supervised learning problem and tackle using a text classifier. A text classifier consists of several subprocesses, some of them are general enough to be applied to any supervised learning problem, whereas others are specifically designed to tackle a particular task, using complex and computational expensive processes such as lemmatization, syntactic analysis, etc. Contrary to traditional approaches, we propose a minimalistic and wide system able to tackle text classification tasks independent of domain and language, namely microTC. It is composed by some easy to implement text transformations, text representations, and a supervised learning algorithm. These pieces produce a competitive classifier even in the domain of informally written text. We provide a detailed description of microTC along with an extensive experimental comparison with relevant state-of-the-art methods. mircoTC was compared on 30 different datasets. Regarding accuracy, microTC obtained the best performance in 20 datasets while achieves competitive results in the remaining 10. The compared datasets include several problems like topic and polarity classification, spam detection, user profiling and authorship attribution. Furthermore, it is important to state that our approach allows the usage of the technology even without knowledge of machine learning and natural language processing.