AITopics

2504.1286

Country: Europe > Netherlands (0.04)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

arXiv.org Machine LearningApr-16-2025

Can Moran Eigenvectors Improve Machine Learning of Spatial Data? Insights from Synthetic Data Validation

Li, Ziqi, Peng, Zhan

Moran Eigenvector Spatial Filtering (ESF) approaches have shown promise in accounting for spatial effects in statistical models. Can this extend to machine learning? This paper examines the effectiveness of using Moran Eigenvectors as additional spatial features in machine learning models. We generate synthetic datasets with known processes involving spatially varying and nonlinear effects across two different geometries. Moran Eigenvectors calculated from different spatial weights matrices, with and without a priori eigenvector selection, are tested. We assess the performance of popular machine learning models, including Random Forests, LightGBM, XGBoost, and TabNet, and benchmark their accuracies in terms of cross-validated R2 values against models that use only coordinates as features. We also extract coefficients and functions from the models using GeoShapley and compare them with the true processes. Results show that machine learning models using only location coordinates achieve better accuracies than eigenvector-based approaches across various experiments and datasets. Furthermore, we discuss that while these findings are relevant for spatial processes that exhibit positive spatial autocorrelation, they do not necessarily apply when modeling network autocorrelation and cases with negative spatial autocorrelation, where Moran Eigenvectors would still be useful.

artificial intelligence, machine learning, spatial reasoning, (16 more...)

2504.1245

Country:

North America > United States > Florida > Hillsborough County > University (0.04)
Asia > China (0.04)
North America > United States > Virginia (0.04)
(6 more...)

Genre: Research Report > New Finding (0.88)

Industry: Banking & Finance > Real Estate (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

arXiv.org Artificial IntelligenceApr-16-2025

Progressive Rock Music Classification

Nagar, Arpan, Bensabat, Joseph, Gaza, Jokent, Dey, Moinak

This study investigates the classification of progressive rock music, a genre characterized by complex compositions and diverse instrumentation, distinct from other musical styles. Addressing this Music Information Retrieval (MIR) task, we extracted comprehensive audio features, including spectrograms, Mel-Frequency Cepstral Coefficients (MFCCs), chromagrams, and beat positions from song snippets using the Librosa library. A winner-take-all voting strategy was employed to aggregate snippet-level predictions into final song classifications. We conducted a comparative analysis of various machine learning techniques. Ensemble methods, encompassing Bagging (Random Forest, ExtraTrees, Bagging Classifier) and Boosting (XGBoost, Gradient Boosting), were explored, utilizing Principal Component Analysis (PCA) for dimensionality reduction to manage computational constraints with high-dimensional feature sets. Additionally, deep learning approaches were investigated, including the development of custom 1D Convolutional Neural Network (1D CNN) architectures (named "Zuck" and "Satya") featuring specific layer configurations, normalization, and activation functions. Furthermore, we fine-tuned a state-of-the-art Audio Spectrogram Transformer (AST) model, leveraging its attention-based mechanisms for audio classification. Performance evaluation on validation and test sets revealed varying effectiveness across models, with ensemble methods like Extra Trees achieving test accuracies up to 76.38%. This research provides insights into the application and relative performance of diverse machine learning paradigms for the nuanced task of progressive rock genre classification.

artificial intelligence, machine learning, validation, (14 more...)

2504.10821

Country: Asia (0.28)

Genre: Research Report (1.00)

Industry:

Media > Music (1.00)
Leisure & Entertainment (1.00)
Education > Curriculum > Subject-Specific Education (0.41)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)

Ooi, Guang An, Ahmed, Shehab

Adaptive Cluster-Based Synthetic Minority Oversampling Technique for Traffic Mode Choice Prediction with Imbalanced Dataset

arXiv.org Artificial IntelligenceApr-15-2025

Urban datasets such as citizen transportation modes often contain disproportionately distributed classes, posing significant challenges to the classification of under-represented samples using data-driven models. In the literature, various resampling methods have been developed to create synthetic data for minority classes (oversampling) or remove samples from majority classes (undersampling) to alleviate class imbalance. However, oversampling approaches tend to overgeneralize minor classes that are closely clustered and neglect sparse regions which may contain crucial information. Conversely, undersampling methods potentially remove useful information on certain subgroups. Hence, a resampling approach that takes the inherent distribution of data into consideration is required to ensure appropriate synthetic data creation. This study proposes an adaptive cluster-based synthetic minority oversampling technique. Density-based spatial clustering is applied on minority classes to identify subgroups based on their input features. The classes in each of these subgroups are then oversampled according to the ratio of data points of their local cluster to the largest majority class. When used in conjunction with machine learning models such as random forest and extreme gradient boosting, this oversampling method results in significantly higher F1 scores for the minority classes compared to other resampling techniques. These improved models provide accurate classification of transportation modes.

artificial intelligence, dataset, machine learning, (16 more...)

2504.09486

Country: Asia > Middle East > Saudi Arabia (0.14)

Genre: Research Report (1.00)

Industry: Transportation (0.70)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.69)

arXiv.org Artificial IntelligenceApr-15-2025

Combining Forecasts using Meta-Learning: A Comparative Study for Complex Seasonality

Dudek, Grzegorz

Abstract--In this paper, we investigate meta-learning for combining forecasts generated by models of different types . While typical approaches for combining forecasts involve s imple averaging, machine learning techniques enable more sophis ti-cated methods of combining through meta-learning, leading to improved forecasting accuracy. We use linear regression, k - nearest neighbors, multilayer perceptron, random forest, and long short-term memory as meta-learners. We define global and local meta-learning variants for time series with compl ex seasonality and compare meta-learners on multiple forecas ting problems, demonstrating their superior performance compa red to simple averaging. Ensemble methods are widely recognized as a cornerstone of modern machine learning (ML) [1], commonly used for regression and classification problems. In addition, ensem bling has proven to be a highly effective approach for increasing the predictive power of forecasting models. The ensemble approach in forecasting, which involves combining the predictions of multiple models, can be justified for several reasons. First of all, it usually leads to increased accurac y. Ensemble models often outperform individual models, as the y leverage the strengths of different models and minimize the ir weaknesses. By combining diverse models, the ensemble can produce more accurate predictions by capturing a broader range of patterns and insights from the data. Ensembling als o allows for the incorporation of multiple drivers into the da ta generating process, mitigating uncertainties regarding m odel form and parameter specification [2].

artificial intelligence, deep learning, machine learning, (17 more...)

doi: 10.1109/DSAA60987.2023.10302585

2504.0894

Genre: Research Report > New Finding (1.00)

Industry: Energy > Power Industry (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)

Zhang, Yixin, Chen, Yisong

The Role of Machine Learning in Reducing Healthcare Costs: The Impact of Medication Adherence and Preventive Care on Hospitalization Expenses

arXiv.org Artificial IntelligenceApr-11-2025

This study reveals the important role of prevention care and medication adherence in reducing hospitalizations. By using a structured dataset of 1,171 patients, four machine learning models Logistic Regression, Gradient Boosting, Random Forest, and Artificial Neural Networks are applied to predict five-year hospitalization risk, with the Gradient Boosting model achieving the highest accuracy of 81.2%. The result demonstrated that patients with high medication adherence and consistent preventive care can reduce 38.3% and 37.7% in hospitalization risk. The finding also suggests that targeted preventive care can have positive Return on Investment (ROI), and therefore ML models can effectively direct personalized interventions and contribute to long-term medical savings.

artificial intelligence, machine learning, preventive care, (15 more...)

2504.07422

Genre: Research Report > New Finding (0.91)

Industry:

Health & Medicine > Consumer Health (0.95)
Health & Medicine > Government Relations & Public Policy (0.69)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.37)

de Melo, Eduardo Fraga L., Graziadei, Helton, Targino, Rodrigo

Actuarial Learning for Pension Fund Mortality Forecasting

arXiv.org Machine LearningApr-8-2025

For the assessment of the financial soundness of a pension fund, it is necessary to take into account mortality forecasting so that longevity risk is consistently incorporated into future cash flows. In this article, we employ machine learning models applied to actuarial science ({\it actuarial learning}) to make mortality predictions for a relevant sample of pension funds' participants. Actuarial learning represents an emerging field that involves the application of machine learning (ML) and artificial intelligence (AI) techniques in actuarial science. This encompasses the use of algorithms and computational models to analyze large sets of actuarial data, such as regression trees, random forest, boosting, XGBoost, CatBoost, and neural networks (eg. FNN, LSTM, and MHA). Our results indicate that some ML/AI algorithms present competitive out-of-sample performance when compared to the classical Lee-Carter model. This may indicate interesting alternatives for consistent liability evaluation and effective pension fund risk management.

artificial intelligence, deep learning, machine learning, (19 more...)

2504.05881

Country:

South America > Brazil > Rio de Janeiro > Rio de Janeiro (0.04)
North America > United States (0.04)
Europe > United Kingdom > Wales (0.04)
Europe > United Kingdom > England (0.04)

Genre: Research Report > New Finding (1.00)

Industry:

Consumer Products & Services > Retirement (1.00)
Banking & Finance > Insurance (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)

Souza, J. V. S., Vieira, C. B., Cavalcanti, G. D. C., Cruz, R. M. O.

Imbalanced malware classification: an approach based on dynamic classifier selection

arXiv.org Artificial IntelligenceApr-5-2025

In recent years, the rise of cyber threats has emphasized the need for robust malware detection systems, especially on mobile devices. Malware, which targets vulnerabilities in devices and user data, represents a substantial security risk. A significant challenge in malware detection is the imbalance in datasets, where most applications are benign, with only a small fraction posing a threat. This study addresses the often-overlooked issue of class imbalance in malware detection by evaluating various machine learning strategies for detecting malware in Android applications. We assess monolithic classifiers and ensemble methods, focusing on dynamic selection algorithms, which have shown superior performance compared to traditional approaches. In contrast to balancing strategies performed on the whole dataset, we propose a balancing procedure that works individually for each classifier in the pool. Our empirical analysis demonstrates that the KNOP algorithm obtained the best results using a pool of Random Forest. Additionally, an instance hardness assessment revealed that balancing reduces the difficulty of the minority class and enhances the detection of the minority class (malware). The code used for the experiments is available at https://github.com/jvss2/Machine-Learning-Empirical-Evaluation.

artificial intelligence, classifier, machine learning, (13 more...)

2504.00041

Country:

South America > Brazil > Pernambuco > Recife (0.04)
North America > Canada > Quebec > Montreal (0.04)
Europe > Netherlands > South Holland > Dordrecht (0.04)

Genre: Research Report (0.40)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.49)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.36)

arXiv.org Artificial IntelligenceMar-28-2025

Machine Learning Models for Soil Parameter Prediction Based on Satellite, Weather, Clay and Yield Data

Kammerlander, Calvin, Kolb, Viola, Luegmair, Marinus, Scheermann, Lou, Schmailzl, Maximilian, Seufert, Marco, Zhang, Jiayun, Dalic, Denis, Schön, Torsten

Efficient nutrient management and precise fertilization are essential for advancing modern agriculture, particularly in regions striving to optimize crop yields sustainably. The AgroLens project endeavors to address this challenge by develop ing Machine Learning (ML)-based methodologies to predict soil nutrient levels without reliance on laboratory tests. By leveraging state of the art techniques, the project lays a foundation for acionable insights to improve agricultural productivity in resource-constrained areas, such as Africa. The approach begins with the development of a robust European model using the LUCAS Soil dataset and Sentinel-2 satellite imagery to estimate key soil properties, including phosphorus, potassium, nitrogen, and pH levels. This model is then enhanced by integrating supplementary features, such as weather data, harvest rates, and Clay AI-generated embeddings. This report details the methodological framework, data preprocessing strategies, and ML pipelines employed in this project. Advanced algorithms, including Random Forests, Extreme Gradient Boosting (XGBoost), and Fully Connected Neural Networks (FCNN), were implemented and finetuned for precise nutrient prediction. Results showcase robust model performance, with root mean square error values meeting stringent accuracy thresholds. By establishing a reproducible and scalable pipeline for soil nutrient prediction, this research paves the way for transformative agricultural applications, including precision fertilization and improved resource allocation in underresourced regions like Africa.

artificial intelligence, deep learning, machine learning, (16 more...)

2503.22276

Country:

North America > United States (0.46)
Africa (0.45)
Europe > Germany > Bavaria > Upper Bavaria > Ingolstadt (0.04)
(4 more...)

Genre: Research Report (1.00)

Industry:

Food & Agriculture > Agriculture (1.00)
Energy > Renewable > Geothermal > Geothermal Energy Exploration and Development > Geophysical Analysis & Survey (0.34)
Education > Health & Safety > School Nutrition (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)

Lee, HyeYoung, Tsoi, Pavel

Feature-Enhanced Machine Learning for All-Cause Mortality Prediction in Healthcare Data

arXiv.org Machine LearningMar-27-2025

Accurate patient mortality prediction enables effective risk stratification, leading to personalized treatment plans and improved patient outcomes. However, predicting mortality in healthcare remains a significant challenge, with existing studies often focusing on specific diseases or limited predictor sets. This study evaluates machine learning models for all-cause in-hospital mortality prediction using the MIMIC-III database, employing a comprehensive feature engineering approach. Guided by clinical expertise and literature, we extracted key features such as vital signs (e.g., heart rate, blood pressure), laboratory results (e.g., creatinine, glucose), and demographic information. The Random Forest model achieved the highest performance with an AUC of 0.94, significantly outperforming other machine learning and deep learning approaches. This demonstrates Random Forest's robustness in handling high-dimensional, noisy clinical data and its potential for developing effective clinical decision support tools. Our findings highlight the importance of careful feature engineering for accurate mortality prediction. We conclude by discussing implications for clinical adoption and propose future directions, including enhancing model robustness and tailoring prediction models for specific diseases.

artificial intelligence, machine learning, prediction, (14 more...)

2503.21241

Country:

North America > United States > Massachusetts > Suffolk County > Boston (0.14)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Asia > Middle East > Israel (0.04)

Genre:

Research Report > New Finding (0.88)
Research Report > Experimental Study (0.66)

Industry:

Health & Medicine > Health Care Providers & Services (1.00)
Health & Medicine > Diagnostic Medicine > Vital Signs (0.55)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.47)
Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.35)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.94)