Classifying High-Energy Celestial Objects with Machine Learning Methods

Mathis, Alexis, Yu, Daniel, Faught, Nolan, Hobbs, Tyrian

arXiv.org Machine Learning

Modern astronomy has generated an extensive taxonomy of celestial objects based on their physical characteristics and predicted future state. As theories of the development, expansion, history, and predicted future state of the universe rely on identifying and observing celestial bodies, it is essential to have quick and accurate classification of newly observed objects. Historically, classification was performed manually, but the rapid expansion of modern catalogues of celestial objects - such as the Sloan Digital Sky Survey, which grows at a rate of thousands of entries daily [1] - makes manual classification impractical. Supervised and semi-supervised machine learning represent the most promising candidates for the desired computational classification. Until recently, the data, hardware, and software required for large-scale training and deployment of these methods were unavailable to the general research community. However, improvements to parallel processing hardware have driven increased success and adoption, resulting in models capable of matching or surpassing human-level performance in tasks formerly considered intractable for computers. Such improvements have been recognized in facial recognition [2] and combinatorial game theory [3], but despite their meteoric rise in popularity, there is a significant gap in the astronomical literature on applying machine learning models to the problem of celestial object classification. To help close this gap, we explore a number of machine learning based models for a simplified celestial object classification problem to assess the performance and potential of these models in the field of astronomy.
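The supervised classification task described above can be sketched in a few lines; the snippet below uses a random forest on synthetic stand-ins for survey features (the feature meanings and class structure are illustrative assumptions, not the paper's data):

```python
# Hypothetical sketch: multi-class celestial object classification with a
# random forest on synthetic photometric-style features.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 600
# Toy stand-ins for catalogue features, e.g. magnitudes in several bands.
X = rng.normal(size=(n, 6))
# Three toy classes (think star / galaxy / quasar), loosely tied to the features.
y = (X[:, 0] + 0.5 * X[:, 5] > 0).astype(int) + (X[:, 1] > 1).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0, stratify=y)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
acc = accuracy_score(y_te, clf.predict(X_te))
print(f"held-out accuracy: {acc:.3f}")
```

On a real catalogue the features would come from survey photometry and spectroscopy rather than random draws, but the train/evaluate loop is the same.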


WildfireGenome: Interpretable Machine Learning Reveals Local Drivers of Wildfire Risk and Their Cross-County Variation

Liu, Chenyue, Mostafavi, Ali

arXiv.org Artificial Intelligence

Current wildfire risk assessments rely on coarse hazard maps and opaque machine learning models that optimize regional accuracy while sacrificing interpretability at the decision scale. WildfireGenome addresses these gaps through three components: (1) fusion of seven federal wildfire indicators into a sign-aligned, PCA-based composite risk label at H3 Level-8 resolution; (2) Random Forest classification of local wildfire risk; and (3) SHAP and ICE/PDP analyses to expose county-specific nonlinear driver relationships. Across seven ecologically diverse U.S. counties, models achieve accuracies of 0.755-0.878 and Quadratic Weighted Kappa up to 0.951, with principal components explaining 87-94% of indicator variance. Transfer tests show reliable performance between ecologically similar regions but collapse across dissimilar contexts. Explanations consistently highlight needleleaf forest cover and elevation as dominant drivers, with risk rising sharply at 30-40% needleleaf coverage. WildfireGenome advances wildfire risk assessment from regional prediction to interpretable, decision-scale analytics that guide vegetation management, zoning, and infrastructure planning.
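The label-construction step described in component (1) and the classifier in component (2) can be sketched as follows; the data are synthetic and the quartile binning and sign-alignment convention are assumptions for illustration, not the paper's exact procedure:

```python
# Illustrative sketch: fuse several hazard indicators into a PCA-based
# composite risk label, then fit a random forest on the raw indicators.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
indicators = rng.normal(size=(1000, 7))   # stand-in for 7 federal wildfire indicators

# First principal component of the standardized indicators as a composite score.
scores = PCA(n_components=1).fit_transform(StandardScaler().fit_transform(indicators))
# Sign-align so that higher scores mean higher risk (assumed convention).
if np.corrcoef(scores[:, 0], indicators[:, 0])[0, 1] < 0:
    scores = -scores
# Discretise the composite score into ordinal risk classes via quartiles.
labels = np.digitize(scores[:, 0], np.quantile(scores, [0.25, 0.5, 0.75]))

clf = RandomForestClassifier(n_estimators=200, random_state=1).fit(indicators, labels)
```

SHAP and ICE/PDP analyses would then be run against `clf` to expose per-driver relationships.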


A Multi-level Analysis of Factors Associated with Student Performance: A Machine Learning Approach to the SAEB Microdata

Tertulino, Rodrigo, Almeida, Ricardo

arXiv.org Artificial Intelligence

Identifying the determinants of academic success in basic education represents a central challenge for educational research and policymaking, particularly in a country with Brazil's vast dimensions and socioeconomic heterogeneity (Issah et al. 2023). A systemic approach is crucial, as student performance is influenced by a complex interplay of factors spanning individual, academic, socioeconomic, and institutional domains (Barragán Moreno and Guzmán Rincón 2025). The System of Assessment of Basic Education (SAEB), conducted by the National Institute for Educational Studies and Research Anísio Teixeira (INEP) (INEP 2025), provides a rich, multi-level dataset uniquely suited for such an analysis (Bonamino et al. 2010). The public availability of its anonymized microdata enables the research community to investigate the intricate relationships between student proficiency and a wide array of contextual factors, from socioeconomic backgrounds to school infrastructure and teacher profiles. Consequently, the SAEB microdata is an essential resource for data-driven research aimed at informing and evaluating educational policies in the country (Lundberg and Lee 2017b; Mazoni and Oliveira 2023). While traditional statistical methods are common, the Educational Data Mining (EDM) paradigm offers powerful tools for uncovering complex, non-linear patterns from such data (Romero and Ventura 2010). Furthermore, we demonstrate that by interpreting the model's classification results with XAI techniques, our method provides data-driven insights for educators and policymakers (Idrizi 2024). The primary objective of this research is thus to develop and evaluate a multi-level machine learning model to identify the key systemic factors associated with the academic performance of 9th-grade and high school students, using the SAEB microdata.
Building upon this perspective, the study shifts its analytical focus from purely individual student interventions toward addressing the systemic determinants that shape educational outcomes in Brazilian basic education.
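One simple way to surface systemic factors of the kind described above is to rank features by permutation importance; the sketch below uses synthetic stand-ins for SAEB-style contextual features (the feature names and signal structure are assumptions, and the paper's own XAI tooling may differ):

```python
# Hedged sketch: ranking contextual factors with a random forest and
# permutation importance on synthetic stand-ins for multi-level features.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(2)
n = 800
# Hypothetical features: socioeconomic index, school infrastructure score,
# teacher-experience score, class size (all synthetic).
X = rng.normal(size=(n, 4))
# Toy pass/fail outcome driven mainly by the first two features.
y = (0.8 * X[:, 0] + 0.4 * X[:, 1] + rng.normal(scale=0.5, size=n) > 0).astype(int)

clf = RandomForestClassifier(n_estimators=200, random_state=2).fit(X, y)
imp = permutation_importance(clf, X, y, n_repeats=5, random_state=2)
ranking = np.argsort(imp.importances_mean)[::-1]   # most important first
```

On the real microdata the same ranking step, or SHAP values, would point policymakers to the dominant contextual factors.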


ROC Analysis with Covariate Adjustment Using Neural Network Models: Evaluating the Role of Age in the Physical Activity-Mortality Association

Hammouri, Ziad Akram Ali, Zou, Yating, Ghosal, Rahul, Vidal, Juan C., Matabuena, Marcos

arXiv.org Machine Learning

The receiver operating characteristic (ROC) curve and its summary measure, the Area Under the Curve (AUC), are well-established tools for evaluating the efficacy of biomarkers in biomedical studies. Compared to the traditional ROC curve, the covariate-adjusted ROC curve allows for individualized evaluation of the biomarker. However, the use of machine learning models has rarely been explored in this context, despite their potential to yield more powerful and sophisticated approaches to biomarker evaluation. The goal of this paper is to propose a framework for neural network-based covariate-adjusted ROC modeling that allows flexible and nonlinear evaluation of a biomarker's effectiveness in discriminating between two reference populations. The finite-sample performance of our method is investigated through extensive simulations under varying dependency structures between biomarkers, covariates, and reference populations. The methodology is further illustrated in a clinical case study that assesses daily physical activity - measured as total activity time (TAC), a proxy for daily step count - as a biomarker to predict mortality at three, five, and eight years. Analyses stratified by sex and adjusted for age and BMI reveal distinct covariate effects on mortality outcomes. These results underscore the importance of covariate-adjusted modeling in biomarker evaluation and highlight TAC's potential as a functional capacity biomarker tailored to specific individual characteristics.
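A much simpler cousin of covariate adjustment, comparing a biomarker's AUC across covariate strata, conveys why covariates matter; the sketch below uses synthetic data with an assumed age-dependent effect, not the paper's neural-network model or the NHANES-style cohort:

```python
# Simplified illustration: a biomarker's discriminative power (AUC) can differ
# markedly across covariate strata, here coarse age groups (synthetic data).
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(3)
n = 2000
age = rng.uniform(50, 90, size=n)
outcome = rng.binomial(1, 0.3, size=n)            # 1 = death within follow-up
# Toy biomarker: lower activity among those who die, with the gap widening
# with age (an assumed effect, for illustration only).
biomarker = rng.normal(size=n) - outcome * (age - 50) / 40.0

aucs = {}
for lo, hi in [(50, 70), (70, 90)]:
    mask = (age >= lo) & (age < hi)
    # Negate so that higher risk scores correspond to the outcome.
    aucs[(lo, hi)] = roc_auc_score(outcome[mask], -biomarker[mask])
    print(f"age {lo}-{hi}: AUC = {aucs[(lo, hi)]:.3f}")
```

The covariate-adjusted ROC methodology generalizes this stratified view by modeling the biomarker's conditional distributions smoothly in the covariates rather than binning them.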


Random Forest Stratified K-Fold Cross Validation on SYN DoS Attack SD-IoV

Zamrai, Muhammad Arif Hakimi, Yusof, Kamaludin Mohd

arXiv.org Artificial Intelligence

In response to the prevalent concern of TCP SYN flood attacks within the context of Software-Defined Internet of Vehicles (SD-IoV), this study addresses the significant challenge of network security in rapidly evolving vehicular communication systems. This research focuses on optimizing a Random Forest Classifier model to achieve maximum accuracy and minimal detection time, thereby enhancing vehicular network security. The methodology involves preprocessing a dataset containing SYN attack instances, employing feature scaling and label encoding techniques, and applying Stratified K-Fold cross-validation to target key metrics such as accuracy, precision, recall, and F1-score. This research achieved an average value of 0.999998 for all metrics with a SYN DoS attack detection time of 0.24 seconds. Results show that the fine-tuned Random Forest model, configured with 20 estimators and a depth of 10, effectively differentiates between normal and malicious traffic with high accuracy and minimal detection time, which is crucial for SD-IoV networks. This approach marks a significant advancement and introduces a state-of-the-art algorithm in detecting SYN flood attacks, combining high accuracy with minimal detection time. It contributes to vehicular network security by providing a robust solution against TCP SYN flood attacks while maintaining network efficiency and reliability.
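The evaluation setup described above (feature scaling, Stratified K-Fold, and a random forest with 20 estimators and depth 10) can be sketched as follows; the traffic features and labels here are synthetic stand-ins, not the paper's SYN-flood dataset:

```python
# Sketch of stratified k-fold evaluation of a small random forest,
# mirroring the reported configuration (20 estimators, max depth 10).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(4)
n = 1000
X = rng.normal(size=(n, 10))                      # toy flow-level features
y = (X[:, 0] + X[:, 1] > 0).astype(int)           # toy normal-vs-attack label

X = StandardScaler().fit_transform(X)             # feature scaling step
clf = RandomForestClassifier(n_estimators=20, max_depth=10, random_state=4)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=4)
scores = cross_val_score(clf, X, y, cv=cv)        # accuracy per fold
print(f"mean CV accuracy: {scores.mean():.3f}")
```

Stratification keeps the attack/normal ratio constant across folds, which is what makes the per-fold metrics comparable.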



Differentially Private Conformal Prediction via Quantile Binary Search

Romanus, Ogonnaya M., Molinari, Roberto

arXiv.org Machine Learning

Most Differentially Private (DP) approaches focus on limiting privacy leakage from learners based on the data they are trained on; fewer approaches consider leakage when procedures involve a calibration dataset, which is common in uncertainty quantification methods such as Conformal Prediction (CP). Given the limited number of approaches in this direction, in this work we deliver a general DP approach for CP that we call Private Conformity via Quantile Search (P-COQS). The proposed approach adapts an existing randomized binary search algorithm for computing DP quantiles in the calibration phase of CP, thereby guaranteeing privacy of the consequent prediction sets. This however comes at the price of slightly under-covering with respect to the desired $(1 - \alpha)$-level when using finite-sample calibration sets (although broad empirical results show that P-COQS generally attains the required level in the considered cases). After confirming properties of the adapted algorithm and quantifying the approximate coverage guarantees of the consequent CP, we conduct extensive experiments to examine the effects of privacy noise, sample size, and significance level on the performance of our approach compared to existing alternatives. In addition, we empirically evaluate our approach on several benchmark datasets, including CIFAR-10, ImageNet, and CoronaHack. Our results suggest that the proposed method is robust to privacy noise and performs favorably with respect to the current DP alternative in terms of empirical coverage, efficiency, and informativeness. Specifically, the results indicate that P-COQS produces smaller conformal prediction sets while simultaneously targeting the desired coverage and privacy guarantees in all these experimental settings.
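The core idea, a noisy binary search for a private quantile of the calibration scores that then serves as the conformal threshold, can be sketched as follows. This is a heavily simplified illustration with a naive per-step privacy budget split, not the P-COQS algorithm itself:

```python
# Simplified sketch: a differentially private quantile of conformal calibration
# scores via binary search with Laplace-noised counts. Illustrative only.
import random

def dp_quantile(scores, q, eps, lo, hi, steps=25, seed=0):
    """Binary-search a threshold t with roughly a q-fraction of scores below it,
    comparing a Laplace-noised count to the target at each step."""
    rng = random.Random(seed)
    target = q * len(scores)
    scale = steps / eps                 # naive split of the budget across steps
    for _ in range(steps):
        mid = (lo + hi) / 2
        # Laplace(scale) noise as the difference of two exponentials.
        noise = rng.expovariate(1 / scale) - rng.expovariate(1 / scale)
        noisy_below = sum(s <= mid for s in scores) + noise
        if noisy_below < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

rng = random.Random(1)
cal_scores = [rng.gauss(0, 1) for _ in range(2000)]   # toy calibration scores
t = dp_quantile(cal_scores, q=0.9, eps=1.0, lo=-10, hi=10)
# Any candidate label whose conformal score is <= t would enter the prediction set.
```

The under-coverage mentioned above stems from the noise in these comparisons: the returned threshold is only approximately the empirical $(1-\alpha)$-quantile.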


The use of cross validation in the analysis of designed experiments

Weese, Maria L., Smucker, Byran J., Edwards, David J.

arXiv.org Machine Learning

Cross-validation (CV) is a common method for tuning machine learning methods and can also be used for model selection in regression. Because of the structured nature of small, traditional experimental designs, the literature has warned against using CV in their analysis. The striking increase in the use of machine learning, and thus CV, in the analysis of experimental designs has led us to empirically study the effectiveness of CV compared to other methods of selecting models in designed experiments, including the little bootstrap. We consider both response surface settings, where prediction is of primary interest, and screening settings, where factor selection is most important. Overall, we provide evidence that leave-one-out cross-validation (LOOCV) is often useful in the analysis of small, structured designs. More general $k$-fold CV may also be competitive, but its performance is uneven.
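The LOOCV-for-model-selection setting can be sketched on a toy designed experiment; the design, the candidate models, and the "active factor" structure below are illustrative assumptions, not the paper's simulation protocol:

```python
# Sketch: LOOCV comparing two candidate regression models on a small
# 2^3 full factorial design with two active factors (synthetic response).
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score

rng = np.random.default_rng(5)
# 2^3 full factorial design in coded -1/+1 units (8 runs, 3 factors).
X = np.array([[a, b, c] for a in (-1, 1) for b in (-1, 1) for c in (-1, 1)], float)
# Response driven by the first two factors only, plus noise.
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(scale=0.2, size=8)

loo = LeaveOneOut()
# Candidate 1: main effects of all three factors; candidate 2: active factors only.
mse_full = -cross_val_score(LinearRegression(), X, y, cv=loo,
                            scoring="neg_mean_squared_error").mean()
mse_active = -cross_val_score(LinearRegression(), X[:, :2], y, cv=loo,
                              scoring="neg_mean_squared_error").mean()
print(f"LOOCV MSE, full model: {mse_full:.4f}; active-factor model: {mse_active:.4f}")
```

The structured design is exactly what makes this delicate: leaving out one corner of a factorial changes the geometry of the remaining runs, which is the source of the literature's warnings.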


Enhancing IoT Cyber Attack Detection in the Presence of Highly Imbalanced Data

Haque, Md. Ehsanul, Polash, Md. Saymon Hosen, Simla, Md Al-Imran Sanjida, Hossain, Md Alomgir, Jahan, Sarwar

arXiv.org Artificial Intelligence

Due to the rapid growth in the number of Internet of Things (IoT) networks, cyber risk has increased exponentially, making it essential to develop effective intrusion detection systems (IDS) that work well with highly imbalanced datasets. Traditional machine learning models tend to struggle to identify attacks when normal traffic vastly outnumbers attack traffic, which can result in a high rate of missed threats. For example, the dataset used in this study reveals a strong class imbalance, with 94,659 instances of the majority class and only 28 instances of the minority class, making it quite challenging to detect rare attacks accurately. These challenges are addressed with hybrid sampling techniques designed to improve detection accuracy on imbalanced data in IoT domains. After applying these techniques, we evaluate the performance of several machine learning models - Random Forest, Soft Voting, Support Vector Classifier (SVC), K-Nearest Neighbors (KNN), Multi-Layer Perceptron (MLP), and Logistic Regression - on the classification of cyber-attacks. The results indicate that the Random Forest model achieved the best performance, with a Kappa score of 0.9903, test accuracy of 0.9961, and AUC of 0.9994. The Soft Voting model also performs strongly, with an accuracy of 0.9952 and AUC of 0.9997, indicating the benefits of combining model predictions. Overall, this work demonstrates the value of hybrid sampling combined with robust model and feature selection for significantly improving IoT security against cyber-attacks, especially in highly imbalanced data environments.
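One ingredient of hybrid sampling, oversampling the minority class, can be sketched by hand; the snippet below uses a synthetic imbalanced dataset and plain random oversampling rather than the paper's specific hybrid technique:

```python
# Sketch: random minority-class oversampling (by hand, without a resampling
# library) followed by a random forest, on a synthetic imbalanced dataset.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(6)
X_maj = rng.normal(0, 1, size=(950, 5))           # abundant benign traffic
X_min = rng.normal(2, 1, size=(20, 5))            # rare attack class
X = np.vstack([X_maj, X_min])
y = np.array([0] * 950 + [1] * 20)

# Oversample the minority class with replacement up to the majority count.
idx_min = np.where(y == 1)[0]
boost = rng.choice(idx_min, size=950 - len(idx_min), replace=True)
X_bal = np.vstack([X, X[boost]])
y_bal = np.concatenate([y, y[boost]])

clf = RandomForestClassifier(n_estimators=100, random_state=6).fit(X_bal, y_bal)
```

Hybrid schemes typically pair a step like this with undersampling of the majority class or synthetic-sample generation (e.g. SMOTE), so the balanced set is not dominated by exact duplicates.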


Feature-Enhanced Machine Learning for All-Cause Mortality Prediction in Healthcare Data

Lee, HyeYoung, Tsoi, Pavel

arXiv.org Machine Learning

Accurate patient mortality prediction enables effective risk stratification, leading to personalized treatment plans and improved patient outcomes. However, predicting mortality in healthcare remains a significant challenge, with existing studies often focusing on specific diseases or limited predictor sets. This study evaluates machine learning models for all-cause in-hospital mortality prediction using the MIMIC-III database, employing a comprehensive feature engineering approach. Guided by clinical expertise and literature, we extracted key features such as vital signs (e.g., heart rate, blood pressure), laboratory results (e.g., creatinine, glucose), and demographic information. The Random Forest model achieved the highest performance with an AUC of 0.94, significantly outperforming other machine learning and deep learning approaches. This demonstrates Random Forest's robustness in handling high-dimensional, noisy clinical data and its potential for developing effective clinical decision support tools. Our findings highlight the importance of careful feature engineering for accurate mortality prediction. We conclude by discussing implications for clinical adoption and propose future directions, including enhancing model robustness and tailoring prediction models for specific diseases.
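The feature-engineering-plus-random-forest pipeline described above can be sketched end to end; the vitals, labs, and outcome model below are synthetic stand-ins, not MIMIC-III data or the study's feature set:

```python
# Hedged sketch: engineered clinical features feeding a random forest,
# evaluated with AUC on a held-out split (all data synthetic).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(7)
n = 2000
heart_rate = rng.normal(85, 15, size=n)           # toy vital sign
creatinine = rng.lognormal(0.0, 0.4, size=n)      # toy laboratory result
age = rng.uniform(40, 90, size=n)                 # demographic feature
X = np.column_stack([heart_rate, creatinine, age])

# Assumed risk model tying mortality to the engineered features.
logit = 0.03 * (heart_rate - 85) + 1.2 * (creatinine - 1) + 0.04 * (age - 65)
y = rng.binomial(1, 1 / (1 + np.exp(-logit)))

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=7, stratify=y)
clf = RandomForestClassifier(n_estimators=200, random_state=7).fit(X_tr, y_tr)
auc = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])
print(f"held-out AUC: {auc:.3f}")
```

In the actual study the feature extraction from MIMIC-III (aggregating time-series vitals and labs per stay) is the bulk of the work; the modeling step itself is essentially this loop.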