AITopics

2509.17674

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.88)

Industry:

Health & Medicine > Nuclear Medicine (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.48)

Prieto-Santamaría, Lucía, Iglesias, Alba Cortés, Giné, Claudio Vidal, Calderón, Fermín Fernández, Lozano, Óscar M., Rodríguez-González, Alejandro

A Weak Supervision Approach for Monitoring Recreational Drug Use Effects in Social Media

arXiv.org Artificial IntelligenceSep-22-2025

Understanding the real-world effects of recreational drug use remains a critical challenge in public health and biomedical research, especially as traditional surveillance systems often underrepresent user experiences. In this study, we leverage social media (specifically Twitter) as a rich and unfiltered source of user-reported effects associated with three emerging psychoactive substances: ecstasy, GHB, and 2C-B. By combining a curated list of slang terms with biomedical concept extraction via MetaMap, we identified and weakly annotated over 92,000 tweets mentioning these substances. Each tweet was labeled with a polarity reflecting whether it reported a positive or negative effect, following an expert-guided heuristic process. We then performed descriptive and comparative analyses of the reported phenotypic outcomes across substances and trained multiple machine learning classifiers to predict polarity from tweet content, accounting for strong class imbalance using techniques such as cost-sensitive learning and synthetic oversampling. The top performance on the test set was obtained from eXtreme Gradient Boosting with cost-sensitive learning (F1 = 0.885, AUPRC = 0.934). Our findings reveal that Twitter enables the detection of substance-specific phenotypic effects, and that polarity classification models can support real-time pharmacovigilance and drug effect characterization with high accuracy.

machine learning, natural language, tweet, (20 more...)

2509.15266

Country: Europe > Spain (0.29)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine > Therapeutic Area > Psychiatry/Psychology > Addiction Disorder (1.00)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.96)
(3 more...)

Tessema, Amsalu, Bayih, Tizazu, Azezew, Kassahun, Kassie, Ayenew

Data-Driven Prediction of Maternal Nutritional Status in Ethiopia Using Ensemble Machine Learning Models

arXiv.org Artificial IntelligenceSep-19-2025

Malnutrition among pregnant women is a major public health challenge in Ethiopia, increasing the risk of adverse maternal and neonatal outcomes. Traditional statistical approaches often fail to capture the complex and multidimensional determinants of nutritional status. This study develops a predictive model using ensemble machine learning techniques, leveraging data from the Ethiopian Demographic and Health Survey (2005-2020), comprising 18,108 records with 30 socio-demographic and health attributes. Data preprocessing included handling missing values, normalization, and balancing with SMOTE, followed by feature selection to identify key predictors. Several supervised ensemble algorithms including XGBoost, Random Forest, CatBoost, and AdaBoost were applied to classify nutritional status. Among them, the Random Forest model achieved the best performance, classifying women into four categories (normal, moderate malnutrition, severe malnutrition, and overnutrition) with 97.87% accuracy, 97.88% precision, 97.87% recall, 97.87% F1-score, and 99.86% ROC AUC. These findings demonstrate the effectiveness of ensemble learning in capturing hidden patterns from complex datasets and provide timely insights for early detection of nutritional risks. The results offer practical implications for healthcare providers, policymakers, and researchers, supporting data-driven strategies to improve maternal nutrition and health outcomes in Ethiopia.

artificial intelligence, ethiopia, machine learning, (14 more...)

2509.14945

Country:

Africa > Ethiopia (0.94)
Asia (0.94)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine > Therapeutic Area > Pediatrics/Neonatology (1.00)
Health & Medicine > Therapeutic Area > Obstetrics/Gynecology (1.00)
Health & Medicine > Consumer Health (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)

Karaca, Huseyin, Kozat, Suleyman Serdar

Soft Gradient Boosting with Learnable Feature Transforms for Sequential Regression

arXiv.org Artificial IntelligenceSep-17-2025

We propose a soft gradient boosting framework for sequential regression that embeds a learnable linear feature transform within the boosting procedure. At each boosting iteration, we train a soft decision tree and learn a linear input feature transform Q together. This approach is particularly advantageous in high-dimensional, data-scarce scenarios, as it discovers the most relevant input representations while boosting. We demonstrate, using both synthetic and real-world datasets, that our method effectively and efficiently increases the performance by an end-to-end optimization of feature selection/transform and boosting while avoiding overfitting. We also extend our algorithm to differentiable non-linear transforms if overfitting is not a problem. To support reproducibility and future work, we share our code publicly.

artificial intelligence, decision tree, machine learning, (11 more...)

2509.1292

Country: Asia (0.46)

Genre: Research Report (0.65)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)

Narimani, Mohammadreza, Pourreza, Alireza, Moghimi, Ali, Farajpoor, Parastoo, Jafarbiglu, Hamid, Mesgaran, Mohsen B.

Early Detection of Branched Broomrape (Phelipanche ramosa) Infestation in Tomato Crops Using Leaf Spectral Analysis and Machine Learning

Branched broomrape (Phelipanche ramosa) is a chlorophyll-deficient parasitic weed that threatens tomato production by extracting nutrients from the host. We investigate early detection using leaf-level spectral reflectance (400-2500 nm) and ensemble machine learning. In a field experiment in Woodland, California, we tracked 300 tomato plants across growth stages defined by growing degree days (GDD). Leaf reflectance was acquired with a portable spectrometer and preprocessed (band denoising, 1 nm interpolation, Savitzky-Golay smoothing, correlation-based band reduction). Clear class differences were observed near 1500 nm and 2000 nm water absorption features, consistent with reduced leaf water content in infected plants at early stages. An ensemble combining Random Forest, XGBoost, SVM with RBF kernel, and Naive Bayes achieved 89% accuracy at 585 GDD, with recalls of 0.86 (infected) and 0.93 (noninfected). Accuracy declined at later stages (e.g., 69% at 1568 GDD), likely due to senescence and weed interference. Despite the small number of infected plants and environmental confounders, results show that proximal sensing with ensemble learning enables timely detection of broomrape before canopy symptoms are visible, supporting targeted interventions and reduced yield losses.

artificial intelligence, broomrape, machine learning, (15 more...)

2509.12074

Country: North America > United States > California > Yolo County > Woodland (0.25)

Genre: Research Report > New Finding (0.88)

Industry:

Food & Agriculture > Agriculture (1.00)
Government > Regional Government > North America Government > United States Government (0.94)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.65)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)

Transparent and Fair Profiling in Employment Services: Evidence from Switzerland

Räz, Tim

Long-term unemployment (LTU) is a challenge for both jobseekers and public employment services. Statistical profiling tools are increasingly used to predict LTU risk. Some profiling tools are opaque, black-box machine learning models, which raise issues of transparency and fairness. This paper investigates whether interpretable models could serve as an alternative, using administrative data from Switzerland. Traditional statistical, interpretable, and black-box models are compared in terms of predictive performance, interpretability, and fairness. It is shown that explainable boosting machines, a recent interpretable model, perform nearly as well as the best black-box models. It is also shown how model sparsity, feature smoothing, and fairness mitigation can enhance transparency and fairness with only minor losses in performance. These findings suggest that interpretable profiling provides an accountable and trustworthy alternative to black-box models without compromising performance.

data mining, machine learning, prediction, (18 more...)

2509.11847

Country: Europe > Switzerland (1.00)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.93)

Industry:

Government (1.00)
Banking & Finance > Economy (0.35)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Data Science > Data Mining (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.68)

Somvanshi, Shriyank, Hebli, Pavan, Chhetri, Gaurab, Das, Subasish

Tabular Data with Class Imbalance: Predicting Electric Vehicle Crash Severity with Pretrained Transformers (TabPFN) and Mamba-Based Models

This study presents a deep tabular learning framework for predicting crash severity in electric vehicle (EV) collisions using real-world crash data from Texas (2017-2023). After filtering for electric-only vehicles, 23,301 EV-involved crash records were analyzed. Feature importance techniques using XGBoost and Random Forest identified intersection relation, first harmful event, person age, crash speed limit, and day of week as the top predictors, along with advanced safety features like automatic emergency braking. To address class imbalance, Synthetic Minority Over-sampling Technique and Edited Nearest Neighbors (SMOTEENN) resampling was applied. Three state-of-the-art deep tabular models, TabPFN, MambaNet, and MambaAttention, were benchmarked for severity prediction. While TabPFN demonstrated strong generalization, MambaAttention achieved superior performance in classifying severe injury cases due to its attention-based feature reweighting. The findings highlight the potential of deep tabular architectures for improving crash severity prediction and enabling data-driven safety interventions in EV crash contexts.

artificial intelligence, deep learning, machine learning, (16 more...)

2509.11449

Country:

North America > United States > Texas (0.36)
Asia > Philippines > Luzon > National Capital Region > City of Manila (0.14)

Genre: Research Report (0.82)

Industry:

Transportation > Ground > Road (1.00)
Transportation > Electric Vehicle (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.90)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

Schwartz, Sagi, Wang, Qinling, Fang, Fang

Enhancing ML Models Interpretability for Credit Scoring

Predicting default is essential for banks to ensure profitability and financial stability. While modern machine learning methods often outperform traditional regression techniques, their lack of transparency limits their use in regulated environments. Explainable artificial intelligence (XAI) has emerged as a solution in domains like credit scoring. However, most XAI research focuses on post-hoc interpretation of black-box models, which does not produce models lightweight or transparent enough to meet regulatory requirements, such as those for Internal Ratings-Based (IRB) models. This paper proposes a hybrid approach: post-hoc interpretations of black-box models guide feature selection, followed by training glass-box models that maintain both predictive power and transparency. Using the Lending Club dataset, we demonstrate that this approach achieves performance comparable to a benchmark black-box model while using only 10 features - an 88.5% reduction. In our example, SHapley Additive exPlanations (SHAP) is used for feature selection, eXtreme Gradient Boosting (XGBoost) serves as the benchmark and the base black-box model, and Explainable Boosting Machine (EBM) and Penalized Logistic Tree Regression (PLTR) are the investigated glass-box models. We also show that model refinement using feature interaction analysis, correlation checks, and expert input can further enhance model interpretability and robustness.

artificial intelligence, interaction, machine learning, (17 more...)

2509.11389

Genre: Research Report > New Finding (0.70)

Industry: Banking & Finance > Credit (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.68)

Yadav, Yogesh, Yadav, Sandeep K, Vijay, Vivek, Dixit, Ambesh

Crystal Systems Classification of Phosphate-Based Cathode Materials Using Machine Learning for Lithium-Ion Battery

The physical and chemical characteristics of cathodes used in batteries are derived from the lithium-ion phosphate cathodes crystalline arrangement, which is pivotal to the overall battery performance. Therefore, the correct prediction of the crystal system is essential to estimate the properties of cathodes. This study applies machine learning classification algorithms for predicting the crystal systems, namely monoclinic, orthorhombic, and triclinic, related to Li P (Mn, Fe, Co, Ni, V) O based Phosphate cathodes. The data used in this work is extracted from the Materials Project. Feature evaluation showed that cathode properties depend on the crystal structure, and optimized classification strategies lead to better predictability. Ensemble machine learning algorithms such as Random Forest, Extremely Randomized Trees, and Gradient Boosting Machines have demonstrated the best predictive capabilities for crystal systems in the Monte Carlo cross-validation test. Additionally, sequential forward selection (SFS) is performed to identify the most critical features influencing the prediction accuracy for different machine learning models, with Volume, Band gap, and Sites as input features ensemble machine learning algorithms such as Random Forest (80.69%), Extremely Randomized Tree (78.96%), and Gradient Boosting Machine (80.40%) approaches lead to the maximum accuracy towards crystallographic classification with stability and the predicted materials can be the potential cathode materials for lithium ion batteries.

artificial intelligence, cathode material, machine learning, (14 more...)

2509.10532

Country: Asia > India (0.30)

Genre: Research Report > New Finding (0.69)

Industry:

Energy > Energy Storage (1.00)
Electrical Industrial Apparatus (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.97)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis (0.67)

AttnBoost: Retail Supply Chain Sales Insights via Gradient Boosting Perspective

Ge, Muxin, Ma, Hanyu, Wu, Yiyang, Ma, Xiaoli, Liu, Yadi, Moe, Ye Aung, Xie, Weizheng

Forecasting product demand in retail supply chains presents a complex challenge due to noisy, heterogeneous features and rapidly shifting consumer behavior. While traditional gradient boosting decision trees (GBDT) offer strong predictive performance on structured data, they often lack adaptive mechanisms to identify and emphasize the most relevant features under changing conditions. In this work, we propose AttnBoost, an interpretable learning framework that integrates feature-level attention into the boosting process to enhance both predictive accuracy and explainability. Specifically, the model dynamically adjusts feature importance during each boosting round via a lightweight attention mechanism, allowing it to focus on high-impact variables such as promotions, pricing, and seasonal trends. We evaluate AttnBoost on a large-scale retail sales dataset and demonstrate that it outperforms standard machine learning and deep tabular models, while also providing actionable insights for supply chain managers. An ablation study confirms the utility of the attention module in mitigating overfitting and improving interpretability. Our results suggest that attention-guided boosting represents a promising direction for interpretable and scalable AI in real-world forecasting applications.

artificial intelligence, attnboost, machine learning, (16 more...)

2509.10506

Country: North America > United States (0.68)

Genre: Research Report > New Finding (1.00)

Industry:

Retail (0.88)
Health & Medicine (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.72)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.67)