tree model
- North America > United States > California (0.14)
- North America > United States > New Jersey > Hudson County > Hoboken (0.04)
- North America > Canada > Quebec > Montreal (0.04)
- (4 more...)
AI-driven multi-source data fusion for algal bloom severity classification in small inland water bodies: Leveraging Sentinel-2, DEM, and NOAA climate data
Harmful algal blooms are a growing threat to inland water quality and public health worldwide, creating an urgent need for efficient, accurate, and cost-effective detection methods. This research introduces a high-performing methodology that integrates multiple open-source remote sensing data sources with advanced artificial intelligence models. Key data sources include Copernicus Sentinel-2 optical imagery, the Copernicus Digital Elevation Model (DEM), and NOAA's High-Resolution Rapid Refresh (HRRR) climate data, all efficiently retrieved using platforms like Google Earth Engine (GEE) and Microsoft Planetary Computer (MPC). The NIR and two SWIR bands from Sentinel-2, the altitude from the elevation model, the temperature and wind from NOAA, as well as longitude and latitude, were the most important features. The approach combines two types of machine learning models, tree-based models and a neural network, into an ensemble for classifying algal bloom severity. While the tree models performed strongly on their own, incorporating a neural network added robustness and demonstrated how deep learning models can effectively use diverse remote sensing inputs. The method leverages high-resolution satellite imagery and AI-driven analysis to monitor algal blooms dynamically, and although initially developed for a NASA competition in the U.S., it shows potential for global application.

Keywords: Machine learning; Inland Water; Algal Bloom; Remote Sensing; Data Fusion; Water Quality

1. Introduction

Algal blooms are becoming the greatest inland water quality threat to public health and aquatic ecosystems, capable of degrading water quality to a greater extent than many chemicals (Brooks et al., 2016). Human nutrient loading and climate change (warming, altered rainfall) synergistically enhance cyanobacterial blooms in aquatic ecosystems (Paerl and Paul, 2012). Excessive nutrient loads in many cases come from agricultural, industrial, and other sources (Novotny, 2011). The phenology and trends of chlorophyll-a and cyanobacterial blooms are established in the literature (Matthews, 2014).
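The ensemble described in the abstract, tree-based models blended with a neural network over fused satellite, elevation, and climate features, can be sketched roughly as follows. The synthetic features and model choices here are illustrative stand-ins, not the authors' pipeline:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, VotingClassifier
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
# Hypothetical fused features standing in for: NIR, SWIR1, SWIR2 reflectance,
# elevation, temperature, wind, longitude, latitude.
X = rng.normal(size=(600, 8))
y = (X[:, 0] + 0.5 * X[:, 3] + rng.normal(scale=0.3, size=600) > 0).astype(int) \
    + (X[:, 1] > 1).astype(int)        # synthetic severity classes 0, 1, 2

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
ensemble = VotingClassifier(
    estimators=[
        ("gbt", GradientBoostingClassifier(random_state=0)),
        ("mlp", MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=800,
                              random_state=0)),
    ],
    voting="soft",                     # average class probabilities
)
ensemble.fit(X_tr, y_tr)
acc = ensemble.score(X_te, y_te)
```

Soft voting averages each model's class probabilities, which is one simple way the robustness benefit of adding a neural network to strong tree models can show up.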
- Asia > China (0.05)
- South America > Uruguay (0.04)
- Indian Ocean > Red Sea (0.04)
- (10 more...)
- Water & Waste Management > Water Management > Water Supplies & Services (1.00)
- Health & Medicine (1.00)
- Government > Regional Government > North America Government > United States Government (1.00)
- Energy > Renewable > Geothermal > Geothermal Energy Exploration and Development > Geophysical Analysis & Survey (1.00)
- North America > United States > California (0.14)
- North America > United States > New York (0.04)
- North America > United States > New Jersey > Hudson County > Hoboken (0.04)
- (5 more...)
Enhanced Survival Trees
Zhou, Ruiwen, Xie, Ke, Liu, Lei, Xu, Zhichen, Ding, Jimin, Su, Xiaogang
We introduce a new survival tree method for censored failure time data that incorporates three key advancements over traditional approaches. First, we develop a more computationally efficient splitting procedure that effectively mitigates the end-cut preference problem, and we propose an intersected validation strategy to reduce the variable selection bias inherent in greedy searches. Second, we present a novel framework for determining tree structures through fused regularization. In combination with conventional pruning, this approach enables the merging of non-adjacent terminal nodes, producing more parsimonious and interpretable models. Third, we address inference by constructing valid confidence intervals for median survival times within the subgroups identified by the final tree. To achieve this, we apply bootstrap-based bias correction to standard errors. The proposed method is assessed through extensive simulation studies and illustrated with data from the Alzheimer's Disease Neuroimaging Initiative (ADNI) study.
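For context, a conventional greedy survival-tree split of the kind this paper improves on maximizes a two-sample log-rank statistic over candidate cutpoints. A minimal sketch, assuming untied event times and omitting the paper's efficient splitting criterion, intersected validation, and fused regularization:

```python
import numpy as np

def logrank_stat(time, event, group):
    """Two-sample log-rank statistic; `group` is a boolean mask.
    Assumes untied event times for simplicity."""
    order = np.argsort(time)
    event = event[order].astype(float)
    g = group[order].astype(float)
    n = len(time)
    at_risk = np.arange(n, 0, -1).astype(float)   # subjects still at risk
    at_risk_g = np.cumsum(g[::-1])[::-1]          # at risk in group 1
    p = at_risk_g / at_risk
    o_minus_e = np.sum(event * (g - p))           # observed minus expected deaths
    var = np.sum(event * p * (1.0 - p))
    return o_minus_e**2 / var if var > 0 else 0.0

def best_split(x, time, event):
    """Greedy exhaustive search over cutpoints of one covariate."""
    best_c, best_s = None, -np.inf
    for c in np.unique(x)[:-1]:
        s = logrank_stat(time, event, x <= c)
        if s > best_s:
            best_c, best_s = c, s
    return best_c, best_s
```

The exhaustive search above is exactly where end-cut preference and variable selection bias arise, which motivates the paper's modified procedure.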
- North America > United States > Texas > El Paso County > El Paso (0.14)
- Europe > Austria > Vienna (0.14)
- North America > United States > Missouri > St. Louis County > St. Louis (0.04)
- North America > United States > Florida > Palm Beach County > Boca Raton (0.04)
- North America > United States > Nevada > Clark County > Las Vegas (0.04)
- Europe > Italy (0.04)
Trustworthy Tree-based Machine Learning by $MoS_2$ Flash-based Analog CAM with Inherent Soft Boundaries
Wen, Bo, Gao, Guoyun, Xu, Zhicheng, Mao, Ruibin, Qi, Xiaojuan, Hu, X. Sharon, Yin, Xunzhao, Li, Can
The rapid advancement of artificial intelligence has raised concerns regarding its trustworthiness, especially in terms of interpretability and robustness. Tree-based models like Random Forest and XGBoost excel in interpretability and accuracy for tabular data, but scaling them remains computationally expensive due to poor data locality and high data dependence. Previous efforts to accelerate these models with analog content addressable memory (CAM) have struggled because the sharp decision boundaries are difficult to implement in analog hardware and highly susceptible to device variations, leading to poor hardware performance and vulnerability to adversarial attacks. This work presents a novel hardware-software co-design approach using $MoS_2$ Flash-based analog CAM with inherent soft boundaries, enabling efficient inference with soft tree-based models. Our soft tree model inference experiments on $MoS_2$ analog CAM arrays show that this method achieves exceptional robustness against device variation and adversarial attacks while achieving state-of-the-art accuracy. Specifically, our fabricated analog CAM arrays achieve $96\%$ accuracy on the Wisconsin Diagnostic Breast Cancer (WDBC) database while maintaining decision explainability. Our experimentally calibrated model shows only a $0.6\%$ accuracy drop on the MNIST dataset under $10\%$ device threshold variation, compared to a $45.3\%$ drop for traditional decision trees. This work paves the way for specialized hardware that enhances AI's trustworthiness and efficiency.
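The "soft boundary" idea can be illustrated in a few lines: routing a sample with a sigmoid of its distance to the threshold, rather than a hard comparison, makes node outputs degrade smoothly under threshold drift. The scale and drift values below are illustrative, not measured device parameters:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def hard_node(x, t):
    return (x > t).astype(float)      # brittle: output flips as soon as x crosses t

def soft_node(x, t, s=0.1):
    return sigmoid((x - t) / s)       # smooth routing probability

x = np.linspace(-1.0, 1.0, 201)
t, dt = 0.0, 0.05                     # nominal threshold and a small drift
hard_flip = np.mean(hard_node(x, t) != hard_node(x, t + dt))
soft_shift = np.max(np.abs(soft_node(x, t) - soft_node(x, t + dt)))
# hard routing flips outright for samples near t; soft routing shifts smoothly
```

With hard thresholds, every sample between the nominal and drifted cutpoint changes subtree entirely; with the soft node, the routing probability moves by a bounded amount everywhere, which is the robustness property the analog CAM exploits.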
- North America > United States > Wisconsin (0.24)
- Asia > China > Hong Kong (0.05)
- North America > United States > Indiana > St. Joseph County > Notre Dame (0.04)
- (2 more...)
- Information Technology (0.87)
- Government > Military (0.69)
- Health & Medicine > Therapeutic Area > Oncology (0.67)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Generalized Bayesian Ensemble Survival Tree (GBEST) model
Ballante, Elena, Muliere, Pietro, Figini, Silvia
This paper proposes a new class of predictive models for survival analysis called Generalized Bayesian Ensemble Survival Tree (GBEST). Survival analysis poses many challenges, in particular for small datasets or complex censoring mechanisms. Our contribution is an ensemble approach that uses the Bayesian bootstrap and the beta-Stacy bootstrap to improve outcomes in survival applications, with a special focus on small datasets. More precisely, this paper proposes a novel approach to integrating the beta-Stacy Bayesian bootstrap into bagged tree models for censored data. Empirical evidence on simulated and real data shows that our approach outperforms classical survival models from the literature in terms of predictive performance and stability of the results. Methodologically, our contribution adapts recent Bayesian ensemble approaches to survival data, yielding the GBEST model. A further computational contribution is an implementation of GBEST in R, available in a public GitHub repository.
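The Bayesian-bootstrap ingredient of such an ensemble can be sketched with continuous Dirichlet observation weights in place of classical multinomial resampling. The beta-Stacy extension for censored data that GBEST actually uses is more involved and not reproduced here, and the regression task below is a generic stand-in:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(42)
X = rng.uniform(-2.0, 2.0, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.2, size=200)

preds = []
for _ in range(50):
    w = rng.dirichlet(np.ones(len(y)))             # Bayesian bootstrap weights
    tree = DecisionTreeRegressor(max_depth=4, random_state=0)
    tree.fit(X, y, sample_weight=w * len(y))       # rescale so weights sum to n
    preds.append(tree.predict(X))
ensemble_pred = np.mean(preds, axis=0)             # bagged point estimate
```

Because Dirichlet(1, ..., 1) weights are continuous, every observation contributes to every tree with a random weight, which tends to stabilize the ensemble on small samples relative to discarding observations outright.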
- North America > United States > New York (0.04)
- Europe > Italy > Lombardy > Milan (0.04)
- Health & Medicine > Therapeutic Area > Oncology (1.00)
- Law > Civil Rights & Constitutional Law (0.75)
- Education > Curriculum > Subject-Specific Education (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)
Explaining the Unexplained: Revealing Hidden Correlations for Better Interpretability
Jiang, Wen-Dong, Chang, Chih-Yung, Yen, Show-Jane, Roy, Diptendu Sinha
Thanks to the rapid advancement of computer hardware, deep learning has made significant progress in the application of unstructured data, such as images (Cao & Chen, 2025) and text (Li et al., 2024). Specifically, the success of representation learning (Wang & Lian, 2025; Zhang et al., 2025) has gradually replaced the earlier approaches of transforming unstructured data into structured formats. The key to the success of representation learning lies in leveraging a large number of parameters for backpropagation, enabling the model to adapt to data with non-normal distributions. Although models based on backpropagation neural networks (Yang et al., 2019; Banerjee et al., 2023) have achieved significant technical advancements, their application in many sensitive domains, such as medicine (Zhang et al., 2025) and industrial inspection (Rathee et al., 2021), still faces considerable challenges due to the difficulty in understanding the basis of their decision-making. Explainable Artificial Intelligence (XAI) aims to reveal the inner mechanisms of neural network decisions, thereby making these models more reliable for applications in sensitive domains. In recent years, several studies (Li et al., 2025; Jing et al., 2025; Liu et al., 2024; Guan et al., 2024) have focused on injecting explainability into deep learning models and using various visualization techniques to explain the decisions of these "black box" models. While these models have achieved a certain level of interpretability, two pressing issues remain (Huang & Marques, 2023; Huang & Marques, 2024): first, whether the correlations between different attributes are correctly evaluated, and second, whether the model's decision-making pathway truly aligns with human reasoning, even when the model's understanding appears consistent with user expectations.
Beyond Tree Models: A Hybrid Model of KAN and gMLP for Large-Scale Financial Tabular Data
Zhang, Mingming, Hu, Jiahao, Shi, Pengfei, Wang, Ningtao, Gao, Ruizhe, Sun, Guandong, Zhao, Feng, Kang, Yulin, Fu, Xing, Wang, Weiqiang, Zhao, Junbo
Tabular data plays a critical role in real-world financial scenarios. Traditionally, tree models have dominated the handling of tabular data. However, financial datasets in industry often present challenges such as data heterogeneity, the predominance of numerical features, and large scale, ranging from tens of millions to hundreds of millions of records. These challenges can lead to significant memory and computational issues when using tree-based models. Consequently, there is a growing need for neural network-based solutions that can outperform them. In this paper, we introduce TKGMLP, a hybrid network for tabular data that combines shallow Kolmogorov-Arnold Networks with a Gated Multilayer Perceptron. This model leverages the strengths of both architectures to improve performance and scalability. We validate TKGMLP on a real-world credit scoring dataset, where it achieves state-of-the-art results and outperforms current benchmarks. Furthermore, our findings demonstrate that the model continues to improve as the dataset size increases, making it highly scalable. Additionally, we propose a novel feature encoding method for numerical data, specifically designed to address the predominance of numerical features in financial datasets. Integrating this feature encoding into TKGMLP significantly improves prediction accuracy. This research not only advances tabular prediction technology but also offers a practical and effective solution for handling large-scale numerical tabular data in a variety of industrial applications.
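As a rough illustration of the gated-MLP half of such a hybrid, a GLU-style gated feed-forward block is shown below. TKGMLP's actual architecture, gating form, and numerical feature encoding are not public in this sketch; all shapes and the ReLU choice are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def gmlp_block(x, d_ffn=32):
    """x: (n_tokens, d_model). Project up, split, gate one half by the other."""
    d = x.shape[1]
    W1 = rng.normal(scale=d ** -0.5, size=(d, 2 * d_ffn))
    W2 = rng.normal(scale=d_ffn ** -0.5, size=(d_ffn, d))
    h = np.maximum(x @ W1, 0.0)       # GELU in the original gMLP; ReLU for brevity
    u, v = np.split(h, 2, axis=1)
    return (u * v) @ W2 + x           # multiplicative gate plus residual connection
```

The multiplicative gate `u * v` lets the block model feature interactions directly, one reason gated MLPs are a plausible fit for heavily numerical tabular data.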
- Asia > China > Zhejiang Province > Hangzhou (0.06)
- North America > United States > New York > New York County > New York City (0.04)
- Asia > Singapore (0.04)
- Asia > Middle East > Iraq > Erbil Governorate > Erbil (0.04)