AITopics

Country:

North America > United States > Maryland (0.04)
North America > United States > New York (0.04)
North America > United States > California (0.04)

Genre: Research Report > New Finding (0.92)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.92)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)

Neural Information Processing Systems, Oct-8-2025, 18:49:33 GMT

5f38404edff6f3f642d6fa5892479c42-Paper-Datasets_and_Benchmarks.pdf

data mining, machine learning, natural language

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > New York > New York County > New York City (0.04)

Genre: Research Report (0.68)

Industry:

Information Technology > Security & Privacy (1.00)
Government > Tax (1.00)
Government > Regional Government > North America Government > United States Government (1.00)

Technology:

Information Technology > e-Commerce > Financial Technology (1.00)
Information Technology > Information Management (1.00)
Information Technology > Data Science > Data Mining (1.00)

Neural Information Processing Systems, Oct-8-2025, 17:19:15 GMT

EMBERSim: A Large-Scale Databank for Boosting Similarity Search in Malware Analysis Dragos

Moreover, we observe that the focus in the few related works falls on quantifying similarity in malware, often overlooking the clean data. This one-sided quantification is especially dangerous in the context of detection bypass.

data mining, machine learning, natural language

Genre: Research Report (0.46)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine > Therapeutic Area > Neurology (0.68)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.93)

Vandervorst, Félix, Deprez, Bruno, Verbeke, Wouter, Verdonck, Tim

Inductive inference of gradient-boosted decision trees on graphs for insurance fraud detection

arXiv.org Artificial Intelligence, Oct-8-2025

Graph-based methods are becoming increasingly popular in machine learning due to their ability to model complex data and relations. Insurance fraud is a prime use case, since false claims are often the result of organised criminals that stage accidents or the same persons filing erroneous claims on multiple policies. One challenge is that graph-based approaches struggle to find meaningful representations of the data because of the high class imbalance present in fraud data. Another is that insurance networks are heterogeneous and dynamic, given the changing relations among people, companies and policies. That is why gradient boosted tree approaches on tabular data still dominate the field. Therefore, we present a novel inductive graph gradient boosting machine (G-GBM) for supervised learning on heterogeneous and dynamic graphs. We show that our estimator competes with popular graph neural network approaches in an experiment using a variety of simulated random graphs. We demonstrate the power of G-GBM for insurance fraud detection using an open-source and a real-world, proprietary dataset. Given that the backbone model is a gradient boosting forest, we apply established explainability methods to gain better insights into the predictions made by G-GBM.

artificial intelligence, graph, machine learning

2510.05676

Country:

North America > United States (0.46)
Europe > Belgium > Flanders (0.28)

Genre: Research Report > New Finding (0.46)

Industry:

Law Enforcement & Public Safety > Fraud (1.00)
Banking & Finance > Insurance (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Jafari, Alireza, Yousefirizi, Fereshteh, Seydi, Vahid

DeepBoost-AF: A Novel Unsupervised Feature Learning and Gradient Boosting Fusion for Robust Atrial Fibrillation Detection in Raw ECG Signals

arXiv.org Artificial Intelligence, Oct-8-2025

Atrial fibrillation (AF) is a prevalent cardiac arrhythmia associated with elevated health risks, where timely detection is pivotal for mitigating stroke-related morbidity. This study introduces an innovative hybrid methodology integrating unsupervised deep learning and gradient boosting models to improve AF detection. A 19-layer deep convolutional autoencoder (DCAE) is coupled with three boosting classifiers, namely AdaBoost, XGBoost, and LightGBM (LGBM), to harness their complementary advantages while addressing individual limitations. The proposed framework uniquely combines DCAE with gradient boosting, enabling end-to-end AF identification devoid of manual feature extraction. The DCAE-LGBM model attains an F1-score of 95.20%, sensitivity of 99.99%, and inference latency of four seconds, outperforming existing methods and aligning with clinical deployment requirements. The DCAE integration significantly enhances boosting models, positioning this hybrid system as a reliable tool for automated AF detection in clinical settings.

algorithm, artificial intelligence, machine learning

2505.24085

Country: Asia > Middle East > Iran (0.14)

Genre: Research Report > New Finding (0.68)

Industry: Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)

arXiv.org Artificial Intelligence, Oct-7-2025

Bond-Centered Molecular Fingerprint Derivatives: A BBBP Dataset Study

Godin, Guillaume

A strong and fast baseline in molecular property prediction is a Random Forest (RF) trained on ECFP4/ECFP6 descriptors. In practice, the count-based variant of ECFP generally outperforms the binary variant, especially for classification. Recent deep-learning approaches can match or exceed these baselines, including pretrained transformer-CNN models (5) and graph neural networks such as Chemprop or AttentiveFP (6). Chemprop's key architectural choice is directed, bond-centered message passing, in contrast to the more common atom-centered formulations used by many MPNNs. Because much of the remaining architecture is comparable across message-passing GNNs, this raises a focused question: what concrete advantage does the bond-centered formulation confer over atom-centered approaches? To isolate this representational factor, we introduce a static Bond-Centered Fingerprint (BCFP) that mirrors Chemprop's bond-centric view, and we compare it directly against ECFP using a lightweight Random Forest or XGBoost pipeline on the Blood-Brain Barrier Penetration (BBBP) classification task. To our knowledge, this is the first study to propose BCFP and analyze its complementarity with ECFP (7). Our results indicate that concatenating atom- and bond-centered fingerprints yields efficient and effective models for BBBP prediction, clarifying why bond-centric message passing often appears among top-k performers while offering a simple, fast alternative to full neural architectures.
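The difference between count-based and binary fingerprints, and the concatenation of atom- and bond-centered vectors, can be illustrated with a toy sketch. This is not the paper's BCFP or a real ECFP implementation: the `fold` helper, the vector size, and the substructure identifier strings are all hypothetical, and a real pipeline would derive the identifiers from the molecular graph.

```python
import zlib


def fold(identifiers, size, use_counts=True):
    """Hash substructure identifiers into a fixed-length vector.

    With use_counts=True each slot stores how often it was hit (count variant);
    otherwise each slot is 0 or 1 (binary variant).
    """
    vector = [0] * size
    for identifier in identifiers:
        index = zlib.crc32(identifier.encode("utf-8")) % size
        vector[index] = vector[index] + 1 if use_counts else 1
    return vector


# Hypothetical identifiers (assumption): stand-ins for atom-centered and
# bond-centered environments of a small molecule.
atom_environments = ["C", "C", "O", "C-C", "C-O"]
bond_environments = ["C~C", "C~O", "C~C~O"]

atom_counts = fold(atom_environments, 16)
atom_bits = fold(atom_environments, 16, use_counts=False)
bond_counts = fold(bond_environments, 16)

# Concatenation yields one feature vector for a downstream classifier
# such as a Random Forest or XGBoost model.
combined = atom_counts + bond_counts
```

The count vector keeps multiplicity (the repeated "C" environment is counted twice), which the binary vector discards; concatenation simply places the two views side by side.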

artificial intelligence, bcfp, machine learning

2510.04837

Country: North America > United States > California > San Francisco County > San Francisco (0.14)

Genre: Research Report > New Finding (0.48)

Industry: Health & Medicine (0.36)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.97)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.54)

Wang, Yuandou, Gunnarsson, Filip, Hai, Rihan

IMLP: An Energy-Efficient Continual Learning Method for Tabular Data Streams

arXiv.org Artificial Intelligence, Oct-7-2025

Tabular data streams are rapidly emerging as a dominant modality for real-time decision-making in healthcare, finance, and the Internet of Things (IoT). These applications commonly run on edge and mobile devices, where energy budgets, memory, and compute are strictly limited. Continual learning (CL) addresses such dynamics by training models sequentially on task streams while preserving prior knowledge and consolidating new knowledge. While recent CL work has advanced in mitigating catastrophic forgetting and improving knowledge transfer, the practical requirements of energy and memory efficiency for tabular data streams remain underexplored. In particular, existing CL solutions mostly depend on replay mechanisms whose buffers grow over time and exacerbate resource costs. We propose a context-aware incremental Multi-Layer Perceptron (IMLP), a compact continual learner for tabular data streams. IMLP incorporates a windowed scaled dot-product attention over a sliding latent feature buffer, enabling constant-size memory and avoiding storing raw data. The attended context is concatenated with current features and processed by shared feed-forward layers, yielding lightweight per-segment updates. To assess practical deployability, we introduce NetScore-T, a tunable metric coupling balanced accuracy with energy for Pareto-aware comparison across models and datasets. IMLP achieves up to $27.6\times$ higher energy efficiency than TabNet and $85.5\times$ higher than TabPFN, while maintaining competitive average accuracy. Overall, IMLP provides an easy-to-deploy, energy-efficient alternative to full retraining for tabular data streams.
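The windowed attention described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the IMLP implementation: the class name `SlidingContext` is hypothetical, the latent vectors are passed in directly instead of coming from a learned layer, and the query, key, and value projections are taken to be the identity.

```python
import math
from collections import deque


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, buffer):
    """Scaled dot-product attention, using the buffer as both keys and values."""
    if not buffer:
        return [0.0] * len(query)  # no history yet, so the context is all zeros
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in buffer]
    weights = softmax(scores)
    return [sum(w * vec[i] for w, vec in zip(weights, buffer))
            for i in range(len(query))]


class SlidingContext:
    """Constant-size buffer of latent vectors; raw inputs are never stored."""

    def __init__(self, window):
        self.buffer = deque(maxlen=window)  # oldest entries drop out automatically

    def step(self, latent):
        context = attend(latent, self.buffer)
        self.buffer.append(list(latent))
        # Concatenate the attended context with the current features; in the
        # abstract this vector then feeds shared feed-forward layers.
        return context + list(latent)


ctx = SlidingContext(window=3)
for t in range(5):
    out = ctx.step([float(t), 1.0])
```

Because the `deque` has a fixed `maxlen`, memory stays constant no matter how long the stream runs, which is the property the abstract contrasts with growing replay buffers.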

artificial intelligence, learning, machine learning

2510.0466

Genre: Research Report (1.00)

Industry:

Health & Medicine (1.00)
Energy (0.70)
Information Technology > Smart Houses & Appliances (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.68)

arXiv.org Artificial Intelligence, Oct-7-2025

Towards Carbon-Aware Container Orchestration: Predicting Workload Energy Consumption with Federated Learning

Saad, Zainab, Yang, Jialin, Leung, Henry, Drew, Steve

The growing reliance on large-scale data centers to run resource-intensive workloads has significantly increased the global carbon footprint, underscoring the need for sustainable computing solutions. While container orchestration platforms like Kubernetes help optimize workload scheduling to reduce carbon emissions, existing methods often depend on centralized machine learning models that raise privacy concerns and struggle to generalize across diverse environments. In this paper, we propose a federated learning approach for energy consumption prediction that preserves data privacy by keeping sensitive operational data within individual enterprises. By extending the Kubernetes Efficient Power Level Exporter (Kepler), our framework trains XGBoost models collaboratively across distributed clients using Flower's FedXgbBagging aggregation, a bagging-based strategy, eliminating the need for centralized data sharing. Experimental results on the SPECPower benchmark dataset show that our FL-based approach achieves 11.7 percent lower Mean Absolute Error compared to a centralized baseline. This work addresses the unresolved trade-off between data privacy and energy prediction efficiency in prior systems such as Kepler and CASPER and offers enterprises a viable pathway toward sustainable cloud computing without compromising operational privacy.
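The general idea of bagging-style aggregation of boosted trees across clients can be illustrated with a toy sketch. This is not Flower's FedXgbBagging implementation or the paper's pipeline: it assumes one-feature regression stumps in place of XGBoost trees, averages the client stumps within each round as the bagging step, and uses hypothetical client data. The point it shows is that only model parameters, never raw samples, leave a client.

```python
from statistics import mean


def fit_stump(xs, residuals):
    """Fit a one-split regression stump on a single feature by least squares."""
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        lo, hi = mean(left), mean(right)
        error = sum((r - lo) ** 2 for r in left) + sum((r - hi) ** 2 for r in right)
        if best is None or error < best[0]:
            best = (error, threshold, lo, hi)
    if best is None:  # every x is identical, so fall back to a constant
        constant = mean(residuals)
        return (xs[0], constant, constant)
    return best[1:]


def stump_predict(stump, x):
    threshold, left_value, right_value = stump
    return left_value if x <= threshold else right_value


def global_predict(rounds, x, learning_rate):
    """Each round contributes the average (bag) of that round's client stumps."""
    return sum(learning_rate * mean(stump_predict(s, x) for s in bag)
               for bag in rounds)


def federated_round(rounds, clients, learning_rate):
    """Clients fit stumps on local residuals; only stump parameters are shared."""
    bag = []
    for xs, ys in clients:
        residuals = [y - global_predict(rounds, x, learning_rate)
                     for x, y in zip(xs, ys)]
        bag.append(fit_stump(xs, residuals))
    rounds.append(bag)


def mean_absolute_error(rounds, clients, learning_rate):
    return mean(abs(y - global_predict(rounds, x, learning_rate))
                for xs, ys in clients for x, y in zip(xs, ys))


# Two hypothetical clients holding private samples of y = 2x.
clients = [([1.0, 3.0, 5.0], [2.0, 6.0, 10.0]),
           ([2.0, 4.0, 6.0], [4.0, 8.0, 12.0])]
rounds = []
mae_before = mean_absolute_error(rounds, clients, 0.5)
for _ in range(10):
    federated_round(rounds, clients, 0.5)
mae_after = mean_absolute_error(rounds, clients, 0.5)
```

In this toy setup the server never sees `xs` or `ys`; it only collects the `(threshold, left_value, right_value)` tuples, which mirrors the privacy argument in the abstract.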

cloud computing, machine learning, reinforcement learning

2510.0397

Country: North America > Canada > Alberta > Census Division No. 6 > Calgary Metropolitan Region > Calgary (0.14)

Genre: Research Report (0.52)

Industry:

Information Technology > Security & Privacy (1.00)
Energy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Cloud Computing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.52)

Neural Information Processing Systems, Oct-3-2025, 09:25:45 GMT

SnapBoost: A Heterogeneous Boosting Machine

Thomas Parnell

We note that while the subclasses used in practice (e.g., trees) may well be infinite beyond a simple [...] Our proposed method for solving this optimization problem is presented in full in Algorithm 1. The supplemental material contains exemplary code for Algorithm 1 that uses generic scikit-learn regressors.
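A boosting loop that mixes base learners from several subclasses can be sketched as below. This is not Algorithm 1 from the paper, which the excerpt does not reproduce: it assumes, as the title suggests, that each iteration draws its base learner from one of several hypothesis subclasses, and it uses plain squared-loss residual fitting with two hand-written learners (`fit_stump`, `fit_line`) in place of scikit-learn regressors. The data and sampling probabilities are hypothetical.

```python
import random
from statistics import mean


def fit_stump(xs, targets):
    """Subclass A: one-split regression stump fitted by least squares."""
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [t for x, t in zip(xs, targets) if x <= threshold]
        right = [t for x, t in zip(xs, targets) if x > threshold]
        lo, hi = mean(left), mean(right)
        error = sum((t - lo) ** 2 for t in left) + sum((t - hi) ** 2 for t in right)
        if best is None or error < best[0]:
            best = (error, threshold, lo, hi)
    if best is None:  # every x is identical, so fall back to a constant
        constant = mean(targets)
        return lambda x: constant
    _, threshold, lo, hi = best
    return lambda x: lo if x <= threshold else hi


def fit_line(xs, targets):
    """Subclass B: least-squares straight line."""
    x_bar, t_bar = mean(xs), mean(targets)
    spread = sum((x - x_bar) ** 2 for x in xs)
    slope = (sum((x - x_bar) * (t - t_bar) for x, t in zip(xs, targets)) / spread
             if spread else 0.0)
    return lambda x: t_bar + slope * (x - x_bar)


def heterogeneous_boost(xs, ys, fitters, probabilities, rounds, learning_rate, seed=0):
    """Boost on squared loss, sampling the base-learner subclass each round."""
    rng = random.Random(seed)
    ensemble = []
    predictions = [0.0] * len(xs)
    for _ in range(rounds):
        fit = rng.choices(fitters, weights=probabilities, k=1)[0]
        # For squared loss the negative gradient is the residual.
        residuals = [y - p for y, p in zip(ys, predictions)]
        learner = fit(xs, residuals)
        ensemble.append(learner)
        predictions = [p + learning_rate * learner(x)
                       for p, x in zip(predictions, xs)]
    return lambda x: sum(learning_rate * h(x) for h in ensemble)


def mse(model, xs, ys):
    return mean((y - model(x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data: a linear trend plus a step, so both subclasses are useful.
xs = [float(i) for i in range(8)]
ys = [0.5 * x + (3.0 if x >= 4 else 0.0) for x in xs]
model = heterogeneous_boost(xs, ys, [fit_stump, fit_line], [0.5, 0.5],
                            rounds=30, learning_rate=0.3)
baseline_error = mse(lambda x: 0.0, xs, ys)
boosted_error = mse(model, xs, ys)
```

The line learner captures the trend and the stump captures the step; swapping in other regressors only requires another function with the same `fit(xs, targets)` shape.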

hypothesis, iteration, snapboost