samme
Appendices: This is the supplemental material for "Optimization and Generalization Analysis of Transduction through Gradient Boosting and Application to Multi-scale Graph Neural Networks"
We give proofs for the theorems and propositions in the order in which they appear in the main paper. We prove a more detailed claim, of which Proposition 1 is a part. The following are equivalent: 1. There exist α, β such that α > β ≥ 0 and Z satisfies the (α, β, g)-w.l.c. 2. ⟨Z, g⟩ > 0.
GAdaBoost: An Efficient and Robust AdaBoost Algorithm Based on Granular-Ball Structure
Xie, Qin, Zhang, Qinghua, Xia, Shuyin, Zhou, Xinran, Wang, Guoyin
Adaptive Boosting (AdaBoost) faces significant challenges posed by label noise, especially in multiclass classification tasks. Existing methods either lack mechanisms to handle label noise effectively or suffer from high computational costs due to redundant data usage. Inspired by granular computing, this paper proposes granular adaptive boosting (GAdaBoost), a novel two-stage framework comprising a data granulation stage and an adaptive boosting stage, to enhance efficiency and robustness under noisy conditions. To validate its feasibility, an extension of SAMME, termed GAdaBoost.SA, is proposed. Specifically, first, a granular-ball generation method is designed to compress data while preserving diversity and mitigating label noise. Second, the granular ball-based SAMME algorithm focuses on granular balls rather than individual samples, improving efficiency and reducing sensitivity to noise. Experimental results on some noisy datasets show that the proposed approach achieves superior robustness and efficiency compared with existing methods, demonstrating that this work effectively extends AdaBoost and SAMME.
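For orientation, one round of plain SAMME, the algorithm that GAdaBoost.SA extends, can be sketched as below. This is a minimal illustration of the standard per-sample SAMME update only; it does not implement the granular-ball stage, and the function name `samme_round` and its arguments are hypothetical names chosen for this example.

```python
import numpy as np

def samme_round(weights, y_true, y_pred, n_classes):
    """One SAMME round: return (alpha, normalized updated sample weights).

    Hypothetical helper for illustration; operates on individual samples,
    not on granular balls.
    """
    weights = np.asarray(weights, dtype=float)
    miss = np.asarray(y_true) != np.asarray(y_pred)

    # Weighted error of the current weak classifier.
    err = weights[miss].sum() / weights.sum()
    # Clip to avoid log(0) or division by zero for perfect or useless learners.
    err = np.clip(err, 1e-12, 1 - 1e-12)

    # SAMME classifier weight: the log(K - 1) term is the multiclass correction.
    alpha = np.log((1 - err) / err) + np.log(n_classes - 1)

    # Up-weight misclassified samples, then renormalize.
    new_weights = weights * np.exp(alpha * miss)
    return alpha, new_weights / new_weights.sum()

# Four equally weighted samples, three classes, one misclassification.
alpha, w = samme_round([0.25] * 4, [0, 1, 2, 0], [0, 1, 2, 1], n_classes=3)
```

With K classes, alpha stays positive as long as the weighted error is below (K - 1)/K, so a weak learner only has to beat random guessing rather than reach 50% accuracy.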
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (0.93)
The SAMME.C2 algorithm for severely imbalanced multi-class classification
So, Banghee, Valdez, Emiliano A.
Classification predictive modeling involves the accurate assignment of observations in a dataset to target classes or categories. Real-world classification problems with severely imbalanced class distributions are increasingly common. In such problems, minority classes have far fewer observations to learn from than majority classes. Despite this sparsity, a minority class is often considered the more interesting class, yet developing a scientific learning algorithm suitable for its observations presents countless challenges. In this article, we suggest a novel multi-class classification algorithm, which we refer to as SAMME.C2, specialized to handle severely imbalanced classes. It blends the flexible mechanics of the boosting techniques from the SAMME algorithm, a multi-class classifier, and the Ada.C2 algorithm, a cost-sensitive binary classifier designed to address high class imbalance. Not only do we provide the resulting algorithm, but we also establish a scientific and statistical formulation of our proposed SAMME.C2 algorithm. Through numerical experiments examining various degrees of classifier difficulty, we demonstrate consistently superior performance of our proposed model.
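As a rough illustration of how a cost-sensitive term might be combined with a SAMME-style update, the sketch below scales each sample weight by a per-sample misclassification cost before computing the classifier weight and the weight update, in the spirit of Ada.C2. The exact SAMME.C2 formulation is not given in this abstract, so the update rule, the function name `cost_sensitive_round`, and the `costs` argument are all assumptions made for this example.

```python
import numpy as np

def cost_sensitive_round(weights, costs, y_true, y_pred, n_classes):
    """One cost-weighted multiclass boosting round (illustrative assumption).

    Returns (alpha, normalized updated sample weights). Costs are assumed
    to be positive and larger for minority-class samples.
    """
    w = np.asarray(weights, dtype=float)
    c = np.asarray(costs, dtype=float)
    miss = np.asarray(y_true) != np.asarray(y_pred)

    # Cost-scaled weight mass on correctly and incorrectly classified samples.
    cw = c * w
    eps = 1e-12  # guard against log(0) and division by zero
    correct_mass = max(cw[~miss].sum(), eps)
    wrong_mass = max(cw[miss].sum(), eps)

    # Classifier weight with the SAMME multiclass correction log(K - 1).
    alpha = np.log(correct_mass / wrong_mass) + np.log(n_classes - 1)

    # Costs enter the update, so costly mistakes are emphasized more strongly.
    new_w = cw * np.exp(alpha * miss)
    return alpha, new_w / new_w.sum()

# The last sample belongs to a minority class with cost 2 and is misclassified.
alpha, w = cost_sensitive_round([0.25] * 4, [1, 1, 1, 2],
                                [0, 0, 0, 2], [0, 0, 0, 0], n_classes=3)
```

Setting every cost to 1 reduces this sketch to the ordinary SAMME round, which is one way to sanity-check a cost-sensitive variant.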
- North America > United States > California > San Francisco County > San Francisco (0.14)
- North America > United States > New York (0.04)
- North America > United States > Michigan (0.04)
Cost-sensitive Multi-class AdaBoost for Understanding Driving Behavior with Telematics
So, Banghee, Boucher, Jean-Philippe, Valdez, Emiliano A.
Powered by telematics technology, insurers can now capture a wide range of data, such as distance traveled, how drivers brake, accelerate, or make turns, and travel frequency on each day of the week, to better decode driver behavior. Such additional information helps insurers improve risk assessments for usage-based insurance (UBI), an increasingly popular industry innovation. In this article, we explore how to integrate telematics information to better predict claims frequency. For motor insurance during a policy year, we typically observe a large proportion of drivers with zero claims, a smaller proportion with exactly one claim, and far fewer with two or more claims. We introduce the use of a cost-sensitive multi-class adaptive boosting (AdaBoost) algorithm, which we call SAMME.C2, to handle such imbalances. To calibrate the SAMME.C2 algorithm, we use empirical data collected from a telematics program in Canada, and we find improved assessment of driving behavior with telematics relative to traditional risk variables. We demonstrate that our algorithm can outperform other models that can handle class imbalances: SAMME, SAMME with SMOTE, RUSBoost, and SMOTEBoost. The sampled telematics data are observations from 2013-2016, of which 50,301 are used for training and another 21,574 for testing. Broadly speaking, the additional information derived from vehicle telematics helps refine the risk classification of UBI drivers.
- North America > United States > California > San Francisco County > San Francisco (0.14)
- North America > United States > New York (0.04)
- North America > United States > Connecticut > Tolland County > Storrs (0.04)