AITopics | Ensemble Learning

Collaborating Authors

Ensemble Learning

Ensemble methods use multiple learning algorithms to obtain better predictive performance than could be obtained from any of the constituent learning algorithms alone. (Wikipedia)

News Overviews Instructional Materials AI-Alerts Classics

Adversarial Learning for Feature Shift Detection and Correction

Neural Information Processing SystemsJan-19-2025, 19:58:03 GMT

Data shift is a phenomenon present in many real-world applications, and while there are multiple methods attempting to detect shifts, the task of localizing and correcting the features originating such shifts has not been studied in depth. Feature shifts can occur in many datasets, including in multi-sensor data, where some sensors are malfunctioning, or in tabular and structured data, including biomedical, financial, and survey data, where faulty standardization and data processing pipelines can lead to erroneous features. In this work, we explore using the principles of adversarial learning, where the information from several discriminators trained to distinguish between two distributions is used to both detect the corrupted features and fix them in order to remove the distribution shift between datasets. We show that mainstream supervised classifiers, such as random forest or gradient boosting trees, combined with simple iterative heuristics, can localize and correct feature shifts, outperforming current statistical and neural network-based techniques.

adversarial learning, dataset, feature shift detection and correction

Neural Information Processing Systems

Country: North America > Montserrat (0.09)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.65)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.64)

Add feedback

On the Gini-impurity Preservation For Privacy Random Forests

Neural Information Processing SystemsJan-19-2025, 14:29:49 GMT

Random forests have been one successful ensemble algorithms in machine learning. Various techniques have been utilized to preserve the privacy of random forests from anonymization, differential privacy, homomorphic encryption, etc., whereas it rarely takes into account some crucial ingredients of learning algorithm. This work presents a new encryption to preserve data's Gini impurity, which plays a crucial role during the construction of random forests. Our basic idea is to modify the structure of binary search tree to store several examples in each node, and encrypt data features by incorporating label and order information. Theoretically, we prove that our scheme preserves the minimum Gini impurity in ciphertexts without decrypting, and present the security guarantee for encryption.

encrypt data, gini-impurity preservation, privacy random forest, (3 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

Add feedback

Stability of Random Forests and Coverage of Random-Forest Prediction Intervals

Neural Information Processing SystemsJan-18-2025, 21:36:37 GMT

We establish stability of random forests under the mild condition that the squared response ( Y 2) does not have a heavy tail. In particular, our analysis holds for the practical version of random forests that is implemented in popular packages like \texttt{randomForest} in \texttt{R}. Empirical results show that stability may persist even beyond our assumption and hold for heavy-tailed Y 2 . Using the stability property, we prove a non-asymptotic lower bound for the coverage probability of prediction intervals constructed from the out-of-bag error of random forests. With another mild condition that is typically satisfied when Y is continuous, we also establish a complementary upper bound, which can be similarly established for the jackknife prediction interval constructed from an arbitrary stable algorithm.

coverage probability, random forest and coverage, random-forest prediction interval, (3 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

Add feedback

SketchBoost: Fast Gradient Boosted Decision Tree for Multioutput Problems

Neural Information Processing SystemsJan-18-2025, 07:38:20 GMT

Gradient Boosted Decision Tree (GBDT) is a widely-used machine learning algorithm that has been shown to achieve state-of-the-art results on many standard data science problems. We are interested in its application to multioutput problems when the output is highly multidimensional. Although there are highly effective GBDT implementations, their scalability to such problems is still unsatisfactory. In this paper, we propose novel methods aiming to accelerate the training process of GBDT in the multioutput scenario. The idea behind these methods lies in the approximate computation of a scoring function used to find the best split of decision trees.

fast gradient boosted decision tree, gradient boosted decision tree, sketchboost, (4 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Diagnosis (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.66)

Add feedback

Fake Advertisements Detection Using Automated Multimodal Learning: A Case Study for Vietnamese Real Estate Data

Nguyen, Duy, Nguyen, Trung T., Nguyen, Cuong V.

arXiv.org Artificial IntelligenceJan-18-2025

The popularity of e-commerce has given rise to fake advertisements that can expose users to financial and data risks while damaging the reputation of these e-commerce platforms. For these reasons, detecting and removing such fake advertisements are important for the success of e-commerce websites. In this paper, we propose FADAML, a novel end-to-end machine learning system to detect and filter out fake online advertisements. Our system combines techniques in multimodal machine learning and automated machine learning to achieve a high detection rate. As a case study, we apply FADAML to detect fake advertisements on popular Vietnamese real estate websites. Our experiments show that we can achieve 91.5% detection accuracy, which significantly outperforms three different state-of-the-art fake news detection systems.

advertisement, artificial intelligence, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2501.10848

Country:

Asia > Vietnam > Hồ Chí Minh City > Hồ Chí Minh City (0.04)
North America > United States > New York (0.04)
North America > Canada > Ontario > Toronto (0.04)
(3 more...)

Genre: Research Report > New Finding (0.68)

Industry:

Marketing (1.00)
Banking & Finance > Real Estate (1.00)
Information Technology > Services > e-Commerce Services (0.95)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.69)

Add feedback

Model Class Reliance for Random Forests

Neural Information Processing SystemsJan-16-2025, 22:58:14 GMT

Variable Importance (VI) has traditionally been cast as the process of estimating each variables contribution to a predictive model's overall performance. Recent research has sought to address this concern via analysis of Rashomon sets - sets of alternative model instances that exhibit equivalent predictive performance to some reference model, but which take different functional forms. Measures such as Model Class Reliance (MCR) have been proposed, that are computed against Rashomon sets, in order to ascertain how much a variable must be relied on to make robust predictions, or whether alternatives exist. If MCR range is tight, we have no choice but to use a variable; if range is high then there exists competing, perhaps fairer models, that provide alternative explanations of the phenomena being examined. Applications are wide, from enabling construction of fairer' models in areas such as recidivism, health analytics and ethical marketing.

estimation, model class reliance, random forest, (3 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.44)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.44)

Add feedback

Gradient Boosted Normalizing Flows

Neural Information Processing SystemsJan-16-2025, 18:50:13 GMT

By chaining a sequence of differentiable invertible transformations, normalizing flows (NF) provide an expressive method of posterior approximation, exact density evaluation, and sampling. The trend in normalizing flow literature has been to devise deeper, more complex transformations to achieve greater flexibility. We propose an alternative: Gradient Boosted Normalizing Flows (GBNF) model a density by successively adding new NF components with gradient boosting. Under the boosting framework, each new NF component optimizes a weighted likelihood objective, resulting in new components that are fit to the suitable residuals of the previously trained components. The GBNF formulation results in a mixture model structure, whose flexibility increases as more components are added.

complex transformation, gradient boosted normalizing flow, transformation, (1 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)

Add feedback

Shape-Based Single Object Classification Using Ensemble Method Classifiers

Kamarudin, Nur Shazwani, Makhtar, Mokhairi, Shamsuddin, Syadiah Nor Wan, Fadzli, Syed Abdullah

arXiv.org Artificial IntelligenceJan-16-2025

Nowadays, more and more images are available. Annotation and retrieval of the images pose classification problems, where each class is defined as the group of database images labelled with a common semantic label. Various systems have been proposed for content-based retrieval, as well as for image classification and indexing. In this paper, a hierarchical classification framework has been proposed for bridging the semantic gap effectively and achieving multi-category image classification. A well known pre-processing and post-processing method was used and applied to three problems; image segmentation, object identification and image classification. The method was applied to classify single object images from Amazon and Google datasets. The classification was tested for four different classifiers; BayesNetwork (BN), Random Forest (RF), Bagging and Vote. The estimated classification accuracies ranged from 20% to 99% (using 10-fold cross validation). The Bagging classifier presents the best performance, followed by the Random Forest classifier.

classification, classifier, dataset, (13 more...)

arXiv.org Artificial Intelligence

2501.09311

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
Oceania > New Zealand > North Island > Waikato (0.05)
North America > United States > New Jersey (0.04)
Asia > Malaysia (0.04)

Genre: Research Report (0.50)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision > Image Understanding (0.91)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.58)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.57)

Add feedback

Automating Credit Card Limit Adjustments Using Machine Learning

Pestana, Diego

arXiv.org Artificial IntelligenceJan-14-2025

Venezuelan banks have historically made credit card limit adjustment decisions manually through committees. However, since the number of credit card holders in Venezuela is expected to increase in the upcoming months due to economic improvements, manual decisions are starting to become unfeasible. In this project, a machine learning model that uses cost-sensitive learning is proposed to automate the task of handing out credit card limit increases. To accomplish this, several neural network and XGBoost models are trained and compared, leveraging Venezolano de Credito's data and using grid search with 10-fold cross-validation. The proposed model is ultimately chosen due to its superior balance of accuracy, cost-effectiveness, and interpretability. The model's performance is evaluated against the committee's decisions using Cohen's kappa coefficient, showing an almost perfect agreement.

adjustment, artificial intelligence, machine learning, (11 more...)

arXiv.org Artificial Intelligence

2501.10451

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > California > Los Angeles County > Los Angeles (0.14)
South America > Venezuela > Capital District > Caracas (0.04)
(4 more...)

Genre: Research Report (0.65)

Industry: Banking & Finance > Credit (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.73)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.73)

Add feedback

Quantized Training of Gradient Boosting Decision Trees

Neural Information Processing SystemsJan-13-2025, 14:11:58 GMT

Recent years have witnessed significant success in Gradient Boosting Decision Trees (GBDT) for a wide range of machine learning applications. Generally, a consensus about GBDT's training algorithms is gradients and statistics are computed based on high-precision floating points. In this paper, we investigate an essentially important question which has been largely ignored by the previous literature - how many bits are needed for representing gradients in training GBDT? To solve this mystery, we propose to quantize all the high-precision gradients in a very simple yet effective way in the GBDT's training algorithm. Surprisingly, both our theoretical analysis and empirical studies show that the necessary precisions of gradients without hurting any performance can be quite low, e.g., 2 or 3 bits.

decision tree, gbdt, quantized training, (6 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Diagnosis (0.64)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.64)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.64)

Add feedback