AITopics | Ensemble Learning


Ensemble Learning

Ensemble methods use multiple learning algorithms to obtain better predictive performance than could be obtained from any of the constituent learning algorithms alone. (Wikipedia)
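As a minimal sketch of the idea, the snippet below combines several base models by majority vote. The three threshold "models" and the helper names `majority_vote` and `ensemble_predict` are hypothetical, chosen only for illustration; voting is one of several ways to combine base learners.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label predicted by the largest number of base models."""
    # most_common(1) yields a list holding the (label, count) pair with the top count.
    label, _count = Counter(predictions).most_common(1)[0]
    return label


def ensemble_predict(models, x):
    """Ask every base model for a label, then combine the labels by voting."""
    return majority_vote([model(x) for model in models])


# Three hypothetical weak classifiers: each thresholds the input differently.
models = [
    lambda x: x > 2,
    lambda x: x > 4,
    lambda x: x > 6,
]

print(ensemble_predict(models, 5))  # two of the three models vote True
print(ensemble_predict(models, 3))  # two of the three models vote False
```

No single threshold matters on its own here; the ensemble's answer is whatever most of the base models agree on.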


RoNGBa: A Robustly Optimized Natural Gradient Boosting Training Approach with Leaf Number Clipping

Ren, Liliang, Sun, Gen, Wu, Jiaman

arXiv.org Machine Learning | Dec-4-2019

Natural gradient has recently been introduced to the field of boosting to enable generic probabilistic prediction. Natural gradient boosting shows promising performance improvements on small datasets due to better training dynamics, but it suffers from slow training, especially on large datasets. We present a replication study of NGBoost (Duan et al., 2019) training that carefully examines the impact of key hyperparameters under best-first decision tree learning. We find that, with the regularization of leaf number clipping, the performance of NGBoost can be largely improved via a better choice of hyperparameters. Experiments show that our approach significantly beats the state-of-the-art performance on various kinds of datasets from the UCI Machine Learning Repository while still achieving up to a 4.85x speedup compared with the original NGBoost approach.

dataset, gradient, ngboost, (13 more...)

arXiv.org Machine Learning

1912.02338

Country: North America > United States > California > San Diego County > San Diego (0.04)

Genre: Research Report (0.71)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.93)


Predicting Airbnb prices with machine learning and location data

#artificialintelligence | Dec-3-2019, 00:05:59 GMT

As part of the IBM Data Science Professional Certificate, we get to have a go at our very own Data Science Capstone, where we get a taste of what it is like to solve problems and answer questions like a data scientist. For my assignment, I decided to do yet another project that looks into the relationship between Airbnb prices and their determinants. Yes, there are several very cool ones like Laura Lewis's here. I would not have been able to do mine without reading and understanding hers (and her code), so kudos! However, since I'm all about transportation research, I added a little touch of geospatial analysis by looking into locational features as possible predictors. This post explains a bit of the project background, data collection, cleaning and pre-processing, modeling, and a quick wrap-up. For the complete notebook with all the code, you can check out the repo on my GitHub.

accessibility, neighbourhood, venue, (14 more...)

#artificialintelligence

Country: Europe > United Kingdom > Scotland > City of Edinburgh > Edinburgh (0.04)

Industry:

Consumer Products & Services > Hotels (0.67)
Consumer Products & Services > Food, Beverage, Tobacco & Cannabis (0.48)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.31)


XGBOD: Improving Supervised Outlier Detection with Unsupervised Representation Learning

Zhao, Yue, Hryniewicki, Maciej K.

arXiv.org Machine Learning | Nov-30-2019

A new semi-supervised ensemble algorithm called XGBOD (Extreme Gradient Boosting Outlier Detection) is proposed, described and demonstrated for the enhanced detection of outliers from normal observations in various practical datasets. The proposed framework combines the strengths of both supervised and unsupervised machine learning methods by creating a hybrid approach that exploits each of their individual performance capabilities in outlier detection. XGBOD uses multiple unsupervised outlier mining algorithms to extract useful representations from the underlying data that augment the predictive capabilities of an embedded supervised classifier on an improved feature space. The novel approach is shown to provide superior performance in comparison to competing individual detectors, the full ensemble and two existing representation learning based algorithms across seven outlier datasets.
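The feature-augmentation step described in this abstract can be sketched roughly as follows. This is an illustrative sketch, not the authors' implementation: the k-nearest-neighbour distance score, the helper names `knn_outlier_score` and `augment_with_scores`, and the toy points are all assumptions. The augmented rows would then be passed to a supervised classifier, which is omitted here.

```python
import math


def knn_outlier_score(points, k=2):
    """Unsupervised score: distance from each point to its k-th nearest neighbour."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores


def augment_with_scores(points, detectors):
    """Append one outlier-score column per detector to the original features."""
    columns = [detector(points) for detector in detectors]
    return [list(p) + [col[i] for col in columns] for i, p in enumerate(points)]


# Toy data: four points on a unit square plus one far-away point.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (10.0, 10.0)]

# Two hypothetical unsupervised detectors that differ only in k.
detectors = [
    lambda pts: knn_outlier_score(pts, k=1),
    lambda pts: knn_outlier_score(pts, k=2),
]

augmented = augment_with_scores(points, detectors)
for row in augmented:
    print(row)
```

Each row keeps its two original features and gains two score columns, so the far-away point stands out in the enlarged feature space before any labels are used.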

algorithm, comb, feature space, (12 more...)

arXiv.org Machine Learning

1912.0029

Country: North America > Canada > Ontario > Toronto (0.14)

Genre:

Research Report > New Finding (0.69)
Research Report > Experimental Study (0.47)

Industry:

Education (0.48)
Health & Medicine (0.31)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)


SecureGBM: Secure Multi-Party Gradient Boosting

Feng, Zhi, Xiong, Haoyi, Song, Chuanyuan, Yang, Sijia, Zhao, Baoxin, Wang, Licheng, Chen, Zeyu, Yang, Shengwen, Liu, Liping, Huan, Jun

arXiv.org Machine Learning | Nov-27-2019

Federated machine learning systems have been widely used to facilitate joint data analytics across distributed datasets owned by different parties that do not trust each other. In this paper, we propose SecureGBM, a novel Gradient Boosting Machines (GBM) framework built on a multi-party computation model based on semi-homomorphic encryption, in which every involved party can jointly obtain a shared gradient boosting machines model while protecting its own data from potential privacy leakage and inferential identification. More specifically, our work focuses on a "dual-party" secure learning scenario with two parties: both parties own a unique view (i.e., attributes or features) of the same group of samples, while only one party owns the labels. In this scenario, feature and label data may not be shared with others. To achieve this goal, we first extend LightGBM, a well-known implementation of tree-based GBM, by covering its key operations for training and inference with SEAL homomorphic encryption schemes. However, the performance of this re-implementation is significantly bottlenecked by the explosive inflation of the communication payloads, driven by ciphertexts whose size grows with the length of the plaintexts. We therefore propose to use stochastic approximation techniques to reduce the communication payloads while accelerating the overall training procedure in a statistical manner. Our experiments using real-world data show that SecureGBM can secure the communication and computation of the LightGBM training and inference procedures for both parties while losing less than 3% AUC, using the same number of iterations for gradient boosting, on a wide range of benchmark datasets.

algorithm, dataset, securegbm, (15 more...)

arXiv.org Machine Learning

1911.11997

Country:

Asia > China > Beijing > Beijing (0.04)
Asia > China > Guangdong Province > Shenzhen (0.04)
South America > Paraguay > Asunción > Asunción (0.04)
North America > United States > California (0.04)

Genre: Research Report > New Finding (0.68)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine (0.69)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)


An Alternative Cross Entropy Loss for Learning-to-Rank

Bruch, Sebastian

arXiv.org Machine Learning | Nov-26-2019

Listwise learning-to-rank methods form a powerful class of ranking algorithms that are widely adopted in applications such as information retrieval. These algorithms learn to rank a set of items by optimizing a loss that is a function of the entire set, as a surrogate for a typically non-differentiable ranking metric. Despite their empirical success, existing listwise methods are based on heuristics and remain theoretically ill-understood. In particular, none of the empirically successful loss functions are related to ranking metrics. In this work, we propose a cross entropy-based learning-to-rank loss function that is theoretically sound and is a convex bound on NDCG, a popular ranking metric. Furthermore, empirical evaluation of an implementation of the proposed method with gradient boosting machines on benchmark learning-to-rank datasets demonstrates the superiority of our proposed formulation over existing algorithms in quality and robustness.
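For readers unfamiliar with NDCG, here is a small sketch of how the metric is commonly computed. It assumes the exponential-gain, log2-discount variant, which is one common definition and not necessarily the exact one used in the paper; the function names are illustrative. It does not implement the proposed loss.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain of relevance labels listed in ranked order."""
    return sum(
        (2 ** rel - 1) / math.log2(rank + 2)
        for rank, rel in enumerate(relevances)
    )


def ndcg(scores, relevances):
    """NDCG of the ranking obtained by sorting items by descending model score."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranked = [relevances[i] for i in order]
    ideal_dcg = dcg(sorted(relevances, reverse=True))
    return dcg(ranked) / ideal_dcg if ideal_dcg > 0 else 0.0


relevances = [2, 1, 0]
print(ndcg([3.0, 2.0, 1.0], relevances))  # scores agree with the ideal order
print(ndcg([1.0, 2.0, 3.0], relevances))  # scores reverse the ideal order
```

Because the metric depends on the scores only through the sort, small changes in a score leave it unchanged until two items swap places, which is why a differentiable surrogate loss is needed for training.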

loss function, mart, ndcg, (13 more...)

arXiv.org Machine Learning

1911.09798

Genre: Research Report > Experimental Study (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.35)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.34)


Neural Random Forest Imitation

Reinders, Christoph, Rosenhahn, Bodo

arXiv.org Machine Learning | Nov-25-2019

Existing methods produce very inefficient architectures and do not scale. In this paper, we introduce a new method for generating data from a random forest and learning a neural network that imitates it. Without any additional training data, this transformation creates very efficient neural networks that learn the decision boundaries of a random forest. The generated model is fully differentiable and can be combined with the feature extraction in a single pipeline enabling further end-to-end processing. Experiments on several real-world benchmark datasets demonstrate outstanding performance in terms of scalability, accuracy, and learning with very few training examples. Compared to state-of-the-art mappings, we significantly reduce the network size while achieving the same or even improved accuracy due to better generalization.

decision tree, neural network, random forest, (14 more...)

arXiv.org Machine Learning

1911.10829

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > Germany > Lower Saxony > Hanover (0.04)
Asia > China (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)


Tuning Random Forest on Time Series Data | STATWORX

#artificialintelligence | Nov-23-2019, 12:31:40 GMT

I am a data scientist at STATWORX, and I enjoy making data make sense.

hyperparameter, time series data, time sery, (13 more...)

#artificialintelligence

Country:

North America > United States > New York (0.05)
Europe > Switzerland > Zürich > Zürich (0.05)
Europe > Austria > Vienna (0.05)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.44)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.44)


A Fast Sampling Gradient Tree Boosting Framework

Zhou, Daniel Chao, Jin, Zhongming, Zhang, Tong

arXiv.org Machine Learning | Nov-20-2019

As an adaptive, interpretable, robust, and accurate meta-algorithm for arbitrary differentiable loss functions, gradient tree boosting is one of the most popular machine learning techniques, though its computational cost severely limits its usage. Stochastic gradient boosting can be adopted to accelerate gradient boosting by uniformly sampling training instances, but its estimator can introduce high variance. This motivates us to optimize gradient tree boosting. We combine gradient tree boosting with importance sampling, which achieves better performance by reducing the stochastic variance. Furthermore, we use a regularizer to improve the diagonal approximation in the Newton step of gradient boosting. The theoretical analysis supports that our strategies achieve a linear convergence rate on logistic loss. Empirical results show that our algorithm achieves a 2.5x to 18x acceleration on two different gradient boosting algorithms (LogitBoost and LambdaMART) without appreciable performance loss.
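The variance argument can be illustrated with a generic importance-sampling sketch. The abstract does not state the paper's actual sampling distribution, so sampling proportionally to the gradient magnitude is an assumption here, as are the function name `importance_sample` and the toy gradients. The sketch also assumes at least one gradient is nonzero.

```python
import random


def importance_sample(gradients, m, rng):
    """Draw m indices with probability proportional to |gradient|.

    Returns the sampled indices and the weights 1 / (m * p_i) that keep the
    weighted sum an unbiased estimate of the full gradient sum.
    """
    total = sum(abs(g) for g in gradients)  # assumed to be nonzero
    probs = [abs(g) / total for g in gradients]
    idx = rng.choices(range(len(gradients)), weights=probs, k=m)
    weights = [1.0 / (m * probs[i]) for i in idx]
    return idx, weights


# Hypothetical per-instance gradients, all positive for simplicity.
gradients = [0.1, 0.5, 2.0, 0.4]
rng = random.Random(0)

idx, weights = importance_sample(gradients, m=3, rng=rng)
estimate = sum(w * gradients[i] for i, w in zip(idx, weights))

print(idx, estimate, sum(gradients))
```

With same-sign gradients every reweighted sample contributes the same amount, so the estimate matches the full sum regardless of which instances are drawn. Uniform sampling has no such property, which is the variance gap that importance sampling targets.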

algorithm, gradient, gradient tree, (16 more...)

arXiv.org Machine Learning

1911.0882

Country: North America > United States (0.04)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.36)


LionForests: Local Interpretation of Random Forests through Path Selection

Mollas, Ioannis, Tsoumakas, Grigorios, Bassiliades, Nick

arXiv.org Artificial Intelligence | Nov-20-2019

Towards a future where machine learning systems will integrate into every aspect of people's lives, researching methods to interpret such systems is necessary, instead of focusing exclusively on enhancing their performance. Enriching the trust between these systems and people will accelerate this integration process. Many medical and retail banking/finance applications use state-of-the-art machine learning techniques to predict certain aspects of new instances. Tree ensembles, like random forests, are widely accepted solutions for these tasks, while at the same time they are avoided due to their black-box, uninterpretable nature, creating an unreasonable paradox. In this paper, we provide a sequence of actions for shedding light on the predictions of the misjudged family of tree ensemble algorithms. By using classic unsupervised learning techniques and an enhanced similarity metric to wander among transparent trees inside a forest, following breadcrumbs, the interpretable essence of tree ensembles emerges. An explanation provided by these systems using our approach, which we call "LionForests", can be a simple, comprehensive rule.

association rule, explanation, prediction, (14 more...)

arXiv.org Artificial Intelligence

1911.0878

Country:

North America > United States (0.14)
North America > Jamaica (0.04)
Europe > Greece > Central Macedonia > Thessaloniki (0.04)
(30 more...)

Genre: Research Report (0.50)

Industry:

Information Technology > Security & Privacy (0.93)
Health & Medicine > Therapeutic Area (0.69)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.86)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.73)
(2 more...)


A model for predicting price polarity of real estate properties using information of real estate market websites

Vargas-Calderón, Vladimir, Camargo, Jorge E.

arXiv.org Machine Learning | Nov-19-2019

November 20, 2019

Abstract: This paper presents a model that uses the information that sellers publish on real estate market websites to predict whether a property has a higher or lower price than the average price of similar properties. The model learns the correlation between price and information (text descriptions and features) of real estate properties through automatic identification of latent semantic content given by a machine learning model based on doc2vec and xgboost. The proposed model was evaluated with a data set of 57,516 publications of real estate properties collected from 2016 to 2018 in the city of Bogotá. Results show that the accuracy of a classifier that involves text descriptions is slightly higher than that of a classifier that only uses features of the real estate properties, as text descriptions tend to contain detailed information about the property.

Keywords: housing price prediction · real estate property · machine learning · doc2vec · xgboost

1 Introduction: A fairly popular way for property sellers to advertise a property for sale is through a real estate market website, which guarantees many more possible buyers than just a for-sale sign on the street.

accuracy, information, text description, (14 more...)

arXiv.org Machine Learning

1911.08382

Country:

South America > Colombia (0.14)
Asia > Singapore (0.04)
North America > United States (0.04)
(2 more...)

Genre: Research Report > New Finding (0.34)

Industry: Banking & Finance > Real Estate (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.88)
