AITopics | Ensemble Learning

Ensemble Learning

Ensemble methods use multiple learning algorithms to obtain better predictive performance than could be obtained from any of the constituent learning algorithms alone. (Wikipedia)

Predictive Modeling: Picking the best model – Towards Data Science

#artificialintelligence, Feb-11-2019, 02:00:28 GMT

Whether you are building predictive models at work or just competing in a Kaggle competition, it's important to try several different models to find the one that best fits your data. I recently had the opportunity to compete with some very smart colleagues in a private Kaggle competition predicting faulty water pumps in Tanzania. After some data cleaning, I ran the following models, and I'll show you the results. First, we need to look at the data we're working with. In this particular data set, the features were in a separate file from the labels.
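A minimal sketch of comparing candidate models on the same cross-validation folds, assuming the feature and label files have already been joined into (feature vector, label) pairs. The two toy models (`majority_model`, `nearest_centroid_model`) and all helper names are hypothetical stand-ins, not the models used in the article.

```python
import random
from collections import Counter
from typing import Callable, Dict, List, Sequence, Tuple

Row = Tuple[List[float], str]                 # (feature vector, label)
Predictor = Callable[[List[float]], str]
Trainer = Callable[[Sequence[Row]], Predictor]


def majority_model(train: Sequence[Row]) -> Predictor:
    """Baseline: always predict the most frequent training label."""
    label = Counter(y for _, y in train).most_common(1)[0][0]
    return lambda x: label


def nearest_centroid_model(train: Sequence[Row]) -> Predictor:
    """Predict the label whose mean feature vector is closest (squared Euclidean distance)."""
    sums: Dict[str, List[float]] = {}
    counts: Counter = Counter()
    for x, y in train:
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] += 1
    centroids = {y: [v / counts[y] for v in s] for y, s in sums.items()}

    def predict(x: List[float]) -> str:
        return min(centroids,
                   key=lambda y: sum((a - b) ** 2 for a, b in zip(x, centroids[y])))
    return predict


def cv_accuracy(trainer: Trainer, data: Sequence[Row], k: int = 5, seed: int = 0) -> float:
    """k-fold cross-validated accuracy: every row is scored exactly once as test data."""
    rows = list(data)
    random.Random(seed).shuffle(rows)         # same seed => same folds for every model
    folds = [rows[i::k] for i in range(k)]
    correct = 0
    for i, test in enumerate(folds):
        train = [r for j, fold in enumerate(folds) if j != i for r in fold]
        predict = trainer(train)
        correct += sum(predict(x) == y for x, y in test)
    return correct / len(rows)


def pick_best(trainers: Dict[str, Trainer], data: Sequence[Row], k: int = 5):
    """Score each candidate on identical folds and return (best name, all scores)."""
    scores = {name: cv_accuracy(t, data, k) for name, t in trainers.items()}
    return max(scores, key=scores.get), scores
```

Scoring every candidate on identical folds keeps the comparison fair; a real run would swap in the actual classifiers (random forest, gradient boosting, and so on) as additional `Trainer` functions.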

artificial intelligence, machine learning, train and test

Country: Africa > Tanzania (0.25)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.53)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.35)

Artificial Intelligence and Machine Learning to Predict and Improve Efficiency in Manufacturing Industry

Hassani, Ibtissam El, Mazgualdi, Choumicha El, Masrour, Tawfik

arXiv.org Machine Learning, Feb-3-2019

Overall equipment effectiveness (OEE) is a widely used performance metric. Calculating it allows managers to identify the main losses that reduce machine effectiveness and then take the necessary decisions to improve the situation. However, this calculation is done a posteriori, which is often too late. In the present research, we implemented several machine learning algorithms, namely support vector machine (SVM), SVM optimized with a genetic algorithm, random forest, XGBoost, and deep learning, to predict the OEE value. The data used to train our models was provided by an automotive cable manufacturer. The results show that deep learning and random forest are more accurate and perform better at predicting OEE in our case study.
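The abstract does not spell out how OEE is computed. A minimal sketch using the common textbook decomposition OEE = availability × performance × quality; the record fields and the use of minutes are assumptions for illustration, not the variables used in the paper.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class ShiftRecord:
    planned_time_min: float       # planned production time (hypothetical field)
    downtime_min: float           # stops during the planned time
    ideal_cycle_time_min: float   # ideal time to produce one unit
    total_count: int              # units produced, good + defective
    good_count: int               # units produced without defects


def oee(r: ShiftRecord) -> Dict[str, float]:
    """Return the three OEE factors and their product."""
    run_time = r.planned_time_min - r.downtime_min
    availability = run_time / r.planned_time_min
    performance = r.ideal_cycle_time_min * r.total_count / run_time
    quality = r.good_count / r.total_count
    return {
        "availability": availability,
        "performance": performance,
        "quality": quality,
        "oee": availability * performance * quality,
    }
```

The product simplifies to good_count × ideal_cycle_time / planned_time, which is a convenient sanity check on the three factors.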

algorithm, learning, prediction

arXiv:1901.02256

Country:

Africa > Middle East > Morocco (0.05)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)

Optimal Minimal Margin Maximization with Boosting

Grønlund, Allan, Larsen, Kasper Green, Mathiasen, Alexander

arXiv.org Machine Learning, Jan-30-2019

Boosting algorithms produce a classifier by iteratively combining base hypotheses. It has been observed experimentally that the generalization error keeps improving even after zero training error is achieved. One popular explanation attributes this to improvements in margins. A common goal in a long line of research is to maximize the smallest margin using as few base hypotheses as possible, culminating in the AdaBoostV algorithm of Rätsch and Warmuth [JMLR'04]. AdaBoostV was later conjectured to yield an optimal trade-off between the number of hypotheses trained and the minimal margin over all training points (Nie et al. [JMLR'13]). Our main contribution is a new algorithm that refutes this conjecture. Furthermore, we prove a lower bound which implies that our new algorithm is optimal.
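For a weighted vote f(x) = sum_t alpha_t h_t(x) with h_t(x) in {-1, +1} and alpha_t >= 0, the normalized margin of a training example (x_i, y_i) is y_i f(x_i) / sum_t alpha_t. It is positive exactly when the vote classifies the example correctly, so zero training error means the minimal margin is positive. A minimal sketch of computing that minimal margin, the quantity these algorithms try to maximize; the function names are hypothetical.

```python
from typing import List, Sequence


def normalized_margins(alphas: Sequence[float],
                       predictions: Sequence[Sequence[int]],
                       labels: Sequence[int]) -> List[float]:
    """predictions[t][i] in {-1, +1} is hypothesis t's vote on example i; labels are +/-1."""
    if any(a < 0 for a in alphas):
        raise ValueError("hypothesis weights must be non-negative")
    total = sum(alphas)
    margins = []
    for i, y in enumerate(labels):
        vote = sum(a * preds[i] for a, preds in zip(alphas, predictions))
        margins.append(y * vote / total)      # lies in [-1, 1]
    return margins


def minimal_margin(alphas: Sequence[float],
                   predictions: Sequence[Sequence[int]],
                   labels: Sequence[int]) -> float:
    """Smallest normalized margin over all training points."""
    return min(normalized_margins(alphas, predictions, labels))
```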

algorithm, classifier, hypothesis

arXiv:1901.10789

Country: North America > United States > Texas > Travis County > Austin (0.04)

Genre:

Research Report (0.64)
Workflow (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Gradient Regularized Budgeted Boosting

Xu, Zhixiang Eddie, Kusner, Matt J., Weinberger, Kilian Q., Zheng, Alice X.

arXiv.org Machine Learning, Jan-26-2019

As machine learning increasingly moves toward real-world applications, controlling the test-time cost of algorithms becomes ever more crucial. Recent work, such as Greedy Miser and Speedboost, incorporates test-time budget constraints into the training procedure and learns classifiers that provably stay within budget (in expectation). However, so far these algorithms are limited to the supervised setting, where sufficient labeled data is available. In this paper we investigate the common scenario where labeled data is scarce but unlabeled data is abundant. We propose an algorithm that leverages the unlabeled data (through Laplace smoothing) and learns classifiers with budget constraints. Our model, based on gradient boosted regression trees (GBRT), is, to our knowledge, the first algorithm for semi-supervised budgeted learning.
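To make "test-time cost" concrete, here is a minimal sketch of the kind of quantity a budgeted tree ensemble must keep small. It assumes a simplified cost model: each feature has an extraction cost paid once per example, the first time any tree reads it, and evaluating the trees is free. The tree encoding and all function names are hypothetical, not taken from the paper.

```python
from typing import Dict, Sequence, Set, Tuple, Union

# A tree is either a leaf value or (feature_index, threshold, left_subtree, right_subtree).
Tree = Union[float, Tuple[int, float, "Tree", "Tree"]]


def predict_with_cost(trees: Sequence[Tree], x: Sequence[float],
                      feature_cost: Dict[int, float]) -> Tuple[float, float]:
    """Return the ensemble score for x and the cost of every feature that had to be read."""
    used: Set[int] = set()                    # features already paid for on this example
    score = 0.0
    for tree in trees:
        node = tree
        while isinstance(node, tuple):
            feature, threshold, left, right = node
            used.add(feature)
            node = left if x[feature] <= threshold else right
        score += node
    return score, sum(feature_cost[f] for f in used)


def expected_cost(trees: Sequence[Tree], X: Sequence[Sequence[float]],
                  feature_cost: Dict[int, float]) -> float:
    """Average per-example test-time cost over a sample of inputs."""
    return sum(predict_with_cost(trees, x, feature_cost)[1] for x in X) / len(X)


def within_budget(trees: Sequence[Tree], X: Sequence[Sequence[float]],
                  feature_cost: Dict[int, float], budget: float) -> bool:
    """Check the budget constraint in expectation over the sample X."""
    return expected_cost(trees, X, feature_cost) <= budget
```

Because an expensive feature is charged only once per example, reusing features that earlier trees already read is cheap; a budget-aware learner can exploit this when choosing splits.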

algorithm, regularization, unlabeled input

arXiv:1901.04065

Country:

North America > United States > Missouri > St. Louis County > St. Louis (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > Wisconsin > Dane County > Madison (0.04)

Genre: Research Report (1.00)

Industry: Health & Medicine > Therapeutic Area (0.70)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (0.57)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.35)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.35)

Faster Boosting with Smaller Memory

Alafate, Julaiti, Freund, Yoav

arXiv.org Machine Learning, Jan-25-2019

The two state-of-the-art implementations of boosted trees, XGBoost and LightGBM, can process large training sets extremely fast. However, this performance requires enough memory to hold two to three times the size of the training set. This paper presents an alternative approach to implementing boosted trees that achieves a significant speedup over XGBoost and LightGBM, especially when memory is small. This is achieved by combining two techniques, early stopping and stratified sampling, which are explained and analyzed in the paper. We describe our implementation and present experimental results to support our claims.

dataset, lightgbm, sparrow

arXiv:1901.09047

Country:

Asia > Japan (0.14)
North America > United States > California > San Diego County > San Diego (0.04)
Oceania > New Zealand > North Island > Waikato (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.91)

SecureBoost: A Lossless Federated Learning Framework

Cheng, Kewei, Fan, Tao, Jin, Yilun, Liu, Yang, Chen, Tianjian, Yang, Qiang

arXiv.org Machine Learning, Jan-25-2019

The protection of user privacy is an important concern in machine learning, as evidenced by the rollout of the General Data Protection Regulation (GDPR) in the European Union (EU) in May 2018. The GDPR is designed to give users more control over their personal data, which motivates us to explore machine learning frameworks that share data without violating user privacy. To meet this goal, we propose a novel lossless privacy-preserving tree-boosting system, SecureBoost, in the setting of federated learning. This federated-learning system allows a learning process to be conducted jointly over multiple parties that have partially overlapping user samples but different feature sets, which corresponds to a vertically partitioned virtual data set. An advantage of SecureBoost is that it provides the same level of accuracy as the non-privacy-preserving approach while revealing no information about each private data provider. We prove theoretically that the SecureBoost framework is as accurate as other non-federated gradient tree-boosting algorithms that bring the data into one place. In addition, along with a proof of security, we discuss what would be required to make the protocols completely secure.
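SecureBoost builds on gradient tree boosting, where a candidate split is scored only from sums of first-order gradients (g) and second-order gradients (h) on each side, so the party that finds splits never needs the raw feature values. A minimal sketch of the standard XGBoost-style gain computed from per-bucket sums of one feature; the homomorphic encryption that SecureBoost wraps around these sums is omitted, and the function names and default lambda and gamma are assumptions.

```python
from typing import Optional, Sequence, Tuple


def leaf_score(g: float, h: float, lam: float) -> float:
    """Structure score G^2 / (H + lambda) of one node."""
    return g * g / (h + lam)


def split_gain(gl: float, hl: float, gr: float, hr: float,
               lam: float = 1.0, gamma: float = 0.0) -> float:
    """Gain of splitting a node into left/right children, from their g and h sums."""
    return 0.5 * (leaf_score(gl, hl, lam) + leaf_score(gr, hr, lam)
                  - leaf_score(gl + gr, hl + hr, lam)) - gamma


def best_split_from_buckets(bucket_sums: Sequence[Tuple[float, float]],
                            lam: float = 1.0,
                            gamma: float = 0.0) -> Optional[Tuple[int, float]]:
    """bucket_sums[k] = (sum of g, sum of h) over the samples in bucket k of one feature,
    buckets ordered by feature value. In a SecureBoost-style protocol, these sums would be
    aggregated by the passive party on encrypted g/h and decrypted by the active party.
    Returns (k, gain) for the best split "buckets 0..k go left", or None if < 2 buckets."""
    total_g = sum(g for g, _ in bucket_sums)
    total_h = sum(h for _, h in bucket_sums)
    best: Optional[Tuple[int, float]] = None
    gl = hl = 0.0
    for k, (g, h) in enumerate(bucket_sums[:-1]):
        gl += g                               # running sums for the left child
        hl += h
        gain = split_gain(gl, hl, total_g - gl, total_h - hl, lam, gamma)
        if best is None or gain > best[1]:
            best = (k, gain)
    return best
```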

information, passive party, secureboost

arXiv:1901.08755

Country:

Europe (0.66)
North America > United States > California > Los Angeles County > Los Angeles (0.14)
Asia > China > Hong Kong (0.04)

Genre: Research Report > New Finding (0.46)

Industry:

Information Technology > Security & Privacy (1.00)
Government > Regional Government > Europe Government (0.48)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.70)

A XGBoost risk model via feature selection and Bayesian hyper-parameter optimization

Wang, Yan, Ni, Xuelei Sherry

arXiv.org Machine Learning, Jan-24-2019

This paper explores models based on the extreme gradient boosting (XGBoost) approach for business risk classification. Feature selection (FS) algorithms and hyper-parameter optimization are considered simultaneously during model training. The five most commonly used FS methods, namely weight by Gini, weight by Chi-square, hierarchical variable clustering, weight by correlation, and weight by information, are applied to alleviate the effect of redundant features. Two hyper-parameter optimization approaches, random search (RS) and the Bayesian tree-structured Parzen Estimator (TPE), are applied to XGBoost. The effect of the different FS and hyper-parameter optimization methods on model performance is investigated with the Wilcoxon signed-rank test. The performance of XGBoost is compared with the traditionally used logistic regression (LR) model in terms of classification accuracy, area under the curve (AUC), recall, and F1 score obtained from 10-fold cross-validation. Results show that hierarchical clustering is the optimal FS method for LR, while weight by Chi-square achieves the best performance in XGBoost. XGBoost with either TPE or RS optimization significantly outperforms LR. TPE optimization is superior to RS, yielding significantly higher accuracy and marginally higher AUC, recall, and F1 score. Furthermore, XGBoost with TPE tuning shows lower variability than with RS. Finally, the feature-importance ranking from XGBoost enhances model interpretation. Therefore, XGBoost with Bayesian TPE hyper-parameter optimization is an effective and powerful approach for business risk modeling.
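"Weight by Chi-square" is a filter-style FS method: each feature is scored by how strongly it depends on the class label. A minimal sketch, assuming categorical (or pre-binned) features and using the Pearson chi-square statistic of the feature-by-label contingency table; this is an illustration of the idea, not the authors' implementation, and the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def chi_square(feature: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """Pearson chi-square statistic of the feature-by-label contingency table."""
    n = len(feature)
    if n == 0 or n != len(labels):
        raise ValueError("feature and labels must be non-empty and equally long")
    observed = Counter(zip(feature, labels))
    feature_totals = Counter(feature)
    label_totals = Counter(labels)
    stat = 0.0
    for f, f_count in feature_totals.items():
        for y, y_count in label_totals.items():
            expected = f_count * y_count / n          # count expected under independence
            stat += (observed[(f, y)] - expected) ** 2 / expected
    return stat


def rank_by_chi_square(columns: Dict[str, Sequence[Hashable]],
                       labels: Sequence[Hashable]) -> List[str]:
    """Feature names sorted from most to least label-dependent."""
    return sorted(columns, key=lambda name: chi_square(columns[name], labels), reverse=True)
```

Keeping the top-ranked features and then tuning XGBoost on that subset mirrors the two-stage setup the abstract describes.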

fs method, optimization, xgboost

arXiv:1901.08433

Country: North America > United States > Utah (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Banking & Finance > Credit (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)

Gentle Introduction of XGBoost Library – Mohit Sharma – Medium

#artificialintelligence, Jan-20-2019, 00:02:02 GMT

In this article, you will discover XGBoost and get a gentle introduction to what it is, where it came from, and how you can learn more. Bagging: draw random samples of the data, build a learning algorithm on each sample, and take a simple average of their outputs to obtain the bagged probabilities. Boosting: similar, except that the samples are selected more intelligently, with each successive model giving more and more weight to observations that are hard to classify. XGBoost is an optimized distributed gradient boosting library designed to be highly efficient, flexible and portable.
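The two ideas above can be sketched as follows, assuming a generic, hypothetical `train_fn` that fits a model on a list of examples and returns a function scoring the probability of the positive class. The reweighting shown is the classic AdaBoost update, one concrete way of "giving more weight to hard-to-classify observations". XGBoost itself uses gradient boosting, which fits each new tree to the gradients of the loss, so this illustrates the boosting idea rather than XGBoost's internals.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple, TypeVar

Example = TypeVar("Example")


def bagged_probability(train_fn: Callable[[List[Example]], Callable[[object], float]],
                       data: Sequence[Example], x: object,
                       n_models: int = 25, seed: int = 0) -> float:
    """Bagging: fit one model per bootstrap sample and average their predicted probabilities."""
    rng = random.Random(seed)
    probs = []
    for _ in range(n_models):
        sample = [rng.choice(data) for _ in range(len(data))]  # n draws with replacement
        model = train_fn(sample)
        probs.append(model(x))
    return sum(probs) / len(probs)


def adaboost_reweight(weights: Sequence[float],
                      correct: Sequence[bool]) -> Tuple[List[float], float]:
    """One AdaBoost round: up-weight misclassified examples, down-weight the rest.

    Returns the renormalized weights and the weak learner's vote weight alpha.
    """
    total = sum(weights)
    err = sum(w for w, ok in zip(weights, correct) if not ok) / total
    if not 0.0 < err < 0.5:
        raise ValueError("weighted error must lie strictly between 0 and 0.5")
    alpha = 0.5 * math.log((1.0 - err) / err)
    updated = [w * math.exp(-alpha if ok else alpha) for w, ok in zip(weights, correct)]
    z = sum(updated)
    return [w / z for w in updated], alpha
```

After an AdaBoost update, the misclassified examples always carry exactly half of the total weight, which forces the next weak learner to concentrate on them.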

artificial intelligence, machine learning, xgboost

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)

Machine learning for predicting thermal power consumption of the Mars Express Spacecraft

Petković, Matej, Boumghar, Redouane, Breskvar, Martin, Džeroski, Sašo, Kocev, Dragi, Levatić, Jurica, Lucas, Luke, Osojnik, Aljaž, Ženko, Bernard, Simidjievski, Nikola

arXiv.org Machine Learning, Jan-16-2019

The thermal subsystem of the Mars Express (MEX) spacecraft keeps on-board equipment within its predefined operating temperature range. To plan and optimize the scientific operations of MEX, its operators need to estimate in advance, as accurately as possible, the power consumption of the thermal subsystem; the remaining power can then be allocated to scientific purposes. We present a machine learning pipeline for efficiently constructing accurate models that predict the power consumption of the thermal subsystem on board MEX. In particular, we employ state-of-the-art feature engineering approaches to transform raw telemetry data, which is then used to construct accurate models with different state-of-the-art machine learning methods. We show that the proposed pipeline considerably improves on our previous (competition-winning) work in terms of both time efficiency and predictive performance. Moreover, while achieving superior predictive performance, the constructed models also provide important insight into the spacecraft's behavior, allowing for further analyses and optimal planning of MEX's operation.

artificial intelligence, ensemble, machine learning

arXiv:1809.00542

Country:

Europe > France (0.14)
Europe > Slovenia (0.14)
Europe > Belgium (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)

Genre: Research Report (1.00)

Industry: Energy > Oil & Gas (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.71)

Gradient Boosted Feature Selection

Xu, Zhixiang Eddie, Huang, Gao, Weinberger, Kilian Q., Zheng, Alice X.

arXiv.org Machine Learning, Jan-13-2019

A feature selection algorithm should ideally satisfy four conditions: reliably extract relevant features; identify non-linear feature interactions; scale linearly with the number of features and dimensions; and allow the incorporation of known sparsity structure. In this work we propose a novel feature selection algorithm, Gradient Boosted Feature Selection (GBFS), which satisfies all four of these requirements. The algorithm is flexible, scalable, and surprisingly straightforward to implement, as it is based on a modification of Gradient Boosted Trees. We evaluate GBFS on several real-world data sets and show that it matches or outperforms other state-of-the-art feature selection algorithms, while scaling to larger data sets and naturally allowing for domain-specific side information.
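The core idea behind this kind of boosted feature selection is to make the tree learner pay a penalty the first time it splits on a feature, so later trees prefer features that are already selected. A minimal sketch under simplifying assumptions: depth-one regression stumps, squared loss, and a single penalty `mu`. The paper uses full regression trees and its own formulation, and all function names here are hypothetical.

```python
from typing import List, Optional, Sequence, Set, Tuple

Stump = Tuple[int, float, float, float]   # (feature, threshold, left_value, right_value)


def fit_penalized_stump(X: Sequence[Sequence[float]], residuals: Sequence[float],
                        selected: Set[int], mu: float) -> Stump:
    """Pick the split minimizing squared error, plus mu if the feature is not yet selected."""
    best: Optional[Tuple[float, int, float, float, float]] = None
    for f in range(len(X[0])):
        penalty = 0.0 if f in selected else mu
        for t in sorted({row[f] for row in X})[:-1]:          # candidate thresholds
            left = [r for row, r in zip(X, residuals) if row[f] <= t]
            right = [r for row, r in zip(X, residuals) if row[f] > t]
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if best is None or sse + penalty < best[0]:
                best = (sse + penalty, f, t, lm, rm)
    if best is None:
        raise ValueError("every feature is constant; no split is possible")
    _, f, t, lm, rm = best
    return f, t, lm, rm


def boosted_feature_selection(X: Sequence[Sequence[float]], y: Sequence[float],
                              n_rounds: int = 10, learning_rate: float = 0.5,
                              mu: float = 1.0) -> Tuple[List[Stump], Set[int]]:
    """Gradient boosting with squared loss; returns the stumps and the selected features."""
    pred = [0.0] * len(y)
    selected: Set[int] = set()
    stumps: List[Stump] = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]    # negative gradient of 1/2 squared loss
        f, t, lm, rm = fit_penalized_stump(X, residuals, selected, mu)
        selected.add(f)
        stump = (f, t, learning_rate * lm, learning_rate * rm)
        stumps.append(stump)
        pred = [p + (stump[2] if row[f] <= t else stump[3]) for p, row in zip(pred, X)]
    return stumps, selected
```

With `mu = 0` this is plain stump boosting and may use every informative feature; raising `mu` trades a little training loss for a smaller selected set.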

algorithm, feature selection, selection

arXiv:1901.04055

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > California > San Mateo County > Menlo Park (0.04)
Asia > China > Beijing > Beijing (0.04)

Genre: Research Report (0.66)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)