Ensemble Learning

XGBoost: its present-day powers and use cases


Originally published on Towards AI, the world's leading AI and technology news and media company.

Retrain, or not Retrain? Online Machine Learning with Gradient Boosting


Training a machine learning model requires energy, time, and patience. Smart data scientists organize experiments and track trials on historical data to deploy the best solution. Problems may arise when we pass newly available samples to our pre-built machine learning pipeline. In the case of predictive algorithms, the observed performance may diverge from the expected one, and the causes of the discrepancy can be varied.

XGBoost Alternative Base Learners


XGBoost, short for "Extreme Gradient Boosting," is one of the strongest machine learning algorithms for handling tabular data, a reputation well earned through its success in numerous Kaggle competitions. XGBoost is an ensemble machine learning algorithm that usually consists of Decision Trees; this default base learner is referred to as gbtree, short for "gradient boosted tree." Each tree in the ensemble learns from the mistakes of the trees that came before it. Although Decision Trees are generally preferred as base learners due to their excellent ensemble scores, in some cases alternative base learners may outperform them.

ICMAB - Defining inkjet printing conditions of superconducting cuprate films through machine learning


The design and optimization of new processing approaches for the development of rare-earth cuprate (REBCO) high-temperature superconductors is required to make their fabrication cost-effective and to promote market implementation. The exploration of a broad range of parameters enabled by these methods is the ideal scenario for a new set of high-throughput experimentation (HTE) and data-driven tools based on machine learning (ML) algorithms, envisaged to speed up this optimization in a low-cost, efficient manner compatible with industrialization. In this work, we developed a data-driven methodology that allows us to analyze and optimize the inkjet printing (IJP) deposition process of REBCO precursor solutions. A dataset containing 231 samples was used to build ML models. Linear and tree-based (Random Forest, AdaBoost and Gradient Boosting) regression algorithms were compared, reaching performances above 87%.
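The model comparison the abstract describes can be sketched with scikit-learn. The paper's 231-sample IJP dataset is not reproduced here, so a synthetic regression problem of the same size stands in:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import (AdaBoostRegressor, GradientBoostingRegressor,
                              RandomForestRegressor)
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Stand-in for the 231-sample deposition dataset (features and target are synthetic).
X, y = make_regression(n_samples=231, n_features=8, noise=15, random_state=42)

models = {
    "Linear": LinearRegression(),
    "Random Forest": RandomForestRegressor(random_state=42),
    "AdaBoost": AdaBoostRegressor(random_state=42),
    "Gradient Boosting": GradientBoostingRegressor(random_state=42),
}
results = {}
for name, model in models.items():
    results[name] = cross_val_score(model, X, y, cv=5, scoring="r2").mean()
    print(f"{name}: mean CV R^2 = {results[name]:.3f}")
```

With only ~200 samples, cross-validation rather than a single train/test split is the sensible way to compare these learners, which is presumably why the paper reports performance that way.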

Know About Ensemble Methods in Machine Learning - Analytics Vidhya


This article was published as a part of the Data Science Blogathon. The bias is the difference between the model's average prediction and the ground truth value, whereas the variance is the outcome of sensitivity to tiny perturbations in the training set. Excessive bias might cause an algorithm to miss relevant relationships between the intended outputs and the features (underfitting). An algorithm with high variance models the random noise in the training data (overfitting). The bias-variance tradeoff is the property that lowering the bias in the estimated parameters tends to increase the variance of the parameter estimates across samples.
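The tradeoff described above can be made concrete with a single decision tree of varying depth: a stump underfits (high bias), while an unconstrained tree memorizes the training noise (high variance). A small synthetic illustration (depths and thresholds are illustrative):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(400, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=400)  # noisy sine wave
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

scores = {}
for depth in [1, 4, None]:  # underfit, balanced, overfit
    tree = DecisionTreeRegressor(max_depth=depth, random_state=0).fit(X_tr, y_tr)
    scores[depth] = (tree.score(X_tr, y_tr), tree.score(X_te, y_te))
    print(f"max_depth={depth}: train R^2={scores[depth][0]:.2f}, "
          f"test R^2={scores[depth][1]:.2f}")
```

The unconstrained tree scores near-perfectly on the training set but drops on the test set (the signature of high variance), while the stump is poor on both (the signature of high bias).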

Techinfoplace Softwares Pvt Ltd.


Problem Statement: A target marketing campaign for a bank was undertaken to identify a segment of customers who are likely to respond to an insurance product. Here, the target variable is whether or not a customer bought the insurance product, and it depends on factors such as product usage in the last three months, demographics, transaction patterns (such as deposit amount, checking account, and branch of the bank), residential information (urban or rural), and so on.

Pruned Random Forests for Effective and Efficient Financial Data Analytics


It is evident that Machine Learning (ML) has touched all walks of life! From checking the weather forecast to applying for a loan or a credit card, ML is used in almost every aspect of our daily lives. In this chapter, ML is explored in terms of algorithms and applications. Special consideration is given to ML applications in the financial data analytics domain, including stock market analysis, fraud detection in financial transactions, credit risk analysis, loan defaulting rate analysis, and profit–loss analysis. The chapter establishes the significance of Random Forests as an effective machine learning method for a wide variety of financial applications.
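One way to prune a Random Forest is scikit-learn's minimal cost-complexity pruning (`ccp_alpha`), which prunes each tree after it is grown; the chapter's exact pruning method may differ, so this is a hedged sketch on synthetic data standing in for a financial dataset such as credit-default records:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a binary financial-classification task.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

nodes, accs = {}, {}
for alpha in [0.0, 0.005, 0.02]:
    rf = RandomForestClassifier(n_estimators=100, ccp_alpha=alpha,
                                random_state=0).fit(X_tr, y_tr)
    # Total node count across all trees: a proxy for model size and inference cost.
    nodes[alpha] = sum(t.tree_.node_count for t in rf.estimators_)
    accs[alpha] = rf.score(X_te, y_te)
    print(f"ccp_alpha={alpha}: total nodes={nodes[alpha]}, test acc={accs[alpha]:.3f}")
```

Stronger pruning shrinks the ensemble substantially, often with little loss of accuracy, which is the "effective and efficient" tradeoff the chapter title alludes to.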

Bojan Tunguz, Ph.D. on LinkedIn: #MachineLerning #DeepLearning #DataScience


Recently I came across this incredible survey paper on the use of neural networks for tabular data. After going through it carefully, I can confidently say that it's thus far THE best paper on the subject. It goes in depth into all the main issues that have stymied the use of NNs in this domain. The paper is very thoughtful, systematic, and fairly thorough. Despite what the authors claim, though, it is not the first paper on the topic, although it goes well beyond many recent papers on the subject. It also does not use as exhaustive a set of datasets as some of the other papers.

2022 Machine Learning A to Z : 5 Machine Learning Projects


Evaluation metrics to analyze the performance of models. Different methods to deal with imbalanced data. Implementation of Content and Collaborative based filtering. Implementation of Different algorithms used for Time Series forecasting.