AITopics

2106.0382

Country:

North America > United States > California > San Diego County > San Diego (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (1.00)

Industry: Health & Medicine > Therapeutic Area (0.46)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(3 more...)

Sprangers, Olivier, Schelter, Sebastian, de Rijke, Maarten

Probabilistic Gradient Boosting Machines for Large-Scale Probabilistic Regression

arXiv.org Machine Learning, Jun-6-2021

Gradient Boosting Machines (GBM) are hugely popular for solving tabular data problems. However, practitioners are not only interested in point predictions, but also in probabilistic predictions in order to quantify the uncertainty of the predictions. Creating such probabilistic predictions is difficult with existing GBM-based solutions: they either require training multiple models or they become too computationally expensive to be useful for large-scale settings. We propose Probabilistic Gradient Boosting Machines (PGBM), a method to create probabilistic predictions with a single ensemble of decision trees in a computationally efficient manner. PGBM approximates the leaf weights in a decision tree as a random variable, and approximates the mean and variance of each sample in a dataset via stochastic tree ensemble update equations. These learned moments allow us to subsequently sample from a specified distribution after training. We empirically demonstrate the advantages of PGBM compared to existing state-of-the-art methods: (i) PGBM enables probabilistic estimates without compromising on point performance in a single model, (ii) PGBM learns probabilistic estimates via a single model only (and without requiring multi-parameter boosting), and thereby offers a speedup of up to several orders of magnitude over existing state-of-the-art methods on large datasets, and (iii) PGBM achieves accurate probabilistic estimates in tasks with complex differentiable loss functions, such as hierarchical time series problems, where we observed up to 10% improvement in point forecasting performance and up to 300% improvement in probabilistic forecasting performance.
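
The last step the abstract describes, sampling from a chosen distribution once a mean and variance are known for a data point, can be illustrated with a minimal sketch. This is not the PGBM library's API: the function names, the choice of a normal distribution, and the example moments (mean 10, variance 4) are all assumptions made for illustration.

```python
import random
import statistics

def sample_predictions(mean, variance, n_samples=10_000, seed=0):
    """Draw samples from Normal(mean, variance) for one data point."""
    rng = random.Random(seed)
    std = variance ** 0.5
    return [rng.gauss(mean, std) for _ in range(n_samples)]

def central_interval(samples, coverage=0.9):
    """Empirical central interval, e.g. 5th to 95th percentile for 0.9.

    Assumes 0 < coverage < 1 and tails that are whole percentages.
    """
    cuts = statistics.quantiles(samples, n=100)  # 99 percentile cut points
    tail = round((1 - coverage) / 2 * 100)       # 5 when coverage is 0.9
    return cuts[tail - 1], cuts[100 - tail - 1]

# Hypothetical learned moments for a single sample (not from the paper).
samples = sample_predictions(mean=10.0, variance=4.0)
low, high = central_interval(samples)
print(f"90% interval: [{low:.2f}, {high:.2f}]")
```

Because only two moments are stored per data point, the distribution family can be changed after training by swapping the sampling call, which is the flexibility the abstract points to.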

dataset, pgbm, variance, (16 more...)

doi: 10.1145/3447548.3467278

2106.01682

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Europe > Netherlands > North Holland > Amsterdam (0.05)
Asia > Singapore (0.04)
(2 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

#artificialintelligence, Jun-3-2021, 04:30:44 GMT

Battle of the Ensemble -- Random Forest vs Gradient Boosting

If you have spent some time in the world of machine learning, you will undoubtedly have heard of a concept called the bias-variance tradeoff. It is one of the most important concepts any machine learning practitioner should learn and be aware of. Essentially, the bias-variance tradeoff is a conundrum in machine learning which states that models with low bias will usually have high variance and vice versa. Bias is the difference between the actual value and the expected value predicted by the model. A model with high bias is said to be oversimplified and, as a result, underfits the data.
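
The tradeoff can be made concrete with a small simulation. The sketch below is an illustration, not code from the article: the target function, noise level, and the two toy models (`fit_constant`, `fit_memorize`) are assumptions chosen to sit at opposite ends of the spectrum. It retrains each model on many noisy copies of the same data and measures squared bias and variance of the predictions.

```python
import math
import random
import statistics

def true_f(x):
    """Assumed ground-truth function for the simulation."""
    return math.sin(2 * math.pi * x)

def fit_constant(xs, ys):
    """Very simple model: always predict the training mean."""
    mean_y = statistics.fmean(ys)
    return lambda x: mean_y

def fit_memorize(xs, ys):
    """Very flexible model: reproduce the noisy training label at each x."""
    table = dict(zip(xs, ys))
    return lambda x: table[x]

def bias_variance(fit, xs, f, sigma=0.3, runs=200, seed=0):
    """Average squared bias and variance over xs, across retrained models."""
    rng = random.Random(seed)
    preds = []
    for _ in range(runs):
        ys = [f(x) + rng.gauss(0.0, sigma) for x in xs]
        model = fit(xs, ys)
        preds.append([model(x) for x in xs])
    bias_sq = 0.0
    variance = 0.0
    for i, x in enumerate(xs):
        column = [run[i] for run in preds]
        expected = statistics.fmean(column)      # expected prediction at x
        bias_sq += (expected - f(x)) ** 2        # gap to the actual value
        variance += statistics.pvariance(column, expected)
    return bias_sq / len(xs), variance / len(xs)

XS = [i / 20 for i in range(20)]
for name, fit in [("constant", fit_constant), ("memorize", fit_memorize)]:
    b, v = bias_variance(fit, XS, true_f)
    print(f"{name:9s} bias^2={b:.4f} variance={v:.4f}")
```

The constant model underfits (high bias, low variance), while the memorizing model tracks the noise (low bias, high variance).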

bias and variance, bias-variance tradeoff, ensemble method, (1 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)

arXiv.org Machine Learning, Jun-3-2021

Gradient Boosted Binary Histogram Ensemble for Large-scale Regression

Hang, Hanyuan, Huang, Tao, Cai, Yuchao, Yang, Hanfang, Lin, Zhouchen

In this paper, we propose a gradient boosting algorithm for large-scale regression problems called Gradient Boosted Binary Histogram Ensemble (GBBHE), based on binary histogram partition and ensemble learning. From the theoretical perspective, by assuming the Hölder continuity of the target function, we establish the statistical convergence rate of GBBHE in the spaces $C^{0,\alpha}$ and $C^{1,0}$, where a lower bound of the convergence rate for the base learner demonstrates the advantage of boosting. Moreover, in the space $C^{1,0}$, we prove that the number of iterations to achieve the fast convergence rate can be reduced by using an ensemble regressor as the base learner, which improves the computational efficiency. In the experiments, compared with other state-of-the-art algorithms such as gradient boosted regression tree (GBRT), Breiman's forest, and kernel-based methods, our GBBHE algorithm shows promising performance with less running time on large-scale datasets.

algorithm, binary histogram, convergence rate, (14 more...)

2106.01986

Country:

South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
North America > United States > California (0.04)
Asia > China > Beijing > Beijing (0.04)
(5 more...)

Genre: Research Report (1.00)

Industry: Energy (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)

#artificialintelligence, Jun-2-2021

Confidence Intervals for XGBoost - KDnuggets

Gradient boosting methods are a very powerful tool for making accurate predictions quickly, on large datasets, for complex variables that depend nonlinearly on many features. The underlying mathematical principles are explained with code here. Moreover, gradient boosting has been implemented in various ways: XGBoost, CatBoost, GradientBoostingRegressor, each having its own advantages, discussed here or here. Something these implementations all share is the ability to choose the objective that training minimizes. Even more interesting is the fact that XGBoost and CatBoost offer easy support for a custom objective function.
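
To show what a custom objective looks like, here is a minimal sketch in plain Python. The excerpt does not say which objective the article uses, so the quantile (pinball) loss below is an assumption, as are the function names. The function returns a per-sample gradient and Hessian, which is the pair a boosting library's custom objective is typically expected to supply. The pinball loss has a zero second derivative, so a constant Hessian of 1 is substituted as a simple workaround.

```python
def quantile_objective(preds, labels, alpha):
    """Gradient and (surrogate) Hessian of the pinball loss w.r.t. preds.

    Loss per sample: alpha * (y - p) if y > p, else (1 - alpha) * (p - y).
    """
    grad = [-alpha if y > p else 1.0 - alpha for p, y in zip(preds, labels)]
    hess = [1.0] * len(preds)  # assumed constant stand-in for the zero Hessian
    return grad, hess

def fit_constant(labels, alpha, steps=2000):
    """Fit one constant prediction with Newton-style steps on the objective."""
    c = sum(labels) / len(labels)
    for _ in range(steps):
        grad, hess = quantile_objective([c] * len(labels), labels, alpha)
        c -= sum(grad) / sum(hess)
    return c

labels = [float(v) for v in range(1, 101)]   # toy targets 1..100
lower = fit_constant(labels, alpha=0.1)
upper = fit_constant(labels, alpha=0.9)
print(f"approximate 10%-90% band: [{lower:.2f}, {upper:.2f}]")
```

Training one model with a low `alpha` and another with a high `alpha` gives the lower and upper ends of a prediction band, which is one common route to interval estimates with boosted trees.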

confidence interval, objective, objective function, (14 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.92)

arXiv.org Artificial Intelligence, May-31-2021

An exact counterfactual-example-based approach to tree-ensemble models interpretability

Blanchart, Pierre

Explaining the decisions of machine learning models is becoming a necessity in many areas where trust in the decisions of ML models is key to their accreditation and adoption. The ability to explain a model's decisions also makes it possible to provide a diagnosis in addition to the decision itself, which is highly valuable in scenarios such as fault detection. Unfortunately, high-performance models do not exhibit the transparency needed to make their decisions fully understandable. Moreover, the black-box approaches used to explain such model decisions lack accuracy in tracing back the exact cause of a model decision for a given input. Indeed, they cannot explicitly describe the decision regions of the model around that input, which is necessary to determine what influences the model towards one decision or the other. We thus asked ourselves the question: is there a category of high-performance models among the ones currently used for which we could explicitly and exactly characterise the decision regions in the input feature space using a geometrical characterisation? Surprisingly, we arrived at a positive answer for any model that falls into the category of tree ensemble models, which encompasses a wide range of high-performance models such as XGBoost, LightGBM and random forests. We were able to derive an exact geometrical characterisation of their decision regions in the form of a collection of multidimensional intervals. This characterisation makes it straightforward to compute the optimal counterfactual (CF) example associated with a query point. We demonstrate several possibilities of the approach, such as computing the CF example based only on a subset of features. This makes it possible to obtain more plausible explanations by adding prior knowledge about which variables the user can control. An adaptation to CF reasoning on regression problems is also envisaged.
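
Once decision regions are available as multidimensional intervals (axis-aligned boxes), finding the closest point with a different decision reduces to projecting the query onto each candidate box. The sketch below illustrates only that geometric step, with hand-written boxes and hypothetical function names. It is not the paper's algorithm for extracting the boxes from a tree ensemble, and it treats boxes as closed, whereas tree splits are usually half-open, so a boundary point may need a small nudge in practice.

```python
import math

def project_onto_box(point, box):
    """Closest point to `point` inside an axis-aligned box [(lo, hi), ...]."""
    return tuple(min(max(x, lo), hi) for x, (lo, hi) in zip(point, box))

def nearest_counterfactual(query, query_label, regions):
    """Return (cf_point, distance) over regions whose label differs.

    regions: list of (box, label) pairs; returns None if no region differs.
    """
    best = None
    for box, label in regions:
        if label == query_label:
            continue
        candidate = project_onto_box(query, box)
        dist = math.dist(query, candidate)   # Euclidean distance
        if best is None or dist < best[1]:
            best = (candidate, dist)
    return best

# Toy 2-D decision regions (assumed for illustration).
regions = [
    (((0.0, 0.5), (0.0, 1.0)), 0),
    (((0.5, 1.0), (0.0, 0.6)), 1),
    (((0.5, 1.0), (0.6, 1.0)), 0),
]
cf_point, cf_dist = nearest_counterfactual((0.2, 0.8), 0, regions)
print(cf_point, round(cf_dist, 4))
```

Restricting the counterfactual to a subset of features, as the abstract mentions, would amount to only considering boxes that already contain the query on the fixed coordinates.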

cf example, decomposition, dimension, (16 more...)

arXiv.org Artificial Intelligence

2105.1482

Country:

North America > United States > Iowa > Story County > Ames (0.04)
Europe > France (0.04)

Genre: Research Report (0.82)

Industry: Banking & Finance > Real Estate (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)

#artificialintelligence, May-30-2021, 07:31:06 GMT

Introducing TensorFlow Decision Forests

We are happy to open source TensorFlow Decision Forests (TF-DF). TF-DF is a collection of production-ready state-of-the-art algorithms for training, serving and interpreting decision forest models (including random forests and gradient boosted trees). You can now use these models for classification, regression and ranking tasks, with the flexibility and composability of TensorFlow and Keras. Decision forests are a family of machine learning algorithms with quality and speed competitive with (and often favorable to) neural networks, especially when you're working with tabular data. They're built from many decision trees, which makes them easy to use and understand, and you can take advantage of a plethora of interpretability tools and techniques that already exist today.

hyperparameter, input feature, tensorflow decision forest, (8 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.58)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.58)

Haben, Stephen, Arora, Siddharth, Giasemidis, Georgios, Voss, Marcus, Greetham, Danica Vukadinovic

Review of Low-Voltage Load Forecasting: Methods, Applications, and Recommendations

arXiv.org Machine Learning, May-30-2021

The increased digitalisation and monitoring of the energy system opens up numerous opportunities and solutions which can help to decarbonise the energy system. Applications on low voltage (LV), localised networks, such as community energy markets and smart storage will facilitate decarbonisation, but they will require advanced control and management. Reliable forecasting will be a necessary component of many of these systems to anticipate key features and uncertainties. Despite this urgent need, there has not yet been an extensive investigation into the current state-of-the-art of low voltage level forecasts, other than at the smart meter level. This paper aims to provide a comprehensive overview of the landscape, current approaches, core applications, challenges and recommendations. Another aim of this paper is to facilitate the continued improvement and advancement in this area. To this end, the paper also surveys some of the most relevant and promising trends. It establishes an open, community-driven list of the known LV level open datasets to encourage further research and development.

forecast, renewable energy, upstream oil & gas, (28 more...)

2106.00006

Country:

North America > United States (1.00)
Asia > China (0.67)
Europe > Germany (0.46)
(14 more...)

Genre:

Overview (1.00)
Research Report > Experimental Study (0.92)

Industry:

Energy > Renewable > Solar (1.00)
Energy > Power Industry (1.00)
Energy > Energy Storage (0.93)
(2 more...)

Technology:

Information Technology > Modeling & Simulation (1.00)
Information Technology > Data Science > Data Quality (1.00)
Information Technology > Data Science > Data Mining (1.00)
(15 more...)

#artificialintelligence, May-28-2021, 20:55:44 GMT

Ensemble Machine Learning With Python (7-Day Mini-Course)

Ensemble learning refers to machine learning models that combine the predictions from two or more models. Ensembles are an advanced approach to machine learning that is often used when the capability and skill of the predictions are more important than using a simple and understandable model. As such, they are often used by top and winning participants in machine learning competitions like the One Million Dollar Netflix Prize and Kaggle Competitions. Modern machine learning libraries like scikit-learn for Python provide a suite of advanced ensemble learning methods that are easy to configure and use correctly without data leakage, a common concern when using ensemble algorithms. In this crash course, you will discover how you can get started and confidently bring ensemble learning algorithms to your predictive modeling project with Python in seven days. This is a big and important post.
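
The core idea of combining predictions from two or more models can be sketched without any library. The example below is a toy illustration, not material from the course: the three threshold classifiers and the helper names are hypothetical. It shows the two simplest combination rules, majority voting for class labels and averaging for numeric predictions.

```python
from collections import Counter
from statistics import fmean

def majority_vote(labels):
    """Most frequent label; ties go to the label seen first."""
    return Counter(labels).most_common(1)[0][0]

def average(predictions):
    """Mean of numeric predictions from several regressors."""
    return fmean(predictions)

# Three hypothetical toy classifiers on a single numeric feature.
classifiers = [
    lambda x: "high" if x > 4 else "low",
    lambda x: "high" if x > 5 else "low",
    lambda x: "high" if x > 6 else "low",
]

def ensemble_classify(x):
    """Combine the individual votes into one decision."""
    return majority_vote([clf(x) for clf in classifiers])

print(ensemble_classify(5.5))      # two of the three models vote "high"
print(average([1.0, 2.0, 3.0]))    # combined regression prediction
```

Library implementations add the parts this sketch leaves out, such as training the member models and keeping validation data separate to avoid leakage.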

decision tree, ensemble, prediction, (14 more...)

Genre: Instructional Material > Course Syllabus & Notes (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)

#artificialintelligence, May-26-2021, 14:16:21 GMT

How to use PyCaret -- the library for low-code ML

When we approach supervised machine learning problems, it can be tempting to just see how a random forest or gradient boosting model performs and stop experimenting if we are satisfied with the results. What if you could compare many different models with just one line of code? What if you could reduce each step of the data science process from feature engineering to model deployment to just a few lines of code? This is exactly where PyCaret comes into play. PyCaret is a high-level, low-code Python library that makes it easy to compare, train, evaluate, tune, and deploy machine learning models with only a few lines of code.

library, model function, pycaret, (17 more...)

Country: North America > United States > California (0.05)

Genre: Workflow (0.49)

Industry: Education (0.35)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.69)