AITopics

Industry:

Banking & Finance (0.48)
Information Technology (0.30)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.50)
Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (0.37)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.30)

#artificialintelligence Nov-6-2019, 07:13:47 GMT

New version of SageMaker XGBoost algorithm available

Customers can now use a new version of the SageMaker XGBoost algorithm that is based on version 0.90 of the open-source XGBoost framework. XGBoost is a highly efficient and flexible algorithm for problems in regression, classification, and ranking.

new version, sagemaker xgboost algorithm

Industry: Retail > Online (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)

Bentéjac, Candice, Csörgő, Anna, Martínez-Muñoz, Gonzalo

A Comparative Analysis of XGBoost

arXiv.org Machine Learning Nov-5-2019

XGBoost is a scalable ensemble technique based on gradient boosting that has proven to be a reliable and efficient solver of machine learning challenges. This work presents a practical analysis of how this technique behaves in terms of training speed, generalization performance and parameter setup. In addition, a comprehensive comparison between XGBoost, random forests and gradient boosting has been performed using carefully tuned models as well as the default settings. The results of this comparison indicate that XGBoost is not necessarily the best choice under all circumstances. Finally, an extensive analysis of the XGBoost parameter tuning process is carried out.
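To make the term "gradient boosting" in the abstract above concrete, here is a minimal sketch in plain Python. It is not XGBoost and not the paper's experimental setup: it assumes one-dimensional inputs, squared-error loss, and depth-one trees (stumps), and the names `fit_stump`, `fit_boosting`, `predict` and `mse` are hypothetical helpers introduced only for illustration.

```python
def fit_stump(xs, residuals):
    """Find the single threshold split that minimises squared error."""
    values = sorted(set(xs))
    if len(values) < 2:
        raise ValueError("need at least two distinct x values to split")
    best = None
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]


def fit_boosting(xs, ys, n_rounds=50, learning_rate=0.1):
    """Fit stumps sequentially, each one on the current residuals."""
    base = sum(ys) / len(ys)          # start from the mean prediction
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        # For squared error, the negative gradient is just the residual.
        residuals = [y - p for y, p in zip(ys, preds)]
        threshold, left_value, right_value = fit_stump(xs, residuals)
        stumps.append((threshold, left_value, right_value))
        preds = [
            p + learning_rate * (left_value if x <= threshold else right_value)
            for x, p in zip(xs, preds)
        ]
    return {"base": base, "learning_rate": learning_rate, "stumps": stumps}


def predict(model, x):
    """Sum the base value and every shrunken stump contribution."""
    total = model["base"]
    for threshold, left_value, right_value in model["stumps"]:
        step = left_value if x <= threshold else right_value
        total += model["learning_rate"] * step
    return total


def mse(model, xs, ys):
    """Mean squared error of the model on the given data."""
    return sum((predict(model, x) - y) ** 2 for x, y in zip(xs, ys)) / len(ys)
```

Calling `fit_boosting(xs, ys, n_rounds=0)` gives the constant mean predictor, which is a convenient baseline when checking that added rounds reduce training error. Libraries such as XGBoost add regularization, second-order information and many engineering optimizations on top of this basic loop.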

artificial intelligence, health & medicine, xgboost, (18 more...)

1911.01914

Country:

North America > United States > New York (0.14)
Europe > Spain (0.14)
Europe > France (0.14)

Genre: Research Report > New Finding (0.46)

Industry:

Health & Medicine > Therapeutic Area (0.48)
Energy > Oil & Gas (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)

Alessandretti, Laura, Baronchelli, Andrea, He, Yang-Hui

Machine Learning meets Number Theory: The Data Science of Birch-Swinnerton-Dyer

arXiv.org Machine Learning Nov-4-2019

Empirical analysis is often the first step towards the birth of a conjecture. This is the case for the Birch-Swinnerton-Dyer (BSD) Conjecture describing the rational points on an elliptic curve, one of the most celebrated unsolved problems in mathematics. Here we extend the original empirical approach to the analysis of the Cremona database of quantities relevant to BSD, inspecting more than 2.5 million elliptic curves by means of the latest techniques in data science, machine learning and topological data analysis. Key quantities such as rank, Weierstrass coefficients, period, conductor, Tamagawa number, regulator and order of the Tate-Shafarevich group give rise to a high-dimensional point cloud whose statistical properties we investigate. We reveal patterns and distributions in the rank versus Weierstrass coefficients, as well as the Beta distribution of the BSD ratio of the quantities. Via gradient boosted trees, machine learning is applied to find inter-correlations amongst the various quantities. We anticipate that our approach will spark further research on the statistical properties of large datasets in number theory and, more generally, in pure mathematics.

coefficient, elliptic curve, quantity, (14 more...)

1911.02008

Country:

Europe > Denmark > Capital Region > Copenhagen (0.04)
North America > United States (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
(2 more...)

Genre:

Research Report > Experimental Study (0.46)
Research Report > New Finding (0.46)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.69)

#artificialintelligenceNov-2-2019, 20:38:03 GMT

XGBoost in Amazon SageMaker

SageMaker is Amazon Web Services' (AWS) cloud-based machine learning platform. It is fully managed and allows one to perform an entire data science workflow on the platform. In this post, I will show you how to read your data from AWS S3, upload your data to S3 while bypassing local storage, train a model, deploy an endpoint, perform predictions, and perform hyperparameter tuning. The data cleaning and feature engineering code are derived from this blog post, which is written by Andrew Long, who gave full permission to use his code. The dataset can be found here. Head over to your AWS dashboard and find SageMaker, and on the left sidebar, click on Notebook instances.

amazon sagemaker, sagemaker, xgboost, (1 more...)

Industry: Information Technology (0.40)

Technology:

Information Technology > Data Science (0.97)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.40)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.40)

Mentch, Lucas, Zhou, Siyu

Randomization as Regularization: A Degrees of Freedom Explanation for Random Forest Success

arXiv.org Machine Learning Oct-31-2019

Random forests remain among the most popular off-the-shelf supervised machine learning tools with a well-established track record of predictive accuracy in both regression and classification settings. Despite their empirical success as well as a bevy of recent work investigating their statistical properties, a full and satisfying explanation for their success has yet to be put forth. Here we aim to take a step forward in this direction by demonstrating that the additional randomness injected into individual trees serves as a form of implicit regularization, making random forests an ideal model in low signal-to-noise ratio (SNR) settings. Specifically, from a model-complexity perspective, we show that the mtry parameter in random forests serves much the same purpose as the shrinkage penalty in explicitly regularized regression procedures like lasso and ridge regression. To highlight this point, we design a randomized linear-model-based forward selection procedure intended as an analogue to tree-based random forests and demonstrate its surprisingly strong empirical performance. Numerous demonstrations on both real and synthetic data are provided.
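The mtry parameter mentioned above controls how many features are eligible at each split. The sketch below shows that mechanism in isolation, using only the standard library. It is a generic illustration of per-split feature subsampling for a regression split, not the authors' randomized forward selection procedure, and the function names `sum_squared_error` and `best_split_with_mtry` are hypothetical.

```python
import random


def sum_squared_error(values):
    """Squared deviation of values around their mean (0.0 if empty)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split_with_mtry(rows, targets, mtry, rng):
    """Search for the best split using only mtry randomly chosen features.

    Returns (candidates, best), where best is (feature_index, threshold)
    or None if no candidate feature has two distinct values.
    """
    n_features = len(rows[0])
    # The randomization step: only these features may be used for this split.
    candidates = rng.sample(range(n_features), mtry)
    best = None
    for j in candidates:
        values = sorted({row[j] for row in rows})
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2
            left = [t for row, t in zip(rows, targets) if row[j] <= threshold]
            right = [t for row, t in zip(rows, targets) if row[j] > threshold]
            sse = sum_squared_error(left) + sum_squared_error(right)
            if best is None or sse < best[0]:
                best = (sse, j, threshold)
    return candidates, (None if best is None else best[1:])
```

With `mtry` equal to the number of features, the search is the ordinary greedy split. Smaller values force some splits onto weaker features, which is the extra randomness the abstract describes as acting like implicit regularization.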

mtry, procedure, random forest, (16 more...)

1911.0019

Country:

Oceania > Australia > Tasmania (0.04)
North America > United States > New York (0.04)

Genre: Research Report (0.81)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.86)

#artificialintelligence Oct-29-2019, 22:52:43 GMT

SAS Tutorial: How to train forest models in SAS

In this SAS How To Tutorial, Cat Truxillo shows you how to train forest models in SAS. There are multiple ways to train forest models. Cat will show you how to train a forest using two different point-and-click methods. The first method uses SAS Visual Analytics, while in the second example Cat trains a forest in Model Studio, using SAS Viya. Before diving into the examples of how to create a forest model, Cat explains random forests and answers the question "what are random forests?".

sas, sas tutorial, train forest model, (6 more...)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.60)

Ibragimov, Bulat, Gusev, Gleb

Minimal Variance Sampling in Stochastic Gradient Boosting

arXiv.org Machine Learning Oct-29-2019

Stochastic Gradient Boosting (SGB) is a widely used approach to regularization of boosting models based on decision trees. It has been shown that, in many cases, random sampling at each iteration can lead to better generalization performance of the model and can also decrease the learning time. Different sampling approaches have been proposed in which probabilities are not uniform, and it is not currently clear which approach is the most effective. In this paper, we formulate the problem of randomization in SGB in terms of optimization of sampling probabilities to maximize the estimation accuracy of split scoring used to train decision trees. This optimization problem has a closed-form nearly optimal solution, and it leads to a new sampling technique, which we call Minimal Variance Sampling (MVS). The method both decreases the number of examples needed for each iteration of boosting and increases the quality of the model significantly as compared to the state-of-the-art sampling methods. The superiority of the algorithm was confirmed by introducing MVS as a new default option for subsampling in CatBoost, a gradient boosting library achieving state-of-the-art quality on various machine learning tasks.
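The abstract does not state the MVS formula, so the following is only a generic sketch of non-uniform subsampling with inverse-probability weights, the broad family of techniques the abstract discusses. It assumes each example is scored by the absolute value of its gradient, that probabilities take the capped form min(1, score / mu), and that mu is found by bisection so the expected sample size hits a target. The function names are hypothetical, and the paper's actual scoring and threshold rule may differ.

```python
import random


def sampling_probabilities(scores, expected_size):
    """Return p_i = min(1, score_i / mu) with sum(p_i) close to expected_size.

    Assumes non-negative scores; mu is located by bisection.
    """
    positive = sum(1 for s in scores if s > 0)
    if not 0 < expected_size <= positive:
        raise ValueError("expected_size must be in (0, number of positive scores]")

    def total(mu):
        return sum(min(1.0, s / mu) for s in scores)

    # total(mu) decreases as mu grows; at hi the uncapped sum equals the target,
    # so the capped sum is at most the target.
    lo, hi = 0.0, sum(scores) / expected_size
    for _ in range(100):
        mid = (lo + hi) / 2
        if total(mid) > expected_size:
            lo = mid
        else:
            hi = mid
    return [min(1.0, s / hi) for s in scores]


def sample_with_weights(gradients, expected_size, rng):
    """Keep example i with probability p_i and weight it by 1 / p_i.

    The inverse-probability weight keeps weighted sums over the sample
    unbiased estimates of the corresponding sums over all examples.
    """
    scores = [abs(g) for g in gradients]
    probs = sampling_probabilities(scores, expected_size)
    chosen = []
    for i, p in enumerate(probs):
        if rng.random() < p:
            chosen.append((i, 1.0 / p))
    return chosen
```

Examples with large gradients are always kept with weight 1, while small-gradient examples are kept rarely but with a large weight. Uniform subsampling, as in classical SGB, is the special case where every example gets the same probability.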

algorithm, dataset, gradient, (17 more...)

1910.13204

Country:

Europe > Russia > Central Federal District > Moscow Oblast > Moscow (0.04)
Asia > Russia (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.71)

#artificialintelligence Oct-27-2019, 12:39:01 GMT

[Webinar] Introduction to AutoML: A Hands-On Experience with H2O AutoML

In this episode, we introduce the concept of AutoML. Automated machine learning, or AutoML for short, lets you skip design steps in machine learning, including algorithm selection, model design and hyperparameter tuning. It can build high-performing machine learning models, and the more time you give it, the better the results tend to be. We will also have a hands-on experience with H2O AutoML.

automl, hands-on experience

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.55)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.48)

#artificialintelligence Oct-27-2019, 10:53:24 GMT

Things I learned about Random Forest Machine Learning Algorithm

At a meetup that I attended a couple of months ago in Sydney, I was introduced to an online machine learning course by fast.ai. I paid no attention to it at the time. This week, while working on a Kaggle competition and looking for ways to improve my score, I came across this course again. I decided to give it a try. Here is what I learned from the first lecture, which is a 1 hour 17 minute video on INTRODUCTION TO RANDOM FOREST.

data science, random forest, random forest machine learning algorithm, (8 more...)

Genre: Instructional Material (0.37)

Industry:

Materials > Paper & Forest Products > Forest Products (0.40)
Machinery > Agricultural & Farm Machinery (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.74)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.74)