Decision Tree Learning
Sparse learning with CART
Decision trees with binary splits are popularly constructed using Classification and Regression Trees (CART) methodology. For regression models, this approach recursively divides the data into two near-homogeneous daughter nodes according to a split point that maximizes the reduction in the sum-of-squares error (the impurity) along a particular variable. This paper aims to study the statistical properties of regression trees constructed with CART methodology. In doing so, we find that the training error is governed by the Pearson correlation between the optimal decision stump and response data in each node, which we bound by constructing a prior distribution on the split points and solving a nonlinear optimization problem. We leverage this connection between the training error and Pearson correlation to show that CART with cost-complexity pruning achieves an optimal complexity/goodness-of-fit tradeoff when the depth scales with the logarithm of the sample size. Data-dependent quantities, which adapt to the dimensionality and latent structure of the regression model, are seen to govern the rates of convergence of the prediction error.
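A minimal sketch of the split search described above, for a single numeric variable: it tries every midpoint between consecutive distinct sorted values and keeps the one with the largest decrease in sum-of-squares error. The function names are hypothetical, and the quadratic-time loop favors clarity over the incremental updates a production CART implementation would use.

```python
from typing import Sequence, Tuple


def sse(values: Sequence[float]) -> float:
    """Sum of squared deviations from the mean: the node impurity for regression."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (threshold, impurity decrease) of the best split x <= t versus x > t.

    Returns (nan, 0.0) if no split decreases the impurity.
    """
    parent = sse(y)
    pairs = sorted(zip(x, y))
    best_t, best_gain = float("nan"), 0.0
    for i in range(1, len(pairs)):
        # A threshold can only separate observations with distinct x values.
        if pairs[i - 1][0] == pairs[i][0]:
            continue
        left = [yi for _, yi in pairs[:i]]
        right = [yi for _, yi in pairs[i:]]
        gain = parent - sse(left) - sse(right)
        if gain > best_gain:
            # Place the threshold halfway between the two neighboring x values.
            best_t, best_gain = (pairs[i - 1][0] + pairs[i][0]) / 2, gain
    return best_t, best_gain


t, gain = best_split([1, 2, 3, 4, 5, 6], [1, 1, 1, 5, 5, 5])
print(t, gain)  # 3.5 24.0
```

Because a least-squares stump predicts the mean on each side, `gain / sse(y)` is the stump's R², which equals the squared Pearson correlation between the fitted stump and the responses; this is a standard least-squares identity, shown here only to illustrate the kind of link the abstract draws between training error and correlation.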
Soft Gradient Boosting Machine
Feng, Ji, Xu, Yi-Xuan, Jiang, Yuan, Zhou, Zhi-Hua
The Gradient Boosting Machine (GBM) has proven to be a successful function approximator and has been widely used in a variety of areas. However, since the base learners must be trained in sequential order, it is infeasible to parallelize training across base learners for speed-up. In addition, in online or incremental learning settings, GBMs achieve sub-optimal performance because previously trained base learners cannot adapt to the environment once trained. In this work, we propose the soft Gradient Boosting Machine (sGBM), which wires multiple differentiable base learners together; by injecting both local and global objectives inspired by gradient boosting, all base learners can then be jointly optimized with linear speed-up. When differentiable soft decision trees are used as base learners, such a device can be regarded as an alternative version of (hard) gradient boosting decision trees with extra benefits. Experimental results show that, given the same base learner, sGBM enjoys much higher time efficiency with better accuracy in both online and offline settings.
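For contrast with sGBM, the sketch below shows the conventional (hard) gradient boosting loop that the abstract refers to, assuming squared loss, a single numeric feature, and decision stumps as base learners. All names are hypothetical and this is not sGBM itself. The commented line inside the loop is what rules out parallel training: each stump is fit to residuals that depend on every stump before it.

```python
from typing import List, Sequence, Tuple

# A stump is (threshold, value if x <= threshold, value if x > threshold).
Stump = Tuple[float, float, float]


def fit_stump(x: Sequence[float], r: Sequence[float]) -> Stump:
    """Least-squares decision stump fitted to targets r on a single feature x."""
    pairs = sorted(zip(x, r))
    mean_all = sum(r) / len(r)
    # Fallback: no split, predict the overall mean everywhere.
    best: Stump = (float("inf"), mean_all, mean_all)
    best_err = sum((v - mean_all) ** 2 for v in r)
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue
        left = [v for _, v in pairs[:i]]
        right = [v for _, v in pairs[i:]]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
        if err < best_err:
            best_err = err
            best = ((pairs[i - 1][0] + pairs[i][0]) / 2, lm, rm)
    return best


def predict_stump(s: Stump, xi: float) -> float:
    threshold, left_value, right_value = s
    return left_value if xi <= threshold else right_value


def gradient_boost(x: Sequence[float], y: Sequence[float],
                   n_rounds: int = 50, lr: float = 0.1):
    """Hard gradient boosting with squared loss and stump base learners."""
    f0 = sum(y) / len(y)  # initial constant model
    pred = [f0] * len(y)
    stumps: List[Stump] = []
    for _ in range(n_rounds):
        # Negative gradient of 0.5 * (y - F)^2 with respect to F is the residual.
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        # This stump depends on all earlier stumps through `pred`,
        # which is why the rounds must run one after another.
        s = fit_stump(x, residuals)
        stumps.append(s)
        pred = [pi + lr * predict_stump(s, xi) for pi, xi in zip(pred, x)]
    return f0, stumps, pred


x = [0, 1, 2, 3, 4, 5, 6, 7]
y = [0, 0, 0, 0, 1, 1, 1, 1]
f0, stumps, pred = gradient_boost(x, y, n_rounds=100)
print([round(p, 3) for p in pred])
```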
Random Forests (and Extremely) in Python with scikit-learn
In this guest post, you will learn by example how to use two popular machine learning techniques: random forests and extremely random forests. This post is an excerpt (adapted to the blog format) from the forthcoming Artificial Intelligence with Python – Second Edition: Your Complete Guide to Building Intelligent Apps using Python 3.x and TensorFlow 2. Before you learn how to carry out random forests in Python with scikit-learn, here is some brief information about the book. The new edition of this book, which will guide you to artificial intelligence with Python, is updated to Python 3.x and TensorFlow 2. Furthermore, it has new chapters that, besides random forests, cover recurrent neural networks, artificial intelligence and Big Data, fundamental use cases, chatbots, and more. Finally, Artificial Intelligence with Python – Second Edition is written by two experts in the field of artificial intelligence, Alberto Artasanchez and Prateek Joshi (more information about the authors can be found towards the end of the post). In the next section of this post, you will learn what random forests and extremely random forests are.
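A minimal scikit-learn sketch comparing the two ensembles. The iris dataset, the 100 trees, and 5-fold cross-validation are our illustrative choices, not taken from the post or the book.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

models = {
    # Random forest: bootstrap samples + best split among a random subset of features.
    "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
    # Extremely randomized trees: candidate thresholds drawn at random;
    # by default each tree sees the whole training set (no bootstrap).
    "extra trees": ExtraTreesClassifier(n_estimators=100, random_state=0),
}

scores = {}
for name, model in models.items():
    # Mean accuracy over 5 cross-validation folds.
    scores[name] = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name}: {scores[name]:.3f}")
```

Both classes share the same `fit`/`predict` interface, so switching between the two techniques is a one-line change.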
Saber Pro success prediction model using decision tree based learning
Bernal, Gregorio Perez, Villegas, Luisa Toro, Toro, Mauricio
The primary objective of this report is to determine what influences the success of students who have studied in Colombia, by analyzing the Saber 11 (the test taken in the final year of secondary school) and some socioeconomic aspects, and by comparing Saber Pro results with the national average. Beyond finding what influences success, the problem also provides insight into the country's education dynamics and helps predict one's opportunities to prosper. The counterpart of the situation presented in this paper is desertion: by detecting what makes someone outstanding, these factors can also indicate what makes someone unsuccessful. The proposed solution is a CART decision tree algorithm that predicts the probability that a student scores higher than the mean, based on socioeconomic and academic factors such as the profession of the student's parents and the results obtained on the Saber 11. One of the most influential factors was found to be the Saber 11 Social Studies score, while the student's gender is not as influential as it is usually portrayed. The algorithm provided significant insight into which factors most affect a given person's probability of success and, if pursued further, could be used in many situations, such as deciding which school subjects should receive more emphasis and shaping academic curricula in general.
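The sketch below illustrates how a CART classification tree can yield such a probability: it finds the single split on one feature that minimizes weighted Gini impurity and reports the fraction of above-mean students in each resulting leaf. The Gini criterion, the function names, and the toy scores and labels are assumptions for illustration, not the report's data or code.

```python
from typing import Optional, Sequence, Tuple


def gini(labels: Sequence[int]) -> float:
    """Gini impurity of binary labels (1 = scored above the mean)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def best_gini_split(x: Sequence[float], labels: Sequence[int]) -> Tuple[float, float, float]:
    """Return (threshold, P(above mean | x <= t), P(above mean | x > t))."""
    n = len(labels)
    pairs = sorted(zip(x, labels))
    best_t: Optional[float] = None
    best_impurity = gini(labels)
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        # Size-weighted impurity of the two children; lower is better.
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / n
        if impurity < best_impurity:
            best_t, best_impurity = (pairs[i - 1][0] + pairs[i][0]) / 2, impurity
    if best_t is None:
        # No split helps: a single leaf with the overall proportion.
        p = sum(labels) / n
        return float("inf"), p, p
    # Leaf probability = fraction of above-mean students in each leaf.
    left = [lab for v, lab in pairs if v <= best_t]
    right = [lab for v, lab in pairs if v > best_t]
    return best_t, sum(left) / len(left), sum(right) / len(right)


# Hypothetical toy data: a Saber 11-style subject score and whether the
# student later scored above the Saber Pro mean (1) or not (0).
scores = [40, 45, 50, 55, 60, 65, 70, 75]
above_mean = [0, 0, 1, 0, 1, 1, 1, 1]
print(best_gini_split(scores, above_mean))  # (57.5, 0.25, 1.0)
```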
A Machine Learning System for Retaining Patients in HIV Care
Kumar, Avishek, Ramachandran, Arthi, De Unanue, Adolfo, Sung, Christina, Walsh, Joe, Schneider, John, Ridgway, Jessica, Schuette, Stephanie Masiello, Lauritsen, Jeff, Ghani, Rayid
Retaining persons living with HIV (PLWH) in medical care is paramount to preventing new transmissions of the virus and allowing PLWH to live normal and healthy lifespans. Maintaining regular appointments with an HIV provider and taking medication daily for a lifetime is exceedingly difficult. Fifty-one percent of PLWH are non-adherent with their medications and eventually drop out of medical care. Current methods of re-linking individuals to care are reactive (applied after a patient has dropped out) and hence not very effective. We describe our system for predicting who is most at risk of dropping out of care, for use by the University of Chicago HIV clinic and the Chicago Department of Public Health. Models were selected based on their predictive performance under resource constraints, their stability over time, and their fairness. Our system is applicable both as a point-of-care system in a clinical setting and as a batch prediction system supporting regular interventions at the city level. Our model performs 3x better than the baseline for the clinical model and 2.3x better than the baseline for the city-wide model. The code has been released on GitHub, and we hope this methodology, particularly our focus on fairness, will be adopted by other clinics and public health agencies to curb the HIV epidemic.
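One common way to make "predictive performance under resource constraints" concrete is precision among the k patients ranked highest by predicted risk, since an outreach team can only contact so many people. The sketch below assumes that metric; the abstract does not say which metric was used, and the function name and toy numbers are hypothetical.

```python
from typing import Sequence


def precision_at_k(risk_scores: Sequence[float], dropped_out: Sequence[int], k: int) -> float:
    """Share of actual drop-outs among the k patients with the highest predicted risk."""
    if not 0 < k <= len(risk_scores):
        raise ValueError("k must be between 1 and the number of patients")
    # Rank patients by predicted risk, highest first.
    ranked = sorted(range(len(risk_scores)), key=lambda i: risk_scores[i], reverse=True)
    return sum(dropped_out[i] for i in ranked[:k]) / k


# Hypothetical toy example: five patients, capacity to contact three.
risk = [0.9, 0.1, 0.8, 0.3, 0.7]
outcome = [1, 0, 0, 1, 1]
print(precision_at_k(risk, outcome, k=3))
```

Evaluating a model's ranking and a simple baseline ranking at the same k gives a ratio of the kind reported above, although the paper's actual baseline and choice of k are not stated here.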
Distributional Random Forests: Heterogeneity Adjustment and Multivariate Distributional Regression
Ćevid, Domagoj, Michel, Loris, Meinshausen, Nicolai, Bühlmann, Peter
We propose an adaptation of the Random Forest algorithm to estimate the conditional distribution of a possibly multivariate response. We suggest a new splitting criterion based on the maximum mean discrepancy (MMD) two-sample test, which is suitable for detecting heterogeneity in multivariate distributions. The weights provided by the forest can be conveniently used as an input to other methods in order to locally solve various learning problems. The code is available as the R package drf.
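To make the splitting criterion more concrete, here is the textbook unbiased estimator of the squared MMD with a Gaussian kernel, in plain Python. It is a sketch under stated assumptions: the kernel, the bandwidth of 1, and the sample data are our choices, and the drf package's actual criterion may differ in its details (for instance, it may approximate the kernel for speed). Roughly, in a DRF-style split the two samples would be the responses falling into the two candidate child nodes, and a larger value signals more distributional heterogeneity.

```python
import math
import random
from typing import List, Sequence

Point = Sequence[float]


def gaussian_kernel(a: Point, b: Point, bandwidth: float = 1.0) -> float:
    """k(a, b) = exp(-||a - b||^2 / (2 * bandwidth^2))."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mmd2_unbiased(xs: List[Point], ys: List[Point], bandwidth: float = 1.0) -> float:
    """Unbiased estimate of the squared MMD between the samples xs and ys."""
    m, n = len(xs), len(ys)
    if m < 2 or n < 2:
        raise ValueError("need at least two points in each sample")
    # Within-sample similarities, excluding the diagonal i == j.
    k_xx = sum(gaussian_kernel(xs[i], xs[j], bandwidth)
               for i in range(m) for j in range(m) if i != j) / (m * (m - 1))
    k_yy = sum(gaussian_kernel(ys[i], ys[j], bandwidth)
               for i in range(n) for j in range(n) if i != j) / (n * (n - 1))
    # Cross-sample similarity.
    k_xy = sum(gaussian_kernel(x, y, bandwidth) for x in xs for y in ys) / (m * n)
    return k_xx + k_yy - 2.0 * k_xy


# Usage: 2-D responses compared with a second sample from the same
# distribution and with a sample whose mean is shifted.
rng = random.Random(0)
xs = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(50)]
ys_same = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(50)]
ys_shifted = [[rng.gauss(3, 1), rng.gauss(3, 1)] for _ in range(50)]
mmd_same = mmd2_unbiased(xs, ys_same)
mmd_shifted = mmd2_unbiased(xs, ys_shifted)
print(f"same distribution: {mmd_same:.3f}, shifted: {mmd_shifted:.3f}")
```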
Using Machine Learning to Forecast Future Earnings
Cui, Xinyue, Xu, Zhaoyu, Zhou, Yue
In this essay, we comprehensively evaluate the feasibility and suitability of applying machine learning models to forecasting corporate fundamentals (i.e., earnings), and we thoroughly compare the predictions of our method with both analysts' consensus estimates and traditional statistical models. As a result, our model has proved capable of serving as a useful auxiliary tool that helps analysts make better predictions of company fundamentals. Compared with traditional statistical models widely adopted in industry, such as logistic regression, our method achieves improvements in both prediction accuracy and speed. We also believe there is still considerable room for this model to evolve, and we hope that in the near future machine learning models will perform even better than professional analysts.
Alexander Jung
This lecture discusses how decision trees can be used to represent predictor functions. Variations of the basic decision tree model provide some of the most powerful machine learning methods curren... Classification Methods (46 minutes): our focus is on linear regression methods, which can be extended by feature construction. Guest lecture of Prof. Minna Huotilainen on learning processes in human brains. This video explains how network Lasso can be used to learn localized linear models that allow "personalized" predictions for individual data points within a network.
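As a rough illustration of the network Lasso idea mentioned above, one common way to write its objective for localized linear models is given below. The notation is ours and is an assumption, not taken from the video: an empirical graph with nodes V and edges E, positive edge weights A_ij, a subset M of nodes whose labels y^(i) are known, a feature vector x^(i) and a local weight vector w^(i) at each node, and a regularization parameter lambda > 0.

```latex
\min_{\{\mathbf{w}^{(i)}\}_{i \in \mathcal{V}}} \;
\sum_{i \in \mathcal{M}} \Bigl( y^{(i)} - \bigl(\mathbf{w}^{(i)}\bigr)^{\top} \mathbf{x}^{(i)} \Bigr)^{2}
\;+\; \lambda \sum_{\{i,j\} \in \mathcal{E}} A_{ij} \,\bigl\lVert \mathbf{w}^{(i)} - \mathbf{w}^{(j)} \bigr\rVert_{2}
```

Because the penalty uses the unsquared Euclidean norm of the differences, it tends to make w^(i) and w^(j) exactly equal across many edges, so nodes in a well-connected cluster end up sharing one localized linear model while different clusters can have different ones.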
Fine-Tuning ML Hyperparameters
"Just as electricity transformed almost every industry 100 years ago, today I actually have hard time thinking of an industry that I don't think AI (Artificial Intelligence) will transform in the next several years" -- Andrew NG I have long been fascinated with these algorithms, capable of something that we can as humans barely begin to comprehend. However, even with all these resources one of the biggest setbacks any ML practitioner has ever faced would be tuning the model's hyperparameters. A hyperparameter is a parameter whose value is used to control the learning process. By contrast, the values of other parameters (typically node weights) are learned. The same kind of machine learning model can be trained on different constraints, learning rates or kernels and other such parameters to generalize to different datasets, and hence these instructions have to be tuned so that the model can optimally solve the machine learning problem.
Decision Tree Learning-Inspired Dynamic Variable Ordering for the Weighted CSP
Xu, Hong (University of Southern California) | Sun, Kexuan (University of Southern California) | Koenig, Sven (University of Southern California) | Kumar, T. K. Satish (University of Southern California)
The weighted constraint satisfaction problem (WCSP) is a powerful mathematical framework for combinatorial optimization. The branch and bound search paradigm is very successful in solving the WCSP but critically depends on the ordering in which variables are instantiated. In this paper, we introduce a new framework for dynamic variable ordering for solving the WCSP. This framework is inspired by regression decision tree learning. Variables are ordered dynamically based on samples of random assignments of values to variables as well as their corresponding total weights. Within this framework, we propose four variable ordering heuristics (sdr, inv-sdr, rr and inv-rr). We compare them with many other state-of-the-art dynamic variable ordering heuristics, and show that sdr and rr outperform them on many real-world and random benchmark instances.
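A hedged sketch of the kind of regression-tree-inspired score the abstract describes. We assume, based only on the name and the regression-tree analogy, that sdr refers to a standard-deviation-reduction score computed from sampled assignments and their total weights; the paper's exact definitions of sdr, inv-sdr, rr, and inv-rr may differ, and the function names and toy problem are hypothetical.

```python
import itertools
import statistics
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence

Assignment = Dict[str, Hashable]


def sdr(samples: Sequence[Assignment], weights: Sequence[float], var: str) -> float:
    """Assumed sdr: standard deviation reduction of the total weight when
    samples are grouped by the value of var (as in regression tree learning)."""
    total_sd = statistics.pstdev(weights)
    groups: Dict[Hashable, List[float]] = defaultdict(list)
    for assignment, w in zip(samples, weights):
        groups[assignment[var]].append(w)
    n = len(weights)
    # Size-weighted average standard deviation of the "child" groups.
    child_sd = sum(len(g) / n * statistics.pstdev(g) for g in groups.values())
    return total_sd - child_sd


def order_by_sdr(samples: Sequence[Assignment], weights: Sequence[float],
                 variables: Sequence[str]) -> List[str]:
    """Hypothetical ordering: branch first on variables that explain most weight spread."""
    return sorted(variables, key=lambda v: sdr(samples, weights, v), reverse=True)


# Toy WCSP with Boolean variables a, b, c and total weight 10*a + 5*b (c is irrelevant).
# A real solver would draw random assignments; here we simply enumerate all 8.
variables = ["a", "b", "c"]
samples = [dict(zip(variables, values)) for values in itertools.product([0, 1], repeat=3)]
weights = [10 * s["a"] + 5 * s["b"] for s in samples]
print(order_by_sdr(samples, weights, variables))  # ['a', 'b', 'c']
```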