Collaborating Authors


Forecasting: theory and practice

Forecasting has always been at the forefront of decision making and planning. The uncertainty that surrounds the future is both exciting and challenging, with individuals and organisations seeking to minimise risks and maximise utilities. The large number of forecasting applications calls for a diverse set of forecasting methods to tackle real-life challenges. This article provides a non-systematic review of the theory and the practice of forecasting. We provide an overview of a wide range of theoretical, state-of-the-art models, methods, principles, and approaches to prepare, produce, organise, and evaluate forecasts. We then demonstrate how such theoretical concepts are applied in a variety of real-life contexts. We do not claim that this review is an exhaustive list of methods and applications. However, we wish that our encyclopedic presentation will offer a point of reference for the rich work that has been undertaken over the last decades, with some key insights for the future of forecasting theory and practice. Given its encyclopedic nature, the intended mode of reading is non-linear. We offer cross-references to allow the readers to navigate through the various topics. We complement the theoretical concepts and applications covered by large lists of free or open-source software implementations and publicly-available databases.

[R] Please point me in the right direction: decision trees or possibly something better


A linear baseline model solves a convex problem and thus will converge to roughly the same optimum every time. This gives you a basic idea of how informative the naive correlations between features are. Logistic regression and support vector machines are common choices here in my experience. Regarding nonlinear models, the random forest classifier is neat. XGBoost (a gradient boosted decision forest) from the package of the same name is also really good.
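To make the convexity point concrete, here is a minimal sketch in plain Python. The toy data, the small L2 penalty and the names `fit_logistic`, `w_a` and `w_b` are assumptions made for illustration, not something from the post. It fits the same logistic regression from two very different starting points and compares the weights.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(X, y, w_init, lr=0.5, steps=3000, l2=0.01):
    """Batch gradient descent on the L2-regularised mean logistic loss."""
    w = list(w_init)
    n = len(X)
    for _ in range(steps):
        grad = [l2 * wj for wj in w]  # gradient of the L2 penalty
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)))
            for j, xj in enumerate(xi):
                grad[j] += (p - yi) * xj / n
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w

# Made-up toy data: a bias column plus two features, with noisy labels.
rng = random.Random(0)
X, y = [], []
for _ in range(100):
    x1, x2 = rng.gauss(0, 1), rng.gauss(0, 1)
    X.append([1.0, x1, x2])
    y.append(1 if x1 + 0.5 * x2 + rng.gauss(0, 1) > 0 else 0)

# Two very different initialisations of the same convex problem.
w_a = fit_logistic(X, y, [0.0, 0.0, 0.0])
w_b = fit_logistic(X, y, [3.0, -3.0, 3.0])
print([round(v, 4) for v in w_a])
print([round(v, 4) for v in w_b])
```

Both runs should print essentially the same weights. A random forest or boosted model gives no such guarantee, which is part of why the linear model makes a stable reference point.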

[D] What's the best deep-dive explanation of XGBoost hyperparameters out there?


I'm not a total newbie, so I'd pass on all those "how to get started with xgboost" articles, of which there are plenty. I remember having bumped into a site or blog with a great and comprehensive summary of each hyperparameter, but I lost the link and can't find it now through search. As far as I remember, it had a hyperparameter menu on the left, probably covered all boosted-tree libraries and their hyperparameters, and was created by a woman. Can anybody recall that source?

Predicting movie revenue with AdaBoost, XGBoost and LightGBM


Marvel's Avengers: Endgame recently dethroned Avatar as the highest-grossing movie in history, and while there was no doubt that this movie would be very successful, I want to understand what makes any given movie a success. I am using data from The Movie Database provided through Kaggle. The data set is split into a train and a test set, with the train set containing 3,000 movies and the test set comprising 4,398. The train data set also contains the target variable, revenue. Prequels and Sequels: Maybe unsurprisingly, movies that are either prequels or sequels to related movies earn, on average, a higher revenue than standalone movies.

r/MachineLearning - [D] What happens when you pit an XGBoost model against a scorecard?


Does anyone have any thoughts on when it's best to use ML vs. scorecards? This blog compares predicted probabilities vs. observed proportions at the feature/predictor level. The example finds that the XGBoost model consistently under-estimates good credit risk across all bins of this predictor, while the risk scorecard shows less discrepancy between the estimated and observed outcomes.
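The kind of check described here (predicted probability vs. observed proportion within each bin of one predictor) can be sketched in a few lines of plain Python. The function name `calibration_by_bin`, the bin edges and the toy numbers are all made up for illustration and are not taken from the blog.

```python
def calibration_by_bin(feature, predicted, observed, edges):
    """Return (bin_label, n, mean_predicted, observed_rate) for each bin.

    Bin k covers edges[k] <= value < edges[k + 1]; empty bins are skipped.
    """
    rows = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        idx = [i for i, v in enumerate(feature) if lo <= v < hi]
        if not idx:
            continue
        mean_pred = sum(predicted[i] for i in idx) / len(idx)
        obs_rate = sum(observed[i] for i in idx) / len(idx)
        rows.append((f"[{lo}, {hi})", len(idx), mean_pred, obs_rate))
    return rows

# Hypothetical data: one predictor (e.g. age), a model's predicted
# probability of "good" for each case, and the observed outcome (1 = good).
feature = [22, 25, 31, 38, 44, 47, 53, 58]
predicted = [0.60, 0.70, 0.70, 0.80, 0.80, 0.90, 0.85, 0.95]
observed = [1, 1, 1, 1, 1, 1, 1, 0]

rows = calibration_by_bin(feature, predicted, observed, [20, 40, 60])
for label, n, mean_pred, obs_rate in rows:
    # A negative gap means the model under-estimates the "good" rate in that bin.
    print(label, n, round(mean_pred, 3), round(obs_rate, 3), round(mean_pred - obs_rate, 3))
```

Running the same table for each model side by side shows which one tracks the observed proportions more closely in each bin.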

A Comprehensive Guide to Ensemble Learning (with Python codes) - Analytics Vidhya


When you want to purchase a new car, will you walk up to the first car shop and purchase one based on the advice of the dealer? You would more likely browse a few web portals where people have posted their reviews and compare different car models, checking their features and prices. You will also probably ask your friends and colleagues for their opinion. In short, you wouldn't directly reach a conclusion, but will instead make a decision considering the opinions of other people as well. Ensemble models in machine learning operate on a similar idea. They combine the decisions from multiple models to improve the overall performance.
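One of the simplest ways to combine decisions is a majority vote. The sketch below is only an illustration of that one idea: the three "models" are hard-coded prediction lists, and the names `majority_vote`, `model_a`, `model_b` and `model_c` are invented for the example.

```python
from collections import Counter

def majority_vote(predictions_per_model):
    """Combine class predictions from several models by majority vote.

    predictions_per_model: list of equal-length lists, one list per model.
    With an odd number of models and two classes there are no ties.
    """
    combined = []
    for votes in zip(*predictions_per_model):
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined

def accuracy(predicted, truth):
    return sum(p == t for p, t in zip(predicted, truth)) / len(truth)

# Made-up predictions from three imperfect models on five cases.
truth = [1, 0, 1, 1, 0]
model_a = [1, 0, 1, 0, 0]
model_b = [1, 1, 0, 1, 0]
model_c = [0, 0, 1, 1, 1]

ensemble = majority_vote([model_a, model_b, model_c])
for name, preds in [("a", model_a), ("b", model_b), ("c", model_c), ("vote", ensemble)]:
    print(name, accuracy(preds, truth))
```

In this toy case each model makes its mistakes on different cases, so the vote corrects them. When models make the same mistakes, voting helps much less.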

r/MachineLearning - [D] Error function for AdaBoost Algorithm


I need to solve a task that asks me to provide an error function whose minimization leads to a formulation equivalent to the AdaBoost algorithm. I do not understand exactly what is being asked. I know that in AdaBoost I first train a "weak" learner by minimizing its error function, then use the weights to compute the errors and move on to the next classifier, repeating this iteratively. So what does it mean to give an error function to minimize?
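For context, the answer usually given to this kind of question is the exponential loss: discrete AdaBoost can be read as stagewise minimisation of that loss over an additive model. The notation below (labels $y_i \in \{-1, +1\}$, weak learners $h_l$, coefficients $\alpha_l$) is chosen for this sketch and is not taken from the post.

```latex
% Additive model after m rounds and its exponential error
f_m(x) = \sum_{l=1}^{m} \alpha_l \, h_l(x), \qquad
E = \sum_{i=1}^{N} \exp\bigl(-y_i \, f_m(x_i)\bigr)

% Holding f_{m-1} fixed, the error factorises into per-example weights
E = \sum_{i=1}^{N} w_i^{(m)} \exp\bigl(-y_i \, \alpha_m h_m(x_i)\bigr), \qquad
w_i^{(m)} = \exp\bigl(-y_i \, f_{m-1}(x_i)\bigr)

% Minimising over h_m picks the learner with the smallest weighted error
% \epsilon_m; minimising over \alpha_m then gives
\epsilon_m = \frac{\sum_{i} w_i^{(m)} \, [\, h_m(x_i) \neq y_i \,]}{\sum_{i} w_i^{(m)}}, \qquad
\alpha_m = \frac{1}{2} \ln \frac{1 - \epsilon_m}{\epsilon_m}
```

Under this reading, the weights that AdaBoost updates each round are the $w_i^{(m)}$ above, so the reweighting step and the choice of $\alpha_m$ both follow from minimising $E$ one term at a time.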

[P] Is XGBoost w/ iterating undersampling doable? • r/MachineLearning


I know this might sound like a "google this for me" question, but bear with me (I googled it). I'm working with a highly imbalanced data set where the minority class accounts for 1.5% of the total set. If nothing is done to address the problem, this leads to poor predictive performance for most models, because most algorithms will minimize cost on the majority class during training, to the detriment of the minority class, so as to decrease overall cost. So far I've tried out ANNs, RFs, XGBs, and SVMs and have found that XGB and RF outperform the others on this particular problem, so the rest of this post will be about RF and XGB. I've tried penalizing misclassification of the minority class much more than the majority class to fix the imbalance at the algorithmic level, but I've found undersampling and then training on the resulting data set to be more effective in my case.
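One way to read "iterating undersampling" is to draw a fresh balanced subset each round, train one model per subset, and average the predictions. The sketch below covers only the sampling step in plain Python; the function name `balanced_subsets`, the 1:1 sampling ratio and the toy labels are assumptions for illustration.

```python
import random

def balanced_subsets(labels, n_rounds, minority=1, seed=0):
    """Yield one index list per round.

    Each list keeps every minority row and adds a fresh random sample of
    majority rows of the same size (a 1:1 ratio, chosen for this sketch).
    """
    rng = random.Random(seed)
    minority_idx = [i for i, y in enumerate(labels) if y == minority]
    majority_idx = [i for i, y in enumerate(labels) if y != minority]
    for _ in range(n_rounds):
        sampled = rng.sample(majority_idx, len(minority_idx))
        yield sorted(minority_idx + sampled)

# Toy labels with a 1.5% minority class, matching the ratio in the post.
labels = [1] * 15 + [0] * 985
subsets = list(balanced_subsets(labels, n_rounds=5))
for idx in subsets:
    print(len(idx), sum(labels[i] for i in idx))
```

Each index list can then be used to slice the training data for one XGB or RF model, with the per-model predicted probabilities averaged at the end. Because every model is trained on a balanced set, the averaged probabilities will be shifted toward the minority class compared with the true base rate, which matters if calibrated probabilities are needed.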

XGBoost explained • /r/MachineLearning


To expand: according to my naive understanding, boosted trees are basically an ensemble of decision trees which are fit sequentially so that each new tree makes up for the errors of the previously existing set of trees. The model is "boosted" by focusing new additions on correcting the residual errors of the last version of the model. The "gradient" comes in afterward as the parameters of the tree ensemble are optimized to minimize the error of the whole "base learner". I think of this as fine tuning of the boosted tree ensemble using a gradient-based optimization.
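The sequential "each new tree corrects the residual errors" idea can be sketched in plain Python. This is a toy illustration under stated assumptions (one feature, depth-1 trees, squared-error loss), not how XGBoost is implemented, and the names `fit_stump` and `boost` are made up. For squared error, the residuals each tree is fit to are exactly the negative gradient of the loss with respect to the current predictions, which is one common way to explain where the "gradient" comes in.

```python
import math

def fit_stump(x, residuals):
    """Best single-split regression stump on one feature (squared error)."""
    best = None
    xs = sorted(set(x))
    for lo, hi in zip(xs[:-1], xs[1:]):
        thr = (lo + hi) / 2
        left = [r for xi, r in zip(x, residuals) if xi <= thr]
        right = [r for xi, r in zip(x, residuals) if xi > thr]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, thr, lm, rm)
    _, thr, lm, rm = best
    return lambda v: lm if v <= thr else rm

def boost(x, y, n_trees=50, learning_rate=0.3):
    """Fit stumps one after another, each on the current residuals."""
    base = sum(y) / len(y)  # start from the mean prediction
    pred = [base] * len(y)
    trees, history = [], []
    for _ in range(n_trees):
        # Residuals = negative gradient of 0.5 * (y - pred)^2 w.r.t. pred.
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        tree = fit_stump(x, residuals)
        trees.append(tree)
        pred = [pi + learning_rate * tree(xi) for pi, xi in zip(pred, x)]
        history.append(sum((yi - pi) ** 2 for yi, pi in zip(y, pred)) / len(y))

    def predict(v):
        return base + learning_rate * sum(t(v) for t in trees)

    return predict, history

# Toy regression problem: learn sin(x) on a grid.
x = [i / 10 for i in range(40)]
y = [math.sin(v) for v in x]
predict, history = boost(x, y)
print(round(history[0], 4), round(history[-1], 4))
```

The training error in `history` shrinks round by round as each stump picks up what the previous ones missed. With a different loss, the residuals would be replaced by that loss's negative gradient, and the rest of the loop would stay the same.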