
Feature Engineering: Data Scientist's Secret Sauce!

@machinelearnbot

It is very tempting for data science practitioners to opt for the best-known algorithms for a given problem. However, it is not the algorithm alone that provides the best solution; a model built on carefully engineered and selected features can deliver far better results. "Any intelligent fool can make things bigger, more complex, and more violent. It takes a touch of genius -- and a lot of courage -- to move in the opposite direction." Complex models are not easily interpretable and are harder to tune. A simpler algorithm with better features or more data can perform far better than a complex model built on weak assumptions.
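To make the point concrete, here is a minimal, hypothetical sketch (not from the article): the same linear model, given one well-chosen engineered feature, beats both itself and a more complex model trained on the raw inputs. The synthetic distance/duration/speed setup is an assumption for illustration only.

```python
# Minimal sketch: a simple model plus one engineered feature can beat
# a complex model on raw inputs. Synthetic data is assumed for illustration.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(42)
n = 2000
distance = rng.uniform(1, 100, n)        # raw input 1
duration = rng.uniform(0.1, 5.0, n)      # raw input 2
speed = distance / duration              # the true driver of the target
y = 3.0 * speed + rng.normal(0, 5, n)    # target depends on the ratio

X_raw = np.column_stack([distance, duration])
X_eng = np.column_stack([distance, duration, speed])  # add engineered ratio

for name, model, X in [
    ("linear, raw features", LinearRegression(), X_raw),
    ("random forest, raw features",
     RandomForestRegressor(n_estimators=100, random_state=0), X_raw),
    ("linear, engineered feature", LinearRegression(), X_eng),
]:
    score = cross_val_score(model, X, y, cv=5, scoring="r2").mean()
    print(f"{name}: mean CV R^2 = {score:.3f}")
```

On this synthetic task the linear model with the engineered ratio feature typically scores near-perfect cross-validated R^2, while the same model on raw inputs lags well behind, which is the article's claim in miniature.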


Machine learning for financial prediction: experimentation with David Aronson's latest work – part 1

#artificialintelligence

The results are a little different from those obtained using RMSE as the objective function. The focus is still well and truly on the volatility indicators, but in this case the best cross-validated performance occurred when selecting only 2 of the 15 candidate variables. A plot of the cross-validated performance of the best feature set against the number of features shows that the model clearly performs better, in terms of absolute return, with a smaller number of predictors. Performance bottoms out at 8 predictors and then improves, but never again reaches the level obtained with 2-4 predictors. This is consistent with Aronson's assertion that we should stick with at most 3-4 variables; otherwise overfitting is almost unavoidable.
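The post's actual experiments follow Aronson's R-based workflow; purely as a hedged illustration of the idea, the sketch below scores the best greedy feature subset at each subset size with cross-validation, using scikit-learn's SequentialFeatureSelector. The synthetic data and the linear model are assumptions, not the author's setup.

```python
# Sketch: cross-validated performance of the best k-feature subset,
# for k = 1..8, on synthetic data with 15 candidate variables.
from sklearn.datasets import make_regression
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Stand-in data: 15 candidate indicators, only a few truly informative.
X, y = make_regression(n_samples=500, n_features=15, n_informative=3,
                       noise=10.0, random_state=0)

model = LinearRegression()
for k in range(1, 9):
    # Greedy forward selection of the best k-feature subset.
    sfs = SequentialFeatureSelector(model, n_features_to_select=k,
                                    direction="forward", cv=5)
    sfs.fit(X, y)
    score = cross_val_score(model, sfs.transform(X), y, cv=5).mean()
    print(f"{k} features -> mean CV R^2 = {score:.3f}")
```

With only a few informative variables, the cross-validated score typically plateaus or degrades once extra features are added, echoing the post's finding that small feature sets resist overfitting.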


How to Improve Machine Learning: Tricks and Tips for Feature Engineering

#artificialintelligence

A predictive model is, in essence, a formula that transforms a list of input fields, or variables, into some output of interest. Feature engineering is simply the thoughtful creation of new input fields from existing ones, either automatically or manually, guided by domain expertise, logical reasoning, or intuition. The new input fields can yield better inferences and insights from the data and substantially improve the performance of predictive models. Feature engineering is one of the most important parts of the data preparation process, where new and meaningful variables are derived. It enhances and enriches the ingredients needed to build a robust model.
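As a concrete illustration, here is a minimal pandas sketch of creating new input fields from existing ones; the column names (signup_date, purchases, visits) and the derived fields are hypothetical, not taken from the article.

```python
# Sketch: deriving new, meaningful variables from existing columns.
import pandas as pd

df = pd.DataFrame({
    "signup_date": pd.to_datetime(["2021-01-15", "2021-06-03", "2021-11-20"]),
    "purchases": [5, 0, 12],
    "visits": [20, 3, 30],
})

# New input fields engineered from the raw columns.
df["signup_month"] = df["signup_date"].dt.month          # seasonal signal
df["signup_dayofweek"] = df["signup_date"].dt.dayofweek  # weekday effect
df["conversion_rate"] = df["purchases"] / df["visits"]   # ratio feature
df["is_buyer"] = (df["purchases"] > 0).astype(int)       # binary flag

print(df)
```

Each derived column encodes reasoning a raw field cannot express directly (seasonality, efficiency, a behavioral flag), which is exactly the kind of domain-guided creation the excerpt describes.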