Decision Tree Learning
Tuning Random Forest model Machine Learning Predictive modeling
A month back, I participated in a Kaggle competition called TFI. My first submission landed at the 50th percentile. Having worked relentlessly on feature engineering for more than 2 weeks, I managed to reach the 20th percentile. To my surprise, right after tuning the parameters of the machine learning algorithm I was using, I was able to break into the top 10th percentile. That is how important tuning these machine learning algorithms is.
RedisLabsModules/redis-ml
Redis-ML is a Redis module that implements several machine learning models as Redis data types. The stored models are fully operational and support the prediction/evaluation process. Redis-ML is a turnkey solution for using trained models in a production environment. Load ML models from any platform, immediately ready to serve. The following code creates a random forest under the key myforest that consists of three trees with IDs ranging from 0 to 2, where each consists of a single numeric splitter and its predicate values.
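The structure described above (a forest of three trees, each a single numeric splitter) can be sketched in plain Python. This is a minimal illustration and not the Redis-ML API: the dictionary `forest`, the feature names, thresholds and labels, and the helpers `run_tree` and `run_forest` are all hypothetical.

```python
from collections import Counter

# Hypothetical in-memory stand-in for a stored forest (NOT the Redis-ML API).
# Tree IDs run from 0 to 2; each tree is one numeric splitter:
# (feature_name, threshold, label_if_at_or_below, label_if_above).
forest = {
    0: ("age", 30.0, "no", "yes"),
    1: ("income", 50000.0, "no", "yes"),
    2: ("age", 45.0, "no", "yes"),
}

def run_tree(tree, features):
    """Evaluate one single-split tree on a dict of feature values."""
    feature, threshold, left_label, right_label = tree
    return left_label if features[feature] <= threshold else right_label

def run_forest(forest, features):
    """Classify by majority vote over all trees in the forest."""
    votes = Counter(run_tree(tree, features) for tree in forest.values())
    return votes.most_common(1)[0][0]

prediction = run_forest(forest, {"age": 40.0, "income": 60000.0})
```

Here trees 0 and 1 vote "yes" and tree 2 votes "no", so the forest returns the majority label.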
Decision Trees -- Understanding Explainable AI – Towards Data Science
Explainable AI or XAI is a sub-category of AI where the decisions made by the model can be interpreted by humans, as opposed to "black box" models. As AI moves from correcting our spelling and targeting ads to driving our cars and diagnosing patients, the need to verify and justify the conclusions being reached is beginning to be prioritised. To begin to delve into the field, let's look at one simple XAI model: the decision tree. Decision trees can be easily read and even mimic a human approach to decision making by breaking the choice into many small sub-choices. A simple example is how one may evaluate local universities when they leave high school.
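To make the "many small sub-choices" idea concrete, here is a minimal sketch of a hand-written decision tree for the university example. The questions, the thresholds and the function name `evaluate_university` are all hypothetical, since the excerpt does not list the actual criteria. The point is only that each decision comes with a readable trail of reasons.

```python
def evaluate_university(offers_course, annual_fee, commute_minutes, budget=10000):
    """Return (decision, reasons) so every step of the choice can be read back.

    All criteria here are made up for illustration.
    """
    reasons = []
    # Sub-choice 1: does the university offer the course at all?
    if not offers_course:
        reasons.append("does not offer the desired course")
        return "reject", reasons
    reasons.append("offers the desired course")
    # Sub-choice 2: is it affordable?
    if annual_fee > budget:
        reasons.append(f"fee {annual_fee} exceeds budget {budget}")
        return "reject", reasons
    reasons.append(f"fee {annual_fee} is within budget {budget}")
    # Sub-choice 3: is the commute acceptable?
    if commute_minutes > 60:
        reasons.append("commute is longer than 60 minutes")
        return "consider", reasons
    reasons.append("commute is 60 minutes or less")
    return "apply", reasons

decision, reasons = evaluate_university(True, 8000, 45)
```

Unlike a black-box model, the returned `reasons` list is exactly the path taken through the tree.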
Imbalance Class Classification using Random Forest
I agree that using boosting algorithms is better, but that alone is not enough in practice. SMOTE would be a good starting point (I would definitely opt for an over-sampling strategy), but there are others. Here you can find a nice implementation of solutions for imbalanced data in Python (scikit-learn-contrib). The success of any of these techniques depends largely on the nature of your data. Therefore, I would suggest you try different approaches and see how they affect your results.
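As a rough illustration of what SMOTE-style over-sampling does, the sketch below creates synthetic minority points by interpolating between a minority sample and one of its nearest minority neighbours. It is a simplified stand-in, not the scikit-learn-contrib implementation: the function name `smote_like_oversample`, its parameters and the toy data are assumptions made for this example.

```python
import math
import random

def smote_like_oversample(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k nearest minority neighbours of the chosen point (excluding itself).
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic

# Toy minority class with two numeric features (made-up values).
minority = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (3.0, 3.0)]
new_points = smote_like_oversample(minority, n_new=6)
```

Every synthetic point lies on a line segment between two real minority samples, which is why the result depends so much on how the minority class is distributed.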
The roots of inequality: estimating inequality of opportunity from regression trees (English)
This paper proposes a set of new methods to estimate inequality of opportunity based on conditional inference regression trees. It illustrates how these methods represent a substantial improvement over existing empirical approaches to measure inequality of opportunity. First, the new methods minimize the risk of arbitrary and ad hoc model selection. Second, they provide a standardized way to trade off upward and downward biases in inequality of opportunity estimations.
The Random Forest Algorithm – Towards Data Science
Random Forest is a flexible, easy-to-use machine learning algorithm that produces a great result most of the time, even without hyper-parameter tuning. It is also one of the most used algorithms, because of its simplicity and the fact that it can be used for both classification and regression tasks. In this post, you are going to learn how the random forest algorithm works and several other important things about it. Random Forest is a supervised learning algorithm. As you can already see from its name, it creates a forest and makes it somehow random.
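The two usual sources of that randomness are bootstrap sampling of the training rows and random selection of features. The sketch below shows both in a deliberately reduced form: each "tree" is a one-split stump trained on a single randomly chosen feature, whereas a real random forest grows deeper trees and samples a feature subset at every split. The helper names and the toy data are assumptions for illustration.

```python
import random
from collections import Counter

def fit_stump(rows, labels, feature):
    """Find the single threshold on one feature with the fewest training errors."""
    best = None
    for threshold in sorted({row[feature] for row in rows}):
        left = [y for row, y in zip(rows, labels) if row[feature] <= threshold]
        right = [y for row, y in zip(rows, labels) if row[feature] > threshold]
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0] if right else left_label
        errors = sum(y != left_label for y in left) + sum(y != right_label for y in right)
        if best is None or errors < best[0]:
            best = (errors, feature, threshold, left_label, right_label)
    return best[1:]

def fit_forest(rows, labels, n_trees=25, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        # Randomness 1: bootstrap sample (rows drawn with replacement).
        idx = [rng.randrange(len(rows)) for _ in rows]
        # Randomness 2: each stump is restricted to one randomly chosen feature.
        feature = rng.randrange(len(rows[0]))
        forest.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx], feature))
    return forest

def predict(forest, row):
    """Majority vote over all stumps."""
    votes = Counter(
        left if row[feature] <= threshold else right
        for feature, threshold, left, right in forest
    )
    return votes.most_common(1)[0][0]

# Toy data with two numeric features (made-up values).
rows = [(1.0, 10.0), (2.0, 12.0), (3.0, 11.0), (8.0, 30.0), (9.0, 31.0), (10.0, 29.0)]
labels = ["a", "a", "a", "b", "b", "b"]
forest = fit_forest(rows, labels)
```

Calling `predict(forest, (1.0, 10.0))` then aggregates the votes of all 25 stumps. For regression, the vote would be replaced by an average.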
Top 10 Data Mining Algorithms, Explained
Today, I'm going to explain in plain English the top 10 most influential data mining algorithms as voted on by 3 separate panels in this survey paper. Once you know what they are, how they work, what they do and where you can find them, my hope is you'll have this blog post as a springboard to learn even more about data mining. What are we waiting for? We also provide interesting resources at the end. C4.5 constructs a classifier in the form of a decision tree. In order to do this, C4.5 is given a set of data representing things that are already classified.
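C4.5 chooses which attribute to split on using an entropy-based criterion, the gain ratio. The sketch below computes it for categorical attributes on a tiny labelled data set. The data, the attribute names and the function names are made up for illustration, and the sketch leaves out the rest of C4.5 (numeric attributes, missing values, pruning).

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def gain_ratio(rows, labels, attribute):
    """Information gain of splitting on `attribute`, divided by the split information."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attribute], []).append(label)
    total = len(labels)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder
    split_info = -sum(len(g) / total * math.log2(len(g) / total) for g in groups.values())
    return info_gain / split_info if split_info > 0 else 0.0

# Already-classified examples (hypothetical toy data).
rows = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rainy", "windy": "no"},
    {"outlook": "rainy", "windy": "yes"},
]
labels = ["play", "play", "stay", "stay"]

best_attribute = max(["outlook", "windy"], key=lambda a: gain_ratio(rows, labels, a))
```

In this toy set `outlook` separates the classes perfectly while `windy` carries no information, so `outlook` is selected as the first split.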
Decision Trees, Classification & Interpretation Using SciKit-Learn
This article is by Jitesh Shah, a data & stats jockey in perpetual beta, located in Fremont, California. This article includes the data set and Python code. Wouldn't it be nice if defects and product failures could be predicted in advance? We've got the data on the attributes, design features and manufacturing processes that come together to create that product, and we have defect and failure rate data, so all we've got to do is connect the two and use that to predict which features, attributes and processes in combination cause these defects. That was probably a non-trivial endeavor in the past, but now, with the ability to store and process vast amounts of data (no secret there), it's no big deal.
Extremely Fast Decision Tree
Manapragada, Chaitanya, Webb, Geoff, Salehi, Mahsa
We introduce a novel incremental decision tree learning algorithm, Hoeffding Anytime Tree, that is statistically more efficient than the current state-of-the-art, Hoeffding Tree. We demonstrate that an implementation of Hoeffding Anytime Tree---"Extremely Fast Decision Tree", a minor modification to the MOA implementation of Hoeffding Tree---obtains significantly superior prequential accuracy on most of the largest classification datasets from the UCI repository. Hoeffding Anytime Tree produces the asymptotic batch tree in the limit, is naturally resilient to concept drift, and can be used as a higher accuracy replacement for Hoeffding Tree in most scenarios, at a small additional computational cost.
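The abstract does not spell out the split test, so the sketch below shows only the standard Hoeffding bound and the classic Hoeffding Tree criterion (split once the best attribute's lead over the runner-up exceeds the bound). The modified test used by Hoeffding Anytime Tree is not reproduced here. The function names and the example numbers are assumptions for illustration.

```python
import math

def hoeffding_bound(value_range, delta, n):
    """Epsilon such that, with probability 1 - delta, the true mean of a variable
    with the given range lies within epsilon of the mean observed over n samples."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))

def should_split(best_gain, second_best_gain, value_range, delta, n):
    """Classic Hoeffding Tree test: split when the observed lead of the best
    attribute over the second best exceeds the Hoeffding bound."""
    return (best_gain - second_best_gain) > hoeffding_bound(value_range, delta, n)

# Hypothetical gains observed at a leaf after 100 and after 1000 examples.
early = should_split(0.30, 0.15, value_range=1.0, delta=1e-7, n=100)
later = should_split(0.30, 0.15, value_range=1.0, delta=1e-7, n=1000)
```

The bound shrinks as more examples arrive, so the same observed lead of 0.15 is not enough to split after 100 examples but is enough after 1000.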