Collaborating Authors

On Education Decision Trees, Random Forests, AdaBoost & XGBoost in Python - all courses


Get a solid understanding of decision trees. Understand the business scenarios where a decision tree is applicable. Tune a machine learning model's hyperparameters and evaluate its performance. Use Pandas DataFrames to manipulate data and make statistical computations. Use decision trees to make predictions. Learn the advantages and disadvantages of the different algorithms. Students will need to install Python and Anaconda, but a separate lecture walks through the installation. You're looking for a complete decision tree course that teaches you everything you need to create a Decision Tree / Random Forest / XGBoost model in Python, right? You've found the right course on decision trees and advanced tree-based techniques! After completing this course you will be able to: Identify the business problems that can be solved using the Decision Tree / Random Forest / XGBoost techniques of machine learning.

Top 30 Python Libraries for Machine Learning


In this article, you'll see the top 30 Python libraries for Machine Learning. Today, Python is one of the most popular programming languages, and it has replaced many languages in the industry. There are various reasons for its popularity, and one of them is that Python has a large collection of libraries. Python is one of the languages most widely used by Data Scientists and Machine Learning experts across the world. Though there is no shortage of alternatives in the form of languages like R, Julia and others, Python has steadily and rightfully gained popularity. As in the Google Trends plot shown above (prepared using matplotlib and pytrends), this confidence is visible year over year, with Python ranking well above its peers in the StackOverflow surveys for 2017 and 2018. These trends and survey results follow from Python's ease of use, short learning curve, widespread usage, strong community, and large number of libraries covering the depth and breadth of many research and application areas. This popularity might make one think that Python is the gold standard for Machine Learning.

datas-frame – Modern Pandas (Part 8): Scaling


We can answer questions like "Which employer's employees donated the most?" Or "what is the average amount donated per occupation?" Since Dask is lazy, we haven't actually computed anything.

Ultimate guide to handle Big Datasets for Machine Learning using Dask (in Python)


We will now have a look at some simple cases for creating arrays using Dask. As you can see here, I had 11 values in the array and I used the chunk size as 5. This distributed my array into three chunks, where the first and second blocks have 5 values each and the third one has 1 value. Dask arrays support most of the numpy functions. For instance, you can use .sum()
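The chunking described above can be sketched in plain Python. This is only an illustration of the arithmetic (11 values, chunk size 5) and of a sum computed block by block. It is not Dask's implementation, and the helper names `chunk_sizes` and `blocked_sum` are hypothetical, introduced here for the example.

```python
def chunk_sizes(n_items, chunk):
    """Return the block lengths when n_items are split into blocks of at most `chunk`."""
    full, rest = divmod(n_items, chunk)
    return [chunk] * full + ([rest] if rest else [])


def blocked_sum(values, chunk):
    """Sum each block independently, then combine the partial sums."""
    partials = [sum(values[i:i + chunk]) for i in range(0, len(values), chunk)]
    return partials, sum(partials)


values = list(range(11))              # 11 values: 0, 1, ..., 10
sizes = chunk_sizes(len(values), 5)   # two blocks of 5 and one block of 1
partials, total = blocked_sum(values, 5)

print(sizes)     # [5, 5, 1]
print(partials)  # [10, 35, 10]
print(total)     # 55
```

Because each partial sum depends only on its own block, the blocks could in principle be processed independently and combined at the end, which is the general idea behind working on an array in chunks.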

Lightning Fast XGBoost on Multiple GPUs


XGBoost is one of the most used libraries for data science. At the time XGBoost came into existence, it was lightning fast compared to its nearest rival, Python's Scikit-learn GBM. But as time has progressed, it has been rivaled by some awesome libraries like LightGBM and CatBoost, on both speed and accuracy. I, for one, use LightGBM for most use cases where I have only a CPU for training. But when I have a GPU or multiple GPUs at my disposal, I still love to train with XGBoost.