Collaborating Authors

How do I encode categorical features using scikit-learn?


In order to include categorical features in your Machine Learning model, you have to encode them numerically using "dummy" or "one-hot" encoding. But how do you do this correctly using scikit-learn? In this video, you'll learn how to use OneHotEncoder and ColumnTransformer to encode your categorical features and prepare your feature matrix in a single step. You'll also learn how to include this step within a Pipeline so that you can cross-validate your model and preprocessing steps simultaneously. Finally, you'll learn why you should use scikit-learn (rather than pandas) for preprocessing your dataset.
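A minimal sketch of the workflow described above, assuming a small hypothetical DataFrame (the column names `embarked`, `sex`, `fare`, and `survived` and all values are invented for illustration). `make_column_transformer`, `OneHotEncoder`, `make_pipeline`, and `cross_val_score` are real scikit-learn functions; the choice of `LogisticRegression` and `cv=5` is an assumption, not necessarily what the video uses.

```python
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data: two categorical columns, one numeric column, a binary target
df = pd.DataFrame({
    "embarked": ["S", "C", "S", "Q", "S", "C", "Q", "S", "C", "S"],
    "sex": ["male", "female", "female", "male", "male",
            "female", "male", "female", "female", "male"],
    "fare": [7.25, 71.28, 7.92, 8.46, 8.05, 30.07, 7.75, 53.10, 26.00, 13.00],
    "survived": [0, 1, 1, 0, 0, 1, 0, 1, 1, 0],
})
X = df[["embarked", "sex", "fare"]]
y = df["survived"]

# One-hot encode the categorical columns and pass the numeric column through unchanged
ct = make_column_transformer(
    (OneHotEncoder(handle_unknown="ignore"), ["embarked", "sex"]),
    remainder="passthrough",
)

# Encoded feature matrix: 3 embarked dummies + 2 sex dummies + 1 fare column
X_encoded = ct.fit_transform(X)

# Chain preprocessing and model so that cross-validation refits the encoder on each training fold
pipe = make_pipeline(ct, LogisticRegression())
scores = cross_val_score(pipe, X, y, cv=5, scoring="accuracy")
print(X_encoded.shape, scores.mean())
```

Putting the `ColumnTransformer` inside the `Pipeline` means the encoder only ever sees the training portion of each fold, which is one reason to prefer scikit-learn over `pandas.get_dummies` for this step.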

Machine Learning with Text in scikit-learn (PyCon 2016)


Although numeric data is easy to work with in Python, most knowledge created by humans is actually raw, unstructured text. By learning how to transform text into data that is usable by machine learning models, you drastically increase the amount of data that your models can learn from. In this tutorial, we'll build and evaluate predictive models from real-world text using scikit-learn.
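One common way to turn text into model-ready data is a bag-of-words document-term matrix. The sketch below assumes a tiny hypothetical set of labeled messages (1 = spam, 0 = not spam, invented for illustration) and uses `CountVectorizer` with `MultinomialNB`; the tutorial's own dataset and model choices may differ.

```python
from sklearn import metrics
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

# Hypothetical labeled messages: 1 = spam, 0 = not spam
texts = [
    "WIN a free prize now", "Claim your free cash reward",
    "Free entry to win cash", "URGENT you have won a prize",
    "Are we still meeting for lunch", "Can you send me the report",
    "See you at the meeting tomorrow", "Thanks for the lunch today",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

# Split first, so the vocabulary is learned from training text only
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, random_state=1, stratify=labels
)

vect = CountVectorizer()
X_train_dtm = vect.fit_transform(X_train)  # learn vocabulary and build the training matrix
X_test_dtm = vect.transform(X_test)        # reuse that vocabulary for the test text

nb = MultinomialNB()
nb.fit(X_train_dtm, y_train)
y_pred = nb.predict(X_test_dtm)
accuracy = metrics.accuracy_score(y_test, y_pred)
print(accuracy)
```

Calling `fit_transform` only on the training text and `transform` on the test text keeps words seen only in the test set from leaking into the model.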

Getting started with machine learning in Python (webcast)


Have you heard about machine learning, but you don't really understand what it's good for? Or you understand the basic idea, but you're struggling to apply it using Python? In this video, I'll explain the essential ideas behind machine learning. Then, we'll build our first machine learning model in just a few lines of code using Python's scikit-learn library. This is a recording of a webcast hosted by Trey Hunner of Weekly Python Chat.
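As a sketch of what "a few lines of code" can look like, the example below assumes the bundled iris dataset and a k-nearest neighbors classifier; the webcast may use a different dataset or model. It shows scikit-learn's basic pattern: instantiate an estimator, `fit` it on a feature matrix and target, then `predict` on new observations.

```python
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

# Load the iris data bundled with scikit-learn: 150 flowers, 4 numeric features, 3 species
iris = load_iris()
X, y = iris.data, iris.target

# Instantiate the model, then fit it to the training data
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X, y)

# Predict the species of a new, hypothetical flower (sepal length/width, petal length/width in cm)
prediction = knn.predict([[3, 5, 4, 2]])
print(iris.target_names[prediction])
```

Scoring the model on the same data it was trained on, as `knn.score(X, y)` would, overstates real-world accuracy; a train/test split or cross-validation gives a fairer estimate.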

Data School


You're about to learn 25 tricks that will help you to work faster, write better pandas code, and impress your friends. These are the BEST tricks I've learned from 5 years of teaching Python's pandas library. Don't miss the BONUS at the end of this video!

11:50 Create a DataFrame from the clipboard
12:57 Split a DataFrame into two random subsets
22:04 Reshape a MultiIndexed Series
23:01 Create a pivot table
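The sketch below illustrates three of the listed tricks on a small hypothetical sales DataFrame (column names and values are invented). It uses real pandas methods (`sample`, `drop`, `groupby`, `unstack`, `pivot_table`), but the exact code shown in the video may differ. `pd.read_clipboard()` is omitted because it depends on the system clipboard.

```python
import pandas as pd

# Hypothetical sales data
df = pd.DataFrame({
    "region": ["East", "East", "West", "West", "East", "West"],
    "product": ["A", "B", "A", "B", "A", "A"],
    "sales": [10, 20, 30, 40, 50, 60],
})

# Split into two random subsets: sample about 75% of rows, then drop those rows to get the rest
part_1 = df.sample(frac=0.75, random_state=42)
part_2 = df.drop(part_1.index)

# Grouping by two columns returns a MultiIndexed Series; unstack() reshapes it into a DataFrame
series = df.groupby(["region", "product"])["sales"].sum()
wide = series.unstack()

# pivot_table builds the same region-by-product table in a single call
pivot = df.pivot_table(index="region", columns="product", values="sales", aggfunc="sum")
print(wide)
print(pivot)
```

The `sample`/`drop` approach relies on the DataFrame having a unique index; if it does not, resetting the index first avoids dropping extra rows.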