NumPy Fundamentals for Data Science and Machine Learning


Note: If you prefer to read with a white background and black font, you can see this article in GitHub here. Last time I checked, SVG images rendered just fine.

It is no exaggeration to say that NumPy is at the core of the entire scientific computing Python ecosystem, both as a standalone package for numerical computation and as the engine behind most data science packages.

In this document, I review NumPy's main components and functionality, with attention to the needs of Data Science and Machine Learning practitioners, and people who aspire to become data professionals. My only assumption is that you have basic familiarity with Python: things like variables, lists, tuples, and loops. Advanced Python concepts like Object Oriented Programming are not touched at all.

Content-wise, I'd say that 95% is based on the NumPy v1.18 manual. The remaining 5% comes from a couple of random articles on the Internet and Stack Overflow. I resorted to those sources mostly to clarify concepts and functionality that weren't clear to me from the NumPy documentation. My own experience was the basis for organizing the tutorial, explaining concepts, creating practical examples, creating images, etc.

"Why are you using the documentation as the main source of content, instead of the many great tutorials online?" Because it is the most up-to-date, complete, and reliable source about NumPy (and about any library for that matter).

"Why then should I read this if everything comes from the documentation?" Well, you don't need to read this, you are right. Actually, I encourage you to read the documentation and learn from there. What I can offer is my own: (1) organization of contents, (2) selection of contents, (3) explanations and framing of concepts, (4) images, (5) practical examples, and (6) general perspective.

This tutorial is part of a larger project I am working on, which is an introduction to Python and its libraries for scientific computing, data science, and machine learning that you can find here.
As a final note, if you are a NumPy expert, advanced user, or developer, you may find some inaccuracies or lack of depth in some of my explanations. Two things: (1) feel free to suggest a better explanation or something that I may add to make things clearer, (2) I prioritize conciseness and accessibility over accuracy, so the lack of accuracy or depth is sometimes intentional on my part.

If you have any questions or suggestions, feel free to reach out to me at pcaceres@wisc.edu. Here is my Twitter, LinkedIn, and personal site.

Scientific and numerical computing often requires processing massive datasets with complex algorithms. If you are a scientist or data professional, you want a programming language that can process data FAST. As a general rule, the closer a programming language is to machine instructions (binary), the faster it runs.
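
To make the speed argument concrete, here is a minimal sketch that times the same computation two ways: a plain Python loop, where every iteration goes through the interpreter, and a NumPy call, where the loop runs in compiled code. The workload (a sum of squares over one million integers) and the function names `sum_squares_python` and `sum_squares_numpy` are my own choices for illustration, and the timings you get will depend on your machine and NumPy version.

```python
import timeit

import numpy as np

# Illustrative workload (an assumption, not from the NumPy manual):
# the sum of squares of the integers 0 .. n-1.
n = 1_000_000
values_list = list(range(n))
values_array = np.arange(n, dtype=np.int64)


def sum_squares_python(values):
    """Pure-Python loop: each iteration is executed by the interpreter."""
    total = 0
    for v in values:
        total += v * v
    return total


def sum_squares_numpy(values):
    """Vectorized version: the element-wise loop runs in compiled code."""
    return int(np.sum(values * values))


# Both approaches must agree before it makes sense to compare speed.
result_python = sum_squares_python(values_list)
result_numpy = sum_squares_numpy(values_array)

# Time a few repetitions of each; absolute numbers vary across machines.
time_python = timeit.timeit(lambda: sum_squares_python(values_list), number=3)
time_numpy = timeit.timeit(lambda: sum_squares_numpy(values_array), number=3)

print(f"Python loop: {time_python:.4f} s for 3 runs")
print(f"NumPy:       {time_numpy:.4f} s for 3 runs")
```

On most setups I would expect the NumPy version to be considerably faster, but treat the exact ratio as something to measure rather than assume.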