Python continues to hold a leading position in solving data science tasks and challenges. Last year we published a blog post surveying the Python libraries that proved most helpful at the time. This year, we have expanded our list with new libraries and taken a fresh look at the ones we already covered, focusing on the updates made during the year. Our selection actually contains more than 20 libraries, because some of them are alternatives to each other and solve the same problem. We have therefore grouped them, as it is difficult to single out one clear leader at the moment.
It has been some time since we last performed a Python libraries roundup, so we have taken the opportunity to start the month of November with a fresh list. The last time we at KDnuggets did this, editor and author Dan Clark split the vast array of Python data science libraries into several smaller collections, including data science libraries, machine learning libraries, and deep learning libraries. While splitting libraries into categories is inherently arbitrary, this made sense at the time of the previous publication. This time, however, we have split the collected open source Python data science libraries in two. This first post covers "data science, data visualization & machine learning," and can be thought of as the "traditional" data science tools covering common tasks. The second post, to be published next week, will cover libraries for building neural networks and for performing natural language processing and computer vision tasks.
We recently published a series of articles looking at the top Python libraries across Data Science, Deep Learning and Machine Learning. As the year draws to a close, we thought we'd give you a special Christmas gift and collate these into an official KDnuggets list of the top Python libraries in 2018. As always, we want your opinions! So, if you think we've unfairly left any out, or if you disagree with any of our choices, please let us know in the comments section below. Shape size is proportional to the number of commits.
As Python has gained a lot of traction in the data science industry in recent years, I wanted to outline some of its most useful libraries for data scientists and engineers, based on recent experience. And, since all of the libraries are open source, we have added commit counts, contributor counts and other metrics from GitHub, which can serve as proxy metrics for library popularity. When starting to work on a scientific task in Python, one inevitably turns to Python's SciPy Stack, a collection of software specifically designed for scientific computing in Python (not to be confused with the SciPy library, which is part of this stack, or with the community around the stack). We therefore want to start with a look at it. However, the stack is pretty vast, with more than a dozen libraries in it, so we will focus on the core packages, particularly the most essential ones.
Python continues to lead the way when it comes to Machine Learning, AI, Deep Learning and Data Science tasks. Because of this, we've decided to start a series investigating the top Python libraries across several categories. Of course, these lists are entirely subjective, as many libraries could easily be placed in multiple categories. For example, Keras is included in this list, but TensorFlow has been omitted and features in the Deep Learning library collection instead. This is because Keras is more of an 'end-user' library like SKLearn, as opposed to TensorFlow, which appeals more to researchers and Machine Learning engineer types. Now, let's get onto the list (GitHub figures correct as of October 3rd, 2018):
"scikit-learn is a Python module for machine learning built on NumPy, SciPy and matplotlib. It provides simple and efficient tools for data mining and data analysis. SKLearn is accessible to everybody and reusable in various contexts."
"Keras is a high-level neural networks API, written in Python and capable of running on top of TensorFlow, CNTK, or Theano."
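To give a feel for what the scikit-learn description quoted above means in practice, here is a minimal sketch of its usual fit-and-score workflow. The choice of the bundled iris dataset, the logistic regression model, and the split parameters are our own assumptions for illustration and do not come from the library roundup itself.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Load a small dataset that ships with scikit-learn (an illustrative choice).
X, y = load_iris(return_X_y=True)

# Hold out a quarter of the samples for evaluation; the fixed seed keeps
# the split reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Fit a simple classifier and measure accuracy on the held-out data.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)

print(f"Test accuracy: {accuracy:.2f}")
```

Most scikit-learn estimators expose the same `fit` and `score` (or `predict`) methods, so swapping in a different model usually means changing only the line that constructs it.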