For example, for personalized recommendations, we have been working with learning-to-rank methods that learn individual rankings over item sets.

Figure 1: Typical data science workflow, starting with raw data that is turned into features and fed into learning algorithms, resulting in a model that is applied to future data.

This means that this pipeline is iterated and improved many times, trying out different features, different forms of preprocessing, different learning methods, or maybe even going back to the source and trying to add more data sources. Probably the main difference between production systems and data science systems is that production systems are real-time systems that run continuously.
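To make the workflow from Figure 1 concrete, here is a minimal sketch in plain Python of the four stages: raw data, features, learning, and applying the model to future data. Everything in it is an assumption made for illustration: the record fields (`clicks`, `query`, `label`), the two features, and the nearest-centroid learner are hypothetical stand-ins, not the learning-to-rank methods mentioned above.

```python
from statistics import mean

def extract_features(record):
    """Turn one raw record (a hypothetical dict) into a numeric feature vector."""
    return [float(record["clicks"]), float(len(record["query"]))]

def train(features, labels):
    """Fit a nearest-centroid model: one mean feature vector per label."""
    centroids = {}
    for label in set(labels):
        rows = [f for f, y in zip(features, labels) if y == label]
        # zip(*rows) iterates over feature columns, so we average per feature.
        centroids[label] = [mean(col) for col in zip(*rows)]
    return centroids

def predict(model, feature_vector):
    """Return the label whose centroid is closest (squared Euclidean distance)."""
    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(feature_vector, centroid))
    return min(model, key=lambda label: sq_dist(model[label]))

# Stage 1: raw data (made-up records, for illustration only).
raw_data = [
    {"clicks": 1, "query": "shoes", "label": "ignore"},
    {"clicks": 2, "query": "boots", "label": "ignore"},
    {"clicks": 9, "query": "red running shoes", "label": "recommend"},
    {"clicks": 8, "query": "trail running shoes", "label": "recommend"},
]

# Stage 2: features.
features = [extract_features(r) for r in raw_data]
labels = [r["label"] for r in raw_data]

# Stage 3: learning.
model = train(features, labels)

# Stage 4: apply the model to future, unlabeled data.
future_record = {"clicks": 7, "query": "waterproof hiking boots"}
prediction = predict(model, extract_features(future_record))
print(prediction)
```

Iterating on the pipeline then amounts to swapping out `extract_features` or `train` and re-running the same four stages, which is the loop the workflow describes.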
Sophie Watson

Let's start with an uncontroversial point: Software developers and system operators love Kubernetes as a way to deploy and manage applications in Linux containers. What you may not know is that Kubernetes also provides an unbeatable combination of features for working data scientists. The same features that streamline the software development workflow also support a data science workflow! To see why, let's first look at what a data scientist's job involves. Some people define data science broadly, including machine learning (ML), software engineering, distributed computing, data management, and statistics.
I've been working with Machine Learning models in both academic and industrial settings for a few years now. I've recently been watching the excellent Scalable ML from Mikio Braun to learn some more about Scala and Spark. His video series talks about the practicalities of 'big data', and so made me think about what I wish I had known earlier about Machine Learning. I'll take each in turn. I gave a talk on Data-Products and getting Ordinary Differential Equations into production. One thing that I didn't realise until some time afterwards was just how challenging it is to handle issues like model decay, evaluation of models in production, dev-ops, etc. all by yourself.
It is dangerous to attribute too much intelligence to these systems. When I enrolled in Computer Science in 1995, Data Science didn't exist yet, but a lot of the algorithms we are still using already did. And this is not just because of the return of neural networks, but also because probably not that much has fundamentally changed since then. At least it feels this way to me. Which is funny, considering that starting this year or so AI seems to have finally gone mainstream.