Data science has become widely accepted across a broad range of industries in the past few years. Originally more of a research topic, data science has early roots in scientists' efforts to understand human intelligence and create artificial intelligence; it has since proven that it can add real business value. As an example, we can look at the company I work for: Zalando, one of Europe's biggest fashion retailers, where data science is heavily used to provide data-driven recommendations, among other things.
Sophie Watson

Let's start with an uncontroversial point: software developers and system operators love Kubernetes as a way to deploy and manage applications in Linux containers. What you may not know is that Kubernetes also provides an unbeatable combination of features for working data scientists. The same features that streamline the software development workflow also support a data science workflow! To understand why, let's first look at what a data scientist's job involves. Some people define data science broadly, including machine learning (ML), software engineering, distributed computing, data management, and statistics.
Machine learning has become mainstream, and suddenly businesses everywhere are looking to build systems that use it to optimize aspects of their product, processes, or customer experience. The cartoon version of machine learning sounds quite easy: you feed in training data made up of examples of good and bad outcomes, and the computer automatically learns from these and spits out a model that can make similar predictions on new data it has not seen before. What could be easier, right? Those with real experience building and deploying production systems built around machine learning know that, in fact, these systems are shockingly hard to build. For the most part, this difficulty does not stem from the algorithmic or mathematical complexity of machine learning. Creating such algorithms is difficult, to be sure, but that work is mostly done by academic researchers.
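To make the "cartoon version" concrete, here is a minimal sketch of that idealized loop: labeled examples go in, a model comes out, and the model labels inputs it has not seen. It assumes a simple nearest-centroid classifier as a stand-in for a real learning algorithm, and the functions `train` and `predict` and the two-feature training data are hypothetical, chosen only for illustration.

```python
# Toy "cartoon version" of supervised learning (hypothetical example):
# labeled examples in, a model out, predictions on unseen inputs.
# Assumption: a nearest-centroid classifier stands in for a real learner.
from statistics import mean


def train(examples):
    """Learn one centroid (mean feature vector) per label from (features, label) pairs."""
    by_label = {}
    for features, label in examples:
        by_label.setdefault(label, []).append(features)
    # Average each feature column separately for every label.
    return {
        label: tuple(mean(column) for column in zip(*rows))
        for label, rows in by_label.items()
    }


def predict(model, features):
    """Return the label whose centroid is closest by squared Euclidean distance."""
    def distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(model, key=lambda label: distance(model[label]))


# Hypothetical training data: examples of "good" and "bad" outcomes.
training_data = [
    ((1.0, 1.2), "good"),
    ((0.9, 1.0), "good"),
    ((3.1, 2.9), "bad"),
    ((3.0, 3.3), "bad"),
]

model = train(training_data)
print(predict(model, (1.1, 0.9)))  # good
print(predict(model, (2.8, 3.0)))  # bad
```

The learning step itself fits in a few lines; the sketch deliberately leaves out everything that surrounds it in a production system.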
There's some confusion surrounding the roles of machine learning engineer vs. data scientist, primarily because they are both relatively new. However, if you parse things out and examine the semantics, the distinctions become clear. While a scientist needs to fully understand the, well, science behind their work, an engineer is tasked with building something. But before we go any further, let's address the difference between machine learning and data science. It starts with having a solid definition of artificial intelligence.