Data science has become widely accepted across a broad range of industries in the past few years. Originally more of a research topic, data science has early roots in scientists' efforts to understand human intelligence and create artificial intelligence; it has since proven that it can add real business value. As an example, we can look at the company I work for: Zalando, one of Europe's biggest fashion retailers, where data science is heavily used to provide data-driven recommendations, among other things.
Sophie Watson

Let's start with an uncontroversial point: Software developers and system operators love Kubernetes as a way to deploy and manage applications in Linux containers. What you may not know is that Kubernetes also provides an unbeatable combination of features for working data scientists. The same features that streamline the software development workflow also support a data science workflow! To see why, let's first see what a data scientist's job looks like. Some people define data science broadly, including machine learning (ML), software engineering, distributed computing, data management, and statistics.
Machine learning has become mainstream, and suddenly businesses everywhere are looking to build systems that use it to optimize aspects of their product, processes or customer experience. The cartoon version of machine learning sounds quite easy: you feed in training data made up of examples of good and bad outcomes, and the computer automatically learns from these and spits out a model that can make similar predictions on new data not seen before. What could be easier, right? Those with real experience building and deploying production systems built around machine learning know that, in fact, these systems are shockingly hard to build. This difficulty is not, for the most part, the algorithmic or mathematical complexities of machine learning algorithms. Creating such algorithms is difficult, to be sure, but the algorithm creation process is mostly done by academic researchers.
Deploying and maintaining machine learning models at scale is one of the most pressing challenges faced by organizations today. The machine learning workflow, which includes training, building, and deploying models, can be a long process with many roadblocks along the way. Many data science projects don't make it to production because of challenges that slow down or halt the entire process. To overcome the challenges of model deployment, we need to identify the problems and learn what causes them. End-to-end ML applications often comprise components written in different programming languages.