The growing volumes of data generated by trends such as the Internet of Things (IoT) and cloud computing have naturally created demand for data scientists who can collect, analyze and, most importantly, interpret these massive stockpiles of complex information, helping their companies make better business decisions faster, gain a competitive edge and operate more efficiently. That in turn has created something of a land rush in a rapidly expanding data science platform market of more than a dozen vendors, ranging from established companies like IBM, Google, Microsoft and SAS to an array of smaller, younger pure-plays. All of these companies aim to give data scientists a single place to develop and run algorithms, use machine learning to build predictive models and then deploy those models into their businesses' operations. IBM offers products such as SPSS Modeler and SPSS Statistics as well as its two-year-old Data Science Experience, a set of tools covering areas such as machine learning via the vendor's Watson cognitive computing technology and the R programming language through the open-source RStudio offering. SAS has its Visual Suite for data visualization, preparation, analytics and model building, while Microsoft offers its Azure Machine Learning platform as part of the cloud-based Cortana Intelligence Suite and Microsoft R for those who want to code in R. Other names in the space include H2O, RapidMiner, Angoss, Knime and Dataiku.
Domino Data Lab is making the case for a multi-cloud approach to building and deploying applications infused with machine learning algorithms now that its platform runs on Kubernetes. Company CEO Nick Elprin says that as organizations employ machine learning algorithms to build various types of applications, many of them don't appreciate the extent to which relying on proprietary services locks them into a "walled garden" that runs only on a specific cloud computing platform. Many of those same organizations may even wake up one morning to discover they are suddenly competing with Amazon, Google or Microsoft, all of which are rapidly expanding the types of services they provide based on machine learning algorithms, he notes. By opting to build machine learning models on a platform provided by Domino Data Lab, organizations can deploy those models on any public cloud or on-premises IT environment as they see fit, Elprin says. Longer term, Domino Data Lab is betting that most applications employing machine learning algorithms will also be likely to span multiple clouds, he adds.
The field of data science is fairly young and evolving extremely rapidly. Finding people who can harness the tornado of big data tech is a major challenge. One up-and-coming vendor making data science more accessible is Domino Data Lab. Datanami recently talked with Nick Elprin, the co-founder and CEO of Domino Data Lab, a data science software company based in San Francisco. Here is an edited transcript of the conversation.
One of today's organizational dilemmas: it's pretty well understood that data science is a key driver of innovation, but few organizations know how to consistently turn data science output into business value. Sixty percent of companies plan to double the size of their data science teams in 2018. Ninety percent believe data science contributes to business innovation. However, less than nine percent can actually quantify the business impact of all their models, and only 11 percent can claim more than 50 predictive models working in production. This data stems from a recent survey of more than 250 data science leaders and practitioners.
Enterprises are adopting data science pipelines for artificial intelligence, machine learning and plain old statistics. A data science pipeline -- a sequence of actions for processing data -- can help companies be more competitive in a digital, fast-moving economy. Before CIOs take this approach, however, it's important to consider some of the key differences between data science development workflows and traditional application development workflows. Data science pipelines used for building predictive models are inherently experimental and don't always pan out in the same way as software built under processes such as Agile and DevOps. Because data science models break and lose accuracy in different ways than traditional IT apps do, a data science pipeline needs to be scrutinized to ensure the model reflects what the business is hoping to achieve.
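To make the "sequence of actions for processing data" idea concrete, here is a minimal sketch of a pipeline in Python. The stage functions, field names and threshold are all hypothetical illustrations, not part of any vendor's product; a real pipeline would wrap data loading, cleaning, feature engineering, and model training and evaluation in each stage.

```python
from functools import reduce

def clean(records):
    # Drop incomplete records (any field is None).
    return [r for r in records if None not in r.values()]

def featurize(records):
    # Derive a simple feature: spend per visit.
    for r in records:
        r["spend_per_visit"] = r["spend"] / r["visits"]
    return records

def score(records):
    # Placeholder "model": flag high-value customers.
    # A trained predictive model would replace this rule.
    return [r["spend_per_visit"] > 10 for r in records]

def run_pipeline(records, stages):
    # Apply each stage in order, feeding one stage's output to the next.
    return reduce(lambda data, stage: stage(data), stages, records)

raw = [
    {"spend": 120.0, "visits": 4},
    {"spend": 30.0, "visits": 6},
    {"spend": None, "visits": 2},  # incomplete record, removed by clean()
]

flags = run_pipeline(raw, [clean, featurize, score])
print(flags)  # → [True, False]
```

Structuring the work as swappable stages is what makes the experimental nature of data science manageable: a cleaning rule or model can be replaced and the whole pipeline re-run, which is harder when those steps are entangled in one application.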