The growing volumes of data generated by trends such as the Internet of Things (IoT) and cloud computing have naturally begotten a need for data scientists who can collect, analyze and, most importantly, interpret these massive stockpiles of complex information, helping their companies make better business decisions faster, gain a competitive edge and run more efficiently. That in turn has created something of a land rush in a rapidly expanding data science platform market of more than a dozen vendors, ranging from established companies like IBM, Google, Microsoft and SAS to an array of smaller, younger pure-plays. All of these companies aim to give data scientists a single place to develop and run algorithms, use machine learning to build predictive models and then deploy those models into their businesses' operations. IBM offers products such as SPSS Modeler and SPSS Statistics as well as its two-year-old Data Science Experience, a set of tools covering machine learning via the vendor's Watson cognitive computing technology and the R programming language through the open-source RStudio offering. SAS has its Visual Suite for data visualization, preparation, analytics and model building, while Microsoft offers its Azure Machine Learning platform as part of the cloud-based Cortana Intelligence Suite, along with Microsoft R for those who want to code in R. Other names in the space include H2O, RapidMiner, Angoss, Knime and Dataiku.
Forget about being "data driven." What you really want to be is "model driven," according to the CEO of Domino Data Lab, which today unveiled its new vision for elevating the predictive model as the single most important asset driving success in data science organizations. Nick Elprin co-founded Domino Data Lab with two colleagues, Chris Yang and Matthew Granade, at the height of the big data boom in 2013. Having worked as quants in the financial services industry, the founders were eager to build a platform that could help organizations, whatever their industry, leverage their data for a competitive edge. At first, Domino focused on lowering the barrier that kept data scientists from utilizing parallel computational infrastructure.
The field of data science is fairly young and evolving extremely rapidly, and finding people who can harness the tornado of big data tech is a major challenge. One of the up-and-coming vendors making data science more accessible is Domino Data Lab. Datanami recently talked with Nick Elprin, the co-founder and CEO of Domino Data Lab, a data science software company based in San Francisco. Here is an edited transcript of the conversation.
As data scientists confront operational challenges that are slowing the transition of machine learning models to production, more vendors are stepping up with possible solutions for breaking the logjam. The latest is data science platform vendor Domino Data Lab, which rolled out the latest release of its LaunchPad module this week. The 3.0 version specifically addresses "last mile" data science hurdles, streamlining the model deployment process while speeding up ongoing improvements to production models. Citing the slow rate of AI model utilization, Domino Data Lab and others are offering tools designed to bridge the gap between data science and IT/DevOps teams. That gap is increasingly seen as the biggest operational challenge as data science teams struggle to push models to production.
Enterprises are adopting data science pipelines for artificial intelligence, machine learning and plain old statistics. A data science pipeline -- a sequence of actions for processing data -- can help companies compete in a digital, fast-moving economy. Before CIOs take this approach, however, it's important to consider some key differences between data science development workflows and traditional application development workflows. Pipelines for building predictive models are inherently experimental and don't always pan out the way work managed under traditional software development processes, such as Agile and DevOps, does. And because data science models break and lose accuracy differently than traditional IT apps do, a data science pipeline needs ongoing scrutiny to assure the model still reflects what the business is hoping to achieve.
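The pipeline idea above -- a sequence of processing steps feeding a model, plus a check that the deployed model hasn't silently lost accuracy -- can be sketched in a few lines. This is a minimal illustration, not any vendor's API: the stage names (`clean`, `featurize`, `train`) and the toy threshold "model" are assumptions made up for the example.

```python
# Minimal sketch of a data science pipeline: a sequence of stages
# (clean -> featurize -> train) plus a simple accuracy-drift check.
# All names and the toy model are illustrative, not a real product's API.
from statistics import mean

def clean(records):
    """Drop records with missing values."""
    return [r for r in records if None not in r.values()]

def featurize(records):
    """Turn raw records into (feature, label) pairs."""
    return [(r["hours"], r["passed"]) for r in records]

def train(pairs):
    """'Train' a trivial model: predict True above the mean feature value."""
    threshold = mean(x for x, _ in pairs)
    return lambda x: x > threshold

def accuracy(model, pairs):
    """Fraction of pairs the model labels correctly."""
    return mean(1.0 if model(x) == y else 0.0 for x, y in pairs)

def run_pipeline(raw):
    """Run the stages in sequence; return the model and its accuracy."""
    data = featurize(clean(raw))
    model = train(data)
    return model, accuracy(model, data)

def drifted(baseline_acc, live_acc, tolerance=0.1):
    """Flag a production model whose accuracy slipped past tolerance."""
    return baseline_acc - live_acc > tolerance

raw = [
    {"hours": 1, "passed": False},
    {"hours": 2, "passed": False},
    {"hours": 8, "passed": True},
    {"hours": 9, "passed": True},
    {"hours": None, "passed": True},  # dropped by clean()
]
model, base_acc = run_pipeline(raw)
print(base_acc)                 # 1.0 on this toy data
print(drifted(base_acc, 0.7))   # True: live accuracy fell more than 0.1
```

The point of the sketch is the shape, not the model: each stage is a replaceable step, and the drift check is the kind of ongoing scrutiny a production pipeline needs that a traditional app, which "breaks" loudly rather than degrading quietly, does not.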