This article describes the machine learning services provided in SQL Server 2017, which support in-database use of the Python and R languages. The integration of SQL Server with open source languages popular for machine learning makes it easier to use the appropriate tool--SQL, Python, or R--for data exploration and modeling. R and Python scripts can also be used in T-SQL scripts or Integration Services packages, expanding the capabilities of ETL and database scripting. What has this to do with stone soup, you ask? It's a metaphor, of course, but one that captures the essence of why SQL Server works so well with Python and R. To illustrate the point, I'll provide a simple walkthrough of data exploration and modeling combining SQL and Python, using a food and nutrition analysis dataset from the US Department of Agriculture. You might have heard that data science is more of a craft than a science. Many ingredients have to come together efficiently, to process intake data and generate models and predictions that can be consumed by business users and end customers. However, what works well at the level of "craftsmanship" often has to change at commercial scale. Much like the home cook who has ventured out of the kitchen into a restaurant or food factory, big changes are required in the roles, ingredients, and processes. Moreover, cooking can no longer be a "one-man show;" you need the help of professionals with different specializations and their own tools to create a successful product or make the process more efficient. These specialists include data scientists, data developers and taxonomists, SQL developers, DBAS, application developers, and the domain specialists or end users who consume the results. Any kitchen would soon be chaos if the tools used by each professional were incompatible with each other, or if processes had to be duplicated and slightly changed at each step. What restaurant would survive if carrots chopped up at one station were unusable at the next?
Data science, machine learning, and analytics have re-defined how we look at the world. The R community plays a vital role in that transformation and the R language continues to be the de-facto choice for statistical computing, data analysis, and many machine learning scenarios. The importance of R was first recognized by the SQL Server team back in 2016 with the launch of SQL ML Services and R Server. Over the years we have added Python to SQL ML Services in 2017 and Java support through our language extensions in 2019. Earlier this year we also announced the general availability of SQL ML Services into Azure SQL Managed Instance.
At Pivotal Data Science, our primary charter is to help our customers derive value from their data assets, be it in the reduction of cost or by increasing revenue by offering better products and services. While we are not working on customer engagements, we engage in R&D using our wide array of products. For instance, we may contribute a new module to PDLTools or MADlib - our distributed in-database machine learning libraries, we might build end-to-end demos such as these or experiment with new technology and blog about them here. Last quarter, we set out to explore data science microservices for operationalizing our models for real-time scoring. Microservices have been the most talked about topic in many Cloud conferences of late. They've gained a large fan following by application developers, solution architects, data scientists and engineers alike.
In recent times, machine learning (ML) and artificial intelligence (AI) based systems have evolved and scaled across different industries such as finance, retail, insurance, energy utilities, etc. Among other things, they have been used to predict patterns of customer behavior, to generate pricing models, and to predict the return on investments. But the successes in deploying machine learning models at scale in those industries have not translated into the healthcare setting. There are multiple reasons why integrating ML models into healthcare has not been widely successful, but from a technical perspective, general-purpose commercial machine learning platforms are not a good fit for healthcare due to complexities in handling data quality issues, mandates to demonstrate clinical relevance, and a lack of ability to monitor performance in a highly regulated environment with stringent security and privacy needs. In this paper, we describe Isthmus, a turnkey, cloud-based platform which addresses the challenges above and reduces time to market for operationalizing ML/AI in healthcare. Towards the end, we describe three case studies which shed light on Isthmus capabilities. These include (1) supporting an end-to-end lifecycle of a model which predicts trauma survivability at hospital trauma centers, (2) bringing in and harmonizing data from disparate sources to create a community data platform for inferring population as well as patient level insights for Social Determinants of Health (SDoH), and (3) ingesting live-streaming data from various IoT sensors to build models, which can leverage real-time and longitudinal information to make advanced time-sensitive predictions.