A decade ago, Ion Stoica and his colleagues at UC Berkeley identified the roadblock to performing advanced analytics. The challenge at the time was what we then called Big Data: cheap storage and compute could be tapped, courtesy of the Hadoop project, but the jobs tended to take hours or days, largely because each processing stage wrote its intermediate results back out to disk. Stoica and his colleagues worked on a solution that exploited memory instead, caching working data sets across stages, and the result was the Apache Spark project. Created at UC Berkeley's AMPLab, it has become the de facto standard for large-scale batch data processing, not to mention the technology that birthed a company currently valued at $28 billion.
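To make the memory point concrete, here is a minimal PySpark sketch (the input path and column name are hypothetical) of the pattern that delivered those speedups: pin a working dataset in executor memory so that repeated passes over it avoid rereading from disk.

```python
from pyspark.sql import SparkSession

# Start a Spark session; cluster deployment details omitted.
spark = SparkSession.builder.appName("cache-demo").getOrCreate()

# Hypothetical input path and column; any large dataset works here.
events = spark.read.parquet("/data/events/")

# cache() marks the DataFrame to be kept in executor memory once an
# action first computes it; later passes reuse the in-memory copy
# rather than rereading the source files from storage.
events.cache()

events.count()                               # first pass fills the cache
events.groupBy("event_type").count().show()  # served from memory

spark.stop()
```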
Over the past four to five years, Apache Spark has taken the big data analytics world by storm (for fans of streaming, no pun intended). As the company whose founders created and continue to lead the Apache Spark project, Databricks has differentiated itself as the provider of the most performant, up-to-date Spark-based cloud platform service.
A key highlight from last week's re:Invent was the extension of serverless compute to a swath of AWS analytics services, including Amazon EMR, Kinesis Data Streams, MSK (Managed Streaming for Apache Kafka), and Redshift. AWS was not the first to offer serverless options for cloud analytics: Google Cloud BigQuery and Azure Synapse Analytics have long had them (by contrast, Snowflake's is still in preview). Serverless wasn't the only new feature announced last week. AWS also announced a preview of automated materialized views for Redshift, which treats view creation much the way a cost-based optimizer treats query plans: the service automatically generates views based on data hot spots. Nonetheless, serverless grabbed the limelight.
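For context on what's being automated, here is a hedged Python sketch of how you create a materialized view in Redshift by hand today, via the boto3 Redshift Data API; the cluster identifier, secret ARN, and table names are placeholders. The preview feature would, in effect, decide on its own when to issue statements like this, based on the query patterns it observes.

```python
import boto3

# The Redshift Data API runs SQL without managing database connections.
client = boto3.client("redshift-data", region_name="us-east-1")

# Hypothetical cluster, database, and credentials; substitute your own.
resp = client.execute_statement(
    ClusterIdentifier="analytics-cluster",
    Database="dev",
    SecretArn="arn:aws:secretsmanager:us-east-1:123456789012:secret:redshift-creds",
    Sql="""
        CREATE MATERIALIZED VIEW daily_sales AS
        SELECT sale_date, SUM(amount) AS total
        FROM sales
        GROUP BY sale_date;
    """,
)
print(resp["Id"])  # statement id; poll describe_statement() for status
```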
Serverless computing has come a long way since its humble origins in simple function services, typically used to implement lightweight web or mobile apps. In a recent briefing with analysts, IBM reviewed its plans for serverless services in its cloud, and the future points to almost the exact opposite of simple functions: applications for complex supercomputing. It's part of an ongoing expansion of IBM Cloud Code Engine, first rolled out earlier this year, into a broad-based platform for automating the deployment and running of code across a wide spectrum of use cases, from functions to PaaS, batch jobs, and containers-as-a-service. As we'll note below, extending this up to supercomputing is a marked shift from those far humbler origins. Part of the roadmap is also having the engine encompass a full spectrum of services, starting with the functions-as-a-service capability that IBM has offered for a number of years.
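To ground the "humble origins" point: the functions-as-a-service model IBM has offered for years (IBM Cloud Functions, built on Apache OpenWhisk) boils down to deploying a single entry-point function. A minimal sketch, following the standard OpenWhisk Python action convention:

```python
# A minimal OpenWhisk-style Python action: the platform invokes main()
# with a dict of request parameters and expects a JSON-serializable
# dict back.
def main(params):
    name = params.get("name", "world")
    return {"greeting": f"Hello, {name}"}
```

Deployed behind an HTTP trigger, something this small is the classic lightweight web or mobile backend use case; the Code Engine roadmap described above stretches the same deploy-and-run model out to batch jobs, containers, and eventually supercomputing-class workloads.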