I recently published Chapter 3 of my book-in-progress on Leanpub. The goal of this chapter is to empower data scientists to leverage managed services to deploy models to production and own more of DevOps. Serverless technologies enable developers to write and deploy code without needing to worry about provisioning and maintaining servers. One of the most common uses of this technology is serverless functions, which make it much easier to author code that can scale to match variable workloads. With serverless function environments, you write a function that the runtime supports, specify a list of dependencies, and then deploy the function to production. The cloud platform is responsible for provisioning servers, scaling up machines to match demand, managing load balancers, and handling versioning. Since we've already explored hosting models as web endpoints, serverless functions are an excellent tool when you want to rapidly move from prototype to production for your predictive models. Serverless functions were first introduced on AWS in 2015 and on GCP in 2016. Both of these systems provide a variety of triggers that can invoke functions and a number of outputs that the functions can trigger in response. While it's possible to use serverless functions to avoid writing complex glue code between different components in a cloud platform, we'll explore a much narrower use case in this chapter.
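To make the "write a function, deploy it" workflow concrete, here is a minimal sketch of a model-serving handler in the AWS Lambda style. The "model" is a stand-in: a hard-coded linear scorer rather than a real trained model loaded from storage, and the `WEIGHTS` and feature names are illustrative assumptions, not from the chapter.

```python
import json

# Hypothetical model weights; a real deployment would load a trained
# model artifact (e.g. from S3) at cold start instead.
WEIGHTS = {"clicks": 0.4, "visits": 0.6}

def predict(features):
    """Score a feature dict with a toy linear model."""
    return sum(WEIGHTS.get(name, 0.0) * value for name, value in features.items())

def lambda_handler(event, context):
    """Entry point the serverless runtime invokes once per request.

    `event["body"]` is assumed to be a JSON-encoded feature dict, as it
    would be for a Lambda behind an HTTP API.
    """
    features = json.loads(event["body"])
    return {
        "statusCode": 200,
        "body": json.dumps({"score": predict(features)}),
    }
```

The appeal is that this file plus a dependency list is the entire deployable unit; the platform handles provisioning, scaling, and load balancing around it.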
At Retention Science we deliver personalized marketing campaigns powered by machine learning to drive a deeper level of customer engagement. Our AI engine, Cortex, is responsible for billions of predictions daily and hundreds of millions of personalized emails each month. As these numbers grow, it becomes increasingly important to report campaign metrics in a fast, efficient, and fault-tolerant way. In a recent project, we upgraded our existing nightly metrics reporting pipeline to an efficient real-time streaming pipeline. We achieved this using AWS Lambda in conjunction with Amazon Kinesis, Amazon Aurora, and Amazon S3.
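A streaming pipeline like this typically centers on a Lambda that consumes batches of Kinesis records and aggregates them before writing downstream. The sketch below shows that core step under stated assumptions: the payload field names (`campaign_id`, `action`) are hypothetical, Kinesis record data arrives base64-encoded as in the standard Lambda event format, and the aggregated counts would be upserted into Aurora rather than returned.

```python
import base64
import json
from collections import Counter

def handle_kinesis_batch(event, context):
    """Aggregate campaign events (opens, clicks, ...) from one Kinesis batch.

    A sketch only: in a real pipeline the resulting counts would be
    written to a metrics store (e.g. Aurora) instead of returned.
    """
    counts = Counter()
    for record in event["Records"]:
        # Kinesis delivers each record payload base64-encoded.
        payload = json.loads(base64.b64decode(record["kinesis"]["data"]))
        counts[(payload["campaign_id"], payload["action"])] += 1
    return dict(counts)
```

Because each invocation handles one batch independently, failed batches can be retried without affecting the rest of the stream, which is where much of the fault tolerance comes from.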
Qubole is announcing the availability of a working implementation of Apache Spark on AWS Lambda. This prototype has successfully scanned 1 TB of data and sorted 100 GB of data from Amazon Simple Storage Service (S3). This article dives into the technical details of how we built the prototype and the code changes required on top of Apache Spark 2.1.0.
From Google's advertisements to Amazon's product suggestions, recommendation engines are everywhere. As users of smart internet services, we've grown accustomed to being shown things we like. This blog post is an overview of how we built a product recommendation engine for Hubba. I'll start with an explanation of different types of recommenders and how we went about the selection process. Content-based recommenders use discrete properties of an item, such as its tags. If a user views products tagged "dogs", "pets", "chow", the recommender may suggest other pet food products.
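The tag-based idea above can be sketched with a simple Jaccard similarity over tag sets. The catalog, item names, and tags here are illustrative assumptions, not Hubba's actual data model, and a production recommender would use richer features than raw tag overlap.

```python
def jaccard(a, b):
    """Similarity between two tag sets: |intersection| / |union|."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def recommend(viewed_tags, catalog, top_n=3):
    """Rank catalog items by tag overlap with what the user viewed.

    `catalog` maps item name -> list of tags (hypothetical schema).
    Items sharing no tags with the viewed product are dropped.
    """
    scored = [(jaccard(viewed_tags, tags), name) for name, tags in catalog.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for score, name in scored[:top_n] if score > 0]
```

For example, given a catalog where "kibble" is tagged ["dogs", "pets", "chow"] and "leash" is tagged ["dogs", "pets"], a user who viewed a product tagged ["dogs", "pets", "chow"] would see "kibble" ranked above "leash", while an unrelated item tagged ["home"] would be filtered out.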