I recently published Chapter 3 of my book-in-progress on Leanpub. The goal of this chapter is to empower data scientists to use managed services to deploy models to production and to own more of the DevOps work. Serverless technologies let developers write and deploy code without having to provision and maintain servers. One of the most common uses of this technology is serverless functions, which make it much easier to write code that scales to match variable workloads. With a serverless function environment, you write a function in a language the runtime supports, specify a list of dependencies, and then deploy the function to production. The cloud platform is responsible for provisioning servers, adding machines to match demand, managing load balancers, and handling versioning. Since we've already explored hosting models as web endpoints, serverless functions are an excellent tool when you want to move rapidly from prototype to production with your predictive models. Serverless functions were first introduced on AWS in 2015 and on GCP in 2016. Both platforms provide a variety of triggers that can invoke functions, and a number of outputs that functions can trigger in response. While it's possible to use serverless functions to avoid writing complex glue code between the components of a cloud platform, this chapter explores a much narrower use case.
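As a minimal sketch of the kind of function this chapter targets, the Python handler below returns a model prediction from AWS Lambda. The `lambda_handler(event, context)` signature is Lambda's standard Python entry point; the API Gateway proxy-style event with a JSON `body`, the feature names, and the hard-coded logistic-regression weights are all hypothetical stand-ins for a real trained model.

```python
import json
import math

# Hypothetical model parameters. In a real deployment these would be loaded
# from a serialized model artifact bundled with the function or fetched from S3.
WEIGHTS = {"tenure_months": -0.05, "support_tickets": 0.4}
BIAS = -1.0


def predict(features):
    """Score one record with a logistic regression (illustrative model only)."""
    z = BIAS + sum(WEIGHTS[name] * float(features.get(name, 0.0)) for name in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))


def lambda_handler(event, context):
    """AWS Lambda entry point for an API Gateway proxy-style request.

    Assumes the request body is a JSON object mapping feature names to values.
    """
    try:
        features = json.loads(event.get("body") or "{}")
    except json.JSONDecodeError:
        return {"statusCode": 400, "body": json.dumps({"error": "invalid JSON"})}
    if not isinstance(features, dict):
        return {"statusCode": 400, "body": json.dumps({"error": "expected a JSON object"})}

    score = predict(features)
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"score": score}),
    }
```

Keeping the model at module scope is a deliberate choice: Lambda runs module-level code once per execution environment and reuses it across warm invocations, so an expensive model load is paid only on a cold start.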
At Retention Science, we deliver personalized marketing campaigns powered by machine learning to drive a deeper level of customer engagement. Our AI engine, Cortex, is responsible for billions of predictions daily and hundreds of millions of personalized emails each month. As these volumes grow, it becomes increasingly important to report campaign metrics in a fast, efficient, and fault-tolerant way. In a recent project, we upgraded our existing nightly metrics reporting pipeline to an efficient real-time streaming pipeline, built with AWS Lambda in conjunction with Amazon Kinesis, Amazon Aurora, and Amazon S3.
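A minimal sketch of the streaming step of such a pipeline, assuming a Python Lambda function subscribed to a Kinesis stream. Lambda delivers Kinesis records in batches, with each record's payload base64-encoded under `record["kinesis"]["data"]`. The JSON payload schema (`campaign_id`, `event_type`) is a hypothetical example rather than the actual Cortex event format, and the write to Aurora or S3 is left as a comment.

```python
import base64
import json
from collections import Counter


def aggregate_campaign_events(records):
    """Count events per (campaign_id, event_type) in a batch of Kinesis records.

    The payload schema used here is hypothetical.
    """
    counts = Counter()
    for record in records:
        # Kinesis record data arrives base64-encoded in the Lambda event.
        payload = json.loads(base64.b64decode(record["kinesis"]["data"]))
        counts[(payload["campaign_id"], payload["event_type"])] += 1
    return counts


def lambda_handler(event, context):
    """Entry point for a Lambda function triggered by a Kinesis stream."""
    counts = aggregate_campaign_events(event["Records"])
    # A real pipeline would upsert these counts into Aurora and/or archive the
    # raw batch to S3; that step is omitted to keep the sketch self-contained.
    return {f"{cid}:{etype}": n for (cid, etype), n in sorted(counts.items())}
```

Aggregating within each batch before writing means one database upsert per (campaign, event type) pair per batch rather than one per raw event, which keeps write load tied to the number of active campaigns instead of the event volume.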
Qubole is announcing the availability of a working implementation of Apache Spark on AWS Lambda. This prototype has successfully scanned 1 TB of data and sorted 100 GB of data from Amazon Simple Storage Service (Amazon S3). This article dives into the technical details of how we built the prototype and the code changes required on top of Apache Spark 2.1.0.
Connected devices have found their way into a myriad of commercial and consumer applications. Industries have already moved, or are in the process of moving, to operational models that require them to measure a broad range of data points in real time and to optimize their operations based on analysis of this data. The move to smart connected devices can become costly if expensive components must be upgraded across the infrastructure. This blog post explores how AWS IoT can be used to gather remote sensor telemetry and to control legacy non-IP devices through infrared (IR) commands sent over the Internet. In agriculture, greenhouses are used to create ideal growing conditions that maximize yield.
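To make the telemetry-to-control loop concrete, here is a minimal Python sketch of the decision step: a greenhouse temperature reading arrives as JSON, and the code decides whether to send an IR command to a legacy heater or air conditioner. The thresholds, the `temperature_c` field, the command names, and the MQTT topic layout are all hypothetical; none of them come from the original post.

```python
import json

# Hypothetical comfort band for the crop; real values depend on the plants grown.
TARGET_LOW_C = 18.0
TARGET_HIGH_C = 27.0


def choose_ir_command(temperature_c):
    """Map a temperature reading to a hypothetical IR command name, or None."""
    if temperature_c < TARGET_LOW_C:
        return "HEAT_ON"
    if temperature_c > TARGET_HIGH_C:
        return "COOL_ON"
    return None  # Within the band: no command needed.


def build_command_message(device_id, telemetry_json):
    """Turn a telemetry JSON message into an (MQTT topic, payload) pair, or None.

    The topic layout "greenhouse/<device_id>/ir" is an illustrative assumption.
    """
    reading = json.loads(telemetry_json)
    command = choose_ir_command(float(reading["temperature_c"]))
    if command is None:
        return None
    topic = f"greenhouse/{device_id}/ir"
    return topic, json.dumps({"command": command})
```

In a deployment, the resulting topic and payload could be published to the AWS IoT message broker (for example with the boto3 `iot-data` client's `publish(topic=..., qos=1, payload=...)`), with a device-side bridge subscribed to that topic translating the command name into the corresponding IR signal.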