delta lake table
Simplifying Distributed Deep Learning Model Inference Webinar
On October 10th, our team hosted a live webinar, Simple Distributed Deep Learning Model Inference, with Xiangrui Meng, Software Engineer at Databricks. Model inference, unlike model training, is usually embarrassingly parallel and hence simple to distribute. In practice, however, complex data scenarios and compute infrastructure often make this "simple" task hard to carry out end to end, from data source to sink. In this webinar, we provided a reference end-to-end pipeline for distributed deep learning model inference using the latest features from Apache Spark and Delta Lake. While the reference pipeline applies to various deep learning scenarios, we focused on image applications, demonstrating specific pain points and proposing solutions.
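To illustrate why inference is "embarrassingly parallel", here is a minimal sketch in plain Python, not the pipeline from the webinar. It assumes a hypothetical `predict` function standing in for a real deep learning model, and uses a local thread pool where a real deployment would use Spark executors. Each batch is scored independently, so no coordination between workers is needed.

```python
from concurrent.futures import ThreadPoolExecutor


def predict(batch):
    # Hypothetical stand-in for a model: labels each "image" (a list of
    # pixel intensities) by whether its mean intensity exceeds 0.5.
    return ["bright" if sum(img) / len(img) > 0.5 else "dark" for img in batch]


def partition(records, size):
    # Split the input into fixed-size batches, preserving order.
    return [records[i:i + size] for i in range(0, len(records), size)]


def distributed_inference(records, batch_size=2, workers=4):
    batches = partition(records, batch_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map returns results in input order, so predictions
        # stay aligned with the original records.
        results = pool.map(predict, batches)
        return [label for batch in results for label in batch]


images = [[0.9, 0.8], [0.1, 0.2], [0.6, 0.7], [0.0, 0.3], [1.0, 0.4]]
labels = distributed_inference(images)
```

Because no batch depends on any other, the parallel result matches what a single sequential call to `predict(images)` would return; the hard parts in practice are the ones the webinar names, such as getting data from source to sink.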
Productionizing Machine Learning with Delta Lake - Databricks Blog
For many data scientists, the process of building and tuning machine learning models is only a small portion of the work they do every day. The vast majority of their time is spent doing the less-than-glamorous (but crucial) work of performing ETL, building data pipelines, and putting models into production. Along the way, we'll demonstrate how Delta Lake is the ideal platform for the machine learning life cycle because it offers tools and features that unify data science, data engineering, and production workflows. These features of Delta Lake allow data engineers and scientists to design reliable, resilient, automated data pipelines and machine learning models faster than ever. A common architecture uses tables that correspond to different quality levels in the data engineering pipeline, progressively adding structure to the data: data ingestion ("Bronze" tables), transformation and feature engineering ("Silver" tables), and machine learning training or prediction ("Gold" tables). Together, we refer to these tables as a "multi-hop" architecture.
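The Bronze, Silver, and Gold hops can be sketched as three small functions. This is a plain-Python illustration under stated assumptions, not Delta Lake code: lists and dicts stand in for tables, and the event schema (`user`, `amount`) and function names are hypothetical.

```python
import json

# Hypothetical raw input: one JSON event per line, some of it malformed.
RAW_EVENTS = [
    '{"user": "a", "amount": "10.5"}',
    '{"user": "b", "amount": "oops"}',
    'not json at all',
    '{"user": "a", "amount": "4.5"}',
]


def to_bronze(lines):
    # Bronze: ingest everything as-is, adding only a line number.
    return [{"line_no": i, "raw": line} for i, line in enumerate(lines)]


def to_silver(bronze):
    # Silver: parse, enforce types, and drop rows that fail validation.
    silver = []
    for row in bronze:
        try:
            rec = json.loads(row["raw"])
            silver.append({"user": rec["user"], "amount": float(rec["amount"])})
        except (ValueError, KeyError, TypeError):
            continue  # malformed row stays in Bronze only
    return silver


def to_gold(silver):
    # Gold: aggregate into per-user features a model could train on.
    totals = {}
    for row in silver:
        totals[row["user"]] = totals.get(row["user"], 0.0) + row["amount"]
    return totals


bronze = to_bronze(RAW_EVENTS)
silver = to_silver(bronze)
gold = to_gold(silver)
```

Keeping the raw rows in the Bronze stage means a later fix to the parsing rules can be replayed over the original data, which is one reason the hops are kept as separate tables rather than a single transformation.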