At the OOP 2018 conference in Munich, I presented an updated version of my talk about building scalable, mission-critical microservices with the Apache Kafka ecosystem and deep learning frameworks like TensorFlow, DeepLearning4J, or H2O. I want to share the updated slide deck and discuss a few updates about newest trends, which I incorporated into the talk.
Uber expanded Michelangelo "to serve any kind of Python model from any source to support other Machine Learning and Deep Learning frameworks like PyTorch and TensorFlow [instead of just using Spark for everything]." So why did Uber (and many other tech companies) build its own platform and framework-independent machine learning infrastructure? The posts How to Build and Deploy Scalable Machine Learning in Production with Apache Kafka and Using Apache Kafka to Drive Cutting-Edge Machine Learning describe the benefits of leveraging the Apache Kafka ecosystem as a central, scalable, and mission-critical nervous system. It allows real-time data ingestion, processing, model deployment, and monitoring in a reliable and scalable way. This post focuses on how the Kafka ecosystem can help solve the impedance mismatch between data scientists, data engineers, and production engineers. By leveraging it to build your own scalable machine learning infrastructure and also make your data scientists happy, you can solve the same problems for which Uber built its own ML platform, Michelangelo. Based on what I've seen in the field, an impedance mismatch between data scientists, data engineers, and production engineers is the main reason why companies struggle to bring analytic models into production to add business value.
I had a new talk presented at Codemotion Amsterdam 2018 this week. I discussed the relation of Apache Kafka and machine learning to build a machine learning infrastructure for extreme scale. As always, I want to share the slide deck. The talk was also recorded. I will share the video as soon as it is published by the organizer.
The relationship between Apache Kafka and machine learning (ML) is an interesting one that I've written about quite a bit in How to Build and Deploy Scalable Machine Learning in Production with Apache Kafka and Using Apache Kafka to Drive Cutting-Edge Machine Learning. This blog post addresses a specific part of building a machine learning infrastructure: the deployment of an analytic model in a Kafka application for real-time predictions. Model training and model deployment can be two separate processes. However, you can also use many of the same steps for integration and data preprocessing because you often need to perform the same integration, filter, enrichment, and aggregation of data for model training and model inference. We will discuss and compare two different options for model deployment: model servers with remote procedure calls (RPCs), and natively embedding models into Kafka client applications.
I had a new talk presented at "Codemotion Amsterdam 2018" this week. I discussed the relation of Apache Kafka and Machine Learning to build a Machine Learning infrastructure for extreme scale. As always, I want to share the slide deck. The talk was also recorded. I will share the video as soon as it was published by the organizer.