Goto

Collaborating Authors

5 Machine Learning Trends for 2018 Combined With Apache Kafka Ecosystem - DZone AI

#artificialintelligence

At the OOP 2018 conference in Munich, I presented an updated version of my talk about building scalable, mission-critical microservices with the Apache Kafka ecosystem and deep learning frameworks like TensorFlow, DeepLearning4J, or H2O. I want to share the updated slide deck and discuss a few updates about newest trends, which I incorporated into the talk.


Machine Learning Tutorial with Python, Jupyter, KSQL and TensorFlow

#artificialintelligence

When Michelangelo started, the most urgent and highest impact use cases were some very high scale problems, which led us to build around Apache Spark (for large-scale data processing and model training) and Java (for low latency, high throughput online serving). This structure worked well for production training and deployment of many models but left a lot to be desired in terms of overhead, flexibility, and ease of use, especially during early prototyping and experimentation [where Notebooks and Python shine]. Uber expanded Michelangelo "to serve any kind of Python model from any source to support other Machine Learning and Deep Learning frameworks like PyTorch and TensorFlow [instead of just using Spark for everything]." So why did Uber (and many other tech companies) build its own platform and framework-independent machine learning infrastructure? The posts How to Build and Deploy Scalable Machine Learning in Production with Apache Kafka and Using Apache Kafka to Drive Cutting-Edge Machine Learning describe the benefits of leveraging the Apache Kafka ecosystem as a central, scalable, and mission-critical nervous system. It allows real-time data ingestion, processing, model deployment, and monitoring in a reliable and scalable way. This post focuses on how the Kafka ecosystem can help solve the impedance mismatch between data scientists, data engineers, and production engineers. By leveraging it to build your own scalable machine learning infrastructure and also make your data scientists happy, you can solve the same problems for which Uber built its own ML platform, Michelangelo.


Machine Learning With Python, Jupyter, KSQL, and TensorFlow - DZone AI

#artificialintelligence

Uber expanded Michelangelo "to serve any kind of Python model from any source to support other Machine Learning and Deep Learning frameworks like PyTorch and TensorFlow [instead of just using Spark for everything]." So why did Uber (and many other tech companies) build its own platform and framework-independent machine learning infrastructure? The posts How to Build and Deploy Scalable Machine Learning in Production with Apache Kafka and Using Apache Kafka to Drive Cutting-Edge Machine Learning describe the benefits of leveraging the Apache Kafka ecosystem as a central, scalable, and mission-critical nervous system. It allows real-time data ingestion, processing, model deployment, and monitoring in a reliable and scalable way. This post focuses on how the Kafka ecosystem can help solve the impedance mismatch between data scientists, data engineers, and production engineers. By leveraging it to build your own scalable machine learning infrastructure and also make your data scientists happy, you can solve the same problems for which Uber built its own ML platform, Michelangelo. Based on what I've seen in the field, an impedance mismatch between data scientists, data engineers, and production engineers is the main reason why companies struggle to bring analytic models into production to add business value.


Streaming Machine Learning with Tiered Storage

#artificialintelligence

Both approaches have their pros and cons. The blog post Machine Learning and Real-Time Analytics in Apache Kafka Applications and the Kafka Summit presentation Event-Driven Model Serving: Stream Processing vs. RPC with Kafka and TensorFlow discuss this in detail. There are more and more applications where the analytic model is directly embedded into the event streaming application, making it robust, decoupled, and optimized for performance and latency. The model can be loaded into the application when starting it up (e.g., using the TensorFlow Java API). Model management (including versioning) depends on your build pipeline and DevOps strategy. For example, new models can be embedded into a new Kubernetes pod which simply replaces the old pod. Another commonly used option is to send newly trained models (or just the updated weights or hyperparameters) as a Kafka message to a Kafka topic.


Machine Learning Infrastructure for Extreme Scale With the Apache Kafka Open-Source Ecosystem - DZone AI

#artificialintelligence

I had a new talk presented at Codemotion Amsterdam 2018 this week. I discussed the relation of Apache Kafka and machine learning to build a machine learning infrastructure for extreme scale. As always, I want to share the slide deck. The talk was also recorded. I will share the video as soon as it is published by the organizer.