Apache Kafka and the four challenges of production machine learning systems

#artificialintelligence

Machine learning has become mainstream, and suddenly businesses everywhere are looking to build systems that use it to optimize aspects of their products, processes, or customer experience.


Managing Spark and Kafka Pipelines - Silicon Valley Data Science

@machinelearnbot

Do you fully understand how your systems operate? As an engineer, there is a lot you can do to aid the person who is going to manage your application in the future. In a previous post, we covered how exposing the tuning knobs of the underlying technologies to operations will go a long way toward making your application successful. Your application is a unique project: it's easier for you to learn the operational aspects of the underlying technologies than for others to learn the specifics of all the applications. Notice I said "the person who is going to manage your application in the future" and not "operations."
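Concretely, one way to expose those tuning knobs is to load them from a file that operations owns rather than hard-coding them. Below is a minimal Java sketch for a Kafka-based application, assuming the article's advice rather than quoting its code; the class name and command-line argument are illustrative. All producer tuning (batch.size, linger.ms, compression.type, and so on) comes from an external properties file, so an operator can retune the application without a rebuild.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;

public class TunableProducer {
    public static void main(String[] args) throws IOException {
        // Load broker addresses, serializers, and tuning knobs (batch.size,
        // linger.ms, compression.type, ...) from a file that operations can
        // edit without rebuilding or redeploying the application.
        Properties props = new Properties();
        try (FileInputStream in = new FileInputStream(args[0])) {
            props.load(in);
        }
        // The producer picks up whatever tuning values the file provides.
        KafkaProducer<String, String> producer = new KafkaProducer<>(props);
        // ... application logic ...
        producer.close();
    }
}
```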


How to use Apache Kafka to transform a batch pipeline into a real-time one

#artificialintelligence

If you want to learn about Avro and the Schema Registry, see my course here! All the instructions to run the project are on GitHub, but here is the output you will see.
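The course project itself isn't reproduced here, but as a rough illustration of the Avro-plus-Schema-Registry combination the article relies on, here is a minimal Java sketch using Confluent's KafkaAvroSerializer. The topic name, schema, and addresses are invented for the example; the serializer registers the record's schema with the Schema Registry and tags each message with the resulting schema id.

```java
import java.util.Properties;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class AvroProducerSketch {
    // A hypothetical record schema for the example.
    private static final String USER_SCHEMA =
        "{\"type\":\"record\",\"name\":\"User\",\"fields\":["
      + "{\"name\":\"name\",\"type\":\"string\"},"
      + "{\"name\":\"age\",\"type\":\"int\"}]}";

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer",
            "org.apache.kafka.common.serialization.StringSerializer");
        // The Avro serializer registers the schema with the Schema Registry
        // and embeds its id in every message it produces.
        props.put("value.serializer",
            "io.confluent.kafka.serializers.KafkaAvroSerializer");
        props.put("schema.registry.url", "http://localhost:8081");

        Schema schema = new Schema.Parser().parse(USER_SCHEMA);
        GenericRecord user = new GenericData.Record(schema);
        user.put("name", "alice");
        user.put("age", 30);

        try (KafkaProducer<String, GenericRecord> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("users", "alice", user));
        }
    }
}
```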


Introducing Chaperone: How Uber Engineering Audits Kafka End-to-End - Uber Engineering Blog

@machinelearnbot

Operating Kafka at Uber's scale, where data must reach many downstream consumers almost instantaneously, is difficult. We use batching aggressively and rely on asynchronous processing wherever possible for high throughput. Services use in-house client libraries to publish messages to Kafka proxies, which batch and forward them to regional Kafka clusters. Some Kafka topics are consumed directly from regional clusters, while many others are combined with data from other data centers into an aggregate Kafka cluster using uReplicator for scalable stream or batch processing.
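Uber's in-house client libraries aren't public, but the "batch aggressively, process asynchronously" approach maps directly onto the standard Kafka Java producer. The sketch below is an assumption-laden illustration (topic name, proxy address, and tuning values are invented): linger.ms and batch.size trade a little latency for larger, compressed batches, and send() returns immediately, with a callback reporting per-record failures.

```java
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class BatchingProducerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "kafka-proxy:9092"); // hypothetical proxy address
        props.put("key.serializer",
            "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
            "org.apache.kafka.common.serialization.StringSerializer");
        // Trade a little latency for throughput: wait up to 50 ms to fill
        // batches of up to 512 KB, and compress each batch on the wire.
        props.put("linger.ms", 50);
        props.put("batch.size", 512 * 1024);
        props.put("compression.type", "lz4");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            for (int i = 0; i < 100_000; i++) {
                // send() is asynchronous: it appends the record to an
                // in-memory batch and returns immediately; the callback
                // surfaces any per-record delivery errors.
                producer.send(
                    new ProducerRecord<>("events", Integer.toString(i), "payload-" + i),
                    (metadata, exception) -> {
                        if (exception != null) {
                            exception.printStackTrace();
                        }
                    });
            }
        } // close() flushes any batches still in flight
    }
}
```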