Uber is an organization that relies heavily on data. Each day, millions of trips take place in 700 cities across the world, generating information on traffic, preferred routes, estimated times of arrival and delivery, drop-off locations, and more. This information enables Uber to deliver a smooth riding experience to its customers. With access to the rich dataset coming from cabs, drivers, and users, Uber has been investing in machine learning and artificial intelligence to enhance its business. Uber AI Labs consists of ML researchers and practitioners who translate state-of-the-art machine learning techniques and advancements into benefits for Uber's core business. From computer vision to conversational AI to sensing and perception, Uber has successfully infused ML and AI into its ride-sharing platform.
Despite the hype surrounding machine learning and artificial intelligence (AI), most efforts in the enterprise remain at the pilot stage. Part of the reason is the natural experimentation associated with machine learning projects, but a significant part is the immaturity of machine learning architectures. This problem is particularly visible in enterprise environments, where the new application lifecycle management practices of modern machine learning solutions conflict with corporate practices and regulatory requirements. What are the key architecture building blocks that organizations should put in place when adopting machine learning solutions? The answer is not trivial, but recently we have seen efforts from research labs and data science teams that are starting to lay down the path toward reference architectures for large-scale machine learning solutions.
Uber's services require real-world coordination between a wide range of customers, including driver-partners, riders, restaurants, and eaters. Accurately forecasting things like rider demand and ETAs enables this coordination, which makes our services work as seamlessly as possible. In an effort to constantly optimize our operations, serve our customers, and train our systems to perform better and better, we leverage machine learning (ML). In addition, we make many of our ML tools open source, sharing them with the community to advance the state of the art. In this spirit, members of our Seattle Engineering team shared their work at an April 2019 meetup on ML and AI at Uber.
Uber expanded Michelangelo "to serve any kind of Python model from any source to support other Machine Learning and Deep Learning frameworks like PyTorch and TensorFlow [instead of just using Spark for everything]." So why did Uber (and many other tech companies) build its own platform and framework-independent machine learning infrastructure? The posts "How to Build and Deploy Scalable Machine Learning in Production with Apache Kafka" and "Using Apache Kafka to Drive Cutting-Edge Machine Learning" describe the benefits of using the Apache Kafka ecosystem as a central, scalable, and mission-critical nervous system. It enables real-time data ingestion, processing, model deployment, and monitoring in a reliable and scalable way. This post focuses on how the Kafka ecosystem can help solve the impedance mismatch between data scientists, data engineers, and production engineers. Based on what I've seen in the field, this mismatch is the main reason why companies struggle to bring analytic models into production to add business value. By using the Kafka ecosystem to build your own scalable machine learning infrastructure, you can solve the same problems for which Uber built its own ML platform, Michelangelo, and also make your data scientists happy.
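To make the idea of framework-independent serving more concrete, here is a minimal sketch in Python. It is not Michelangelo's API and not a Kafka integration: `ModelRegistry`, `linear_eta_model`, and `ThresholdDemandModel` are hypothetical names invented for illustration, and the two "models" are toy stand-ins for models that might come from different frameworks. The only assumption is that every model, whatever its source, can be wrapped as a plain Python callable that takes a feature vector.

```python
from typing import Any, Callable, Dict, Sequence

# All names below are hypothetical and for illustration only.
Features = Sequence[float]
PredictFn = Callable[[Features], Any]


class ModelRegistry:
    """Maps model names to plain Python callables, regardless of framework."""

    def __init__(self) -> None:
        self._models: Dict[str, PredictFn] = {}

    def register(self, name: str, predict_fn: PredictFn) -> None:
        # Any callable works: a function, a bound method, or an object
        # that defines __call__ around a framework-specific model.
        self._models[name] = predict_fn

    def predict(self, name: str, features: Features) -> Any:
        if name not in self._models:
            raise KeyError(f"unknown model: {name}")
        return self._models[name](features)


def linear_eta_model(features: Features) -> float:
    """Toy stand-in for a model exported from one framework."""
    distance_km, traffic_factor = features
    return 2.0 * distance_km * traffic_factor


class ThresholdDemandModel:
    """Toy stand-in for a model object from a different framework."""

    def __init__(self, threshold: float) -> None:
        self.threshold = threshold

    def __call__(self, features: Features) -> str:
        return "high" if sum(features) > self.threshold else "low"


registry = ModelRegistry()
registry.register("eta", linear_eta_model)
registry.register("demand", ThresholdDemandModel(threshold=10.0))

eta_minutes = registry.predict("eta", [5.0, 1.5])
demand_level = registry.predict("demand", [4.0, 7.0])
```

The serving layer depends only on the callable contract, not on the library that produced the model, so data scientists can choose their framework without changing how production engineers deploy and call the model.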
Uber Engineering formally introduced its internal machine-learning-as-a-service platform, Michelangelo, in a company blog post Tuesday. Uber began building the platform in 2015 from a combination of open-source and in-house components and now deploys it across company services such as UberEATS. Michelangelo covers the end-to-end ML workflow and allows Uber teams to manage data; train, evaluate, and deploy models; and make and monitor predictions. It also supports deep learning, time series forecasting, and other machine learning models, and the company is focusing on improving developer productivity on the platform. Uber is not the only large company creating in-house machine learning platforms tailored to its needs.