To put machine learning to work in production, a model must first be deployed, in most cases as a prediction API. Making that API work in production requires model serving infrastructure: load balancing, scaling, monitoring, updating, and much more. At first glance, all of this work seems familiar. Web developers and DevOps engineers have been automating microservice infrastructure for years.
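To make "prediction API" concrete, here is a minimal sketch using only Python's standard library. The model is a stand-in (it just averages its inputs), and the names `model_predict`, `handle_predict`, `PredictHandler`, and `serve` are hypothetical, chosen for illustration. Everything listed above (load balancing, scaling, monitoring, updating) is exactly what this sketch leaves out.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def model_predict(features):
    """Stand-in for a trained model: returns the mean of the inputs."""
    return sum(features) / len(features)


def handle_predict(body):
    """Turn a JSON request body into an (HTTP status, JSON response body) pair."""
    try:
        features = json.loads(body)["features"]
        prediction = model_predict([float(x) for x in features])
    except (ValueError, KeyError, TypeError, ZeroDivisionError):
        # Malformed JSON, missing key, non-numeric values, or an empty list.
        error = {"error": 'expected {"features": [numbers]}'}
        return 400, json.dumps(error).encode()
    return 200, json.dumps({"prediction": prediction}).encode()


class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read exactly the number of bytes the client declared.
        length = int(self.headers.get("Content-Length", 0))
        status, payload = handle_predict(self.rfile.read(length))
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port=8080):
    """Blocks forever, answering POST requests with predictions."""
    HTTPServer(("", port), PredictHandler).serve_forever()
```

Calling `serve()` starts a single-process server on the chosen port. Keeping the request logic in `handle_predict` means it can be exercised without opening a socket.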
The reality is that for many of applied machine learning's use cases, there is no need to train a new model from scratch. For example, if you are developing a conversational agent, Google's Meena is almost certainly going to outperform your model. If you're developing a text generator, you should use OpenAI's GPT-2 instead of building your own from scratch. For object detection, a model like YOLOv3 is probably your best bet. Thanks to transfer learning, a process in which the "knowledge" of a neural network is fine-tuned to a new domain, you can take a relatively small amount of data and fine-tune these open-source, state-of-the-art models to your task.
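As a toy illustration of the idea, the sketch below shows the simplest form of transfer learning: the "pretrained" layer is frozen and only a small new output head is trained on a handful of task examples. Full fine-tuning would also update the pretrained weights, usually with a small learning rate. Everything here is an assumption made for illustration: the two-by-two weight matrix stands in for a large open-source model, and the names `pretrained_features`, `train_head`, and `predict` are hypothetical.

```python
import math

# Hypothetical "pretrained" weights, standing in for a large open-source model.
PRETRAINED_WEIGHTS = [[1.0, -1.0], [-1.0, 1.0]]


def pretrained_features(x):
    """Frozen base: a ReLU layer whose weights are never updated."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x)))
            for row in PRETRAINED_WEIGHTS]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_head(data, epochs=200, lr=0.5):
    """Train only a new logistic-regression head on the frozen features."""
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, label in data:
            feats = pretrained_features(x)
            output = sigmoid(sum(w * f for w, f in zip(weights, feats)) + bias)
            error = output - label
            # Gradient step on the head only; PRETRAINED_WEIGHTS stay untouched.
            weights = [w - lr * error * f for w, f in zip(weights, feats)]
            bias -= lr * error
    return weights, bias


def predict(head, x):
    weights, bias = head
    feats = pretrained_features(x)
    return sigmoid(sum(w * f for w, f in zip(weights, feats)) + bias)


# A "relatively small amount of data" for the new task:
# label 1 when the first input is larger than the second, else 0.
TASK_DATA = [([2.0, 0.0], 1), ([3.0, 1.0], 1), ([0.0, 2.0], 0), ([1.0, 3.0], 0)]
head = train_head(TASK_DATA)
```

After training, `predict(head, [5.0, 1.0])` lands above 0.5 and `predict(head, [1.0, 5.0])` below it, even though neither input appeared in the four training examples.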
Automate your ML model train-deploy cycle, garbage collection, and rollbacks, all from Python with an open-source PyPI package based on Cortex. It all started with the modernization of a product categorization project. The goal was to replace complex low-level Docker commands with a simple, user-friendly deployment utility called Cortex. The solution, a Python package, proved to be reusable: we successfully used it again in our recommendation engine project, and we plan to deploy all ML projects this way. Since GLAMI relies heavily on open-source software, we wanted to contribute back and decided to open-source the package, calling it Cortex Serving Client.
Cortex is an open-source platform that takes machine learning models, trained with nearly any framework, and turns them into production web APIs in one command.
Autoscaling: Cortex automatically scales APIs to handle production workloads.
Multi-framework: Cortex supports TensorFlow, PyTorch, scikit-learn, XGBoost, and more.
CPU / GPU support: Cortex can run inference on CPU or GPU infrastructure.
Rolling updates: Cortex updates deployed APIs without any downtime.
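To give a feel for what "turning a model into an API" asks of you, here is a rough sketch of the kind of Python class Cortex wraps. It assumes the predictor interface from Cortex releases of roughly this period: a `PythonPredictor` class with `__init__(self, config)` and `predict(self, payload)`. The interface has changed between versions, so check the documentation for the release you use. The word-list "model", the `threshold` config key, and the payload shape are all hypothetical stand-ins.

```python
class PythonPredictor:
    def __init__(self, config):
        # Runs once at startup: a real predictor would load model weights here.
        # "threshold" is a hypothetical config key used only in this sketch.
        self.threshold = config.get("threshold", 0.5)
        self.positive_words = {"good", "great", "love"}  # stand-in for a model

    def predict(self, payload):
        # Runs once per request; payload is the parsed JSON request body.
        words = payload["text"].lower().split()
        hits = sum(word in self.positive_words for word in words)
        score = hits / max(len(words), 1)
        label = "positive" if score >= self.threshold else "negative"
        return {"label": label, "score": score}


# Local usage, without Cortex involved at all:
predictor = PythonPredictor({"threshold": 0.25})
result = predictor.predict({"text": "I love this great product"})
```

The class only has to load a model and answer one request. Request routing, scaling, and rolling updates are the parts the platform is meant to take over.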
For the last 20 years, machine learning has been about one question: Can we train a model to do something? Something, of course, can be any task: predict the next word in a sentence, recognize faces in a photo, generate a certain sound. The goal was to see if machine learning worked, if we could make accurate predictions. Now the question is different: What can we build with these models, and how can we do it?