As a data scientist, training your machine learning model is only one part of delivering a solution for the client. Besides generating and cleaning the data and selecting and tuning the algorithms, you also need to deliver and deploy your results so that they are usable in production. Deployment is a large field in itself, with constantly evolving tools and standards. In this post, my goal is to present a practical guide to doing this with currently available state-of-the-art tools and best practices. We are going to build a system that can serve as a starting point for your deployment tasks, regardless of the actual machine learning problem itself! Instead of a minimal app that barely scratches the surface of the tools involved, I aim to introduce best practices and demonstrate advanced features, so that you don't have to learn the hard way. Learning from your own mistakes is nice, but thinking ahead and avoiding those mistakes is much better. To create our deployment-ready application, we will use two tools as our main building blocks: Docker and FastAPI.
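As a concrete starting point, here is a minimal Dockerfile sketch for containerizing a FastAPI app. The module path `app.main:app`, the `requirements.txt` layout, and the port are illustrative assumptions about your project, not fixed requirements:

```dockerfile
# Minimal image for serving a FastAPI app with Uvicorn.
# Assumes your code lives in ./app, with a FastAPI instance named
# `app` in app/main.py -- adjust the module path to your project.
FROM python:3.11-slim

WORKDIR /code

# Copy and install dependencies first, so Docker can cache this layer
# and skip reinstalling packages when only your code changes.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY ./app ./app

# Uvicorn is the ASGI server commonly used to run FastAPI apps.
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "80"]
```

Installing dependencies in their own layer is a small habit that pays off quickly: rebuilds after code-only changes become nearly instant.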
Awesome new GitHub repository that comes with a full-stack backend (FastAPI) and front-end (Streamlit). You can serve this framework out of the box, and it's a good choice to begin with if you want to venture out of the Flask WSGI world and get more into the ASGI side of things. On the conversational AI front, RASA co-founder Alan Nichol outlines the five levels of chatbot tech that we should all be striving for, especially the holy grail: Level 5, aka HAL 9000. The five levels of conversational AI are viewed from both the end-user and the developer perspective. An interesting read for a road map of where RASA thinks chatbots are heading and what it will take to get HAL onto Elon's Starship.
Kubernetes is a fantastic container orchestration tool for scalable cloud computing applications. That being said, we don't always need all of its batteries included. The details Kubernetes manages are crucial for running a cloud solution, and some people focus on this area as their main job. Others, however, want to focus on the application itself. In many small projects, you don't have a complex 80-bajillion-container behemoth that requires Kubernetes for orchestration.
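For projects of that size, plain Docker Compose often covers everything you need. A hypothetical two-service sketch (the service names, image tag, ports, and environment variable are all made up for illustration):

```yaml
# docker-compose.yml -- a deliberately small alternative to Kubernetes.
services:
  api:
    build: .                # expects a Dockerfile in the project root
    ports:
      - "8000:80"           # host:container
    environment:
      - MODEL_PATH=/models/model.pkl
    restart: unless-stopped
  cache:
    image: redis:7-alpine   # example sidecar; swap for whatever you use
    restart: unless-stopped
```

A single `docker compose up` gives you networking between services, restarts on failure, and reproducible local development, without any of the operational overhead of a cluster.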
Every package you'll see is free and open-source software. Thank you to all the folks who create, support, and maintain these projects! If you're interested in learning how to contribute fixes to open-source projects, here's a good guide. And if you're interested in the foundations that support these projects, I wrote an overview here. Pandas is a workhorse that helps you understand and manipulate your data.
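As a quick taste of what that looks like in practice, here is a small, self-contained pandas sketch; the column names and values are made up for illustration:

```python
import pandas as pd

# A tiny made-up dataset: three orders, one with a missing price.
orders = pd.DataFrame({
    "customer": ["ana", "ben", "ana"],
    "price": [10.0, None, 4.5],
})

# Understand the data: inspect its shape and count missing values.
print(orders.shape)                  # (3, 2)
print(orders["price"].isna().sum())  # 1

# Manipulate it: fill the gap with the median, then aggregate.
orders["price"] = orders["price"].fillna(orders["price"].median())
totals = orders.groupby("customer")["price"].sum()
print(totals["ana"])                 # 14.5
```

A few lines take you from raw rows to per-customer totals; that inspect-clean-aggregate loop is most of day-to-day pandas work.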
Just because we can build models doesn't mean we are gods. It doesn't give us the freedom to write crap code. Since I started, I have made tremendous mistakes, and I want to share what I see as the most commonly missing skills in ML engineering. I call them software-illiterate data scientists, because many of them are non-CS, Coursera-baptised engineers. If it came to hiring between a great data scientist and a great ML engineer, I would hire the latter.