Optimize Response Time of your Machine Learning API In Production - KDnuggets
This article demonstrates how building a smarter API for serving Deep Learning models minimizes response time. Your team worked hard to build a Deep Learning model for a given task (let's say: detecting purchased products in a store using Computer Vision). You then developed and deployed an API that integrates this model (to keep our example: self-checkout machines would call this API). The new product is working well and you feel like all the work is done. But since the manager decided to install more self-checkout machines (I really like this example), users have started to complain about the huge latency that occurs each time they scan a product. Should you ask the data scientists to try reducing the depth of the model without degrading its accuracy?
May-5-2020, 08:44:24 GMT