Trustworthy AI: Operationalizing AI Models with Governance – Part 2


Model Deployment in Production – Validated models are deployed to production, where applications can call them for predictions by sending scoring requests over technology-independent standard protocols such as REST/HTTP. There are two types of deployment: online (synchronous access) and batch (asynchronous access). Online deployment requires the model execution runtime to run continuously so that the model can be accessed synchronously for a single prediction request (or a small set of requests, aka a micro-batch). This suits use cases that need predictions in real time, for example online transaction fraud detection or intent identification for chatbots. Batch deployment requires infrastructure that can spawn the runtime on demand and stop it once predictions for all the batch scoring requests have been generated.
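As a rough sketch of what a synchronous scoring call looks like from the application side, the snippet below builds a JSON scoring request and POSTs it to a model's REST endpoint. The endpoint URL, the `"instances"` payload key, and the fraud-prediction input fields are all illustrative assumptions; real serving platforms each define their own URL scheme and request schema.

```python
import json
import urllib.request

# Hypothetical scoring endpoint; real URLs vary by serving platform and deployment.
SCORING_URL = "https://models.example.com/v1/fraud-model/predict"

def build_scoring_request(records):
    """Package one or more input records into a JSON scoring payload.

    A single record corresponds to an online (synchronous) request; a short
    list of records forms a micro-batch sent over the same protocol.
    The {"instances": [...]} shape is an assumed convention, not a standard.
    """
    payload = {"instances": records}
    return json.dumps(payload).encode("utf-8")

def score_online(records, url=SCORING_URL):
    """Send a synchronous REST/HTTP scoring request and return the predictions."""
    req = urllib.request.Request(
        url,
        data=build_scoring_request(records),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=5) as resp:
        return json.loads(resp.read())

# Example payload for a single online fraud-prediction request
# (field names are made up for illustration):
body = build_scoring_request([{"amount": 250.0, "merchant": "grocery"}])
```

Because the request is plain JSON over HTTP, any application stack can consume the model without depending on the framework it was trained in, which is exactly what the technology-independent protocol buys you.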