Deployments can run in two ways: Non-serverless or Serverless.
This article describes the differences and similarities between the two deployment options and helps you decide which option suits your situation.
Both Non-serverless and Serverless deployments automatically scale with incoming traffic. This means that, as your model is used more frequently, more instances of your model are deployed to handle the additional traffic. Conversely, if there are more instances than necessary to serve the incoming traffic, your deployment scales down.
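The scaling behaviour described above can be sketched as a simple rule: run enough instances to cover the current traffic, clamped to a configured range. The function below is an illustrative simplification, not the platform's actual autoscaling algorithm; the parameter names and thresholds are assumptions.

```python
import math

def desired_instances(requests_per_second: float,
                      capacity_per_instance: float,
                      min_instances: int = 1,
                      max_instances: int = 10) -> int:
    """Hypothetical autoscaling rule (illustration only):
    enough instances to cover traffic, within a configured range."""
    needed = math.ceil(requests_per_second / capacity_per_instance)
    return max(min_instances, min(max_instances, needed))

print(desired_instances(45, 10))   # 45 req/s at 10 req/s per instance -> 5
print(desired_instances(3, 10))    # low traffic scales down to the minimum -> 1
```

Real platforms typically also factor in metrics such as CPU usage and queue length, and apply a cooldown before scaling down to avoid flapping.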
Non-serverless deployments are ideal for real-time applications. As they are always-on and always consume resources, these deployments can instantly respond to incoming requests.
Choose this option if your application is a real-time application.
Serverless deployments are deployments that do not always consume resources. They boot up when the first request comes in, and completely shut down if no request comes in for a while.
This approach saves costs when you are not using the deployment to fetch predictions. The main drawback, however, is the so-called "cold start": when the first request comes in, it can take a while for the deployment to serve it, because the instance is still booting up.
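The cold-start effect can be illustrated with a toy model of a scale-to-zero deployment: the first request after an idle period pays a boot-up delay, while subsequent requests are served immediately. This is a self-contained sketch, not any real platform's API; the boot and idle times are made-up values.

```python
import time

class ServerlessDeployment:
    """Toy model of a scale-to-zero deployment (illustration only,
    not a real platform API)."""

    BOOT_TIME = 0.5   # simulated cold-start delay in seconds (assumption)
    IDLE_LIMIT = 2.0  # shut down after this many idle seconds (assumption)

    def __init__(self):
        self._running = False
        self._last_request = 0.0

    def request(self) -> str:
        now = time.monotonic()
        # Shut the instance down if it has been idle for too long.
        if self._running and now - self._last_request > self.IDLE_LIMIT:
            self._running = False
        cold = not self._running
        if cold:
            time.sleep(self.BOOT_TIME)  # pay the cold-start penalty
            self._running = True
        self._last_request = time.monotonic()
        return "cold start" if cold else "warm"

d = ServerlessDeployment()
print(d.request())  # first request boots the instance -> "cold start"
print(d.request())  # served immediately while warm -> "warm"
```

In practice, cold starts can take anywhere from seconds to minutes depending on the size of your model and its dependencies, which is why serverless is a poor fit for latency-sensitive applications.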
Choose this option if your application is not real-time (e.g. runs once a day).
Which should I choose?
If your application is a real-time application, choose the Non-serverless deployment. If your application is not real-time (e.g. runs once a day), go with the Serverless option.
If, for whatever reason, you're not sure, we advise the Non-serverless option.