In today’s world, Artificial Intelligence (AI) is widely used across a plethora of domains such as e-commerce, automobile engineering, medicine, smart farming, electronics, and cybersecurity. Model serving refers to hosting AI and machine learning (ML) models on the cloud or on-premises and exposing their functionality via application programming interfaces (APIs) that can be integrated into systems to make them AI-enabled. MLOps (Machine Learning Operations) is the set of practices focused on operationalizing, deploying, and maintaining machine learning models in production.
There are several model-serving frameworks in use. Some examples include KServe [1], Seldon Core [2], and BentoML [3]. In this blog, we will learn how to serve and deploy Akaike’s Detectron models (object detection models) using TorchServe [4], a model-serving framework for PyTorch [5], which is an open-source ML framework based on the Python programming language and the Torch library (an open-source ML library) [6]. We will use TorchServe to serve the trained transfer learning model and outline the steps to be followed before and after serving it.
Using TorchServe
Prior to TorchServe, Akaike had its own custom model-serving solution for PyTorch models. This required custom handlers for the models, a model server, a Docker container, a mechanism for accessibility over the network, and integration with the cluster orchestration system. With the launch of TorchServe in 2020, Akaike shifted to using it to serve its PyTorch models. TorchServe facilitates performant, scalable deployment of PyTorch models without the need for a custom model server. A few lines of code are all you need to move from a trained model to a production deployment!
Figure 1 shows the architecture of the TorchServe model serving framework.
Figure 1: TorchServe Architecture
Deploying a Model on TorchServe
To deploy a model we need to do the following:
- Create a MAR (Model ARchive) file for the model to be deployed using the torch-model-archiver tool. The MAR file bundles the model together with its initialization, pre-processing, inference, and post-processing steps (example commands are shown after this list).
- Once the MAR files of all your models are created, serve them using the torchserve CLI (TorchServe command-line interface).
- Obtain results using the curl command or python-requests. For example:
- $ curl http://localhost:8080/predictions/resnet-18 -T kitten_small.jpg
- Here, http://localhost:8080/predictions/ is the endpoint where the model is hosted, resnet-18 is the model name, -T uploads the file to this endpoint, and kitten_small.jpg is the file name.
- JSON data containing the results of the model is returned.
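As an illustration of these steps, the commands below sketch the flow for a ResNet-18 image classifier. The file names (model.py, resnet18.pth, kitten_small.jpg), the model_store folder, and the use of the built-in image_classifier handler are placeholder assumptions; substitute your own artifacts.

```bash
# 1. Package the trained model into a MAR file
#    (model.py defines the model class; resnet18.pth holds the trained weights)
torch-model-archiver --model-name resnet-18 \
    --version 1.0 \
    --model-file model.py \
    --serialized-file resnet18.pth \
    --handler image_classifier \
    --export-path model_store

# 2. Start TorchServe and load the model from the model store
torchserve --start --model-store model_store --models resnet-18=resnet-18.mar

# 3. Send an image to the inference endpoint and read back the JSON result
curl http://localhost:8080/predictions/resnet-18 -T kitten_small.jpg
```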
Deploying the Detectron Model
To deploy the Detectron model, we first need to create its MAR file.
If you are new to Detectron, please use this link to gain a better understanding. Follow this notebook for creating a MAR file for the Detectron model; a rough sketch of the custom handler involved is shown below.
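The notebook covers the full details; purely for orientation, the snippet below is a minimal sketch of what a custom TorchServe handler for a Detectron2 model might look like. It assumes detectron2 is installed in the serving environment and that the MAR file bundles a config.yaml and model_final.pth (hypothetical file names); it is not Akaike’s production handler.

```python
# Minimal sketch of a custom TorchServe handler for a Detectron2 model.
# Assumes config.yaml and model_final.pth are packaged into the MAR file.
import io

import numpy as np
from PIL import Image
from ts.torch_handler.base_handler import BaseHandler


class DetectronHandler(BaseHandler):
    def initialize(self, context):
        # Load the Detectron2 model from the files extracted into model_dir
        from detectron2.config import get_cfg
        from detectron2.engine import DefaultPredictor

        properties = context.system_properties
        model_dir = properties.get("model_dir")
        gpu_id = properties.get("gpu_id")

        cfg = get_cfg()
        cfg.merge_from_file(f"{model_dir}/config.yaml")
        cfg.MODEL.WEIGHTS = f"{model_dir}/model_final.pth"
        cfg.MODEL.DEVICE = f"cuda:{gpu_id}" if gpu_id is not None else "cpu"

        self.predictor = DefaultPredictor(cfg)
        self.initialized = True

    def preprocess(self, data):
        # TorchServe passes the raw request body under "data" or "body"
        image_bytes = data[0].get("data") or data[0].get("body")
        image = Image.open(io.BytesIO(image_bytes)).convert("RGB")
        # DefaultPredictor expects a BGR numpy array by default
        return np.asarray(image)[:, :, ::-1].copy()

    def inference(self, image):
        return self.predictor(image)

    def postprocess(self, outputs):
        # Return one JSON-serializable result per request in the batch
        instances = outputs["instances"].to("cpu")
        return [{
            "boxes": instances.pred_boxes.tensor.tolist(),
            "scores": instances.scores.tolist(),
            "classes": instances.pred_classes.tolist(),
        }]
```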
Deploying on Docker Compose
To deploy with Docker Compose, create a docker-compose.yml file (a Dockerfile for the torchserve image is sketched after the field explanations below):
```yaml
version: '3.8'
services:
  torchserve:
    build: .
    ports:
      - 9095:8085
    command: torchserve --start --foreground --model-store model_store --models my-model-1=my-model-1.mar my-model-2=my-model-2.mar my-model-3=my-model-3.mar --ts-config config.properties
    restart: always
    volumes:
      - ./model_store:/home/model_store
```
Typically, if you are using Celery, you would have additional services in docker-compose such as Flower, Redis, web, and worker. Please see this to know more about dockerizing Celery.
Explanation of fields in docker-compose.yml:
- ports:
- - 9095:8085
- Here, 8085 is the TorchServe inference port inside the container and 9095 is the VM (host) port. The two are mapped to each other, so a client can use the API by hitting port 9095 of the VM.
- command: torchserve --start --foreground --model-store model_store --models my-model-1=my-model-1.mar my-model-2=my-model-2.mar my-model-3=my-model-3.mar --ts-config config.properties
- "--foreground" keeps TorchServe running in the foreground so that the container stays active.
- You need to give the path of the model store using "--model-store [path of model store folder]".
- You can host multiple models using "--models [model name]=[MAR file of model]".
- If you want to change the TorchServe port or make other changes to the configuration of the hosting service, you can use "--ts-config [path to config.properties]".
- Content of config.properties:
- inference_address=http://0.0.0.0:8085
- default_workers_per_model=1
- Here, the inference address is changed to port 8085 and the number of workers per model is set to 1.
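Because the compose file uses build: ., Docker Compose expects a Dockerfile in the same directory. A minimal sketch is given below; it assumes the official pytorch/torchserve base image and that the model_store folder and config.properties sit next to the Dockerfile (adjust names and paths, and install detectron2 or other model dependencies as needed).

```dockerfile
# Minimal sketch of a Dockerfile for the torchserve service (assumed layout)
FROM pytorch/torchserve:latest

# Copy the model archives and the TorchServe configuration into the image
COPY model_store /home/model_store
COPY config.properties /home/config.properties

# The start command is supplied by docker-compose via the "command:" field
```

With the Dockerfile and docker-compose.yml in place, the stack can typically be brought up with docker-compose up --build -d.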
Testing the Model Serving
Use the steps below to execute and test your model.
- After creating the MAR file, you need to run the following command: "torchserve --start --model-store model_store --models detectron-model=detectron-model.mar"
- Now, use either of the following methods to obtain the results of your model:
- "curl http://localhost:8080/predictions/detectron-model -T image.jpg", or
- "requests.put(torchserve_url, data=image_payload, headers={"Content-Type": "image/jpeg"})"
- Here, torchserve_url is http://localhost:8080/predictions/detectron-model and image_payload is the binary content of the image.
- For converting the image to binary you can use the code snippet below.
Figure 2: Code to convert image to binary
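A minimal sketch of such a snippet, assuming the local file image.jpg and the endpoint used above, is:

```python
# Read an image file as raw bytes and send it to the TorchServe endpoint
# (file name and URL are illustrative assumptions)
import requests

torchserve_url = "http://localhost:8080/predictions/detectron-model"

with open("image.jpg", "rb") as f:
    image_payload = f.read()  # the binary image payload

response = requests.put(
    torchserve_url,
    data=image_payload,
    headers={"Content-Type": "image/jpeg"},
)
print(response.json())  # JSON results returned by the model
```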
Benefits of TorchServe
Deploying models through a custom model server requires converting them to appropriate formats, which is time-consuming and burdensome. Using TorchServe, we can simplify model deployment using a single servable file that also serves as the single source of truth and is easy to share and manage. To summarise, it comes with the following advantages:
- Simultaneous and seamless handling of multiple requests
- Compatibility with both CPU and GPU systems
- Easy to use in production (results can be obtained through a single API call)
- Performant and scalable (the number of workers can be configured through the management API; see the sketch after this list)
- Loading and serving multiple models in parallel
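As an illustration of the last two points, the management API can be used to scale workers and inspect the models being served. The sketch below assumes the default management port 8081 and the detectron-model from the earlier example:

```bash
# Scale the model to two workers (synchronous waits until scaling completes)
curl -X PUT "http://localhost:8081/models/detectron-model?min_worker=2&synchronous=true"

# List the models currently being served
curl http://localhost:8081/models
```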
Conclusion
In this blog, we saw how TorchServe can be used to serve PyTorch models in an easy and consistent manner, using the example of a Detectron model. Being fully integrated with PyTorch, it is the recommended framework for serving PyTorch-based ML models in production environments. Both developers and operations engineers can leverage TorchServe to prepare models for production. With TorchServe, we can deploy models in a performant and scalable manner, supporting both eager mode and TorchScript models without cumbersome code modifications.
References
- [1] KServe – https://www.kubeflow.org/docs/external-add-ons/kserve/kserve/
- [2] Seldon Core – https://betterprogramming.pub/serving-deploying-ml-models-with-seldon-core-on-kubernetes-2022-eb459bb4d47a
- [3] BentoML: A Unified Model Serving Framework – https://docs.bentoml.org/en/latest/
- [4] TorchServe – https://pytorch.org/serve/
- [5] PyTorch – https://pytorch.org/
- [6] Torch – http://torch.ch/
Edited by: Naga Vydyanathan