Problem
You have a PyTorch model that you have already logged and registered in your workspace using MLflow. You load it with the mlflow.pytorch.load_model() function and pass inputs to it to perform predictions in your Databricks notebook. After some time, you notice that the Cluster Metrics page still shows 0% GPU utilization for the cluster attached to the notebook.
Cause
The model was loaded without specifying the available GPU device in the call to the mlflow.pytorch.load_model() function.
Solution
The device parameter was added to the mlflow.pytorch.load_model() function on Dec 27, 2023. It sends the model to the specified device when the model is loaded. To resolve the issue, pass the available device when you load the model, as in the following example code.
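import mlflow
import torch

# Use the GPU for inference if one is available; otherwise fall back to the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"
# Define the model URI. Replace <experiment_run_id> with your run ID.
logged_model = "runs:/<experiment_run_id>/model"
# Load the model directly onto the selected device.
loaded_model = mlflow.pytorch.load_model(model_uri=logged_model, device=device)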
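
After loading the model, it can help to confirm where it actually lives and to make sure the inputs are moved to the same device before predicting. The following is a minimal sketch, not part of the original solution. The helper names model_device and predict_on_model_device are hypothetical, and a small torch.nn.Linear layer stands in for the model returned by mlflow.pytorch.load_model() so that the snippet runs on its own.

```python
import torch


def model_device(model: torch.nn.Module) -> torch.device:
    """Return the device that holds the model's first parameter (hypothetical helper)."""
    return next(model.parameters()).device


def predict_on_model_device(model: torch.nn.Module, inputs: torch.Tensor) -> torch.Tensor:
    """Move the inputs to the model's device and run inference (hypothetical helper)."""
    device = model_device(model)
    model.eval()
    with torch.no_grad():
        return model(inputs.to(device))


# Assumption: this layer is only a stand-in for the model loaded from MLflow.
device = "cuda" if torch.cuda.is_available() else "cpu"
loaded_model = torch.nn.Linear(4, 2).to(device)

batch = torch.randn(8, 4)  # example inputs created on the CPU
predictions = predict_on_model_device(loaded_model, batch)
print(model_device(loaded_model), predictions.shape)
```

If the printed device is cpu on a GPU cluster, the model was not sent to the GPU, which is consistent with the 0% GPU utilization described above.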