MLflow API 429 errors when transitioning models

Add retry logic with exponential backoff to avoid hitting the rate limit.

Last published at: December 2nd, 2024

Problem

You are downloading artifacts from a model version when you receive an API error. The error message indicates that the API request to list artifacts for that model version failed after receiving too many 429 error responses.

Max retries exceeded with url: /api/2.0/mlflow/model-versions/list-artifacts?name=model_name&version=version_number&path=some/path (Caused by ResponseError('too many 429 error responses'))")', 'some/path/logger.py': 'MlflowException("API request to https://<your-databricks-workspace>/api/2.0/mlflow/model-versions/list-artifacts failed with exception HTTPSConnectionPool(host='<your-databricks-workspace>', port=443): Max retries exceeded with url: /api/2.0/mlflow/model-versions/list-artifacts?name=model_name&version=version_number&path=some/path (Caused by ResponseError('too many 429 error responses'))")'

Cause

The rate limit for the MLflow Workspace Model Registry API is set to 40 queries per second, per workspace. When the rate limit is exceeded, the API returns a 429 error response. This error can occur when multiple jobs or processes are attempting to download artifacts from the same model version simultaneously, causing the rate limit to be exceeded.

For more information, review the Resource limits (AWS | Azure | GCP) documentation.

Solution

To work around this issue, you can set timeout and retry environment variables in your job clusters.

1. Click Workflows in the left navigation bar.
2. Click the name of the job you want to edit.
3. Click the Edit icon (looks like a pencil) in the Cluster field.
4. Click Advanced options to expand the section.
5. Add the following lines to the Environment variables field:
   MLFLOW_HTTP_REQUEST_TIMEOUT=360
   MLFLOW_HTTP_REQUEST_BACKOFF_FACTOR=5
   MLFLOW_HTTP_REQUEST_MAX_RETRIES=8
6. Click Confirm.
7. Restart your job.
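
For quick testing in a notebook, the same variables can also be set from Python before any MLflow call is made. This is a minimal sketch, assuming the MLflow client reads these variables from the process environment when it issues a request. The dictionary name retry_settings is hypothetical, and setting the variables on the job cluster remains the approach this article describes.

```python
import os

# Values taken from the steps in this article; tune them for your workload.
retry_settings = {
    "MLFLOW_HTTP_REQUEST_TIMEOUT": "360",
    "MLFLOW_HTTP_REQUEST_BACKOFF_FACTOR": "5",
    "MLFLOW_HTTP_REQUEST_MAX_RETRIES": "8",
}

# Environment variable values must be strings.
os.environ.update(retry_settings)

# Confirm what the current process will see.
for name in retry_settings:
    print(f"{name}={os.environ[name]}")
```

Variables set this way apply only to the current Python process, so other jobs and clusters are not affected.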

The additional environment variables space out requests on the /api/2.0/mlflow/model-versions/list-artifacts endpoint that is hitting the rate limit.

MLFLOW_HTTP_REQUEST_TIMEOUT sets the maximum time in seconds to wait for a request to complete.
MLFLOW_HTTP_REQUEST_BACKOFF_FACTOR sets the backoff factor to apply between retry attempts.
MLFLOW_HTTP_REQUEST_MAX_RETRIES sets the maximum number of retries to attempt before giving up.
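
To get a feel for how the backoff factor and retry count interact, the sketch below computes an illustrative exponential backoff schedule. It assumes each wait doubles from the backoff factor and is capped at 120 seconds. Both the doubling formula and the cap are assumptions made for illustration: the exact schedule depends on the HTTP client library and its version, and the function name backoff_delays is hypothetical.

```python
def backoff_delays(backoff_factor, max_retries, cap_seconds=120.0):
    """Return illustrative wait times, in seconds, before each retry.

    Assumes wait = backoff_factor * 2**attempt, capped at cap_seconds.
    The real client may differ (for example, jitter or no wait on the first retry).
    """
    return [
        min(cap_seconds, float(backoff_factor) * (2 ** attempt))
        for attempt in range(max_retries)
    ]


delays = backoff_delays(backoff_factor=5, max_retries=8)
print(delays)       # one wait per retry attempt
print(sum(delays))  # total time spent waiting, under the assumptions above
```

Under these assumptions, a larger backoff factor spreads retries over several minutes, which gives a rate-limited endpoint time to recover before the next attempt.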

In addition to setting these environment variables, you should also consider these best practices to avoid hitting the rate limit:

Limit the number of concurrent jobs or processes that are accessing the same model version.
Use versioning to create new versions of models instead of modifying the same version.
Use the MLflow API to list artifacts for a model version instead of downloading them directly.
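
The first practice, limiting concurrency, can be sketched within a single Python process by bounding the number of requests in flight. This is a minimal sketch, not a Databricks or MLflow API: fetch_all and fetch_one are hypothetical names, and fetch_one stands in for whatever call your code makes per artifact path.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_all(paths, fetch_one, max_workers=4):
    """Call fetch_one for every path with at most max_workers calls in flight.

    Results are returned in the same order as the input paths.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch_one, paths))


# Stand-in for a real per-path request, used here only for demonstration.
def describe(path):
    return f"fetched {path}"


print(fetch_all(["some/path/a", "some/path/b"], describe, max_workers=2))
```

A small max_workers value keeps one process well below the workspace limit. It does not coordinate across separate jobs, so concurrent jobs that read the same model version still need to be staggered or reduced.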

For more information, review the MLflow API (AWS | Azure | GCP) documentation.
