Problem
You get an API error while downloading model artifacts. The error message indicates that the API request to list artifacts for a specific model version failed because it received too many 429 error responses.
'some/path/logger.py': 'MlflowException("API request to https://<your-databricks-workspace>/api/2.0/mlflow/model-versions/list-artifacts failed with exception HTTPSConnectionPool(host='<your-databricks-workspace>', port=443): Max retries exceeded with url: /api/2.0/mlflow/model-versions/list-artifacts?name=model_name&version=version_number&path=some/path (Caused by ResponseError('too many 429 error responses'))")'
Cause
The rate limit for the MLflow Workspace Model Registry API is set to 40 queries per second, per workspace. When the rate limit is exceeded, the API returns a 429 error response. This error can occur when multiple jobs or processes are attempting to download artifacts from the same model version simultaneously, causing the rate limit to be exceeded.
For more information, review the Resource limits (AWS | Azure | GCP) documentation.
Solution
To work around this issue, you can set timeout and retry environment variables in your job clusters.
- Click Workflows in the left navigation bar.
- Click the name of the job you want to edit.
- Click the Edit icon (looks like a pencil) in the Cluster field.
- Click Advanced options to expand the section.
- Add the following lines to the Environment variables field:
MLFLOW_HTTP_REQUEST_TIMEOUT=360
MLFLOW_HTTP_REQUEST_BACKOFF_FACTOR=5
MLFLOW_HTTP_REQUEST_MAX_RETRIES=8
- Click Confirm.
- Restart your job.
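When editing the cluster configuration is not practical, the same three variables can also be set from Python at the top of the job code. This is a minimal sketch, assuming the installed MLflow version reads these variables when a request is made; setting them before `mlflow` is imported is the safer option. The `retry_settings` name is hypothetical, and the values are the ones listed in the steps above.

```python
import os

# Values copied from the cluster configuration steps above.
# Environment variable values must be strings.
retry_settings = {
    "MLFLOW_HTTP_REQUEST_TIMEOUT": "360",
    "MLFLOW_HTTP_REQUEST_BACKOFF_FACTOR": "5",
    "MLFLOW_HTTP_REQUEST_MAX_RETRIES": "8",
}

# Apply the settings to the current process only (assumption: this runs
# before any MLflow request is issued in this process).
os.environ.update(retry_settings)

for name in retry_settings:
    print(f"{name}={os.environ[name]}")
```

Variables set this way apply only to the process that runs the code, so the cluster-level setting is still the more reliable choice when several processes share the cluster.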
The additional environment variables space out requests on the /api/2.0/mlflow/model-versions/list-artifacts endpoint that is hitting the rate limit.
- MLFLOW_HTTP_REQUEST_TIMEOUT sets the maximum time, in seconds, to wait for a request to complete.
- MLFLOW_HTTP_REQUEST_BACKOFF_FACTOR sets the backoff factor to apply between retry attempts.
- MLFLOW_HTTP_REQUEST_MAX_RETRIES sets the maximum number of retries to attempt before giving up.
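To get a feel for how the backoff factor and the retry count interact, the waits between attempts can be estimated. The sketch below is an approximation, not MLflow's actual implementation: it assumes an exponential backoff of the form `factor * 2 ** (attempt - 1)`, no wait before the first retry, and a 120-second cap, which mirrors the behavior of urllib3's `Retry` class. The `backoff_schedule` function and its `cap` parameter are hypothetical names, and real waits may differ by library version or when the server sends a `Retry-After` header.

```python
def backoff_schedule(factor: float, max_retries: int, cap: float = 120.0) -> list:
    """Approximate wait, in seconds, before each retry attempt."""
    delays = []
    for attempt in range(1, max_retries + 1):
        if attempt == 1:
            # Assumption: the first retry is sent without waiting.
            delay = 0.0
        else:
            # Assumption: exponential growth, limited by a fixed cap.
            delay = min(cap, factor * 2 ** (attempt - 1))
        delays.append(delay)
    return delays


# Values from the environment variables above.
schedule = backoff_schedule(factor=5, max_retries=8)
print(schedule)        # per-retry waits in seconds
print(sum(schedule))   # total time spent waiting across all retries
```

Under these assumptions the waits grow quickly and then flatten at the cap, which is what spreads the retried requests over several minutes instead of sending them in a burst.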
In addition to setting these environment variables, you should also consider these best practices to avoid hitting the rate limit:
- Limit the number of concurrent jobs or processes that are accessing the same model version.
- Use versioning to create new versions of models instead of modifying the same version.
- Use the MLflow API to list artifacts for a model version instead of downloading them directly.
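The first practice, limiting concurrency, can be sketched inside a single process with a semaphore. This is an illustration only: `fake_download` stands in for whatever call actually downloads the artifacts, and `MAX_CONCURRENT_DOWNLOADS`, `download_with_limit`, and the paths are hypothetical. It does not coordinate separate jobs or clusters, which need to be limited at the job scheduling level.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

MAX_CONCURRENT_DOWNLOADS = 2  # hypothetical limit; tune for your workload
_slots = threading.BoundedSemaphore(MAX_CONCURRENT_DOWNLOADS)

_lock = threading.Lock()
_active = 0
peak_active = 0  # highest number of simultaneous downloads observed


def fake_download(path: str) -> str:
    """Placeholder for a real artifact download; it only sleeps briefly."""
    global _active, peak_active
    with _lock:
        _active += 1
        peak_active = max(peak_active, _active)
    time.sleep(0.01)
    with _lock:
        _active -= 1
    return path


def download_with_limit(download_fn, *args, **kwargs):
    """Run download_fn while holding one of the limited download slots."""
    with _slots:
        return download_fn(*args, **kwargs)


paths = [f"some/path/{i}" for i in range(8)]
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(lambda p: download_with_limit(fake_download, p), paths))

print(len(results), "downloads finished; peak concurrency:", peak_active)
```

Even though eight worker threads are started, at most two downloads run at once, so the request rate against the endpoint stays lower than it would with unrestricted parallelism.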
For more information, review the MLflow API (AWS | Azure | GCP) documentation.