Problem
Unity Catalog enforces resource quotas (AWS | Azure) on all securable objects, which limits the number of resources that can be created. Quotas are expressed in terms of a resource type and a parent – for example, tables per metastore or schemas per catalog. When you use the List all resource quotas under a metastore (AWS | Azure) workspace API, the default max_results for the number of quotas to return is 500, but you want to see all the quotas across all services.
Cause
You want to analyze overall usage together, not just the quotas for a specific service.
Solution
Use Python to persist the API results from each batch to a Unity Catalog table, then use Apache Spark SQL to analyze the data based on your needs.
Collect all the resource quotas into a list by calling the ListQuotas API repeatedly, following the pagination token until no pages remain. Note, it may take some time to execute. You can try to speed up execution by increasing the max_results parameter.
Example code
Databricks recommends using OAuth Machine-to-Machine (M2M) authentication when using the REST API. For more information, review the Authorize unattended access to Databricks resources with a service principal using OAuth (AWS | Azure) documentation. This documentation explains how to use the client_id and client_secret to generate an access token.
In the example code, we store client_id and client_secret as secrets. You must replace the following before running the example code:
- <oidc-scope> - This is the scope where you store the client ID and client secret.
- <oidc-client-id> - This is the key for storing your client ID secret value.
- <oidc-client-secret> - This is the key for storing your client secret value.
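If you want to confirm the secrets are in place before you run the example code, you can list the keys in the scope. This is an optional sanity check; it prints key names only, never secret values.
%python
# Optional: confirm the secret keys exist in the scope before requesting a token.
# Only the key names are printed; secret values are redacted by Databricks.
for secret in dbutils.secrets.list("<oidc-scope>"):
    print(secret.key)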
- First, you must generate an access token so you can make REST API calls.
%python
import requests

# Retrieve the service principal credentials from the secret scope.
client_id = dbutils.secrets.get(scope="<oidc-scope>", key="<oidc-client-id>")
client_secret = dbutils.secrets.get(scope="<oidc-scope>", key="<oidc-client-secret>")
workspace_url = spark.conf.get("spark.databricks.workspaceUrl")
token_endpoint_url = f"https://{workspace_url}/oidc/v1/token"

if not client_id or not client_secret:
    raise ValueError("The client ID or client secret values are not set.")

try:
    # Request an OAuth access token using the client credentials grant.
    response = requests.post(
        token_endpoint_url,
        auth=(client_id, client_secret),
        data={"grant_type": "client_credentials", "scope": "all-apis"},
    )
    response.raise_for_status()
    token_data = response.json()
except requests.exceptions.RequestException as e:
    print(f"Error: {e}")
except ValueError as e:
    print(f"Error: {e}")
except KeyError as e:
    print(f"Error: Key {e} not found in response JSON")
- Once you obtain the access_token, use it to call GET resource-quotas/all-resource-quotas to get all the quotas, with 100 items returned in each batch. You can adjust the max_results value (set to 100 in the example code) to control the request speed.
%python
import time

# The Authorization header carries the bearer token obtained in the previous step.
headers = {"Authorization": f'Bearer {token_data["access_token"]}'}
next_page = None
max_results = 100
results = []

while True:
    payload = {"max_results": max_results, "page_token": next_page}
    r = requests.get(
        f"https://{workspace_url}/api/2.1/unity-catalog/resource-quotas/all-resource-quotas",
        headers=headers,
        params=payload,
    ).json()
    results.extend(r.get("quotas", []))
    # Stop when the response no longer includes a pagination token.
    next_page = r.get("next_page_token")
    if not next_page:
        break
    time.sleep(0.5)

print(len(results))
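Before converting the results, it can help to inspect one record to confirm the shape of the data. A minimal sketch:
%python
# Print a single quota record to verify the fields returned by the API.
if results:
    print(results[0])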
- Convert the results to a Spark DataFrame, register it as a temporary view, and persist it as a Unity Catalog table.
%python
# Convert list of dictionaries to Spark DataFrame
df = spark.createDataFrame(results)
df.createOrReplaceTempView("quotas")
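If schema inference fails or produces unexpected types (for example, when some fields are missing from individual records), you can pass an explicit schema instead. The field names below are assumptions based on the QuotaInfo response; adjust them to match what the API actually returns in your workspace.
%python
from pyspark.sql.types import StructType, StructField, StringType, LongType

# Explicit schema (assumed QuotaInfo field names; verify against your API response).
quota_schema = StructType([
    StructField("parent_securable_type", StringType()),
    StructField("parent_full_name", StringType()),
    StructField("quota_name", StringType()),
    StructField("quota_count", LongType()),
    StructField("quota_limit", LongType()),
    StructField("last_refreshed_at", LongType()),
])

df = spark.createDataFrame(results, schema=quota_schema)
df.createOrReplaceTempView("quotas")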
%sql
CREATE OR REPLACE TABLE `catalog`.`schema`.uc_quotas AS (
  SELECT * FROM quotas
)
- After persisting all the resource quotas to a Unity Catalog table, use Spark SQL to query the data and analyze it based on your specific needs.
%sql
SELECT
*
FROM
`catalog`.`schema`.uc_quotas
WHERE quota_name = 'registered-model-alias-quota'
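For example, to find the quotas that are closest to their limits, you can compute a utilization percentage. This is a sketch that assumes the quota_count and quota_limit columns from the API response:
%sql
SELECT
  quota_name,
  parent_full_name,
  quota_count,
  quota_limit,
  round(quota_count / quota_limit * 100, 1) AS pct_used
FROM
  `catalog`.`schema`.uc_quotas
WHERE quota_limit > 0
ORDER BY pct_used DESC
LIMIT 20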