Introduction
You want to gather the total compute duration of a given cluster in their Databricks environment but don’t see a way to do it yourself.
Databricks provides system tables which you can query for various monitoring and reporting purposes. For more general information on system tables, refer to the Monitor account activity with system tables (AWS | Azure | GCP) documentation.
Instructions
To gather the total compute duration of a cluster, run the following SQL query in your SQL Analytics workspace. This query calculates the total cluster duration by summing the difference between the usage_start_time
and usage_end_time
for each usage record associated with the specified cluster ID.
Replace <cluster_id>
with the actual cluster ID for which you want to gather the total cluster duration.
SELECT
usage_metadata.<cluster_id>,
MIN(usage_start_time) AS creation_time,
MAX(usage_end_time) AS last_usage_time,
SUM(TIMESTAMPDIFF(SECOND, usage_start_time, usage_end_time)) /3600 AS total_usage_hours
FROM
system.billing.usage
WHERE
usage_metadata.cluster_id = '<cluster_id>' --optionally filter by clusterId
GROUP BY
usage_metadata.cluster_id;