Problem
You want to query the system.compute.clusters and system.billing.usage tables to gain insights into DBU consumption per cluster and identify all-purpose and job clusters within your workspace.
Cause
You may want these insights for multiple reasons, including old deleted clusters, potential field variations, and SQL query adjustments.
Deleted clusters
Clusters deleted before specific dates may not be included in the query results.
Field variations
Differences in the change_time and delete_time fields could sometimes cause cluster information to appear differently than expected.
SQL query adjustments
Queries may need to incorporate recent changes or deletions in cluster data to provide more accurate results.
Solution
These steps can be implemented to achieve more refined results by utilizing multiple system tables.
Enable required system schemas
- While the 
system.billingschema is enabled by default, thesystem.computeschema needs to be enabled manually. - For more information, please review the Enable system table schemas (AWS | Azure | GCP) documentation.
 
Refine SQL queries
- Adjust queries to include recent cluster changes and take deleted clusters into account for more accurate results.
 - Applying filters and joins using fields like 
change_timeanddelete_timecan help provide a comprehensive view of cluster states. 
Ensure accurate results from the system.compute.clusters table
- Ensure your SQL query correctly accounts for the latest changes and deletions in cluster data.
 
Example code
In this example code block, we are fetching the clusters present in the workspace created via UI and API.
SELECT DISTINCT 
    c.workspace_id, 
    c.cluster_id, 
    c.cluster_name, 
    MAX(c.create_time) AS first_created, 
    MAX(c.change_time) AS last_changed, 
    c.driver_node_type, 
    c.worker_node_type, 
    c.min_autoscale_workers, 
    c.max_autoscale_workers, 
    c.auto_termination_minutes, 
    c.dbr_version, 
    c.cluster_source
FROM 
    system.compute.clusters c
INNER JOIN (
    SELECT 
        cluster_id, 
        MAX(change_time) AS max_change_time 
    FROM 
        system.compute.clusters 
    GROUP BY 
        cluster_id
) m 
    ON c.cluster_id = m.cluster_id 
    AND c.change_time = m.max_change_time
WHERE 
    c.delete_time IS NULL 
    AND c.cluster_source IN ('UI', 'API') 
    AND c.workspace_id = <enter-your-workspace-id>
GROUP BY 
    c.workspace_id, 
    c.cluster_id, 
    c.cluster_name, 
    c.driver_node_type, 
    c.worker_node_type, 
    c.min_autoscale_workers, 
    c.max_autoscale_workers, 
    c.auto_termination_minutes, 
    c.dbr_version, 
    c.cluster_source
ORDER BY 
    c.workspace_id, 
    c.cluster_id;- Retrieve DBU consumption per cluster.
 
Example code
In this example code block, we are fetching the DBUs consumed per cluster on a monthly basis.
SELECT 
    date_format(u.usage_date, 'yyyy-MM') AS `Month`, 
    c.cluster_name AS `Cluster Name`, 
    c.cluster_id AS `Cluster ID`, 
    SUM(u.usage_quantity) AS `Total DBUs Consumed`
FROM 
    system.billing.usage u
INNER JOIN 
    system.compute.clusters c 
    ON u.usage_metadata.cluster_id = c.cluster_id
WHERE 
    c.cluster_name LIKE <cluster-name>
    AND c.cluster_id LIKE <cluster-id>
GROUP BY 
    date_format(u.usage_date, 'yyyy-MM'), 
    c.cluster_name, 
    c.cluster_id
ORDER BY 
    `Month` ASC, 
    `Cluster Name` ASC;- Verify that the 
delete_timefield is correctly handled in your queries to exclude deleted clusters. 
SELECT * FROM system.compute.clusters WHERE delete_time IS NULL;- Verify the sample queries with known data to ensure they return the expected results. If needed, customize the examples based on your specific workspace and cluster configurations.
 
For more information regarding the system.compute.clusters table, please review the Compute system tables reference (AWS | Azure | GCP) documentation.