Apache Airflow-triggered jobs terminating before completing

Increase the job timeout threshold on the Airflow side.

Written by allia.khosla

Last published at: June 18th, 2025

Problem

You have a job triggered through Apache Airflow using the DatabricksRunNowOperator that is canceled after running for X hours (where X is the timeout value set through Airflow), even though the job is not complete.

 

Cause

The Airflow DatabricksRunNowOperator task uses the X-hour timeout configured on the Airflow side to determine how long the run may execute. When that timeout elapses, Airflow stops the task and the operator sends a cancel request for the Databricks run, regardless of whether the job is complete.
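
As an illustration, the following is a minimal sketch (not code from your environment) of how such a timeout is typically attached to the operator in a DAG. The DAG name, connection ID, job ID, and four-hour value are placeholder assumptions.

# Minimal sketch, with placeholder values, of a DAG whose Airflow-side timeout
# cancels the Databricks run. Adjust names and IDs to match your environment.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.providers.databricks.operators.databricks import DatabricksRunNowOperator

with DAG(
    dag_id="trigger_databricks_job",      # placeholder DAG name
    start_date=datetime(2025, 1, 1),
    schedule=None,
    catchup=False,
) as dag:
    run_job = DatabricksRunNowOperator(
        task_id="run_databricks_job",
        databricks_conn_id="databricks_default",  # placeholder connection ID
        job_id=123456,                            # placeholder Databricks job ID
        # When this duration elapses, Airflow stops the task and the operator
        # sends the jobs "cancel" request visible in the audit log snippet below.
        execution_timeout=timedelta(hours=4),     # assumed current timeout
    )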

 

If you have access to audit logs, you can see the cancellation request is sent by the Airflow operator, confirming the issue lies in the Airflow configuration rather than the Databricks job settings.

 

Audit logs snippet

{"version":"2.0","auditLevel":"WORKSPACE_LEVEL","timestamp":1746505827573,"orgId":"<org-id>","shardName":"<shard-name>","accountId":"xxxxxxxxxxxxxxxx","sourceIPAddress":"<source-ip-address>","userAgent":"databricks-airflow/6.7.0 _/0.0.0 python/3.11.11 os/linux airflow/2.9.3+astro.11 operator/DatabricksRunNowOperator","sessionId":null,"userIdentity":{"email":"<email>","subjectName":null},"principal":{"resourceName":"accounts/xxxxxxxxxxxxxxxx"/users/<user>","uniqueName":"<email>","contextId":"<context-id>","displayName":"Data Engineering"},"authorizeAs":{"resourceName":"accounts/xxxxxxxxxxxxxxxx"/users/<user>","uniqueName":"<email>","displayName":"Data Engineering","activatingResourceName":null},"serviceName":"jobs","actionName":"cancel","requestId":"<request-id>","requestParams":{"run_id":"<run-id>"},"response":{"statusCode":200,"errorMessage":null,"result":"{}"}}

 

Note

If you do not have audit logs configured for your workspace and you are on a Premium plan or above, you can follow the instructions in the Audit log reference (AWS | Azure | GCP) documentation to configure them.

 

Solution

Increase the job timeout threshold on the Airflow side. 

 

  1. Review the Airflow Directed Acyclic Graph (DAG) that triggers the Databricks job. Locate the DatabricksRunNowOperator task and check its configuration.
  2. Adjust the timeout parameter on the DatabricksRunNowOperator task (the task-level execution_timeout in Airflow) to a value longer than the job’s expected run time, as shown in the sketch after this list.
  3. Update your Airflow DAG with the adjusted timeout parameter and deploy the changes.
  4. After updating the DAG, trigger a new run and monitor the job to confirm it runs beyond the previously configured timeout without being terminated.
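
The following is a minimal sketch of the adjusted task, assuming the DAG shown in the Cause section above. The eight-hour value, job ID, and connection ID are placeholders; the timeout simply needs to exceed the job’s expected run time.

from datetime import timedelta

from airflow.providers.databricks.operators.databricks import DatabricksRunNowOperator

# Same DAG definition as before; only the operator's execution_timeout changes.
run_job = DatabricksRunNowOperator(
    task_id="run_databricks_job",
    databricks_conn_id="databricks_default",  # placeholder connection ID
    job_id=123456,                            # placeholder Databricks job ID
    # Raised beyond the job's expected run time; eight hours is a placeholder.
    execution_timeout=timedelta(hours=8),
)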

 

For more information, review the Airflow Tasks documentation.