Problem
You have a streaming job that is ingesting data from a Delta table. Some columns in the Delta table may have been renamed or dropped (schema evolution), and you get a StreamingQueryException: [STREAM_FAILED] error message.

StreamingQueryException: [STREAM_FAILED] Query [id = XXX, runId = XXXX] terminated with exception: The schema, table configuration or protocol of your Delta table has changed during streaming. The schema or metadata tracking log has been updated. Please restart the stream to continue processing using the updated metadata.
Cause
If you add, drop, or rename any column in the source table, the streaming job fails.
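For example, a non-additive change like the following rename on the source table causes any active stream reading that table to fail with the error above. This is a hedged sketch: the table and column names are placeholders, and RENAME COLUMN assumes the table already has Delta column mapping enabled.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Renaming a column is a non-additive schema change.
# (RENAME COLUMN requires the Delta table property delta.columnMapping.mode = 'name'.)
spark.sql(
    "ALTER TABLE my_catalog.my_schema.source_table RENAME COLUMN old_name TO new_name"
)
```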
Solution
Update the schema definition at the source or the target as needed, then restart the streaming query to continue processing.
- For non-additive schema changes, such as renaming or dropping columns, enable schema tracking. For this scenario to work, each streaming read against a data source must have its own schemaTrackingLocation specified (see the sketch after this list). For more information, review the Rename and drop columns with Delta Lake column mapping (AWS | Azure | GCP) documentation. This ensures that schema changes are properly tracked.
- Set spark.databricks.delta.streaming.allowSourceColumnRenameAndDrop to true.
- Restart the streaming query.
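The following is a minimal PySpark sketch of these steps, assuming a Delta-to-Delta stream. The table names, checkpoint path, and schema tracking path are placeholders; adjust them for your environment.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Allow the stream to continue across column renames and drops in the source table.
spark.conf.set(
    "spark.databricks.delta.streaming.allowSourceColumnRenameAndDrop",
    "true",
)

# Each streaming read against the source must have its own schemaTrackingLocation.
# The path is typically placed under the stream's checkpoint directory.
df = (
    spark.readStream.format("delta")
    .option("schemaTrackingLocation", "/mnt/checkpoints/my_stream/_schema_log")
    .table("my_catalog.my_schema.source_table")
)

# Restart the streaming query with the same checkpoint location so processing
# resumes using the tracked (updated) schema.
query = (
    df.writeStream.format("delta")
    .option("checkpointLocation", "/mnt/checkpoints/my_stream")
    .toTable("my_catalog.my_schema.target_table")
)
```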
Note
Schema tracking is supported in Databricks Runtime 13.3 LTS and above. Use this configuration if your workflow has non-additive schema changes such as renaming or dropping columns; otherwise it is not needed.