Problem
When working with Delta Live Tables, casting an invalid string value to a date/timestamp datatype should raise an error, but it does not.
Example
A sample DLT pipeline definition:
import dlt

@dlt.table(
    name='test',
    temporary=True
)
def df_src():
    return (
        spark.sql("select 101 as id, cast('test' as Date) as dt from <source-table>")
    )
Cause
The Delta Live Tables pipeline inserts NULLs in the target table for each string cast to a date/timestamp datatype instead of failing at the analysis phase.
Note
This occurs in Delta Live Tables in both CURRENT and PREVIEW channels but not on Databricks Runtime.
Solution
Configure the Delta Live Tables pipeline to enforce ANSI SQL compliance by setting the spark.sql.ansi.enabled parameter to true.
spark.conf.set("spark.sql.ansi.enabled", "true")
Alternatively, you can add this configuration in your pipeline settings by clicking Add Configuration under the Advanced section.
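In the pipeline settings JSON, the equivalent entry looks roughly like this (a sketch showing only the configuration block, not a full pipeline definition):

```json
{
  "configuration": {
    "spark.sql.ansi.enabled": "true"
  }
}
```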
The pipeline now fails as it should.
org.apache.spark.SparkDateTimeException: [CAST_INVALID_INPUT]
The value test of the type STRING cannot be cast to DATE because it is malformed. Correct the value as per the syntax, or change its target type. Use try_cast to tolerate malformed input and return NULL instead. If necessary, set spark.sql.ansi.enabled to false to bypass this error.
select 101 as id, cast('test' as Date) as dt from <source-table>