Problem
When joining two dataframes, joining a timestamp column with a date column results in null values.
Example
In this example, start_timestamp is of timestamp data type, and start_date is of date data type.
select * from table1
left join table2
on
table1.start_timestamp = table2.start_dateCause
A join between a timestamp and a date column will produce non-null results only if the time in the timestamp column is 00:00:00 UTC.
Additionally, if spark.sql.session.timeZone is set to a timezone other than UTC, 00:00:00 UTC is converted to the time as per the set timezone, leading to null results during the join.
Solution
Cast the value of the timestamp column to date datatype when joining it with a column of 'date' datatype.
select * from table1 t1
left join table2 t2
on
to_date(t1.start_timestamp, 'yyyy-MM-dd') = t2.start_date