Problem
When saving a table as a CSV file or other text-based format, your empty string values are replaced with NULL values.
Cause
Empty string values are interpreted as NULL during the Serialization/Deserialization (SerDe) step when saving tables as CSV files or other text-based data formats which do not have a defined schema present.
NULL values may also appear when performing joins on tables if the join condition is not met.
Solution
Save your data in Delta format instead of CSV or text-based formats. Delta tables handle empty strings and NULL values more effectively, ensuring that empty strings are preserved during data insertion.
If you need to use CSV format, ensure:
- The options being used to both read and write can correctly handle Empty /
NULLvalues. - Any external systems reading the file can properly serialize the data as intended.
For more information, please review the Apache Spark CSV Files documentation.