Handling case sensitivity issues in Delta Lake nested fields

Set a specific property in your Spark configuration to handle the case sensitivity of nested fields in Delta tables.

Written by Rajeev kannan Thangaiah

Last published at: September 12th, 2024

Problem

Apache Spark streaming jobs that write to Delta tables may fail with the following error, which indicates that the input schema contains nested fields capitalized differently than in the target table.

[DELTA_NESTED_FIELDS_NEED_RENAME] The input schema contains nested fields that are capitalized differently than the target table. They need to be renamed to avoid the loss of data in these fields while writing to Delta.

This error is separate from Spark's general behavior of ignoring case in column names.

Note

This article applies to Databricks Runtime 14.3 and below.

 

Cause

While top-level fields in Delta Lake are case insensitive, nested fields must match the case exactly as defined in the table schema.
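To make the distinction concrete, here is a minimal sketch in plain Python. It is not Delta Lake's actual implementation. It assumes schemas can be modeled as nested dictionaries, and the function name `find_case_mismatches` and the sample field names are hypothetical. It reports nested fields whose names differ from the target only by capitalization, while leaving top-level differences alone.

```python
def find_case_mismatches(input_schema, target_schema, path="", nested=False):
    """Return dotted paths of nested input fields that differ from the target only by case.

    Schemas are modeled as dicts (a hypothetical representation): a struct maps
    field names to sub-dicts, and a leaf field maps to a type name string.
    """
    mismatches = []
    # Index target field names by lowercase for case-insensitive lookup.
    target_by_lower = {name.lower(): name for name in target_schema}
    for name, input_type in input_schema.items():
        target_name = target_by_lower.get(name.lower())
        if target_name is None:
            continue  # Field is not in the target; outside the scope of this sketch.
        full_path = f"{path}.{name}" if path else name
        # Only nested fields are required to match case exactly.
        if nested and name != target_name:
            mismatches.append(full_path)
        target_type = target_schema[target_name]
        if isinstance(input_type, dict) and isinstance(target_type, dict):
            mismatches.extend(
                find_case_mismatches(input_type, target_type, full_path, nested=True)
            )
    return mismatches


# Hypothetical schemas: "ID" differs at the top level, "City" differs inside a struct.
target = {"id": "long", "address": {"city": "string", "zipCode": "string"}}
incoming = {"ID": "long", "address": {"City": "string", "zipCode": "string"}}

print(find_case_mismatches(incoming, target))  # prints ['address.City']
```

In this sketch the top-level `ID` is not flagged, but the nested `address.City` is. That is the kind of mismatch the error describes.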

Solution

Set the following property in your Spark configuration. It automatically corrects the case of nested field names to match the target table's schema.

spark.conf.set("spark.databricks.delta.nestedFieldNormalizationPolicy", "cast")
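As a rough illustration of what correcting the case of nested field names means, here is a plain-Python sketch. It does not reproduce how Delta Lake applies the policy internally. It assumes rows and schemas can be modeled as dictionaries, and the function name `normalize_nested_names` and the sample data are hypothetical.

```python
def normalize_nested_names(record, target_schema, nested=False):
    """Rename nested keys in `record` to the casing used in `target_schema`.

    Both arguments are plain dicts (a hypothetical stand-in for a row and a
    table schema). Top-level keys are left as-is, and keys with no
    case-insensitive match in the target are kept unchanged.
    """
    target_by_lower = {name.lower(): name for name in target_schema}
    result = {}
    for name, value in record.items():
        target_name = target_by_lower.get(name.lower())
        # Only nested names are rewritten in this sketch.
        out_name = target_name if (nested and target_name is not None) else name
        sub_schema = target_schema.get(target_name) if target_name else None
        if isinstance(value, dict) and isinstance(sub_schema, dict):
            value = normalize_nested_names(value, sub_schema, nested=True)
        result[out_name] = value
    return result


# Hypothetical target schema and incoming row with differently cased nested keys.
target_fields = {"id": "long", "address": {"city": "string", "zipCode": "string"}}
row = {"id": 1, "address": {"City": "Austin", "ZIPCODE": "73301"}}

print(normalize_nested_names(row, target_fields))
# prints {'id': 1, 'address': {'city': 'Austin', 'zipCode': '73301'}}
```

The values are untouched. Only the nested key names change to the target's casing, so no data ends up under a field name the table does not recognize.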

 

For further information, review the Error classes in Databricks (AWS | Azure) documentation.

 
