Drop operation fails with duplicate column(s) during save

Separate the column selection and drop() operations.

Written by abinaya.jayaprakasam

Last published at: August 19th, 2025

Problem

While working in Databricks Runtime 15.4 LTS, you try to run PySpark DataFrame operations that chain join(), selectExpr(), and drop().

df_conv_calc = df2.alias("df2Alias") \
.join(df1.alias("df1"), ...) \
.selectExpr(...
, "df2Alias.colB1 as colC1"
... ).drop(df2.colB2)

 

The code fails with the following error.

[DELTA_DUPLICATE_COLUMNS_FOUND] Found duplicate column(s) in the data to save: <column-name> SQLSTATE: 42711 

 

The following image shows the code in a notebook and the error message generated.

 

You notice the issue only after upgrading to maintenance release 15.4.20.

 

Cause

The maintenance release 15.4.20 introduces stricter and more consistent column handling. This includes a change in how drop() is resolved, especially when it is chained onto an aliased column selection in the same expression.

 

Solution

Separate the column selection and drop() operations.

 

First, select the required columns.

df_conv_calc = df2.alias("df2Alias") \
.join(df1.alias("df1"), ...) \
.selectExpr(...
, "df2Alias.colB1 as colC1"
... )

 

Then, in a separate statement, drop the column(s).

df_conv_calc = df_conv_calc.drop(df_conv_calc["colC2"])

 

The following image shows the code in a notebook and the expected output.