Use snappy and zstd compression types in a Delta table without rewriting entire table

Test your compression type, generate, and insert sample records using zstd, then write the zstd files to your Delta table.

Written by chandan.kumar

Last published at: April 16th, 2025

Problem

You need to change the compression codec for writing data to improve storage efficiency. Rewriting the entire table is impractical, but you are concerned that switching may corrupt existing tables if you mix snappy-compressed files and zstd-compressed files in a Delta table.

 

Cause

Databricks allows different codecs, but you must ensure data integrity and readability aren't affected by the transition.

 

Solution

Follow the list of steps to test your compression type, generate and insert sample records using zstd, then write the zstd files to your Delta table. If you already have a Delta table, you can skip to step 4. 

  1. Read 10 numbers into a DataFrame. 
df = spark.range(0, 10).toDF("numbers") 
display(df)

 

  1. Persist the DataFrame records as a Delta table.
df.write.format('delta').mode("append").saveAsTable("<catalog-name>.<schema-name>.<table-name>")

 

  1. Describe detail using %sql describe detail catalog_name.schema_name.table_name to get the location of your Delta table, and run the %fs ls command on the table location to see the list of files. All files in the list are written by default in .snappy.parquet compression codec file format. 
%fs ls /<path-to-your-location-folder>/

 

  1. To switch the compression type from the default snappy to zstd, generate sample records and insert them into the table using either incremental updates or append.

 

df_zstd = spark.range(11, 20).toDF("numbers")
display(df_zstd)

 

  1. Write this new data to the existing Delta table by changing the compression type to zstd using incremental updates or append.
df_zstd.write.format('delta').option("compression","zstd").mode("append").saveAsTable("<catalog-name>.<schema-name>.<table-name>")

 

  1. Run the %fs ls command on the table location again to see the list of .snappy.parquet and newly changed .zstd.parquet files, which co-exist in the table location.
%fs ls /<path-to-your-location-folder>/

 

  1. Review the table to verify the records by running a select statement. Ensure you can read all the data without corrupting the table.  
%sql select * <catalog-name>.<schema-name>.<table-name>;