INSERT OVERWRITE DIRECTORY with Hive format failing with “specified path already exists” error

Add USING PARQUET after the directory path in your INSERT OVERWRITE DIRECTORY query.

Written by vinay.mr

Last published at: June 27th, 2025

Problem

You’re using Databricks Runtime 12.2 LTS or above on No Isolation Shared compute to perform an INSERT OVERWRITE DIRECTORY operation in Hive format, as in the following example query.

INSERT OVERWRITE DIRECTORY '<directory-name>' SELECT * FROM <catalog>.<schema>.<table> 


The operation fails with the following error message. 

SparkException: Failed inserting overwrite directory <directory-name>
...
Caused by: Operation failed: "The specified path already exists.", 409,PUT,
<cloud-provider>://<bucket-name>/<folder-1>/<folder-2>/<folder-3>/<folder-4>/<file-name>?resource=file&timeout=90, PathAlreadyExists, "The specified path already exists. RequestId:xxxxxxxxxxx Time:yyyy-mm-ddThh:mm:ssZ"


Cause

In Databricks Runtime 12.2 LTS and above, the Hadoop Distributed File System (HDFS) layer changed how it handles directory overwrites for Hive format queries. If the target directory already exists when you execute INSERT OVERWRITE DIRECTORY with Hive format, the operation fails with this error instead of replacing the directory.
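The failure mode can be illustrated outside Databricks with a minimal Python sketch on the local filesystem (the `overwrite_directory` helper is hypothetical, for illustration only, and is not a Spark or Databricks API): creating a directory that already exists raises an error, so an overwrite must clear the pre-existing directory first.

```python
import os
import shutil
import tempfile

def overwrite_directory(path: str, filename: str, data: str) -> None:
    """Illustrative helper: remove the target directory if it already
    exists, then recreate it and write the output file. Without the
    cleanup step, os.makedirs(path) would raise FileExistsError,
    analogous to the 409 PathAlreadyExists response from cloud storage."""
    if os.path.isdir(path):
        shutil.rmtree(path)  # clear the pre-existing directory first
    os.makedirs(path)        # fails if the path still exists
    with open(os.path.join(path, filename), "w") as f:
        f.write(data)

base = tempfile.mkdtemp()
target = os.path.join(base, "output")

os.makedirs(target)  # simulate a directory left over from a prior run
overwrite_directory(target, "part-0000", "col1,col2\n1,2\n")  # succeeds after cleanup
overwrite_directory(target, "part-0000", "col1,col2\n3,4\n")  # repeat overwrites also succeed
```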


Solution

Write the directory in Parquet format instead. Add USING PARQUET after the directory path in the INSERT OVERWRITE DIRECTORY statement, as shown in the following example.

INSERT OVERWRITE DIRECTORY '<directory-name>' USING PARQUET SELECT * FROM <catalog>.<schema>.<table>