Resolve Spark directory structure conflicts

Fixing java.lang.AssertionError: assertion failed: Conflicting directory structures detected

Written by jayant.sharma

Last published at: April 26th, 2025

Problem

Your Databricks workflow fails due to an internal Apache Spark assertion error. 

 

Example code

This example code results in an assertion error.

%python

# Create a sample DataFrame (illustrative; any DataFrame with year and month columns reproduces the issue)
df = spark.createDataFrame([(2025, 4, "a"), (2025, 4, "b")], ["year", "month", "value"])

# Save the dataset to the table path (non-partitioned)
df.write.mode("overwrite").parquet("dbfs:/FileStore/Jayant/tableDir")

# Save the dataset again to a subdirectory path using `partitionBy` (partitioned)
df.write.mode("overwrite").partitionBy("year", "month").parquet("dbfs:/FileStore/Jayant/tableDir/2025/04")

# Reading this table using Spark results in an assertion error
spark.read.parquet("dbfs:/FileStore/Jayant/tableDir").display()

 

Error message

java.lang.AssertionError: assertion failed: Conflicting directory structures detected. Suspicious paths:
dbfs:/filestore/jayant/tabledir
dbfs:/filestore/jayant/tabledir/2025/04

If provided paths are partition directories, please set "basePath" in the options of the data source to specify the root directory of the table. If there are multiple root directories, please load them separately and then union them.
at scala.Predef$.assert(Predef.scala:223)
at org.apache.spark.sql.execution.datasources.PartitioningUtils$.parsePartitions(PartitioningUtils.scala:316)
at org.apache.spark.sql.execution.datasources.PartitioningUtils$.parsePartitions(PartitioningUtils.scala:155)
at org.apache.spark.sql.execution.datasources.PartitioningAwareFileIndex.inferPartitioning(PartitioningAwareFileIndex.scala:205)
at org.apache.spark.sql.execution.datasources.InMemoryFileIndex.partitionSpec(InMemoryFileIndex.scala:110)
at org.apache.spark.sql.execution.datasources.PartitioningAwareFileIndex.partitionSchema(PartitioningAwareFileIndex.scala:58)
at org.apache.spark.sql.execution.datasources.DataSource.getOrInferFileFormatSchema(DataSource.scala:205)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:494)
at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:394)
at org.apache.spark.sql.DataFrameReader.$anonfun$load$2(DataFrameReader.scala:350)
at scala.Option.getOrElse(Option.scala:189)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:350)
at org.apache.spark.sql.DataFrameReader.parquet(DataFrameReader.scala:871)

 

Cause 

This issue occurs when reading a Spark table that was created with Spark partitioning but contains ambiguous subdirectories under the table directory.

The error is caused by a failure in Spark’s partition discovery logic while attempting to infer the schema of a directory structure with inconsistent layouts.

When you read data using Spark, it attempts to automatically infer partitioning by parsing the input paths. This is handled internally by org.apache.spark.sql.execution.datasources.PartitioningUtils.parsePartitions. For more information, review the source code.

In the reported error, Spark detects two distinct paths:

  • dbfs:/filestore/jayant/tabledir
  • dbfs:/filestore/jayant/tabledir/2025/04

These paths represent conflicting directory structures: one appears unpartitioned, and the other resembles a partitioned layout based on path depth. Because the folder names (2025, 04) are not in the Hive-style key=value format (for example, year=2025/month=04), Spark cannot map them to valid partition column names.

This ambiguity leads Spark to fail an internal assertion.
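
To see the mismatch concretely, you can list the table directory. The following is a minimal sketch using dbutils.fs.ls (available in Databricks notebooks); the exact file names will differ in your environment.

%python

# Minimal sketch: list the table directory to inspect the mixed layout.
# Loose Parquet files sit next to a nested 2025/ directory, which is what
# partition discovery flags as conflicting. Exact file names will differ.
for entry in dbutils.fs.ls("dbfs:/FileStore/Jayant/tableDir"):
    print(entry.path)

# Example output:
#   dbfs:/FileStore/Jayant/tableDir/2025/
#   dbfs:/FileStore/Jayant/tableDir/part-00000-....parquet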

 

Solution

When reading partitioned data whose directory names do not include partition column names, set the basePath option to the common root directory so Spark can resolve the paths consistently and infer partitioning.

 

Example code

This example code sets basePath so Spark can correctly resolve the partition directories and return output.

%python

spark.read.option("basePath", "dbfs:/FileStore/Jayant/tableDir").parquet("dbfs:/FileStore/Jayant/tableDir/2025/04").display()
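
As the error message notes, if the locations are genuinely separate root directories rather than one table, another option is to load them separately and then union the results. A minimal sketch, assuming two hypothetical roots (tableDirA and tableDirB) that hold Parquet data with the same schema:

%python

# Minimal sketch of the "load separately and union" alternative from the error message.
# tableDirA and tableDirB are hypothetical independent root directories with the same schema.
df_a = spark.read.parquet("dbfs:/FileStore/Jayant/tableDirA")
df_b = spark.read.parquet("dbfs:/FileStore/Jayant/tableDirB")

# unionByName matches columns by name rather than by position.
df_a.unionByName(df_b).display()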

 

 

Preventive measures

  1. Avoid mixing non-partitioned files and partitioned subdirectories under the same path when relying on partition inference. Ensure that your table and partition directories adhere to a consistent format without extraneous directories.
  2. If you want to create a partitioned table, always use Spark partitioning with partitionBy() and keep the write path set to the root table directory, as shown in the sketch below.
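
The following sketch illustrates preventive measure 2: writing with partitionBy() to the table's root directory produces Hive-style key=value subdirectories that Spark's partition discovery can parse. The tableDirClean path is illustrative.

%python

# Minimal sketch of the recommended pattern (tableDirClean is an illustrative path).
# Writing with partitionBy to the table root creates Hive-style key=value subdirectories,
# for example dbfs:/FileStore/Jayant/tableDirClean/year=2025/month=4/part-....parquet
df.write.mode("overwrite").partitionBy("year", "month").parquet("dbfs:/FileStore/Jayant/tableDirClean")

# Reading the root now succeeds, and year/month are inferred as partition columns.
spark.read.parquet("dbfs:/FileStore/Jayant/tableDirClean").display()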