Error [JVM_ATTRIBUTE_NOT_SUPPORTED] when trying to obtain the number of partitions in a DataFrame

Switch to a single-user cluster, or use the spark_partition_id() function on a shared cluster.

Written by manikandan.ganesan

Last published at: February 19th, 2025

Problem

When you try to use the df.rdd.getNumPartitions() method to obtain the number of partitions in a DataFrame, the command fails with the following error.


[JVM_ATTRIBUTE_NOT_SUPPORTED] Attribute `rdd` is not supported in Spark Connect as it depends on the JVM. If you need to use this attribute, do not use Spark Connect when creating your session. Visit https://spark.apache.org/docs/latest/sql-getting-started.html#starting-point-sparksession for creating regular Spark Session in detail.
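
For example, accessing the rdd attribute on any DataFrame reproduces the error on a shared cluster. The following sketch uses a hypothetical DataFrame created with spark.range(); any DataFrame fails the same way.

%python
# Hypothetical example: accessing .rdd fails on a shared cluster because it
# requires the JVM-backed RDD API, which Spark Connect does not expose.
df = spark.range(100)
df.rdd.getNumPartitions()  # raises [JVM_ATTRIBUTE_NOT_SUPPORTED]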


Cause

In Databricks Runtime 14.0 and above, shared clusters use the Apache Spark Connect architecture. RDD APIs, including the rdd attribute of a DataFrame, are not supported in shared access mode.
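
To confirm that your session is a Spark Connect session, one quick check (a sketch, using the notebook's built-in spark object) is to inspect the session's type.

%python
# On a Spark Connect session (for example, a Databricks Runtime 14.0+ shared
# cluster), this prints pyspark.sql.connect.session.SparkSession.
print(type(spark))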


Solution

Switch to a single-user cluster. 


If switching to a single-user cluster is not feasible, use the spark_partition_id() function instead, which is supported on shared clusters.


The following code snippet retrieves the distinct spark_partition_id() values and counts them to determine the number of partitions in the DataFrame.


%python
from pyspark.sql import functions as F

# Tag each row with the ID of the partition it lives in, then count the
# distinct IDs to get the number of partitions in df.
df.select(F.spark_partition_id()).distinct().count()
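
Note that unlike df.rdd.getNumPartitions(), which only reads partition metadata, this approach runs a Spark job because count() is an action. It also counts only partitions that contain at least one row, so empty partitions are not included in the result.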


For more information, review the Compute access mode limitations for Unity Catalog (AWS | Azure | GCP) documentation.