Problem
Low-level Resilient Distributed Dataset (RDD) API calls in your JVM-based workloads cannot access Unity Catalog volumes. You encounter errors during SSL initialization, dependency loading, or schema registry access.
Cause
Unity Catalog locations, including volumes and external locations, are not compatible with RDDs in Apache Spark. These restrictions maintain the Unity Catalog-enforced security model and process isolation. Unity Catalog is designed to work primarily with higher-level Spark APIs, such as DataFrames and Spark SQL, and with its governance model.
Spark and other JVM processes can only access Unity Catalog volumes or workspace files using the readers and writers that support Unity Catalog. For more information, refer to the Work with files on Databricks (AWS | Azure | GCP) documentation.
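For example, reading a volume path with the RDD API fails, while a Unity Catalog-aware DataFrame reader can read the same path, subject to the permissions granted on the volume. The following sketch illustrates the distinction; the catalog, schema, volume, and file names are placeholders.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Placeholder Unity Catalog volume path; substitute your own catalog,
# schema, volume, and file names.
volume_path = "/Volumes/my_catalog/my_schema/my_volume/data.csv"

# Low-level RDD access is not Unity Catalog-aware, so a call like this is
# expected to fail against a volume path.
# rdd = spark.sparkContext.textFile(volume_path)

# The DataFrame reader supports Unity Catalog volumes and reads the same
# path, subject to the permissions granted on the volume.
df = spark.read.option("header", "true").csv(volume_path)
df.show(5)
```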
Solution
The appropriate solution depends on the workload. Two common approaches to try are:
- Use init scripts to copy the necessary files from the Unity Catalog storage location to local storage, DBFS, or a direct cloud storage path, as sketched after this list. For more information, refer to the What are init scripts? (AWS | Azure | GCP) documentation.
- Configure libraries to use supported storage mechanisms, such as cloud storage paths. For details, refer to the documentation for your library and cloud provider.
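As an illustration of the first approach, the following sketch stages a file from a Unity Catalog volume onto the driver's local disk so that JVM code expecting an ordinary file path can read it. It is written to run from a notebook using dbutils, and the volume and file names are placeholders. In a cluster-scoped init script, the equivalent copy is typically a shell cp command.

```python
# Placeholder Unity Catalog volume path and local destination; substitute
# your own catalog, schema, volume, and file names.
source_path = "/Volumes/my_catalog/my_schema/my_volume/certs/truststore.jks"
local_path = "file:/tmp/truststore.jks"

# Copy the file out of the Unity Catalog volume onto the driver's local disk
# so that JVM code (for example, SSL or schema registry configuration) can
# reference it through a plain local file path such as /tmp/truststore.jks.
dbutils.fs.cp(source_path, local_path)
```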
If neither of these approaches works for you, contact Databricks Support to identify an alternative way forward.
Preventive Measures
To avoid similar issues in the future, set checkpoint locations to explicit cloud storage paths, such as S3 buckets, Azure storage paths, or Google Cloud Storage (GCS) buckets, instead of relying on Unity Catalog volumes.
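For example, the following sketch points RDD checkpointing at an explicit cloud storage path. The bucket name is a placeholder, and the cluster must already be configured with access to that location.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Placeholder bucket; substitute your own S3, Azure, or GCS path that the
# cluster is configured to access.
checkpoint_dir = "s3://my-bucket/checkpoints/my-job"

# Point RDD checkpointing at the explicit cloud storage path instead of a
# Unity Catalog volume.
spark.sparkContext.setCheckpointDir(checkpoint_dir)
```

Structured Streaming workloads follow the same pattern by setting the checkpointLocation option on the stream writer to a cloud storage path.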