Problem
When you try to access the Apache Spark UI in Databricks, you encounter a 502 error even though your Spark job is running without issues.
502 Bad Gateway: The server returned an invalid or incomplete response.
This error can occur in environments such as Data Engineering and Machine Learning, specifically when working with large Delta Lake tables.
When you check the driver logs, you notice the following error.
java.lang.StackOverflowError
Cause
The Spark UI stores its data in memory by default. When a Spark job generates a large amount of UI data, that data can exhaust the driver's memory. The HTTP server that serves the Spark UI runs inside the driver, so it can no longer respond to HTTP requests properly and returns a 502 error.
Solution
Enable the configuration to store Spark UI data on disk instead of in memory. This helps prevent the Spark UI from running out of memory.
To enable this configuration, add the following line to the Spark configuration in your cluster settings. Edit your cluster, expand Advanced options, select the Spark tab, and then enter the following setting in the Spark config field.
spark.ui.store.path /databricks/driver/sparkuirocksdb
If you prefer to enable the configuration using a notebook, you can use the following Python code.
spark.conf.set("spark.ui.store.path", "/databricks/driver/sparkuirocksdb")
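As a sketch of how you might confirm the setting in the same notebook, you can read the value back with spark.conf.get. This assumes the Databricks-provided spark session object; note that since the UI store is initialized when the driver starts, the cluster-level Spark config is the more reliable place to set it, and a notebook-level change may only take effect after a restart.

```python
# Assumes the `spark` session object that Databricks provides in notebooks.
# Set the Spark UI store path (same setting as the cluster-level Spark config).
spark.conf.set("spark.ui.store.path", "/databricks/driver/sparkuirocksdb")

# Read the value back to confirm it is present in the session configuration.
print(spark.conf.get("spark.ui.store.path"))
```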