Apache Spark Submit job clusters do not terminate after sc.stop()

Explicitly invoke System.exit(0) after SparkContext.stop().

Written by Vidhi Khaitan

Last published at: March 28th, 2025

Problem

Your Apache Spark Submit jobs remain active even after invoking sc.stop(), and the underlying job cluster does not shut down as expected.

 

Cause

Non-daemon threads are still running in the Spark Submit JVM. Until they exit, the JVM cannot shut down, so the job cluster stays up.
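To see which threads might be holding the JVM open, one option is to list the live non-daemon threads. The sketch below is a hypothetical helper and not part of Spark: the name non_daemon_thread_names is illustrative, and it assumes each item exposes the java.lang.Thread methods isAlive(), isDaemon(), and getName().

```python
from typing import Iterable, List


def non_daemon_thread_names(threads: Iterable) -> List[str]:
    """Return the sorted names of live, non-daemon threads.

    Each item is assumed to expose the java.lang.Thread methods
    isAlive(), isDaemon(), and getName() (for example, a Py4J proxy).
    """
    return sorted(
        t.getName() for t in threads if t.isAlive() and not t.isDaemon()
    )
```

In a PySpark driver, the input could plausibly come from sc._gateway.jvm.java.lang.Thread.getAllStackTraces().keySet(). This relies on Py4J exposing the Java set as an iterable, which is an assumption to verify in your environment.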

 

Context 

Calling sc.stop() stops the SparkContext, which manages the resources for a Spark application. However, it does not necessarily terminate the job cluster.

The termination of a job cluster depends on various factors including the specific configurations and the termination policies set for the cluster.

In Spark Submit jobs, the job will not exit until the Spark Submit JVM shuts down. This shutdown normally occurs in one of two ways. 

 

  1. Explicitly, when the JVM’s System.exit() method is invoked somewhere in the code.
  2. Implicitly, when all non-daemon threads have exited. The JVM waits for every non-daemon thread to complete before exiting, which makes these threads suitable for critical work that must finish before the application terminates.
    1. Some non-daemon threads can only be stopped by calling SparkContext.stop().
    2. Some non-daemon threads are only cleaned up in a JVM shutdown hook.
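The effect of a lingering non-daemon thread can be illustrated outside Spark. Python's threading module follows the same rule as the JVM: the process exits only once every non-daemon thread has finished, while daemon threads are abandoned at exit. The following is a minimal sketch of that rule by analogy, not JVM or Spark code; CHILD and run_child are hypothetical names used only for this illustration.

```python
import subprocess
import sys
import textwrap

# Script run in a child interpreter: start one worker thread, then let the
# main thread return immediately.
CHILD = textwrap.dedent("""
    import sys, threading, time

    def worker():
        time.sleep(0.5)
        print("worker finished", flush=True)

    is_daemon = sys.argv[1] == "daemon"
    threading.Thread(target=worker, daemon=is_daemon).start()
    print("main finished", flush=True)
""")


def run_child(mode):
    """Run CHILD with mode 'daemon' or 'non-daemon'; return its stdout lines."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD, mode],
        capture_output=True,
        text=True,
        timeout=30,
    )
    return result.stdout.splitlines()


if __name__ == "__main__":
    print("non-daemon:", run_child("non-daemon"))
    print("daemon:", run_child("daemon"))
```

With the non-daemon worker, the child process stays alive until the worker prints its line. With the daemon worker, the process exits as soon as the main thread returns, so the worker's line never appears. A Spark Submit JVM with leftover non-daemon threads behaves like the first case, which is why an explicit System.exit() is needed.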

 

Solution

To ensure the Spark Submit job terminates properly, explicitly invoke System.exit(0) after SparkContext.stop().

 

Python

from pyspark.sql import SparkSession

sc = SparkSession.builder.getOrCreate().sparkContext  # Or otherwise obtain a handle to the SparkContext
runTheRestOfTheUserCode()  # Placeholder for the actual application logic
# On success, fall through to stop the SparkContext and exit with code 0.
# On failure, the uncaught exception skips the lines below, so the job ends
# with a non-zero exit code that is handled by PythonRunner.
sc.stop()
sc._gateway.jvm.System.exit(0)

 

Scala 

def main(args: Array[String]): Unit = {
  try {
    runTheRestOfTheUserCode() // The actual application logic
  } catch {
    case t: Throwable =>
      try {
        // Log the throwable or error here
      } finally {
        System.exit(1)
      }
  }
  System.exit(0)
}