Error "You cannot use dbutils within a Spark job" when registering a model in Databricks

Re-instantiate WorkspaceClient when needed, rather than storing it as an attribute of an object.

Written by Amruth Ashoka

Last published at: October 30th, 2025

Problem

When attempting to register a model using a notebook, you encounter the following error message.

"You cannot use dbutils within a Spark job."

This issue may occur even when dbutils is not explicitly used in the code.

Example code that can cause the error

import mlflow
from databricks.sdk import WorkspaceClient
from mlflow.models.signature import infer_signature

class BadBot(mlflow.pyfunc.PythonModel):
    def __init__(self):
        self.ws_client = WorkspaceClient()

    def predict(self, context, model_input):
        print(model_input)
        return {"result": ["hello"]}

badbot = BadBot()
badbot.predict(None, "test")

mlflow.set_registry_uri("databricks-uc")
model_name = "catalog.schema.model_name"

input_example = "test"
signature = infer_signature([input_example], badbot.predict(None, input_example))

with mlflow.start_run():
    mlflow.pyfunc.log_model(
        artifact_path="bad",
        python_model=badbot,
        registered_model_name=model_name,
        pip_requirements=[f"mlflow=={mlflow.__version__}"],
        signature=signature,
        input_example=input_example
    )


Cause

When mlflow.pyfunc.log_model is called, MLflow uses the cloudpickle library to serialize the PythonModel instance on the driver to produce the model artifact. Because the instance stores a databricks.sdk.WorkspaceClient as an attribute (self.ws_client in the example above), cloudpickle attempts to serialize that client as well.


However, WorkspaceClient isn’t designed to be serialized. It holds live HTTP sessions and authentication context, and may rely on notebook-context utilities such as dbutils. These environment-bound handles cannot be pickled, so capturing the client inside the model object causes serialization to fail with the dbutils error.
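The same failure mode can be reproduced without Databricks at all. In this hedged sketch, a threading.Lock stands in for the client’s live session and auth state; pickling an object that stores one as an attribute fails in the same way cloudpickle fails on a stored WorkspaceClient:

```python
import pickle
import threading

class BadHolder:
    """Stores an unpicklable, environment-bound handle as an attribute."""
    def __init__(self):
        # The lock stands in for WorkspaceClient's live HTTP/auth state.
        self.handle = threading.Lock()

try:
    pickle.dumps(BadHolder())
    serialized = True
except TypeError:
    # e.g. "cannot pickle '_thread.lock' object"
    serialized = False

print(serialized)  # False: the stored handle blocks serialization
```

Any attribute of this kind, whether a lock, a socket, or an SDK client holding open connections, will break serialization of the whole object.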


Solution

Re-instantiate WorkspaceClient when needed, rather than storing it as an attribute of an object. 


1. Identify any instances where WorkspaceClient is stored as an attribute (for example, self.ws_client).

2. Instead of storing WorkspaceClient as an attribute, create it as a local variable within the method or function where it is used. You can adapt and use the following example code. 

import mlflow
from mlflow.models.signature import infer_signature

class TestBot(mlflow.pyfunc.PythonModel):

    def predict(self, context, model_input):
        # Create the client as a local variable so it is never
        # captured when the model object is serialized.
        from databricks.sdk import WorkspaceClient
        w = WorkspaceClient()
        print(model_input)
        return {"result": ["hello"]}

testbot = TestBot()                      # smoke test
testbot.predict(None, "Test Query")

mlflow.set_registry_uri("databricks-uc")
model_name = "catalog.schema.model_name"

# Example input data for signature inference
input_example = "Test Query"
signature = infer_signature([input_example], testbot.predict(None, input_example))

with mlflow.start_run():
    mlflow.pyfunc.log_model(
        artifact_path="test",
        python_model=TestBot(),
        registered_model_name=model_name,
        pip_requirements=[f"mlflow=={mlflow.__version__}"],
        signature=signature,
        input_example=input_example
    )
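If the client genuinely needs to persist on the object between calls, an alternative pattern (not from this article; a hedged sketch using standard pickle hooks) is to exclude the attribute from serialization with __getstate__/__setstate__ and re-create the client lazily after the model is loaded:

```python
import pickle

class LazyBot:
    def __init__(self):
        # A plain object() stands in for WorkspaceClient() here.
        self._client = object()

    def __getstate__(self):
        # Drop the unpicklable client before serialization.
        state = self.__dict__.copy()
        state.pop("_client", None)
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        self._client = None  # re-created on first use after loading

    def client(self):
        if self._client is None:
            self._client = object()  # WorkspaceClient() in real code
        return self._client

restored = pickle.loads(pickle.dumps(LazyBot()))
print(restored._client)  # None until first use
```

Because __getstate__ removes the client from the pickled state, cloudpickle never sees the environment-bound handle, and the restored object rebuilds it on demand.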


For more information, refer to the Databricks SDK for Python Workspace Client documentation.