Problem
Long-running jobs, such as streaming jobs, fail after 48 hours when using dbutils.secrets.get() (AWS | Azure | GCP).
For example:
%python
# Read a stream from a source Delta table.
streamingInputDF1 = (
    spark
        .readStream
        .format("delta")
        .table("default.delta_source")
)

# foreachBatch handler. The dbutils.secrets.get() call inside the handler
# runs on every micro-batch, so it fails once the job passes 48 hours.
def writeIntoDelta(batchDF, batchId):
    table_name = dbutils.secrets.get("secret1", "table_name")
    batchDF = batchDF.drop_duplicates()
    batchDF.write.format("delta").mode("append").saveAsTable(table_name)

streamingInputDF1 \
    .writeStream \
    .format("delta") \
    .option("checkpointLocation", "dbfs:/tmp/delta_to_delta") \
    .foreachBatch(writeIntoDelta) \
    .outputMode("append") \
    .start()

This example code returns an error after 48 hours:
HTTP ERROR 403
Problem accessing /api/2.0/secrets/get. Reason:
Invalid access token.
Cause
Databricks Utilities (dbutils) (AWS | Azure | GCP) tokens expire after 48 hours.
This is by design.
Solution
You cannot extend the life of a token.
Jobs that take more than 48 hours to complete should not call dbutils.secrets.get() while they are running.
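One common workaround, sketched below, is to fetch the secret once on the driver before the stream starts and capture the value in a variable: the 48-hour limit applies to the dbutils token used to call the Secrets API, not to the returned string. This sketch reuses the scope (secret1), key (table_name), and stream (streamingInputDF1) from the example above, and assumes the secret value does not change while the job runs.
%python
# Sketch, not official guidance: read the secret once at job start, while
# the dbutils token is still valid, and reuse the plain string afterwards.
table_name = dbutils.secrets.get("secret1", "table_name")

def writeIntoDelta(batchDF, batchId):
    # No secrets call inside the handler, so nothing expires mid-run.
    batchDF.drop_duplicates() \
        .write.format("delta").mode("append").saveAsTable(table_name)

streamingInputDF1 \
    .writeStream \
    .option("checkpointLocation", "dbfs:/tmp/delta_to_delta") \
    .foreachBatch(writeIntoDelta) \
    .outputMode("append") \
    .start()

Because foreachBatch runs on the driver, the handler simply reuses the captured string. If the secret rotates mid-job, restart the job to pick up the new value.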