Problem
You are trying to write a dataFrame as an Excel file to a Unity Catalog volume when it fails with an Operation not permitted
error message.
Cause
This is a known limitation when writing to Unity Catalog volumes.
As stated in the Databricks Volumes limitations (AWS | Azure | GCP) documentation, “Direct-append or non-sequential (random) writes, such as writing Zip and Excel files are not supported.”
Solution
You should perform the write operation on a local disk and copy the results to a Unity Catalog volume. You can use /local_disk0/tmp
for all cluster types. Other paths may fail.
Example code
These examples show you two possible ways to accomplish this task. The first uses openpyxl
, while the second does not require any additional package installation.
Use openpyxl
with /local_disk0/tmp
We create a sample dataframe to be written to a Unity Catalog volume path as an Excel file. The file is written to a temporary local path before it is copied to the volume path. The openpyxl
package is used to create the Excel file in the local path. Then we use the shutil
copyfile
function to copy the file from the local path to the volume.
%python
%pip install openpyxl
dbutils.library.restartPython()
import pandas as pd
from openpyxl import Workbook
from shutil import copyfile
def write_excel(data, filename):
"""
Write data to an Excel (.xlsx) file.
Parameters:
data (list of lists): The data to write, with the first inner list as the header row.
filename (str): The filename for the Excel file in local_disk.
"""
wb = Workbook() # Create a new workbook
ws = wb.active # Select the active worksheet
# Iterate over each row in the data and append it to the worksheet
for row in data:
ws.append(row)
wb.save(filename) # Save the workbook to the specified file
# Example usage:
data = [
["Name", "Age", "City"], # Header row
["Alice", 30, "New York"],
["Bob", 25, "San Francisco"],
["Charlie", 35, "Los Angeles"]
]
local_file = "/local_disk0/tmp/excel.xlsx"
filename = <full-file-path-in-volume-including-filename>
print(filename)
write_excel(data, local_file)
copyfile(local_file, filename)
No package installation
We create a sample dataframe to be written to a Unity Catalog volumes path as an Excel file. The file is written to a temporary local path before it is copied to the volume path. We use the pandas
function .to_excel()
to create the Excel file in the local path. Then we use the shutil
copyfile
function to copy the file from the local path to the volume.
%python
import pandas as pd
from shutil import copyfile
def write_excel(data, filename):
"""
Write data to an Excel (.xlsx) file.
Parameters:
data (list of lists): The data to write, with the first inner list as the header row.
filename (str): The filename for the Excel file.
"""
df = pd.DataFrame(data[1:], columns=data[0])
df.to_excel('/local_disk0/tmp/excel.xlsx', index=False, sheet_name="Sheet1")
copyfile('/local_disk0/tmp/excel.xlsx', filename)
# Example usage:
data = [
["Name", "Age", "City"], # Header row
["Alice", 30, "New York"],
["Bob", 25, "San Francisco"],
["Charlie", 35, "Los Angeles"]
]
filename = <full-file-path-in-volume-including-filename>
write_excel(data, filename)