You have a scenario that requires custom Apache Hadoop properties to be set.
You would normally do this in the core-site.xml file.
In this article, we explain how to set custom core-site.xml properties on a cluster.
Create the core-site.xml file in DBFS
You need to create a core-site.xml file and save it to DBFS so your cluster can access it.
An easy way to create this file is via a bash script in a notebook.
This example code creates a hadoop-configs folder on DBFS and then writes a core-site.xml file containing a single property to that folder.
%sh
mkdir -p /dbfs/hadoop-configs/
cat << 'EOF' > /dbfs/hadoop-configs/core-site.xml
<property>
  <name><property-name-here></name>
  <value><property-value-here></value>
</property>
EOF
You can add multiple properties to the file by adding additional name/value pairs to the script.
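For example, a fragment with two properties looks like the following. The property names and values here are placeholders; substitute your own.

```xml
<property>
  <name><first-property-name-here></name>
  <value><first-property-value-here></value>
</property>
<property>
  <name><second-property-name-here></name>
  <value><second-property-value-here></value>
</property>
```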
You can also create this file locally, and then upload it to your cluster.
Create an init script that loads core-site.xml
This example code creates an init script called set-core-site-configs.sh that uses the core-site.xml file you just created.
If you manually uploaded a core-site.xml file and stored it elsewhere, update the config_xml path in the example code to match.
%python
dbutils.fs.put("/databricks/scripts/set-core-site-configs.sh", """
#!/bin/bash
echo "Setting core-site.xml configs at `date`"
START_DRIVER_SCRIPT=/databricks/spark/scripts/start_driver.sh
START_WORKER_SCRIPT=/databricks/spark/scripts/start_spark_slave.sh
TMP_DRIVER_SCRIPT=/tmp/start_driver_temp.sh
TMP_WORKER_SCRIPT=/tmp/start_spark_slave_temp.sh
TMP_SCRIPT=/tmp/set_core-site_configs.sh
config_xml="/dbfs/hadoop-configs/core-site.xml"
cat >"$TMP_SCRIPT" <<EOL
#!/bin/bash
## Setting core-site.xml configs
sed -i '/<\/configuration>/{
r $config_xml
a \</configuration>
d
}' /databricks/spark/dbconf/hadoop/core-site.xml
EOL
cat "$TMP_SCRIPT" > "$TMP_DRIVER_SCRIPT"
cat "$TMP_SCRIPT" > "$TMP_WORKER_SCRIPT"
cat "$START_DRIVER_SCRIPT" >> "$TMP_DRIVER_SCRIPT"
mv "$TMP_DRIVER_SCRIPT" "$START_DRIVER_SCRIPT"
cat "$START_WORKER_SCRIPT" >> "$TMP_WORKER_SCRIPT"
mv "$TMP_WORKER_SCRIPT" "$START_WORKER_SCRIPT"
echo "Completed core-site.xml config changes `date`"
""", True)Attach the init script to your cluster
You need to configure the newly created init script as a cluster-scoped init script.
If you used the example code, your Destination is DBFS and the Init Script Path is dbfs:/databricks/scripts/set-core-site-configs.sh.
If you customized the example code, ensure that you enter the correct path and name of the init script when you attach it to the cluster.
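After the cluster restarts, the init script's sed command splices your fragment in just before the closing </configuration> tag of the cluster's core-site.xml. As a rough illustration of that transformation, the following self-contained sketch performs the equivalent string edit; the file contents here are hypothetical, not the real cluster files.

```python
# Illustrative stand-in for /databricks/spark/dbconf/hadoop/core-site.xml.
base = """<configuration>
  <property>
    <name>existing.property</name>
    <value>existing-value</value>
  </property>
</configuration>"""

# Illustrative stand-in for your /dbfs/hadoop-configs/core-site.xml fragment.
fragment = """  <property>
    <name>example.custom.property</name>
    <value>example-value</value>
  </property>"""

# Insert the fragment before the closing tag, then re-append the tag,
# mirroring the sed r (read file) / a (append) / d (delete) sequence.
merged = base.replace("</configuration>", fragment + "\n</configuration>")

print(merged)
```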