How to create an init script to collect tcp_dumps on a standard (formerly shared) access mode cluster

Modify previously published instructions to use a volume path, and add a step to place the init script on an allowlist.

Written by saikumar.divvela

Last published at: May 22nd, 2025

How to collect tcp_dumps on a standard (formerly shared) cluster

Important

If you’re working on a dedicated (formerly single-user) cluster, follow the instructions in the Use tcpdump to create pcap files KB article instead.


When working on a standard (formerly shared) access mode cluster, direct access to the Databricks File System (DBFS) fails, so you can't copy the pcap files to a DBFS path from the init script. This article provides adjusted instructions to accommodate this difference.


Create a volume and then the tcp_dump init script

  1. Create a volume and provide the path to store the init script. (A minimal volume-creation sketch follows this list.)
  2. Run the dbutils.fs.put code below in a notebook to write the tcp_dumps init script to the volume. The script uses curl to make a DBFS PUT API call to the workspace to upload the pcap files, so you need to substitute your workspace host and token into the curl command. The token (the <your-dapi-token> placeholder in the script) must be an existing, valid personal access token (PAT) with permission to call the DBFS PUT API.
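
For step 1, the following is a minimal sketch of creating a Unity Catalog volume from a notebook. The catalog, schema, and volume names (main, default, init_scripts) are illustrative placeholders; use your own and keep the resulting /Volumes path consistent in the steps below.

# Illustrative sketch: create a managed volume to hold the init script.
# The catalog/schema/volume names are placeholders; adjust to your environment.
spark.sql("CREATE VOLUME IF NOT EXISTS main.default.init_scripts")

# The init script path used in the rest of this article would then be:
# /Volumes/main/default/init_scripts/tcp_dumps.sh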


If you want to filter the tcp_dumps by host and port, uncomment the TCPDUMP_FILTER line in the script and set the required host and port. You can also filter on the host or the port alone, depending on your requirements. (The script initializes TCPDUMP_FILTER to an empty string so that set -u does not fail when no filter is configured.)

dbutils.fs.put("/Volumes/<path-to-init-script>/tcp_dumps.sh", """
#!/bin/bash

set -euxo pipefail

MYIP=$(echo $HOSTNAME)
TMP_DIR="/local_disk0/tmp/tcpdump"

[[ ! -d ${TMP_DIR} ]] && mkdir -p ${TMP_DIR}
TCPDUMP_WRITER="-w ${TMP_DIR}/trace_%Y%m%d_%H%M%S_${DB_CLUSTER_ID}_${MYIP}.pcap -W 1000 -G 900 -Z root -U -s256"
TCPDUMP_PARAMS="-nvv -K"
TCPDUMP_FILTER="" ## empty by default so set -u does not abort the script when no filter is configured
#TCPDUMP_FILTER="host xxxxxxxxx.dfs.core.windows.net and port 443" ## add a host/port filter here based on your requirements

## Unquoted expansion is intentional so each tcpdump option is passed as a separate argument.
sudo tcpdump ${TCPDUMP_WRITER} ${TCPDUMP_PARAMS} ${TCPDUMP_FILTER} &
echo "Started tcpdump ${TCPDUMP_WRITER} ${TCPDUMP_PARAMS} ${TCPDUMP_FILTER}"

cat > /tmp/copy_stats.sh << 'EOF'
#!/bin/bash

TMP_DIR=$1
DB_CLUSTER_ID=$2
COPY_INTERVAL_IN_SEC=45
MYIP=$(echo $HOSTNAME)
echo "Starting copy script at `date`"

## DEST_DIR is only needed by the commented-out cp alternative below, which copies into a volume instead of DBFS.
DEST_DIR="/Volumes/main/default/jar/"
#mkdir -p ${DEST_DIR}

log_file="/tmp/copy_stats.log"
touch $log_file

declare -A file_sizes

## Copy a file only when its size has changed since the last check; the associative array persists the last-seen size across rotations.

while true; do
 sleep ${COPY_INTERVAL_IN_SEC}
 #ls -ltr ${DEST_DIR} >  $log_file
 for file in $(find "$TMP_DIR" -type f -mmin -3 ); do
   current_size=$(stat -c "%s" "$file")
   file_name=$(basename "$file")
   last_size=${file_sizes["$file_name"]}
   if [ "$current_size" != "$last_size" ]; then
       echo "Copying $file with current size: $current_size and last size: $last_size at `date`" | tee -a $log_file
       DBFS_PATH="dbfs:/FileStore/tcpdumpfolder/${DB_CLUSTER_ID}/trace_$(date +"%Y-%m-%d--%H-%M-%S")_${DB_CLUSTER_ID}_${MYIP}.pcap"
       curl -vvv -F contents=@$file -F path="$DBFS_PATH" -H "Authorization: Bearer <your-dapi-token>" https://<your-databricks-workspace-url>/api/2.0/dbfs/put  2>&1 | tee -a $log_file
       #cp --verbose "$file" "$DEST_DIR" | tee -a $log_file
       echo "done Copying $file with current size: $current_size at `date`" | tee -a $log_file

       file_sizes[$file_name]=$current_size
   else
       echo "Skip Copying $file with current size: $current_size and last size: $last_size at `date`" | tee -a $log_file
   fi
 done
done

EOF

chmod a+x /tmp/copy_stats.sh
/tmp/copy_stats.sh $TMP_DIR $DB_CLUSTER_ID & disown              
""", True)


Note the volume path to the init script. You will need it when configuring your standard access mode cluster.
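
To confirm the script was written to the volume before moving on, you can read it back from a notebook. This is an optional check; <path-to-init-script> is the same placeholder used above.

# Optional check: print the init script that was just written to the volume.
print(dbutils.fs.head("/Volumes/<path-to-init-script>/tcp_dumps.sh"))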


Add the init script to the allowlist

Follow the instructions to add the init script to the allowlist in the Allowlist libraries and init scripts on compute with standard access mode (formerly shared access mode) (AWS | Azure | GCP) documentation.
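
If you prefer to script this step, the sketch below calls the Unity Catalog artifact allowlists REST API with Python's requests library. The endpoint and payload shape are assumptions based on that API; verify them against the linked documentation, and note that setting the allowlist typically replaces the existing entries, so fetch and merge the current list first if it is not empty.

# Hypothetical sketch: allowlist the volume folder that holds the init script.
# The endpoint and payload shape are assumptions; confirm them in the Databricks docs.
import requests

workspace_url = "https://<your-databricks-workspace-url>"  # placeholder, as in the init script
token = "<your-dapi-token>"                                # placeholder PAT, as in the init script

payload = {
    "artifact_matchers": [
        {"artifact": "/Volumes/<path-to-init-script>/", "match_type": "PREFIX_MATCH"}
    ]
}

response = requests.put(
    f"{workspace_url}/api/2.1/unity-catalog/artifact-allowlists/INIT_SCRIPT",
    headers={"Authorization": f"Bearer {token}"},
    json=payload,
)
response.raise_for_status()
print(response.json())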


Configure the init script

  1. Follow the instructions to configure a cluster-scoped init script in the Cluster-scoped init scripts (AWS | Azure | GCP) documentation.
  2. Specify the volume path to the init script, using the same path that you used in the preceding script (/Volumes/<path-to-init-script>/tcp_dumps.sh). A sample cluster JSON fragment follows this list.
  3. After configuring the init script, restart the cluster.
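
For reference, when a volume path is specified, the init script appears in the cluster's JSON definition roughly as shown below. This fragment is illustrative (only the init_scripts section is shown); confirm the exact structure in the linked documentation.

"init_scripts": [
  {
    "volumes": {
      "destination": "/Volumes/<path-to-init-script>/tcp_dumps.sh"
    }
  }
]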


Locate the pcap files

Once the cluster starts, the init script begins capturing traffic and periodically uploads pcap files containing the recorded network information. Locate the pcap files in the folder dbfs:/FileStore/tcpdumpfolder/${DB_CLUSTER_ID}, where ${DB_CLUSTER_ID} is the ID of your cluster.
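
You can confirm the uploads from your local machine with the Databricks CLI (assuming it is installed and configured); <cluster-id> below is a placeholder for your cluster ID.

databricks fs ls dbfs:/FileStore/tcpdumpfolder/<cluster-id>/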


Download the pcap files

Download the pcap files from the DBFS path to your local machine for analysis. There are multiple ways to do this; one option is the Databricks CLI. For more information, review the What is the Databricks CLI? (AWS | Azure | GCP) documentation.
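
For example, assuming the Databricks CLI is installed and configured, a recursive copy along these lines downloads the capture files; <cluster-id> and the local destination folder are placeholders.

databricks fs cp --recursive dbfs:/FileStore/tcpdumpfolder/<cluster-id>/ ./pcap_files/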