Using glob patterns for directory filtering impacting Auto Loader performance

Use a more specific root path to reduce the scope of the initial scan.

Written by avi.yehuda

Last published at: January 29th, 2025

Problem

Glob patterns are a powerful way to filter directories during file discovery in Auto Loader. In some cases, however, they significantly degrade Auto Loader performance, especially when the source data is partitioned into many directories.

 

Cause

While glob patterns help define which directories to include, they don’t limit Auto Loader's initial file discovery scan. Auto Loader still evaluates all subdirectories under the specified root. The glob pattern acts as a filter after the scan, determining which files or directories are processed further. 

 

Example

The pattern /mnt/my_table/{year=2025/month=1/day=2,year=2025/month=1/day=3} causes Auto Loader to scan all partitions and subpartitions under /mnt/my_table/, even though only two days of data are needed.
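
To see which directory a given pattern leaves as the root, the following sketch extracts the longest directory prefix that contains no glob characters. It is only an illustrative model of the behavior described above, not Auto Loader's actual implementation, and the helper name static_root is hypothetical.

```python
import re

# Characters that start a glob construct in a path: *, ?, [ and {.
_GLOB_CHARS = re.compile(r"[*?\[{]")


def static_root(path: str) -> str:
    """Return the longest directory prefix of `path` with no glob characters.

    Hypothetical helper for illustration only; it models the "root" that the
    article says is scanned before the glob is applied as a filter.
    """
    match = _GLOB_CHARS.search(path)
    if match is None:
        return path  # no glob at all: the whole path is the root
    # Cut back to the last '/' that precedes the first glob character.
    return path[: path.rfind("/", 0, match.start()) + 1]


broad = "/mnt/my_table/{year=2025/month=1/day=2,year=2025/month=1/day=3}"
narrow = "/mnt/my_table/year=2025/month=1/{day=2,day=3}"

print(static_root(broad))   # /mnt/my_table/
print(static_root(narrow))  # /mnt/my_table/year=2025/month=1/
```

Under this model, the first pattern leaves the whole table directory as the root, while the second limits the root to a single month.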

 

Solution

Use a more specific root path to reduce the scope of the initial scan. 

 

Example 

Instead of /mnt/my_table/{year=2025/month=1/day=2,year=2025/month=1/day=3}, move the shared year and month directories out of the braces so that the glob covers only the day level.

 

/mnt/my_table/year=2025/month=1/{day=2,day=3}
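
When the list of partitions is generated programmatically, the same rewrite can be automated. The sketch below is a minimal example under stated assumptions: hoist_common_prefix and read_partitions are hypothetical helper names, all partition paths are assumed to have the same depth, and the Parquet file format and the schema location argument are placeholders that do not come from this article.

```python
def hoist_common_prefix(table_root: str, partitions: list[str]) -> str:
    """Move directory levels shared by all partitions out of the braces.

    Hypothetical helper; assumes every partition path has the same depth.
    """
    if not partitions:
        raise ValueError("at least one partition path is required")
    split = [p.strip("/").split("/") for p in partitions]

    # Count the leading directory levels shared by every partition path.
    shared = 0
    for level in zip(*split):
        if len(set(level)) != 1:
            break
        shared += 1

    prefix = "/".join([table_root.rstrip("/")] + split[0][:shared])
    # Keep the differing tails in order, dropping duplicates.
    tails = list(dict.fromkeys("/".join(parts[shared:]) for parts in split))
    if tails == [""]:
        return prefix  # a single partition: no glob is needed
    return prefix + "/{" + ",".join(tails) + "}"


def read_partitions(spark, path, schema_location, file_format="parquet"):
    """Start an Auto Loader stream on `path`.

    `file_format` and `schema_location` are assumptions for this sketch.
    """
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", file_format)
        .option("cloudFiles.schemaLocation", schema_location)
        .load(path)
    )


path = hoist_common_prefix(
    "/mnt/my_table",
    ["year=2025/month=1/day=2", "year=2025/month=1/day=3"],
)
print(path)  # /mnt/my_table/year=2025/month=1/{day=2,day=3}
```

On a Databricks cluster, the resulting path would be passed to read_partitions along with a schema location of your choice.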