Problem
Glob patterns are a powerful way to filter directories during file discovery in Auto Loader. In certain cases, however, they significantly degrade Auto Loader performance, especially when the source data is partitioned.
Cause
While glob patterns help define which directories to include, they don’t limit Auto Loader's initial file discovery scan. Auto Loader still evaluates all subdirectories under the specified root. The glob pattern acts as a filter after the scan, determining which files or directories are processed further.
Example
The pattern /mnt/my_table/{year=2025/month=1/day=2,year=2025/month=1/day=3} causes Auto Loader to scan all partitions and sub-partitions under /mnt/my_table, even though only two days of data are needed.
Solution
Use a more specific root path to reduce the scope of the initial scan.
Example
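Instead of /mnt/my_table/{year=2025/month=1/day=2,year=2025/month=1/day=3}, move the directories shared by every alternative out of the braces and into the fixed part of the path, so that only the part that varies stays in the glob:
/mnt/my_table/year=2025/month=1/{day=2,day=3}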
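
The rewrite above can also be done programmatically when the list of partitions is generated at run time. The following is a minimal sketch, not an official Databricks utility: narrow_glob and read_narrowed are hypothetical helper names, and the JSON file format and the schema_location argument are assumptions made for illustration. The helper factors the leading directories shared by all partitions into the fixed prefix and leaves only the differing remainder inside the braces.

```python
import posixpath


def narrow_glob(root, partitions):
    """Build a path whose fixed prefix is as deep as possible.

    Hypothetical helper: `root` is the table root and `partitions` is a
    list of relative partition paths such as "year=2025/month=1/day=2".
    """
    split = [p.strip("/").split("/") for p in partitions]

    # Count the leading directory segments that every partition shares.
    shared = 0
    for segments in zip(*split):
        if len(set(segments)) != 1:
            break
        shared += 1

    # Always leave at least one segment for the varying part.
    shortest = min(len(s) for s in split)
    if shared == shortest and shared > 0:
        shared -= 1

    prefix = posixpath.join(root, *split[0][:shared])
    # Deduplicate the remainders while keeping their original order.
    rests = list(dict.fromkeys("/".join(s[shared:]) for s in split))

    if len(rests) == 1:
        return posixpath.join(prefix, rests[0])
    return prefix + "/{" + ",".join(rests) + "}"


def read_narrowed(spark, path, schema_location):
    """Start an Auto Loader stream on the narrowed path.

    Assumes JSON source files; adjust cloudFiles.format as needed.
    """
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .option("cloudFiles.schemaLocation", schema_location)
        .load(path)
    )


path = narrow_glob(
    "/mnt/my_table",
    ["year=2025/month=1/day=2", "year=2025/month=1/day=3"],
)
print(path)
```

On a Databricks cluster, the resulting path can then be passed to read_narrowed(spark, path, schema_location) together with a schema location of your choosing.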