Updated March 15th, 2023 by mounika.tarigopula

Cannot customize Apache Spark config in Databricks SQL warehouse

Problem: You want to set Apache Spark configuration properties in Databricks SQL warehouses like you do on standard clusters. Cause: Databricks SQL is a managed service. You cannot modify the Spark configuration properties on a SQL warehouse. This is by design. You can only configure a limited set of global Spark properties that apply to all SQL wareh...

0 min reading time
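For context, a minimal sketch of what this article contrasts against: on a standard (all-purpose or job) cluster you can set session-level Spark properties from a notebook, which a SQL warehouse does not allow. This assumes the notebook's built-in spark session; the property name is only an example, not one the article prescribes.

```python
# A standard cluster accepts session-level Spark settings from a notebook;
# a Databricks SQL warehouse does not. The property below is just an example.
spark.conf.set("spark.sql.shuffle.partitions", "64")
print(spark.conf.get("spark.sql.shuffle.partitions"))  # verify the session value
```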
Updated September 9th, 2024 by mounika.tarigopula

Sort failed after writing partitioned data to parquet using PySpark on Databricks Runtime 13.3 LTS

Problem: In Databricks Runtime 13.3 LTS to 15.3, when using sortWithinPartitions to make sure the rows in each partition are ordered based on the columns, the sorted DataFrame looks correct when displayed, but after saving and reading it back, the sorting is lost. Cause: There is a bug in which the planned write local sort comes after the sortW...

0 min reading time
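A minimal PySpark sketch of the pattern this article describes, with hypothetical column names (date, ts) and a hypothetical output path; on the affected runtimes the within-partition order visible before the write may not survive the round trip.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical data: "date" is the partition column, "ts" is the sort column.
df = spark.createDataFrame(
    [("2024-01-01", 3), ("2024-01-01", 1), ("2024-01-02", 2)],
    ["date", "ts"],
)

# Sort the rows inside each partition, then write partitioned Parquet.
(df.repartition("date")
   .sortWithinPartitions("ts")
   .write.mode("overwrite")
   .partitionBy("date")
   .parquet("/tmp/sorted_demo"))

# Read back: on the affected runtimes the per-partition order may be lost.
spark.read.parquet("/tmp/sorted_demo").show()
```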
Updated November 7th, 2022 by mounika.tarigopula

Understanding speculative execution

Speculative execution: Speculative execution can be used to automatically re-attempt a task that is not making progress compared to other tasks in the same stage. This means that if one or more tasks in a stage are running slower than the rest, they are re-launched. The task that completes first is marked as successful; the other attempt is killed. Implementatio...

2 min reading time
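As a rough sketch of how speculation is switched on in open source Apache Spark (on Databricks these properties are normally set in the cluster's Spark config rather than in code), with example values only:

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .config("spark.speculation", "true")            # re-launch straggler tasks
    .config("spark.speculation.quantile", "0.75")   # fraction of tasks that must finish before speculation starts
    .config("spark.speculation.multiplier", "1.5")  # how much slower than the median a task must be to count as a straggler
    .getOrCreate()
)
```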
Updated March 16th, 2023 by mounika.tarigopula

Programmatically determine if a table is a Delta table or not

You may not always know the type of table you need to read. For example, if a given table is a Delta table, you may need to read it differently than if it were a Parquet table. This article explains how you can use Python code in a Databricks notebook to programmatically determine if a table is a Delta table or not. Instructions: Attach your notebook ...

0 min reading time
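The article's exact steps are truncated above; as one hedged sketch, the Delta Lake Python API can check a storage path for a Delta transaction log. The path and the Parquet fallback below are assumptions for illustration, not necessarily the article's method.

```python
from delta.tables import DeltaTable
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

path = "/mnt/data/some_table"  # hypothetical location to check

# DeltaTable.isDeltaTable returns True when the path holds a Delta table
# (i.e. it contains a _delta_log directory), and False otherwise.
if DeltaTable.isDeltaTable(spark, path):
    df = spark.read.format("delta").load(path)
else:
    df = spark.read.parquet(path)  # assume plain Parquet files otherwise
```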