Problem
When you use a legacy Apache Spark user-defined function (UDF) to build a complex prompt and pass it dynamically to the ai_query() function, you receive an error. When you call ai_query() without the UDF, however, it works as expected.
Example prompt
In the following code, the legacy Spark UDF create_prompt_udf() and the ai_query() function are called in the same transformation.
from pyspark.sql import functions as F

result_df = df.withColumn(
    "prompt",
    create_prompt_udf(  # legacy Spark UDF, evaluated row by row
        F.col("question"),
        F.col("topic"),
        F.col("category")
    )
).withColumn(
    "answer",
    F.expr("""
        ai_query(
            'databricks-meta-llama-3-1-70b-instruct',
            prompt
        )
    """)
)
Error message
org.apache.spark.SparkException: [INTERNAL_ERROR] Expected udfs have the same evalType but got different evalTypes: 100,400 SQLSTATE: XX000
Cause
Legacy Spark UDFs and the ai_query() function are processed differently, which creates the error.
Legacy Spark UDFs have evalType 100, a standard Spark UDF type in which data is processed row by row. The ai_query() function has evalType 400, a Python function that triggers a Python runner internally and processes data in batches. Because Spark expects all UDFs in an expression to share the same evalType, mixing the two raises the internal error.
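If you want to confirm the evaluation type of the legacy UDF, recent PySpark versions expose it as an evalType attribute on the wrapped UDF object. A minimal check, assuming the create_prompt_udf defined earlier:

# 100 corresponds to a standard (row-by-row) Python UDF.
print(create_prompt_udf.evalType)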
Solution
Use Unity Catalog (UC) UDFs instead of legacy Spark UDFs. UC UDFs are designed to be compatible with Spark functions such as ai_query(): they support the same batch processing model, so there is no conflict in evaluation types.
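A minimal sketch of this approach follows. The catalog and schema (main.default) and the prompt template are placeholders; adjust them for your workspace. The UC function is created once with SQL and then referenced as an ordinary SQL expression alongside ai_query().

from pyspark.sql import functions as F

# Register a Unity Catalog SQL UDF that builds the prompt (placeholder names).
spark.sql("""
    CREATE OR REPLACE FUNCTION main.default.create_prompt(
        question STRING,
        topic STRING,
        category STRING
    )
    RETURNS STRING
    RETURN concat('Answer the following ', category, ' question about ', topic, ': ', question)
""")

# Both the UC UDF and ai_query() resolve as SQL expressions, so the evalType conflict disappears.
result_df = df.withColumn(
    "prompt",
    F.expr("main.default.create_prompt(question, topic, category)")
).withColumn(
    "answer",
    F.expr("ai_query('databricks-meta-llama-3-1-70b-instruct', prompt)")
)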
For more information, refer to the User-defined functions (UDFs) in Unity Catalog (AWS | Azure | GCP) documentation.