Problem
When you decode a protocol buffer (protobuf) message that contains timestamp data using the Apache Spark SQL built-in function from_protobuf(), the call fails with the following error message.
[PROTOBUF_DEPENDENCY_NOT_FOUND] Could not find dependency: google/protobuf/timestamp.proto
Cause
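The from_protobuf() function cannot find the required protobuf dependency google/protobuf/timestamp.proto in the descriptor file. By default, protoc writes only the .proto files named on the command line into the descriptor file and leaves out the files they import, so the imported timestamp.proto definition is missing.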
Solution
Use the option --include_imports while creating the protobuf descriptor file, and then use this descriptor file in the from_protobuf() function.
Example
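protoc --descriptor_set_out=sample.desc --include_imports sample.proto
# PySpark: the descriptor file path is passed as a string
from pyspark.sql.protobuf.functions import from_protobuf
df.select(from_protobuf("value", "AppEvent", "sample.desc").alias("event"))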
Note
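An explicit import in the .proto file is needed only for the protobuf types that map to TimestampType and DayTimeIntervalType in Spark SQL, that is, Timestamp (google/protobuf/timestamp.proto) and Duration (google/protobuf/duration.proto).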
Timestamp is represented as {seconds: Long, nanos: Int} and maps to the TimestampType in Spark SQL. Duration maps to DayTimeIntervalType in Spark SQL.
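To make the {seconds, nanos} representation concrete, here is a small illustrative sketch in plain Python. It is not how Spark implements the conversion; it only shows how the two fields combine into a single point in time. The function name timestamp_to_datetime is hypothetical, and dropping the sub-microsecond digits is an assumption of this sketch, chosen because TimestampType in Spark SQL stores microsecond precision.

```python
from datetime import datetime, timedelta, timezone

# Protobuf Timestamp counts from the Unix epoch in UTC.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def timestamp_to_datetime(seconds: int, nanos: int) -> datetime:
    """Combine protobuf Timestamp fields into a timezone-aware datetime.

    Assumption of this sketch: nanoseconds are truncated to microseconds,
    since Python datetime (like Spark TimestampType) has microsecond precision.
    """
    return EPOCH + timedelta(seconds=seconds, microseconds=nanos // 1000)
```

For instance, seconds=1 and nanos=500000000 describe the instant 1.5 seconds after the epoch.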
For more information on data type mapping, please refer to the Spark Protobuf Data Source Guide.