Problem
When trying to read a Parquet file that contains a timestamp column stored as INT64 with nanosecond precision using Databricks Runtime 11.3 LTS or above, you encounter an illegal Parquet type exception.
The stack trace shows the following output.
Caused by: org.apache.spark.sql.AnalysisException: Illegal Parquet type: INT64 (TIMESTAMP(NANOS,false))
at org.apache.spark.sql.errors.QueryCompilationErrors$.illegalParquetTypeError(QueryCompilationErrors.scala:1328)
at org.apache.spark.sql.execution.datasources.parquet.ParquetToSparkSchemaConverter.illegalType$1(ParquetSchemaConverter.scala:178)
at org.apache.spark.sql.execution.datasources.parquet.ParquetToSparkSchemaConverter.$anonfun$convertPrimitiveField$2(ParquetSchemaConverter.scala:247)
at scala.Option.getOrElse(Option.scala:189)
at org.apache.spark.sql.execution.datasources.parquet.ParquetToSparkSchemaConverter.convertPrimitiveField(ParquetSchemaConverter.scala:196)
at org.apache.spark.sql.execution.datasources.parquet.ParquetToSparkSchemaConverter.
at org.apache.spark.sql.execution.datasources.parquet.ParquetToSparkSchemaConverter.convert(ParquetSchemaConverter.scala:87)
at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$.$anonfun$readSchemaFromFooter$2(ParquetFileFormat.scala:1040)
at scala.Option.getOrElse(Option.scala:189)
at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$.readSchemaFromFooter(ParquetFileFormat.scala:1040)
Cause
Databricks Runtime 11.3 LTS and above, like open source Apache Spark, does not support the TIMESTAMP_NANOS Parquet type. If a Parquet file contains fields of this type, any attempt to read it fails with an Illegal Parquet type exception. Schema inference also fails, since Spark cannot interpret the unsupported timestamp type.
Solution
Explicitly provide a schema to the Spark reader in which each TIMESTAMP_NANOS column is declared as LongType.
1. Import the necessary Spark SQL types.
from pyspark.sql.types import StructType, StructField, LongType, StringType
2. Define the schema for the Parquet file.
schema = StructType([
    StructField("timestamp_nanos", LongType(), True),
    StructField("value", StringType(), True)
])
3. Read the Parquet file using the specified schema.
parquet_path = "/path/to/your/parquet/file.parquet"
try:
    df = spark.read.schema(schema).parquet(parquet_path)
    df.show()
except Exception as e:
    print(f"Error reading Parquet file: {e}")
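After the read succeeds, the timestamp_nanos column holds raw nanoseconds since the Unix epoch as long integers, which you will usually want to convert to real timestamps downstream (for example, by dividing by 1e9 and casting to a timestamp in Spark, which truncates to microsecond precision). The conversion semantics can be sketched in plain Python; the function name and the sample value below are illustrative, not part of the original article.

```python
from datetime import datetime, timedelta, timezone

def ns_to_datetime(ns: int) -> datetime:
    """Convert an epoch timestamp in nanoseconds to a UTC datetime.

    Python datetimes carry microsecond precision, so the last three
    digits of the nanosecond value are truncated.
    """
    seconds, remainder_ns = divmod(ns, 1_000_000_000)
    return datetime.fromtimestamp(seconds, tz=timezone.utc) + timedelta(
        microseconds=remainder_ns // 1_000
    )

# 2021-01-01 00:00:00.123456789 UTC expressed as nanoseconds since the epoch
example_ns = 1_609_459_200_123_456_789
print(ns_to_datetime(example_ns))  # 2021-01-01 00:00:00.123456+00:00
```

Note that any precision finer than microseconds is lost in this conversion; if the trailing nanosecond digits matter, keep the original LongType column alongside the derived timestamp.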