Recurring error “Unable to get field from serde” when trying to perform operations on table

Verify your metadata using the Glue API, then recreate or update the table with the correct metadata.

Last published at: January 25th, 2025

Problem

When performing operations on a table created using the AWS Glue API, you encounter the following recurring error.

ERROR Table: Unable to get field from serde: org.apache.hadoop.hive.serde2.OpenCSVSerde  
java.lang.NullPointerException  
at java.lang.String.concat(String.java:2027)  
at org.apache.hadoop.hive.metastore.MetaStoreUtils.getSchema(MetaStoreUtils.java:1059)  
at org.apache.hadoop.hive.metastore.MetaStoreUtils.getTableMetadata(MetaStoreUtils.java:839)

Cause

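Your table has missing or corrupted metadata in the Glue catalog, such as a column or partition column whose data type is not correctly specified.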

Solution

Verify your metadata using the Glue API, then recreate or update the table with the correct metadata.

Verify metadata with the Glue API

Use the Glue get-table call to check the table metadata stored in the Glue catalog. Ensure that every required column, including the partition column, is present and has its data type correctly specified.

If the issue pertains to the partition column, verify that its data type is accurately set in the table metadata. If necessary, alter the schema to include the correct data type for the partition column. You can do this with the update_table API call or the update-table CLI command.

Example CLI command to verify metadata

aws glue get-table --catalog-id <value> --database-name <value> --name <value>
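
The same check can be scripted. The following is a minimal sketch, not an official procedure: it assumes the table definition has the shape returned by get-table (a `StorageDescriptor.Columns` list and a `PartitionKeys` list, each entry carrying `Name` and `Type`). The helper names `find_incomplete_columns` and `fetch_table` are hypothetical, and `fetch_table` assumes boto3 is installed and AWS credentials are configured.

```python
from typing import Any, Dict, List


def find_incomplete_columns(table: Dict[str, Any]) -> List[str]:
    """Describe every column or partition key whose Name or Type is missing or blank."""
    problems: List[str] = []
    sections = {
        "StorageDescriptor.Columns": (table.get("StorageDescriptor") or {}).get("Columns") or [],
        "PartitionKeys": table.get("PartitionKeys") or [],
    }
    for section, columns in sections.items():
        for index, column in enumerate(columns):
            for field in ("Name", "Type"):
                value = column.get(field)
                # Treat absent, non-string, and whitespace-only values as missing.
                if not isinstance(value, str) or not value.strip():
                    problems.append(f"{section}[{index}] is missing {field}")
    return problems


def fetch_table(database_name: str, table_name: str) -> Dict[str, Any]:
    """Hypothetical helper: read the table definition from the Glue catalog."""
    import boto3  # imported lazily so the checker above has no dependencies

    glue = boto3.client("glue")
    return glue.get_table(DatabaseName=database_name, Name=table_name)["Table"]
```

For example, `find_incomplete_columns(fetch_table("<your-database-name>", "<your-table-name>"))` returns an empty list when every column and partition key has both a name and a data type.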

Recreate or update the table with correct metadata

Recreate or alter the table using the correct metadata to address incomplete or corrupted metadata. Ensure that all column information, including the data types, is accurately provided during the table creation process.

Example CLI command to alter the table to update metadata

aws glue update-table \
    --database-name <your-database-name> \
    --table-input '{
        "Name": "<your-table-name>",
        "StorageDescriptor": {
            "Columns": [
                {"Name": "column1", "Type": "string"},
                {"Name": "column2", "Type": "int"}
            ],
            "Location": "s3://<my-new-bucket-name>/path/",
            "InputFormat": "org.apache.hadoop.mapred.TextInputFormat",
            "OutputFormat": "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat",
            "Compressed": false
        },
        "TableType": "EXTERNAL_TABLE",
        "Parameters": {
            "classification": "csv",
            "compressionType": "none"
        }
    }'

For the complete JSON request syntax to update the metadata, review the AWS update-table documentation for the CLI or the Python API.
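
If you prefer the Python API, the update can start from the existing definition rather than from hand-written JSON. The sketch below rests on several assumptions: the helper names `build_table_input` and `apply_update` are hypothetical, the `TABLE_INPUT_KEYS` set lists the TableInput fields as understood at the time of writing and may need adjusting for your boto3 version, and boto3 and AWS credentials are available when `apply_update` runs. It copies only the fields that TableInput accepts, because get-table also returns read-only fields (for example `DatabaseName` and `CreateTime`) that boto3 rejects as unknown parameters, and then sets the data type for the named columns or partition keys.

```python
from copy import deepcopy
from typing import Any, Dict

# Fields accepted by TableInput (assumed list; check the update-table
# documentation for your boto3 version).
TABLE_INPUT_KEYS = {
    "Name", "Description", "Owner", "LastAccessTime", "LastAnalyzedTime",
    "Retention", "StorageDescriptor", "PartitionKeys", "ViewOriginalText",
    "ViewExpandedText", "TableType", "Parameters", "TargetTable",
}


def build_table_input(table: Dict[str, Any], column_types: Dict[str, str]) -> Dict[str, Any]:
    """Copy the writable fields of a get-table result and set the given data types."""
    table_input = {k: deepcopy(v) for k, v in table.items() if k in TABLE_INPUT_KEYS}
    columns = (table_input.get("StorageDescriptor") or {}).get("Columns") or []
    partition_keys = table_input.get("PartitionKeys") or []
    for column in list(columns) + list(partition_keys):
        # Overwrite the type only for columns named in column_types.
        if column.get("Name") in column_types:
            column["Type"] = column_types[column["Name"]]
    return table_input


def apply_update(database_name: str, table_input: Dict[str, Any]) -> None:
    """Hypothetical helper: write the corrected definition back to the Glue catalog."""
    import boto3  # imported lazily; requires AWS credentials at call time

    boto3.client("glue").update_table(DatabaseName=database_name, TableInput=table_input)
```

Starting from the existing definition is a deliberate choice here: the request supplies the full table definition, so fields you leave out of TableInput are not carried over from the old one.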
