Problem
When performing operations on a table created using the AWS Glue API, you encounter the following recurring error.
ERROR Table: Unable to get field from serde: org.apache.hadoop.hive.serde2.OpenCSVSerde
java.lang.NullPointerException
at java.lang.String.concat(String.java:2027)
at org.apache.hadoop.hive.metastore.MetaStoreUtils.getSchema(MetaStoreUtils.java:1059)
at org.apache.hadoop.hive.metastore.MetaStoreUtils.getTableMetadata(MetaStoreUtils.java:839)
Cause
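Your table has missing or corrupted metadata. The NullPointerException raised from String.concat inside MetaStoreUtils.getSchema suggests that a value used to build the schema, such as the name or data type of a column or partition column, is null.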
Solution
Verify your metadata using the Glue API, then recreate or update the table with the correct metadata.
Verify metadata with the Glue API
- Use the Glue API's get-table method to check the table metadata in the Glue catalog. Ensure that all required columns and their respective data types, including the partition column, are correctly specified in the metadata.
- If the issue pertains to the partition column, verify that the data type for the partition column is accurately set in the table metadata. If necessary, alter the schema to include the correct data type for the partition column. You can use the update_table API call or a CLI command.
Example CLI command to verify metadata
aws glue get-table --catalog-id <value> --database-name <value> --name <value>
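The check described above can also be scripted. The following is a minimal Python sketch, not an official tool: `find_metadata_problems` and `check_glue_table` are hypothetical helper names, and the sketch assumes the table definition has the shape returned under the `Table` key of a get-table response (`StorageDescriptor.Columns` for regular columns, `PartitionKeys` for partition columns). It only flags columns with a missing name or data type.

```python
from typing import Any, Dict, List


def find_metadata_problems(table: Dict[str, Any]) -> List[str]:
    """Return human-readable problems found in a Glue table definition.

    `table` is assumed to be the dict under the "Table" key of a
    get-table response.
    """
    problems: List[str] = []
    descriptor = table.get("StorageDescriptor") or {}

    # Regular columns live in StorageDescriptor.Columns;
    # partition columns live in the top-level PartitionKeys list.
    groups = {
        "column": descriptor.get("Columns") or [],
        "partition column": table.get("PartitionKeys") or [],
    }
    if not groups["column"]:
        problems.append("StorageDescriptor.Columns is missing or empty")

    for kind, columns in groups.items():
        for index, column in enumerate(columns):
            name = column.get("Name")
            label = name or "#{}".format(index)
            if not name:
                problems.append("{} {} has no Name".format(kind, label))
            if not column.get("Type"):
                problems.append("{} {} has no Type".format(kind, label))
    return problems


def check_glue_table(database_name: str, table_name: str) -> List[str]:
    """Fetch a table with boto3 and run the checks above (needs AWS credentials)."""
    import boto3  # imported lazily so the checker itself runs without boto3

    glue = boto3.client("glue")
    response = glue.get_table(DatabaseName=database_name, Name=table_name)
    return find_metadata_problems(response["Table"])
```

An empty list from `check_glue_table("<your-database-name>", "<your-table-name>")` means these particular checks found nothing. It does not prove that the metadata is otherwise correct.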
Recreate or update the table with correct metadata
Recreate or alter the table using the correct metadata to address incomplete or corrupted metadata. Ensure that all column information, including the data types, is accurately provided during the table creation process.
Example CLI command to alter the table to update metadata
aws glue update-table \
  --database-name <your-database-name> \
  --table-input '{
    "Name": "<your-table-name>",
    "StorageDescriptor": {
      "Columns": [
        {"Name": "column1", "Type": "string"},
        {"Name": "column2", "Type": "int"}
      ],
      "Location": "s3://<my-new-bucket-name>/path/",
      "InputFormat": "org.apache.hadoop.mapred.TextInputFormat",
      "OutputFormat": "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat",
      "Compressed": false
    },
    "TableType": "EXTERNAL_TABLE",
    "Parameters": {
      "classification": "csv",
      "compressionType": "none"
    }
  }'
For the complete JSON request syntax to update the metadata, review the AWS update-table (CLI | Python API) documentation.
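If you prefer the Python API, the same update can be sketched with boto3. This is an illustrative sketch under stated assumptions rather than a definitive implementation: `build_table_input` and `fix_column_types` are hypothetical helpers, and the `TABLE_INPUT_KEYS` set reflects the commonly documented `TableInput` fields, so check it against the update_table documentation before relying on it. The get_table response contains read-only fields (for example `DatabaseName` and `CreateTime`) that are not part of `TableInput`, so the sketch copies only the listed keys before applying corrected data types.

```python
import copy
from typing import Any, Dict

# Assumption: fields accepted in TableInput. Verify against the update_table docs.
TABLE_INPUT_KEYS = {
    "Name", "Description", "Owner", "LastAccessTime", "LastAnalyzedTime",
    "Retention", "StorageDescriptor", "PartitionKeys", "ViewOriginalText",
    "ViewExpandedText", "TableType", "Parameters", "TargetTable",
}


def build_table_input(table: Dict[str, Any],
                      column_types: Dict[str, str]) -> Dict[str, Any]:
    """Copy a get_table result into a TableInput dict with corrected types.

    `column_types` maps a column (or partition column) name to the data
    type it should have. The input `table` is left unmodified.
    """
    table_input = {
        key: copy.deepcopy(value)
        for key, value in table.items()
        if key in TABLE_INPUT_KEYS
    }
    columns = table_input.get("StorageDescriptor", {}).get("Columns", [])
    partition_keys = table_input.get("PartitionKeys", [])
    # Both lists hold the copied column dicts, so edits land in table_input.
    for column in columns + partition_keys:
        if column.get("Name") in column_types:
            column["Type"] = column_types[column["Name"]]
    return table_input


def fix_column_types(database_name: str, table_name: str,
                     column_types: Dict[str, str]) -> None:
    """Read the table, apply corrected types, and write it back (needs AWS credentials)."""
    import boto3  # imported lazily so build_table_input runs without boto3

    glue = boto3.client("glue")
    table = glue.get_table(DatabaseName=database_name, Name=table_name)["Table"]
    glue.update_table(
        DatabaseName=database_name,
        TableInput=build_table_input(table, column_types),
    )
```

Starting from the existing definition keeps settings such as the location and table parameters while changing only the column types you pass in.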