HdfsBridge::recordReaderFillBuffer – Unexpected error encountered filling record reader buffer: IllegalArgumentException: Must be 12 bytes

Parquet is my preferred format for storing files in data lake. Parquet’s columnar storage and compression makes it very efficient for in-memory processing tasks like Spark/Databricks notebooks while saving cost on storage. Parquet also supports almost all encoding schemes out there. Perhaps the coolest thing in Parquet is unlike CSV there is no such thing as column/row separator. So there is no need to escape those characters if they are part of data.

Azure SQL Data Warehouse supports Parquet data format for External (PolyBase) tables. External tables reference the underlying storage blobs and gives an option to query the data lake using SQL. In fact this is recommended in Microsoft’s reference architecture. With some Parquet files this error gets thrown when the External Table is queried

HdfsBridge::recordReaderFillBuffer – Unexpected error encountered filling record reader buffer: IllegalArgumentException: Must be 12 bytes

In the absence of clear exception message it took a while to figure this out. This error usually happens on a Timestamp column specifically when the data is in yyyy/MM/dd hh:mm:ss format. For some reason SQL Data Warehouse expects the Timestamp data to be in yyyy-MM-dd hh:mm:ss format. Changing the date separator from / to – resolved this issue, although it must be mentioned the underlying file is a perfectly valid Parquet file.

3 thoughts on “HdfsBridge::recordReaderFillBuffer – Unexpected error encountered filling record reader buffer: IllegalArgumentException: Must be 12 bytes

      1. I cannot go directly to the source to make any changes. Customer is providing the files that they cannot change. I just kept the column type as varchar to make it work and then I can convert to datetime downstream.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s