Will order be preserved when writing/reading a parquet file with ordered dictionaries? #49508

redoak-thomas · 2026-03-13T14:41:31Z


If I write a table to a Parquet file that has a dictionary column with ordered=True, and the dictionary does not exceed the size limit, will all of the categories always be read back in the same order? If the file is written in multiple row groups, will the dictionary be the same in every row group, or are values that are not present in a row group removed from its dictionary? Thank you.

Answered by HighpassStudio


HighpassStudio · 2026-03-20T22:36:51Z


This kind of question makes me think about how much behavior we rely on from the underlying format vs how data is packaged.

Will categories always be read back in the same order?

When PyArrow reads a file that PyArrow wrote, usually yes: the writer stores the Arrow schema in the file metadata, so the dictionary type is restored more faithfully than the Parquet format alone would allow. But I would not treat this as a universal Parquet guarantee, especially across different readers/writers.

If the file has multiple row groups, will each row group dictionary be the same?

No. In Parquet, dictionary pages are per column chunk / per row group, and they may differ and omit categories absent from that row group.

If you need a rock-solid guarantee of category order across files/readers, try storing the category list explicitly in metadata or a companion schema artifact.

One thing I’ve run into is that once data leaves formats like Parquet and gets bundled into archives (zip/tar/etc.), we lose a lot of these guarantees around selective reads and ordering. You often end up decompressing everything just to access one piece.

Are people just avoiding archives entirely in these workflows, or is there a pattern for preserving efficient access once data is packaged?

1 reply

redoak-thomas Mar 20, 2026
Author

Thank you for your answer, I'll just create a separate schema. We don't use archives.
