This kind of question makes me think about how much of the behavior we rely on comes from the underlying format versus how a particular library packages the data.

Will categories always be read back in the same order?

  • With PyArrow reading a file written by PyArrow, often yes: the stored Arrow schema metadata lets the column come back as a dictionary type, so the round trip is more faithful. But I would not treat this as a universal Parquet guarantee, especially across different readers and writers.

If the file has multiple row groups, will each row group dictionary be the same?

  • No. In Parquet, dictionary pages are written per column chunk, that is, per column within each row group, so they may differ between row groups and may omit categories that do not occur in a given row group.

If you need a rock-solid guarantee of category order across files/reade…

Answer selected by redoak-thomas