Bugfix for tab separated values file parsing #115
Open
loamenya wants to merge 1 commit into reproio:master
loamenya commented Mar 25, 2026
- The original implementation parsed TSV input with a comma-separated values (CSV) parser configured with a tab delimiter. CSV has its own parsing rules, especially for quoting, so parsing failed for TSV files whose values were not valid CSV fields. Now only tabs are interpreted as field delimiters; all other characters are treated as field data.
- Added tests for the edge cases that revealed the bug.
- The parser fails if the passed schema requires more fields than the record being parsed contains.
- The parser warns if the record contains more fields than the schema requires.
- The bug affects the integration with the fluentd S3 output plugin, which uses columnify to generate Parquet output files.
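
The behavior described above can be illustrated with a minimal sketch. This is not the code from this PR: `parseTSVLine` and the `fieldNames` slice (standing in for the schema's column list) are hypothetical names, and dropping the surplus fields after warning is an assumption, since the description only says that the parser warns.

```go
package main

import (
	"fmt"
	"strings"
)

// parseTSVLine is a hypothetical helper, not columnify's actual API.
// It splits one record on tab characters only; quotes, commas and every
// other character are kept verbatim as field data.
// It returns the parsed record, an optional warning, and an error.
func parseTSVLine(line string, fieldNames []string) (map[string]string, string, error) {
	// Strip the trailing line terminator so it does not end up in the last field.
	line = strings.TrimRight(line, "\r\n")
	values := strings.Split(line, "\t")

	// Fewer values than the schema requires: fail.
	if len(values) < len(fieldNames) {
		return nil, "", fmt.Errorf("record has %d fields, schema requires %d",
			len(values), len(fieldNames))
	}

	// More values than the schema requires: warn.
	// Assumption: the surplus values are dropped.
	warning := ""
	if len(values) > len(fieldNames) {
		warning = fmt.Sprintf("record has %d fields, schema requires only %d",
			len(values), len(fieldNames))
	}

	record := make(map[string]string, len(fieldNames))
	for i, name := range fieldNames {
		record[name] = values[i]
	}
	return record, warning, nil
}
```

With this sketch, a line such as `1<TAB>say "hi", ok` parsed against the fields `id` and `msg` yields the second value unchanged, including the quotes and the comma, which is the kind of input a CSV parser with a tab delimiter may reject.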