
Bugfix for tab separated values file parsing #115

Open
loamenya wants to merge 1 commit into reproio:master from loamenya:feature/tsv_record_parser


@loamenya

  • The original implementation used a comma-separated values (CSV) parser with a tab delimiter. CSV applies different parsing rules than tab-separated values (TSV), especially around quoting, so parsing failed for TSV files whose values were not valid CSV fields. Now only tabs are interpreted as field delimiters; all other characters are treated as part of the field value.
  • Added corresponding tests for the edge cases that revealed the bug.
  • The parser fails if the passed schema requires more fields than the record being parsed contains.
  • The parser warns if the record contains more fields than the schema requires.
  • The bug affects integration with the fluentd S3 output plugin, which uses columnify to generate Parquet file outputs.
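The behavior described above can be illustrated with a minimal Go sketch. This is not the code from this PR: `parseTSVLine` and `parseWithCSVReader` are hypothetical names, the schema is reduced to a plain list of field names, and dropping the extra fields after the warning is an assumption. The only library behavior relied on is that `encoding/csv` rejects a bare `"` inside an unquoted field unless `LazyQuotes` is set.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"log"
	"strings"
)

// parseTSVLine is a hypothetical helper (not columnify's actual API).
// Only tab characters delimit fields; quotes and commas are ordinary data.
func parseTSVLine(line string, fieldNames []string) (map[string]string, error) {
	values := strings.Split(line, "\t")

	// Fail when the schema requires more fields than the record provides.
	if len(values) < len(fieldNames) {
		return nil, fmt.Errorf("record has %d fields, schema requires %d",
			len(values), len(fieldNames))
	}

	// Warn when the record carries more fields than the schema requires.
	// Assumption: the extra trailing fields are simply ignored.
	if len(values) > len(fieldNames) {
		log.Printf("warning: record has %d fields, schema requires %d",
			len(values), len(fieldNames))
	}

	record := make(map[string]string, len(fieldNames))
	for i, name := range fieldNames {
		record[name] = values[i]
	}
	return record, nil
}

// parseWithCSVReader mimics the previous approach: a CSV reader whose
// delimiter is switched to a tab. CSV quoting rules still apply.
func parseWithCSVReader(line string) ([]string, error) {
	r := csv.NewReader(strings.NewReader(line))
	r.Comma = '\t'
	return r.Read()
}

func main() {
	// A value that is fine as TSV but not as a CSV field (bare quotes).
	line := "1\tsay \"hi\", world\t3"

	if _, err := parseWithCSVReader(line); err != nil {
		fmt.Println("csv reader with tab delimiter failed:", err)
	}

	rec, err := parseTSVLine(line, []string{"id", "msg", "n"})
	fmt.Println(rec["msg"], err)
}
```

Splitting on tabs alone keeps the parser free of any quoting state, which is why a value such as `say "hi", world` passes through unchanged.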

@loamenya loamenya force-pushed the feature/tsv_record_parser branch from 3fe76ee to 268270a Compare March 25, 2026 23:46
* The original implementation was using comma separated values (csv) file
  parser with tab delimiter. CSV fields have different parsing rules,
  especially with quoting, compared to tab separated values files, causing
  parsing to fail for tsv files when the values were not appropriate for
  comma separated value fields. Only tabs are interpreted as field
  delimiters; all other characters are now assumed to be field values
* added corresponding test for edge cases that revealed the bug
* the parser fails if the passed schema requires more fields than the
  record being parsed
* the parser warns if too many fields are in the record compared to what
  the schema requires.
@loamenya loamenya force-pushed the feature/tsv_record_parser branch from 268270a to 669f00a Compare March 26, 2026 17:10