
Issue #444: Improve CasToComparableText #445

Open
reckart wants to merge 1 commit into main from feature/444-Improve-CasToComparableText

@reckart (Member) commented Mar 19, 2026

What's in the PR

  • Enhance CasToComparableText: add an HTML renderer in addition to the CSV renderer.
  • Add configurable columns <ANCHOR>, <INDEXED> and <COVERED_TEXT>; cap the length of the covered text via setMaxLengthCoveredText.
  • Stable, disambiguated anchors: optional unique anchors, a sofa ID marker, an indexed marker and an optional anchor feature hash.
  • Deterministic ordering: annotation-valued multi-valued features are sorted, with a feature-hash tie-breaker and an indexed-first tie-break.
  • Exclude features and types via regex patterns; compiled patterns are cached.
  • Treat empty strings as null; the value rendered for null is configurable (nullValue).
  • Robust multi-valued support: arrays and list types are rendered recursively, with dedicated handling for primitive arrays.
  • New configuration API (setters/getters) for rendering options (e.g. setOmitXmlDeclaration, setAnchorFeatureHash, setUniqueAnchors).
  • Update CasToComparableTextTest to cover HTML output, exclusions, ordering, anchor hashing and array/list rendering.
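
To make the anchor options above more concrete, here is a minimal Java sketch of how they could combine into one anchor string. The AnchorBuilder class and the anchor format (Type[begin-end], @sofaId, * for indexed, #hash, and an (n) suffix for repeats) are hypothetical and invented for illustration; the PR does not document the real format. The hash uses String.hashCode(), whose algorithm is fixed by its Javadoc contract, so it is stable across runs (but only 32 bits wide).

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical anchor builder; the anchor format is illustrative only.
class AnchorBuilder {
    private final boolean uniqueAnchors;
    private final boolean anchorFeatureHash;
    // Counts how often each base anchor has been handed out (for unique anchors).
    private final Map<String, Integer> seen = new HashMap<>();

    AnchorBuilder(boolean uniqueAnchors, boolean anchorFeatureHash) {
        this.uniqueAnchors = uniqueAnchors;
        this.anchorFeatureHash = anchorFeatureHash;
    }

    String anchor(String type, int begin, int end, String sofaId, boolean indexed,
            List<String> featureValues) {
        StringBuilder sb = new StringBuilder(type)
                .append('[').append(begin).append('-').append(end).append(']');
        if (sofaId != null) {
            sb.append('@').append(sofaId); // sofa ID marker
        }
        if (indexed) {
            sb.append('*'); // indexed marker
        }
        if (anchorFeatureHash) {
            // Join values with a NUL separator, then hash the result (32-bit, may collide).
            int hash = String.join("\0", featureValues).hashCode();
            sb.append('#').append(Integer.toHexString(hash));
        }
        String base = sb.toString();
        if (!uniqueAnchors) {
            return base;
        }
        // First occurrence keeps the plain anchor; later ones get a running suffix.
        int count = seen.merge(base, 1, Integer::sum);
        return count == 1 ? base : base + "(" + count + ")";
    }
}
```

The counter suffix only stays stable if annotations are visited in the same order on every run, which is one reason deterministic ordering matters for comparable output.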
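
The deterministic ordering of annotation-valued multi-valued features can be illustrated with a Java Comparator chain. This is a sketch under assumptions: AnnotationKey and DeterministicOrder are hypothetical, the primary keys (begin ascending, end descending, then type name) mirror the usual UIMA annotation index order, and placing the indexed-first tie-break before the feature-hash tie-breaker is a guess rather than something the PR states. Records require Java 16 or later.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sort key for an annotation stored in a multi-valued feature.
record AnnotationKey(String type, int begin, int end, boolean indexed, String featureHash) {}

class DeterministicOrder {
    static final Comparator<AnnotationKey> ORDER = Comparator
            .comparingInt(AnnotationKey::begin)
            // Longer spans first when begins are equal, as in UIMA's annotation index.
            .thenComparing(Comparator.comparingInt(AnnotationKey::end).reversed())
            .thenComparing(AnnotationKey::type)
            // Indexed before non-indexed: Boolean sorts false < true, hence the negation.
            .thenComparing(k -> !k.indexed())
            // Final tie-breaker: a hash over the annotation's feature values.
            .thenComparing(AnnotationKey::featureHash);

    // Returns a sorted copy; the input list is left untouched.
    static List<AnnotationKey> sorted(List<AnnotationKey> keys) {
        List<AnnotationKey> copy = new ArrayList<>(keys);
        copy.sort(ORDER);
        return copy;
    }
}
```

Because every key component is compared, the result no longer depends on the order in which the values were stored in the array or list.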
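
Regex-based exclusion with a compiled-pattern cache can be sketched in Java as follows. ExclusionFilter and its methods are hypothetical, not the PR's code; the sketch assumes that types and features are identified by their fully qualified UIMA names (e.g. uima.tcas.Annotation:begin) and that a pattern must match the whole name.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.regex.Pattern;

// Hypothetical helper (not the actual CasToComparableText code): checks a fully
// qualified type or feature name against a list of exclusion regexes.
class ExclusionFilter {
    // Shared cache so each distinct regex string is compiled at most once.
    private static final Map<String, Pattern> PATTERN_CACHE = new ConcurrentHashMap<>();

    private final List<String> excludePatterns;

    ExclusionFilter(List<String> excludePatterns) {
        this.excludePatterns = List.copyOf(excludePatterns);
    }

    boolean isExcluded(String qualifiedName) {
        for (String regex : excludePatterns) {
            Pattern pattern = PATTERN_CACHE.computeIfAbsent(regex, Pattern::compile);
            // matches() requires the entire name to match, not just a substring
            if (pattern.matcher(qualifiedName).matches()) {
                return true;
            }
        }
        return false;
    }

    static int cachedPatternCount() {
        return PATTERN_CACHE.size();
    }
}
```

Caching by regex string avoids recompiling the same pattern for every feature structure that is rendered; the ConcurrentHashMap keeps the shared cache safe if several renderers run concurrently.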
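
Treating empty strings as null and capping the covered text are simple value transformations. A hypothetical Java sketch follows; the CellValues name, the trailing "..." truncation marker and treating a negative limit as "no limit" are assumptions, not behaviour confirmed by the PR.

```java
// Hypothetical helper illustrating two rendering options; not the PR's code.
class CellValues {
    // Empty strings and null both render as the configured null value.
    static String renderValue(String value, String nullValue) {
        return (value == null || value.isEmpty()) ? nullValue : value;
    }

    // Caps covered text at maxLength characters; a negative limit means "no limit".
    static String coveredText(String text, int maxLength) {
        if (maxLength < 0 || text.length() <= maxLength) {
            return text;
        }
        return text.substring(0, maxLength) + "...";
    }
}
```

Folding empty strings into the null case means two CASes that differ only in "" versus an unset feature produce the same comparable text.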

How to test manually

  • No specific test procedure

Automatic testing

  • PR adds/updates unit tests

Documentation

  • PR adds/updates documentation

Organizational

  • PR adds/updates dependencies.
    Only dependencies under approved licenses are allowed. LICENSE and NOTICE files in the respective modules where dependencies have been added as well as in the project root have been updated.

@reckart added this to the 3.7.0 milestone on Mar 19, 2026
@reckart self-assigned this on Mar 19, 2026
@reckart added the ⭐️ Enhancement label (Improvement or new feature for users) on Mar 19, 2026