
Percent-encoding: add optional iri feature for IRI-style UTF-8 in encode output #1117

Open
fernandolins wants to merge 4 commits into servo:main from ankitects:percent-encoding-iri

@fernandolins

Context

This change is motivated by the discussion in #871 (readability of URLs when non-ASCII characters are percent-encoded, and making that behavior configurable without breaking the default).

Summary

Adds a Cargo feature iri on the percent-encoding crate. When enabled, AsciiSet::should_percent_encode consults only the set's ASCII membership: non-ASCII UTF-8 bytes are passed through rather than percent-encoded merely for being non-ASCII. When disabled (the default), behavior is unchanged and follows the existing WHATWG-aligned rule (!byte.is_ascii() || self.contains(byte)).
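
For reviewers, the rule described above can be sketched roughly as follows. This is a simplified illustration, not the code in this PR: AsciiSetSketch is a hypothetical stand-in for the crate's AsciiSet, and the runtime iri parameter stands in for the compile-time Cargo feature so that both modes can be compared side by side.

```rust
/// Hypothetical, simplified stand-in for the crate's `AsciiSet`:
/// one bit per ASCII byte (0..=127).
struct AsciiSetSketch {
    mask: u128,
}

impl AsciiSetSketch {
    /// A set containing no bytes.
    fn empty() -> Self {
        Self { mask: 0 }
    }

    /// Returns a copy of the set with `byte` added. Only ASCII bytes are valid members.
    fn add(self, byte: u8) -> Self {
        assert!(byte.is_ascii(), "only ASCII bytes can be members of the set");
        Self { mask: self.mask | (1u128 << u32::from(byte)) }
    }

    /// True if `byte` is an ASCII byte that belongs to the set.
    fn contains(&self, byte: u8) -> bool {
        byte.is_ascii() && (self.mask >> u32::from(byte)) & 1 == 1
    }

    /// `iri` is an assumption of this sketch: a runtime flag standing in
    /// for the compile-time `iri` Cargo feature.
    fn should_percent_encode(&self, byte: u8, iri: bool) -> bool {
        if iri {
            // Feature on: only set membership matters; non-ASCII bytes pass through.
            self.contains(byte)
        } else {
            // Feature off (default): the existing WHATWG-aligned rule.
            !byte.is_ascii() || self.contains(byte)
        }
    }
}
```

With a set containing only the space character, the byte 0xC3 (the first UTF-8 byte of "é") is encoded when iri is false and passed through when it is true, while the space is encoded in both modes.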

Motivation

Some consumers (e.g. local file: / media paths in HTML) want Unicode segments to remain literal in the encoded string rather than always escaping every UTF-8 byte. That differs from the current default, which encodes all non-ASCII octets. Upstream url and WPT expectations rely on the default, so this is opt-in via a feature flag rather than a breaking change.

Behavior

| iri | Non-ASCII UTF-8 bytes | ASCII bytes in the set |
| --- | --- | --- |
| off (default) | Percent-encoded | Encoded |
| on | Passed through | Encoded |
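
To make the two rows concrete, here is a small self-contained sketch that encodes a string under both modes. It does not call the percent-encoding crate: encode_sketch is a hypothetical helper, the set is modelled as a plain byte slice, and the iri parameter is a runtime stand-in for the Cargo feature.

```rust
/// Hypothetical helper, not part of the crate: percent-encodes `input`.
/// `ascii_set` lists the ASCII bytes to encode; `iri` stands in for the
/// compile-time `iri` feature (an assumption of this sketch).
fn encode_sketch(input: &str, ascii_set: &[u8], iri: bool) -> String {
    let mut out: Vec<u8> = Vec::with_capacity(input.len());
    for &byte in input.as_bytes() {
        let in_set = byte.is_ascii() && ascii_set.contains(&byte);
        // Feature on: only set membership decides. Feature off: non-ASCII is always encoded.
        let escape = if iri { in_set } else { !byte.is_ascii() || in_set };
        if escape {
            // Uppercase hex, two digits per byte.
            out.extend_from_slice(format!("%{:02X}", byte).as_bytes());
        } else {
            out.push(byte);
        }
    }
    // Non-ASCII bytes are either all escaped or all copied unchanged, so every
    // multi-byte UTF-8 sequence stays whole and the output is valid UTF-8.
    String::from_utf8(out).expect("output is valid UTF-8 by construction")
}
```

With a set containing only the space character, encode_sketch("é b", &[b' '], false) yields "%C3%A9%20b", and the same call with iri set to true yields "é%20b".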

API / compatibility

  • Default: unchanged for existing dependents and for url + conformance tests.
  • Opt-in: percent-encoding = { version = "…", features = ["iri"] }.

Testing

  • cargo test (workspace default; url’s manifest does not enable iri on percent-encoding).
  • cargo test -p percent-encoding --features iri.
