fix: Strip backslash before # in generated docstrings #677

Open
Alan4506 wants to merge 2 commits into smithy-lang:develop from Alan4506:fix/docstring-escape-sequences

Contributor

@Alan4506 Alan4506 commented Apr 3, 2026

Description of changes:
The MarkdownConverter strips unnecessary backslash escapes that pandoc adds for markdown syntax characters. However, # was missing from the strip list. When pandoc converts service documentation containing a literal # character, it escapes it to \# because # is a markdown heading marker. This \# then ends up in the generated Python docstring as an invalid escape sequence, causing pyright reportInvalidStringEscapeSequence errors.

This issue was identified while generating the Lex Runtime V2 client from its model file.

This PR adds # to the regex character class in postProcessPandocOutput() so that \# is correctly stripped to # in generated docstrings.
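
A minimal, self-contained sketch of the change described above. It uses the same regex as the diff, but the class name `DocstringEscapes` and the method name `stripPandocEscapes` are hypothetical and exist only for this illustration; the real logic lives in postProcessPandocOutput().

```java
final class DocstringEscapes {
    // Same character class as in the diff, with '#' appended at the end.
    // The leading "\\\\" matches one literal backslash; "$1" keeps only the
    // character that followed it.
    static String stripPandocEscapes(String output) {
        return output.replaceAll("\\\\([\\[\\]'{}()<>`@_*|!~$#])", "$1");
    }

    public static void main(String[] args) {
        // Simulated pandoc output containing escaped '#', '[' and ']'.
        String pandocOutput = "Use slot \\#1 and \\[brackets\\]";
        System.out.println(stripPandocEscapes(pandocOutput));
        // prints: Use slot #1 and [brackets]
    }
}
```

Without `#` in the character class, the `\#` in this input would pass through unchanged and end up in the generated docstring.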

Verified by generating the Lex Runtime V2 client with the fix. smithy build passes ruff and pyright cleanly, and there is no \# in generated code.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@Alan4506 Alan4506 requested a review from a team as a code owner April 3, 2026 04:14
// These characters don't need escaping in Python docstrings
// Handles: [ ] ' { } ( ) < > ` @ _ * | ! ~ $
- output = output.replaceAll("\\\\([\\[\\]'{}()<>`@_*|!~$])", "$1");
+ output = output.replaceAll("\\\\([\\[\\]'{}()<>`@_*|!~$#])", "$1");
Contributor

nit: We should be careful with the characters we unescape. Since our docstrings are in markdown, # is a format specifier for headings if it's at the beginning of a line. If we ever have a case where we unescape and it exists at the beginning of a line, it will lead to the entire line being a heading.

It's definitely a small risk, but I think we may be able to escape it as \\# to avoid this issue entirely. However, the same problem applies to any Markdown character that can change line formatting (e.g. > for block quotes). We might consider a double backslash for all the Markdown characters, but that makes the in-code docstrings noisier.

This should be fine for now, but we should follow up to update how we escape markdown characters if it becomes a large issue.
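
One way the follow-up suggested in this comment could look, sketched under stated assumptions: `\#` is unescaped to `#` everywhere except where it is the first non-whitespace text on a line, and there it is emitted as `\\#` so that the Python docstring still contains a valid escape and Markdown does not treat the line as a heading. The class `HeadingSafeUnescape` and the method `unescapeHash` are hypothetical; this is not what the PR implements, and it handles only `#`, not other line-level Markdown markers.

```java
final class HeadingSafeUnescape {
    // Hypothetical helper: treats "\#" differently at line start than elsewhere.
    static String unescapeHash(String pandocOutput) {
        String[] lines = pandocOutput.split("\n", -1); // -1 keeps trailing empty lines
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < lines.length; i++) {
            String line = lines[i];
            // Find the first character that is not a space or tab.
            int start = 0;
            while (start < line.length()
                    && (line.charAt(start) == ' ' || line.charAt(start) == '\t')) {
                start++;
            }
            if (line.startsWith("\\#", start)) {
                // Line begins with "\#": emit "\\#" so the '#' stays escaped
                // for Markdown and "\\" is a valid Python escape sequence.
                out.append(line, 0, start)
                   .append("\\\\#")
                   .append(line.substring(start + 2).replace("\\#", "#"));
            } else {
                // '#' cannot start this line, so plain unescaping is safe.
                out.append(line.replace("\\#", "#"));
            }
            if (i < lines.length - 1) {
                out.append('\n');
            }
        }
        return out.toString();
    }
}
```

The trade-off raised in the comment still applies: the doubled backslash avoids accidental headings but makes the docstring noisier in the source.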

Contributor Author

That makes sense to me. We can look into it more deeply if it becomes a larger issue in the future.

@Alan4506 Alan4506 force-pushed the fix/docstring-escape-sequences branch from 4c2160b to 860716c April 3, 2026 17:15
@Alan4506 Alan4506 requested a review from SamRemis April 3, 2026 17:46