fix: Strip backslash before # in generated docstrings by Alan4506 · Pull Request #677 · smithy-lang/smithy-python

Alan4506 · 2026-04-03T04:14:47Z

Description of changes:
The MarkdownConverter strips unnecessary backslash escapes that pandoc adds for markdown syntax characters. However, # was missing from the strip list. When pandoc converts service documentation containing a literal # character, it escapes it to \# because # is a markdown heading marker. This \# then ends up in the generated Python docstring as an invalid escape sequence, causing pyright reportInvalidStringEscapeSequence errors.

This issue was identified during generation of Lex Runtime V2 client using its model file.

This PR adds # to the regex character class in postProcessPandocOutput() so that \# is correctly stripped to # in generated docstrings.

Verified by generating the Lex Runtime V2 client with the fix. smithy build passes ruff and pyright cleanly, and there is no \# in generated code.
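As a rough illustration of the change, the sketch below applies the updated character class to a sample string. The class name, method name, and sample input are hypothetical and are not taken from MarkdownConverter; only the regex and the "$1" replacement mirror the diff.

```java
import java.util.regex.Pattern;

// Illustrative sketch only: the class and method names are hypothetical.
final class PandocUnescapeSketch {
    // Same character class as the updated line in the diff, now including '#'.
    private static final Pattern UNNEEDED_ESCAPE =
            Pattern.compile("\\\\([\\[\\]'{}()<>`@_*|!~$#])");

    static String stripUnneededEscapes(String pandocOutput) {
        // "$1" keeps the captured character and drops the backslash before it.
        return UNNEEDED_ESCAPE.matcher(pandocOutput).replaceAll("$1");
    }

    public static void main(String[] args) {
        // Sample text as pandoc might emit it (assumed input, not real model docs).
        String input = "Press \\# to end the session, then \\[wait\\].";
        // prints: Press # to end the session, then [wait].
        System.out.println(stripUnneededEscapes(input));
    }
}
```

A backslash that is not followed by one of the listed characters is left alone, so other escapes in the pandoc output pass through unchanged.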

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

codegen/core/src/main/java/software/amazon/smithy/python/codegen/writer/MarkdownConverter.java

arandito · 2026-04-03T14:26:37Z

codegen/core/src/main/java/software/amazon/smithy/python/codegen/writer/MarkdownConverter.java

        // These characters don't need escaping in Python docstrings
        // Handles: [ ] ' { } ( ) < > ` @ _ * | ! ~ $
-        output = output.replaceAll("\\\\([\\[\\]'{}()<>`@_*|!~$])", "$1");
+        output = output.replaceAll("\\\\([\\[\\]'{}()<>`@_*|!~$#])", "$1");


nit: We should be careful about which characters we unescape. Since our docstrings are Markdown, a # at the beginning of a line marks a heading. If we ever unescape a # that sits at the start of a line, the entire line will render as a heading.

It's definitely a small risk, but I think we may be able to escape it as \\# to avoid this issue entirely. However, the same problem applies to any Markdown character that can change line formatting (e.g. > for blockquotes). We might consider a double backslash for all Markdown characters, but that makes the in-code docstrings noisier.

This should be fine for now, but we should follow up to update how we escape markdown characters if it becomes a large issue.

That makes sense to me. We should look into it more deeply if it becomes a larger issue in the future.
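For reference, one way the follow-up discussed above could look is sketched here. It is not part of this PR: the class and method names are hypothetical, and it assumes the docstrings are regular (non-raw) Python strings, so that `\\#` in source yields `\#` in the string value, which Markdown renders as a literal #. A `\#` at the start of a line is rewritten to `\\#`, and any other `\#` is stripped to `#`.

```java
import java.util.regex.Pattern;

// Hypothetical sketch of the reviewer's suggestion; not code from this PR.
final class LineStartHashSketch {
    // "\#" preceded only by optional spaces or tabs at the start of a line.
    // This is deliberately conservative: any leading whitespace counts.
    private static final Pattern LEADING_ESCAPED_HASH =
            Pattern.compile("(?m)^([ \\t]*)\\\\#");

    // Any other "\#" whose backslash is not itself preceded by a backslash.
    private static final Pattern ESCAPED_HASH =
            Pattern.compile("(?<!\\\\)\\\\#");

    static String unescapeHash(String pandocOutput) {
        // Step 1: at line start, emit two backslashes so the Python source
        // stays a valid escape and Markdown still sees an escaped '#'.
        String doubled = LEADING_ESCAPED_HASH.matcher(pandocOutput)
                .replaceAll("$1\\\\\\\\#");
        // Step 2: elsewhere, drop the backslash; the lookbehind skips the
        // doubled form produced in step 1.
        return ESCAPED_HASH.matcher(doubled).replaceAll("#");
    }

    public static void main(String[] args) {
        String input = "Use \\# here\n\\# not a heading";
        System.out.println(unescapeHash(input));
    }
}
```

The trade-off is the one raised in the review: line-start cases stay safe from accidental headings, at the cost of a noisier `\\#` in the generated source.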

fix: Strip backslash before # in generated docstrings

5c0ee7d

Alan4506 requested a review from a team as a code owner April 3, 2026 04:14

SamRemis requested changes Apr 3, 2026

codegen/core/src/main/java/software/amazon/smithy/python/codegen/writer/MarkdownConverter.java

arandito reviewed Apr 3, 2026

Remove unnecessary comments

860716c

Alan4506 force-pushed the fix/docstring-escape-sequences branch from 4c2160b to 860716c April 3, 2026 17:15

Alan4506 requested a review from SamRemis April 3, 2026 17:46

Alan4506 wants to merge 2 commits into smithy-lang:develop from Alan4506:fix/docstring-escape-sequences

3 participants
