fix: Strip backslash before # in generated docstrings #677
Alan4506 wants to merge 2 commits into smithy-lang:develop
codegen/core/src/main/java/software/amazon/smithy/python/codegen/writer/MarkdownConverter.java
  // These characters don't need escaping in Python docstrings
  // Handles: [ ] ' { } ( ) < > ` @ _ * | ! ~ $
- output = output.replaceAll("\\\\([\\[\\]'{}()<>`@_*|!~$])", "$1");
+ output = output.replaceAll("\\\\([\\[\\]'{}()<>`@_*|!~$#])", "$1");
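The effect of adding # to the character class can be checked in isolation. The following is a minimal sketch, not the actual MarkdownConverter code: the class name UnescapeSketch and the helper stripUnneededEscapes are hypothetical, and only the regex and the "$1" replacement are taken from the changed line.

```java
import java.util.regex.Pattern;

public class UnescapeSketch {
    // Same character class as the updated line in the diff, now including '#'.
    private static final Pattern UNNEEDED_ESCAPE =
            Pattern.compile("\\\\([\\[\\]'{}()<>`@_*|!~$#])");

    // Hypothetical helper (name is an assumption): drops the backslash in front
    // of any character listed in the class and keeps the character itself.
    static String stripUnneededEscapes(String pandocOutput) {
        return UNNEEDED_ESCAPE.matcher(pandocOutput).replaceAll("$1");
    }

    public static void main(String[] args) {
        // Java source "\\#" is the two-character text \# that pandoc emits.
        String input = "Press \\# to end input, or pass \\[optional\\] text.";
        System.out.println(stripUnneededEscapes(input));
        // prints: Press # to end input, or pass [optional] text.
    }
}
```

Escapes in front of characters outside the class (for example \.) are left untouched, which is why # had to be listed explicitly.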
nit: We should be careful with the characters we unescape. Since our docstrings are in Markdown, # is the heading marker when it appears at the beginning of a line. If we ever unescape a # that sits at the beginning of a line, the entire line will render as a heading.
It's definitely a small risk, but I think we may be able to escape it as \\# to avoid this issue entirely. However, this is also an issue for any Markdown character that can change the line formatting (e.g. > for block quotes). We might consider a double backslash for all the Markdown characters, but that makes the in-code docstrings noisier.
This should be fine for now, but we should follow up to update how we escape markdown characters if it becomes a large issue.
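One way the follow-up could look, as a rough sketch only: unescape \# everywhere except at the start of a line, where it is rewritten to \\# so that the Python source stays valid and Markdown still sees an escaped #. The class HeadingSafeUnescape and the method unescapeHash are hypothetical names, not part of the PR, and the sketch ignores other line-level markers such as >.

```java
public class HeadingSafeUnescape {
    // Hypothetical helper (an assumption, not the PR's code): handles only '#'.
    static String unescapeHash(String pandocOutput) {
        // Limit -1 keeps trailing empty lines so the text round-trips.
        String[] lines = pandocOutput.split("\n", -1);
        StringBuilder result = new StringBuilder();
        for (int i = 0; i < lines.length; i++) {
            String line = lines[i];
            String body = line.stripLeading();
            String indent = line.substring(0, line.length() - body.length());
            if (body.startsWith("\\#")) {
                // A bare '#' here could start a heading. Emit \\# instead: a
                // valid escape in a Python string that Markdown reads as \#.
                result.append(indent)
                      .append("\\\\#")
                      .append(body.substring(2).replace("\\#", "#"));
            } else {
                // Mid-line '#' cannot start a heading, so drop the backslash.
                result.append(line.replace("\\#", "#"));
            }
            if (i < lines.length - 1) {
                result.append('\n');
            }
        }
        return result.toString();
    }

    public static void main(String[] args) {
        System.out.println(unescapeHash("\\# not a heading\nissue \\#42"));
    }
}
```

The trade-off is the one raised in the comment: line-start escapes stay visible in the generated docstring, in exchange for never producing an accidental heading.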
That makes sense to me. We should look into it more deeply if it becomes a larger issue in the future.
4c2160b to 860716c
Description of changes:
The MarkdownConverter strips unnecessary backslash escapes that pandoc adds for markdown syntax characters. However, # was missing from the strip list. When pandoc converts service documentation containing a literal # character, it escapes it to \# because # is a markdown heading marker. This \# then ends up in the generated Python docstring as an invalid escape sequence, causing pyright reportInvalidStringEscapeSequence errors. This issue was identified during generation of the Lex Runtime V2 client using its model file.
This PR adds # to the regex character class in postProcessPandocOutput() so that \# is correctly stripped to # in generated docstrings.
Verified by generating the Lex Runtime V2 client with the fix. smithy build passes ruff and pyright cleanly, and there is no \# in generated code.
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.