gh-146044: Fix ctrl-w (unix-word-rubout) to use whitespace word boundaries#146174
gh-146044: Fix ctrl-w (unix-word-rubout) to use whitespace word boundaries#146174kimimgo wants to merge 5 commits intopython:mainfrom
Conversation
…on#146044) The unix-word-rubout command (ctrl-w) was using syntax_table-based word boundaries (bow()), which treats punctuation as word separators. This differs from bash/readline's unix-word-rubout which uses only whitespace as word boundaries. Add bow_whitespace() method that uses whitespace-only boundaries, and use it in unix_word_rubout instead of bow(). The existing bow() method (used by backward-kill-word/M-Backspace) is unchanged. Example: with 'foo.bar baz' and cursor at end: - Before (bow): ctrl-w deletes 'baz', then 'bar', then 'foo' - After (bow_whitespace): ctrl-w deletes 'baz', then 'foo.bar'
|
Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool. If this change has little impact on Python users, wait for a maintainer to apply the |
|
Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool. If this change has little impact on Python users, wait for a maintainer to apply the |
Add a check in Parse() to prevent calls when in_callback is true, as this violates expat's requirements and can cause crashes. Now raises RuntimeError with a clear message. Add tests to verify the fix and ensure normal parsing still works.
Add a check in Parse() to prevent calls when in_callback is true, as this violates expat's requirements and can cause crashes. Now raises RuntimeError with a clear message. Add tests to verify the fix and ensure normal parsing still works.
Add a check in Parse() to prevent calls when in_callback is true, as this violates expat's requirements and can cause crashes. Now raises RuntimeError with a clear message. Add tests to verify the fix and ensure normal parsing still works.
Add a check in Parse() to prevent calls when in_callback is true, as this violates expat's requirements and can cause crashes. Now raises RuntimeError with a clear message. Add tests to verify the fix and ensure normal parsing still works.
Add a check in Parse() to prevent calls when in_callback is true, as this violates expat's requirements and can cause crashes. Now raises RuntimeError with a clear message. Add tests to verify the fix and ensure normal parsing still works.
Add a check in Parse() to prevent calls when in_callback is true, as this violates expat's requirements and can cause crashes. Now raises RuntimeError with a clear message. Add tests to verify the fix and ensure normal parsing still works.
Add a check in Parse() to prevent calls when in_callback is true, as this violates expat's requirements and can cause crashes. Now raises RuntimeError with a clear message. Add tests to verify the fix and ensure normal parsing still works.
Add a check in Parse() to prevent calls when in_callback is true, as this violates expat's requirements and can cause crashes. Now raises RuntimeError with a clear message. Add tests to verify the fix and ensure normal parsing still works.
Add a check in Parse() to prevent calls when in_callback is true, as this violates expat's requirements and can cause crashes. Now raises RuntimeError with a clear message. Add tests to verify the fix and ensure normal parsing still works.
|
I personally prefer stopping at separators rather than whitespaces. I think zsh stops at punctuations. So I would first open a DPO thread about this before changing anything. |
|
Thanks for the feedback @picnixz! The distinction I'm making is between two different readline commands:
The current code maps Ctrl-W to This PR keeps That said, I'm happy to open a Discourse thread if you think this needs broader discussion first. |
|
I am a bit torn here... I would say M-backspace is... less convenient to type but at the same time if ctrl+w is meant to emulate the Unix one then I guess it may be better to ensure this. What is the behavior on the old REPL? (in 3.12?) |
ctrl-w (unix-word-rubout) uses whitespace boundaries on bash, zsh, and fish on macOS. |
ctrl-w stops on whitespace boundaries in the 3.12 REPL.
|
|
Thanks @injust for confirming the cross-shell behavior and the 3.12 regression! To summarize: this is a regression introduced in 3.13's pyrepl where This PR restores the 3.12 behavior by adding |
picnixz
left a comment
There was a problem hiding this comment.
Thanks but please make tests less LLM-esque.
Lib/_pyrepl/reader.py
Outdated
| p -= 1 | ||
| return p + 1 | ||
|
|
||
| def bow_whitespace(self, p: int | None = None) -> int: |
There was a problem hiding this comment.
Maybe unix_bow() instead? or bow_ws()? since we use quite short names.
Lib/test/test_pyrepl/test_reader.py
Outdated
|
|
||
| class TestBowWhitespace(TestCase): | ||
| def test_bow_whitespace_stops_at_whitespace(self): | ||
| # GH#146044 |
There was a problem hiding this comment.
# See https://github.com/cpython/issues/146044Use a link looking like that so that I can click on it in my IDE. I created the link from memory sonjust check that it is the correct URL.
Lib/test/test_pyrepl/test_reader.py
Outdated
| reader.pos = len(reader.buffer) | ||
|
|
||
| result = reader.bow() | ||
| self.assertEqual(result, 8) # same — "baz" is all word chars |
There was a problem hiding this comment.
| self.assertEqual(result, 8) # same — "baz" is all word chars | |
| self.assertEqual(result, 8) # same: "baz" is all word chars |
Avoid LLM long dashes and use regular english please
Lib/test/test_pyrepl/test_reader.py
Outdated
| reader.buffer = list("foo.bar") | ||
| reader.pos = len(reader.buffer) | ||
|
|
||
| # bow() stops at '.' → returns index of 'b' in "bar" |
There was a problem hiding this comment.
| # bow() stops at '.' → returns index of 'b' in "bar" | |
| # bow() stops at '.' so we return the index of 'b' in "bar" |
| @@ -0,0 +1,3 @@ | |||
| Fix ``unix-word-rubout`` (Ctrl-W) in the REPL to use whitespace-only word | |||
| boundaries, matching bash/readline behavior. Previously it used | |||
There was a problem hiding this comment.
| boundaries, matching bash/readline behavior. Previously it used | |
| boundaries, matching behavior of the basic REPL. Previously it used |
Lib/test/test_pyrepl/test_reader.py
Outdated
| self.assert_screen_equal(reader, 'flag {o}={z} {s}"🏳️\\u200d🌈"{z}'.format(**colors)) | ||
|
|
||
|
|
||
| class TestBowWhitespace(TestCase): |
There was a problem hiding this comment.
This entire test case can have one single reference to the GH issue as a comment and we can remove them from the methods.
However I would prefer that we extend the existing test case with the bow tests and place them where existing ones are.
|
A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated. Once you have made the requested changes, please leave a comment on this pull request containing the phrase |
Lib/_pyrepl/reader.py
Outdated
| p -= 1 | ||
| while p >= 0 and b[p] in (" ", "\n"): | ||
| p -= 1 | ||
| while p >= 0 and b[p] not in (" ", "\n"): |
There was a problem hiding this comment.
| while p >= 0 and b[p] not in (" ", "\n"): | |
| while p >= 0 and b[p] not in " \n": |
Lib/_pyrepl/reader.py
Outdated
| p = self.pos | ||
| b = self.buffer | ||
| p -= 1 | ||
| while p >= 0 and b[p] in (" ", "\n"): |
There was a problem hiding this comment.
| while p >= 0 and b[p] in (" ", "\n"): | |
| while p >= 0 and b[p] in " \n": |
Lib/_pyrepl/reader.py
Outdated
| p -= 1 | ||
| while p >= 0 and b[p] in (" ", "\n"): | ||
| p -= 1 | ||
| while p >= 0 and b[p] not in (" ", "\n"): |
There was a problem hiding this comment.
OOC newlines are also counted but what about tabs? do we also convert them to 4-indents in the REPL or?
- Rename bow_whitespace() to bow_ws() for shorter naming convention - Move tests from separate TestBowWhitespace class into TestReader - Add tab character to whitespace set (space, newline, tab) - Add test_bow_ws_with_tabs for tab handling - Fix link format to python#146044 - Remove LLM-style long dashes from comments - Update NEWS wording per review
|
Thanks @picnixz! Updated:
|
| p defaults to self.pos; only whitespace is considered a word | ||
| boundary, matching the behavior of unix-word-rubout in bash/readline. | ||
| See https://github.com/python/cpython/issues/146044""" |
There was a problem hiding this comment.
| See https://github.com/python/cpython/issues/146044""" | |
| """ |
| self.assertEqual(reader.bow_ws(), 4) | ||
|
|
There was a problem hiding this comment.
| self.assertEqual(reader.bow_ws(), 4) | |
| self.assertEqual(reader.bow_ws(), 4) | |
| reader.setpos_from_xy(8, 0) | ||
| self.assertEqual(reader.pos, 7) | ||
|
|
||
| def test_bow_ws_stops_at_whitespace(self): |
There was a problem hiding this comment.
Where are the existing tests for bow()? id there are some, please put those tests next to them
| def test_bow_ws_includes_punctuation_in_word(self): | ||
| reader = prepare_reader(prepare_console([])) | ||
| reader.buffer = list("foo.bar(baz) qux") | ||
| reader.pos = 12 |
There was a problem hiding this comment.
Use the index() method to get the index of )
|
|
||
| def test_bow_ws_with_tabs(self): | ||
| reader = prepare_reader(prepare_console([])) | ||
| reader.buffer = list("foo\tbar") |
There was a problem hiding this comment.
Add tests with \n and ensure that you also properly jump lines.
Lib/test/test_pyrepl/test_reader.py
Outdated
|
|
||
|
|
Fixes #146044
Summary
The
unix-word-ruboutcommand (ctrl-w) was usingbow()which treatspunctuation as word separators via the syntax table. This differs from
bash/readline's
unix-word-ruboutwhich uses only whitespace as boundaries.Changes
reader.py: Addbow_whitespace()method — same asbow()but onlyconsiders spaces and newlines as word boundaries
commands.py:unix_word_ruboutnow callsbow_whitespace()instead ofbow()test_reader.py: AddTestBowWhitespaceclass with 4 tests verifyingthe whitespace-only behavior and the difference from
bow()Example
With buffer
foo.bar bazand cursor at end:baz, thenbar, thenfoo(3 operations)baz, thenfoo.bar(2 operations, matching bash)The existing
bow()method used bybackward-kill-word(M-Backspace) is unchanged.