fix: reset _authentication_state to none #30409
Open
AldoFusterTurpin wants to merge 2 commits into redpanda-data:dev from
Pull request overview
Fixes a Kafka client broker-connection edge case where a SASL authentication failure could leave remote_broker stuck in an in_progress authentication state, preventing subsequent retries from re-attempting authentication.
Changes:
- Reset _authentication_state back to auth_state::none when maybe_authenticate() throws, enabling future authentication attempts on retry.
Thanks!
So the next call to maybe_initialize_connection() sees needs_authentication()==true and retries the full reconnect+authenticate sequence instead of returning early
1. Connect successfully first, so the cluster is running and brokers exist.
2. Swap in wrong credentials and force a reconnect via restart(): this causes the existing broker instance to reconnect and hit maybe_authenticate() with bad credentials.
3. After the expected failure, restore the correct credentials.
4. Dispatch again. This is the moment that exercises the fix: the same broker instance calls maybe_initialize_connection() -> maybe_authenticate(). With the fix, _authentication_state is none, so it retries; without the fix, it is in_progress, so it silently skips auth and fails.
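The four steps above can be sketched as a small, synchronous simulation. This is only an illustration of the scenario, not the actual test in this PR: test_broker, run_scenario, the fix_applied flag, the password strings, and the authenticated enum value are all hypothetical stand-ins, and the real remote_broker is asynchronous.

```cpp
#include <stdexcept>
#include <string>

// Simplified stand-in for the state discussed in this PR.
// The `authenticated` value is an assumption made for the sketch.
enum class auth_state { none, in_progress, authenticated };

struct sasl_authentication_failed : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Hypothetical mock of a single broker instance (not the real class).
struct test_broker {
    bool fix_applied = false;         // toggles the reset-on-failure behaviour
    std::string password = "correct"; // credentials currently configured
    bool connected = false;           // models is_valid()
    auth_state state = auth_state::none;

    // Models restart(): drop the connection so the next dispatch reconnects.
    void restart() { connected = false; }

    // Models maybe_initialize_connection() followed by sending a request.
    // Returns true only if the request goes out on an authenticated connection.
    bool dispatch() {
        bool needs_authentication = (state == auth_state::none);
        if (!(connected && !needs_authentication)) {
            connected = true; // reconnect
            authenticate();   // may throw
        }
        return state == auth_state::authenticated;
    }

    void authenticate() {
        state = auth_state::in_progress;
        if (password != "correct") {
            if (fix_applied) {
                state = auth_state::none; // the reset described in this PR
            }
            throw sasl_authentication_failed("bad credentials");
        }
        state = auth_state::authenticated;
    }
};

// Runs steps 1-4 and reports whether the final dispatch succeeded.
bool run_scenario(bool fix_applied) {
    test_broker b;
    b.fix_applied = fix_applied;

    if (!b.dispatch()) return false;  // 1. connect with good credentials

    b.password = "wrong";             // 2. bad credentials + forced reconnect
    b.restart();
    try {
        b.dispatch();
        return false;                 // the failure was expected
    } catch (const sasl_authentication_failed&) {
    }

    b.password = "correct";           // 3. restore credentials
    return b.dispatch();              // 4. dispatch again on the same instance
}
```

In this model, run_scenario(true) ends with an authenticated broker, while run_scenario(false) leaves the state at in_progress, so the last dispatch skips authentication and fails.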
I am not a C++ developer (good memories from college 😆 ), but I was investigating this issue I faced and then I discovered this...
When the schema registry client tries to connect to a broker, it calls connect_with_retries() → connect() → do_connect(), which opens a TCP+TLS connection and stores it in _fd. Then maybe_authenticate() is called and throws sasl_authentication_failed.

State after auth failure:
- _fd = non-null (TLS socket open, TCP alive) → is_valid() = true
- _authentication_state = in_progress (never reset to none)

Every subsequent retry (every ~20s):
- maybe_initialize_connection() checks is_valid()==true AND needs_authentication()==false (state is in_progress, not none), so it returns early without re-authenticating.

The TCP connection eventually drops (keepalive timeout). Then is_valid() = false, connect_with_retries() proceeds, allocates a new connection, auth fails again, and the cycle repeats. The broker never successfully authenticates.

What is missing: the catch block in maybe_authenticate() needs to reset _authentication_state to none so that the next retry actually attempts to reconnect and re-authenticate.

TL;DR: the code change makes the next call to maybe_initialize_connection() see needs_authentication()==true and retry the full reconnect+authenticate sequence instead of returning early.
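As a rough illustration of the reset-in-catch pattern described above, here is a minimal synchronous sketch. It is not the actual diff: authenticate_with_reset, the do_sasl_handshake callback, the free-function needs_authentication, and the authenticated enum value are hypothetical names chosen for the example, and the real code runs asynchronously inside remote_broker.

```cpp
#include <functional>
#include <stdexcept>

// Simplified stand-in; the `authenticated` value is an assumption.
enum class auth_state { none, in_progress, authenticated };

struct sasl_authentication_failed : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Hypothetical helper mirroring the check that gates re-authentication.
bool needs_authentication(auth_state state) {
    return state == auth_state::none;
}

// Hypothetical free-function form of maybe_authenticate():
// marks the attempt as in progress, and on any failure puts the state
// back to none before rethrowing, so a later retry is not skipped.
void authenticate_with_reset(
  auth_state& state, const std::function<void()>& do_sasl_handshake) {
    state = auth_state::in_progress;
    try {
        do_sasl_handshake();
    } catch (...) {
        state = auth_state::none; // without this line the state stays in_progress
        throw;                    // the caller still sees the original error
    }
    state = auth_state::authenticated;
}
```

Rethrowing with a bare `throw;` keeps the original exception type, so callers that already handle sasl_authentication_failed would not need to change in this sketch.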
Backports Required
Release Notes
Bug Fixes
- Fixes a Kafka client edge case where remote_broker could remain stuck in the in_progress authentication state after a SASL authentication failure, preventing subsequent retries from re-attempting authentication.