Fix version error report race: write directly before disconnect#167
Open
lance0 wants to merge 1 commit intobgp:masterfrom
Open
Fix version error report race: write directly before disconnect#167lance0 wants to merge 1 commit intobgp:masterfrom
lance0 wants to merge 1 commit intobgp:masterfrom
Conversation
When a client connects with an unsupported protocol version, SendWrongVersionError() puts the Error Report PDU on the async transmits channel, then Disconnect() immediately cancels the context. The send loop races against the cancellation — if ctx.Done() fires first, the error report never reaches the wire. Two fixes: 1. Use the server's supported version (not the client's unsupported version) in the Error Report, per RFC 8210 §7. In checkVersion(), set c.version to PROTOCOL_VERSION_1 before calling SendWrongVersionError() so the PDU carries the version the client should downgrade to. In the enforceVersion path, c.version already holds the server's base version. 2. Add a brief sleep between SendWrongVersionError() and Disconnect() to give sendLoop time to drain the error report from the transmits channel. This preserves write serialization through the existing channel (safe for both TCP and SSH transports) rather than bypassing it with a direct write. Fixes bgp#165
6847f1c to
9499860
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
When a client connects with an unsupported protocol version (e.g., RTR v2 to a v1 server), SendWrongVersionError() puts the Error Report PDU on the async transmits channel, then Disconnect() immediately cancels the context. The sendLoop selects on both transmits and ctx.Done() — if the cancellation fires first, the error report never reaches the wire.
This causes RTR v2 clients to never receive the Error Report with code 4 (Unsupported Protocol Version) per RFC 8210 section 10. Without it, clients cannot learn they should downgrade and instead retry with v2 indefinitely.
Fix
Error Report version byte: The Error Report now carries PROTOCOL_VERSION_1 (the server's highest supported version) instead of echoing back the client's unsupported version. Per RFC 8210 section 7, the client uses this version field to know what to retry with.
Delivery before disconnect: Uses the existing SendWrongVersionError() async channel (preserving write serialization for both TCP and SSH transports) with a brief sleep before Disconnect() to give sendLoop time to flush. The direct c.wr.Write() approach was unsafe for SSH transports (c.wr can be ssh.Channel, which does not document concurrent write safety) and raced with sendLoop.
The sleep is pragmatic — a cleaner approach would be a synchronous flush on the transmits channel. Happy to iterate.
Affected code paths
Testing
Context
Discovered via interop testing with a Rust RTR client (rustbgpd). Client-side workaround: lance0/rustbgpd@c1d0296
Fixes #165