server: Cleanup external address discovery. by davecgh · Pull Request #3640 · decred/dcrd

davecgh · 2026-03-06T10:18:25Z

This requires #3638.

The current code that handles external server address discovery is rather difficult to follow because it is poorly named and interspersed with unrelated code.

This refactors the code to make it better separated and easier to follow. It also switches the main data structure used to limit the number of addresses to an LRU.
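As a rough illustration of the LRU approach (not the actual dcrd code), the following Go sketch bounds a set of discovered address strings and evicts the least recently used entry once the limit is reached. The type addrLRU and its methods are hypothetical names introduced here; dcrd's own LRU container may expose a different API.

```go
package main

import "container/list"

// addrLRU is a hypothetical fixed-capacity set of address strings that
// evicts the least recently used entry once the limit is reached.  It only
// illustrates the general LRU technique and is not dcrd's implementation.
type addrLRU struct {
	limit int
	order *list.List               // front is the most recently used
	items map[string]*list.Element // address -> element in order
}

// newAddrLRU returns an empty set that holds at most limit addresses.
func newAddrLRU(limit int) *addrLRU {
	return &addrLRU{
		limit: limit,
		order: list.New(),
		items: make(map[string]*list.Element, limit),
	}
}

// Add marks addr as the most recently used, inserting it when needed and
// evicting the least recently used address when the limit would be exceeded.
func (c *addrLRU) Add(addr string) {
	if elem, ok := c.items[addr]; ok {
		c.order.MoveToFront(elem)
		return
	}
	if c.limit <= 0 {
		return
	}
	if c.order.Len() >= c.limit {
		oldest := c.order.Back()
		c.order.Remove(oldest)
		delete(c.items, oldest.Value.(string))
	}
	c.items[addr] = c.order.PushFront(addr)
}

// Contains reports whether addr is tracked without updating its recency.
func (c *addrLRU) Contains(addr string) bool {
	_, ok := c.items[addr]
	return ok
}

// Len returns the number of tracked addresses.
func (c *addrLRU) Len() int { return c.order.Len() }
```

Because touching an existing address moves it to the front, frequently reported addresses survive while stale ones age out, which keeps memory bounded without a separate pruning pass.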

While refactoring this, I noticed there is room for improvement in terms of the logic as well. However, in order to keep the changes easier to review for correctness, this does not contain any notable overall logic changes and limits the changes to refactors and cleanup.

This corrects the version in README.md to the most recent released version and brings the documentation in doc.go to more modern standards.

This does some basic test cleanup and modernizes some of the peer tests as follows:
- Consolidates the mock peer config used throughout the tests
- Consolidates and simplifies the mock pipe creation
- Marks peer state tests as a helper
- Uses t.Fatalf where appropriate
- Removes additional newlines in failure strings
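The following is a minimal sketch, using the hypothetical names mockAddr, mockConn, and pipe, of what consolidated mock pipe creation could look like. It wraps both ends of net.Pipe so that each side reports fixed local and remote addresses, since the in-memory pipe does not provide meaningful ones. The actual test helpers in the peer package may differ.

```go
package main

import "net"

// mockAddr is a hypothetical net.Addr with a fixed network and address.
type mockAddr struct {
	network, address string
}

func (a mockAddr) Network() string { return a.network }
func (a mockAddr) String() string  { return a.address }

// mockConn wraps one end of an in-memory pipe and overrides the address
// methods so tests see stable, realistic-looking peer addresses.
type mockConn struct {
	net.Conn
	local, remote net.Addr
}

func (c *mockConn) LocalAddr() net.Addr  { return c.local }
func (c *mockConn) RemoteAddr() net.Addr { return c.remote }

// pipe returns two connected mock conns whose local and remote addresses
// mirror each other, so one side can act as the inbound peer and the other
// as the outbound peer.
func pipe(addr1, addr2 string) (*mockConn, *mockConn) {
	c1, c2 := net.Pipe()
	a1 := mockAddr{network: "tcp", address: addr1}
	a2 := mockAddr{network: "tcp", address: addr2}
	return &mockConn{Conn: c1, local: a1, remote: a2},
		&mockConn{Conn: c2, local: a2, remote: a1}
}
```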

The majority of the tests in TestOutboundPeer are not actually testing anything because nothing is checked. This moves the one thing that is being tested into a separate test func and removes the rest since that functionality is already tested elsewhere.

This refactors the primary inbound message processing that checks requirements, updates state, and invokes any configured message handlers into a separate method. This is primarily being done to support an upcoming change that will need to make use of the same logic before the main read loop.
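To illustrate factoring inbound message processing into one reusable method, the following Go sketch uses hypothetical Peer, Message, and Listeners types. It is not the peer package's API; it only shows a single method that validates a message, updates state, and invokes an optional handler, which the read loop (or code that runs before it) can then call.

```go
package main

import "fmt"

// Message is a hypothetical stand-in for a decoded wire message.
type Message struct {
	Command string
	Payload string
}

// Listeners holds optional callbacks, standing in for configured message
// handlers.
type Listeners struct {
	OnPing func(p *Peer, msg *Message)
}

// Peer tracks a small amount of hypothetical per-peer state.
type Peer struct {
	listeners Listeners
	lastPing  string
	processed int
}

// processInboundMessage checks the message requirements, updates peer state,
// and then invokes any configured handler for the message.
func (p *Peer) processInboundMessage(msg *Message) error {
	switch msg.Command {
	case "ping":
		// Requirement check: a ping must carry a nonce.
		if msg.Payload == "" {
			return fmt.Errorf("ping message is missing its nonce")
		}
		// State update.
		p.lastPing = msg.Payload
		// Invoke the configured handler, if any.
		if p.listeners.OnPing != nil {
			p.listeners.OnPing(p, msg)
		}
	default:
		return fmt.Errorf("unsupported command %q", msg.Command)
	}
	p.processed++
	return nil
}

// readLoop delegates every received message to processInboundMessage and
// stops at the first error.
func (p *Peer) readLoop(msgs <-chan *Message) error {
	for msg := range msgs {
		if err := p.processInboundMessage(msg); err != nil {
			return err
		}
	}
	return nil
}
```

Since the checks live in one method, code that runs before the read loop can process a message with exactly the same semantics as the loop itself.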

Due to legacy reasons that no longer apply, connections are currently associated with a peer via AssociateConnection after the constructors have been called. This modifies the code to instead accept the connections in the inbound and outbound constructors and exports the Start method in its place. Ultimately, the goal is to split the handshake into a separate method and convert the lifecycle over to use contexts.

The current design, where the handshake happens asynchronously when the async I/O is started, is less than ideal and quite brittle. It also significantly complicates everything, as evidenced by several minor bugs over the years that resulted from faulty assumptions stemming directly from its asynchronous nature. As an example of the complexity it causes, a number of additional flags are required that relate solely to the handshake: namely, whether or not the version is known, whether the verack has been received, and whether the handshake is done. Then, because it all happens asynchronously, later code has to be vigilant about checking that those events have happened. All of this complexity can be avoided entirely by simply requiring a successful synchronous handshake to take place prior to starting async I/O. With that in mind, this significantly reworks the way the handshake is handled so that it happens via a separate blocking method, and removes the async handlers that are no longer required as a result.

The following is a high-level overview of the changes:
- Introduce programmatically detectable errors consistent with other code throughout the repository
- Move the handshake code to a separate blocking method named Handshake that accepts a callback to invoke with the received version message
- The new method returns an error that callers can use to reliably detect a failed handshake
- The callback can return an error to cause the handshake to fail and pass the error along to the caller
- Make the initial handshake block until both the version and verack messages are received
- Introduce delayed processing for up to 3 messages sent between the version and verack messages on old protocol versions
- Any further version or verack messages received in the async I/O handlers are now unconditionally an error
- Remove the OnVersion and OnVerAck async listeners that no longer apply
- Update the calling server code to thread the overall process context down to the handshake and Run methods
- Add several additional tests for correctness
- Update the example to clearly show the new semantics
- Include extra documentation to elucidate the exact requirements for establishing a new peer as well as exactly which properties the caller can and can't rely on during the handshake

Now that the handshake is required to take place prior to starting async I/O processing, the version and verack messages are guaranteed to have been seen for a successful handshake. Given that, this removes the related fields and methods since they are no longer needed.

This modifies the lifecycle of peers to use the more modern Run pattern that is based on contexts. In particular, this replaces the Start and WaitForDisconnect methods with a single method named Run and arranges for it to block until the provided context is cancelled or the peer is disconnected. This is more flexible for the caller since it can easily turn blocking code into async code, while the reverse is not true. The new Run method waits for all goroutines that it starts to shut down before returning to help ensure an orderly shutdown. Since all exported methods that send messages to the various goroutines via channels already select across the quit channel, which is closed when the peer disconnects, the peer is now forcibly disconnected when the context is cancelled. This approach gives callers the flexibility to use any combination of manually disconnecting peers via the Disconnect method and allowing them to be disconnected automatically when the context is cancelled. It also updates the server code accordingly.

Currently, the whitelist and banning detection splits the host and port multiple times since the address previously had not been parsed yet. However, that is no longer necessary since the parsed address is now available as soon as the peer first connects. This updates the detection funcs to take the parsed address directly.

davecgh added this to the 2.2.0 milestone Mar 6, 2026

davecgh force-pushed the server_external_addr_discovery branch from 221a417 to 1bb946e on March 6, 2026 18:31

davecgh added 10 commits March 17, 2026 01:07

peer: Update README.md and doc.go.

550f966

davecgh force-pushed the server_external_addr_discovery branch from 1bb946e to f7d3721 on March 17, 2026 06:10

davecgh wants to merge 10 commits into decred:master from davecgh:server_external_addr_discovery

davecgh commented Mar 6, 2026