server: Cleanup external address discovery. #3640

Open
davecgh wants to merge 10 commits into decred:master from davecgh:server_external_addr_discovery
@davecgh davecgh commented Mar 6, 2026

This requires #3638.

The current code that handles external server address discovery is rather difficult to follow because it is poorly named and interspersed with unrelated code.

This refactors the code to make it better separated and easier to follow. It also switches the main data structure that limits the number of addresses to use an LRU instead.

While refactoring this, I noticed there is room for improvement in terms of the logic as well. However, in order to keep the changes easier to review for correctness, this does not contain any notable overall logic changes and limits the changes to refactors and cleanup.
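
To illustrate what limiting the tracked addresses with an LRU can look like, here is a minimal sketch. It is not the data structure used by this PR: the `addrLRU` type, its methods, and the string keys are hypothetical and only show the eviction behavior.

```go
package main

import (
	"container/list"
	"fmt"
)

// addrLRU is a hypothetical fixed-capacity set of address strings that evicts
// the least recently used entry once the limit is reached.  It is not safe
// for concurrent access.
type addrLRU struct {
	limit int
	order *list.List               // front is the most recently used entry
	elems map[string]*list.Element // address -> node in order
}

func newAddrLRU(limit int) *addrLRU {
	return &addrLRU{
		limit: limit,
		order: list.New(),
		elems: make(map[string]*list.Element),
	}
}

// Put adds the address or marks it as the most recently used entry when it
// already exists.  The least recently used address is evicted when adding a
// new one would exceed the limit.
func (c *addrLRU) Put(addr string) {
	if elem, ok := c.elems[addr]; ok {
		c.order.MoveToFront(elem)
		return
	}
	if c.limit <= 0 {
		return
	}
	if c.order.Len() >= c.limit {
		oldest := c.order.Back()
		c.order.Remove(oldest)
		delete(c.elems, oldest.Value.(string))
	}
	c.elems[addr] = c.order.PushFront(addr)
}

// Contains reports whether the address is tracked without changing its
// position in the eviction order.
func (c *addrLRU) Contains(addr string) bool {
	_, ok := c.elems[addr]
	return ok
}

// Len returns the number of tracked addresses.
func (c *addrLRU) Len() int { return c.order.Len() }

func main() {
	seen := newAddrLRU(2)
	seen.Put("203.0.113.1:9108")
	seen.Put("203.0.113.2:9108")
	seen.Put("203.0.113.1:9108") // refreshes the first address
	seen.Put("203.0.113.3:9108") // evicts 203.0.113.2:9108
	fmt.Println(seen.Contains("203.0.113.1:9108"),
		seen.Contains("203.0.113.2:9108"), seen.Len())
}
```

The appeal of an LRU here is that the bound on memory is enforced by the container itself rather than by separate bookkeeping at each call site.
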

@davecgh davecgh added this to the 2.2.0 milestone Mar 6, 2026
@davecgh davecgh force-pushed the server_external_addr_discovery branch from 221a417 to 1bb946e Compare March 6, 2026 18:31
davecgh added 10 commits March 17, 2026 01:07
This corrects the version in README.md to the most recent released
version and brings the documentation in doc.go to more modern standards.
This does some basic test cleanup and modernizes some of the peer tests
as follows:

- Consolidates the mock peer config used throughout the tests
- Consolidates and simplifies the mock pipe creation
- Marks peer state tests as a helper
- Uses t.Fatalf where appropriate
- Removes additional newlines in failure strings
The majority of the tests in TestOutboundPeer are not actually testing
anything because nothing is checked.  This moves the one thing that is
being tested into a separate test func and removes the rest since it is
already tested elsewhere.
This refactors the primary inbound message processing that checks
requirements, updates state, and invokes any configured message handlers
into a separate method.

This is primarily being done to support an upcoming change that will
need to make use of the same logic before the main read loop.
Due to legacy reasons that no longer apply, connections are currently
associated with a peer after the constructors have been called via
AssociateConnection.

This modifies the code to instead accept the connections in the inbound
and outbound constructors and exports the Start method in its place.

Ultimately, the goal is to split the handshake into a separate method
and convert the lifecycle over to use contexts.
The current design where the handshake happens asynchronously when the
async I/O is started is less than ideal and is quite brittle.  It also
significantly complicates everything as evidenced by several minor bugs
over the years that have resulted from faulty assumptions which directly
stem from its asynchronous nature.

As an example of the complexity it causes, a bunch of additional flags
are required that are solely related to the handshake.  Namely, whether
or not the version is known, whether the verack has been received, and
whether the handshake is done.  Then, because it's all happening
asynchronously, later code has to be vigilant about checking that those
events have happened.

All of this complexity can entirely be avoided by simply requiring a
successful synchronous handshake to take place prior to starting async
I/O.

With that in mind, this significantly reworks the way the handshake is
handled so that it happens via a separate blocking method and removes
async handlers which are no longer required as a result.

The following is a high level overview of the changes:

- Introduce programmatically detectable errors consistent with other
  code throughout the repository
- Move the handshake code to a separate blocking method named Handshake
  that accepts a callback to invoke with the received version message
  - The new method returns an error that callers can use to reliably
    detect a failed handshake
  - The callback can return an error to cause the handshake to fail
    and pass the error along to the caller
- Make the initial handshake block until both the version and verack
  message are received
- Introduce delayed processing for up to 3 messages sent between the
  version and verack message on old protocol versions
- Any further received version or verack messages in the async I/O
  handlers are now unconditionally an error
- Removes the OnVersion and OnVerAck async listeners that no longer apply
- Updates the calling server code to thread the overall process context
  down to the Handshake and Run methods
- Adds several additional tests for correctness
- Updates the example to clearly show the new semantics
- Includes extra documentation to elucidate the exact requirements for
  establishing a new peer as well as exactly which properties the caller
  can and can't rely on during the handshake
Now that the handshake is required to take place prior to starting
async i/o processing, the version and verack messages are guaranteed to
have been seen for a successful handshake.

Given that, this removes the related fields and methods since they are
no longer needed.
This modifies the lifecycle of peers to use the more modern Run pattern
that is based on contexts.

In particular, this replaces the Start and WaitForDisconnect methods
with a single method named Run and arranges for it to block until the
provided context is cancelled or the peer is disconnected.  This is more
flexible for the caller since it can easily turn blocking code into
async code while the reverse is not true.

The new Run method waits for all goroutines that it starts to shutdown
before returning to help ensure an orderly shutdown.

Since all exported methods that send messages to the various goroutines
via channels already select across the quit channel which is closed when
the peer disconnects, the peer is now forcibly disconnected when the
context is cancelled.

This approach allows the flexibility for callers to use any combination
of manually disconnecting peers via the Disconnect method and allowing
them to automatically be disconnected when the context is cancelled.

It also updates the server code accordingly.
Currently, the whitelist and banning detection splits the host and port
multiple times since the address previously hadn't been parsed yet.

However, it is no longer necessary since the parsed address is now
available as soon as the peer first connects.

This updates the detection funcs to take the parsed address directly.
The current code that handles external server address discovery is
rather difficult to follow because it is poorly named and interspersed
with unrelated code.

This refactors the code to make it better separated and easier to
follow.  It also switches the main data structure that limits the number
of addresses to use an LRU instead.

While refactoring this, I noticed there is room for improvement in terms
of the logic as well.  However, in order to keep the changes easier to
review for correctness, this does not contain any notable overall logic
changes and limits the changes to refactors and cleanup.
@davecgh davecgh force-pushed the server_external_addr_discovery branch from 1bb946e to f7d3721 Compare March 17, 2026 06:10