
Replace REST and RabbitMQ with gRPC #176

Open

benthecarman wants to merge 8 commits into lightningdevkit:main from benthecarman:from-scratch-grpc

Conversation

@benthecarman
Collaborator

@benthecarman benthecarman commented Mar 27, 2026

Implement gRPC directly on hyper's HTTP/2 support, without tonic, to keep the dependency tree small. This consolidates two separate protocols (protobuf-over-REST for RPCs, RabbitMQ for events) into a single gRPC interface.

#175 proved the migration works with tonic. We opted for a from-scratch approach instead, informed by the plan at: https://gist.github.com/tnull/1f6ffc5ae71c418f28844e27eee8c62b

Events are now delivered via a server-streaming SubscribeEvents RPC backed by a tokio broadcast channel, eliminating the RabbitMQ operational dependency. HMAC auth is simplified to timestamp-only signing since TLS guarantees body integrity.
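
For context, the timestamp-only signing boils down to something like the sketch below (the hmac/sha2 crates and the function name are assumptions for illustration, not necessarily what the server uses):

use hmac::{Hmac, Mac};
use sha2::Sha256;
use std::time::{SystemTime, UNIX_EPOCH};

// Sketch: sign only the current timestamp; TLS already protects the request body.
fn sign_timestamp(secret: &[u8]) -> (String, Vec<u8>) {
    let ts = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .expect("clock before unix epoch")
        .as_secs()
        .to_string();
    let mut mac = Hmac::<Sha256>::new_from_slice(secret).expect("HMAC accepts any key length");
    mac.update(ts.as_bytes());
    // The timestamp and MAC would travel in request headers; header names omitted here.
    (ts, mac.finalize().into_bytes().to_vec())
}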

Had Claude break it into several commits that should make it easy to review.

Also tested with the CLI made in #175 and it works, so it seems our implementation is compliant.

@ldk-reviews-bot

ldk-reviews-bot commented Mar 27, 2026

👋 Thanks for assigning @tnull as a reviewer!
I'll wait for their review and will help manage the review process.
Once they submit their review, I'll check if a second reviewer would be helpful.

Contributor

@joostjager joostjager left a comment


Quick Claude check for missing parts

@@ -0,0 +1,265 @@
// This file is Copyright its original authors, visible in version control
Contributor

@joostjager joostjager Mar 28, 2026


What's missing vs. the gRPC spec

Critical for external client compatibility

  1. Trailers-Only encoding may be wrong - For error responses with no body (GrpcBody::Error), the gRPC spec says to use "Trailers-Only" mode: a single HEADERS frame containing both the HTTP headers (:status, content-type) and the gRPC trailers (grpc-status, grpc-message). The current implementation sends the HTTP response headers first, then the grpc-status as a separate trailers frame. Some client libraries (especially grpc-go, grpc-java) may not find grpc-status where they expect it, or may interpret the empty-body-with-trailers differently. This affects every error path. (See the sketch after this list.)

  2. Content-type check is too loose - starts_with("application/grpc") (line 220) accepts application/grpcfoo or application/grpc+json. The spec requires exactly application/grpc, application/grpc+proto, or application/grpc+<codec> for a codec you support. A client sending application/grpc+json would be accepted but get protobuf back. Should reject anything other than application/grpc and application/grpc+proto.

  3. No grpc-timeout / deadline support - External clients routinely set grpc-timeout (e.g., grpc-timeout: 5S). This server ignores it entirely, so a client expecting a DEADLINE_EXCEEDED after 5 seconds will instead hang until the handler finishes or the connection drops.

  4. No grpc-accept-encoding advertisement - The server doesn't send grpc-accept-encoding: identity in responses. External clients that default to gzip compression will try it, get UNIMPLEMENTED, then retry without compression on every single call (doubling latency on first request per stream). Advertising identity upfront avoids this.

  5. No server reflection - Tools like grpcurl, Postman, and Kreya that external developers use for testing/debugging rely on the grpc.reflection.v1 service to discover the API. Without it, external developers need the .proto files out-of-band, which is a significant friction point.
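
To make item 1 concrete, a Trailers-Only error response built with hyper/http types could look roughly like the sketch below (the helper name is made up and percent-encoding of grpc-message is simplified):

use bytes::Bytes;
use http::{Response, StatusCode};
use http_body_util::Empty;

// Sketch: in Trailers-Only mode, grpc-status and grpc-message ride in the initial
// (and only) HEADERS frame, with an empty body and no separate trailers frame.
fn trailers_only_response(grpc_status: u32, message: &str) -> Response<Empty<Bytes>> {
    Response::builder()
        .status(StatusCode::OK)
        .header("content-type", "application/grpc+proto")
        .header("grpc-status", grpc_status.to_string())
        .header("grpc-message", message) // should be percent-encoded per the spec
        .body(Empty::new())
        .expect("static header values are valid")
}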

Functional issues (affect all clients)

  1. Stream errors always appear as OK - GrpcBody::Stream sends OK trailers when the mpsc channel closes (line 191-193), regardless of why it closed. If the server encounters an error mid-stream (or the broadcast sender is dropped), the client sees grpc-status: 0. There's no mechanism to send error trailers on a stream.

  2. Broadcast lag silently terminates streams - The broadcast->mpsc bridge (line 307-313) doesn't handle RecvError::Lagged. If a slow client falls behind the 1024-message broadcast buffer, rx.recv() returns Err(Lagged), which breaks the loop and closes the stream with OK status. The client has no idea events were lost. (See the sketch after this list.)

  3. No graceful shutdown for streams - On SIGTERM, active SubscribeEvents streams aren't drained or notified. The TCP connection just drops, so external clients see a transport error rather than a clean gRPC status.

  4. Trailing data after gRPC frame is silently ignored - decode_grpc_body reads only one frame. If a client sends multiple gRPC frames in a single request body, the extra data is discarded silently. (Only matters if a client reuses a connection oddly, since these are all unary RPCs.)

  5. No request cancellation detection - When a client cancels (RST_STREAM), the handler keeps running to completion. Handlers that do expensive work (e.g., ExportPathfindingScores) waste resources on cancelled requests.
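
For item 2, handling the lag case explicitly could look like the sketch below (the event payload and error types are placeholders, not the PR's actual definitions):

use tokio::sync::{broadcast, mpsc};

// Sketch: bridge the broadcast channel into the per-stream mpsc channel and
// surface Lagged as an error instead of silently ending the stream with OK.
async fn bridge_events(
    mut rx: broadcast::Receiver<Vec<u8>>, tx: mpsc::Sender<Result<Vec<u8>, String>>,
) {
    loop {
        match rx.recv().await {
            Ok(event) => {
                if tx.send(Ok(event)).await.is_err() {
                    break; // client went away
                }
            },
            Err(broadcast::error::RecvError::Lagged(missed)) => {
                // Let the stream end with an error trailer instead of grpc-status: 0.
                let _ = tx.send(Err(format!("client lagged, {missed} events dropped"))).await;
                break;
            },
            Err(broadcast::error::RecvError::Closed) => break,
        }
    }
}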

Nice-to-have for production with external clients

  1. No health checking protocol - No grpc.health.v1.Health service. External infrastructure (load balancers, Kubernetes, service meshes) uses this standard protocol for liveness/readiness probes.

  2. No binary metadata support - The gRPC spec says headers ending in -bin are base64-encoded binary. External clients/interceptors may send binary metadata (e.g., tracing context in grpc-trace-bin), which this server would misinterpret as text.

  3. No compression support - Correctly returns UNIMPLEMENTED per spec, which is fine. But combined with 4 (no grpc-accept-encoding), external clients pay a round-trip penalty discovering this.

Verdict

For the controlled ldk-server-client, most of this works because both sides agree on the wire format quirks. For external clients, items #1-#5 are the real blockers: #1 can cause error responses to be misinterpreted, #2 can silently accept incompatible codecs, #3 breaks client-side timeout contracts, #4 causes unnecessary round-trips, and #5 makes the API hard to discover. Items #6-#10 are correctness issues that affect everyone. Items #11-#13 are quality-of-life gaps for production deployments.

Collaborator Author

@benthecarman benthecarman Mar 28, 2026


Did most of these besides:

reflection: tonic doesn't even support this out of the box

health check: can just use get-node-info

compression: added the flag so people won't try it, and it's not really needed

@benthecarman benthecarman force-pushed the from-scratch-grpc branch 2 times, most recently from a9e3448 to e554203 on March 28, 2026 18:41
@ldk-reviews-bot

🔔 1st Reminder

Hey @tnull! This PR has been waiting for your review.
Please take a look when you have a chance. If you're unable to review, please let us know so we can find another reviewer.

Collaborator

@tnull tnull left a comment


Started a first pass, so far reviewed the first 4 commits.

In general it will be hard to catch all the details, so the likely strategy is to drastically improve test coverage where we see the opportunity and go from there.

use std::task::{Context, Poll};

use base64::Engine;
use bytes::{BufMut, Bytes, BytesMut};
Collaborator


I think dropping bytes and replacing base64 with our own (e.g., @tankyleo's) might be worth exploring in a follow-up.

Ok(tls_stream) => {
let io_stream = TokioIo::new(tls_stream);
-	if let Err(err) = http1::Builder::new().serve_connection(io_stream, node_service).await {
+	if let Err(err) = http2::Builder::new(TokioExecutor::new()).serve_connection(io_stream, node_service).await {
Collaborator


Seems this is not creating another runtime, but maybe we should be more intentional about this and reuse a specific new runtime for gRPC requests to avoid any weird tokio behavior with LDK Node's runtime? I.e., do we want to implement Executor for a new GrpcRuntime object and then (re-)use this always? Or maybe we should punt on this and rethink runtime handling in a follow-up (in this case, open an issue)?
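
A minimal sketch of what such a dedicated executor could look like, assuming we keep a Handle to a separate runtime around (the GrpcRuntime name and plumbing are hypothetical):

use std::future::Future;

use hyper::rt::Executor;

// Sketch: spawn all gRPC connection tasks onto a dedicated runtime's handle
// instead of whatever runtime happens to be ambient.
#[derive(Clone)]
struct GrpcRuntime {
    handle: tokio::runtime::Handle,
}

impl<Fut> Executor<Fut> for GrpcRuntime
where
    Fut: Future + Send + 'static,
    Fut::Output: Send + 'static,
{
    fn execute(&self, fut: Fut) {
        self.handle.spawn(fut);
    }
}

// Usage: http2::Builder::new(grpc_runtime.clone()).serve_connection(io_stream, node_service)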

Collaborator Author


Yeah probably better as a follow up

paginated_store: Arc<dyn PaginatedKVStore>,
) {
if let Some(payment_details) = event_node.payment(payment_id) {
let payment = payment_to_proto(payment_details);
Collaborator


Hmm, pre-existing, but should we really query the payment store each time? Can't we use the data from the LDK Node events for this?

Collaborator Author


We only have this because ldk-server is currently duplicating the payment store for the pagination. Once we move over in ldk-node we can remove this.

@benthecarman
Collaborator Author

Rebased and responded to @tnull's comments, and added a bunch more tests (and prop tests) around some of the gRPC stuff.

@G8XSU
Contributor

G8XSU commented Apr 1, 2026

The reason we chose protobuf-over-REST earlier was web support; I don't see that being covered or discussed here. What are your thoughts on that?
(There is grpc-web but it adds further complexity.)

And there were multiple reasons for protobuf over JSON:

  1. 'some' schema validation
  2. generated typed objects in other languages.
  3. binary efficiency for payload size and low-bandwidth devices.

@joostjager
Contributor

joostjager commented Apr 1, 2026

My view on this is that some kind of HTTP interface is probably needed anyway, for web support or other connections where gRPC doesn't work. In an AI-first world, that is sufficient; the AIs will sort it out. Adding (DIY) gRPC on top of that seems mostly a theoretical advantage.

Similarly, I don't think protobuf-over-REST is necessary anymore. The LLM will do JSON just fine too.

]

[[package]]
name = "time"
Collaborator


Big win right here.

@benthecarman
Collaborator Author

fixed rebase conflicts

@tnull tnull self-requested a review April 3, 2026 12:46
@ldk-reviews-bot

🔔 1st Reminder

Hey @tnull! This PR has been waiting for your review.
Please take a look when you have a chance. If you're unable to review, please let us know so we can find another reviewer.

@tnull
Collaborator

tnull commented Apr 6, 2026

The reason we chose protobuf-over-REST earlier was web support; I don't see that being covered or discussed here. What are your thoughts on that? (There is grpc-web but it adds further complexity.)

Yes, this was the initial motivation for doing protobuf-over-REST for VSS Server, but not so much in LDK Server. Here we mostly followed the same approach, IIRC?

Of course, protobuf would have always required a respective compiler/library in the web setting. So arguably, real/native web support would indeed mean JSON-over-REST, as @joostjager points out, which however has several idiosyncrasies of its own (starting with JSON being a super inefficient data format in general, but also stuff like only 53-bit integer precision, etc). And of course the reasons you've listed above on why protobuf might be preferable over JSON are valid (at least IMO). However, in turn, protobuf-over-REST was missing a few features we want / need (more integrated/ergonomic APIs for streaming notifications, etc pp).

Now, we concluded that either migrating to 'full' gRPC (to optimize for features and efficiency) or to JSON-over-REST (to optimize for compatibility with the web) would make sense. However, as discussed above both approaches also have their own cons, so we kind of expect that either way we'd end up with requiring additional software components that would allow us to supplement the weaknesses of either approach (depending on the use case). Thankfully, building and maintaining simple API surfaces that only delegate logic to core components is very very cheap since the recent improvements of AI tooling, so no big deal either way.

Collaborator

@tnull tnull left a comment


Now did ~a full pass. Please squash the fixup commits into the feature commits that first touch the given codepaths, to at least make this PR a little less unwieldy.

Mostly looks good, I think, though one remaining question I have is whether we should give the gRPC stuff its own crate from the get-go.

let request = SubscribeEventsRequest {};
let proto_bytes = request.encode_to_vec();
// gRPC framing: 1-byte compression flag + 4-byte big-endian length prefix, then the message.
let mut grpc_body = Vec::with_capacity(5 + proto_bytes.len());
grpc_body.push(0u8); // 0 = uncompressed
Collaborator


Would even little stuff like this make it worth moving the gRPC code out to a dedicated library in the workspace that both sides could reuse? Maybe ldk-server-protos should become ldk-server-grpc? Or we could introduce the latter alongside the former?

That might also be the right place to add quite a bit of additional test coverage based on tonic and/or other pre-existing gRPC libraries to make sure we're compliant and interoperable?

Collaborator Author


Renamed it to ldk-server-grpc and moved a lot of the gRPC logic in there. Added some tests against tonic as well.


Collaborator

@tnull tnull left a comment


Needs a rebase now that #180 landed.

@benthecarman benthecarman requested a review from tnull April 7, 2026 10:46
benthecarman and others added 8 commits April 8, 2026 14:16
Define the LightningNode service with all 38 unary RPCs and a
server-streaming SubscribeEvents RPC. Add SubscribeEventsRequest
message and events.proto import. Update response comments from
REST/HTTP style to gRPC style.

prost-build ignores service blocks, so this is a documentation-
only change that prepares for the gRPC migration without altering
any generated code or runtime behavior.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Take &Context instead of Context by value in all 37 handler
functions. The handlers only read from context.node and
context.paginated_kv_store (both Arc), so borrowing avoids
unnecessary Arc clones per request and prepares for the gRPC
service rewrite where Context lives in the service struct.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Rename the crate and move shared gRPC wire protocol primitives
(framing, status codes, percent encode/decode, timeout parsing)
into ldk-server-grpc so both server and client can reuse them
without duplicating code or pulling in each other's dependencies.

Server-specific helpers (GrpcBody, response builders, request
validation) remain in the server crate and re-export the shared
items so existing imports are unaffected.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
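
The "timeout parsing" mentioned above refers to the grpc-timeout header format (1-8 ASCII digits plus a unit). A rough sketch of such a parser, not the crate's actual function:

use std::time::Duration;

// Sketch: parse values like "5S" or "250m" (units H, M, S, m, u, n) into a Duration.
fn parse_grpc_timeout(value: &str) -> Option<Duration> {
    if !value.is_ascii() || value.len() < 2 {
        return None;
    }
    let (digits, unit) = value.split_at(value.len() - 1);
    if digits.len() > 8 || !digits.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    let amount: u64 = digits.parse().ok()?;
    Some(match unit {
        "H" => Duration::from_secs(amount * 3600),
        "M" => Duration::from_secs(amount * 60),
        "S" => Duration::from_secs(amount),
        "m" => Duration::from_millis(amount),
        "u" => Duration::from_micros(amount),
        "n" => Duration::from_nanos(amount),
        _ => return None,
    })
}
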
Switch from HTTP/1.1 protobuf-over-REST to gRPC over HTTP/2,
implemented directly on hyper without tonic. The gRPC framing
module handles encode/decode of the 5-byte length-prefixed wire
format, status codes, and HTTP/2 trailers.

Key changes:
- hyper http1 → http2, add TokioExecutor for HTTP/2 connections
- TLS ALPN set to h2 for HTTP/2 negotiation
- HMAC auth simplified to timestamp-only (no body) since TLS
  guarantees integrity
- Events delivered via tokio broadcast channel, replacing the
  RabbitMQ EventPublisher infrastructure
- Config renamed rest_service_addr → grpc_service_addr
- Removed lapin, async-trait, events-rabbitmq feature

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
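
As a rough illustration of the 5-byte framing this commit describes (not part of the commit itself; helper names are illustrative, not the module's actual API):

use bytes::{Buf, BufMut, Bytes, BytesMut};

// Sketch: each gRPC message is a 1-byte compression flag, a 4-byte big-endian
// length, then the protobuf payload.
fn encode_grpc_frame(message: &[u8]) -> Bytes {
    let mut buf = BytesMut::with_capacity(5 + message.len());
    buf.put_u8(0); // 0 = uncompressed
    buf.put_u32(message.len() as u32);
    buf.put_slice(message);
    buf.freeze()
}

// Returns one decoded payload, or None if the buffer doesn't yet hold a full frame.
fn decode_grpc_frame(buf: &mut BytesMut) -> Option<Bytes> {
    if buf.len() < 5 {
        return None;
    }
    let len = u32::from_be_bytes([buf[1], buf[2], buf[3], buf[4]]) as usize;
    if buf.len() < 5 + len {
        return None;
    }
    buf.advance(5);
    Some(buf.split_to(len).freeze())
}
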
Switch from protobuf-over-REST to gRPC wire format. Requests use
gRPC framing (5-byte length-prefix) with application/grpc+proto
content-type, routed to /api.LightningNode/{method}. HMAC auth
simplified to timestamp-only signing to match the server change.

Error handling uses Trailers-Only mode: when the server returns a
gRPC error without a body, grpc-status appears as a regular HTTP/2
header that reqwest can read.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Implement the SubscribeEvents RPC end-to-end: a Stream variant in
GrpcBody that delivers multiple gRPC-framed messages from an mpsc
channel, a handler in service.rs that bridges the broadcast channel
to the streaming response, a client EventStream type that reads
gRPC frames incrementally from the HTTP/2 body, and e2e test
assertions that verify payment events arrive via the stream.

Also regenerates proto code to include SubscribeEventsRequest.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Rename rest_service_address to grpc_service_address in CLI config
and e2e test harness.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@benthecarman
Collaborator Author

rebased
