Skip to content

consensus/bor, internal/cli: full grpc implementation#2194

Open
marcello33 wants to merge 8 commits intodevelopfrom
mardizzone/grpc_improv
Open

consensus/bor, internal/cli: full grpc implementation#2194
marcello33 wants to merge 8 commits intodevelopfrom
mardizzone/grpc_improv

Conversation

@marcello33
Copy link
Copy Markdown
Collaborator

@marcello33 marcello33 commented Apr 23, 2026

Summary

In this PR we implement the server-side half of the heimdall-bor gRPC migration started by polyproto's GetBlockInfoInBatch, GetTdByHash, GetTdByNumber, GetAuthor messages (see this PR). It provides:

Results

  • All 7 methods are faster or at parity on gRPC.
  • Biggest win: GetBorChainBlockInfoInBatch with 77% latency reduction — the call used in the consensus-critical ExtendVoteHandler path.
  • 50–67% fewer bytes allocated per RPC across the board — less GC pressure.

Latency (ns per op)

Method                          HTTP ns/op    gRPC ns/op    Improvement
-------------------------------------------------------------------------
GetBorChainBlock                  584 873       406 353       -30.5%
GetBorChainBlockAuthor            476 365       417 626       -12.3%
GetBorChainBlockTd                466 027       404 187       -13.3%
GetBorChainBlockInfoInBatch     7 819 395     1 771 033       -77.3%
GetRootHash                       452 269       482 345        +6.7%
CheckIfBlocksExist                540 378       416 757       -22.9%
GetBlockByNumber                  644 355       430 615       -33.2%

Allocations (bytes and count per op)

Method                        HTTP B/op    gRPC B/op      Δ       HTTP allocs  gRPC allocs
-------------------------------------------------------------------------------------------
GetBorChainBlock                22 020        9 197    -58.2%          143          129
GetBorChainBlockAuthor          12 083        5 827    -51.8%          100           98
GetBorChainBlockTd              13 040        5 843    -55.2%          114           96
GetBorChainBlockInfoInBatch  1 164 435      386 226    -66.8%         6615         3473
GetRootHash                     12 171        5 811    -52.3%           98           95
CheckIfBlocksExist              22 088        9 205    -58.3%          147          131
GetBlockByNumber                24 853        9 596    -61.4%          163          132

Ambient system-level resource usage (per phase, 5 s interval across 1000 blocks)

Metric                       HTTP phase    gRPC phase       Δ
-------------------------------------------------------------
bor CPU avg                      2.60%         2.77%      +6.5%
bor CPU p95                      5.75%         7.18%     +24.9%
bor mem avg                  107.3 MiB     127.9 MiB     +19.2%
bor mem p95                  343.4 MiB     346.7 MiB      ~flat

heimdall CPU avg                 8.03%         7.94%      -1.1%
heimdall CPU p95                10.75%        11.30%      +5.1%
heimdall mem avg             121.5 MiB     127.9 MiB      +5.3%
heimdall mem p95             148.1 MiB     150.9 MiB      +1.9%

samples per phase (n=)           592           576

CPU samples during bench window (pprof, 120 s each, aligned to bench run)

Service                HTTP phase         gRPC phase          Δ
------------------------------------------------------------------
bor total CPU          35.25 s (29.4%)    34.93 s (29.1%)    -0.9%
heimdall total CPU      5.41 s  (4.5%)     6.34 s  (5.3%)   +17.2%

Heimdall CPU is ~17% higher on gRPC — consistent with the client doing proto decode per call instead of JSON-RPC decode.

Details

  • New gRPC handlers: GetBlockInfoInBatch, GetTdByHash, GetTdByNumber, GetAuthor.
  • Full proto Header fidelity: headerToProtoBorHeader now emits every 21 header fields.
  • Bearer-token authentication on the gRPC server: --grpc.token flag (or BOR_GRPC_TOKEN env var) + unary/stream interceptors. If the token is empty, auth is disabled. This matches the RFC 6750 §2.1 case-insensitive scheme.
  • Bor.APIs() caches a single *API instance via sync.Once, so the gRPC backend shares the same rootHashCache as the JSON-RPC registration path. Without this, I detected a 2.1× GetRootHash regression.
  • GetBlockInfoInBatch safety: size gate (≤ 256 blocks), end > MaxInt64 guard, count-based loop (wrap-safe), per-iteration ctx.Err() check. fetchBlockInfo returns (info, ok, err) so real failures propagate instead of silently truncating batches.
  • Proper gRPC status codes on every handler: InvalidArgument for malformed requests, NotFound for missing blocks/TD, OutOfRange for numeric-overflow cases. Callers can branch on canonical codes instead of everything collapsing to codes.Unknown.

What this enables on the heimdall side

  • Every heimdall-facing RPC now has a gRPC path (previously only GetRootHash, GetVoteOnHash, HeaderByNumber, BlockByNumber, TransactionReceipt, BorBlockReceipt).
  • Milestone proposition generation in heimdall (x/milestone/abci) can issue a single GetBlockInfoInBatch round-trip instead of 3×N JSON-RPC batch elements — up to 4.4× faster per call and 66% fewer bytes allocated per request on the kurtosis devnet benchmark.
  • Token-gated deployments are supported for same-host validator pairs; loopback-only safety check on the heimdall client refuses to transmit a token over plaintext to any non-loopback peer.

Info about the gRPC authentication

gRPC authentication via bearer token is not strictly needed when bor and heimdall run on the same host with bor bound to 127.0.0.1:3131 (the post-PR-#2078 default). This is because:

  • Loopback is the access control. Only processes on the same host (and in the same network namespace, so same container / same k8s pod) can reach 127.0.0.1:3131. Theoretically, remote attackers can't touch it.
  • Bor's gRPC exposes only read-only, public chain data (nothing private, no signing keys, no state-changing methods). Even an unauthenticated caller can only read things that are already public via the JSON-RPC endpoint, on-chain data, or a block explorer.
  • Bor accepts empty token to mean "auth disabled" — consistent with how bor treats loopback for TLS (plaintext is allowed on loopback without token).

The token becomes meaningful in three cases:

  • Cross-host: bor on a separate box (0.0.0.0:3131 + https://). The token is mandatory here, and heimdall refuses to transmit it over plaintext to a non-loopback peer.
  • Shared-host with multiple untrusted tenants: another process on the same machine could dial the loopback port. Rare for a validator box, but auth closes the gap.
  • Defense-in-depth against misconfiguration: if an operator later flips bor to 0.0.0.0 without noticing, a configured token prevents the port from being trivially open to the network.

Enabling it

gRPC is opt-in. Default config has bor listening on 127.0.0.1:3131 and heimdall dialing over HTTP JSON-RPC — no behavior change for existing operators.

Step 1: Bor side

  1. Upgrade bor to a version that includes this PR and PR security: harden gRPC server — disable by default, bind to localhost #2078 (the upstream [grpc] loopback default). Required — heimdall's clients assume both.
  2. Edit config.toml to opt into gRPC on the address heimdall can reach:
    [grpc]
    addr = "127.0.0.1:3131"    # same-host validator pair; no TLS required
    # addr = "0.0.0.0:3131"    # cross-host; must pair with TLS / operator firewall
    token = "the_token"         # prefer setting this via BOR_GRPC_TOKEN env var instead (see below)
  3. For authenticated deployments, prefer setting the token via env var (cli leaks into ps aux / shell history):
    export BOR_GRPC_TOKEN="$(openssl rand -hex 32)"
    
    bor server reads it at startup; the flag stays as a fallback.

Step 2: Heimdall side

Flip the config in app.toml:

bor_grpc_flag = "true"
bor_grpc_url  = "http://127.0.0.1:3131"   # same-host
# bor_grpc_url = "https://bor.example.net:3131"  # TLS cross-host
bor_grpc_token = "<match-bor-side>"        # empty if bor has empty token

Restart heimdall; the startup parity check runs one HeaderByNumber over both transports and logs a Warning / Fatal if the hashes diverge (guard against a stale bor that doesn't populate the full proto header).

Rollout recommendation

  • Validators: same-host bor+heimdall, addr=127.0.0.1:3131, token empty or set via BOR_GRPC_TOKEN. No TLS needed on loopback.
  • Cross-host (e.g., sentry + shared bor): require addr=0.0.0.0:3131 + https:// URL + matching token. Heimdall refuses to send a token over plaintext to a non-loopback peer.
  • Bridge nodes / RPC providers: can stay on the HTTP path indefinitely. The flag only affects the heimdall→bor direction.

Downgrade path

Set bor_grpc_flag = "false" in heimdall and restart. Bor doesn't need to be downgraded — it keeps serving both transports regardless of the flag.

Tests

  • Extensively unit-tested.
  • Ran on a kurtosis-pos devnet by switching on-off the gRPC flag. Waited for 1000-block window each: producer rotation, milestone formation, state sync — all match between transports.
  • HTTP vs gRPC benchmark against the same devnet: GetBlockInfoInBatch 4.4× faster on gRPC, 50–67% allocation reduction across all methods, GetRootHash regression closed via the cached API.

Dependencies

  • Requires heimdall PR fix : TestRemoteMultiNotifyFull #576 for the client-side counterpart. Neither side is functional alone — ship together or bor-first (bor stays backward-compatible with HTTP-only heimdall).
  • Requires polyproto ≥ v0.0.8 (see this PR) for the new GetBlockInfoInBatch / GetTdByHash / GetTdByNumber / GetAuthor RPCs and the expanded Header message.
  • Requires kurtosis-pos changes (small: expose bor's [grpc] port, thread cl_bor_grpc_flag + el_grpc_url through the CL launcher template) if you want to run the end-to-end bench on a devnet. The changes are here

Out of scope

  • Receipt Bloom and Logs are intentionally not populated yet. No heimdall consumer reads them today; will be a small follow-up when needed.
  • GetTdByNumber tag resolution (latest/pending) via HeaderByNumber → GetTd(hash) is scoped to this gRPC handler; the same defect in upstream eth/api_backend.go:GetTdByNumber affects the JSON-RPC path and is tracked separately.

Copilot AI review requested due to automatic review settings April 23, 2026 14:07
Copy link
Copy Markdown

@claude claude Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude Code Review

This repository is configured for manual code reviews. Comment @claude review to trigger a review and subscribe this PR to future pushes, or @claude review once for a one-time review.

Tip: disable this comment in your organization's Code Review settings.

@marcello33
Copy link
Copy Markdown
Collaborator Author

@claude review.

@claude
Copy link
Copy Markdown

claude Bot commented Apr 23, 2026

Claude finished @marcello33's task in 3m 47s —— View job


Code Review: gRPC Implementation

  • Gather context and understand changes
  • Review gRPC server and auth changes (internal/cli/server/)
  • Review consensus changes (consensus/bor/)
  • Review configuration and flag changes
  • Review test files
  • Review dependency changes
  • Post final review feedback

Overall this is a solid WIP. The auth interceptor, API caching, and header expansion are well-designed. I found one security-sensitive bug and a few smaller items.


Finding 1 — uint64 overflow in GetBlockInfoInBatch loop (HIGH)

File: internal/cli/server/api_service.go:212

for i := req.StartBlockNumber; i <= req.EndBlockNumber; i++ {

When EndBlockNumber == math.MaxUint64, i++ wraps to 0 after the last iteration, and 0 <= MaxUint64 is true — the loop restarts from block 0 and scans the entire chain.

The batch-size check at line 201 doesn't catch this. Example: start = MaxUint64 - 100, end = MaxUint64 passes validation (100 < 256) but the loop wraps.

This is RPC-triggerable (gRPC endpoint). Each iteration does 3 backend calls (header, TD, author), so the loop can burn significant resources before hitting a missing block.

Suggested fix — use a count-based loop:

count := req.EndBlockNumber - req.StartBlockNumber + 1
for j := uint64(0); j < count; j++ {
    info, ok := s.fetchBlockInfo(ctx, req.StartBlockNumber+j)
    if !ok {
        break
    }
    out.Blocks = append(out.Blocks, info)
}

Fix this →

Note: the existing test TestGetBlockInfoInBatch_RangeOverflow tests start=0, end=MaxUint64 which is caught by the batch-size check. You'd need a test for start=MaxUint64-100, end=MaxUint64 to cover the wrap-around case.


Finding 2 — gRPC auth token visible in process listing (MEDIUM)

File: internal/cli/server/flags.go:1229-1234

The --grpc.token CLI flag means the token appears in ps aux, /proc/<pid>/cmdline, and shell history. Consider supporting an environment variable (e.g., BOR_GRPC_TOKEN) or file-path-based token reading (--grpc.token-file) as the primary mechanism, with the flag as a fallback.


Finding 3 — Shared *uint64 pointers in headerToProtoBorHeader (LOW)

File: internal/cli/server/api_service.go:67-68

out.BlobGasUsed = h.BlobGasUsed
out.ExcessBlobGas = h.ExcessBlobGas

These share the pointer from the source header to the proto output. After proto serialization (the production path) this is harmless, but for any in-process usage the proto message and the types.Header alias the same *uint64. A defensive copy would be consistent with how ExtraData is handled on line 52 (append([]byte(nil), h.Extra...)):

if h.BlobGasUsed != nil {
    v := *h.BlobGasUsed
    out.BlobGasUsed = &v
}

Finding 4 — Missing context cancellation check in batch loop (LOW)

File: internal/cli/server/api_service.go:212-221

The loop doesn't check ctx.Done(). With up to 256 iterations, a cancelled context keeps calling backend methods until they individually fail or all blocks are fetched. A select on ctx.Done() at the top of the loop would give faster cancellation:

select {
case <-ctx.Done():
    return nil, ctx.Err()
default:
}

Finding 5 — Redundant hash conversion in GetTdByHash (LOW)

File: internal/cli/server/api_service.go:169-170

hashBytes := protoutil.ConvertH256ToHash(req.Hash)
hash := common.BytesToHash(hashBytes[:])

ConvertH256ToHash already returns a [32]byte which can be assigned directly to common.Hash (same underlying type). The second conversion is redundant.


Finding 6 — data.csv row reordering (INFO)

File: eth/tracers/data.csv

The diff shows pure row reordering within each transaction group (same data, different order). This looks like non-deterministic map iteration in the test data generator. Worth verifying this isn't a concurrency issue — if this file is committed to the repo as test fixture, non-deterministic ordering causes noisy diffs.


Positive observations

  • Auth interceptor: Good use of subtle.ConstantTimeCompare. Properly gates both unary and stream RPCs. Error messages are appropriately vague ("invalid token" vs leaking details).
  • API caching via sync.Once (consensus/bor/bor.go:1532-1535): Correctly solves the per-call cache invalidation issue. GetRootHash and GetVoteOnHash in eth/bor_api_backend.go call APIs() on every invocation, so without caching the rootHashCache LRU inside *API would be thrown away each time.
  • headerToProtoBorHeader expansion: Complete field mapping with proper nil guards for post-Shanghai/Cancun optional fields. The round-trip tests (including the zero-blob-gas edge case) are well-motivated.
  • Batch endpoint design: Sensible maxBlockInfoBatchSize limit, gap-stop semantics, and genesis-aware author handling.
  • Test coverage: Comprehensive unit tests for auth (all error paths + constant-time check), batch boundary conditions, overflow, and header round-trips across fork eras.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This WIP PR extends the gRPC implementation by adding bearer-token authentication and expanding the Bor gRPC API surface, while aligning Bor’s RPC API lifecycle with gRPC’s per-call access pattern.

Changes:

  • Add optional gRPC bearer-token auth + unified unary/stream interceptors (also gating reflection).
  • Expand BorApi gRPC handlers and improve header→proto conversion + batch endpoint size gating.
  • Cache Bor’s *API instance across APIs() calls to preserve per-API state; update deps/tests accordingly.

Reviewed changes

Copilot reviewed 11 out of 13 changed files in this pull request and generated 7 comments.

Show a summary per file
File Description
internal/cli/server/server.go Add combined unary/stream interceptors and metadata-based bearer auth.
internal/cli/server/config.go Add GRPC.Token config field.
internal/cli/server/flags.go Add --grpc.token CLI flag.
internal/cli/server/grpc_auth_test.go Add unit tests for auth + interceptors.
internal/cli/server/api_service.go Add/extend BorApi gRPC handlers, header conversion, and batch endpoint with size gate.
internal/cli/server/api_service_test.go Add tests for header round-trip and batch-range guards.
consensus/bor/bor.go Cache the Bor *API returned by APIs() across calls.
consensus/bor/bor_test.go Add test asserting APIs() returns the same service instance across calls.
go.mod / go.sum Bump github.com/0xPolygon/polyproto to a newer pseudo-version.
cmd/keeper/go.mod / cmd/keeper/go.sum Bump indirect golang.org/x/sys.
eth/tracers/data.csv Reorder/update CSV rows.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread internal/cli/server/api_service_test.go
Comment thread internal/cli/server/api_service.go Outdated
Comment thread internal/cli/server/api_service.go
Comment thread internal/cli/server/api_service.go Outdated
Comment thread internal/cli/server/flags.go Outdated
Comment thread consensus/bor/bor.go Outdated
Comment thread internal/cli/server/api_service_test.go
@claude
Copy link
Copy Markdown

claude Bot commented Apr 23, 2026

Code Review

Bug: uint64 overflow causes infinite loop in GetBlockInfoInBatch (DoS)

File: internal/cli/server/api_service.go L197-L218

The for loop for i := req.StartBlockNumber; i <= req.EndBlockNumber; i++ at line 212 will never terminate when EndBlockNumber == math.MaxUint64. After i reaches MaxUint64, i++ wraps to 0 (unsigned integer overflow), and 0 <= MaxUint64 is always true.

The size gate does not prevent this. For example, StartBlockNumber = MaxUint64, EndBlockNumber = MaxUint64 has a difference of 0, which passes the >= 256 check.

Why fetchBlockInfo doesn't save us: rpc.BlockNumber(math.MaxUint64) casts to int64(-1) which is PendingBlockNumber — a valid block on mining/validator nodes. After wrapping to 0, the loop fetches from genesis through every block in the chain, causing unbounded memory allocation and CPU consumption.

Suggested fix: Restructure the loop to count iterations instead of iterating on the block number directly:

count := req.EndBlockNumber - req.StartBlockNumber + 1
for j := uint64(0); j < count; j++ {
    info, ok := s.fetchBlockInfo(ctx, req.StartBlockNumber+j)
    if !ok {
        break
    }
    out.Blocks = append(out.Blocks, info)
}

This avoids the i <= MaxUint64 trap since j < count where count is bounded by the size gate.

Additional concern: fetchBlockInfo casts uint64 block numbers to rpc.BlockNumber (int64) without checking for overflow. Large uint64 values silently map to negative int64 values corresponding to special block numbers (-1 = Pending, -2 = Latest), which could return unexpected blocks.

Comment thread internal/cli/server/api_service.go
Comment thread internal/cli/server/api_service.go Outdated
@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 23, 2026

Codecov Report

❌ Patch coverage is 50.19011% with 131 lines in your changes missing coverage. Please review.
✅ Project coverage is 51.99%. Comparing base (7cd4e72) to head (a7ebf67).
⚠️ Report is 1 commits behind head on develop.

Files with missing lines Patch % Lines
internal/cli/server/api_service.go 40.22% 103 Missing and 4 partials ⚠️
internal/cli/server/utils.go 0.00% 10 Missing ⚠️
internal/cli/server/command.go 30.00% 6 Missing and 1 partial ⚠️
consensus/bor/bor.go 44.44% 4 Missing and 1 partial ⚠️
internal/cli/server/server.go 95.91% 2 Missing ⚠️

❌ Your patch check has failed because the patch coverage (50.19%) is below the target coverage (90.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files

Impacted file tree graph

@@             Coverage Diff             @@
##           develop    #2194      +/-   ##
===========================================
- Coverage    51.99%   51.99%   -0.01%     
===========================================
  Files          884      884              
  Lines       155503   155735     +232     
===========================================
+ Hits         80854    80973     +119     
- Misses       69448    69552     +104     
- Partials      5201     5210       +9     
Files with missing lines Coverage Δ
consensus/bor/api.go 86.12% <ø> (ø)
internal/cli/server/config.go 63.84% <ø> (ø)
internal/cli/server/flags.go 100.00% <100.00%> (ø)
internal/cli/server/server.go 37.39% <95.91%> (+6.87%) ⬆️
consensus/bor/bor.go 83.35% <44.44%> (-0.29%) ⬇️
internal/cli/server/command.go 37.70% <30.00%> (-0.45%) ⬇️
internal/cli/server/utils.go 0.00% <0.00%> (ø)
internal/cli/server/api_service.go 31.34% <40.22%> (+31.34%) ⬆️

... and 18 files with indirect coverage changes

Files with missing lines Coverage Δ
consensus/bor/api.go 86.12% <ø> (ø)
internal/cli/server/config.go 63.84% <ø> (ø)
internal/cli/server/flags.go 100.00% <100.00%> (ø)
internal/cli/server/server.go 37.39% <95.91%> (+6.87%) ⬆️
consensus/bor/bor.go 83.35% <44.44%> (-0.29%) ⬇️
internal/cli/server/command.go 37.70% <30.00%> (-0.45%) ⬇️
internal/cli/server/utils.go 0.00% <0.00%> (ø)
internal/cli/server/api_service.go 31.34% <40.22%> (+31.34%) ⬆️

... and 18 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@marcello33
Copy link
Copy Markdown
Collaborator Author

@claude addressed your points. Review again.

@claude
Copy link
Copy Markdown

claude Bot commented Apr 23, 2026

Claude finished @marcello33's task in 2m 4s —— View job


Re-review: gRPC Implementation (post fe62ec8)

  • Gather latest changes and compare against previous findings
  • Review fix for uint64 overflow in GetBlockInfoInBatch (Finding 1)
  • Review fix for shared pointer / slice aliasing issues (Finding 3 + Copilot)
  • Review fix for fetchBlockInfo uint64→int64 cast (Copilot)
  • Review fix for flag help text (Copilot)
  • Review fix for initializeRootHashCache race (Copilot)
  • Review fix for genesis author comment (Finding from claude review)
  • Check for any remaining issues
  • Post final review feedback

All the review comments from the first round have been addressed in fe62ec8. Here's how each fix landed:

Verified fixes

1. uint64 overflow in GetBlockInfoInBatch — Fixed (api_service.go:204-234)

The loop is now count-based (for j := uint64(0); j < count; j++) which cannot wrap. An explicit req.EndBlockNumber > math.MaxInt64 guard at line 211 rejects near-MaxUint64 ranges before they reach the loop. A new test TestGetBlockInfoInBatch_NearMaxUint64 covers the previously-missing boundary (MaxUint64-3, MaxUint64). Context cancellation check (ctx.Err()) was also added inside the loop. All good.

2. Shared *uint64 pointers — Fixed (api_service.go:69-76)

BlobGasUsed and ExcessBlobGas are now copied through a fresh variable, consistent with the rest of the conversion.

3. Bloom/Nonce slice aliasing — Fixed (api_service.go:49,54)

Both now use append([]byte(nil), ...) to create independent copies.

4. fetchBlockInfo int64 overflow guard — Fixed (api_service.go:242-244)

Added if blockNum > math.MaxInt64 { return nil, false } before the cast. Belt-and-suspenders with the GetBlockInfoInBatch guard.

5. Flag help text — Fixed (flags.go:1229)

Clear explanation of Bearer prefix stripping and recommendation to use BOR_GRPC_TOKEN env var.

6. BOR_GRPC_TOKEN env var — Added (command.go:137-142)

Good fallback that addresses the token-in-process-listing concern from Finding 2.

7. initializeRootHashCache race — Fixed (bor.go:1537-1542)

Cache is now initialized eagerly inside apiOnce.Do before c.api is assigned.

8. Genesis author comment — Fixed (api_service.go:239-240)

Comment now accurately says "nil *H160" and instructs callers to nil-check.

9. Redundant hash conversion in GetTdByHash — Fixed (api_service.go:177)

Now uses a direct type conversion: common.Hash(protoutil.ConvertH256ToHash(req.Hash)).


New observations on fe62ec8

1. panic in APIs() on cache init failure — Worth discussing (consensus/bor/bor.go:1539-1541)

if err := a.initializeRootHashCache(); err != nil {
    panic(fmt.Errorf("bor: failed to initialize rootHashCache: %w", err))
}

The initializeRootHashCache method creates an ARC LRU cache of size 10 — in practice, lru.NewARC(10) only fails if size <= 0, which can't happen here with a literal 10. So this panic is unreachable, and the defensive choice is fine. However, if the caching library ever changes failure modes, a panic in APIs() (called during node startup) would crash the node with a stack trace instead of a clean error message. Since APIs() returns []rpc.API with no error channel, this is about the only option — just noting it for awareness.

2. initializeRootHashCache is now called in two places (bor.go:1539 and api.go:282)

The eager call in apiOnce.Do guarantees the cache exists before any handler runs. The existing call in GetRootHash (api.go:282) is now redundant — it will always find rootHashCache != nil and be a no-op. Not a bug (the nil-check makes it idempotent), but you could clean it up in a follow-up to avoid confusion about which call actually initializes the cache.

3. getRpcBlockNumberFromString overflow check — Good addition (api_service.go:290-292)

The blckNum > math.MaxInt64 guard in the string parser ensures HeaderByNumber, GetAuthor, and GetTdByNumber (all of which take string block numbers) are also protected from the uint64→int64 reinterpretation. This was not flagged in the first review but is a nice improvement.


Summary

All HIGH and MEDIUM findings from the first review are resolved. The code is in good shape — the uint64 overflow was the most important fix and it's landed correctly with both the count-based loop and the MaxInt64 boundary check. The remaining items above are minor observations, not blockers.


View job

Comment thread internal/cli/server/api_service.go
Comment thread internal/cli/server/api_service.go Outdated
Comment thread internal/cli/server/api_service.go
Copilot AI review requested due to automatic review settings April 23, 2026 17:15
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 13 out of 15 changed files in this pull request and generated 4 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread internal/cli/server/server.go
Comment thread internal/cli/server/server.go Outdated
Comment thread internal/cli/server/server.go
Comment thread internal/cli/server/api_service.go Outdated
@marcello33
Copy link
Copy Markdown
Collaborator Author

@claude addressed your points from the previous run.
Review once more.

@claude
Copy link
Copy Markdown

claude Bot commented Apr 23, 2026

Claude finished @marcello33's task in 1m 53s —— View job


Re-review: gRPC Implementation (3rd pass — post edafc70)

  • Gather latest changes since last review (fe62ec8edafc70)
  • Review fixes for nil guard on GetTdByHash (claude finding)
  • Review fixes for fetchBlockInfo error handling (claude finding)
  • Review fixes for GetTdByNumber tag resolution (claude finding)
  • Review fixes for gRPC status codes (copilot finding)
  • Review fixes for auth interceptor issues (copilot findings)
  • Check for any new or remaining issues
  • Post final review feedback

All findings from the second review round have been addressed in edafc70. Here's the verification:

Verified fixes

1. Nil guard on GetTdByHash / TransactionReceipt / BorBlockReceipt — Fixed (api_service.go:22-27, 138-142, 161-165, 196-200)

A new protoHashToCommon helper centralizes the nil check for H256 and its Hi/Lo sub-messages, returning codes.InvalidArgument on malformed input. All three hash-consuming handlers now use it. This matches the guard the test helper already had (convH at line 224-230) and closes the panic-on-nil-hash path cleanly. Good factoring — one guard, three call sites.

2. fetchBlockInfo error handling — Fixed (api_service.go:272-316)

The function now returns (info, bool, error) with clear three-way semantics documented in the comment:

  • (info, true, nil) → success
  • (nil, false, nil) → legitimate gap (header missing / TD not indexed)
  • (nil, false, err) → real failure (backend error, ecrecover, overflow)

The critical fix: Author() failure is now propagated as an error instead of being silently swallowed as a gap. GetBlockInfoInBatch correctly short-circuits on real errors at line 257-259. Well done.

3. GetTdByNumber tag resolution — Fixed (api_service.go:212-233)

The handler now resolves the block number tag to a concrete header via HeaderByNumber first, then calls GetTd(ctx, header.Hash()) — avoiding the pre-existing upstream bug where GetTdByNumber passed the raw negative tag cast to uint64 as the block number key. This makes the gRPC endpoint deterministic regardless of TD cache state.

4. initializeRootHashCache dual-call documentation — Addressed (consensus/bor/api.go:280-282)

The GetRootHash comment now explains why the lazy init is kept: "fallback for direct-API paths (e.g., tests) that don't go through Bor.APIs." Good — this makes the intentional redundancy explicit.

Remaining observations (non-blocking)

1. GetBlockInfoInBatch validation errors use plain errors.New instead of status.Error(codes.InvalidArgument, ...) — LOW

File: api_service.go:238-245

The three validation checks (end < start, exceeds max batch size, exceeds max int64) return errors.New(...). In gRPC, plain errors become codes.Unknown on the wire. The fetchBlockInfo function already correctly uses status.Error(codes.InvalidArgument, ...) at line 281, and protoHashToCommon does the same at line 24. Making the validation errors consistent would let clients distinguish bad input from internal failures. Copilot flagged this in the latest review as well.

Fix this →

2. Auth interceptor: rejected requests are not logged — LOW

File: server.go:466-469

The comment says the interceptor "always logs the request duration", but when auth fails the function returns before start := time.Now() and the log.Trace call (line 471-473). Rejected auth attempts are therefore invisible in logs. This was flagged by Copilot in the latest round. Not a bug, but a monitoring gap — consider either logging rejected attempts (at a different level/message) or updating the comment.

3. Case-sensitive Bearer prefix check — INFO

File: server.go:508

Per RFC 6750, the auth scheme name is case-insensitive (bearer, BEARER, Bearer should all work). The current strings.HasPrefix(h, "Bearer ") rejects non-canonical casing. For an internal gRPC endpoint consumed by heimdall (which you control), this is fine — just document that only Bearer (capitalized) is accepted, or switch to strings.EqualFold on the prefix if broader client compatibility is desired.

4. subtle.ConstantTimeCompare timing note — INFO

File: server.go:512

The function comment says "constant-time comparison", which is accurate for equal-length inputs. subtle.ConstantTimeCompare does return immediately when lengths differ. Since the token is compared after stripping the Bearer prefix, an attacker could probe for the token length via timing. In practice this is irrelevant for a gRPC token on a local/internal endpoint, but if you want full constant-time guarantees, compare sha256.Sum256 of both sides instead.


Summary

The PR is in good shape. All HIGH and MEDIUM findings from the first two review rounds are resolved. The fetchBlockInfo three-way return is well-designed, the protoHashToCommon helper is a clean factoring, and the GetTdByNumber tag resolution fix sidesteps a real upstream bug. The remaining items are minor consistency and documentation nits — none are blockers.


View job

@marcello33
Copy link
Copy Markdown
Collaborator Author

@claude addressed latest comments. Review.

@claude
Copy link
Copy Markdown

claude Bot commented Apr 23, 2026

Claude finished @marcello33's task in 1m 16s —— View job


Re-review: gRPC Implementation (4th pass — post 43b6efc)

  • Gather latest changes since last review (edafc7043b6efc)
  • Verify fix for GetBlockInfoInBatch validation errors (gRPC status codes)
  • Verify fix for auth interceptor logging on rejected requests
  • Verify fix for subtle.ConstantTimeCompare timing concerns
  • Verify fix for case-insensitive Bearer prefix
  • Check for any new or remaining issues
  • Post final review feedback

All findings from the 3rd review round have been addressed in 43b6efc. Here's the verification:

Verified fixes

1. GetBlockInfoInBatch validation errors now use gRPC status codes — Fixed (api_service.go:239-247)

All three validation errors (end < start, exceeds max batch size, end exceeds max int64) now return status.Error(codes.InvalidArgument, ...) instead of plain errors.New(...). This is consistent with fetchBlockInfo (line 283) and protoHashToCommon (line 24). Clients can now reliably distinguish bad input from internal failures.

2. Auth interceptor now logs rejected requests — Fixed (server.go:470, 490)

Both combinedUnaryInterceptor and combinedStreamInterceptor now log at Debug level when auth fails, before returning the error. The comments have been updated to accurately describe the logging behavior (Debug for rejections, Trace for successful calls). Good distinction between the two levels.

3. subtle.ConstantTimeCompare comment updated — Fixed (server.go:501-505)

The function comment now honestly states that ConstantTimeCompare is constant-time for equal-length inputs and that length-mismatched inputs short-circuit. No overstated guarantees.

4. Case-insensitive Bearer prefix — Fixed (server.go:517)

strings.HasPrefix(h, "Bearer ") replaced with len(h) < len(prefix) || !strings.EqualFold(h[:len(prefix)], prefix). This correctly handles bearer, BEARER, and other casings per RFC 6750 §2.1 while still extracting the token bytes after the 7-character prefix.

Remaining observations (non-blocking)

1. No test for case-insensitive Bearer prefix — INFO

The auth tests all use "Bearer secret" (canonical casing). Since this was a notable behavioral change, a test case with "bearer secret" or "BEARER secret" would document the RFC 6750 compliance and prevent regression. One-liner addition to the existing test suite.

2. errors.New vs status.Error inconsistency in other handlers — INFO

The GetBlockInfoInBatch validation now correctly uses status.Error(codes.InvalidArgument, ...), but several other handlers still return errors.New(...) for user-facing errors:

  • HeaderByNumber line 109: errors.New("header not found") → would be codes.NotFound
  • BlockByNumber line 126: errors.New("block not found") → would be codes.NotFound
  • TransactionReceipt lines 151, 155: errors.New("no receipts found"), errors.New("transaction index out of bounds")codes.NotFound / codes.InvalidArgument
  • GetTdByHash line 204: errors.New("total difficulty not found")codes.NotFound
  • GetTdByNumber lines 224, 228: same pattern

These all surface as codes.Unknown on the wire today. Not a bug — heimdall likely treats any non-nil error as a failure — but converting them to proper gRPC status codes in a follow-up would make the API more standard. Low priority.


Summary

All findings from the 3rd review round are resolved. The PR is in good shape — the gRPC status codes, auth logging, RFC 6750 compliance, and timing-guarantee documentation are all properly landed. The only remaining items are minor consistency nits (test coverage for case-insensitive Bearer, and gRPC status codes in other handlers), neither of which are blockers. LGTM for the scope of this WIP.


View job

@claude
Copy link
Copy Markdown

claude Bot commented Apr 23, 2026

Code review

No issues found. Checked for bugs and CLAUDE.md compliance.

Copilot AI review requested due to automatic review settings April 23, 2026 18:40
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 13 out of 15 changed files in this pull request and generated 1 comment.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread internal/cli/server/server.go
Comment thread internal/cli/server/api_service.go
Comment thread internal/cli/server/api_service.go
@claude
Copy link
Copy Markdown

claude Bot commented Apr 23, 2026

Code review

3 issues found. Checked for bugs and CLAUDE.md compliance.

1. Bug: new() called with value expressions — test file will not compile

File: internal/cli/server/api_service_test.goL46-L52, L78-L81, L260-L271

Go's new builtin has the signature func new(Type) *Type — it accepts a type, not a value expression. new(uint64(131072)) and new(common.HexToHash(...)) are compile errors. This affects 9 call sites, preventing the entire test file from compiling and all new tests from running.

Fix: use a generic helper func ptr[T any](v T) *T { return &v } and replace e.g. new(uint64(131072)) with ptr(uint64(131072)).

2. CLAUDE.md violation: panic() in consensus engine path

File: consensus/bor/bor.goL1538-L1542

The panic in APIs() violates security-common.md ("No panics in consensus, sync, or block production paths — return errors instead") and consensus-security.md ("Consensus code must never crash the node"). Consider using log.Crit instead.

3. CLAUDE.md violation: Bearer token exposed via CLI flag (visible in /proc)

File: internal/cli/server/flags.goL1227-L1232

The --grpc.token flag accepts a bearer secret on the command line, visible in /proc/<pid>/cmdline. rpc-security.md flags this pattern at MEDIUM severity. The env var fallback is good; consider either removing the flag or emitting a log.Warn when the token is sourced from CLI.

@marcello33
Copy link
Copy Markdown
Collaborator Author

@claude wrt #2194 (comment)
1 is not an issue: this is allowed in go 1.26+
2 and 3 are addressed.

@claude
Copy link
Copy Markdown

claude Bot commented Apr 23, 2026

Claude finished @marcello33's task in 54s —— View job


Re-review: Verifying latest fixes (5th pass — post 5d681d7)

  • Acknowledge new(expr) is valid in Go 1.26+
  • Verify fix for panic() in APIs() (Finding 2)
  • Verify fix for Bearer token CLI flag exposure (Finding 3)
  • Verify additional fixes in 5d681d7
  • Post final review feedback

Understood on new(expr) — Go 1.26 supports new with value expressions. Apologies for the false positive.

Verified fixes in 5d681d7

1. panic() replaced with log.Crit — Fixed (consensus/bor/bor.go:1539-1542)

panic(fmt.Errorf(...)) replaced with log.Crit(...). The comment correctly notes this path is unreachable (constant size arg to lru.NewARC), and log.Crit is the appropriate pattern for fatal initialization errors in consensus code per consensus-security.md.

2. Bearer token CLI flag warning — Fixed (command.go:137-146)

When the token is sourced from --grpc.token or TOML config (i.e., visible in /proc), a log.Warn is now emitted directing operators to prefer BOR_GRPC_TOKEN. The env-var fallback was already in place from the prior round.

3. TransactionReceipt now checks found bool — Fixed (api_service.go:142-144)

GetTransaction return's found bool is now captured and returns codes.NotFound for missing transactions, instead of falling through to the misleading "no receipts found" error.

4. ConvertReceiptToProtoReceipt nil guards — Fixed (utils.go:64-70)

Both EffectiveGasPrice and BlockNumber now have nil guards before calling .Int64(), closing the pre-existing panic path on BorBlockReceipt where DeriveFieldsForBorReceipt doesn't populate these fields.

Summary

All findings from the previous rounds are resolved. No new issues found. LGTM.


View job

Comment thread eth/tracers/data.csv
Copilot AI review requested due to automatic review settings April 23, 2026 20:39
@marcello33 marcello33 review requested due to automatic review settings April 23, 2026 20:40
@sonarqubecloud
Copy link
Copy Markdown

sonarqubecloud Bot commented Apr 23, 2026

Quality Gate Passed Quality Gate passed

Issues
0 New issues
4 Accepted issues

Measures
0 Security Hotspots
No data about Coverage
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

@marcello33 marcello33 changed the title [WIP] grpc implementation full grpc implementation Apr 24, 2026
@marcello33 marcello33 changed the title full grpc implementation consensus/bor, internal/cli: full grpc implementation Apr 24, 2026
@marcello33 marcello33 requested a review from a team April 24, 2026 08:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants