Add metric to crit invariants #1037

Open
RensR wants to merge 3 commits into main from upgrade-errors-to-crit
RensR (Collaborator) commented Apr 21, 2026

No description provided.

RensR force-pushed the upgrade-errors-to-crit branch from 57f72ae to b07e4f2 on April 21, 2026 11:37
RensR force-pushed the upgrade-errors-to-crit branch from b07e4f2 to 6cb7828 on April 21, 2026 11:50

// Explicitly check for the expected number of topics
if len(log.Topics) < 4 {
r.onCriticalInvariant(ctx)
RensR (Collaborator, Author):

Adding it to all errors, but some error logs are promoted to fatals so that they fail loudly. The upgraded ones are those that would result from a malicious ramp. The ones that still just skip (but do send a metric!) are wrong payloads, but not in a clearly malicious way.

Let me know if everything should be fatal.

}

results = append(results, protocol.MessageSentEvent{
MessageID: protocol.Bytes32(event.MessageId),
RensR (Collaborator, Author):

Casts not needed

KodeyThomas previously approved these changes Apr 21, 2026
RensR marked this pull request as ready for review April 21, 2026 12:57
RensR requested a review from skudasov as a code owner April 21, 2026 12:57
Copilot AI review requested due to automatic review settings April 21, 2026 12:57
RensR requested a review from a team as a code owner April 21, 2026 12:57
RensR enabled auto-merge April 21, 2026 12:57
Copilot AI (Contributor) left a comment

Pull request overview

Adds a new verifier metric for “critical source invariant violations” and wires it through the monitoring interface so the EVM source reader can emit it when encountering critical invariants / malformed source-chain message data.

Changes:

  • Extend MetricLabeler with IncrementCriticalSourceInvariantViolations and implement it across real/noop/fake/mock labelers.
  • Register a new OTel counter (verifier_critical_source_invariant_violations_total) and add the corresponding labeler method implementation.
  • Thread a callback into NewEVMSourceReader and invoke it on several critical validation / decoding paths.
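
The callback plumbing summarized above can be pictured with a small, standard-library-only sketch. Everything here apart from the method name IncrementCriticalSourceInvariantViolations is hypothetical: the trimmed-down interface, the `countingLabeler` fake, `sourceReader`, and `countValid` are illustrative stand-ins, not the types in this PR.

```go
package main

import (
	"context"
	"fmt"
	"sync/atomic"
)

// MetricLabeler is a hypothetical, trimmed-down stand-in for the verifier's
// monitoring interface; only the method added in this PR is shown.
type MetricLabeler interface {
	IncrementCriticalSourceInvariantViolations(ctx context.Context)
}

// noopLabeler discards every increment (useful where metrics are irrelevant).
type noopLabeler struct{}

func (noopLabeler) IncrementCriticalSourceInvariantViolations(context.Context) {}

// countingLabeler is a fake that records how often the metric fired.
type countingLabeler struct{ violations atomic.Int64 }

func (c *countingLabeler) IncrementCriticalSourceInvariantViolations(context.Context) {
	c.violations.Add(1)
}

// Compile-time checks that both implementations satisfy the interface.
var (
	_ MetricLabeler = noopLabeler{}
	_ MetricLabeler = (*countingLabeler)(nil)
)

// sourceReader is a hypothetical reader that only knows about a callback,
// so it does not depend on the monitoring package directly.
type sourceReader struct {
	onCriticalInvariant func(ctx context.Context)
}

func newSourceReader(onCriticalInvariant func(ctx context.Context)) *sourceReader {
	if onCriticalInvariant == nil {
		onCriticalInvariant = func(context.Context) {} // default to a no-op
	}
	return &sourceReader{onCriticalInvariant: onCriticalInvariant}
}

// countValid counts logs with at least minTopics topics and reports every
// other log through the callback before skipping it.
func (r *sourceReader) countValid(ctx context.Context, topicCounts []int, minTopics int) int {
	valid := 0
	for _, n := range topicCounts {
		if n < minTopics {
			r.onCriticalInvariant(ctx)
			continue // to next log
		}
		valid++
	}
	return valid
}

// runExample wires the fake labeler into the reader via a method value.
func runExample() (valid int, violations int64) {
	labeler := &countingLabeler{}
	reader := newSourceReader(labeler.IncrementCriticalSourceInvariantViolations)
	valid = reader.countValid(context.Background(), []int{4, 2, 4, 1}, 4)
	return valid, labeler.violations.Load()
}

func main() {
	valid, violations := runExample()
	fmt.Println(valid, violations)
}
```

Passing a plain function rather than the whole labeler keeps the reader testable with any counter, at the cost of one extra constructor parameter.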

Reviewed changes

Copilot reviewed 9 out of 10 changed files in this pull request and generated 3 comments.

| File | Description |
|---|---|
| verifier/testutil/metric_labeler.go | Add noop implementation for the new metric method. |
| verifier/pkg/vtypes/interfaces.go | Extend monitoring interface with the new metric method + doc comment. |
| verifier/pkg/monitoring/monitoring.go | Add fake labeler implementation for the new metric method. |
| verifier/pkg/monitoring/metrics.go | Register new counter + expose increment method on VerifierMetricLabeler. |
| verifier/pkg/helpers_test.go | Update local noop labeler used in tests to satisfy interface. |
| verifier/internal/mocks/mock_MetricLabeler.go | Update generated mock with new method/expecter. |
| integration/pkg/constructors/committee_verifier.go | Provide per-chain labeled metric labeler and pass callback into EVM source reader. |
| integration/pkg/accessors/evm/factory.go | Update EVM source reader constructor call with new parameter. |
| integration/pkg/accessors/evm/evm_source_reader.go | Add callback plumbing and invoke it on critical validation/decoding paths. |
| .gitignore | Normalize build ignore entry and add coverage.out. |

Comments suppressed due to low confidence (1)

integration/pkg/accessors/evm/evm_source_reader.go:174

  • onCriticalInvariant (and the new IncrementCriticalSourceInvariantViolations metric) is triggered here for malformed logs (insufficient topics). That doesn’t match the interface doc that describes disagreement with configured on-chain facts; it will make the metric hard to interpret and may inflate counts due to decode/parsing issues. Consider either narrowing invocation to true invariant mismatches, or broadening/renaming the metric+doc to reflect all parse/validation failures it covers.
		// Explicitly check for the expected number of topics
		if len(log.Topics) < 4 {
			r.onCriticalInvariant(ctx)
			r.lggr.Errorw("CCIPMessageSent event has insufficient topics",
				"expected", 4,
				"actual", len(log.Topics),
				"blockNumber", log.BlockNumber,
				"txHash", log.TxHash.Hex())
			continue // to next message

// IncrementTaskVerificationPermanentErrors increments the counter for non-retryable verification errors.
IncrementTaskVerificationPermanentErrors(ctx context.Context)

// IncrementCriticalSourceInvariantViolations increments when encoded source-chain data disagrees with configured on-chain facts (e.g. onRamp in message vs observed contract).

Copilot AI commented Apr 21, 2026

The doc comment for IncrementCriticalSourceInvariantViolations is narrowly scoped to encoded data disagreeing with configured on-chain facts, but current usage also covers decode/unpack/structural validation failures in the EVM source reader. Either update this comment (and metric description) to reflect all intended triggers, or introduce separate metrics so dashboards/alerts are not ambiguous.

Suggested change
- // IncrementCriticalSourceInvariantViolations increments when encoded source-chain data disagrees with configured on-chain facts (e.g. onRamp in message vs observed contract).
+ // IncrementCriticalSourceInvariantViolations increments the counter for critical source-reader
+ // validation failures, including when encoded source-chain data disagrees with configured
+ // or observed on-chain facts, as well as decode, unpack, or structural validation failures
+ // while interpreting source-chain data.
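
The other option raised in this comment, separate metrics per trigger, might look roughly like the following sketch. It is not part of the PR: the `ViolationReason` type, the two reason values, and `violationCounter` are hypothetical names, and a real implementation would likely attach the reason as a metric attribute rather than keep an in-memory map.

```go
package main

import (
	"fmt"
	"sync"
)

// ViolationReason is a hypothetical label separating the two kinds of
// trigger discussed in the review.
type ViolationReason string

const (
	// Encoded data disagrees with configured on-chain facts (e.g. onRamp).
	ReasonInvariantMismatch ViolationReason = "invariant_mismatch"
	// Decode, unpack, or structural failures (e.g. too few topics).
	ReasonMalformedPayload ViolationReason = "malformed_payload"
)

// violationCounter keeps one count per reason so dashboards and alerts can
// tell the two cases apart.
type violationCounter struct {
	mu     sync.Mutex
	counts map[ViolationReason]int64
}

func newViolationCounter() *violationCounter {
	return &violationCounter{counts: make(map[ViolationReason]int64)}
}

// Increment adds one to the counter for the given reason.
func (c *violationCounter) Increment(reason ViolationReason) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.counts[reason]++
}

// Count returns the current value for the given reason.
func (c *violationCounter) Count(reason ViolationReason) int64 {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.counts[reason]
}

// runExample records two malformed payloads and one invariant mismatch.
func runExample() (mismatches, malformed int64) {
	c := newViolationCounter()
	c.Increment(ReasonMalformedPayload)
	c.Increment(ReasonMalformedPayload)
	c.Increment(ReasonInvariantMismatch)
	return c.Count(ReasonInvariantMismatch), c.Count(ReasonMalformedPayload)
}

func main() {
	mismatches, malformed := runExample()
	fmt.Println(mismatches, malformed)
}
```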

Comment on lines 291 to 297
  results = append(results, protocol.MessageSentEvent{
- MessageID: protocol.Bytes32(event.MessageId),
+ MessageID: event.MessageId,
  Message: *decodedMsg,
  Receipts: allReceipts, // Keep original order from OnRamp event
  BlockNumber: log.BlockNumber,
- TxHash: protocol.ByteSlice(log.TxHash.Bytes()),
+ TxHash: log.TxHash.Bytes(),
  })

Copilot AI commented Apr 21, 2026

protocol.MessageSentEvent expects MessageID to be protocol.Bytes32 and TxHash to be protocol.ByteSlice (both defined types), so assigning event.MessageId ([32]byte) and log.TxHash.Bytes() ([]byte) will not compile. Convert both to the expected protocol types when building the struct literal.
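
Whether the conversions are needed depends on how the protocol types are declared, which the diff does not show. The sketch below assumes they are plain defined types over `[32]byte` and `[]byte`; the local `Bytes32`, `ByteSlice`, and `MessageSentEvent` declarations are stand-ins written for illustration, not the real package. Under that assumption, Go's assignability rules allow a value of an unnamed type to be assigned to a defined type with an identical underlying type, so the struct literal compiles without casts. If the real types were structs or wrapped the data differently, an explicit conversion would be required.

```go
package main

import "fmt"

// Assumed declarations; the real protocol package may define these differently.
type Bytes32 [32]byte
type ByteSlice []byte

// MessageSentEvent is a reduced stand-in with only the two fields in question.
type MessageSentEvent struct {
	MessageID Bytes32
	TxHash    ByteSlice
}

// buildEvent assigns unnamed-type values directly to defined-type fields.
// This compiles because [32]byte and []byte are type literals (not named
// types) whose underlying types match Bytes32 and ByteSlice respectively.
func buildEvent(messageID [32]byte, txHash []byte) MessageSentEvent {
	return MessageSentEvent{
		MessageID: messageID, // no protocol.Bytes32(...) conversion needed
		TxHash:    txHash,    // no protocol.ByteSlice(...) conversion needed
	}
}

// runExample reports whether the values survived the assignment unchanged.
func runExample() bool {
	var id [32]byte
	id[0] = 0xAB
	ev := buildEvent(id, []byte{0x01, 0x02})
	return ev.MessageID[0] == 0xAB && len(ev.TxHash) == 2 && ev.TxHash[1] == 0x02
}

func main() {
	fmt.Println(runExample())
}
```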

Comment on lines 255 to +259
  if !decodedMsg.OnRampAddress.Equal(expectedSourceAddressBytes(r.onRampAddress)) {
- r.lggr.Errorw("onRampAddress must match the value configured. This should never happen, if it does something is seriously wrong. Escalate immediately", "messageId", common.Bytes2Hex(event.MessageId[:]))
- continue // to next message
+ r.onCriticalInvariant(ctx)
+ r.lggr.Fatalw("onRampAddress must match the value configured — critical invariant violated; escalate immediately",
+ "messageId", common.Bytes2Hex(event.MessageId[:]))
+ continue // ensure we never process this msg

Copilot AI commented Apr 21, 2026

Using lggr.Fatalw inside FetchMessageSentEvents is risky because fatal logging typically terminates the process; this would turn a single bad event into a full service outage, and the subsequent continue becomes effectively dead code. Prefer returning an error (so the caller can decide), or logging at Error/Panic with clear context and skipping the message.
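
A minimal sketch of the "return an error so the caller can decide" option, using only the standard library. The `message` type, `filterMessages`, and `ErrCriticalInvariant` are hypothetical names chosen for illustration; the actual FetchMessageSentEvents signature and logger are not reproduced here.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrCriticalInvariant is a hypothetical sentinel that callers can match
// with errors.Is to decide whether to halt, alert, or retry.
var ErrCriticalInvariant = errors.New("critical source invariant violated")

// message is a reduced stand-in for a decoded source-chain message.
type message struct {
	ID     string
	OnRamp string
}

// filterMessages returns the messages whose onRamp matches the configured
// value. On the first mismatch it fires the metric callback and returns a
// wrapped sentinel error instead of terminating the process, so no message
// from the batch is processed and the caller chooses the response.
func filterMessages(msgs []message, expectedOnRamp string, onCritical func()) ([]message, error) {
	out := make([]message, 0, len(msgs))
	for _, m := range msgs {
		if m.OnRamp != expectedOnRamp {
			onCritical()
			return nil, fmt.Errorf("message %s: onRamp %q does not match configured %q: %w",
				m.ID, m.OnRamp, expectedOnRamp, ErrCriticalInvariant)
		}
		out = append(out, m)
	}
	return out, nil
}

// runExample feeds one good and one mismatching message through the filter.
func runExample() (isCritical bool, metricCalls int, returned int) {
	msgs := []message{
		{ID: "0x01", OnRamp: "0xaaa"},
		{ID: "0x02", OnRamp: "0xbbb"}, // does not match the configured onRamp
	}
	got, err := filterMessages(msgs, "0xaaa", func() { metricCalls++ })
	return errors.Is(err, ErrCriticalInvariant), metricCalls, len(got)
}

func main() {
	isCritical, metricCalls, returned := runExample()
	fmt.Println(isCritical, metricCalls, returned)
}
```

With this shape the metric still fires exactly once per violation, and the decision to crash the service moves to a caller that can see the wider context.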

carte7000 previously approved these changes Apr 23, 2026
# Conflicts:
#	integration/pkg/accessors/evm/factory.go
RensR dismissed stale reviews from carte7000 and KodeyThomas via 7c9ca13 April 23, 2026 14:08
github-actions commented

Code coverage report:

| Package | main | upgrade-errors-to-crit | diff |
|---|---|---|---|
| github.com/smartcontractkit/chainlink-ccv/aggregator | 48.44% | 48.46% | +0.02% |
| github.com/smartcontractkit/chainlink-ccv/bootstrap | 42.60% | 42.60% | +0.00% |
| github.com/smartcontractkit/chainlink-ccv/cli | 65.13% | 65.13% | +0.00% |
| github.com/smartcontractkit/chainlink-ccv/cmd | 0.00% | 0.00% | +0.00% |
| github.com/smartcontractkit/chainlink-ccv/common | 50.74% | 50.74% | +0.00% |
| github.com/smartcontractkit/chainlink-ccv/executor | 45.97% | 45.97% | +0.00% |
| github.com/smartcontractkit/chainlink-ccv/indexer | 37.69% | 37.68% | -0.01% |
| github.com/smartcontractkit/chainlink-ccv/integration | 46.82% | 46.82% | +0.00% |
| github.com/smartcontractkit/chainlink-ccv/pkg | 84.62% | 84.62% | +0.00% |
| github.com/smartcontractkit/chainlink-ccv/pricer | 0.00% | 0.00% | +0.00% |
| github.com/smartcontractkit/chainlink-ccv/protocol | 65.19% | 65.19% | +0.00% |
| github.com/smartcontractkit/chainlink-ccv/verifier | 32.46% | 32.46% | +0.00% |

4 participants