feat(metrics): add block transaction count and SR set change monitoring #6624
base: develop
Changes from 11 commits: 9b80f91, 1d2de1a, 1c0c1b7, bc87e2e, 6bece43, 39dd708, e92b8e2, 846e3d3, ff17a19, 7f351bf, 7fdb3c5, 7791944
```diff
@@ -14,6 +14,7 @@ class MetricsCounter {
   init(MetricKeys.Counter.TXS, "tron txs info .", "type", "detail");
   init(MetricKeys.Counter.MINER, "tron miner info .", "miner", "type");
   init(MetricKeys.Counter.BLOCK_FORK, "tron block fork info .", "type");
+  init(MetricKeys.Counter.SR_SET_CHANGE, "tron sr set change .", "action", "witness");
```
Collaborator
Practical impact assessment:
Possible approaches:
Author
Thanks for raising the high-cardinality concern — it's a valid Prometheus best practice to watch for. However, I believe the witness label is acceptable here given TRON's specific constraints:
In summary, the worst-case time series count is roughly candidate_pool_size × 2 (add/remove), which stays well within Prometheus's comfortable operating range. That said, I agree it's worth documenting this characteristic. Could you suggest what form you'd prefer — a code comment at the metric registration site, a note in the PR description, or something else?
Collaborator
prefer to
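To make the author's worst-case estimate (candidate_pool_size × 2 time series) concrete, here is a hypothetical, self-contained sketch. The class name, the pool size of 400, and the placeholder witness addresses are illustrative, not from the PR; it only models the fact that Prometheus creates one child series per distinct (action, witness) label pair:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch (not tron code): estimates the worst-case number of
// Prometheus time series created by a counter labeled with "action" and "witness".
public class SrSeriesEstimate {
    // One child time series exists per distinct (action, witness) pair.
    static final Map<String, Long> series = new HashMap<>();

    static void counterInc(String action, String witness) {
        series.merge(action + "|" + witness, 1L, Long::sum);
    }

    public static int worstCaseSeries(int candidatePoolSize) {
        series.clear();
        for (int i = 0; i < candidatePoolSize; i++) {
            String witness = "TWitness" + i;   // placeholder, not a real base58 address
            counterInc("add", witness);        // worst case: every candidate was both
            counterInc("remove", witness);     // added and removed at some point
        }
        return series.size();                  // candidatePoolSize * 2
    }

    public static void main(String[] args) {
        System.out.println(worstCaseSeries(400)); // prints 800
    }
}
```

Even with an unrealistically large candidate pool, the series count stays far below the cardinality levels (tens of thousands per metric) that typically cause Prometheus trouble.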
```diff
   init(MetricKeys.Counter.P2P_ERROR, "tron p2p error info .", "type");
   init(MetricKeys.Counter.P2P_DISCONNECT, "tron p2p disconnect .", "type");
   init(MetricKeys.Counter.INTERNAL_SERVICE_FAIL, "internal Service fail.",
```
```diff
@@ -3,10 +3,13 @@
 import com.codahale.metrics.Counter;
 import com.google.protobuf.ByteString;
 import java.util.ArrayList;
+import java.util.HashSet;
 import java.util.List;
 import java.util.Map;
+import java.util.Set;
 import java.util.SortedMap;
 import java.util.concurrent.ConcurrentHashMap;
+import java.util.stream.Collectors;
 import lombok.Getter;
 import lombok.Setter;
 import org.bouncycastle.util.encoders.Hex;
```
```diff
@@ -42,6 +45,10 @@ public class BlockChainMetricManager {
   private long failProcessBlockNum = 0;
   @Setter
   private String failProcessBlockReason = "";
+  // Only accessed from block processing thread (synchronized on blockLock)
+  private final Set<String> lastActiveWitnesses = new HashSet<>();
+  // To control SR set change metric update logic, -1 means not initialized
+  private long lastNextMaintenanceTime = -1;

   public BlockChainInfo getBlockChainInfo() {
     BlockChainInfo blockChainInfo = new BlockChainInfo();
```
```diff
@@ -164,11 +171,58 @@ public void applyBlock(BlockCapsule block) {
   }

   //TPS
-  if (block.getTransactions().size() > 0) {
-    MetricsUtil.meterMark(MetricsKey.BLOCKCHAIN_TPS, block.getTransactions().size());
-    Metrics.counterInc(MetricKeys.Counter.TXS, block.getTransactions().size(),
+  int txCount = block.getTransactions().size();
+  if (txCount > 0) {
+    MetricsUtil.meterMark(MetricsKey.BLOCKCHAIN_TPS, txCount);
+    Metrics.counterInc(MetricKeys.Counter.TXS, txCount,
       MetricLabels.Counter.TXS_SUCCESS, MetricLabels.Counter.TXS_SUCCESS);
   }
+  // Record transaction count distribution for all blocks (including empty blocks)
+  Metrics.histogramObserve(MetricKeys.Histogram.BLOCK_TRANSACTION_COUNT, txCount,
```
Collaborator
Author
Good catch. I actually noticed this ordering issue during my investigation —
However, this is not specific to the histogram I added — the existing metrics in the same method (e.g.,
If you agree, I'd like to open a separate issue to move
Collaborator
[SHOULD] If the primary goal is empty block monitoring, a simple Counter or Gauge would suffice. A Histogram is better suited when transaction count distribution analysis is actually needed — it can always be upgraded later.
Author
Actually, we discussed this question in issue #6590 and finally decided to implement
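A point in favor of the histogram choice: Prometheus histogram buckets are cumulative, so the le="0" bucket already is an empty-block counter, while the remaining buckets add distribution data. A minimal sketch in plain Java (no Prometheus dependency; the bucket bounds here are assumptions, not the ones used in the PR):

```java
import java.util.concurrent.atomic.LongAdder;

// Hypothetical sketch: a histogram of per-block transaction counts subsumes
// an "empty block" counter, because cumulative buckets count all values <= bound.
public class BlockTxHistogram {
    static final double[] UPPER_BOUNDS = {0, 10, 100, 500, Double.POSITIVE_INFINITY};
    final LongAdder[] buckets = new LongAdder[UPPER_BOUNDS.length];
    final LongAdder count = new LongAdder();

    BlockTxHistogram() {
        for (int i = 0; i < buckets.length; i++) buckets[i] = new LongAdder();
    }

    void observe(int txCount) {
        count.increment();
        // Cumulative semantics: every bucket whose bound >= value is incremented.
        for (int i = 0; i < UPPER_BOUNDS.length; i++) {
            if (txCount <= UPPER_BOUNDS[i]) buckets[i].increment();
        }
    }

    long emptyBlocks() {
        // The le="0" bucket counts exactly the observations equal to zero.
        return buckets[0].sum();
    }

    public static void main(String[] args) {
        BlockTxHistogram h = new BlockTxHistogram();
        h.observe(0); h.observe(3); h.observe(0); h.observe(250);
        System.out.println(h.emptyBlocks()); // prints 2
        System.out.println(h.count.sum());   // prints 4
    }
}
```

So downgrading to a Counter would lose the distribution view, while the histogram costs only a handful of extra series (one per bucket).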
Collaborator
The SR set change detection logic (reading
Suggest wrapping the entire SR detection block and the histogram observation (including the
Author
Good catch. Wrapped the entire block (histogram observation + SR set change detection) with a
Note that the existing metrics in the same method (e.g.,
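The suggested fix follows a generic pattern: mutate all shared metric state only while holding one monitor. A self-contained sketch (GuardedCounter is an illustrative stand-in for BlockChainMetricManager; blockLock is the monitor name mentioned in the PR):

```java
// Hypothetical sketch: confining all metric-state mutation to a single lock,
// mirroring the review suggestion to wrap detection + observation together.
public class GuardedCounter {
    private final Object blockLock = new Object();
    private long processed = 0;   // stands in for lastActiveWitnesses etc.

    public void applyBlock() {
        synchronized (blockLock) {
            // histogram observation + SR set change detection would both run
            // here, so the HashSet and the long are touched under one monitor
            processed++;
        }
    }

    public long processed() {
        synchronized (blockLock) {
            return processed;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        GuardedCounter c = new GuardedCounter();
        Thread a = new Thread(() -> { for (int i = 0; i < 10000; i++) c.applyBlock(); });
        Thread b = new Thread(() -> { for (int i = 0; i < 10000; i++) c.applyBlock(); });
        a.start(); b.start(); a.join(); b.join();
        System.out.println(c.processed()); // prints 20000
    }
}
```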
```diff
   StringUtil.encode58Check(address));

+  // SR set change detection
+  long nextMaintenanceTime = dbManager.getDynamicPropertiesStore().getNextMaintenanceTime();
```
Collaborator
The SR-set change detection is reading
Author
Same root cause as above — since
In fact, this is precisely why the SR-set change detection relies on
This would be addressed together in that follow-up issue: once
Collaborator
The SR set change detection logic in
Author
Moving the SR-set change detection into
The
```diff
+  if (lastNextMaintenanceTime == -1) {
+    lastNextMaintenanceTime = nextMaintenanceTime;
+    lastActiveWitnesses.addAll(chainBaseManager.getWitnessScheduleStore().getActiveWitnesses()
+        .stream().map(w -> StringUtil.encode58Check(w.toByteArray()))
+        .collect(Collectors.toSet()));
+  } else if (nextMaintenanceTime != lastNextMaintenanceTime) {
+    Set<String> currentWitnesses = chainBaseManager.getWitnessScheduleStore().getActiveWitnesses()
+        .stream()
+        .map(w -> StringUtil.encode58Check(w.toByteArray()))
+        .collect(Collectors.toSet());
+    if (nextMaintenanceTime < lastNextMaintenanceTime) {
+      // Fork rollback detected — reinitialize instead of diffing
+      lastActiveWitnesses.clear();
+      lastActiveWitnesses.addAll(currentWitnesses);
+    } else {
```
Collaborator
If a fork rollback does occur, wouldn’t the error message already have been written to the metrics?
Author
@bladehan1 @lvs0075 You are absolutely right: the rollback guard can only re-calibrate the in-memory baseline for subsequent blocks; it cannot retract Prometheus counters that have already been incremented. This is an inherent limitation because in-process metrics do not participate in the chain's revoking mechanism.

That said, this scenario — a fork rollback that crosses a maintenance boundary — is extremely rare in practice because TRON's maintenance interval is 6 hours and deep reorganizations of that magnitude are uncommon on the mainnet.

From an observability perspective, the recorded add/remove events are not necessarily "false positives" either: they reflect real SR set transitions that occurred on the abandoned chain at the time. When a fork rollback does cross a maintenance boundary, the chain genuinely experienced a different SR set during that (later invalidated) period. The metric captures that transient state change, even though the chain eventually switched to another fork. In other words,

This is a semantic trade-off. If the project prefers metrics to strictly mirror only the final canonical chain state, the only robust solution is to move SR change detection off-chain as @bladehan1 suggested. Otherwise, we can accept this limitation and document it clearly. Please let me know which direction the team prefers.
```diff
+      recordSrSetChange(currentWitnesses);
+    }
+    lastNextMaintenanceTime = nextMaintenanceTime;
+  }
 }
```
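The maintenance-time guard above can be exercised in isolation. This is a hypothetical standalone model (SrChangeDetector and the string event log are illustrative, not the PR's class): first sight initializes the baseline, a smaller maintenance time means rollback (re-baseline, no events), a larger one means a normal epoch boundary (diff and record):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical standalone model of the SR-set change guard.
public class SrChangeDetector {
    private long lastNextMaintenanceTime = -1;
    private final Set<String> lastActiveWitnesses = new HashSet<>();
    final List<String> events = new ArrayList<>();   // recorded "add:x" / "remove:x"

    void onBlock(long nextMaintenanceTime, Set<String> currentWitnesses) {
        if (lastNextMaintenanceTime == -1) {
            lastNextMaintenanceTime = nextMaintenanceTime;     // initialize baseline
            lastActiveWitnesses.addAll(currentWitnesses);
        } else if (nextMaintenanceTime != lastNextMaintenanceTime) {
            if (nextMaintenanceTime < lastNextMaintenanceTime) {
                lastActiveWitnesses.clear();                   // rollback: re-baseline only
                lastActiveWitnesses.addAll(currentWitnesses);
            } else {
                for (String w : currentWitnesses)              // normal epoch: diff + record
                    if (!lastActiveWitnesses.contains(w)) events.add("add:" + w);
                for (String w : lastActiveWitnesses)
                    if (!currentWitnesses.contains(w)) events.add("remove:" + w);
                lastActiveWitnesses.clear();
                lastActiveWitnesses.addAll(currentWitnesses);
            }
            lastNextMaintenanceTime = nextMaintenanceTime;
        }
        // same maintenance time: no-op, blocks within one epoch are cheap
    }

    public static void main(String[] args) {
        SrChangeDetector d = new SrChangeDetector();
        d.onBlock(100, new HashSet<>(Arrays.asList("A", "B")));   // init baseline
        d.onBlock(100, new HashSet<>(Arrays.asList("A", "B")));   // same epoch: no-op
        d.onBlock(200, new HashSet<>(Arrays.asList("A", "C")));   // B removed, C added
        System.out.println(d.events); // prints [add:C, remove:B]
    }
}
```

Note the design choice the reviewers debated: on the normal path the model emits events, but on the rollback path it only rewrites its baseline, since already-emitted counter increments cannot be taken back.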
```diff
+  private void recordSrSetChange(Set<String> currentWitnesses) {
+    Set<String> added = new HashSet<>(currentWitnesses);
+    added.removeAll(lastActiveWitnesses);
+
+    Set<String> removed = new HashSet<>(lastActiveWitnesses);
+    removed.removeAll(currentWitnesses);
+
+    for (String address : added) {
+      Metrics.counterInc(MetricKeys.Counter.SR_SET_CHANGE, 1,
+          MetricLabels.Counter.SR_ADD, address);
+    }
+    for (String address : removed) {
+      Metrics.counterInc(MetricKeys.Counter.SR_SET_CHANGE, 1,
+          MetricLabels.Counter.SR_REMOVE, address);
+    }
+    if (!added.isEmpty() || !removed.isEmpty()) {
+      lastActiveWitnesses.clear();
+      lastActiveWitnesses.addAll(currentWitnesses);
+    }
```
Sunny6889 marked this conversation as resolved.
```diff
+  }

   private List<WitnessInfo> getSrList() {
```
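The add/remove computation in recordSrSetChange is a plain two-way set difference (added = current \ last, removed = last \ current). A self-contained sketch of just that step (the class name SetDiff is illustrative):

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the two-way set difference used to detect SR changes.
public class SetDiff {
    public static List<Set<String>> diff(Set<String> last, Set<String> current) {
        Set<String> added = new HashSet<>(current);
        added.removeAll(last);                 // in current but not in last
        Set<String> removed = new HashSet<>(last);
        removed.removeAll(current);            // in last but not in current
        return Arrays.asList(added, removed);
    }

    public static void main(String[] args) {
        List<Set<String>> r = diff(
            new HashSet<>(Arrays.asList("A", "B", "C")),
            new HashSet<>(Arrays.asList("B", "C", "D")));
        System.out.println(r.get(0)); // added:   prints [D]
        System.out.println(r.get(1)); // removed: prints [A]
    }
}
```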