feat(metrics): add block transaction count and SR set change monitoring #6624
base: develop
Changes from 10 commits
@@ -14,6 +14,7 @@ class MetricsCounter {
     init(MetricKeys.Counter.TXS, "tron txs info .", "type", "detail");
     init(MetricKeys.Counter.MINER, "tron miner info .", "miner", "type");
     init(MetricKeys.Counter.BLOCK_FORK, "tron block fork info .", "type");
+    init(MetricKeys.Counter.SR_SET_CHANGE, "tron sr set change .", "action", "witness");
|
Collaborator
The new `witness` label takes one value per witness address, so the time-series count of this counter grows with the candidate pool, which raises a potential high-cardinality concern for Prometheus. Worth assessing the practical impact and considering alternative approaches.
Author
Thanks for raising the high-cardinality concern — it's a valid Prometheus best practice to watch for. However, I believe the witness label is acceptable here given TRON's specific constraints:
In summary, the worst-case time series count is roughly candidate_pool_size × 2 (add/remove), which stays well within Prometheus's comfortable operating range. That said, I agree it's worth documenting this characteristic. Could you suggest what form you'd prefer — a code comment at the metric registration site, a note in the PR description, or something else?
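The worst-case arithmetic in the reply above can be sketched as a quick check. The pool size below is an illustrative assumption for the sketch, not a measured TRON value:

```java
// Back-of-envelope check of the label-cardinality argument above.
// The candidate pool size is an assumed illustrative figure.
public class SrLabelCardinality {

  /** Worst-case time series for the counter: one per (witness, action) pair. */
  static long worstCaseSeries(int candidatePoolSize) {
    int actions = 2; // the "add" and "remove" label values
    return (long) candidatePoolSize * actions;
  }

  public static void main(String[] args) {
    // Even a generous pool of 400 candidates yields only 800 series,
    // far below the point where per-instance cardinality becomes a concern.
    System.out.println(worstCaseSeries(400));
  }
}
```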
Collaborator
prefer to
     init(MetricKeys.Counter.P2P_ERROR, "tron p2p error info .", "type");
     init(MetricKeys.Counter.P2P_DISCONNECT, "tron p2p disconnect .", "type");
     init(MetricKeys.Counter.INTERNAL_SERVICE_FAIL, "internal Service fail.",
@@ -3,10 +3,13 @@ | |
| import com.codahale.metrics.Counter; | ||
| import com.google.protobuf.ByteString; | ||
| import java.util.ArrayList; | ||
| import java.util.HashSet; | ||
| import java.util.List; | ||
| import java.util.Map; | ||
| import java.util.Set; | ||
| import java.util.SortedMap; | ||
| import java.util.concurrent.ConcurrentHashMap; | ||
| import java.util.stream.Collectors; | ||
| import lombok.Getter; | ||
| import lombok.Setter; | ||
| import org.bouncycastle.util.encoders.Hex; | ||
|
|
@@ -42,6 +45,9 @@ public class BlockChainMetricManager { | |
| private long failProcessBlockNum = 0; | ||
| @Setter | ||
| private String failProcessBlockReason = ""; | ||
| private final Set<String> lastActiveWitnesses = ConcurrentHashMap.newKeySet(); | ||
|
Collaborator
[P1] Two issues with `lastActiveWitnesses`:

1) False thread-safety guarantee from `ConcurrentHashMap.newKeySet()`. The field is only touched from the block processing thread, and the read-diff-write sequence would not be atomic even with a concurrent collection, so the concurrent set suggests a guarantee it does not actually deliver.

Suggest replacing with a plain `HashSet`:

// Only accessed from block processing thread
private final Set<String> lastActiveWitnesses = new HashSet<>();

2) In-memory state is not rolled back during fork switching.

During a chain reorganization, database state is reverted through the revoke mechanism, but `lastActiveWitnesses` and `lastNextMaintenanceTime` are plain in-memory fields that are not reverted with it.

Scenario — fork across a maintenance boundary: the node applies blocks past a maintenance point and updates the in-memory snapshot, then switches to a fork that rolls back before that point; the snapshot now disagrees with chain state and the next diff records spurious add/remove events.

Suggest adding a rollback guard:

} else if (nextMaintenanceTime != lastNextMaintenanceTime) {
  if (nextMaintenanceTime < lastNextMaintenanceTime) {
    // Fork rollback detected — reinitialize instead of diffing
    lastActiveWitnesses.clear();
    lastActiveWitnesses.addAll(...current witnesses...);
    lastNextMaintenanceTime = nextMaintenanceTime;
    return;
  }
  // normal maintenance transition — diff and record
  ...
}
Collaborator
On second thought, adding a rollback guard here over-complicates the monitoring layer. The core issue is that SR set change detection is business-logic monitoring: it depends on chain state transitions (maintenance boundaries, fork switching), which makes it fundamentally unsuitable for an in-process metric that cannot participate in the revoke mechanism. A simpler and more robust approach would be to move SR set change detection off-chain: an external monitor reads the witness list and computes the diff itself. For the histogram (
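The external-monitor idea described above could look roughly like the sketch below. It assumes the monitor periodically fetches the active witness list from a node API; the fetch itself is not shown, and snapshots are passed in directly. Because the diff state lives outside the node, a fork switch on the node side is just another snapshot to compare:

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Minimal sketch of an off-chain SR-set monitor, under the assumption that
// witness-list snapshots arrive from some polling mechanism (stubbed here).
public class ExternalSrSetMonitor {
  private Set<String> last;

  /** Diffs a snapshot against the previous one; returns [added, removed]. */
  public List<Set<String>> observe(Set<String> current) {
    Set<String> added = new HashSet<>();
    Set<String> removed = new HashSet<>();
    if (last != null) {
      added.addAll(current);
      added.removeAll(last);      // in current but not in last
      removed.addAll(last);
      removed.removeAll(current); // in last but not in current
    }
    last = new HashSet<>(current); // first call only establishes a baseline
    return List.of(added, removed);
  }

  public static void main(String[] args) {
    ExternalSrSetMonitor monitor = new ExternalSrSetMonitor();
    monitor.observe(Set.of("SR_A", "SR_B")); // baseline, reports nothing
    List<Set<String>> diff = monitor.observe(Set.of("SR_B", "SR_C"));
    System.out.println("added=" + diff.get(0) + " removed=" + diff.get(1));
  }
}
```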
Author
Thanks for the thorough analysis! Both points are valid — I've addressed them in the latest commit (7fdb3c5):

1) Agreed. Since `lastActiveWitnesses` is only ever accessed from the block processing thread, the concurrent collection was unnecessary; replaced it with a plain `HashSet`.

2) Fork rollback guard. Added a check for `nextMaintenanceTime < lastNextMaintenanceTime`:

if (nextMaintenanceTime < lastNextMaintenanceTime) {
  // Fork rollback detected — reinitialize instead of diffing
  lastActiveWitnesses.clear();
  lastActiveWitnesses.addAll(currentWitnesses);
} else {
  recordSrSetChange(currentWitnesses);
}
lastNextMaintenanceTime = nextMaintenanceTime;

Regarding your second comment about moving SR set change detection off-chain entirely — I understand the concern that cross-block state cannot participate in the revoke mechanism, but I believe the rollback guard above addresses the root issue with minimal complexity. Removing the in-process metric would lose the at-a-glance visibility in Grafana dashboards that makes it useful for operators, and pushing the diff logic to an external monitor adds operational overhead (deploying/maintaining a separate component) for an edge case that is already handled. I'd prefer to keep it in-chain with this guard in place.
+  // To control SR set change metric update logic, -1 means not initialized
+  private long lastNextMaintenanceTime = -1;

   public BlockChainInfo getBlockChainInfo() {
     BlockChainInfo blockChainInfo = new BlockChainInfo();
@@ -164,11 +170,52 @@ public void applyBlock(BlockCapsule block) { | |
| } | ||
|
|
||
| //TPS | ||
| if (block.getTransactions().size() > 0) { | ||
| MetricsUtil.meterMark(MetricsKey.BLOCKCHAIN_TPS, block.getTransactions().size()); | ||
| Metrics.counterInc(MetricKeys.Counter.TXS, block.getTransactions().size(), | ||
| int txCount = block.getTransactions().size(); | ||
| if (txCount > 0) { | ||
| MetricsUtil.meterMark(MetricsKey.BLOCKCHAIN_TPS, txCount); | ||
| Metrics.counterInc(MetricKeys.Counter.TXS, txCount, | ||
| MetricLabels.Counter.TXS_SUCCESS, MetricLabels.Counter.TXS_SUCCESS); | ||
| } | ||
| // Record transaction count distribution for all blocks (including empty blocks) | ||
| Metrics.histogramObserve(MetricKeys.Histogram.BLOCK_TRANSACTION_COUNT, txCount, | ||
|
Collaborator
Author
Good catch. I actually noticed this ordering issue during my investigation. However, this is not specific to the histogram I added — the existing metrics in the same method have the same ordering. If you agree, I'd like to open a separate issue to move
Collaborator
[SHOULD] If the primary goal is empty block monitoring, a simple Counter or Gauge would suffice. A Histogram is better suited when transaction count distribution analysis is actually needed — it can always be upgraded later.
Author
Actually, we already discussed this question in issue #6590 and finally decided to implement
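For readers weighing the Counter-vs-Histogram trade-off above, a minimal sketch of cumulative ("le"-style) buckets shows why a histogram subsumes the empty-block question: the le=0 bucket is exactly the empty-block count, while the remaining buckets add distribution data. The bucket bounds below are illustrative, not the PR's actual configuration:

```java
// Toy cumulative-bucket histogram in the usual Prometheus style.
// Bounds are illustrative; a real setup would tune them to block capacity.
public class TxCountHistogram {
  private final double[] bounds = {0, 10, 100, 1000}; // le-style upper bounds
  private final long[] cumulative = new long[bounds.length + 1]; // last = +Inf
  private long count;

  public void observe(int txCount) {
    count++;
    for (int i = 0; i < bounds.length; i++) {
      if (txCount <= bounds[i]) {
        cumulative[i]++; // cumulative: every bucket with bound >= value
      }
    }
    cumulative[bounds.length]++; // +Inf bucket counts everything
  }

  /** Fraction of observed blocks with txCount == 0 (the le=0 bucket). */
  public double emptyBlockRatio() {
    return count == 0 ? 0.0 : (double) cumulative[0] / count;
  }

  public static void main(String[] args) {
    TxCountHistogram h = new TxCountHistogram();
    h.observe(0);
    h.observe(0);
    h.observe(42);
    h.observe(500);
    System.out.println("empty-block ratio: " + h.emptyBlockRatio());
  }
}
```

A plain Counter of empty blocks could answer only the first question; the histogram answers both without a second metric.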
Collaborator
The SR set change detection logic (reading `WitnessScheduleStore`) could fail at runtime inside block processing. Suggest wrapping the entire SR detection block and the histogram observation (including the
Author
Good catch. Wrapped the entire block (histogram observation + SR set change detection) with a try-catch.
Note that the existing metrics in the same method (e.g.,
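The wrapping agreed on above can be illustrated with a minimal sketch. `recordBlockMetrics` and `applyBlock` here are simplified stand-ins, not the real java-tron methods; the point is only that an exception on the metrics path is swallowed rather than propagated:

```java
// Sketch of best-effort metric recording: a failure in the monitoring path
// is logged and swallowed so it can never fail block application.
public class SafeMetricsRecorder {
  int blocksProcessed;

  /** Stand-in for the histogram + SR-set detection logic. */
  void recordBlockMetrics(int txCount) {
    if (txCount < 0) {
      throw new IllegalStateException("simulated metrics failure");
    }
  }

  void applyBlock(int txCount) {
    try {
      recordBlockMetrics(txCount);
    } catch (Exception e) {
      // log and continue: monitoring must stay best-effort
      System.err.println("metrics recording failed: " + e.getMessage());
    }
    blocksProcessed++; // block processing continues regardless
  }

  public static void main(String[] args) {
    SafeMetricsRecorder recorder = new SafeMetricsRecorder();
    recorder.applyBlock(10);
    recorder.applyBlock(-1); // metrics throw, but the block still applies
    System.out.println("blocksProcessed=" + recorder.blocksProcessed);
  }
}
```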
         StringUtil.encode58Check(address));

+    // SR set change detection
+    long nextMaintenanceTime = dbManager.getDynamicPropertiesStore().getNextMaintenanceTime();
Collaborator
The SR-set change detection is reading
Author
Same root cause as above — since In fact, this is precisely why the SR-set change detection relies on This would be addressed together in that follow-up issue: once
Collaborator
The SR set change detection logic in
Author
Moving the SR-set change detection into The
+    if (lastNextMaintenanceTime == -1) {
+      lastNextMaintenanceTime = nextMaintenanceTime;
+      lastActiveWitnesses.addAll(chainBaseManager.getWitnessScheduleStore().getActiveWitnesses()
+          .stream().map(w -> StringUtil.encode58Check(w.toByteArray()))
+          .collect(Collectors.toSet()));
+    } else if (nextMaintenanceTime != lastNextMaintenanceTime) {
+      Set<String> currentWitnesses = chainBaseManager.getWitnessScheduleStore().getActiveWitnesses()
+          .stream()
+          .map(w -> StringUtil.encode58Check(w.toByteArray()))
+          .collect(Collectors.toSet());
+      recordSrSetChange(currentWitnesses);
+      lastNextMaintenanceTime = nextMaintenanceTime;
+    }
+  }
+
+  private void recordSrSetChange(Set<String> currentWitnesses) {
+    Set<String> added = new HashSet<>(currentWitnesses);
+    added.removeAll(lastActiveWitnesses);
+
+    Set<String> removed = new HashSet<>(lastActiveWitnesses);
+    removed.removeAll(currentWitnesses);
+
+    for (String address : added) {
+      Metrics.counterInc(MetricKeys.Counter.SR_SET_CHANGE, 1,
+          MetricLabels.Counter.SR_ADD, address);
+    }
+    for (String address : removed) {
+      Metrics.counterInc(MetricKeys.Counter.SR_SET_CHANGE, 1,
+          MetricLabels.Counter.SR_REMOVE, address);
+    }
+    if (!added.isEmpty() || !removed.isEmpty()) {
+      lastActiveWitnesses.clear();
+      lastActiveWitnesses.addAll(currentWitnesses);
+    }
+  }

Sunny6889 marked this conversation as resolved.
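A worked example of the set-difference behaviour in `recordSrSetChange` above, with `Metrics.counterInc` replaced by a plain map so it runs standalone; the single-letter addresses and the `add:`/`remove:` key prefixes are illustrative stand-ins for the real base58 addresses and the SR_ADD/SR_REMOVE labels:

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Demonstrates the two removeAll diffs used by recordSrSetChange:
// added = current \ last, removed = last \ current.
public class SrSetDiffDemo {
  public static void main(String[] args) {
    Set<String> lastActiveWitnesses = new HashSet<>(Set.of("A", "B", "C"));
    Set<String> currentWitnesses = Set.of("B", "C", "D");

    Set<String> added = new HashSet<>(currentWitnesses);
    added.removeAll(lastActiveWitnesses);   // -> {D}

    Set<String> removed = new HashSet<>(lastActiveWitnesses);
    removed.removeAll(currentWitnesses);    // -> {A}

    // Stand-in for Metrics.counterInc(SR_SET_CHANGE, 1, action, address).
    Map<String, Integer> counter = new TreeMap<>();
    for (String address : added) {
      counter.merge("add:" + address, 1, Integer::sum);
    }
    for (String address : removed) {
      counter.merge("remove:" + address, 1, Integer::sum);
    }

    System.out.println(counter);
  }
}
```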
|
|
  private List<WitnessInfo> getSrList() {