Draft
Both SAW and SAA use GenericExecutor with a simple Execute function. SAW gets a dedicated minimal workflow registered on the existing Go worker. Reuse existing "payload" activity registration. Drop "cogs" from names.
WorkerOptions.FlagSet() now takes a prefix parameter. The outer CLI passes "worker-" (so users write --worker-max-concurrent-activities), and passthrough() strips it for the subprocess. The Go worker binary passes "" so it accepts the stripped names, matching the dotnet/python/typescript/java workers.
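To illustrate the prefix round trip described in this commit, here is a minimal sketch using only the standard library `flag` package. The `workerOptions` type, its single field, and this `passthrough` helper are hypothetical stand-ins, not the actual omes code, which may use a different flag library and different signatures.

```go
package main

import (
	"flag"
	"fmt"
	"strings"
)

// workerOptions is a hypothetical stand-in for the real WorkerOptions struct.
type workerOptions struct {
	MaxConcurrentActivities int
}

// flagSet registers the options under the given prefix ("worker-" for the
// outer CLI, "" for the worker binary itself).
func (o *workerOptions) flagSet(prefix string) *flag.FlagSet {
	fs := flag.NewFlagSet("worker_options", flag.ContinueOnError)
	fs.IntVar(&o.MaxConcurrentActivities, prefix+"max-concurrent-activities", 0, "Max concurrent activities")
	return fs
}

// passthrough rewrites every flag that was explicitly set on fs into
// "--name=value" form with the prefix stripped, ready for the subprocess.
func passthrough(fs *flag.FlagSet, prefix string) []string {
	var args []string
	fs.Visit(func(f *flag.Flag) {
		name := strings.TrimPrefix(f.Name, prefix)
		args = append(args, fmt.Sprintf("--%s=%s", name, f.Value.String()))
	})
	return args
}

func main() {
	// Outer CLI: flags carry the "worker-" prefix.
	var outer workerOptions
	outerFS := outer.flagSet("worker-")
	if err := outerFS.Parse([]string{"--worker-max-concurrent-activities=50"}); err != nil {
		panic(err)
	}
	childArgs := passthrough(outerFS, "worker-")

	// Worker binary: same options, registered with an empty prefix.
	var inner workerOptions
	if err := inner.flagSet("").Parse(childArgs); err != nil {
		panic(err)
	}
	fmt.Println(childArgs, inner.MaxConcurrentActivities)
}
```

Registering the same options twice with different prefixes keeps one definition of each flag while letting the outer CLI namespace them.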
dandavison
commented
Mar 23, 2026
m.fs.IntVar(&m.ActivityPollerAutoscaleMax, prefix+"activity-poller-autoscale-max", 0, "Max for activity poller autoscaling (overrides max-concurrent-activity-pollers)")
m.fs.IntVar(&m.WorkflowPollerAutoscaleMax, prefix+"workflow-poller-autoscale-max", 0, "Max for workflow poller autoscaling (overrides max-concurrent-workflow-pollers)")
m.fs.Float64Var(&m.WorkerActivitiesPerSecond, prefix+"activities-per-second", 0, "Per-worker activity rate limit")
m.fs.BoolVar(&m.ErrOnUnimplemented, prefix+"err-on-unimplemented", false, "Fail on unimplemented actions (currently this only applies to concurrent client actions)")
Contributor
Author
This is addressing what seemed to be a bug in omes.
This reverts commit 312bbe3.
New activity "payloadWithRetries" fails for N attempts then succeeds. Both scenarios accept --option fail-for-attempts=N (default 0, no retries). Retry backoff is 1ms with coefficient 1.0 to minimize wait time.
The Go SDK's PollActivityExecution uses a 10s default gRPC timeout when the context has no deadline. With 9 activity retries at server-enforced ~1s backoff, the activity takes >10s total, hitting this limit. Pass an explicit 60s timeout context to handle.Get().
Fixes PollActivityExecution 10s default timeout bug for standalone activity handle.Get() when context has no deadline.
The previous commit only upgraded the worker module. The starter (scenarios/loadgen) uses the root module, which is where handle.Get() runs and hits the 10s default gRPC timeout bug.
- Standalone activity StartToCloseTimeout 5s -> 30s (tight timeout caused failures at high throughput; irrelevant for COGS experiment)
- Default executor rate 1000/s -> 500/s to start conservatively
- Fix fish variable scoping bug in run-executor.fish (yq_expr was local to if block, causing placeholder image)
- Add worker tuning flags and image to patch-worker.fish
- Support API key auth in deploy-omes scripts
SAA scenario options (with defaults):
- start-to-close-timeout-seconds (30)
- schedule-to-close-timeout-seconds (120)
- get-timeout-seconds (120)

Also bump SAW workflow's activity StartToCloseTimeout from 5s to 30s.
No description provided.