Add project executor framework for workflow load testing#301
Conversation
FWIW - I spent some time looking at migrating some of the other executors as well (namely, ebbandflow). I think in cases where the executor wants to modify its load based on feedback from an
Sushisource left a comment
Overall this makes sense, but I have a few concerns:
- We now have two entire harnesses/frameworks that need to be implemented for every language. That's kind of a lot. I would really prefer if there were just one, but it's not clear to me that there is a path to consolidating the existing stuff into the new structure. IMO we should try very hard to figure out how to make that work. Maybe that means something like retaining the gRPC invocation structure but allowing for project and non-project based things to get invoked that way. If we can't... I can probably accept having the two options... but it feels confusing and messy.
- I think this could've used a bit more self-review before publishing. There's a lot of AI-written comments that sort of say what something is without explaining why it matters or what the semantics are. Try to put yourself in a reviewer's shoes when self-reviewing and editing those.
- The README situation could use some improvement. The one in projecttests/ is not bad at all, but I desperately would like an architectural overview, probably with a diagram (this also makes life easier for reviewers), and I'd shorten and merge the docker readme into it.
```yaml
run: |
  go test -v -race -timeout 10m ./projecttests/... 2>&1 | \
    go run github.com/jstemmer/go-junit-report/v2@latest \
      -set-exit-code -iocopy -out junit-projecttests-go.xml
```
This should probably just be two separate steps
```yaml
- name: Smoke test overlay image
  run: |
    docker run --rm omes-projecttest-helloworld:ci --help
```
Not sure I'd call that a smoke test?
```go
r.sdkOpts.AddCLIFlags(cmd.Flags())
cmd.Flags().Lookup("language").Usage = "Language to use for workflow tests (go only)"
```
It won't be Go-only except currently, though?
Also, the language flag seems redundant, since the projecttests path includes the language.
```go
cmd.Flags().Lookup("language").Usage = "Language to use for workflow tests (go only)"
r.programOpts.AddFlags(cmd.Flags())
cmd.Flags().AddFlagSet(r.loggingOpts.FlagSet())
cmd.Flags().StringVar(&r.processMonitorAddr, "process-monitor-addr", "", "Address for process metrics sidecar (e.g. :9091)")
```
Should this be just a port number? Seems like it would not make sense to ever bind to anything other than 0.0.0.0?
```
omes project --language go --project-dir ./projecttests/go/tests/helloworld --spawn-worker --iterations 100
omes project --language go --project-dir ./projecttests/go/tests/helloworld --iterations 100
```
I'm not sure the names of the commands feel obvious. exec isn't in any way obviously related to project, but it is. Maybe exec should be project-exec and this can be project-run.
Same comment as with exec that the language flag feels unnecessary here.
```go
// DefaultDerivedQueries returns a curated list of PromQL queries that precompute
// worker/process metrics into Omni-friendly metric lines.
func DefaultDerivedQueries() []PromQuery {
```
This is very duplicative of buildMetricQueries
```markdown
# projecttests Docker Setup
```
This file is fairly useful (if a bit overly verbose), but it's not in a very useful place. Per my overall review comments, this should be combined with some kind of higher level readme that explains the whole structure and how to use it.
```go
func clientMain(ctx context.Context, config *harness.Config) error {
	c, err := pool.GetOrDial("default", config.ConnectionOptions)
```
IMO rather than expecting everyone to use this pool, it would make more sense to just provide the client as part of the harness execute signature.
```markdown
# Project Tests

Self-contained Go programs for testing Temporal workflows under load. Each project is an independent Go module that implements its own workflows, activities, and execution logic, coordinated via gRPC.
```
This is more like the overview I was looking for but still lacks a real overview of the architecture, and still is written like Go will forever be the only language. Combining the docker readme with this, giving an architectural overview, and at least linking to it from the top level README would go a long way.
```go
fs.StringVar(&r.runFamily, "run-family", "", "Human-readable identifier for grouping related runs")
fs.StringVar(&r.taskQueue, "task-queue", "", "Task queue name (default: omes-<run-id>)")
fs.IntVar(&r.clientPort, "client-port", 0, "Port for local client HTTP server (0 = auto)")
fs.StringVar(&r.executor, "executor", "", "projecttests executor to run")
```
Seems like this should have a default?
Addressing feedback in: #317
Motivation
Omes today tests Temporal SDKs through scenarios — predefined load patterns that drive a generic KitchenSink worker using protobuf action sequences. This works well for cross-SDK conformance testing, but makes it difficult to test real workflows written in native Go (or any other language). The protobuf-action model is indirect: you describe what the workflow should do in protobuf, and a generic worker interprets it. You never actually test the workflow code that a team would write in practice.
Project tests are a new paradigm for omes. Instead of describing workflows indirectly, each project is a self-contained Go module with its own workflows, activities, worker, and execution logic. The framework coordinates these projects via gRPC, making them buildable, runnable, and testable independently — while still plugging into omes' load generation and metrics infrastructure.
Architecture
A project test is a Go binary that exposes two subcommands:

- `<project> worker` — starts a Temporal worker with the project's real workflows and activities
- `<project> project-server` — starts a gRPC server that accepts `Init` and `Execute` RPCs

The harness (`projecttests/go/harness/`) provides this framework. Projects register three callbacks — `RegisterWorker`, `OnInit`, `OnExecute` — and call `Run()`. The harness handles gRPC server lifecycle, worker management, client pooling, Prometheus metrics, and TLS configuration.

The test runner (CLI or integration test) spawns both processes, sends an `Init` RPC with connection details and config, then drives iterations via `Execute` RPCs. This keeps projects decoupled from the omes CLI while allowing omes to orchestrate load patterns (steady-rate, ebb-and-flow, saturation) against any project.

What's included
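The callback model can be sketched roughly as follows. This is an illustrative, self-contained approximation: the `Harness` struct, its field names, and the direct-call `Run` loop are assumptions made for demonstration, not the real API. The actual harness in `projecttests/go/harness/` serves `Init` and `Execute` over gRPC and manages the worker process, client pool, and metrics.

```go
// Illustrative sketch of the three-callback harness pattern. The real
// projecttests harness invokes these via gRPC RPCs rather than direct
// calls, and its actual API likely differs.
package main

import "fmt"

// Harness holds the three callbacks a project registers before calling Run.
type Harness struct {
	RegisterWorker func() error              // start the project's Temporal worker
	OnInit         func(config string) error // handle the Init RPC (connection details, config)
	OnExecute      func(iteration int) error // handle one Execute RPC (one load iteration)
}

// Run drives the lifecycle: register the worker, initialize once, then
// execute the requested number of iterations.
func (h *Harness) Run(config string, iterations int) error {
	if err := h.RegisterWorker(); err != nil {
		return err
	}
	if err := h.OnInit(config); err != nil {
		return err
	}
	for i := 0; i < iterations; i++ {
		if err := h.OnExecute(i); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	h := &Harness{
		RegisterWorker: func() error { fmt.Println("worker registered"); return nil },
		OnInit:         func(cfg string) error { fmt.Println("init:", cfg); return nil },
		OnExecute:      func(i int) error { fmt.Println("execute", i); return nil },
	}
	if err := h.Run("helloworld-config", 2); err != nil {
		fmt.Println("run failed:", err)
	}
}
```

The point of the shape is that a project supplies only workflow-specific logic; everything about transport, lifecycle, and load pacing stays in the shared harness and the external runner.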
Two example projects demonstrate the framework:

- `helloworld` — minimal (~100 LOC), shows the basic pattern
- `throughputstress` — more involved; it's effectively a port of the existing throughput stress scenario. It is similar in that it parses a config file to conditionally execute a workflow (local/remote activities, child workflows, self-queries, self-signals, workflow updates, retry scenarios, heartbeats, continue-as-new, etc.).

Note: the throughput stress project acts as an example/demonstration of porting an existing scenario to a project. It's also the most commonly used scenario for our load testing, so it is a valuable scenario to transition. You are not limited to using this project for throughput testing; any project using the `steady-state` executor (i.e. the generic executor) can run throughput testing.

CLI integration adds `omes project` and `omes exec` commands for running project-based load tests with configurable executors, metrics collection, and post-run verification.

Note: the post-run verification is quite simple at the moment, basically a small collection/library of existing verification functions we already use.
Docker support uses a two-layer image pattern: a base image with the Go toolchain and harness, plus thin per-project overlay images. A Docker Compose stack provides Temporal server + Prometheus for local testing with metrics.
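A minimal sketch of what such an overlay layer could look like; the base image name `omes-projecttest-base` and the paths here are assumptions for illustration, not taken from the PR's actual Dockerfiles:

```dockerfile
# Hypothetical overlay image: starts from an assumed shared base that
# already contains the Go toolchain and the harness sources.
FROM omes-projecttest-base:latest

# Copy only this project's module and build it against the shared harness.
COPY projecttests/go/tests/helloworld /app/projecttests/go/tests/helloworld
RUN cd /app/projecttests/go/tests/helloworld && go build -o /usr/local/bin/project .

ENTRYPOINT ["project"]
```

The split keeps per-project images thin: the toolchain layer is built once and each project only adds its own module on top.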
CI adds a `projecttests.yml` workflow with three jobs for simple regression testing: integration tests (builds and runs helloworld + throughputstress against a dev server), docker image build verification, and proto generation/lint consistency checks.

This PR also refactors the existing ebb-and-flow scenario to use shared internal utilities and adds Prometheus export/query support (`metrics/prom_export.go`, `metrics/prom_query.go`).