feat(scheduler): replace clone concurrency limit with cost-based admission#242

Draft
worstell wants to merge 1 commit into main from eworstell/scheduler-job-cost

@worstell
Contributor

Summary

Replace the binary isCloneJob/MaxCloneConcurrency mechanism with a generic cost model. Strategies declare the cost of each job at submit time, and the scheduler tracks total active cost against a configurable budget (max-cost).

Design

The Submit and SubmitPeriodicJob interface methods now accept a cost int parameter. The scheduler admits a job only when activeCost + job.cost <= maxCost (with a safety valve: any job is admitted when nothing else is running, preventing permanent starvation from misconfiguration).
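
The admission rule can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the `admitter` type, its fields, and the `tryAdmit`/`release` names are hypothetical, and the locking a real scheduler would need is omitted.

```go
package main

import "fmt"

// admitter is a hypothetical stand-in for the scheduler's cost accounting.
type admitter struct {
	maxCost    int // configured budget (max-cost)
	activeCost int // sum of the costs of currently running jobs
	running    int // number of currently running jobs
}

// tryAdmit reports whether a job of the given cost may start now and,
// if so, records it as running.
func (a *admitter) tryAdmit(cost int) bool {
	// Safety valve: when nothing is running, any job is admitted,
	// even if its cost alone exceeds the budget.
	if a.running > 0 && a.activeCost+cost > a.maxCost {
		return false
	}
	a.activeCost += cost
	a.running++
	return true
}

// release returns a finished job's cost to the budget.
func (a *admitter) release(cost int) {
	a.activeCost -= cost
	a.running--
}

func main() {
	a := &admitter{maxCost: 6}
	fmt.Println(a.tryAdmit(4)) // true: scheduler is idle
	fmt.Println(a.tryAdmit(3)) // false: 4+3 > 6
	fmt.Println(a.tryAdmit(2)) // true: 4+2 <= 6
	a.release(4)
	a.release(2)
	fmt.Println(a.tryAdmit(10)) // true: idle again, safety valve applies
}
```

The safety valve is keyed on "nothing else is running" rather than on the job's cost, so an oversized job waits until the scheduler drains and then runs alone.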

This removes isCloneJob — the scheduler no longer has knowledge of git-specific job types.

Cost constants (git strategy)

| Job type | Cost | Rationale |
|----------|------|-----------|
| clone | 4 | Heavy CPU/IO/network, minutes |
| snapshot | 3 | Heavy CPU (zstd), moderate IO |
| repack | 2 | Heavy CPU, no network |
| fetch | 1 | Lightweight, seconds |
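
As a rough sketch of how a strategy might declare these costs at its call sites: the constant names, the trimmed-down `submitter` interface, and the `recorder` test double below are all hypothetical, and the real `Submit` signature may take different arguments alongside `cost int`.

```go
package main

import (
	"context"
	"fmt"
)

// Job costs from the table above; the identifier names are hypothetical.
const (
	costFetch    = 1
	costRepack   = 2
	costSnapshot = 3
	costClone    = 4
)

// submitter is a hypothetical, trimmed-down view of the scheduler interface
// that shows only the added cost parameter.
type submitter interface {
	Submit(id string, cost int, run func(ctx context.Context) error)
}

// recorder is a test double that records the declared cost per job id.
type recorder struct{ costs map[string]int }

func (r *recorder) Submit(id string, cost int, _ func(ctx context.Context) error) {
	r.costs[id] = cost
}

// submitGitJobs shows a strategy attaching a cost at each call site,
// so the scheduler itself never needs to know what a "clone" is.
func submitGitJobs(s submitter) {
	noop := func(context.Context) error { return nil }
	s.Submit("clone", costClone, noop)
	s.Submit("fetch", costFetch, noop)
}

func main() {
	r := &recorder{costs: map[string]int{}}
	submitGitJobs(r)
	fmt.Println(r.costs["clone"], r.costs["fetch"]) // 4 1
}
```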

Configuration

scheduler {
  max-cost = 16  # default: concurrency * 4
}

With the defaults (concurrency=4, max-cost=16), you can run up to 4 clones, or 1 clone + 3 snapshots (total cost 13), etc. The worker count remains the hard parallelism cap, so no more than 4 jobs run at once regardless of their total cost.
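
To make the interaction between the two limits concrete, here is a small sketch under stated assumptions: `effectiveMaxCost` and `fits` are hypothetical helpers, not functions from this PR, and `fits` ignores the safety valve that admits a lone oversized job.

```go
package main

import "fmt"

// effectiveMaxCost resolves the budget: a max-cost of 0 means concurrency * 4.
func effectiveMaxCost(concurrency, maxCost int) int {
	if maxCost == 0 {
		return concurrency * 4
	}
	return maxCost
}

// fits reports whether jobs with the given costs could all run at once:
// each job needs its own worker, and the total cost must stay within budget.
func fits(costs []int, workers, maxCost int) bool {
	if len(costs) > workers {
		return false
	}
	total := 0
	for _, c := range costs {
		total += c
	}
	return total <= maxCost
}

func main() {
	workers := 4
	budget := effectiveMaxCost(workers, 0)
	fmt.Println(budget)                                      // 16
	fmt.Println(fits([]int{4, 4, 4, 4}, workers, budget))    // true: 4 clones, cost 16
	fmt.Println(fits([]int{1, 1, 1, 1, 1}, workers, budget)) // false: 5 jobs, 4 workers
	fmt.Println(fits([]int{4, 4, 4}, workers, 8))            // false: cost 12 > 8
}
```

Note that with the default budget and the git costs above, four workers can never exceed a total cost of 16, so the worker cap is always reached first; the budget only starts rejecting jobs once max-cost is set below concurrency * 4.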

Breaking changes

  • max-clone-concurrency config replaced by max-cost
  • Scheduler.Submit and SubmitPeriodicJob signatures changed (added cost int)

…ssion

Replace the binary isCloneJob/MaxCloneConcurrency mechanism with a
generic cost model. Strategies now declare the cost of each job at
submit time, and the scheduler tracks total active cost against a
configurable budget (max-cost).

Cost constants defined in the git strategy:
  clone=4, snapshot=3, repack=2, fetch=1

Default max-cost is Concurrency * 4. A job is always admitted when
nothing else is running, even if its cost exceeds max-cost, to prevent
permanent starvation from misconfiguration.

This removes isCloneJob and the scheduler's knowledge of git-specific
job types. The scheduler now sees only costs and the strategy decides
what each job is worth.

Amp-Thread-ID: https://ampcode.com/threads/T-019d404e-21ec-723a-b211-c619925dd12e
Co-authored-by: Amp <amp@ampcode.com>
MaxCloneConcurrency int `hcl:"max-clone-concurrency" help:"Maximum number of concurrent clone jobs. Remaining worker slots are reserved for fetch/repack/snapshot jobs. 0 means no limit." default:"0"`
SchedulerDB string `hcl:"scheduler-db" help:"Path to the scheduler state database." default:"${CACHEW_STATE}/scheduler.db"`
Concurrency int `hcl:"concurrency" help:"The maximum number of concurrent jobs to run (0 means number of cores)." default:"4"`
MaxCost int `hcl:"max-cost" help:"Maximum total cost of concurrently running jobs. Each job declares its own cost at submission. 0 means Concurrency * 4." default:"0"`
Collaborator

I like this as a general scheduling improvement, but is this going to work as a replacement for the existing MaxCloneConcurrency? I feel like they're solving slightly different problems.

2 participants