feat: Add additional target groups fields, update strategy block, periodic config block etc. to nomad job resource and data source #585
Conversation
Commit: add additional target group fields, update strategy block, periodic config block etc. to nomad job resource and data source.

This commit introduces new computed fields to the `nomad_job` resource and data source to expose the deployment state of task groups, `update_strategy`, `periodic_config`, and other attributes. This includes fields such as `placed_canaries`, `desired_total`, `healthy_allocs`, and others. To prevent perpetual diffs caused by these server-set values, the resource's `CustomizeDiff` logic has been updated to merge the existing deployment state from the Terraform state into the planned `task_groups`. This ensures that plans remain clean after the initial apply. New acceptance tests have been added to verify that these deployment state attributes are correctly populated for both the resource and the data source. The existing service deployment test has also been stabilized by setting `detach = true`.
nomad/datasource_nomad_job.go
Outdated
```go
// Fetch the latest deployment for task group deployment status.
var deploymentTGs map[string]*api.DeploymentState
deploys, _, err := client.Jobs().Deployments(id, false, &api.QueryOptions{
```
I don't think we guarantee any particular ordering here. There's a `LatestDeployment` method you should use here instead. Also, note that `batch` and `sysbatch` jobs don't have deployments, so we need to handle the nil case safely and should probably document that this data won't be filled in for those job types.
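A minimal sketch of the suggested nil handling, with the Nomad API types stubbed out locally. The real code would call `client.Jobs().LatestDeployment` from `github.com/hashicorp/nomad/api`, which returns a nil deployment, without an error, for job types that never produce deployments such as `batch` and `sysbatch`:

```go
package main

import "fmt"

// deploymentState and deployment stand in for api.DeploymentState and
// api.Deployment from the Nomad client; only the fields used here are stubbed.
type deploymentState struct{ DesiredTotal int }
type deployment struct {
	TaskGroups map[string]*deploymentState
}

// taskGroupStates extracts per-task-group deployment state from the latest
// deployment, treating a nil deployment (batch/sysbatch jobs) as "no data"
// rather than as an error.
func taskGroupStates(latest *deployment, err error) (map[string]*deploymentState, error) {
	if err != nil {
		return nil, fmt.Errorf("error reading latest deployment: %w", err)
	}
	if latest == nil {
		return nil, nil // no deployment exists for this job type
	}
	return latest.TaskGroups, nil
}

func main() {
	states, err := taskGroupStates(nil, nil)
	fmt.Println(states == nil, err == nil) // prints: true true
}
```

The key point is that a nil deployment and a transport error are distinguished: the former simply leaves the computed field empty, the latter fails the read.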
nomad/datasource_nomad_job.go
Outdated
```go
} else if len(deploys) > 0 {
	deploymentTGs = deploys[0].TaskGroups
}
d.Set("task_groups", jobTaskGroupsRaw(job.TaskGroups, deploymentTGs))
```
I know we call this `task_groups` in the (incorrect) docs, but the `Job` object already has `TaskGroups` and we do expose a bunch of data from that even though it's undocumented. That is, `task_groups` should describe the contents of a [`group`](https://developer.hashicorp.com/nomad/docs/job-specification/group) block from the jobspec, whereas this is about the deployment.
So we should definitely avoid merging these two items (especially if we ever try to fix #1 and #530). Maybe we move these into a new `deployment_state` top-level field and make it clear in the docs that that's what the data represents? Or maybe we provide a new data source entirely for the deployment? That would be a closer representation of the underlying API.
Makes sense, I’ve implemented the top-level `deployment_state` field and moved the deployment-specific data there. It now includes the latest deployment metadata plus task group deployment state.
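A rough sketch of how such a top-level computed field could be declared with the terraform-plugin-sdk v2 `helper/schema` package; the nested attribute names follow the ones discussed in this thread, but the exact set and types are assumptions, not the final schema:

```go
// Hypothetical schema fragment for the nomad_job resource/data source.
"deployment_state": {
	Type:     schema.TypeList, // single-element list, a common SDKv2 pattern for nested objects
	Computed: true,
	Elem: &schema.Resource{
		Schema: map[string]*schema.Schema{
			"id":                 {Type: schema.TypeString, Computed: true},
			"status":             {Type: schema.TypeString, Computed: true},
			"status_description": {Type: schema.TypeString, Computed: true},
			// task group name -> per-group deployment state attributes
			"task_groups": {Type: schema.TypeMap, Computed: true},
		},
	},
},
```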
```go
func canonicalizeTaskGroupUpdateStrategies(job *api.Job) {
	if job == nil {
		return
	}

	for _, taskGroup := range job.TaskGroups {
		if taskGroup == nil {
			continue
		}

		if jobUpdate, taskGroupUpdate := job.Update != nil, taskGroup.Update != nil; jobUpdate && taskGroupUpdate {
			merged := job.Update.Copy()
			merged.Merge(taskGroup.Update)
			taskGroup.Update = merged
		} else if jobUpdate && !job.Update.Empty() {
			taskGroup.Update = job.Update.Copy()
		}

		if taskGroup.Update != nil {
			taskGroup.Update.Canonicalize()
		}
	}
}
```
Should we really be canonicalizing this? If it's coming from Nomad's state, doesn't mutating it misrepresent what data Nomad has?
We’re not canonicalizing data read back from Nomad state here. This only runs on the parsed jobspec during `CustomizeDiff`, before we build the planned `task_groups` value.
The reason for doing it is that `task_groups.update_strategy` is exposed as the effective task group update config, but the jobspec may omit task-group-level fields that Nomad inherits from the job-level `update` block. On read, we end up showing those effective values, so without this normalization the plan can show diffs even when the jobspec is unchanged.
So this is less about mutating Nomad’s state and more about making the planned representation match the effective values Nomad will apply.
Ok, that makes sense given the architecture we currently have, because we're parsing the job client-side just as the CLI does, but then need the canonicalized job to determine client-side diffs with state. Whereas the CLI doesn't diff locally: `job plan` sends the request to the server to generate the diff (which is a Nomad diff and not a Terraform diff).
nomad/resource_job.go
Outdated
```go
	}
}

func mergeDeploymentStateFromState(planned []interface{}, state interface{}) []interface{} {
```
Style nitpick: use `any` instead of `interface{}`.
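For context: since Go 1.18, `any` is a predeclared alias for `interface{}`, so this is purely a readability change. A stand-in with the suggested signature (the body here is illustrative, not the actual merge logic from the PR):

```go
package main

import "fmt"

// mergeDeploymentStateFromState is a placeholder showing the suggested
// spelling; []any and any are type-identical to []interface{} and interface{}.
func mergeDeploymentStateFromState(planned []any, state any) []any {
	if state == nil {
		return planned // nothing in state to merge
	}
	return append(planned, state)
}

func main() {
	out := mergeDeploymentStateFromState([]any{"tg1"}, "tg2")
	fmt.Println(len(out)) // prints: 2
}
```

Because the two spellings denote the same type, the change compiles without touching any call sites.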
Add deployment-level `id`, `status`, and `status_description` to the `deployment_state` field so it represents the latest deployment more completely, instead of only exposing nested task group state. Also update resource and data source tests plus docs to reflect the expanded schema.
Sorry for not realizing this earlier, but the other conversations we've been having have made me question putting the deployment state in this resource at all. When you deploy a job, the deployment state evolves over a period of time, and we don't write a deployment object to Nomad's state until the evaluation is processed. So you have a bunch of different possible race conditions here:
- We submit the job and read back `LatestDeployment` before the deployment has even been created
- We submit the job and read back `LatestDeployment` after the deployment has been created but before the scheduler has found any placements for allocations
- We submit the job and read back `LatestDeployment` after the deployment has been created and some but not all allocs have been placed (this will be common, given the speed of the scheduler but the slowness of deployments)
Having the deployment state in the data source makes sense, I guess, but having it in the job resource doesn't make sense to me. What do you think?
(See also #1 for the mismatch between TF resources and Nomad's state of the job)
website/docs/d/job.html.markdown
Outdated
```markdown
* `spec_type`: `(string)` Type of periodic specification, such as `cron`.
* `prohibit_overlap`: `(boolean)` Whether this job should wait until previous instances of the same job have completed before launching again.
* `timezone`: `(string)` Time zone used to evaluate the next launch interval.
* `deployment_state`: `(list of maps)` State from the latest deployment for the job, if one exists. This data is typically empty for job types without deployments, such as `batch` and `sysbatch`.
```
We can use a much more authoritative voice here:
```diff
- * `deployment_state`: `(list of maps)` State from the latest deployment for the job, if one exists. This data is typically empty for job types without deployments, such as `batch` and `sysbatch`.
+ * `deployment_state`: `(map)` State from the latest deployment for the job, if one exists. This data is always empty for the `batch` and `sysbatch` job types, as they do not have deployments.
```
Also, this is a map now, not a (list of maps). (Same change in the resource doc)
nomad/resource_job.go
Outdated
```go
deployment, _, err := client.Jobs().LatestDeployment(jobID, opts)
if err != nil {
	log.Printf("[WARN] error reading latest deployment for Job %q: %s", jobID, err)
	return nil
```
If we got an error, shouldn't we return it?
Remove `deployment_state` from the `nomad_job` resource and data source. The field exposed deployment runtime state with ambiguous timing semantics and could be absent or partially populated depending on when Nomad was queried. Remove the related tests and docs as well.
I’ve removed it. The main issue is that this field reflects time-evolving deployment runtime state rather than stable job state. Depending on when Nomad is queried, it can legitimately be absent, only partially populated, or represent an intermediate deployment phase. That makes the semantics unclear and introduces race-prone behavior in both the resource and data source. Given that we don’t currently have a concrete use case that justifies exposing this with those timing caveats, removing it seemed better than keeping an attribute that may be misleading or nondeterministic. The related helper code, tests, and docs have been removed as part of this change.
This commit introduces new computed fields to the `nomad_job` resource and data source to expose the deployment state of task groups, `update_strategy`, `periodic_config`, and other attributes. This includes fields such as `placed_canaries`, `desired_total`, `healthy_allocs`, and others.

To prevent perpetual diffs caused by these server-set values, the resource's `CustomizeDiff` logic has been updated to merge the existing deployment state from the Terraform state into the planned `task_groups`. This ensures that plans remain clean after the initial apply.

New acceptance tests have been added to verify that these deployment state attributes are correctly populated for both the resource and the data source. The existing service deployment test has also been stabilized by setting `detach = true`.

Fixes #435

JIRA Ticket