Skip to content

feat: Add additional target groups fields, update strategy block, periodic config block etc. to nomad job resource and data source#585

Merged
vishnuedavarath merged 9 commits intomainfrom
b-job-datasource-mismatch
Mar 17, 2026
Merged

feat: Add additional target groups fields, update strategy block, periodic config block etc. to nomad job resource and data source#585
vishnuedavarath merged 9 commits intomainfrom
b-job-datasource-mismatch

Conversation

@vishnuedavarath
Copy link
Collaborator

@vishnuedavarath vishnuedavarath commented Mar 12, 2026

This commit introduces new computed fields to the nomad_job resource and data source to expose the deployment state of task groups, update_strategy, periodic_config and other attributes. This includes fields such as placed_canaries, desired_total, healthy_allocs, and others.

To prevent perpetual diffs caused by these server-set values, the resource's CustomizeDiff logic has been updated to merge the existing deployment state from the Terraform state into the planned task_groups. This ensures that plans remain clean after the initial apply.

New acceptance tests have been added to verify that these deployment state attributes are correctly populated for both the resource and the data source. The existing service deployment test has also been stabilized by setting detach = true.

Fixes #435
JIRA Ticket

  • If a change needs to be reverted, we will roll out an update to the code within 7 days.

@vishnuedavarath vishnuedavarath force-pushed the b-job-datasource-mismatch branch 3 times, most recently from d95d83e to 92376c0 Compare March 13, 2026 12:09
vishnuedavarath added a commit that referenced this pull request Mar 13, 2026
@vishnuedavarath vishnuedavarath self-assigned this Mar 13, 2026
@vishnuedavarath vishnuedavarath marked this pull request as ready for review March 13, 2026 12:31
@vishnuedavarath vishnuedavarath requested review from a team as code owners March 13, 2026 12:31
…iodic config block etc. to nomad job resource and data source

This commit introduces new computed fields to the `nomad_job` resource and data source to expose the deployment state of task groups, update_strategy, periodic_config and other attributes. This includes fields such as `placed_canaries`, `desired_total`, `healthy_allocs`, and others.

To prevent perpetual diffs caused by these server-set values, the resource's `CustomizeDiff` logic has been updated to merge the existing deployment state from the Terraform state into the planned `task_groups`. This ensures that plans remain clean after the initial apply.

New acceptance tests have been added to verify that these deployment state attributes are correctly populated for both the resource and the data source. The existing service deployment test has also been stabilized by setting `detach = true`.
@vishnuedavarath vishnuedavarath force-pushed the b-job-datasource-mismatch branch from 44f33d3 to cd23044 Compare March 16, 2026 08:55
@vishnuedavarath vishnuedavarath requested a review from tgross March 16, 2026 16:31

// Fetch the latest deployment for task group deployment status.
var deploymentTGs map[string]*api.DeploymentState
deploys, _, err := client.Jobs().Deployments(id, false, &api.QueryOptions{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we guarantee any particular ordering here. There's a LatestDeployment method you should use here instead. Also, note that batch and sysbatch jobs don't have deployments. So we need to handle the nil case safely and should probably document that this data won't be filled in for those job types.

} else if len(deploys) > 0 {
deploymentTGs = deploys[0].TaskGroups
}
d.Set("task_groups", jobTaskGroupsRaw(job.TaskGroups, deploymentTGs))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know we call this task_groups in the (incorrect) docs, but the Job objectalready hasTaskGroupsand we do expose a bunch of data from that even though it's undocumented. That is,task_groups should describe the contents of a [group`](https://developer.hashicorp.com/nomad/docs/job-specification/group) block from the jobspec, whereas this is about the deployment.

So we should definitely avoid merging these two items. (Especially if we ever try to fix #1 and #530). Maybe we move these into a new deployment_state top-level field and make it clear in the docs that's what the data represents? Or maybe we provide a new data source entirely for the deployment? That would be a closer representation of the underlying API.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, I’ve implemented the top-level deployment_state field and moved the deployment-specific data there. It now includes the latest deployment metadata plus task group deployment state.

Comment on lines +1019 to +1041
func canonicalizeTaskGroupUpdateStrategies(job *api.Job) {
if job == nil {
return
}

for _, taskGroup := range job.TaskGroups {
if taskGroup == nil {
continue
}

if jobUpdate, taskGroupUpdate := job.Update != nil, taskGroup.Update != nil; jobUpdate && taskGroupUpdate {
merged := job.Update.Copy()
merged.Merge(taskGroup.Update)
taskGroup.Update = merged
} else if jobUpdate && !job.Update.Empty() {
taskGroup.Update = job.Update.Copy()
}

if taskGroup.Update != nil {
taskGroup.Update.Canonicalize()
}
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we really be canonicalizing this? If it's coming from Nomad's state, doesn't mutating it misrepresent what data Nomad has?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We’re not canonicalizing data read back from Nomad state here. This only runs on the parsed jobspec during CustomizeDiff, before we build the planned task_groups value.

The reason for doing it is that task_groups.update_strategy is exposed as the effective task group update config, but the jobspec may omit task-group-level fields that Nomad inherits from the job-level update block. On read, we end up showing those effective values, so without this normalization the plan can show diffs even when the jobspec is unchanged.

So this is less about mutating Nomad’s state and more about making the planned representation match the effective values Nomad will apply.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok that makes sense given the architecture we currently have, because we're parsing the job client-side just as the CLI does but then need the canonicalized job to determine client-side diffs with state. Whereas the CLI doesn't diff locally -- the job plan send the request to the server to generate the diff. (Which is a Nomad diff and not a Terraform diff.)

}
}

func mergeDeploymentStateFromState(planned []interface{}, state interface{}) []interface{} {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Style nitpick: use any instead of interface{}

Add deployment-level id, status, and status_description to the
deployment_state field so it represents the latest deployment more
completely, instead of only exposing nested task group state.

Also update resource and data source tests plus docs to reflect the
expanded schema.
@vishnuedavarath vishnuedavarath requested a review from tgross March 17, 2026 10:39
Copy link
Member

@tgross tgross left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for not realizing this earlier but the other conversations we've been having has made me question putting the deployment state in this resource at all. When you deploy a job, the deployment state evolves over the course of a period of time, and we don't write a deployment object to Nomad's state until the evaluation is processed. So you have a bunch of different possible race conditions here:

  • We submit the job and read back LatestDeployment before the deployment has even been created
  • We submit the job and read back LatestDeployment after the deployment has been created but before the scheduler has found any placements for allocations
  • We submit the job and read back LatestDeployment after the deployment has been created and some but not all allocs have been placed (this will be common, given the speed of the scheduler but slowness of deployments)

Having the deployment state in the data source makes sense, I guess, but having it in the job resource doesn't make sense to me. What do you think?

(See also #1 for the mismatch between TF resources and Nomad's state of the job)

* `spec_type`: `(string)` Type of periodic specification, such as `cron`.
* `prohibit_overlap`: `(boolean)` Whether this job should wait until previous instances of the same job have completed before launching again.
* `timezone`: `(string)` Time zone used to evaluate the next launch interval.
* `deployment_state`: `(list of maps)` State from the latest deployment for the job, if one exists. This data is typically empty for job types without deployments, such as `batch` and `sysbatch`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can use a much more authoritative voice here:

Suggested change
* `deployment_state`: `(list of maps)` State from the latest deployment for the job, if one exists. This data is typically empty for job types without deployments, such as `batch` and `sysbatch`.
* `deployment_state`: `(map)` State from the latest deployment for the job, if one exists. This data is always empty for the `batch` and `sysbatch` job types, as they do not have deployments.

Also, this is a map now, not a (list of maps). (Same change in the resource doc)

Comment on lines +1019 to +1041
func canonicalizeTaskGroupUpdateStrategies(job *api.Job) {
if job == nil {
return
}

for _, taskGroup := range job.TaskGroups {
if taskGroup == nil {
continue
}

if jobUpdate, taskGroupUpdate := job.Update != nil, taskGroup.Update != nil; jobUpdate && taskGroupUpdate {
merged := job.Update.Copy()
merged.Merge(taskGroup.Update)
taskGroup.Update = merged
} else if jobUpdate && !job.Update.Empty() {
taskGroup.Update = job.Update.Copy()
}

if taskGroup.Update != nil {
taskGroup.Update.Canonicalize()
}
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok that makes sense given the architecture we currently have, because we're parsing the job client-side just as the CLI does but then need the canonicalized job to determine client-side diffs with state. Whereas the CLI doesn't diff locally -- the job plan send the request to the server to generate the diff. (Which is a Nomad diff and not a Terraform diff.)

deployment, _, err := client.Jobs().LatestDeployment(jobID, opts)
if err != nil {
log.Printf("[WARN] error reading latest deployment for Job %q: %s", jobID, err)
return nil
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we got an error, shouldn't we return it?

Remove deployment_state from the nomad_job resource and data source.
The field exposed deployment runtime state with ambiguous timing
semantics and could be absent or partially populated depending on when
Nomad was queried. Remove the related tests and docs as well.
@vishnuedavarath
Copy link
Collaborator Author

I’ve removed deployment_state from both the nomad_job resource and data source.

The main issue is that this field reflects time-evolving deployment runtime state rather than stable job state. Depending on when Nomad is queried, it can legitimately be absent, only partially populated, or represent an intermediate deployment phase. That makes the semantics unclear and introduces race-prone behavior in both the resource and data source.

Given that we don’t currently have a concrete use case that justifies exposing this with those timing caveats, removing it seemed better than keeping an attribute that may be misleading or nondeterministic. The related helper code, tests, and docs have been removed as part of this change.

@vishnuedavarath vishnuedavarath requested a review from tgross March 17, 2026 15:25
Copy link
Member

@tgross tgross left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@vishnuedavarath vishnuedavarath merged commit 18008b7 into main Mar 17, 2026
20 checks passed
@vishnuedavarath vishnuedavarath deleted the b-job-datasource-mismatch branch March 17, 2026 16:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

nomad_job data source documentation doesn't match attributes from actual implementation

2 participants