Skip to content

NO-ISSUE: commitchecker: debug --ancestry-path discrepancy#114

Open
tmshort wants to merge 9 commits intoopenshift:masterfrom
tmshort:debug-ancestry-path
Open

NO-ISSUE: commitchecker: debug --ancestry-path discrepancy#114
tmshort wants to merge 9 commits intoopenshift:masterfrom
tmshort:debug-ancestry-path

Conversation

@tmshort
Copy link
Copy Markdown
Contributor

@tmshort tmshort commented Mar 20, 2026

Run git log both with and without --ancestry-path. If the outputs differ, print a warning showing both sets of commits so it is visible in CI logs when --ancestry-path is silently dropping PR commits.

This can happen when a PR branch has fallen behind main: the PR commits are not descendants of PULL_BASE_SHA, so --ancestry-path excludes them and the checker validates zero commits (false positive pass).

Assisted-by: Claude Sonnet 4.6 noreply@anthropic.com

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Mar 20, 2026
@openshift-ci-robot
Copy link
Copy Markdown

@tmshort: This pull request explicitly references no jira issue.

Details

In response to this:

Run git log both with and without --ancestry-path. If the outputs differ, print a warning showing both sets of commits so it is visible in CI logs when --ancestry-path is silently dropping PR commits.

This can happen when a PR branch has fallen behind main: the PR commits are not descendants of PULL_BASE_SHA, so --ancestry-path excludes them and the checker validates zero commits (false positive pass).

Assisted-by: Claude Sonnet 4.6 noreply@anthropic.com

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci bot requested review from benluddy and sanchezl March 20, 2026 19:17
Run the commit validation twice using a CheckMode type (AncestryPath,
NoAncestryPath) to make the intent clear:

- NoAncestryPath uses the CLI-provided start (e.g. PULL_BASE_SHA),
  catching PR commits even when the branch has fallen behind main —
  the case where --ancestry-path silently returns zero commits and
  produces a false-positive pass.

- AncestryPath uses the computed upstream merge base, walking the
  correct linear carry path through the downstream repo.

Both checks log any errors they find. Only AncestryPath errors cause
a non-zero exit; to also fail on NoAncestryPath errors, add
commitchecker.NoAncestryPath: true to failingChecks.

Signed-off-by: Todd Short <tshort@redhat.com>
Assisted-by: Claude Sonnet 4.6 <noreply@anthropic.com>
@tmshort tmshort force-pushed the debug-ancestry-path branch from 355b2cf to 9c2390f Compare March 23, 2026 16:00
@benluddy
Copy link
Copy Markdown
Contributor

/cc @jacobsee

Thanks for digging into this. It sounds like the same problem I ran into last year in openshift/release#67720 (comment). We will definitely want to fix it across all repos with a verify-commits presubmit. Here it is breaking on openshift/kubernetes: openshift/kubernetes#2368 (comment).

@openshift-ci openshift-ci bot requested a review from jacobsee March 23, 2026 17:21
@tmshort
Copy link
Copy Markdown
Contributor Author

tmshort commented Mar 23, 2026

Here is our failing false positive PR:
openshift/operator-framework-operator-controller#650

@tmshort
Copy link
Copy Markdown
Contributor Author

tmshort commented Mar 24, 2026

This should not change behavior, it is mostly adding debugging, so perhaps we can get it in?

@tmshort
Copy link
Copy Markdown
Contributor Author

tmshort commented Mar 24, 2026

@benluddy @sanchezl @jacobsee can we get some traction on ths?

@benluddy
Copy link
Copy Markdown
Contributor

Would it be simpler to consider it a failure when start and end are different and there are zero commits along the path?

@tmshort
Copy link
Copy Markdown
Contributor Author

tmshort commented Mar 24, 2026

Would it be simpler to consider it a failure when start and end are different and there are zero commits along the path?

I could also add an error for that.

If CommitsBetween returns zero commits but start and end resolve to
different SHAs, the range is effectively empty due to filtering (e.g.
--ancestry-path silently dropping all PR commits). Treat this as an
error rather than a silent pass.

Adds SameCommit() and ErrNoCommitsFound to the package to support
the check.

Assisted-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Todd Short <todd.short@me.com>
@tmshort
Copy link
Copy Markdown
Contributor Author

tmshort commented Mar 24, 2026

And there it is

}

var ErrNotCommit = fmt.Errorf("one or both of the provided commits was not a valid commit")
var ErrNoCommitsFound = fmt.Errorf("no commits found between start and end, but they resolve to different commits")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unused.

The pattern `commitchecker` matches any file starting with that prefix,
including test files like commits_test.go in subdirectories. Use
`/commitchecker` to only match the built binary at the directory root.
The `if err != nil` check was always false because `err` was declared
as `var err error` and never assigned before the check.
…ectCommitsBetween

Split CommitsBetween (which always used --ancestry-path) into two
explicit functions:
- AllCommitsBetween: finds all non-merge commits reachable from end
  but not start
- DirectCommitsBetween: same, but restricted to the direct ancestry
  path via --ancestry-path

Both call a shared private gitLog() helper. The caller in main() is
updated to call each explicitly instead of looping over CheckMode
values, with shared error handling extracted into listCommits() and
validation into validateCommits().

The error message for git log failures now includes the full command
for easier copy-paste debugging.
When all reachable commits are found but none are on the direct ancestry
path, fail with an error telling the user to rebase. This prevents false
positive passes when a PR branch has fallen behind main.

Also removes SameCommit, ErrNoCommitsFound, and CheckMode which are no
longer used after the refactor.
TestStaleBranch creates a DAG where a PR branch forks from an early
commit, main advances, then the PR is merged. Verifies that
AllCommitsBetween finds the PR commits but DirectCommitsBetween does
not, confirming the --ancestry-path behavior.

TestUpToDateBranch verifies both functions return the same commits on
a linear history.
Add a Limitations section to the README explaining stale-branch
detection, the error message users will see, how to read CI logs,
and how to fix it (rebase the PR).

Add ancestry-path.md with a detailed explanation of --ancestry-path
git semantics, why the commitchecker needs it, and how it can drop
commits on stale branches.
@sanchezl
Copy link
Copy Markdown
Contributor

After spending some time understanding the PR and the --ancestry-path semantics, I went ahead and created a PR with my suggested changes rather than leaving a bunch of inline comments: tmshort#1

Key changes:

  • Addressed @benluddy's suggestion: the tool now explicitly fails when commits are found between start..end but none are on the direct ancestry path, instead of relying on SameCommit (which I removed along with the unused ErrNoCommitsFound and CheckMode).
  • Refactored for clarity: split CommitsBetween into AllCommitsBetween and DirectCommitsBetween (calling a shared gitLog() helper), and replaced the CheckMode loop in main() with two explicit calls. This should make it easier to follow what each pass does and why.
  • Added tests: TestStaleBranch creates a temporary git repo with the exact DAG that triggers the --ancestry-path problem, verifying that AllCommitsBetween finds the PR commits while DirectCommitsBetween does not. TestUpToDateBranch verifies both agree on linear history.
  • Added documentation: a Limitations section in the README and a separate ancestry-path.md explaining the git semantics and the trade-off.
  • Minor fixes: fixed .gitignore pattern that was hiding test files, removed a pre-existing dead code block in NewCommitFromOnelineLog.

The PR is split into 6 commits to make it easier to review incrementally.

commitchecker: refactor ancestry-path handling and add stale-branch detection
@tmshort
Copy link
Copy Markdown
Contributor Author

tmshort commented Mar 25, 2026

Thanks @sanchezl. I actually reviewed the changes against the master branch, to get a better understanding of the overall change. The documentation will be greatly appreciated.

I think is is ready for final review.

@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci bot commented Mar 25, 2026

@tmshort: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@tmshort
Copy link
Copy Markdown
Contributor Author

tmshort commented Mar 27, 2026

ping @sanchezl @benluddy

@benluddy
Copy link
Copy Markdown
Contributor

To save review time, could you explain what scenario this fixes that isn't also fixed by starting from the merge base of the pull request ref and the base ref?

@tmshort
Copy link
Copy Markdown
Contributor Author

tmshort commented Mar 30, 2026

Unfortunately, we just had another failure of the commit checker, and our downstream bumper is broken.

@benluddy wrote:

To save review time, could you explain what scenario this fixes that isn't also fixed by starting from the merge base of the pull request ref and the base ref?

According to claude:

The merge-base approach would also fix the false positive, but has a failure mode in the rebase scenario that --ancestry-path was specifically designed to handle.

Stale branch + merge-base:

main:  A --- B --- C --- D   (PULL_BASE_SHA = D)
        \\
PR:      E --- F --- G        (PR HEAD)

merge-base(D, G) = A. Then git log --no-merges A..merge(D,G) = {B, C, D, E, F}. That includes the main-only commits B, C, D - which in openshift/kubernetes are upstream commits without the UPSTREAM: prefix. They'd trigger false validation failures on the very commits we never intended to validate.

Even adding --ancestry-path: git log --ancestry-path A..merge(D,G) still includes B, C, D because all of them are descendants of A.

The merge-base approach would give the right answer if we used the PR head (G = HEAD^2 or PULL_PULL_SHA) rather than the merge commit as end: git log --no-merges A..G = {E, F}. But the current tool takes only --start and --end, and Prow passes the post-merge commit as end. Supporting this would require adding a separate --pr-head flag and updating every job invocation, whereas this PR requires no changes to job configs

@jianzhangbjz
Copy link
Copy Markdown
Member

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Mar 31, 2026
@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci bot commented Mar 31, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: camilamacedo86, grokspawn, jianzhangbjz, tmshort
Once this PR has been reviewed and has the lgtm label, please assign jsafrane for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@sanchezl
Copy link
Copy Markdown
Contributor

@tmshort please provide explicit links to the failing PRs so we can verify our understanding.

@benluddy and I spent time last week investigating this. Here's what we found:

  1. Every repo running commitchecker is a downstream fork with a known upstream, but only 3 out of ~28 repos have a commitchecker.yaml. Without it, no merge-base is computed against upstream, and the tool is vulnerable to the stale-branch --ancestry-path problem. Adding commitchecker.yaml (with just upstreamOrg/upstreamRepo/upstreamBranchexpectedMergeBase is optional) to the remaining repos would fix this at the source.

  2. When a merge-base IS computed via config, --ancestry-path works correctly because the merge-base is by definition an ancestor of the PR commits. The two-pass approach in this PR may not be necessary.

  3. The CI merge commit issue can be solved at the job level by passing the actual PR head, as @benluddy already did in KSVM: Fix pull ref used to find merge base for verify-commits. release#67720: --start "$(git merge-base $PULL_BASE_SHA $PULL_PULL_SHA)" --end "$PULL_PULL_SHA". No tool changes needed.

With (1) and (3), a single pass with --ancestry-path using the merge-base as start should be sufficient — no two-pass diagnostic needed, no tool-level merge commit detection needed. The fix is config + job invocation, not new tool complexity.

If the failures you're seeing are in the operator-framework repos that already have a commitchecker.yaml, the issue might be the expectedMergeBase field going stale after a rebase — in which case removing that field from the config (it's optional) could be the fix. But without seeing the actual failures, we're guessing. Please share them so we can provide concrete guidance.

@tmshort
Copy link
Copy Markdown
Contributor Author

tmshort commented Mar 31, 2026

@sanchezl

This is an open test PR that is failing:
openshift/operator-framework-operator-controller#650

This a PR that unfortunately received a false positive and was merged:
openshift/operator-framework-operator-controller#664

@tmshort
Copy link
Copy Markdown
Contributor Author

tmshort commented Mar 31, 2026

@sanchezl
Here's a PR to remove setting that field in our bumper utility:
openshift/operator-framework-tooling#95

@tmshort
Copy link
Copy Markdown
Contributor Author

tmshort commented Apr 3, 2026

@sanchezl
Even after deleting the specified line from the commitchecker.yaml file, I'm still getting a false positive:

See: openshift/operator-framework-operator-controller#650

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants