-
Notifications
You must be signed in to change notification settings - Fork 732
tx: detect cycles in forEachPageInternal #1194
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
ivangsm
wants to merge
1
commit into
etcd-io:main
Choose a base branch
from
ivangsm:tx/foreachpage-detect-cycles
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Changes from all commits
Commits
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,80 @@ | ||
| package bbolt | ||
|
|
||
| import ( | ||
| "fmt" | ||
| "path/filepath" | ||
| "testing" | ||
|
|
||
| "github.com/stretchr/testify/require" | ||
|
|
||
| "go.etcd.io/bbolt/internal/common" | ||
| "go.etcd.io/bbolt/internal/guts_cli" | ||
| "go.etcd.io/bbolt/internal/surgeon" | ||
| ) | ||
|
|
||
| // TestTx_forEachPage_CycleTerminates corrupts a db so that a branch page's | ||
| // child list loops back on itself, then verifies that tx.forEachPage | ||
| // terminates instead of recursing until the goroutine stack overflows. | ||
| // See issue #701 for the real-world corruption pattern. | ||
| func TestTx_forEachPage_CycleTerminates(t *testing.T) { | ||
| path := filepath.Join(t.TempDir(), "db") | ||
| db, err := Open(path, 0600, nil) | ||
| require.NoError(t, err) | ||
| require.NoError(t, db.Update(func(tx *Tx) error { | ||
| b, err := tx.CreateBucketIfNotExists([]byte("data")) | ||
| if err != nil { | ||
| return err | ||
| } | ||
| for i := 0; i < 500; i++ { | ||
| if err := b.Put([]byte(fmt.Sprintf("%04d", i)), make([]byte, 100)); err != nil { | ||
| return err | ||
| } | ||
| } | ||
| return nil | ||
| })) | ||
| require.NoError(t, db.Close()) | ||
|
|
||
| // Pick a branch ancestor and one of its leaf descendants; overwriting the | ||
| // leaf with a copy of the branch leaves the leaf as a branch whose child | ||
| // list still references the leaf's own pgid, forming a cycle. | ||
| xray := surgeon.NewXRay(path) | ||
| paths, err := xray.FindPathsToKey([]byte("0001")) | ||
| require.NoError(t, err) | ||
| require.NotEmpty(t, paths) | ||
| p0 := paths[0] | ||
| require.GreaterOrEqual(t, len(p0), 2, "need at least one branch above the leaf") | ||
| ancestor := p0[len(p0)-2] | ||
| leaf := p0[len(p0)-1] | ||
| require.NoError(t, surgeon.CopyPage(path, ancestor, leaf)) | ||
|
|
||
| ancestorPage, _, err := guts_cli.ReadPage(path, uint64(ancestor)) | ||
| require.NoError(t, err) | ||
| require.True(t, ancestorPage.IsBranchPage()) | ||
| var hasLeafAsChild bool | ||
| for i := uint16(0); i < ancestorPage.Count(); i++ { | ||
| if ancestorPage.BranchPageElement(i).Pgid() == common.Pgid(leaf) { | ||
| hasLeafAsChild = true | ||
| break | ||
| } | ||
| } | ||
| require.True(t, hasLeafAsChild, "expected ancestor to reference the leaf directly") | ||
|
|
||
| db, err = Open(path, 0600, nil) | ||
| require.NoError(t, err) | ||
| defer func() { _ = db.Close() }() | ||
|
|
||
| require.NoError(t, db.View(func(tx *Tx) error { | ||
| rootPage := tx.Bucket([]byte("data")).RootPage() | ||
| var count int | ||
| leafVisits := 0 | ||
| tx.forEachPage(rootPage, func(p *common.Page, _ int, _ []common.Pgid) { | ||
| count++ | ||
| if p.Id() == common.Pgid(leaf) { | ||
| leafVisits++ | ||
| } | ||
| }) | ||
| require.Less(t, count, 10_000, "forEachPage walked %d pages; cycle detection likely missing", count) | ||
| require.GreaterOrEqual(t, leafVisits, 2, "expected fn to fire on the cycle back-edge so duplicate-reference diagnostics still run") | ||
| return nil | ||
| })) | ||
| } |
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the question is what we're going to do here, reading from a corrupted db should definitely be surfaced.
I (personally) would rather side with a panic than changing the function interface to an error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good push — I audited the
forEachPagecallers and you're right that the soft-return is a blind spot for one of them:tx_check.go:148(Check): the fn isverifyPageReachable, which already emitspage N: multiple referencesto the error channel on duplicate pgids (tx_check.go:163-165), andtx.checkwraps its body indefer recover()that converts panics intopanicked{}errors on the same channel (tx_check.go:39-43). Surfacing is fine here under either strategy.bucket.go:621(Bucket.Stats): the fn just accumulates counters and recursively re-enters sub-buckets viasubStats.Add(b.openBucket(...).Stats()). On a revisit it silently inflatesKeyN/LeafPageN/BranchPageNand double-enters sub-buckets. No recover on this path. With the current soft-return,bbolt statson a cycle-corrupted db prints wrong numbers with no indication — exactly the silent behavior you're flagging.A few ways forward, in increasing cost:
forEachPageInternal.Checkstays clean via the existing recover;Stats/bbolt statscrash loudly with the offending pgid. Simplest, matches your preference.db.Logger().Errorffirst. Same as (1), plus a log entry via the existingLoggerinterface so the cycle is recorded even if some caller recovers the panic. Marginal, but ~free.Corrupted booltoBucketStats(additive, non-breaking).forEachPagewould route the cycle through a sink thatStatsreads, so the CLI could print a prominent warning alongside the (inflated) numbers instead of crashing. Strictly the most graceful UX, but it's more scope than the bug this PR is fixing — probably wants its own issue/PR.My default would be (1), possibly (2). Happy to go straight to (3) in a follow-up if you'd rather keep this PR tight. Which direction do you want?