-
Notifications
You must be signed in to change notification settings - Fork 0
fix: resolve flaky e2e test root causes in kine #11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change | ||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
|
@@ -139,7 +139,23 @@ func setup(db *sql.DB) error { | |||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||
| for _, stmt := range schema { | ||||||||||||||||||||||||||
| logrus.Tracef("SETUP EXEC : %v", util.Stripped(stmt)) | ||||||||||||||||||||||||||
| _, err := db.Exec(stmt) | ||||||||||||||||||||||||||
| var err error | ||||||||||||||||||||||||||
| // Retry index creation to work around a known PostgreSQL race | ||||||||||||||||||||||||||
| // where CREATE INDEX IF NOT EXISTS can fail with a duplicate-key | ||||||||||||||||||||||||||
| // error on pg_class when concurrent sessions (or background | ||||||||||||||||||||||||||
| // processes) touch the same catalog entries. | ||||||||||||||||||||||||||
| for attempt := 0; attempt < 3; attempt++ { | ||||||||||||||||||||||||||
| _, err = db.Exec(stmt) | ||||||||||||||||||||||||||
| if err == nil { | ||||||||||||||||||||||||||
| break | ||||||||||||||||||||||||||
| } | ||||||||||||||||||||||||||
| if pgErr, ok := err.(*pgconn.PgError); ok && pgErr.Code == pgerrcode.UniqueViolation && strings.Contains(pgErr.ConstraintName, "pg_class") { | ||||||||||||||||||||||||||
| logrus.Warnf("SETUP EXEC attempt %d hit pg_class race, retrying: %v", attempt+1, err) | ||||||||||||||||||||||||||
| time.Sleep(100 * time.Millisecond) | ||||||||||||||||||||||||||
| continue | ||||||||||||||||||||||||||
|
Comment on lines
+152
to
+155
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Avoid “retrying” log and sleep on the last attempt. Line 153 and Line 154 execute even on the final attempt ( Proposed fix- if pgErr, ok := err.(*pgconn.PgError); ok && pgErr.Code == pgerrcode.UniqueViolation && strings.Contains(pgErr.ConstraintName, "pg_class") {
- logrus.Warnf("SETUP EXEC attempt %d hit pg_class race, retrying: %v", attempt+1, err)
- time.Sleep(100 * time.Millisecond)
- continue
- }
- return err
+ if pgErr, ok := err.(*pgconn.PgError); ok && pgErr.Code == pgerrcode.UniqueViolation && strings.Contains(pgErr.ConstraintName, "pg_class") {
+ if attempt < 2 {
+ logrus.Warnf("SETUP EXEC attempt %d hit pg_class race, retrying: %v", attempt+1, err)
+ time.Sleep(100 * time.Millisecond)
+ continue
+ }
+ }
+ return err📝 Committable suggestion
Suggested change
🤖 Prompt for AI Agents |
||||||||||||||||||||||||||
| } | ||||||||||||||||||||||||||
| return err | ||||||||||||||||||||||||||
| } | ||||||||||||||||||||||||||
| if err != nil { | ||||||||||||||||||||||||||
| return err | ||||||||||||||||||||||||||
| } | ||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -17,7 +17,11 @@ const ( | |
| compactInterval = 5 * time.Minute | ||
| compactTimeout = 60 * time.Second | ||
| compactMinRetain = 1000 | ||
| compactBatchSize = 1000 | ||
| // Use smaller batches to reduce lock contention on MSSQL, where | ||
| // SERIALIZABLE DELETE transactions can block other queries for the | ||
| // entire batch duration. 100 revisions per batch keeps each | ||
| // transaction under ~1 s even on slow backends. | ||
| compactBatchSize = 100 | ||
|
Comment on lines
+20
to
+24
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Compaction is over-throttled globally ( Line 24 plus Lines 174-176 can reduce compaction throughput by an order of magnitude under backlog, and it diverges from the stated 1,000-row objective. Proposed patch- compactBatchSize = 100
+ compactBatchSize = 1000
@@
- time.Sleep(50 * time.Millisecond)
+ select {
+ case <-s.ctx.Done():
+ return
+ case <-time.After(50 * time.Millisecond):
+ }Also applies to: 174-176 🤖 Prompt for AI Agents |
||
| pollBatchSize = 500 | ||
| ) | ||
|
|
||
|
|
@@ -166,6 +170,10 @@ outer: | |
| metrics.CompactTotal.WithLabelValues(metrics.ResultError).Inc() | ||
| continue outer | ||
| } | ||
|
|
||
| // Yield briefly between batches so that other queries blocked | ||
| // by SERIALIZABLE locks can make progress. | ||
| time.Sleep(50 * time.Millisecond) | ||
| } | ||
|
|
||
| if err := s.postCompact(); err != nil { | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
🌐 Web query:
In SQL Server, for a polling query on an append-only event table, does WITH (NOLOCK) allow reading uncommitted rows that may later roll back? Compare this with WITH (READPAST) regarding lock avoidance and consistency guarantees.💡 Result:
Yes, in SQL Server, WITH (NOLOCK) allows reading uncommitted rows from an append-only event table that may later roll back. NOLOCK (equivalent to READUNCOMMITTED) enables dirty reads by not acquiring shared locks and ignoring exclusive locks from other transactions, potentially reading data modifications that get rolled back. This provides high concurrency and avoids blocking but risks inconsistency, such as seeing phantom rows that disappear or data never committed. In comparison, WITH (READPAST) avoids locks by skipping locked rows entirely, ensuring only committed data is read (no dirty reads) but potentially missing rows that match the query criteria if they are locked during an insert/update transaction. READPAST requires READ COMMITTED or REPEATABLE READ isolation (or SNAPSHOT with other hints) and is useful for queue-like scenarios but offers no consistency guarantee against missing data. For a polling query on an append-only event table: - NOLOCK: May read uncommitted inserts (later rolled back), avoiding all blocking. - READPAST: Skips locked (uncommitted) rows, so polls only committed events but ignores in-flight ones, also avoiding blocking. NOLOCK prioritizes speed over accuracy (dirty reads possible); READPAST prioritizes avoiding dirty reads but sacrifices completeness.
Citations:
NOLOCKallows reading uncommitted rows that may later roll back.Lines 160–164 use
WITH (NOLOCK)hints on the polling query, which enables dirty reads. For an append-only event table, this risks processing events from transactions that subsequently roll back, causing phantom rows to disappear from the dataset after being published. Consider usingWITH (READPAST)instead to skip locked (uncommitted) rows while avoiding blocking, ensuring only committed data is polled.🤖 Prompt for AI Agents