Skip to content

mssqlserver: various test fixes#4304

Merged
josephwoodward merged 7 commits into
mainfrom
mmt/mssqlserver-fixes-v2
May 5, 2026
Merged

mssqlserver: various test fixes#4304
josephwoodward merged 7 commits into
mainfrom
mmt/mssqlserver-fixes-v2

Conversation

@mmatczuk
Copy link
Copy Markdown
Contributor

@mmatczuk mmatczuk commented Apr 20, 2026

Commits

  • guard Connect against concurrent calls
  • fix flaky CDC streaming tests under x86 emulation
  • fix flaky CDC checkpoint resume test under x86 emulation
  • tolerate context.Canceled in resume test stream goroutine
  • synchronise ordering test stream goroutine

Jira

  • CON-383
  • CON-389
  • CON-414
  • CON-437
image

The benthos framework can call Connect() concurrently or re-enter it
before the background goroutine has stopped. Without a guard, two
goroutines share the same publisher and checkpoint tracker, corrupting
state and causing the streaming phase to hang indefinitely.

Add a mutex and a HasStoppedChan check so that a second Connect() is
a no-op while the previous goroutine is still running.

Fixes CON-389
The CDC capture agent in SQL Server under x86 emulation on Apple Silicon
is too slow to process streaming inserts within the test timeout,
especially with 3 parallel subtests each spinning up their own container.

- Add WaitForCDCChanges helper that polls CDC change tables until the
  expected row count appears, replacing fixed time.Sleep(5s)
- Reduce streaming insert count from 3000 to 30 (snapshot already
  validates volume at 3000 rows)
- Wait for CDC to process streaming inserts before asserting

Fixes CON-414
Same CDC agent slowness issue as CON-414. The test inserts rows and
expects them via pure CDC streaming, but under emulation the capture
agent is too slow.

- Reduce rows per phase from 1000 to 100
- Wait for CDC to process inserts before starting the stream
- Remove fixed time.Sleep in favor of deterministic CDC polling

Fixes CON-382
When StopWithin cancels the resume stream context, Run returns
context.Canceled. Match the first launch's handling so the goroutine
does not fail the test on expected shutdown.

Fixes CON-437
The CDC checkpoint cache stored LSNs in a varchar(100) column. When
writing a []byte to a varchar, go-mssqldb performs an implicit
binary-to-hex-string conversion (e.g. "0x0000002d00000bc3b80003"), so
reads returned the ASCII representation of the LSN rather than the
original binary value. On resume, those ASCII bytes were passed as a
varbinary parameter to the change-table query and, starting with 0x30,
sorted above any real LSN (which starts with 0x00), leaving the stream
spinning with no rows matched.

Change the cache column and stored procedure parameter to varbinary(10)
so the driver transfers the LSN as-is.

Fixes CON-382
The ordering-of-iterator test's background goroutine called t.Error if
stream.Run returned a non-context.Canceled error. When the test failed
early or timed out, the goroutine could still be running and fire
t.Error after the test function had returned, panicking and surfacing
as an unnamed package-level event.

Park the Run error on a buffered channel and drain it in t.Cleanup
(after calling StopWithin), so late errors are reported against the
still-active test instead of a completed one.

Fixes CON-383
@claude
Copy link
Copy Markdown

claude Bot commented Apr 20, 2026

Commits
LGTM

Review
Targeted fixes for a cluster of mssqlserver CDC flakiness issues: LSN stored as varbinary (root cause for resume hang), Connect guarded against concurrent re-entry, integration tests polled via a new WaitForCDCChanges helper instead of sleeping, and two test goroutines synchronised so late errors report to the still-active test.

LGTM

This was referenced Apr 20, 2026
@claude
Copy link
Copy Markdown

claude Bot commented May 5, 2026

Commits
LGTM

Review
The Connect mutex + HasStoppedChan guard correctly serialises reconnections, relying on the explicit TriggerHasStopped() in the constructor. Test changes replace fixed sleeps with deterministic polling via the new WaitForCDCChanges helper, and the ordering test goroutine is properly synchronised through a buffered channel drained in t.Cleanup.

LGTM

@josephwoodward
Copy link
Copy Markdown
Contributor

Moved CON-382 to a separate PR so I can verify backwards compatibility concerns.

@josephwoodward josephwoodward merged commit 8fe821c into main May 5, 2026
7 checks passed
@josephwoodward josephwoodward deleted the mmt/mssqlserver-fixes-v2 branch May 5, 2026 14:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants