mssqlserver: store checkpoint LSN as varbinary #4394
The CDC checkpoint cache stored LSNs in a varchar(100) column. When writing a []byte to a varchar, go-mssqldb performs an implicit binary-to-hex-string conversion (e.g. "0x0000002d00000bc3b80003"), so reads returned the ASCII representation of the LSN rather than the original binary value. On resume, those ASCII bytes were passed as a varbinary parameter to the change-table query and, starting with 0x30, sorted above any real LSN (which starts with 0x00), leaving the stream spinning with no rows matched.

Change the cache column and stored procedure parameter to varbinary(10) so the driver transfers the LSN as-is.

Fixes CON-382
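The ordering problem described above can be illustrated with a minimal Go sketch. It is not code from this PR: `sortsAbove` is a hypothetical helper, the 10-byte LSN is made up for illustration, and `bytes.Compare` only approximates how SQL Server compares varbinary values. The point is that the first byte of the hex text is ASCII '0' (0x30), which is greater than the leading 0x00 of a real LSN.

```go
package main

import (
	"bytes"
	"fmt"
)

// sortsAbove reports whether a compares greater than b under plain
// byte-wise (lexicographic) comparison. Hypothetical helper for illustration.
func sortsAbove(a, b []byte) bool {
	return bytes.Compare(a, b) > 0
}

func main() {
	// Made-up 10-byte binary LSN; real LSNs start with 0x00 per the description.
	realLSN := []byte{0x00, 0x00, 0x00, 0x2d, 0x00, 0x00, 0x0b, 0xc3, 0x00, 0x03}

	// What a read from the varchar column returned: the ASCII bytes of the
	// hex string (example value taken from the PR description).
	cached := []byte("0x0000002d00000bc3b80003")

	fmt.Printf("first byte: cached=%#x real=%#x\n", cached[0], realLSN[0])
	fmt.Println("cached sorts above real LSN:", sortsAbove(cached, realLSN))
}
```

Because the cached value compares greater than every real LSN, a query that asks for changes after the checkpoint would match nothing, which is consistent with the stream spinning on resume.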
  CREATE TABLE %s (
    cache_key varchar(7) NOT NULL PRIMARY KEY,
-   cache_val varchar(100)
+   cache_val varbinary(10)
Schema migration concern: createCacheTable uses IF NOT EXISTS so existing deployments that already have the table with cache_val varchar(100) will keep the old column type after this fix. Meanwhile createUpsertStoredProc uses CREATE OR ALTER, so the stored procedure parameter is updated to @Value varbinary(10). On an upgrade, the proc will then INSERT/UPDATE a varbinary(10) value into a varchar(100) column — MSSQL will perform the same implicit binary-to-hex-string conversion that this PR is trying to eliminate, and the bug reproduces.
Either:
- migrate the existing column (e.g. ALTER TABLE ... ALTER COLUMN cache_val varbinary(10) after detecting the old type), or
- document that operators must drop the existing checkpoint cache table on upgrade.
If a manual drop is the intended remediation for CON-382, please add a note in the PR description / release notes since the existing cached LSN is already corrupted.
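As a rough sketch of what "already corrupted" means for a cached value, the following Go snippet detects the legacy hex-text form and decodes it back to raw bytes. It is not part of this PR: `decodeLegacyLSN` is a hypothetical helper, the sample value is made up, and it assumes that a genuine binary LSN never begins with the ASCII bytes "0x" (the description says real LSNs start with 0x00).

```go
package main

import (
	"bytes"
	"encoding/hex"
	"fmt"
)

// decodeLegacyLSN is a hypothetical helper. If v looks like the ASCII hex
// text written by the old varchar column ("0x" followed by hex digits), it
// returns the decoded bytes and true. Otherwise it returns v unchanged and false.
func decodeLegacyLSN(v []byte) ([]byte, bool) {
	if !bytes.HasPrefix(v, []byte("0x")) {
		return v, false
	}
	raw, err := hex.DecodeString(string(v[2:]))
	if err != nil {
		// Not valid hex after the prefix, so leave the value untouched.
		return v, false
	}
	return raw, true
}

func main() {
	// Made-up legacy value: 20 hex digits, i.e. a 10-byte LSN.
	legacy := []byte("0x0000002d00000bc30003")
	lsn, wasLegacy := decodeLegacyLSN(legacy)
	fmt.Printf("legacy=%v len=%d lsn=%x\n", wasLegacy, len(lsn), lsn)

	// A value that is already binary passes through unchanged.
	binary := []byte{0x00, 0x00, 0x00, 0x2d, 0x00, 0x00, 0x0b, 0xc3, 0x00, 0x03}
	_, wasLegacy = decodeLegacyLSN(binary)
	fmt.Println("binary flagged as legacy:", wasLegacy)
}
```

Whether to recover old values like this or simply drop the cache table is a design decision for the PR. The sketch only shows that the two forms can be told apart.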
See comment in PR description 👍
Cherry picked from #4304 so I can verify backwards compatibility.