Followup: ensure the LSM iterators are invalidated on exceptions #30362

Open
ballard26 wants to merge 3 commits into redpanda-data:dev from ballard26:ct-at-bug-investigation

@ballard26
Contributor

A follow-up to the previous PR on this matter. It goes through the remaining LSM iterators and ensures that each one is properly invalidated when an exception is thrown. See the individual commits for details.

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v26.1.x
  • v25.3.x
  • v25.2.x

Release Notes

  • none

@ballard26 ballard26 requested a review from andrwng May 1, 2026 02:03
@ballard26 ballard26 marked this pull request as ready for review May 1, 2026 02:03
Copilot AI review requested due to automatic review settings May 1, 2026 02:03
Contributor

Copilot AI left a comment

Pull request overview

Follow-up exception-safety hardening for LSM iterators: ensures valid() cannot remain true after an exception during a mutating iterator operation, preventing downstream use of stale iterator state.

Changes:

  • Extracted a reusable throwing_iterator test utility and wired it into existing DB iterator tests.
  • Updated two_level_iterator, merging_iterator, and the block reader iterator to invalidate state at the start of mutating operations and re-establish validity only on success.
  • Added targeted exception-safety tests for two_level_iterator and merging_iterator.
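
To make the testing idea in the list above concrete, here is a minimal sketch of what a fault-injecting iterator can look like. It is not the `throwing_iterator` from this PR: the class name `throwing_iterator_sketch`, the `throw_on_call` method, and the synchronous (non-coroutine) interface are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical test double, not the helper from this PR: a synchronous
// iterator over a std::map that can be armed to throw on a chosen call.
class throwing_iterator_sketch {
public:
    explicit throwing_iterator_sketch(std::map<std::string, std::string> data)
      : _data(std::move(data))
      , _pos(_data.end()) {}

    // _pos points into _data, so copying or moving would leave it dangling.
    throwing_iterator_sketch(const throwing_iterator_sketch&) = delete;
    throwing_iterator_sketch& operator=(const throwing_iterator_sketch&) = delete;

    // Arm the iterator: the n-th mutating call from now (1-based) throws once.
    void throw_on_call(std::size_t n) { _throw_at = _calls + n; }

    bool valid() const { return _pos != _data.end(); }
    const std::string& key() const {
        assert(valid());
        return _pos->first;
    }

    // Every mutating operation checks the injection point before it moves.
    void seek_to_first() {
        maybe_throw();
        _pos = _data.begin();
    }
    void seek(const std::string& target) {
        maybe_throw();
        _pos = _data.lower_bound(target);
    }
    void next() {
        maybe_throw();
        if (valid()) {
            ++_pos;
        }
    }

private:
    void maybe_throw() {
        if (++_calls == _throw_at) {
            throw std::runtime_error("injected failure");
        }
    }

    std::map<std::string, std::string> _data;
    std::map<std::string, std::string>::const_iterator _pos;
    std::size_t _calls = 0;
    std::size_t _throw_at = 0; // 0 means "never throw"
};
```

Because the sketch throws before it moves, the child keeps its old position after a failure. That is the situation a wrapping iterator has to defend against: it cannot assume a child that threw is in any particular state.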

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| src/v/lsm/db/tests/iter_test.cc | Switches DB iterator exception-safety tests to shared throwing_iterator utility. |
| src/v/lsm/db/tests/BUILD | Adds dependency on the new throwing_iterator test library. |
| src/v/lsm/core/internal/two_level_iterator.cc | Adds explicit invalidation/revalidation so valid() can’t remain stale-true after exceptions. |
| src/v/lsm/core/internal/tests/two_level_iterator_test.cc | New tests validating two_level_iterator invalidates on exceptions across seek/next/prev paths. |
| src/v/lsm/core/internal/tests/throwing_iterator.h | New reusable throwing iterator for exception-path testing of higher-level iterators. |
| src/v/lsm/core/internal/tests/merging_iterator_test.cc | Adds exception-safety tests ensuring merging iterator becomes invalid after child exceptions. |
| src/v/lsm/core/internal/tests/BUILD | Adds throwing_iterator test library and new two_level_iterator_test target + deps. |
| src/v/lsm/core/internal/merging_iterator.cc | Invalidates _current early so valid() becomes false if an awaited child op throws. |
| src/v/lsm/block/reader.cc | Invalidates block iterator position during mutating operations so corruption exceptions don’t leave a stale-valid position. |
Comments suppressed due to low confidence (1)

src/v/lsm/db/tests/iter_test.cc:21

  • std::map is used later in this test file (e.g. in DBIteratorExceptionSafetyTest::SetUp), but <map> is no longer included here. The file currently relies on throwing_iterator.h to transitively include <map>, which is brittle if that header changes. Please add an explicit #include <map> in this file.
#include "lsm/core/internal/iterator.h"
#include "lsm/core/internal/keys.h"
#include "lsm/core/internal/options.h"
#include "lsm/core/internal/tests/throwing_iterator.h"
#include "lsm/db/iter.h"
#include "lsm/db/memtable.h"

#include <gtest/gtest.h>

#include <memory>
#include <stdexcept>
#include <string_view>

@vbotbuildovich
Collaborator

vbotbuildovich commented May 1, 2026

CI test results

test results on build#83923
  • test_status: FLAKY(PASS)
  • test_class: TxAtomicProduceConsumeTest
  • test_method: test_basic_tx_consumer_transform_produce
  • test_arguments: {"with_failures": true}
  • test_kind: integration
  • job_url: https://buildkite.com/redpanda/redpanda/builds/83923#019de151-4bdd-44d9-991e-ca3ca86b8feb
  • passed: 10/11
  • reason: Test PASSES after retries. No significant increase in flaky rate (baseline=0.0019, p0=1.0000, reject_threshold=0.0100. adj_baseline=0.1000, p1=0.3487, trust_threshold=0.5000)
  • test_history: https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=TxAtomicProduceConsumeTest&test_method=test_basic_tx_consumer_transform_produce
test results on build#83966
  • test_status: FLAKY(PASS)
  • test_class: RedpandaNodeOperationsSmokeTest
  • test_method: test_node_ops_smoke_test
  • test_arguments: {"cloud_storage_type": 1, "mixed_versions": false}
  • test_kind: integration
  • job_url: https://buildkite.com/redpanda/redpanda/builds/83966#019de6dc-8182-4cb3-90b1-1fe7ee640a81
  • passed: 10/11
  • reason: Test PASSES after retries. No significant increase in flaky rate (baseline=0.0000, p0=1.0000, reject_threshold=0.0100. adj_baseline=0.1000, p1=0.3487, trust_threshold=0.5000)
  • test_history: https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=RedpandaNodeOperationsSmokeTest&test_method=test_node_ops_smoke_test

@ballard26 ballard26 force-pushed the ct-at-bug-investigation branch 2 times, most recently from 3ec9ba5 to a3a1449 Compare May 2, 2026 03:28
@ballard26 ballard26 requested a review from Copilot May 2, 2026 03:29
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated no new comments.

Comments suppressed due to low confidence (2)

src/v/lsm/db/tests/iter_test.cc:21

  • iter_test.cc still uses std::map (e.g., in DBIteratorExceptionSafetyTest::SetUp) but the direct #include <map> was removed. Consider re-adding the explicit include to avoid relying on transitive includes from throwing_iterator.h.
#include <gtest/gtest.h>

#include <memory>
#include <stdexcept>
#include <string_view>

src/v/lsm/core/internal/tests/BUILD:111

  • This BUILD file ends with multiple trailing blank lines. Running buildifier will typically remove these; consider trimming to a single trailing newline to avoid formatting-only diffs from automated tooling.

ballard26 added 3 commits May 1, 2026 23:36
The merging iterator only set _current at the end of each mutating
method, via find_smallest or find_largest. If a child threw partway
through, those helpers never ran and _current still pointed at the
previous child. valid() then returned true with stale state. Clear
_current at the start of every mutating method so a thrown await
leaves the iterator invalid. Also adds a shared throwing_iterator
test helper.
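
The commit message above can be illustrated with a small, synchronous sketch. The real iterator is asynchronous and tracks a child pointer; here `child_iter`, `merging_iter_sketch`, the `fail_next_op` flag, and the use of an optional index for `_current` are assumptions chosen to keep the example self-contained.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical child iterator over sorted keys that can fail on demand.
struct child_iter {
    std::vector<std::string> keys;
    std::size_t pos = 0;
    bool fail_next_op = false;

    bool valid() const { return pos < keys.size(); }
    const std::string& key() const { return keys[pos]; }
    void seek_to_first() { maybe_fail(); pos = 0; }
    void next() { maybe_fail(); ++pos; }
    void maybe_fail() {
        if (std::exchange(fail_next_op, false)) {
            throw std::runtime_error("child failed");
        }
    }
};

class merging_iter_sketch {
public:
    explicit merging_iter_sketch(std::vector<child_iter> children)
      : _children(std::move(children)) {}

    bool valid() const { return _current.has_value(); }
    const std::string& key() const {
        assert(valid());
        return _children[*_current].key();
    }
    child_iter& child(std::size_t i) { return _children[i]; }

    void seek_to_first() {
        _current.reset(); // invalidate before any child call that may throw
        for (auto& c : _children) {
            c.seek_to_first();
        }
        find_smallest(); // only reached when no child threw
    }

    void next() {
        // Take the current child and invalidate in one step.
        auto cur = std::exchange(_current, std::nullopt);
        if (!cur) {
            return;
        }
        _children[*cur].next();
        find_smallest();
    }

private:
    void find_smallest() {
        std::optional<std::size_t> smallest;
        for (std::size_t i = 0; i < _children.size(); ++i) {
            const auto& c = _children[i];
            if (c.valid() && (!smallest || c.key() < _children[*smallest].key())) {
                smallest = i;
            }
        }
        _current = smallest;
    }

    std::vector<child_iter> _children;
    std::optional<std::size_t> _current;
};
```

If a child throws inside `next()`, `_current` has already been cleared, so `valid()` reports false until a later seek succeeds. Without the early reset, `_current` would still name the previous child and `valid()` would return true on stale state.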

The previous fix only cleared _data_iter inside init_data_block.
Other throws in a mutating method could still leave _data_iter
non-null with undefined inner state. The derived valid() check would
then return true. Replace it with a cached _valid bool. Each
mutating method clears _valid at the start and re-establishes it on
the success path.
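
A sketch of the cached-flag approach this commit message describes, under simplifying assumptions: blocks are plain vectors of keys, block loading is a synchronous callback that may throw, and the names `two_level_iter_sketch`, `loader`, and `position_at_block` are hypothetical rather than taken from two_level_iterator.cc.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

class two_level_iter_sketch {
public:
    using block = std::vector<std::string>;
    // Hypothetical block loader; it may throw (for example on a read error).
    using loader = std::function<block(std::size_t)>;

    two_level_iter_sketch(std::size_t num_blocks, loader load)
      : _num_blocks(num_blocks)
      , _load(std::move(load)) {}

    // Validity is a cached flag, not derived from the inner state.
    bool valid() const { return _valid; }
    const std::string& key() const {
        assert(valid());
        return _block[_pos];
    }

    void seek_to_first() {
        _valid = false; // cleared first, so any throw leaves us invalid
        position_at_block(0);
    }

    void next() {
        if (!std::exchange(_valid, false)) {
            return; // was already invalid
        }
        if (_pos + 1 < _block.size()) {
            ++_pos;
            _valid = true; // success path within the current block
            return;
        }
        position_at_block(_block_idx + 1);
    }

private:
    // Loads blocks from idx onward until a non-empty one is found.
    void position_at_block(std::size_t idx) {
        for (; idx < _num_blocks; ++idx) {
            block b = _load(idx); // may throw; _valid is already false
            if (!b.empty()) {
                _block = std::move(b);
                _block_idx = idx;
                _pos = 0;
                _valid = true; // re-established only after everything worked
                return;
            }
        }
    }

    std::size_t _num_blocks;
    loader _load;
    block _block;
    std::size_t _block_idx = 0;
    std::size_t _pos = 0;
    bool _valid = false;
};
```

The design choice is that `valid()` no longer depends on whether some inner object happens to be non-null. Every mutating method pays one extra store at entry, and in return no throw site inside it can leave the flag true.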

parse_next_key set _current to the next entry's offset before
calling decode_entry. If decode_entry threw a corruption_exception,
valid() returned true while _key and _value still held the previous
entry's bytes. Move the _current assignment to the success tail of
parse_next_key. Also invalidate at the start of every mutating
method.
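
The reordering described in this commit message can be sketched as follows. The entry encoding here (one length byte each for key and value, then the bytes) is invented for the example and is not the real block format; `block_iter_sketch`, `corruption_error`, `encode_entry`, and `parse_entry_at` are likewise hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>

struct corruption_error : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Toy encoding (an assumption): [klen][vlen][key bytes][value bytes].
// Both lengths must fit in a single byte.
inline std::string encode_entry(const std::string& k, const std::string& v) {
    std::string out;
    out.push_back(static_cast<char>(k.size()));
    out.push_back(static_cast<char>(v.size()));
    return out + k + v;
}

class block_iter_sketch {
public:
    explicit block_iter_sketch(std::string data)
      : _data(std::move(data)) {}

    bool valid() const { return _current < _data.size(); }
    const std::string& key() const { assert(valid()); return _key; }
    const std::string& value() const { assert(valid()); return _value; }

    void seek_to_first() {
        invalidate();
        parse_entry_at(0);
    }

    void next() {
        if (!valid()) {
            return;
        }
        const std::size_t next_offset = _next;
        invalidate(); // from here on, a throw leaves valid() == false
        parse_entry_at(next_offset);
    }

private:
    static constexpr std::size_t npos = static_cast<std::size_t>(-1);
    void invalidate() { _current = npos; }

    void parse_entry_at(std::size_t offset) {
        if (offset >= _data.size()) {
            return; // end of block: stay invalid
        }
        if (_data.size() - offset < 2) {
            throw corruption_error("truncated entry header");
        }
        const std::size_t klen = static_cast<unsigned char>(_data[offset]);
        const std::size_t vlen = static_cast<unsigned char>(_data[offset + 1]);
        if (_data.size() - offset - 2 < klen + vlen) {
            throw corruption_error("entry overruns block");
        }
        // Success tail: publish key, value, and position together.
        _key.assign(_data, offset + 2, klen);
        _value.assign(_data, offset + 2 + klen, vlen);
        _next = offset + 2 + klen + vlen;
        _current = offset;
    }

    std::string _data;
    std::string _key;
    std::string _value;
    std::size_t _current = npos;
    std::size_t _next = 0;
};
```

All validation happens before any member is written, and `_current` is assigned last. A corruption error therefore cannot leave `valid()` true alongside the previous entry's key and value.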
@ballard26 ballard26 force-pushed the ct-at-bug-investigation branch from a3a1449 to 586ad29 Compare May 2, 2026 03:53