Fix reverse-scan off-by-one and replace index data array with fixed-size ring buffer #255

Closed
Copilot wants to merge 19 commits into main from copilot/improve-index-memory-management

Conversation

Contributor

Copilot AI commented Mar 21, 2026

Fixes a pre-existing off-by-one in reverse batch iteration and replaces the unbounded index data array with a fixed-size ring buffer to bound memory usage. The bench+fix were landed separately in #258; this PR (rebased onto main after the performance regression fix in #262) contains the ring buffer implementation.
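The PR description only names the bug class, not the exact patch (which lives in src/Storage/ReadableStorage.js). As a hedged illustration of the failure mode the new test targets, a reverse batch scan that stays correct when the range length is an exact multiple of the batch size might look like this; the function name and loop shape are assumptions, not the library's actual code:

```javascript
function* readRangeReverse(items, batchSize) {
    // Iterate batches from the end; `end` is an exclusive upper bound.
    let end = items.length;
    while (end > 0) {
        // A classic off-by-one here is a loop condition like `end >= batchSize`
        // paired with a separate "remainder" pass: it handles the remainder
        // twice (or not at all) exactly when the range length is a multiple
        // of the batch size. Clamping `start` to 0 and looping on `end > 0`
        // covers both the full and the partial final batch uniformly.
        const start = Math.max(end - batchSize, 0);
        for (let i = end - 1; i >= start; i--) {
            yield items[i];
        }
        end = start;
    }
}
```

For example, `[...readRangeReverse([1, 2, 3, 4, 5, 6, 7, 8], 4)]` visits all eight items in reverse even though 8 is a multiple of the batch size 4.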

Benchmark results (bench/bench-read-scenarios.js) — after rebase onto main (#262)

| Scenario | stable | ring buffer | Δ |
| --- | --- | --- | --- |
| 1 – forward full scan | ~48 ops/sec | ~45 ops/sec | −6% |
| 2 – backwards full scan | ~44 ops/sec | ~41 ops/sec | −7% |
| 3 – join stream (stream-1 + stream-2) | ~47 ops/sec | ~45 ops/sec | −4% |
| 4 – range scan (middle third, binary-search start) | ~150 ops/sec | ~139 ops/sec | −7% |

The ring buffer variant is now within ~4–7% of stable (the original −3.5× penalty was caused by an unrelated regression fixed in #262).

Changes

  • src/Index/ReadableIndex.js – replaces this.data (sparse Array, grows without bound) with this.cache (ring buffer of cacheSize slots) and this._length (total entry count). The ring buffer window is always [_length − cacheSize, _length − 1]; entries outside that window are read from disk on demand without consuming memory. New cacheSize option (default 1024) configures how many of the most-recent entries are kept in memory.

  • src/Index/WritableIndex.js – add() stores new entries in the ring buffer and increments _length; truncate() clears stale ring buffer slots for removed entries.

  • src/Index/ReadOnlyIndex.js – onChange() manages _length directly and clears stale cache slots when the underlying file is truncated by the writer.

  • test/Index.spec.js – 5 new tests validate: memory-bounding after cycling, disk fallback for out-of-window entries, cross-boundary range reads, stale-slot safety after truncation + re-add, and correctness after full ring cycles.
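The window arithmetic behind the changes above can be sketched as follows. This is a minimal standalone illustration, not the PR's actual code: the names cacheSize, cache, and _length come from the description, while the class name and 0-based get() are assumptions:

```javascript
class RingBufferIndexCache {
    constructor(cacheSize = 1024) {
        this.cacheSize = cacheSize;
        this.cache = new Array(cacheSize); // fixed-size slots, never grows
        this._length = 0;                  // total entries ever added
    }

    add(entry) {
        // Newest entry overwrites the oldest slot once the ring is full.
        this.cache[this._length % this.cacheSize] = entry;
        this._length++;
    }

    // The in-memory window is always [_length - cacheSize, _length - 1]
    // (0-based). Anything outside it returns undefined, signalling the
    // caller to fall back to a disk read.
    get(index) {
        if (index < this._length - this.cacheSize || index >= this._length) {
            return undefined;
        }
        return this.cache[index % this.cacheSize];
    }
}
```

Because a slot is only ever written by the entry whose index maps to it modulo cacheSize, an in-window get() can never observe a stale value: any older entry that previously occupied the slot is, by definition, outside the window.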

Original prompt

This section details the original issue you should resolve.

<issue_title>Improve index memory management</issue_title>
<issue_description>Currently an index preallocates an array of the length of the index and then fills it with data on demand. Since the internal array is untyped, memory is allocated just for the structure. Still, for large indices this means a whole lot of memory being kept allocated unused in a couple of use cases, most notably the write-only scenario.

Optimally an index would only keep a (configurable) fixed upper bound of data in memory, then fill that on demand.
This could be a good scenario for a ring buffer.

Also, optimally the internal array buffer would be a typed array of (u)int32 to have index data in a contiguous memory block. This could potentially optimize index entry/buffer translation since the entry would just be a typed view on the underlying buffer and no copying on read would be involved.</issue_description>

<agent_instructions>@copilot rerun the performance benchmark done in the comment link to check if things changed since back then. Then suggest a strategy to make index memory management more scalable without forfeiting performance. If there's still no working solution in JavaScript yet, it's acceptable to conclude to keep things as is.</agent_instructions>

Comments on the Issue (you are @copilot in this section)

@albe: Index range reading should return a generator and work with the same semantics as partition reading. Hence, the underlying file abstraction could be shared between index and partition.

@albe: See https://gist.github.com/albe/39c7b79f46daa49d2cf373ffab3c4513 – ugh

@albe: The feedback on the nodejs/help repo suggests this is to be expected, as typed arrays do a bit more. After some testing with a custom implementation of a "buffer view entry", the single-access use case is faster than the current implementation by a factor of two, but slower by a factor of two when a second access happens (and hence likely even more for further accesses – i.e. it behaves badly for "cache hits" in the index reader). So the buffer reads need to be memoized eagerly (lazy adds a condition in the access path).
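The two ideas discussed in these comments – a contiguous Uint32Array backing store with zero-copy entry views, and the eager memoization the benchmarking concluded was necessary – can be sketched together. The 4-field layout and all names below are assumptions for illustration, not the library's actual entry format:

```javascript
const FIELDS = 4; // assumed layout: number, position, size, partition

class TypedIndexData {
    constructor(capacity) {
        // One contiguous block of uint32s instead of an array of objects.
        this.buffer = new Uint32Array(capacity * FIELDS);
        this.length = 0;
    }

    add(number, position, size, partition) {
        const o = this.length * FIELDS;
        this.buffer[o] = number;
        this.buffer[o + 1] = position;
        this.buffer[o + 2] = size;
        this.buffer[o + 3] = partition;
        this.length++;
    }

    // Zero-copy read: a typed view on the underlying buffer. Fast for a
    // single access, but every field read goes through the TypedArray path.
    entryView(i) {
        return this.buffer.subarray(i * FIELDS, (i + 1) * FIELDS);
    }

    // Eagerly memoized read: one pass of typed-array accesses up front,
    // then repeated accesses are cheap plain-property reads.
    entry(i) {
        const v = this.entryView(i);
        return { number: v[0], position: v[1], size: v[2], partition: v[3] };
    }
}
```

This mirrors the trade-off described above: entryView() wins for one-shot access, while entry() pays once to materialize a plain object so "cache hits" in the index reader don't keep paying the typed-array cost.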


…dableIndex

- Replace `this.data` (sparse Array, grows without bound) with `this.cache`
  (ring buffer of `cacheSize` slots) and `this._length` (total entry count).
- The ring buffer window is always [_length - cacheSize, _length - 1]; entries
  outside that window are read from disk on demand without consuming memory.
- New `cacheSize` option (default 1024) configures how many of the most-recent
  entries are kept in memory, bounding memory use regardless of index size.
- WritableIndex.add() stores new entries in the ring buffer and increments
  _length; truncate() clears stale ring buffer slots for removed entries.
- ReadOnlyIndex.onChange() manages _length directly and clears stale cache
  slots when the underlying file is truncated by the writer.
- readRange() handles out-of-window reads (disk-only, no cache) separately
  from in-window reads (disk + ring buffer), with explanatory comments showing
  why in-window slots can never hold stale data.
- 5 new tests validate: memory-bounding after cycling, disk fallback for
  out-of-window entries, cross-boundary range reads, stale-slot safety after
  truncation + re-add, and correctness after full ring cycles.

Co-authored-by: albe <4259532+albe@users.noreply.github.com>
Agent-Logs-Url: https://github.com/albe/node-event-storage/sessions/58a8bfd7-606c-4d78-86e4-713c830c84be
Copilot AI changed the title from "[WIP] Refactor index memory management for efficiency" to "feat: bound index memory usage with a configurable ring buffer cache" on Mar 21, 2026
Copilot AI requested a review from albe March 21, 2026 09:40
Copilot finished work on behalf of albe March 21, 2026 09:40
@coveralls

coveralls commented Mar 21, 2026

Coverage Status

coverage: 97.907% (+0.02%) from 97.892%
when pulling 5d9626d on copilot/improve-index-memory-management
into 5c60d83 on main.

Copilot AI changed the title from "feat: bound index memory usage with a configurable ring buffer cache" to "Add read-scenario benchmark and fix reverse-scan off-by-one" on Mar 21, 2026
Copilot finished work on behalf of albe March 21, 2026 11:58
@albe
Owner

albe commented Mar 21, 2026

Given the benchmark result I would refrain from merging this. The penalty is not worth the potential gain for write cases and overall reduced memory consumption.

However @copilot please add the test `it('reads all items in reverse when range length is a multiple of batch size', function() {` as a new PR onto the main branch and make sure current main passes.

Contributor Author

Copilot AI commented Mar 21, 2026

> Given the benchmark result I would refrain from merging this. The penalty is not worth the potential gain for write cases and overall reduced memory consumption.
>
> However @copilot ...

The test and the one-line fix have been committed to a local branch copilot/fix-reverse-scan-off-by-one based on main. The exact patch (two files, src/Storage/ReadableStorage.js and test/Storage.spec.js) is the same as commit 5d9626d on this PR — it applies cleanly on top of main and all 301 tests pass.

Since I'm unable to open new PRs directly, you can create the PR by cherry-picking those two file changes from 5d9626d onto main, or by pushing the copilot/fix-reverse-scan-off-by-one branch that was prepared locally.

Copilot finished work on behalf of albe March 21, 2026 14:27
@albe
Owner

albe commented Mar 21, 2026

Closing this based on the benchmark results. The bug and test case are handled in #258.

@albe albe closed this Mar 21, 2026
@albe
Owner

albe commented Mar 21, 2026

With the performance fix in #262 this should be reconsidered, as the performance degradation was not caused by this change

@albe albe reopened this Mar 21, 2026
@albe
Owner

albe commented Mar 21, 2026

@copilot rebase this PR then run the benchmark again

Copilot AI and others added 3 commits March 21, 2026 21:54
…ata support

Co-authored-by: albe <4259532+albe@users.noreply.github.com>
…partitionMetadata)

Co-authored-by: albe <4259532+albe@users.noreply.github.com>
Copilot AI and others added 13 commits March 21, 2026 21:54
Co-authored-by: albe <4259532+albe@users.noreply.github.com>
…ner and streamMetadata fallback paths

Co-authored-by: albe <4259532+albe@users.noreply.github.com>
Agent-Logs-Url: https://github.com/albe/node-event-storage/sessions/d1ba14e3-fb46-42f3-88fc-05ecf508f745
Contributor Author

Copilot AI commented Mar 21, 2026

> @copilot rebase this PR then run the benchmark again

Rebased onto main (commit a16905c). After the performance fix in #262, the ring buffer variant is now within 4–7% of stable across all scenarios (was −3.5× before):

| Scenario | stable | ring buffer | Δ |
| --- | --- | --- | --- |
| 1 – forward full scan | ~48 ops/sec | ~45 ops/sec | −6% |
| 2 – backwards full scan | ~44 ops/sec | ~41 ops/sec | −7% |
| 3 – join stream | ~47 ops/sec | ~45 ops/sec | −4% |
| 4 – range scan | ~150 ops/sec | ~139 ops/sec | −7% |

All 335 tests pass.

Copilot AI changed the title from "Add read-scenario benchmark and fix reverse-scan off-by-one" to "Fix reverse-scan off-by-one and replace index data array with fixed-size ring buffer" on Mar 21, 2026
Copilot finished work on behalf of albe March 21, 2026 21:59
@albe
Owner

albe commented Mar 22, 2026

Closing in favor of #267

@albe albe closed this Mar 22, 2026
@albe albe deleted the copilot/improve-index-memory-management branch March 31, 2026 14:40

Development

Successfully merging this pull request may close these issues.

Improve index memory management