virtio/block: rewritten and enabled multi-segments#212

Open
dreamliner787-9 wants to merge 6 commits intomainfrom
virtio_block_fixes

Conversation

@dreamliner787-9
Contributor

@dreamliner787-9 commented Apr 7, 2026

Depends on #214 for the new virtio descriptor chain R/W utility functions.

Rewrote the majority of the virtio block code to make it easier to read and less bug-prone. This made virtio block more resilient against buffer overflows, and fixed a bug where the request body was undercounted by one descriptor, leading to data corruption when there are multiple data descriptors in one request.

Enabled multi-segment requests by increasing seg_max from 1 to 8 to pave the way for UEFI and Windows booting. Guests can now make requests up to 32 KiB in size, rather than just 4 KiB as before. This can be increased further, but risks running out of space in the sDDF Block data region.
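As a sanity check on the numbers above, the new ceiling is just the segment count multiplied by the per-segment size. This is a minimal sketch assuming each data segment covers at most one 4 KiB page (which is how the 32 KiB figure falls out); the names here are illustrative, not taken from the libvmm sources:

```python
# Hypothetical names for illustration; only the arithmetic is from the PR.
SEG_MAX = 8          # new value of VIRTIO_BLK_SEG_MAX (was 1)
SEGMENT_SIZE = 4096  # assumed bytes per data segment (one 4 KiB page)

max_request_bytes = SEG_MAX * SEGMENT_SIZE
print(max_request_bytes)  # 32768, i.e. 32 KiB (was 4096 with seg_max = 1)
```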

Informally documented our entire virtio block device implementation.

Contributor

@midnightveil left a comment


You've pulled some features out into functions for implementing multiple segments, which is good. That makes sense.

I still don't like the way that the request overlapping works. It feels very hacky, and it is definitely significantly slower, because it iterates through the entire set of active requests for every request, which approaches O(n²). It also ignores the existence of our barrier API, and while we don't have defined semantics for barriers yet, I suspect it wouldn't work at all.

Also, you combine a rewrite (bug fixes) with adding a feature (multi-segments) in one commit; that should really be two commits. You can leave in the skeleton functions used for multi-segments and make them only handle one segment, but they really shouldn't be in one commit: there are too many changes to tell them apart easily.

In my mind we need a mental model written down and documented somewhere before we merge this. I can sort of intuit it here, but I'd like to see explicitly written out the various states requests can be in and how they interact, so that if we run into issues later we can tell whether the code is behaving contrary to our model, or whether the model the code implements is itself wrong. In other words, I think this deserves an informal abstract specification and a proof of correctness against that spec.


As for "better" ways to implement this that isn't O(n²):

  1. At the very least we could use some kind of statically-sized "range tree"-like data structure. I put "range tree" in quotes because I don't know if there's a canonical name for these; there are probably other names. What I mean is some sort of data structure that lets you efficiently query a range in O(log n) time and check for overlaps, inserting your new range if there is none. This could then store request IDs, reducing the cost of future requests, and it should be possible to size it to hold the same number of entries as we have queue entries.

  2. What I personally think is the better solution: take a leaf out of ARM's book with their exclusive monitors. What we are implementing here is the same thing: exclusive loads/stores to a memory location, of which we only allow one at a time, and we want to prevent multiple writers (or readers and writers) touching the same 'exclusive granule' simultaneously. This could be done by storing, for each 4k region in the buffer window, a single bit indicating whether the region is open (no active requests) or closed (active requests). That gives you O(1) random access at the cost of O(buffer region) space. Then, once you have dequeued responses back up to the driver, you can iterate over the queued requests (intrusive linked list, maybe?) and recheck them; or make the exclusive monitor state a queue entry ID rather than a single bit, trading off storage space in a different dimension to avoid retrying. Hopefully retries should be relatively rare given we report optimal IO/buffer alignment.
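To make the first option concrete, here is a sketch of the query shape it implies, using a sorted list plus binary search as a stand-in for a real balanced tree (insertion here is O(n) because of list shifts, but the O(log n) overlap query is the point). All names are hypothetical, not from any existing library:

```python
import bisect

class RangeSet:
    """Sorted, non-overlapping [start, end) ranges; a tree stand-in."""

    def __init__(self):
        self.starts = []  # sorted range starts
        self.ends = []    # parallel range ends

    def overlaps(self, start: int, end: int) -> bool:
        """O(log n) check of [start, end) against all active ranges."""
        i = bisect.bisect_right(self.starts, start)
        # The predecessor range may extend past `start`...
        if i > 0 and self.ends[i - 1] > start:
            return True
        # ...and the successor range may begin before `end`.
        return i < len(self.starts) and self.starts[i] < end

    def insert(self, start: int, end: int) -> bool:
        """Claim the range, or report an overlap (caller defers the request)."""
        if self.overlaps(start, end):
            return False
        i = bisect.bisect_right(self.starts, start)
        self.starts.insert(i, start)
        self.ends.insert(i, end)
        return True

rs = RangeSet()
assert rs.insert(0, 4096)
assert not rs.insert(2048, 8192)  # overlaps [0, 4096): would be deferred
assert rs.insert(4096, 8192)      # merely adjacent, so it is fine
```

In a real implementation the entries would carry request IDs and live in statically allocated storage sized to the queue depth, as suggested above.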

@dreamliner787-9
Contributor Author

dreamliner787-9 commented Apr 7, 2026

I suspect it wouldn't work at all.

My understanding is that you are referring to sDDF Block's barrier "request", so that we can queue up overlapping reads and writes, then signal the virtualiser just once. But I don't understand what you meant by "wouldn't work" specifically: would overlapping requests still cause problems even if we enqueue them all with barriers?

Edit: resolved offline

Also, you combine a rewrite (bug fixes) with adding a feature (multi-segments) in one commit; that should really be two commits. You can leave in the skeleton functions used for multi-segments and make them only handle one segment, but they really shouldn't be in one commit: there are too many changes to tell them apart easily.

The reason I combined them is that the original code was already able to recognise multi-segment requests; it just didn't handle them correctly. Plus, enabling multi-segments is just a one-line change (changing VIRTIO_BLK_SEG_MAX), and experimenting with multi-segments led to the rewrite, so I just combined them. I can split them up though if that is more desired.

Edit: resolved offline, will split.

In my mind we need a mental model written down and documented somewhere before we merge this.

Yes, this is fair; I can write it down.

Edit: resolved offline.

As for "better" ways to implement this that isn't O(n²):

It was a concern when this logic was originally implemented in #160, but we didn't fix it then in order to get it merged quicker. I think there were also some concerns about statically allocating a structure whose size is tied to the underlying block device's size, since you would need to recompile the code if you increased your block device's size.

To properly fix it, I'm gravitating towards the "bitmap" approach, as it is simpler than a tree or other structures and comes with the benefit of being able to reuse sDDF's bitarray library. With the bitmap approach, the time complexity becomes O(nk) instead, where n is the number of requests and k is the number of 4k transfer windows across all the requests.

Edit: unresolved offline; a bookkeeping bitmap for 1 GiB of disk would need 32 KiB of memory, which is a fair amount. So it is unclear whether we would want to trade space for time with the bitmap approach, or complexity for time with a tree approach.

```python
>>> 1073741824 / 4096   # 1 GiB of disk, in 4 KiB granules
262144.0
>>> _ / 64              # as 64-bit bitmap words
4096.0
>>> _ * 8               # words to bytes: 32 KiB of bitmap
32768.0
```
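For what it's worth, the exclusive-monitor-style bitmap can be modelled in a few lines. This is a minimal sketch assuming one bit per 4 KiB granule; GranuleMonitor and its methods are hypothetical names for illustration, not sDDF's actual bitarray API:

```python
GRANULE = 4096  # bytes covered by one bit: the "exclusive granule"

class GranuleMonitor:
    """One bit per 4 KiB granule: 0 = open, 1 = closed (request in flight)."""

    def __init__(self, region_size: int):
        n_granules = region_size // GRANULE
        self.bits = bytearray((n_granules + 7) // 8)

    def _granules(self, offset: int, length: int):
        return range(offset // GRANULE, (offset + length - 1) // GRANULE + 1)

    def try_acquire(self, offset: int, length: int) -> bool:
        """Claim [offset, offset+length); cost is O(granules touched),
        not O(number of active requests)."""
        idx = list(self._granules(offset, length))
        if any(self.bits[i // 8] & (1 << (i % 8)) for i in idx):
            return False  # overlaps an in-flight request: caller must defer
        for i in idx:
            self.bits[i // 8] |= 1 << (i % 8)
        return True

    def release(self, offset: int, length: int):
        """Reopen the granules once the response has been dequeued."""
        for i in self._granules(offset, length):
            self.bits[i // 8] &= ~(1 << (i % 8))

m = GranuleMonitor(1 << 20)           # a 1 MiB region needs only 32 bytes of bits
assert m.try_acquire(0, 8192)         # claims granules 0 and 1
assert not m.try_acquire(4096, 4096)  # overlaps granule 1: would be deferred
m.release(0, 8192)
assert m.try_acquire(4096, 4096)      # free again after completion

# Space scales with the tracked region: 1 GiB at 4 KiB granularity
# needs 32 KiB of bitmap, matching the arithmetic above.
assert len(GranuleMonitor(1 << 30).bits) == 32 * 1024
```

The retry path (re-checking deferred requests once responses come back, e.g. via the intrusive linked list suggested in the review) is left out of this sketch.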

@dreamliner787-9 force-pushed the virtio_block_fixes branch 4 times, most recently from ed59e76 to 737b52c on April 8, 2026 at 05:28
@dreamliner787-9 force-pushed the virtio_block_fixes branch 4 times, most recently from ff9ebbe to 5307091 on April 9, 2026 at 04:57
Previously, virtio MMIO devices had a virtqueue size of 128 while
PCI had 256. Now we make them both 128 for consistency.

Signed-off-by: Bill Nguyen <bill.nguyen@unsw.edu.au>
Signed-off-by: Bill Nguyen <bill.nguyen@unsw.edu.au>
Also made the device more robust against the TX queue full condition,
and noted down an existing issue where requests larger than the TX
queue capacity aren't handled properly.

Signed-off-by: Bill Nguyen <bill.nguyen@unsw.edu.au>
Rewrote the majority of the code to make it easier to read and less
bug-prone. This made virtio block more resilient against buffer
overflows, and fixed a bug where the request body was undercounted by
one descriptor, leading to data corruption when there are multiple
descriptors in one request.

Signed-off-by: Bill Nguyen <bill.nguyen@unsw.edu.au>
Enabled multi-segment requests by increasing seg_max from 1 to 8.
Guests can now make requests up to 32 KiB in size, rather than 4 KiB
as before. This can be increased further, but risks running out of
space in the sDDF Block data region.

Signed-off-by: Bill Nguyen <bill.nguyen@unsw.edu.au>
By default the mountpoint `/mnt` always exists. So if the mount process
fails, it only prints an easy-to-miss message and can mislead the user
into thinking that they are using the virtio block device, when instead
they are just writing data into the ramdisk.

Signed-off-by: Bill Nguyen <bill.nguyen@unsw.edu.au>