Skip to content

feat: introduce stack-allocated PyBuffer#5894

Open
winstxnhdw wants to merge 9 commits intoPyO3:mainfrom
winstxnhdw:feat/stacked-pybuffer
Open

feat: introduce stack-allocated PyBuffer#5894
winstxnhdw wants to merge 9 commits intoPyO3:mainfrom
winstxnhdw:feat/stacked-pybuffer

Conversation

@winstxnhdw
Copy link
Copy Markdown

Summary

This PR implements a pinned stack-allocated PyUntypedBuffer variant, PyUntypedBufferView.

Closes #5836

Copy link
Copy Markdown
Member

@davidhewitt davidhewitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks very much for this. Various thoughts around the accessor methods.

Also, out of scope for this PR, but I keep wondering if we should provide iterators for these structures. Especially with the strides / suboffsets etc, it's not necessarily trivial to get this right.

@winstxnhdw
Copy link
Copy Markdown
Author

winstxnhdw commented Mar 20, 2026

Also, out of scope for this PR, but I keep wondering if we should provide iterators for these structures. Especially with the strides / suboffsets etc, it's not necessarily trivial to get this right.

Yes, I think it would definitely be useful as well. I am not yet sure how this would look like though.

Copy link
Copy Markdown
Member

@davidhewitt davidhewitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the continued work here, looking great!

Implementing Drop directly on PyUntypedBufferView is functionally equivalent to what I had in mind from the "drop guard", so gets a 👍 from me.

Afraid I had quite a few more thoughts around some of the edge cases (plus some ideas we can probably ignore). Hopefully helps us get to the right eventual abstraction!

src/buffer.rs Outdated
///
/// Returns `None` if the exporter set `shape` to NULL (e.g. `PyBUF_SIMPLE` was requested).
#[inline]
pub fn shape(&self) -> Option<&[usize]> {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if there's some way with const generics that we could make a compile-time guarantee on the nullability of this field (and others).

Probably leads to a horrendously messy type system where in practice these null checks are not the bottleneck.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, not at all. I agree with you. It's a great idea. Let me think about this.

Copy link
Copy Markdown
Author

@winstxnhdw winstxnhdw Mar 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I think I came up with something that isn’t messy, but it makes a lot of the other suggestions moot now.

@winstxnhdw winstxnhdw force-pushed the feat/stacked-pybuffer branch 2 times, most recently from 37f9233 to 5430e96 Compare March 22, 2026 18:03
@winstxnhdw winstxnhdw force-pushed the feat/stacked-pybuffer branch from 5430e96 to 48b5fdd Compare March 22, 2026 18:11
@codspeed-hq
Copy link
Copy Markdown

codspeed-hq bot commented Mar 24, 2026

Merging this PR will not alter performance

✅ 105 untouched benchmarks
⏩ 1 skipped benchmark1


Comparing winstxnhdw:feat/stacked-pybuffer (dd63129) with main (68b719f)

Open in CodSpeed

Footnotes

  1. 1 benchmark was skipped, so the baseline result was used instead. If it was deleted from the codebase, click here and archive it to remove it from the performance reports.

@winstxnhdw winstxnhdw force-pushed the feat/stacked-pybuffer branch from b4f1eab to dd63129 Compare March 27, 2026 15:03
impl_element!(f64, Float);

/// Sealed marker for buffer field availability. Either [`Known`] or [`Unknown`].
mod buffer_info {
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this should be moved to another file or moved to the top level.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We actually already have pyo3::pyclass::boolean_struct::True / False which could potentially be deduplicated with these types.

It might be possible to also use const FORMAT: bool etc as const-generics instead of types. Not sure if that offers any practical benefit here. I think it's not possible to provide defaults for const generics, so it might work out strictly worse I guess?

@winstxnhdw winstxnhdw requested a review from davidhewitt March 30, 2026 10:09
Copy link
Copy Markdown
Member

@davidhewitt davidhewitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very cool!

After reading through this implementation a couple of times, I really like the expressiveness this gives in the type system. I have some ideas which might refine it further (see BufferFlags struct idea).

Main concern is about generic code bloat from explosion of generic parameters. I am unsure if there's a way that we can mitigate that at all.

impl_element!(f64, Float);

/// Sealed marker for buffer field availability. Either [`Known`] or [`Unknown`].
mod buffer_info {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We actually already have pyo3::pyclass::boolean_struct::True / False which could potentially be deduplicated with these types.

It might be possible to also use const FORMAT: bool etc as const-generics instead of types. Not sure if that offers any practical benefit here. I think it's not possible to provide defaults for const generics, so it might work out strictly worse I guess?

Comment on lines +933 to +944
impl<Format: FieldInfo, Stride: FieldInfo> PyUntypedBufferView<Format, Known, Stride> {
/// Returns the shape array. `shape[i]` is the length of dimension `i`.
///
/// Despite Python using an array of signed integers, the values are guaranteed to be
/// non-negative. However, dimensions of length 0 are possible and might need special
/// attention.
#[inline]
pub fn shape(&self) -> &[usize] {
debug_assert!(!self.raw.shape.is_null());
unsafe { slice::from_raw_parts(self.raw.shape.cast(), self.raw.ndim as usize) }
}
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's potentially also valid to have PyUntypedBufferView<Format, Unknown, Stride> implement a shape() accessor which returns Option<&[usize]>. I am unsure if that's of any practical value, it seems to me that it would be expected to always return None.

(Maybe similar observation for format / stride parameters.)

#[repr(transparent)]
pub struct PyBufferView<
T,
Format: FieldInfo = Known,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose the typed buffer view always needs to know format, this parameter might not be necessary (however I guess is useful for consistency).

T,
Format: FieldInfo = Known,
Shape: FieldInfo = Known,
Stride: FieldInfo = Known,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As well as these three, other possible flags are

  • Suboffsets - looks like there are many cases where they would be known to return NULL. Seems like they are never guaranteed to be non-null, so I see that you just left this out for now. Probably fine.
  • Writable - if a writable array is requested, then readonly must always be 0. It would allow avoiding some of the self.readonly() checks, potentially.
  • Contiguous - this is a three-state field with C / Fortran / Unknown, however again could allow avoiding some runtime checks if the contiguity is guaranteed.

However I guess the number of type parameters here gets worse and worse. I have an idea how to resolve that (see comment on PyUntypedBufferView below).

Comment on lines +816 to +820
pub struct PyUntypedBufferView<
Format: FieldInfo = Unknown,
Shape: FieldInfo = Unknown,
Stride: FieldInfo = Unknown,
> {
Copy link
Copy Markdown
Member

@davidhewitt davidhewitt Mar 31, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at this further, I wonder if a cleaner struct definition would be to have just one Flags generic parameter here.

/// Thin wrapper around actual flags value, with type information
#[repr(transparent)]
struct BufferFlags<Format: FieldInfo = Unknown, Shape: FieldInfo = Unknown, Stride: FieldInfo = Unknown>(std::ffi::c_int);

/// Define known combinations of flags
impl BufferFlags<Unknown, Unknown, Unknown> {
    pub const SIMPLE: BufferFlags<Unknown, Unknown, Unknown> = BufferFlags(ffi::PyBuf_SIMPLE);
    pub const FULL: BufferFlags<Known, Known, Known> = BufferFlags(ffi::PyBuf_FULL);

    // Potentially could introduce builder methods as well as constants, if needed??
    // e.g. `require_c_contiguous()` could set `PyBUF_C_CONTIGUOUS`
}

/// The view is just then parameterised by the Flags type
pub struct PyUntypedBufferView<Flags = BufferFlags::SIMPLE> { /* ... */ }

I think for sake of making other code easier to read, the following trait would also be desirable:

pub trait BufferFlagsType: Sealed {
    type Format;
    type Shape;
    type Stride;

    /// See `PyBufferView::with_flags` constructor.
    type WithKnownFormat: BufferFlagsType;
}

impl<Format, Shape, Stride> BufferFlagsType for BufferFlags<Format, Shape, Stride> {
    type Format = Format;
    type Shape = Shape;
    type Stride = Stride;

    type WithKnownFormat = BufferFlags<Known, Shape, Stride>;
}

Comment on lines +901 to +930
/// Attempt to interpret this untyped view as containing elements of type `T`.
pub fn as_typed<T: Element>(&self) -> PyResult<&PyBufferView<T, Known, Shape, Stride>> {
self.ensure_compatible_with::<T>()?;
// SAFETY: PyBufferView<T, ..> is repr(transparent) around PyUntypedBufferView<..>
let typed = unsafe {
NonNull::from(self)
.cast::<PyBufferView<T, Known, Shape, Stride>>()
.as_ref()
};

Ok(typed)
}

fn ensure_compatible_with<T: Element>(&self) -> PyResult<()> {
let name = std::any::type_name::<T>();

if mem::size_of::<T>() != self.item_size() || !T::is_compatible_format(self.format()) {
return Err(PyBufferError::new_err(format!(
"buffer contents are not compatible with {name}"
)));
}

if self.raw.buf.align_offset(mem::align_of::<T>()) != 0 {
return Err(PyBufferError::new_err(format!(
"buffer contents are insufficiently aligned for {name}"
)));
}

Ok(())
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One downside of the high numbers of generic parameters is that these methods are monomorphised many times over during codegen. We might want to have fn inner internal methods which are only generic on T for these larger methods. Small ones probably aren't worth applying such complexity for, I don't have a good instinct for a cut-off threshold on that.

Comment on lines +980 to +990
impl PyUntypedBufferView<Unknown, Unknown, Unknown> {
/// Acquire a buffer view with arbitrary flags,
/// pass it to `f`, then release the buffer.
///
/// The `flags` parameter controls which buffer fields are requested from the exporter.
/// Use constants like [`ffi::PyBUF_SIMPLE`], [`ffi::PyBUF_ND`], [`ffi::PyBUF_STRIDES`], etc.
pub fn with_flags<R>(
obj: &Bound<'_, PyAny>,
flags: c_int,
f: impl FnOnce(&PyUntypedBufferView<Unknown, Unknown, Unknown>) -> R,
) -> PyResult<R> {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With my BufferFlags suggestion above, it would be possible to have type inference on this constructor based on the flags passed in.

Suggested change
impl PyUntypedBufferView<Unknown, Unknown, Unknown> {
/// Acquire a buffer view with arbitrary flags,
/// pass it to `f`, then release the buffer.
///
/// The `flags` parameter controls which buffer fields are requested from the exporter.
/// Use constants like [`ffi::PyBUF_SIMPLE`], [`ffi::PyBUF_ND`], [`ffi::PyBUF_STRIDES`], etc.
pub fn with_flags<R>(
obj: &Bound<'_, PyAny>,
flags: c_int,
f: impl FnOnce(&PyUntypedBufferView<Unknown, Unknown, Unknown>) -> R,
) -> PyResult<R> {
impl<Flags: BufferFlagsType> PyUntypedBufferView<Flags> {
/// Acquire a buffer view with arbitrary flags,
/// pass it to `f`, then release the buffer.
///
/// The `flags` parameter controls which buffer fields are requested from the exporter.
/// Use constants like [`ffi::PyBUF_SIMPLE`], [`ffi::PyBUF_ND`], [`ffi::PyBUF_STRIDES`], etc.
pub fn with_flags<R>(
obj: &Bound<'_, PyAny>,
flags: Flags,
f: impl FnOnce(&Self) -> R,
) -> PyResult<R> {

}
}

impl<Shape: FieldInfo, Stride: FieldInfo> PyUntypedBufferView<Known, Shape, Stride> {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the BufferFlags suggestion above, this could be

Suggested change
impl<Shape: FieldInfo, Stride: FieldInfo> PyUntypedBufferView<Known, Shape, Stride> {
impl<Flags: BufferFlagsType<Format = Known>> PyUntypedBufferView<Flags> {

... which in my opinion is more intuitive to understand what's being required versus listing generics for the irrelevant parameters (and scales better if more parameters are added).

Comment on lines +1062 to +1070
impl<T: Element> PyBufferView<T, Known, Unknown, Unknown> {
/// Acquire a typed buffer view with user-specified flags.
///
/// `PyBUF_FORMAT` is implicitly added to the flags for type validation.
pub fn with_flags<R>(
obj: &Bound<'_, PyAny>,
flags: c_int,
f: impl FnOnce(&PyBufferView<T, Known, Unknown, Unknown>) -> R,
) -> PyResult<R> {
Copy link
Copy Markdown
Member

@davidhewitt davidhewitt Mar 31, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to the above with_flags constructor, this could also use inference with a slight helper from the trait.

Suggested change
impl<T: Element> PyBufferView<T, Known, Unknown, Unknown> {
/// Acquire a typed buffer view with user-specified flags.
///
/// `PyBUF_FORMAT` is implicitly added to the flags for type validation.
pub fn with_flags<R>(
obj: &Bound<'_, PyAny>,
flags: c_int,
f: impl FnOnce(&PyBufferView<T, Known, Unknown, Unknown>) -> R,
) -> PyResult<R> {
impl<T: Element, Flags: BufferFlagsType> PyBufferView<T, Flags> {
/// Acquire a typed buffer view with user-specified flags.
///
/// `PyBUF_FORMAT` is implicitly added to the flags for type validation.
pub fn with_flags<R>(
obj: &Bound<'_, PyAny>,
flags: Flags,
f: impl FnOnce(&PyBufferView<T, Flags::WithKnownFormat>) -> R,
) -> PyResult<R> {

Comment on lines +893 to +899
/// A [struct module style](https://docs.python.org/3/c-api/buffer.html#c.Py_buffer.format)
/// string describing the contents of a single item.
#[inline]
pub fn format(&self) -> &CStr {
debug_assert!(!self.raw.format.is_null());
unsafe { CStr::from_ptr(self.raw.format) }
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did like that we patched .format() to return static "B" if a "simple" request was made. I wonder if there's a way we could re-introduce that logic with some of this new generic machinery.

@davidhewitt
Copy link
Copy Markdown
Member

Thanks for all of this - would be interested to see what you think of these suggestions (none are mandatory, I'm just pushing out ideas of what I think I like, which may not be to others' taste!)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Consider stack-allocated PyBuffer variant

2 participants