Fix memory leak: clear managed dict in pybind11_object_dealloc on Python 3.13+ #5999
yamatveev wants to merge 3 commits into pybind:master
On Python 3.14, PyObject_GC_Del (tp_free) no longer implicitly clears the managed dict of objects with Py_TPFLAGS_MANAGED_DICT. Without an explicit PyObject_ClearManagedDict() call before tp_free(), objects stored in the __dict__ of py::dynamic_attr() instances have their refcounts permanently abandoned, causing memory leaks — capsule destructors for numpy arrays (and other objects) never run. Adds a regression test: stores a py::capsule in the __dict__ of a DynamicClass instance and asserts the capsule destructor is called when the instance is deleted.
@pytest.mark.xfail("env.PYPY")
@pytest.mark.skipif("env.GRAALPY", reason="Cannot reliably trigger GC")
Shouldn't this be xfail strict=False for flaky tests?
Skipping it is fine if passing vs. failing doesn't mean anything. If you hope it always passes some day, then yes.
So, should I modify it or not?
For GRAALPY, it's fine. If the comment was that the PYPY xfail needed a strict=False, that would be fine, though I think that, because of the way the GC works in PyPy, this test will always fail there. And PyPy is dead now, so we don't need to worry about that changing in the future.
…ict=False xfail on PYPY
// test_dynamic_attr_dealloc_frees_dict_contents
// Regression test: pybind11_object_dealloc() must call PyObject_ClearManagedDict()
// before tp_free() so that objects stored in a py::dynamic_attr() instance __dict__
// have their refcounts decremented when the pybind11 object is freed. On Python 3.14+
Cursor Opus 4.6 1M Thinking prompt and response:
Question
There is a small asymmetry between:
include/pybind11/detail/class.h line 507:
// On Python 3.13+, PyObject_GC_Del no longer implicitly clears the managed
tests/test_methods_and_attributes.cpp line 401+:
// have their refcounts decremented when the pybind11 object is freed. On Python 3.14+
// tp_free no longer implicitly clears the managed dict, causing permanent leaks.
Should it be Python 3.13+ also in test_methods_and_attributes.cpp?
Analysis
The code fix in class.h is gated on 3.13+ (#if PY_VERSION_HEX >= 0x030D0000), which is correct because PyObject_ClearManagedDict was introduced in 3.13 (Py_TPFLAGS_MANAGED_DICT itself dates from 3.12), and calling it is safe (idempotent) even if 3.13 still does the implicit clearing. The PR description itself says "On Python <= 3.13, PyObject_GC_Del ... cleared the managed dict as a side effect. On Python 3.14 this implicit clearing was removed."
So the comment in the test is more precisely correct about when the bug manifests (3.14+), while the code fix in class.h is correct about when the guard is needed (3.13+, since that's when the API exists).
The test comment should say "3.13+" to be consistent with the fix's comment and the #if guard. The leak behavior might only be observable on 3.14+, but the comment is describing the fix rationale, not the exact Python version where the bug appears. Saying "3.13+" keeps both comments aligned and avoids confusing a reader who sees 0x030D0000 (3.13) in the code but "3.14+" in the test comment.
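As a small aid for reading the guard discussed above, the sketch below decodes a PY_VERSION_HEX-style constant into its version components. The helper name decode_version_hex is hypothetical and exists only for this illustration; the bit layout (major, minor, micro in the top three bytes) is the one CPython documents for PY_VERSION_HEX and sys.hexversion.

```python
import sys


def decode_version_hex(hexversion: int) -> tuple:
    """Split a PY_VERSION_HEX-style integer into (major, minor, micro).

    Hypothetical helper for illustration; it ignores the release-level
    and serial bits stored in the lowest byte.
    """
    major = (hexversion >> 24) & 0xFF
    minor = (hexversion >> 16) & 0xFF
    micro = (hexversion >> 8) & 0xFF
    return major, minor, micro


# The constant used by the #if guard in class.h.
GUARD = 0x030D0000

# True when the running interpreter would compile the guarded branch.
guard_active = sys.hexversion >= GUARD
```

Decoding 0x030D0000 this way gives (3, 13, 0), which is why the guard reads as "3.13+" even though the leak itself is reported on 3.14.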
OK, I have aligned the comments. I do not know your policy: should I just add new commits, or squash them and force-push?
// have their refcounts decremented when the pybind11 object is freed. On Python 3.14+
// tp_free no longer implicitly clears the managed dict, causing permanent leaks.
m.def("make_dynamic_attr_with_capsule", []() -> py::object {
    s_dynamic_attr_capsule_freed = false;
I was wondering if there is an easy way to avoid the global variable (and the subtle pitfalls that come with globals). Cursor then pointed out this:
DynamicClass already exists in the test module as a py::dynamic_attr() class, so the test can be done entirely in Python:
def test_dynamic_attr_dealloc_frees_dict_contents():
    freed = [False]

    class Flag:
        def __del__(self):
            freed[0] = True

    obj = m.DynamicClass()
    obj.flag = Flag()
    del obj
    pytest.gc_collect()
    assert freed[0]

This would remove the need for s_dynamic_attr_capsule_freed, make_dynamic_attr_with_capsule, and is_dynamic_attr_capsule_freed on the C++ side entirely.
It should work because obj.flag = Flag() stores into the managed dict — exactly the mechanism being tested. The __del__ on CPython is called synchronously when the refcount hits 0. For PyPy/GraalPy (where __del__ timing is non-deterministic), the test is already xfail/skip.
Could you please try this out?
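The timing claim in this suggestion, that on CPython __del__ runs synchronously when the refcount reaches zero, can be checked in isolation with the sketch below. Holder is a hypothetical plain-Python stand-in for m.DynamicClass, so this only illustrates the finalizer timing on CPython; it does not exercise pybind11_object_dealloc and is not a regression test for the leak.

```python
import gc


class Flag:
    """Records its own finalization in a caller-supplied list."""

    def __init__(self, log):
        self._log = log

    def __del__(self):
        self._log.append("freed")


class Holder:
    """Hypothetical plain-Python stand-in for m.DynamicClass."""


def finalizer_runs_on_del():
    """Return True if Flag.__del__ ran as soon as its holder was deleted."""
    log = []
    holder = Holder()
    holder.flag = Flag(log)  # stored in the instance __dict__

    was_enabled = gc.isenabled()
    gc.disable()  # rule out the cyclic collector; only refcounting remains
    try:
        del holder  # refcount reaches zero, so the holder is deallocated
        return log == ["freed"]
    finally:
        if was_enabled:
            gc.enable()
```

On CPython the function is expected to return True with the cyclic collector disabled, because no reference cycle is involved.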
Nope, this won't work: this test always passes, even if I comment out the fix. Here is Claude's summary:
Here's what happens with the commented-out fix on Python 3.14:
After del obj → pybind11_object_dealloc:
- PyObject_GC_UnTrack(self) — removes the pybind11 object from GC tracking
- PyObject_ClearManagedDict — skipped (fix commented out)
- tp_free(self) — frees raw memory; the managed dict's ob_refcnt is still 1 (now a dangling reference from freed memory)
Now pytest.gc_collect() runs the GC. The GC initializes gc_refs = ob_refcnt for all tracked objects and traverses them to subtract internal references:
| Object | gc_refs start | effect of traversal | gc_refs after |
|---|---|---|---|
| orphaned dict | 1 | nobody decrements it (pybind11 obj is untracked) | 1 → "reachable from outside" |
| Flag instance | 1 | dict.tp_traverse decrements it | 0 → "unreachable candidate" |
| py::capsule | n/a | not GC-tracked, invisible | never enters GC set |
After this phase:
- Flag has gc_refs=0 → GC considers it a finalization candidate → __del__ is called → freed[0] = True → test passes ✓
- py::capsule is not GC-tracked → never gets gc_refs=0 → GC never sees it → destructor never runs → test fails ✗
The dict stays perpetually "reachable" (gc_refs=1 from the dangling reference in freed memory), so it leaks permanently along with the capsule. The Flag gets rescued as a side effect of the GC's traversal decrementing its gc_refs to 0, which masks the exact bug we're testing for.
The pure Python test is therefore not a valid regression test for this bug. We need to revert to the capsule approach.
Distinguish between when the API is available (3.13+, where PyObject_ClearManagedDict was introduced) and when the leak actually manifests (3.14+, where tp_free stopped implicitly clearing the managed dict).
Closes #5998
Problem
On Python 3.14, objects stored in the __dict__ of py::dynamic_attr() instances are permanently leaked: their refcounts are abandoned when the pybind11 object is freed, so destructors (including capsule destructors) never run.
Root cause:
pybind11_object_dealloc() calls type->tp_free(self) without first calling PyObject_ClearManagedDict(self). On Python ≤ 3.13, PyObject_GC_Del (what tp_free resolves to for GC-tracked objects) cleared the managed dict as a side effect. On Python 3.14 this implicit clearing was removed, requiring an explicit call in tp_dealloc.
The existing pybind11_clear() (which correctly calls PyObject_ClearManagedDict) is only invoked by the cyclic GC via tp_clear; it is never called during normal reference-count-driven deallocation.
Fix
// pybind11/detail/class.h — pybind11_object_dealloc()
PyObject_ClearManagedDict is idempotent, so calling it before tp_free is safe on all Python 3.13+ versions.
Test
Added test_dynamic_attr_dealloc_frees_dict_contents to test_methods_and_attributes:
- Creates a DynamicClass (py::dynamic_attr()) instance
- Stores a py::capsule in its __dict__; the capsule destructor sets a global flag
- Runs del instance + gc_collect(), then asserts that the capsule destructor was called
Impact
Any py::dynamic_attr() class that stores Python objects in instance __dict__ from C++ (via obj.attr("x") = value) is affected. A prominent real-world case is https://gitlab.com/tango-controls/pytango, where Tango::DeviceAttribute uses py::dynamic_attr() to store zero-copy capsule-backed numpy arrays; these arrays leaked completely under Python 3.14 (https://gitlab.com/tango-controls/pytango/-/issues/744).
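The shape of the capsule-based check described under Test can be approximated in plain Python with ctypes, as a rough sketch only. It assumes CPython, and it uses a hypothetical Holder class in place of the pybind11 DynamicClass, so it shows the pattern (capsule in an instance __dict__, destructor flag checked after deletion) without reproducing the pybind11 leak. The names Holder, _on_capsule_freed, and capsule_destructor_runs are illustrative and are not part of the PR.

```python
import ctypes

# Capsule destructor signature: void (*)(PyObject *). The argument is taken
# as a raw address because the capsule's refcount is already zero by then.
CapsuleDestructor = ctypes.CFUNCTYPE(None, ctypes.c_void_p)

_new_capsule = ctypes.pythonapi.PyCapsule_New
_new_capsule.restype = ctypes.py_object
_new_capsule.argtypes = [ctypes.c_void_p, ctypes.c_char_p, CapsuleDestructor]

freed = []


@CapsuleDestructor
def _on_capsule_freed(_capsule_address):
    # Plays the role of the flag that the C++ capsule destructor sets.
    freed.append(True)


class Holder:
    """Hypothetical plain-Python stand-in for the pybind11 DynamicClass."""


def capsule_destructor_runs():
    """Store a capsule in an instance __dict__, delete the instance,
    and report whether the capsule destructor was invoked."""
    freed.clear()
    holder = Holder()
    # PyCapsule_New rejects NULL pointers, so a dummy non-NULL value is used.
    holder.payload = _new_capsule(1, None, _on_capsule_freed)
    assert not freed  # the capsule is still alive while the holder exists
    del holder  # the last reference to the capsule lives in holder.__dict__
    return bool(freed)
```

With an ordinary Python class the destructor is expected to run as soon as the holder is deleted; the PR's C++ test asserts the same outcome for a py::dynamic_attr() instance.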