Strip HTTP response bodies after all modules finish processing #3002
📊 Performance Benchmark Report
🐍 Python Version: 3.11.15
🎯 Performance Summary:
+ 1 improvement 🚀
! 1 regression ⚠️
  22 unchanged ✅
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@           Coverage Diff           @@
##             3.0   #3002    +/-   ##
======================================
+ Coverage     91%     91%    +1%
======================================
  Files        440     440
  Lines      37230   37330   +100
======================================
+ Hits       33711   33809    +98
- Misses      3519    3521     +2
3c3862a to db6b2b0
aa7e2bc to 590e979
pytest's own allocations (~200 MB) contaminate tracemalloc peak measurements when scans run in-process, masking real differences between branches. Run each benchmark scan as a subprocess instead so measurements reflect only the scan's own memory use. Also rename tests to test_memory_use_* for clarity.
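A minimal sketch of the subprocess approach described in this commit message, not the benchmark code from this branch. The child script, the `measure_peak_bytes` helper, and the placeholder workload are all hypothetical; the real tests run a scan in the child process instead.

```python
import subprocess
import sys

# Hypothetical child script: it starts tracemalloc in a fresh interpreter,
# runs a placeholder workload (standing in for a scan), and prints the peak.
CHILD = """
import tracemalloc
tracemalloc.start()
payload = [bytes(1024) for _ in range(2000)]  # placeholder workload, ~2 MB
_, peak = tracemalloc.get_traced_memory()
print(peak)
"""


def measure_peak_bytes(code=CHILD):
    """Run `code` in a separate Python process and return its tracemalloc peak.

    Because the child is a fresh interpreter, allocations made by the parent
    (for example by the test runner) are not part of the measurement.
    """
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        check=True,
    )
    # The child prints the peak as the last line of stdout.
    return int(result.stdout.strip().splitlines()[-1])


peak = measure_peak_bytes()
print(f"peak traced memory in child: {peak} bytes")
```

The parent process only parses a number from the child's stdout, so whatever the test runner has allocated never shows up in the reported peak.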
db6b2b0 to 38db27e
@TheTechromancer I looked into what would happen if a module errored. If a module actually hits an unhandled exception and errors out that way, then yes, some bodies would not be stripped. But all that failure does is revert to the old behavior, where the bodies hang around until GC. So I don't think it's a serious concern.
@TheTechromancer simplified some of the changes on this branch
Summary
BBOT is essentially one big memory leak. This is unavoidable because it is truly recursive: the whole scan is a giant tree that grows throughout the scan. That tree includes HTTP_RESPONSE events, each of which carries the full response body, which can be very large. Prior to this fix, every one of those bodies sat in memory for the entire scan. This change alleviates that by removing the response body from an HTTP_RESPONSE event once every module that wants to use it has finished with it.
How it works
- `_module_consumers` counter on `BaseEvent`: incremented when an event is queued to a module, decremented when the module finishes processing (via `_release()`)
- `_minimize()`: strips `body` and `raw_header` from the event's data dict when the counter reaches zero. The event object itself stays alive (parent chains, tags, metadata all preserved)
- Release points: `handle_event`, `handle_batch`, events rejected by postcheck, `FINISHED` events, and a fallback in `ScanEgress.forward_event()` for events no module accepts
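The counting scheme above can be sketched as follows. This is a toy illustration under stated assumptions, not the code in this PR: the `Event` class is a stand-in for `BaseEvent`, `_acquire()` is a hypothetical name for the increment step, and the sample data is invented. Only `_module_consumers`, `_release()`, `_minimize()`, `body`, and `raw_header` come from the description.

```python
class Event:
    """Toy stand-in for an HTTP_RESPONSE event (not BBOT's real BaseEvent)."""

    def __init__(self, data):
        self.data = dict(data)
        # Number of modules that have been handed this event and not yet finished.
        self._module_consumers = 0

    def _acquire(self):
        # Hypothetical name: called when the event is queued to a module.
        self._module_consumers += 1

    def _release(self):
        # Called when a module finishes with the event.
        if self._module_consumers > 0:
            self._module_consumers -= 1
        if self._module_consumers == 0:
            self._minimize()

    def _minimize(self):
        # Drop only the large fields; the event and its other data stay alive.
        self.data.pop("body", None)
        self.data.pop("raw_header", None)


# Invented sample data for illustration.
event = Event(
    {
        "url": "http://example.com/",
        "status_code": 200,
        "raw_header": "HTTP/1.1 200 OK\r\n\r\n",
        "body": "x" * 1000,
    }
)

# Two modules receive the event.
event._acquire()
event._acquire()

# The first module finishes: one consumer remains, so the body is kept.
event._release()
still_has_body = "body" in event.data

# The second module finishes: the counter reaches zero and the body is stripped.
event._release()
```

In this sketch, a module that exits without calling `_release()` simply leaves the counter above zero, so the body stays in memory until the event is garbage collected. That matches the failure mode discussed in the comments.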