
Strip HTTP response bodies after all modules finish processing #3002

Merged
liquidsec merged 6 commits into 3.0 from memory-optimize-refcount-httpresponse
Apr 2, 2026



@liquidsec
Contributor

@liquidsec liquidsec commented Mar 31, 2026

Summary

BBOT is essentially one big memory leak. This is unavoidable: BBOT is truly recursive, and the whole scan is a giant tree that grows as the scan runs. That tree includes HTTP_RESPONSE events, which carry the full response body, and those bodies can be large. Before this fix, every one of them stayed in memory for the entire scan. This change alleviates that by removing the response body from each HTTP_RESPONSE event once every module that wants to use it has done so.

How it works

  • _module_consumers counter on BaseEvent: incremented when an event is queued to a module, and decremented (via _release()) when that module finishes processing it
  • _minimize(): strips body and raw_header from the event's data dict once the counter reaches zero. The event object itself stays alive, so parent chains, tags, and metadata are all preserved
  • All decrement paths are covered: handle_event, handle_batch, events rejected by postcheck, FINISHED events, and a fallback in ScanEgress.forward_event() for events that no module accepts
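The mechanism above can be illustrated with a minimal, self-contained sketch. SketchEvent, _queued_to_module(), and forward_event() are hypothetical stand-ins, not BBOT's actual BaseEvent or ScanEgress code; only the names _module_consumers, _release(), _minimize(), body, and raw_header come from the description. It assumes the event's data is a plain dict.

```python
from typing import Optional


class SketchEvent:
    """Hypothetical stand-in for BBOT's BaseEvent, reduced to the refcount logic."""

    def __init__(self, data: dict, parent: Optional["SketchEvent"] = None):
        self.data = data             # e.g. url, body, raw_header
        self.parent = parent         # parent chain survives minimization
        self._module_consumers = 0   # modules that still have to process this event

    def _queued_to_module(self) -> None:
        # hypothetical hook: called once per module the event is queued to
        self._module_consumers += 1

    def _release(self) -> None:
        # called when a module finishes with the event; once no module
        # still needs it, strip the heavy fields
        self._module_consumers -= 1
        if self._module_consumers <= 0:
            self._minimize()

    def _minimize(self) -> None:
        # drop only the large fields; the event object itself stays alive
        self.data.pop("body", None)
        self.data.pop("raw_header", None)


def forward_event(event: SketchEvent, accepting_modules: list) -> None:
    # hypothetical analogue of the ScanEgress.forward_event() fallback:
    # if no module accepts the event, nothing would ever call _release()
    if not accepting_modules:
        event._minimize()
        return
    for _module in accepting_modules:
        event._queued_to_module()


# usage: an event queued to two (placeholder) modules
event = SketchEvent({"url": "https://example.com/", "body": "<html>...</html>", "raw_header": "HTTP/1.1 200 OK"})
forward_event(event, ["module_a", "module_b"])
event._release()
print("body" in event.data)  # True: one module is still pending
event._release()
print("body" in event.data)  # False: body stripped, url kept
```

Minimizing in place, rather than dropping the event, is what keeps the scan tree (parents, tags, metadata) intact while freeing the bulk of the memory.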

@github-actions
Contributor

github-actions bot commented Mar 31, 2026

📊 Performance Benchmark Report

Comparing 3.0 (baseline) vs memory-optimize-refcount-httpresponse (current)

📈 Detailed Results (All Benchmarks)

📋 Complete results for all benchmarks - includes both significant and insignificant changes

| 🧪 Test Name | 📏 Base | 📏 Current | 📈 Change | 🎯 Status |
|---|---|---|---|---|
| Bloom Filter Dns Mutation Tracking Performance | 4.14ms | 5.07ms | +22.4% | 🔴🔴🔴 ⚠️ |
| Bloom Filter Large Scale Dns Brute Force | 17.02ms | 17.01ms | -0.1% | |
| Large Closest Match Lookup | 353.83ms | 350.76ms | -0.9% | |
| Realistic Closest Match Workload | 187.02ms | 185.98ms | -0.6% | |
| Event Memory Medium Scan | 1776 B/event | 1784 B/event | +0.5% | |
| Event Memory Large Scan | 1759 B/event | 1768 B/event | +0.5% | |
| Event Validation Full Scan Startup Small Batch | 397.25ms | 401.65ms | +1.1% | |
| Event Validation Full Scan Startup Large Batch | 568.66ms | 576.17ms | +1.3% | |
| Make Event Autodetection Small | 30.73ms | 30.76ms | +0.1% | |
| Make Event Autodetection Large | 315.33ms | 316.80ms | +0.5% | |
| Make Event Explicit Types | 13.80ms | 13.74ms | -0.5% | |
| Excavate Single Thread Small | 3.947s | 3.962s | +0.4% | |
| Excavate Single Thread Large | 9.582s | 9.559s | -0.2% | |
| Excavate Parallel Tasks Small | 4.075s | 4.061s | -0.3% | |
| Excavate Parallel Tasks Large | 7.182s | 7.162s | -0.3% | |
| Is Ip Performance | 3.19ms | 3.17ms | -0.6% | |
| Make Ip Type Performance | 11.41ms | 11.41ms | +0.1% | |
| Mixed Ip Operations | 4.50ms | 4.50ms | -0.0% | |
| Memory Use Web Crawl | 259.7 MB | 43.6 MB | -83.2% | 🟢🟢🟢 🚀 |
| Memory Use Subdomain Enum | 19.3 MB | 19.3 MB | +0.2% | |
| Scan Throughput 100 | 7.807s | 8.045s | +3.1% | |
| Scan Throughput 1000 | 39.868s | 40.817s | +2.4% | |
| Typical Queue Shuffle | 65.61µs | 63.78µs | -2.8% | |
| Priority Queue Shuffle | 735.02µs | 732.40µs | -0.4% | |

🎯 Performance Summary

+ 1 improvement 🚀
! 1 regression ⚠️
  22 unchanged ✅

🔍 Significant Changes (>10%)

  • Bloom Filter Dns Mutation Tracking Performance: 22.4% 🐌 slower
  • Memory Use Web Crawl: 83.2% 🚀 less memory

🐍 Python Version 3.11.15

@codecov

codecov bot commented Mar 31, 2026

Codecov Report

❌ Patch coverage is 92.30769% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 91%. Comparing base (6b359ac) to head (db8d16c).
⚠️ Report is 12 commits behind head on 3.0.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| bbot/modules/base.py | 90% | 3 Missing ⚠️ |
Additional details and impacted files
@@          Coverage Diff           @@
##             3.0   #3002    +/-   ##
======================================
+ Coverage     91%     91%    +1%     
======================================
  Files        440     440            
  Lines      37230   37330   +100     
======================================
+ Hits       33711   33809    +98     
- Misses      3519    3521     +2     


@liquidsec liquidsec force-pushed the memory-optimize-refcount-httpresponse branch 3 times, most recently from 3c3862a to db6b2b0 March 31, 2026 19:47
@liquidsec liquidsec force-pushed the additional-memory-benchmarks branch from aa7e2bc to 590e979 March 31, 2026 19:48
pytest's own allocations (~200 MB) contaminate tracemalloc peak
measurements when scans run in-process, masking real differences
between branches. Run each benchmark scan as a subprocess instead
so measurements reflect only the scan's own memory use.

Also rename tests to test_memory_use_* for clarity.
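The idea in this commit, measuring peak memory in a fresh interpreter so the test runner's own allocations are excluded, can be sketched as follows. This is not BBOT's benchmark harness: measure_peak_in_subprocess() and the CHILD workload (a list of byte strings standing in for a scan) are hypothetical placeholders. Only tracemalloc.start(), tracemalloc.get_traced_memory() (which returns a (current, peak) tuple), and subprocess.run() are real standard-library calls.

```python
import subprocess
import sys

# Code run in the child interpreter. The workload is a placeholder for a scan.
CHILD = r"""
import tracemalloc
tracemalloc.start()
data = [bytes(1024) for _ in range(10_000)]  # roughly 10 MB of allocations
current, peak = tracemalloc.get_traced_memory()
print(peak)
"""


def measure_peak_in_subprocess(code: str) -> int:
    """Run code in a fresh interpreter and return the peak it reports on stdout."""
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        check=True,
    )
    # the child prints the peak as its last line
    return int(result.stdout.strip().splitlines()[-1])


peak = measure_peak_in_subprocess(CHILD)
print(f"peak traced memory: {peak / 1e6:.1f} MB")
```

Because tracemalloc only counts allocations made after start() in the child process, nothing the parent (for example pytest) has allocated can inflate the result.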
@liquidsec liquidsec force-pushed the memory-optimize-refcount-httpresponse branch from db6b2b0 to 38db27e March 31, 2026 19:49
Base automatically changed from additional-memory-benchmarks to 3.0 April 1, 2026 16:22
@liquidsec
Contributor Author

@TheTechromancer I looked into what would happen if a module errored. If a module actually hits an unhandled exception and errors out that way, then yes, we'd miss some decrements. But all that failure does is revert to the old behavior, where the bodies hang around until GC. So I don't think it's a serious concern.
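The failure mode described in this comment can be reproduced with a small sketch. SketchEvent, run_module(), and broken_handler() are hypothetical stand-ins, not BBOT code; the sketch only assumes that the decrement runs after the handler returns normally. When the handler raises, the counter never reaches zero and the body is simply kept, which matches the pre-PR behavior rather than causing a crash or data loss.

```python
class SketchEvent:
    """Hypothetical stand-in for an HTTP_RESPONSE event with a consumer counter."""

    def __init__(self, body: str):
        self.data = {"body": body}
        self._module_consumers = 0

    def _release(self) -> None:
        self._module_consumers -= 1
        if self._module_consumers == 0:
            self.data.pop("body", None)


def run_module(event: SketchEvent, handler) -> None:
    # hypothetical dispatcher: the release only runs if the handler returns normally
    handler(event)
    event._release()


def broken_handler(event: SketchEvent) -> None:
    raise RuntimeError("unhandled module error")


event = SketchEvent("<html>...</html>")
event._module_consumers = 1  # queued to exactly one module
try:
    run_module(event, broken_handler)
except RuntimeError:
    pass

# the decrement was missed, so the body is retained (the old behavior)
print("body" in event.data)  # True
```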

@liquidsec
Contributor Author

@TheTechromancer I simplified some of the changes on this branch.

@liquidsec liquidsec merged commit edefefc into 3.0 Apr 2, 2026
18 checks passed

Labels

None yet

Projects

None yet


2 participants