Conversation
226bcf3 to
338501d
Compare
There was a problem hiding this comment.
Pull request overview
This PR adds first-class, end-to-end ESM instrumentation to Jazzer.js (coverage + compare hooks + function hooks + sourcemaps) using Node’s native module.register() loader hooks, aligns CLI/docs with the resulting behavior, and adds integration tests for pure-ESM and mixed CJS+ESM targets.
Changes:
- Added a Node loader-hook based ESM instrumentation pipeline (coverage counters, sourcemaps, and function-hook propagation via
MessagePortwhen supported). - Introduced per-module coverage counter regions for ESM and corresponding native addon support.
- Added new ESM-focused integration/unit tests and updated docs/readmes for current flags, behavior, and links.
Reviewed changes
Copilot reviewed 42 out of 43 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/esm_instrumentation/target.mjs | Pure-ESM target used to validate compare-hook feedback. |
| tests/esm_instrumentation/package.json | Adds ESM instrumentation integration test package. |
| tests/esm_instrumentation/fuzz.mjs | ESM fuzz entrypoint that exercises the ESM target. |
| tests/esm_instrumentation/esm_instrumentation.test.js | Jest-based integration test for pure ESM instrumentation. |
| tests/esm_cjs_mixed/package.json | Adds mixed CJS+ESM integration test package. |
| tests/esm_cjs_mixed/fuzz.mjs | Mixed-module fuzz target to exercise both instrumentation paths. |
| tests/esm_cjs_mixed/esm_cjs_mixed.test.js | Integration test for combined CJS require-hook + ESM loader-hook behavior. |
| tests/esm_cjs_mixed/esm-check.mjs | ESM half of the mixed secret check. |
| tests/esm_cjs_mixed/cjs-check.cjs | CJS half of the mixed secret check. |
| packages/jest-runner/readme.md | Documentation fixes/updates and link correction. |
| packages/instrumentor/plugins/esmCodeCoverage.ts | New ESM-specific coverage plugin emitting per-module counter writes. |
| packages/instrumentor/plugins/esmCodeCoverage.test.ts | Unit tests for ESM coverage plugin behavior and interaction with compare hooks. |
| packages/instrumentor/plugins/coverageVisitor.ts | Shared coverage visitor extracted for reuse across CJS/ESM coverage modes. |
| packages/instrumentor/plugins/codeCoverage.ts | Refactors CJS coverage plugin to use shared visitor. |
| packages/instrumentor/instrument.ts | Registers module.register() loader hook and wires hook serialization to loader thread. |
| packages/instrumentor/esmSourceMaps.test.ts | Unit tests for ESM preamble sourcemap shifting + registry registration. |
| packages/instrumentor/esmFunctionHooks.test.ts | Unit tests for function-hook serialization and hook ID consistency. |
| packages/instrumentor/esm-loader.mts | New ESM loader implementation: transforms modules + injects preamble + drains hook updates. |
| packages/instrumentor/README.md | Updates instrumentor documentation to describe ESM loader-hook path. |
| packages/fuzzer/shared/coverage.h | Adds native API declaration for module counter registration. |
| packages/fuzzer/shared/coverage.cpp | Implements RegisterModuleCounters for per-module coverage regions. |
| packages/fuzzer/shared/callbacks.cpp | Exposes registerModuleCounters from the native addon. |
| packages/fuzzer/coverage.ts | Adds createModuleCounters to allocate/register per-module counter buffers. |
| packages/fuzzer/addon.ts | Extends native addon type surface for registerModuleCounters. |
| packages/fuzzer/README.md | Updates documentation to reflect sync vs async fuzzing modes. |
| packages/core/package.json | Pins yargs v17 and updates tmp/@types versions for Node compatibility. |
| packages/core/finding.ts | Fixes stale/inherited stack headers and improves non-Finding error printing. |
| packages/core/finding.test.ts | Adds coverage for stale/inherited stack header handling and name/message fallback. |
| packages/core/core.ts | Sends finalized hooks to the ESM loader thread before importing user modules. |
| packages/core/README.md | Minor doc/link corrections. |
| packages/bug-detectors/README.md | Updates links and formatting in bug-detectors docs. |
| package.json | Adjusts root engines field (currently to an invalid semver range). |
| package-lock.json | Lockfile updates, including engines field changes. |
| examples/jest_typescript_integration/README.md | Updates Jest TS integration example config and import path. |
| eslint.config.mjs | Ignores AGENTS.md for markdown linting. |
| docs/release.md | Updates GitHub release link. |
| docs/jest-integration.md | Updates Jest integration docs (versions, formatting, and examples). |
| docs/fuzz-targets.md | Major update: documents direct ESM support, requirements, and CLI flag naming. |
| docs/fuzz-settings.md | Updates env var name and fixes example snippets. |
| docs/bug-detectors.md | Fixes CLI flag naming/spacing and formatting. |
| docs/architecture.md | Updates architecture documentation to include ESM loader-hook path. |
| README.md | Documents ESM support and fixes docs link casing. |
| .npmignore | Excludes .mts sources from publish artifacts while keeping .d.mts. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
AGENTS.md uses multiple top-level headings and unlabeled code fences by design. Exclude it from markdownlint rather than restructuring the document.
afebfdd to
c4a5b94
Compare
kyakdan
left a comment
There was a problem hiding this comment.
Great Improvement!
I only have some small remarks/questions.
| // cumulative allocation is negligible. | ||
| auto *pcEntries = new PCTableEntry[size]; | ||
| for (std::size_t i = 0; i < size; ++i) { | ||
| pcEntries[i] = {gNextModulePC++, 0}; |
There was a problem hiding this comment.
Why is gNextModulePC needed? With Flags=0 the PC value is deadcode
There was a problem hiding this comment.
Right, the values are only used by -print_pcs and __sanitizer_cov_get_observed_pcs . They have to be unique within the module only, so setting pcEntries[i] = {i, 0}; should be enough here.
There was a problem hiding this comment.
@kyakdan Or should we leave this in as forward-compatibility with symbolizer?
| pcEntries[i] = {gNextModulePC++, 0}; | ||
| } | ||
|
|
||
| __sanitizer_cov_8bit_counters_init(buf.Data(), buf.Data() + size); |
There was a problem hiding this comment.
This function has similar code parts to RegisterNewCounters and they can be extracted into a shared helper.
There was a problem hiding this comment.
They do different things.
The CJS function uses a pre-allocated array passed earlier to C++ side with RegisterCoverageMap and gets called with different offsets.
The ES function gets buffers owned by each module individually.
The common part would be something like this, but that would obfuscate what's actually happening:
void registerCounterRegion(uint8_t *start, uint8_t *end,
uintptr_t *pcs_start, uintptr_t *pcs_end) {
__sanitizer_cov_8bit_counters_init(start, end);
__sanitizer_cov_pcs_init(pcs_start, pcs_end);
}IMO the current version is good as is, unless you have a nice way to unify!
yargs v18 is ESM-only and requires require()-of-ESM support, which only shipped unflagged in Node 20.19 / 22.12. Since the jazzer CLI entry point must remain loadable via require() on Node >= 20.6, pin yargs to the v17 line.
toThrowError -> toThrow, toBeCalledTimes -> toHaveBeenCalledTimes
Move the 'dist' pattern from modulePathIgnorePatterns (which blocks module resolution) to testPathIgnorePatterns (which only affects test discovery). Disable ts-jest diagnostics since it does not support composite project references — tsc -b is the source of truth for type checking.
When a before-hook throws inside an async function body, the async function returns a rejected Promise AND the finding is stored synchronously. Previously, both paths would resolve the C++ promise and deferred (via the synchronous throw's catch block and the .then() rejection handler's microtask), which is undefined behavior that can hang forked child processes. Check clearFirstFinding() before attaching .then() handlers. If a synchronous finding exists, suppress the rejected Promise and handle the finding exclusively through the synchronous path.
Some libraries (e.g. pdf.js) use prototype-based error constructors where .stack is inherited from a prototype Error, showing a stale header like "Error" instead of the actual .name and .message. Detect this mismatch and replace the first line of .stack with the correct name: message before printing. Only applies to non-Finding errors, since cleanErrorStack already rewrites Finding headers.
Each ESM module will need its own coverage counter buffer, independent of the shared global coverage map used by CJS modules. libFuzzer supports multiple disjoint counter regions, so we add a C++ function that registers per-module buffers and a TypeScript API to allocate them. The GC-prevention array in CoverageTracker keeps module buffers alive for the process lifetime, matching libFuzzer's expectation that counter memory remains valid.
Add an ESM loader that intercepts import() and static imports on Node >= 20.6, applying Babel coverage and compare-hook transforms. Each module gets its own counter buffer via a preamble that runs on the main thread. The shared coverage visitor is extracted from codeCoverage.ts so both the CJS path (global IDs via incrementCounter) and the new ESM path (module-local IDs via direct array writes) reuse the same branch/loop/ternary instrumentation logic. The ESM counter uses % 255 + 1 for NeverZero instead of || 1 to avoid infinite Babel visitor recursion on the generated expression. Istanbul source coverage is included from the start so that ESM modules appear in --coverage reports alongside CJS ones. On older Node versions the loader is silently skipped.
The jest-runner instruments code through Jest's own transformer pipeline and monkey-patches built-ins at runtime — it never uses the ESM loader hooks. Extracted registerEsmLoaderHooks from registerInstrumentor, called only from startFuzzing (the CLI entry point). hookRequire stays in registerInstrumentor because custom hooks loaded via import() in initFuzzing still need coverage instrumentation through require.extensions.
The ESM loader now generates separate source map objects (instead of inline ones) and registers them with the main-thread SourceMapRegistry via a preamble call. This lets source-map-support remap ESM stack traces to original source positions. The preamble line-shift is handled by prepending VLQ semicolons to the mappings string — each semicolon represents an unmapped generated line, pushing all real mappings down by the correct offset.
Establish a MessagePort channel between the main thread and the ESM loader thread. After bug detectors register their hooks and finalizeHooks() completes, sendHooksToLoader() serializes the hook metadata and posts it to the loader. The loader uses receiveMessageOnPort() — a synchronous, non-blocking read — to drain the message queue at the start of each module load. Received hooks are registered as stubs (with no-op functions) in the loader's own hookManager, which the functionHooks Babel plugin reads from. At runtime, the instrumented code calls HookManager.callHook() on the main-thread global, where the real hook functions live. Explicit hook IDs in the serialization format guard against index mismatches between threads. The MessagePort requires Node >= 20.11 (transferList support in module.register); older 20.x builds degrade to ESM instrumentation without function hooks.
Replace crypto.randomInt(512) in fakePC() with a deterministic xorshift32 PRNG seeded from the libFuzzer -seed= value. This makes TORC slot assignments identical across runs, so a given seed produces the exact same mutation schedule every time. If no seed is provided, one is generated, injected into the libFuzzer args, and printed — giving full reproducibility from a single seed. The seed flows from the CLI through the Instrumentor to both the CJS transform path (main thread) and the ESM loader (via module.register initialization data).
Replace the outdated "pure ESM projects will not be instrumented" warning with real documentation: how the loader hook works, the minimal setup (export function fuzz + type: module), Node >= 20.6 requirement, and the function hooks restriction (>= 20.11). Add a comparison table between direct ESM (npx jazzer) and Jest-based ESM (Babel transform approach) so users can pick the right path for their project.
…riptions - fuzz-targets.md: --fuzzFunction -> --fuzzEntryPoint, corpus is positional not a --corpus flag, normalize coverage reporter order - fuzz-settings.md: JAZZER_COVERAGE env -> JAZZER_COVERAGE_REPORTERS, remove incorrect -runs=-1 equivalence claim, fix malformed xmlDictionary and timeout snippets - jest-integration.md: close unclosed code blocks, update stale version pin from 2.1.0 to ^3.0.0 - bug-detectors.md: normalize --disable_bug_detectors to --disableBugDetectors, fix missing spaces before 'in CLI mode' - examples/jest_typescript_integration: runner -> testRunner, fix stale import path, fix trailing commas in JSON, fix title typo - architecture.md: mention ESM loader hook alongside CJS hookRequire - Package READMEs: update jazzer.js-commercial links to jazzer.js, document async fuzzing mode in fuzzer README, add ESM loader path to instrumentor README, fix typos
Users on Node < 20.6 get no ESM instrumentation; users on 20.6–20.10 get coverage but no function hooks in ESM. Both cases previously failed silently, making it hard to understand why coverage wasn't increasing. Now a one-time warning is printed to stderr. Also documents why gNextModulePC uses unique values despite PCFlags=0 making the PC field inert in libFuzzer's core feedback loop.
Direct CLI fuzzing (
npx jazzer) could load ESM targets, but pure ESM code was not fully instrumented end-to-end the same way as CJS/Jest paths. This PR closes that gap and documents the actual behavior/limits.What changed
module.register()loader hooks.MessagePort.yargsto v17 so the CJS CLI entrypoint stays compatible on supported Node versions.Runtime constraints
transferListsupport inmodule.register).