
sycl: add JIT output caching for SPIR-V and Level Zero native binaries#1943

Draft
pvelesko wants to merge 1 commit into CEED:main from CHIP-SPV:sycl-jit-cache

Conversation


@pvelesko pvelesko commented Mar 18, 2026

Purpose:

Add two-tier JIT output caching for SYCL backends to avoid re-compiling GPU kernels on every run.

Without caching, each process invocation re-runs OpenCL C → SPIR-V compilation (online_compiler) and SPIR-V → native GPU code JIT (zeModuleCreate), adding ~2s per run on Intel GPUs. The SYCL_CACHE_PERSISTENT environment variable does not cache zeModuleCreate calls made via the raw Level Zero API.

Changes

  • SPIR-V cache ($SYCL_CACHE_DIR/ceed_spirv/<hash>.spv, falling back to $HOME/.cache/ceed_spirv/): caches the output of online_compiler keyed on hash(source + flags), skipping the OpenCL C → SPIR-V step on a cache hit
  • Level Zero native binary cache (ceed_lz/<hash>.native): caches output of zeModuleCreate(IL_SPIRV) via zeModuleGetNativeBinary, reloads with ZE_MODULE_FORMAT_NATIVE on cache hit, skipping the GPU JIT
  • CeedBuildBundleCached_Sycl: wraps sycl::build() for tensor basis kernels that use specialization constants (used by ceed-sycl-ref-basis). Uses sycl::build directly since native binary caching via raw zeModuleCreate + make_kernel_bundle does not preserve SYCL kernel IDs for bundles with specialization constants.
  • Test t366-basis: exercises the JIT cache path by creating a basis, applying it, destroying the context, then repeating — the second run hits the cache

Benchmarks (Intel Arc A770, ex1-volume 3D, 5M DOFs)

| Backend | Without PR (every run) | With PR (cold) | With PR (warm) | Speedup |
| --- | --- | --- | --- | --- |
| sycl/ref | 3.12s | 3.16s | 2.91s | 1.07x |
| sycl/shared | 3.03s | 3.03s | 1.23s | 2.46x |
| sycl/gen | 3.27s | 3.29s | 1.05s | 3.11x |

The gen backend benefits most (3.1x faster) because it JIT-compiles the most code. Cold-cache runs are identical to baseline (cache miss → normal compile + write). The ref backend sees less improvement because its tensor basis kernels use sycl::build with specialization constants, which cannot be cached via native binary reloading (kernel IDs are lost); only the non-tensor SPIR-V modules benefit from caching.

Cache invalidation

Cache directory defaults to $SYCL_CACHE_DIR/ceed_lz/ or $HOME/.cache/ceed_lz/. Invalidation is manual (delete the directory). Cache write failures are silently ignored so the code works on read-only filesystems.
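The directory policy and the "write failures are silently ignored" behavior could look like the sketch below. `CacheDir` and `TryCacheWrite` are hypothetical helper names, not the functions in this PR; the sketch only mirrors the stated policy ($SYCL_CACHE_DIR first, then $HOME/.cache, and best-effort writes).

```cpp
#include <cstdlib>
#include <filesystem>
#include <fstream>
#include <string>
#include <vector>

// Resolve the native-binary cache directory per the policy above:
// $SYCL_CACHE_DIR/ceed_lz/ if SYCL_CACHE_DIR is set, else $HOME/.cache/ceed_lz/.
static std::filesystem::path CacheDir() {
  if (const char *sycl = std::getenv("SYCL_CACHE_DIR")) return std::filesystem::path(sycl) / "ceed_lz";
  if (const char *home = std::getenv("HOME")) return std::filesystem::path(home) / ".cache" / "ceed_lz";
  return {};  // no usable location; caller treats every lookup as a miss
}

// Best-effort write: any failure (read-only filesystem, missing directory,
// permission error) is swallowed so the JIT path still works without a cache.
static bool TryCacheWrite(const std::filesystem::path &file, const std::vector<char> &bytes) {
  std::error_code ec;
  std::filesystem::create_directories(file.parent_path(), ec);
  if (ec) return false;
  std::ofstream out(file, std::ios::binary);
  if (!out) return false;
  out.write(bytes.data(), static_cast<std::streamsize>(bytes.size()));
  return static_cast<bool>(out);
}
```

Returning `false` instead of raising an error is what makes the read-only-filesystem case a no-op rather than a failure.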

LLM/GenAI Disclosure:

Claude Code was used to diagnose the "kernel bundle does not contain the kernel" bug in CeedBuildBundleCached_Sycl and to write the t366-basis test.

By submitting this PR, the author certifies to its contents as described by the Developer's Certificate of Origin.
Please follow the Contributing Guidelines for all PRs.


CeedJitCompileSource_Sycl now caches its SPIR-V output keyed on
hash(source + flags) under $SYCL_CACHE_DIR/ceed_spirv/. On cache hit
the online_compiler step is skipped entirely.

CeedLoadModule_Sycl now saves the Level Zero native binary produced by
zeModuleCreate(IL_SPIRV) via zeModuleGetNativeBinary and reloads it with
ZE_MODULE_FORMAT_NATIVE on subsequent runs, skipping the ~2.5s GPU JIT.
Cache location: $SYCL_CACHE_DIR/ceed_lz/.

Also add CeedBuildBundleCached_Sycl for kernel bundles built via
sycl::build() (used by the sycl-ref tensor-basis). Caches the native
binary keyed on kernel names + specialization constants (dim, num_comp,
Q_1d, P_1d). ceed-sycl-ref-basis switches to CeedBuildBundleCached_Sycl.
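A minimal sketch of how such a bundle key could be formed. `BasisSpec` and `SpecKey` are hypothetical names; the field names mirror the specialization constants listed above (dim, num_comp, Q_1d, P_1d), and in practice the resulting string would be hashed to form the cache filename.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical key for the sycl::build() bundle cache: kernel names plus the
// specialization-constant values. Changing any constant (e.g. a different
// Q_1d) yields a different key, hence a different cache entry.
struct BasisSpec {
  int dim, num_comp, Q_1d, P_1d;
};

static std::string SpecKey(const std::vector<std::string> &kernel_names, const BasisSpec &s) {
  std::ostringstream key;
  for (const auto &name : kernel_names) key << name << '|';
  key << s.dim << '|' << s.num_comp << '|' << s.Q_1d << '|' << s.P_1d;
  return key.str();  // hash this string to get the cache filename
}
```

Keying on the constants (rather than on the post-build native binary alone) is what keeps distinct basis configurations from colliding in the cache.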

Both caches default to $HOME/.cache/ceed_lz/ and ceed_spirv/ when
SYCL_CACHE_DIR is not set. Cache write failures are non-fatal.

Benchmark (Intel Arc A770, ex1-volume/ex2-surface, 200K DOF, p=3,
warm cache):
  gen backend:    SYCL/HIP = 1.24-1.31x (was 7x; now within 30% goal)
  shared backend: SYCL/HIP = 1.71-1.78x (was 7x)
  ref backend:    SYCL/HIP = 2.25-2.53x (was 7x)