
sycl: add JIT output caching for SPIR-V and Level Zero native binaries#1943

Draft
pvelesko wants to merge 1 commit into CEED:main from CHIP-SPV:sycl-jit-cache

Conversation


@pvelesko pvelesko commented Mar 18, 2026

Purpose:

Add two-tier JIT output caching for SYCL backends to avoid re-compiling GPU kernels on every run.

Without caching, each process invocation re-runs OpenCL C → SPIR-V compilation (online_compiler) and SPIR-V → native GPU code JIT (zeModuleCreate), adding ~2s per run on Intel GPUs. The SYCL_CACHE_PERSISTENT environment variable does not cache zeModuleCreate calls made via the raw Level Zero API.

Changes

  • SPIR-V cache ($SYCL_CACHE_DIR/ceed_spirv/<hash>.spv, falling back to $HOME/.cache/ceed_spirv/): caches the output of online_compiler keyed on hash(source + flags), skipping the OpenCL C → SPIR-V step on a cache hit
  • Level Zero native binary cache (ceed_lz/<hash>.native): caches output of zeModuleCreate(IL_SPIRV) via zeModuleGetNativeBinary, reloads with ZE_MODULE_FORMAT_NATIVE on cache hit, skipping the GPU JIT
  • CeedBuildBundleCached_Sycl: wraps sycl::build() for tensor basis kernels that use specialization constants (used by ceed-sycl-ref-basis). Uses sycl::build directly since native binary caching via raw zeModuleCreate + make_kernel_bundle does not preserve SYCL kernel IDs for bundles with specialization constants.
  • Test t366-basis: exercises the JIT cache path by creating a basis, applying it, destroying the context, then repeating — the second run hits the cache

Benchmarks (Intel Arc A770, ex1-volume 3D, 5M DOFs)

| Backend | Without PR (every run) | With PR (cold) | With PR (warm) | Speedup |
| --- | --- | --- | --- | --- |
| sycl/ref | 3.12s | 3.16s | 2.91s | 1.07x |
| sycl/shared | 3.03s | 3.03s | 1.23s | 2.46x |
| sycl/gen | 3.27s | 3.29s | 1.05s | 3.11x |

The gen backend benefits most (3.1x faster) because it JIT-compiles the most code. Cold-cache runs are identical to baseline (cache miss → normal compile + write). The ref backend sees less improvement because its tensor basis kernels use sycl::build with specialization constants, which cannot be cached via native binary reloading (kernel IDs are lost); only the non-tensor SPIR-V modules benefit from caching.

Cache invalidation

Cache directory defaults to $SYCL_CACHE_DIR/ceed_lz/ or $HOME/.cache/ceed_lz/. Invalidation is manual (delete the directory). Cache write failures are silently ignored so the code works on read-only filesystems.
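The directory policy and the "write failures are silently ignored" behavior could look like the sketch below. `CacheDir` and `TryCacheWrite` are hypothetical helper names, not the functions in this PR; the sketch only mirrors the stated policy ($SYCL_CACHE_DIR first, then $HOME/.cache, and best-effort writes).

```cpp
#include <cstdlib>
#include <filesystem>
#include <fstream>
#include <string>
#include <vector>

// Resolve the native-binary cache directory per the policy above:
// $SYCL_CACHE_DIR/ceed_lz/ if SYCL_CACHE_DIR is set, else $HOME/.cache/ceed_lz/.
static std::filesystem::path CacheDir() {
  if (const char *sycl = std::getenv("SYCL_CACHE_DIR")) return std::filesystem::path(sycl) / "ceed_lz";
  if (const char *home = std::getenv("HOME")) return std::filesystem::path(home) / ".cache" / "ceed_lz";
  return {};  // no usable location; caller treats every lookup as a miss
}

// Best-effort write: any failure (read-only filesystem, missing directory,
// permission error) is swallowed so the JIT path still works without a cache.
static bool TryCacheWrite(const std::filesystem::path &file, const std::vector<char> &bytes) {
  std::error_code ec;
  std::filesystem::create_directories(file.parent_path(), ec);
  if (ec) return false;
  std::ofstream out(file, std::ios::binary);
  if (!out) return false;
  out.write(bytes.data(), static_cast<std::streamsize>(bytes.size()));
  return static_cast<bool>(out);
}
```

Returning `false` instead of raising an error is what makes the read-only-filesystem case a no-op rather than a failure.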

LLM/GenAI Disclosure:

Claude Code was used to diagnose the "kernel bundle does not contain the kernel" bug in CeedBuildBundleCached_Sycl and to write the t366-basis test.

By submitting this PR, the author certifies to its contents as described by the Developer's Certificate of Origin.
Please follow the Contributing Guidelines for all PRs.


CeedJitCompileSource_Sycl now caches its SPIR-V output keyed on
hash(source + flags) under $SYCL_CACHE_DIR/ceed_spirv/. On cache hit
the online_compiler step is skipped entirely.

CeedLoadModule_Sycl now saves the Level Zero native binary produced by
zeModuleCreate(IL_SPIRV) via zeModuleGetNativeBinary and reloads it with
ZE_MODULE_FORMAT_NATIVE on subsequent runs, skipping the ~2.5s GPU JIT.
Cache location: $SYCL_CACHE_DIR/ceed_lz/.

Also add CeedBuildBundleCached_Sycl for kernel bundles built via
sycl::build() (used by the sycl-ref tensor-basis). Caches the native
binary keyed on kernel names + specialization constants (dim, num_comp,
Q_1d, P_1d). ceed-sycl-ref-basis switches to CeedBuildBundleCached_Sycl.
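A minimal sketch of how such a bundle key could be formed. `BasisSpec` and `SpecKey` are hypothetical names; the field names mirror the specialization constants listed above (dim, num_comp, Q_1d, P_1d), and in practice the resulting string would be hashed to form the cache filename.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical key for the sycl::build() bundle cache: kernel names plus the
// specialization-constant values. Changing any constant (e.g. a different
// Q_1d) yields a different key, hence a different cache entry.
struct BasisSpec {
  int dim, num_comp, Q_1d, P_1d;
};

static std::string SpecKey(const std::vector<std::string> &kernel_names, const BasisSpec &s) {
  std::ostringstream key;
  for (const auto &name : kernel_names) key << name << '|';
  key << s.dim << '|' << s.num_comp << '|' << s.Q_1d << '|' << s.P_1d;
  return key.str();  // hash this string to get the cache filename
}
```

Keying on the constants (rather than on the post-build native binary alone) is what keeps distinct basis configurations from colliding in the cache.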

Both caches default to $HOME/.cache/ceed_lz/ and ceed_spirv/ when
SYCL_CACHE_DIR is not set. Cache write failures are non-fatal.

Benchmark (Intel Arc A770, ex1-volume/ex2-surface, 200K DOF, p=3,
warm cache):
  gen backend:    SYCL/HIP = 1.24-1.31x (was 7x; now within 30% goal)
  shared backend: SYCL/HIP = 1.71-1.78x (was 7x)
  ref backend:    SYCL/HIP = 2.25-2.53x (was 7x)