sycl: add JIT output caching for SPIR-V and Level Zero native binaries #1943
Draft
…ary)

CeedJitCompileSource_Sycl now caches its SPIR-V output keyed on hash(source + flags) under $SYCL_CACHE_DIR/ceed_spirv/. On cache hit the online_compiler step is skipped entirely.

CeedLoadModule_Sycl now saves the Level Zero native binary produced by zeModuleCreate(IL_SPIRV) via zeModuleGetNativeBinary and reloads it with ZE_MODULE_FORMAT_NATIVE on subsequent runs, skipping the ~2.5s GPU JIT. Cache location: $SYCL_CACHE_DIR/ceed_lz/.

Also add CeedBuildBundleCached_Sycl for kernel bundles built via sycl::build() (used by the sycl-ref tensor-basis). Caches the native binary keyed on kernel names + specialization constants (dim, num_comp, Q_1d, P_1d). ceed-sycl-ref-basis switches to CeedBuildBundleCached_Sycl.

Both caches default to $HOME/.cache/ceed_lz/ and ceed_spirv/ when SYCL_CACHE_DIR is not set. Cache write failures are non-fatal.

Benchmark (Intel Arc A770, ex1-volume/ex2-surface, 200K DOF, p=3, warm cache):
  gen backend:    SYCL/HIP = 1.24-1.31x (was 7x; now within 30% goal)
  shared backend: SYCL/HIP = 1.71-1.78x (was 7x)
  ref backend:    SYCL/HIP = 2.25-2.53x (was 7x)
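The keying scheme described in the commit message, hash(source + flags) mapped to a file under the cache directory, can be sketched roughly as follows. This is an illustration only, not the code in this PR: the commit message does not say which hash function is used, so 64-bit FNV-1a is an assumption here (picked because its output is stable across runs, which std::hash does not guarantee), and the helper names Fnv1a64, CacheKey, CacheDir, and SpirvCachePath are hypothetical. The fallback to the current directory when HOME is unset is also an assumption.

```cpp
#include <cstdint>
#include <cstdlib>
#include <filesystem>
#include <iomanip>
#include <sstream>
#include <string>

// 64-bit FNV-1a (assumed hash; the PR does not name one).
// The optional seed lets several strings be folded into one running hash.
inline std::uint64_t Fnv1a64(const std::string &data, std::uint64_t hash = 14695981039346656037ULL) {
  for (unsigned char c : data) {
    hash ^= c;
    hash *= 1099511628211ULL;
  }
  return hash;
}

// Hypothetical helper: key = hash(source + flags), rendered as 16 hex digits.
// A NUL separator keeps ("ab", "c") and ("a", "bc") from sharing a key.
inline std::string CacheKey(const std::string &source, const std::string &flags) {
  std::uint64_t h = Fnv1a64(source);
  h = Fnv1a64(std::string(1, '\0'), h);
  h = Fnv1a64(flags, h);
  std::ostringstream out;
  out << std::hex << std::setw(16) << std::setfill('0') << h;
  return out.str();
}

// Hypothetical helper: $SYCL_CACHE_DIR/<subdir> when the variable is set and
// non-empty, otherwise $HOME/.cache/<subdir> ("." stands in for a missing HOME).
inline std::filesystem::path CacheDir(const std::string &subdir) {
  if (const char *dir = std::getenv("SYCL_CACHE_DIR"); dir && *dir) {
    return std::filesystem::path(dir) / subdir;
  }
  const char *home = std::getenv("HOME");
  return std::filesystem::path(home ? home : ".") / ".cache" / subdir;
}

// Hypothetical helper: full path of the cached SPIR-V file for one kernel source.
inline std::filesystem::path SpirvCachePath(const std::string &source, const std::string &flags) {
  return CacheDir("ceed_spirv") / (CacheKey(source, flags) + ".spv");
}
```

Any change to the kernel source or to the compile flags yields a different file name, so stale entries are never reused; they just accumulate until the directory is deleted. A key built this way does not cover the compiler or driver version, so a real implementation may want to fold those in as well.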
Purpose:
Add two-tier JIT output caching for SYCL backends to avoid re-compiling GPU kernels on every run.
Without caching, each process invocation re-runs OpenCL C → SPIR-V compilation (online_compiler) and SPIR-V → native GPU code JIT (zeModuleCreate), adding ~2s per run on Intel GPUs. The SYCL_CACHE_PERSISTENT environment variable does not cache zeModuleCreate calls made via the raw Level Zero API.
Changes
- SPIR-V cache ($SYCL_CACHE_DIR/ceed_spirv/<hash>.spv or $HOME/.cache/ceed_lz/): caches output of online_compiler keyed on hash(source + flags), skipping the OpenCL C → SPIR-V step on cache hit
- Level Zero native binary cache (ceed_lz/<hash>.native): caches output of zeModuleCreate(IL_SPIRV) via zeModuleGetNativeBinary, reloads with ZE_MODULE_FORMAT_NATIVE on cache hit, skipping the GPU JIT
- CeedBuildBundleCached_Sycl: wraps sycl::build() for tensor basis kernels that use specialization constants (used by ceed-sycl-ref-basis). Uses sycl::build directly since native binary caching via raw zeModuleCreate + make_kernel_bundle does not preserve SYCL kernel IDs for bundles with specialization constants.
- t366-basis: exercises the JIT cache path by creating a basis, applying it, destroying the context, then repeating; the second run hits the cache
Benchmarks (Intel Arc A770, ex1-volume 3D, 5M DOFs)
The gen backend benefits most (3.1x faster) because it JIT-compiles the most code. Cold-cache runs are identical to baseline (cache miss → normal compile + write). The ref backend sees less improvement because its tensor basis kernels use sycl::build with specialization constants, which cannot be cached via native binary reloading (kernel IDs are lost); only the non-tensor SPIR-V modules benefit from caching.
Cache invalidation
Cache directory defaults to $SYCL_CACHE_DIR/ceed_lz/ or $HOME/.cache/ceed_lz/. Invalidation is manual (delete the directory). Cache write failures are silently ignored so the code works on read-only filesystems.
LLM/GenAI Disclosure:
Claude Code was used to diagnose the "kernel bundle does not contain the kernel" bug in CeedBuildBundleCached_Sycl and to write the t366-basis test.
By submitting this PR, the author certifies to its contents as described by the Developer's Certificate of Origin.
Please follow the Contributing Guidelines for all PRs.
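For reviewers, the cache-miss behavior described above (normal compile, then a write whose failure is ignored) can be illustrated with a small stand-alone sketch. It is not the code in this PR: Blob, TryReadCache, TryWriteCache, and LoadOrCompile are hypothetical names, the compile step is a caller-supplied callback rather than online_compiler or zeModuleCreate, and the write-to-temp-then-rename step is an added assumption intended to keep a concurrent reader from seeing a half-written file.

```cpp
#include <filesystem>
#include <fstream>
#include <functional>
#include <iterator>
#include <optional>
#include <string>
#include <system_error>
#include <vector>

using Blob = std::vector<char>;  // raw bytes of a SPIR-V or native binary

// Returns the cached bytes, or std::nullopt if the file is missing, unreadable,
// or empty. Every failure is treated as a plain cache miss.
inline std::optional<Blob> TryReadCache(const std::filesystem::path &file) {
  std::ifstream in(file, std::ios::binary);
  if (!in) return std::nullopt;
  Blob data((std::istreambuf_iterator<char>(in)), std::istreambuf_iterator<char>());
  if (data.empty()) return std::nullopt;
  return data;
}

// Best-effort write: every error is swallowed, so a read-only or missing cache
// directory only costs a recompile on the next run.
inline void TryWriteCache(const std::filesystem::path &file, const Blob &data) {
  std::error_code ec;
  std::filesystem::create_directories(file.parent_path(), ec);
  if (ec) return;
  std::filesystem::path tmp = file;
  tmp += ".tmp";
  bool ok = false;
  {
    std::ofstream out(tmp, std::ios::binary | std::ios::trunc);
    ok = out && out.write(data.data(), static_cast<std::streamsize>(data.size())) && out.flush();
  }
  if (ok) std::filesystem::rename(tmp, file, ec);
  if (!ok || ec) std::filesystem::remove(tmp, ec);
}

// Cache hit: return the stored bytes and skip compile().
// Cache miss: run compile(), try to store the result, and return it either way.
inline Blob LoadOrCompile(const std::filesystem::path &file, const std::function<Blob()> &compile) {
  if (auto hit = TryReadCache(file)) return *hit;
  Blob fresh = compile();
  TryWriteCache(file, fresh);
  return fresh;
}
```

Calling LoadOrCompile twice with the same path mirrors what t366-basis exercises: the first call runs the compile callback and populates the cache, and the second returns the stored bytes without invoking the callback.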