
Add pluggable backend dispatch system #1151

Draft
Intron7 wants to merge 4 commits into main from add-backend-to-squidpy

@Intron7 (Member) commented Apr 7, 2026

Summary

  • Add squidpy._backends package with a @dispatch decorator that routes function calls to registered backends (e.g. GPU) via Python entrypoints, falling back to the CPU implementation
    when a backend doesn't implement a function.
  • Apply @dispatch to spatial_autocorr, co_occurrence, and ligrec — backends like rapids-singlecell register via squidpy.backends
    entrypoints and are discovered at import time.
  • Add squidpy.settings.backend and squidpy.settings.use_backend() context manager for global/scoped backend selection.
  • Add squidpy.testing.backend_conformance module for backend packages to validate correctness against CPU reference results in their own CI.

How it works

import squidpy as sq

# Global
sq.settings.backend = "gpu"
sq.gr.spatial_autocorr(adata, mode="moran")

# Per-call
sq.gr.spatial_autocorr(adata, mode="moran", backend="cuda")

# Scoped
with sq.settings.use_backend("gpu"):
    sq.gr.co_occurrence(adata, cluster_key="cell_type")

The @dispatch decorator uses signature introspection to route kwargs:
- Shared params (e.g. adata, mode, copy): forwarded to the backend
- CPU-only params (e.g. n_jobs, show_progress_bar): silently dropped, or dropped with a warning if a non-default value was passed
- Backend-only params (e.g. use_sparse, multi_gpu): forwarded to the backend and automatically injected into the host function's signature and docstring

Decorated functions do not need to declare a backend parameter manually; @dispatch injects it. Backend-specific params and their docstrings are merged from the discovered backends into the
host function's signature at import time, so help() and IDE tooltips show the full parameter list.
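The three-way routing above can be sketched with inspect.signature. This is an illustration under assumptions, not the PR's _dispatch.py: route_kwargs and the two example signatures are hypothetical, only keyword arguments are handled, and signature/docstring injection is not shown. Names that neither function accepts raise TypeError.

```python
import inspect
import warnings
from collections.abc import Callable
from typing import Any


def route_kwargs(
    cpu_func: Callable[..., Any], backend_func: Callable[..., Any], kwargs: dict[str, Any]
) -> dict[str, Any]:
    """Return the subset of kwargs to forward to backend_func (hypothetical helper)."""
    cpu_params = inspect.signature(cpu_func).parameters
    backend_params = inspect.signature(backend_func).parameters
    forwarded: dict[str, Any] = {}
    for name, value in kwargs.items():
        if name in backend_params:
            # Shared or backend-only parameter: pass it through unchanged.
            forwarded[name] = value
        elif name in cpu_params:
            # CPU-only parameter: drop it, warning only if the caller changed the default.
            # Assumes simple scalar defaults that support ==.
            default = cpu_params[name].default
            if default is inspect.Parameter.empty or value != default:
                warnings.warn(
                    f"{name}={value!r} is ignored by the selected backend",
                    UserWarning,
                    stacklevel=2,
                )
        else:
            raise TypeError(f"unexpected keyword argument {name!r}")
    return forwarded


# Hypothetical signatures mirroring the parameter examples above.
def cpu_autocorr(adata, mode="moran", n_jobs=None, show_progress_bar=True): ...


def gpu_autocorr(adata, mode="moran", use_sparse=True, multi_gpu=False): ...
```

Here n_jobs=4 would be dropped with a warning, show_progress_bar=True dropped silently, and mode and use_sparse forwarded.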

Design decisions

- backend, not device: this dispatches to an implementation (CPU, rapids-singlecell, a future JAX backend), not to a hardware target. A JAX backend could run on CPU, GPU, or TPU.
- Trusted backends list: backends in TRUSTED_BACKENDS (currently rapids_singlecell, with aliases rsc, cuda, gpu, and rapids-singlecell) have passed the conformance test suite. Untrusted
backends still work but emit a one-time warning encouraging their authors to submit a PR to become trusted.
- Thread-safe: ContextVar-backed settings, safe for threaded and async use.
- Conformance suite lives in squidpy, runs in backend CI: squidpy.testing.backend_conformance.validate_backend() compares backend results against the CPU reference. It never runs in squidpy CI (no
GPU).
- Removed the joblib backend param from spatial_autocorr (now hardcoded to "loky") to avoid a naming collision with the dispatch backend.

Towards a shared scverse dispatch package

The _backends package contains no squidpy-specific logic; it is pure dispatch machinery (entrypoint discovery, settings, signature introspection, docstring merging). Long term, this can
be extracted into a standalone scverse-dispatch package that any scverse host library can use:

# scanpy
@dispatch
def pca(adata, n_comps=50, ...): ...

# squidpy
@dispatch
def spatial_autocorr(adata, mode="moran", ...): ...

Backend packages (e.g. rapids-singlecell) would register one entrypoint per host library. The API surface (@dispatch, settings.backend, use_backend()) and the conformance test pattern
would stay the same. This PR is the proving ground: once the pattern is validated here, spinning it out is straightforward.

New files

┌────────────────────────────────────────────┬───────────────────────────────────────────────────────────────┐
│                    File                    │                            Purpose                            │
├────────────────────────────────────────────┼───────────────────────────────────────────────────────────────┤
│ src/squidpy/_backends/__init__.py          │ Public API: dispatch, settings, get_backend                   │
├────────────────────────────────────────────┼───────────────────────────────────────────────────────────────┤
│ src/squidpy/_backends/_settings.py         │ settings.backend property + use_backend() context manager     │
├────────────────────────────────────────────┼───────────────────────────────────────────────────────────────┤
│ src/squidpy/_backends/_registry.py         │ Lazy entrypoint discovery, alias resolution, trusted backends │
├────────────────────────────────────────────┼───────────────────────────────────────────────────────────────┤
│ src/squidpy/_backends/_dispatch.py         │ @dispatch decorator, signature/docstring merging              │
├────────────────────────────────────────────┼───────────────────────────────────────────────────────────────┤
│ src/squidpy/testing/backend_conformance.py │ Conformance test harness for backends                         │
├────────────────────────────────────────────┼───────────────────────────────────────────────────────────────┤
│ tests/test_backends.py                     │ 17 tests for settings, dispatch, aliasing, fallback, warnings │
└────────────────────────────────────────────┴───────────────────────────────────────────────────────────────┘

Test plan

- pytest tests/test_backends.py: all 17 dispatch/settings tests pass (CPU-only, uses FakeBackend)
- Verify help(sq.gr.spatial_autocorr) shows merged params when a backend is installed
- Verify sq.settings.backend = "nonexistent" raises with helpful error
- Verify sq.settings.backend = "cuda" without rapids-singlecell gives install hint
- Verify untrusted backends work with a warning, not blocked
- Backend conformance tests run in https://github.com/scverse/rapids_singlecell, not here
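The conformance idea (run the backend and the CPU reference on identical inputs, then compare) fits in a few lines. This is a hedged sketch: assert_matches_cpu and its default tolerances are hypothetical and are not the API of squidpy.testing.backend_conformance.validate_backend(); real results would be arrays or AnnData fields rather than plain lists.

```python
import math
from collections.abc import Callable, Sequence
from typing import Any


def assert_matches_cpu(
    cpu_func: Callable[..., Sequence[float]],
    backend_func: Callable[..., Sequence[float]],
    *args: Any,
    rel_tol: float = 1e-5,
    abs_tol: float = 1e-8,
    **kwargs: Any,
) -> None:
    """Run both implementations on identical inputs and compare elementwise."""
    expected = list(cpu_func(*args, **kwargs))
    actual = list(backend_func(*args, **kwargs))
    if len(actual) != len(expected):
        raise AssertionError(f"length mismatch: {len(actual)} != {len(expected)}")
    for i, (got, ref) in enumerate(zip(actual, expected)):
        # Tolerances absorb small floating-point differences between implementations.
        if not math.isclose(got, ref, rel_tol=rel_tol, abs_tol=abs_tol):
            raise AssertionError(f"index {i}: backend {got} != CPU {ref}")
```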

@Intron7 requested review from flying-sheep, selmanozleyen, and timtreis, and removed the request for flying-sheep and selmanozleyen (April 7, 2026 10:02)
@codecov (bot) commented Apr 7, 2026

Codecov Report

❌ Patch coverage is 71.64557% with 112 lines in your changes missing coverage. Please review.
✅ Project coverage is 73.91%. Comparing base (76ca03f) to head (0aef28b).

Files with missing lines                    Patch %  Lines
src/squidpy/testing/backend_conformance.py    0.00%  50 Missing ⚠️
src/squidpy/_backends/_dispatch.py           81.05%  28 Missing and 15 partials ⚠️
src/squidpy/_backends/_registry.py           77.94%  13 Missing and 2 partials ⚠️
src/squidpy/_backends/_settings.py           90.90%  2 Missing and 2 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1151      +/-   ##
==========================================
- Coverage   74.05%   73.91%   -0.15%     
==========================================
  Files          39       43       +4     
  Lines        6495     6889     +394     
  Branches     1122     1196      +74     
==========================================
+ Hits         4810     5092     +282     
- Misses       1230     1323      +93     
- Partials      455      474      +19     
Files with missing lines                    Coverage Δ
src/squidpy/_docs.py                        94.20% <ø> (ø)
src/squidpy/gr/_ligrec.py                   77.74% <100.00%> (+0.14%) ⬆️
src/squidpy/gr/_ppatterns.py                79.14% <100.00%> (+0.26%) ⬆️
src/squidpy/_backends/_settings.py          90.90% <90.90%> (ø)
src/squidpy/_backends/_registry.py          77.94% <77.94%> (ø)
src/squidpy/_backends/_dispatch.py          81.05% <81.05%> (ø)
src/squidpy/testing/backend_conformance.py   0.00% <0.00%> (ø)

# Trusted but not installed
if canonical in TRUSTED_BACKENDS and get_backend(canonical) is None:
    package = TRUSTED_BACKENDS[canonical]["package"]
    raise ValueError(

Member

Should be ImportError
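A sketch of the suggested change, assuming the TRUSTED_BACKENDS layout from the diff context. require_installed, its installed argument (standing in for the get_backend(canonical) is None check), and the pip package name are illustrative assumptions. ImportError is how Python libraries usually report a missing optional dependency, so callers can handle it with except ImportError, whereas ValueError suggests the argument itself was invalid.

```python
# Hypothetical subset of the registry; only the "package" key appears in the diff.
TRUSTED_BACKENDS: dict[str, dict[str, str]] = {
    "rapids_singlecell": {"package": "rapids-singlecell"},
}


def require_installed(canonical: str, installed: set[str]) -> None:
    """Raise ImportError if a trusted backend is requested but not installed."""
    if canonical in TRUSTED_BACKENDS and canonical not in installed:
        package = TRUSTED_BACKENDS[canonical]["package"]
        raise ImportError(
            f"Backend {canonical!r} is not installed. Install it with: pip install {package}",
            name=canonical,  # module name, available to handlers as err.name
        )
```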

# conformance test suite (squidpy.testing.backend_conformance).
TRUSTED_BACKENDS: dict[str, dict[str, Any]] = {
    "rapids_singlecell": {
        "aliases": ["rapids-singlecell", "rsc", "cuda", "gpu"],

Member

Why have multiple? Maybe just have an error-message helper that recognizes them and says “backend 'gpu' does not exist, did you mean 'rapids-singlecell'?”
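The single-name alternative could look like the sketch below. BACKENDS, HINTS, and check_backend are hypothetical; the hint table reuses the aliases from the diff, and difflib.get_close_matches catches plain typos such as an underscore instead of a hyphen.

```python
import difflib

# Hypothetical canonical backend names; only these are accepted.
BACKENDS = {"cpu", "rapids-singlecell"}
# Former aliases become hints in the error message instead of valid names.
HINTS = {"gpu": "rapids-singlecell", "cuda": "rapids-singlecell", "rsc": "rapids-singlecell"}


def check_backend(name: str) -> str:
    """Return name if it is a known backend, else raise with a suggestion."""
    if name in BACKENDS:
        return name
    suggestion = HINTS.get(name)
    if suggestion is None:
        close = difflib.get_close_matches(name, sorted(BACKENDS), n=1)
        suggestion = close[0] if close else None
    msg = f"backend {name!r} does not exist"
    if suggestion is not None:
        msg += f", did you mean {suggestion!r}?"
    raise ValueError(msg)
```

With this, check_backend("gpu") raises exactly the message quoted above, and there is only one name to document.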

Member

didn’t @selmanozleyen already build something like this? this looks like it’s similar code, so if the other version is merged, this should be unified with that.

Member Author

@selmanozleyen's approach is similar but less extensible, and it is still a draft PR like this one. This approach would immediately open the door to other backends, with zero updates needed in squidpy.

    return wrapper


def _update_signatures() -> None:

Member

This private function is not used in this file, but it is exported, so it shouldn't be private. The _dispatch module is already private, so it is fine to remove the leading _ here.

def _get_param_sets(
    func: Callable,
    adapter_method: Callable,
    func_name: str,

Member

this is unused

Member

To be clearer: func_name is not used here.

@selmanozleyen (Member)

Compared to #1093, I agree that renaming it to backend rather than device is way better.

However, there is currently a blocker: the backend argument is already taken by the parallelize backend, which is one more motivation to get rid of it. As a quick fix we could rename it to parallelize_backend and emit warnings until we reach a good enough version, then reintroduce backend. That doesn't sound ideal, but it could be OK for sq 2.0.
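The quick fix described here is a standard keyword-rename shim. A minimal sketch under assumptions: renamed_kwarg and toy_parallel_func are hypothetical stand-ins, not squidpy's real signatures. During the deprecation period backend= still means the joblib backend, which is why the dispatch backend can only be reintroduced once the shim is removed.

```python
import functools
import warnings
from collections.abc import Callable
from typing import Any


def renamed_kwarg(old: str, new: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Accept `old` as a deprecated alias for keyword argument `new` (hypothetical)."""

    def decorator(func: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            if old in kwargs:
                if new in kwargs:
                    raise TypeError(f"pass {new!r} only, not both {old!r} and {new!r}")
                warnings.warn(
                    f"{old!r} is deprecated, use {new!r} instead",
                    FutureWarning,
                    stacklevel=2,  # point the warning at the caller
                )
                kwargs[new] = kwargs.pop(old)
            return func(*args, **kwargs)

        return wrapper

    return decorator


@renamed_kwarg("backend", "parallelize_backend")
def toy_parallel_func(data: Any, parallelize_backend: str = "loky") -> str:
    """Stand-in for a function that hands parallelize_backend to joblib."""
    return parallelize_backend
```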

Comment on lines +273 to +275
Called once automatically after backend discovery so that ``help()`` /
IDE tooltips show the full parameter list (CPU + GPU + backend) with
documentation.

Member

I am not sure about dynamically updating the signature and docstrings. I prefer the approach in #1093, which just links to the dispatched backend.

@flying-sheep @ilan-gold wdyt? For example, if I built the docs with the GPU backend installed, would they contain a mix of rsc and squidpy docs?

Member

Ideally I would want to expose everything in the public function signature, even if the call is dispatched to another function.

_update_signatures()


def _check_trusted(name: str) -> None:

Member

again, should be check_trusted

@Zethson (Member) commented Apr 8, 2026

Quoting @selmanozleyen: “I agree that renaming it to backend rather than device is way better.”

I think device has the advantage that the actual (CUDA) device could be specified -> like a specific GPU number. This would be weird to pass to backend, right?

But I think the right answer is to do what the rest of the pydata ecosystem does and just follow these patterns. I don't see any reason for us to stick out.

Generally, I hope that a generalized version of this could be implemented into https://github.com/scverse/scverse-misc so that it would eventually be reused across all of our packages trivially.

Edit: I just read on Zulip: “My plan is to long-term move this over into a dedicated scverse_backends package that every package can import.”

Yeah SGTM but maybe scverse-misc could also be the place.

@selmanozleyen (Member)

Quoting @Zethson: “I think device has the advantage that the actual (CUDA) device could be specified -> like a specific GPU number. This would be weird to pass to backend, right?”

But this way we leave it to the user to set the device context for whatever backend they choose. A JAX backend, for example, could run on either CPU or GPU.

effective = local_backend or settings.backend

if effective == "cpu":
    return func(*args, **kwargs)

Member

This has a subtle bug if we want to generalize: we bind *args purely by position and never check whether the order and the names match.

For example:

CPU: func(a, b)
GPU: func(b, a)

with both a and b integers, the call

func(5, 10, backend="gpu")

means the GPU implementation sees b=5, a=10. If the backend computes something like a / b, you silently get 10 / 5 instead of 5 / 10. No exception, just incorrect output.

Or imagine this: if you have

CPU: threshold, n_perms
GPU: n_perms, threshold

the call won't fail at the level it should. The function might complain internally that n_perms should be an integer, but the call shouldn't have been dispatchable to begin with.
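One way to close this hole is to bind the caller's arguments against the CPU signature and hand them to the backend by keyword, so names rather than positions decide where each value goes. A minimal sketch under assumptions: call_by_name, cpu_ratio, and gpu_ratio are hypothetical, the CPU signature is assumed to have no *args/**kwargs, and CPU-only parameters would still need to be filtered out first.

```python
import inspect
from collections.abc import Callable
from typing import Any


def call_by_name(
    cpu_func: Callable[..., Any], backend_func: Callable[..., Any], *args: Any, **kwargs: Any
) -> Any:
    """Bind args to the CPU signature's names, then call the backend by keyword."""
    # Raises TypeError if the call doesn't match the CPU signature.
    bound = inspect.signature(cpu_func).bind(*args, **kwargs)
    # Every value now carries its parameter name, so the backend's positional
    # order no longer matters; a name the backend lacks raises TypeError.
    return backend_func(**bound.arguments)


def cpu_ratio(a: float, b: float) -> float:
    return a / b


def gpu_ratio(b: float, a: float) -> float:  # same names, swapped order
    return a / b
```

Called positionally, gpu_ratio(5, 10) returns 2.0, while call_by_name(cpu_ratio, gpu_ratio, 5, 10) passes a=5, b=10 and returns 0.5, matching the CPU result.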

@selmanozleyen (Member)

@Zethson We should also document the contract here. There are lots of assumptions hidden in this PR if we want to generalize.

For example, dispatch is currently position-based for args (see my review above); if that isn't addressed before generalizing, it will introduce subtle bugs.

To expose these assumptions, we should ideally have a dispatchable(f_base, f_backend_impl) check that decides whether two functions can be dispatched, returning not just true/false but also a state (e.g. one that emits a warning at runtime). This doesn't have to be written in code, but formulating it this way is easier for me.

For two functions f_base(*base_args, **base_kwargs), f_backend_impl(*backend_args, **backend_kwargs) to be dispatchable:

  • can have base_args that aren't in backend_args, which we can call base_only_args. These are dropped:
    • silently if their defaults are equal
    • with a warning if their defaults don't match
  • the same two options apply to base_only_kwargs.
  • currently we can similarly have backend_only_args, but we should think about this more. The current behaviour is that we can't document these in the base implementation, so we silently update the signature of the base function to include them. This is a big red flag for me; I will write some suggestions to fix this below.
  • the intersection by arg name of base_args and backend_args should be in the same order (this isn't currently done, but I assume it will be fixed), i.e.:
shared = [name for name in base_args if name in backend_arg_names]
backend_shared = [name for name in backend_args if name in base_arg_names]
# we want this to hold
assert shared == backend_shared
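The contract terms above can be written down as a decoration-time check. This is a hedged sketch, not code from the PR: check_dispatchable is hypothetical, it enforces the shared-order rule plus the proposal in the following paragraph that backend-only parameters be keyword-only, and it returns the base-only names so a runtime wrapper can drop them silently or with a warning depending on the value passed.

```python
import inspect
from collections.abc import Callable
from typing import Any

_POSITIONAL = (inspect.Parameter.POSITIONAL_ONLY, inspect.Parameter.POSITIONAL_OR_KEYWORD)


def check_dispatchable(f_base: Callable[..., Any], f_backend_impl: Callable[..., Any]) -> list[str]:
    """Check the draft dispatch contract at decoration time (hypothetical).

    Raises TypeError on hard violations. Returns the base-only parameter
    names, which a runtime wrapper would drop (warning on non-default values).
    """
    base = inspect.signature(f_base).parameters
    impl = inspect.signature(f_backend_impl).parameters
    base_args = [n for n, p in base.items() if p.kind in _POSITIONAL]
    backend_args = [n for n, p in impl.items() if p.kind in _POSITIONAL]

    # Rule: names shared by both positional lists keep the same relative order.
    shared = [n for n in base_args if n in backend_args]
    backend_shared = [n for n in backend_args if n in base_args]
    if shared != backend_shared:
        raise TypeError(f"shared positional order differs: {shared} != {backend_shared}")

    # Proposed rule: backend-only parameters must be keyword-only.
    backend_only_args = [n for n in backend_args if n not in base]
    if backend_only_args:
        raise TypeError(f"backend-only positional parameters: {backend_only_args}")

    # base_only_args and base_only_kwargs: dropped when dispatching.
    return [n for n in base if n not in impl]
```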

These are the current terms of the contract. But instead of the update-the-signature trick, we could disallow backend_only_args entirely and support only backend_only_kwargs. The base signature would then gain a single backend_kwargs parameter, as in f_base(..., backend_kwargs). We can update the documentation of backend_kwargs dynamically to explain what these kwargs do, but we won't be changing our function signatures this way.


4 participants