Add VAE-based dimensionality reduction to cluster_seqlets#45

Open
DadaAb wants to merge 1 commit into main from dim_reduction_with_vae

Conversation

@DadaAb (Collaborator) commented Apr 17, 2026

Summary

Adds a non-linear dimensionality reduction option (reduction="vae") to cluster_seqlets as an alternative to PCA. Everything downstream (neighbour graph, t-SNE, Leiden, DBD annotation) is unchanged.

Motivation

PCA is linear and may not capture the full structure of seqlet contribution score matrices, particularly when patterns lie on a non-linear manifold. A β-VAE learns a compact latent representation that can better separate motif families before Leiden clustering.

Changes

src/tfmindi/tl/vae.py (new file)

  • Implements fit_vae_latents(X, ...) — trains a β-VAE on the seqlet similarity matrix and returns the posterior means as a (n_seqlets, latent_dim) float32 array
  • MLP encoder/decoder with ReLU + Dropout; reparameterisation trick for sampling
  • KL summed over the latent dimensions, then averaged over the batch (the standard β-VAE reduction)
  • logvar clamped to [-4, 15] to prevent overflow early in training
  • Both the numpy and torch RNGs are seeded (np.random.seed and torch.manual_seed) for full reproducibility
  • AMP + GradScaler enabled automatically on CUDA, silently disabled on CPU
  • Lazy torch import — no hard dependency unless reduction="vae" is actually used
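
To make the loss conventions concrete, here is a NumPy sketch of the KL term, logvar clamp, and reparameterisation described above (illustrative only; the real code in src/tfmindi/tl/vae.py operates on torch tensors):

```python
import numpy as np

def kl_per_sample(mu, logvar):
    """KL(q || N(0, I)) summed over the latent dims, one value per sample."""
    logvar = np.clip(logvar, -4.0, 15.0)  # clamp so np.exp cannot overflow
    return -0.5 * np.sum(1.0 + logvar - mu**2 - np.exp(logvar), axis=1)

def reparameterize(mu, logvar, rng):
    """z = mu + sigma * eps; in the real model this keeps sampling differentiable."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * np.clip(logvar, -4.0, 15.0)) * eps

rng = np.random.default_rng(42)
mu = rng.normal(size=(8, 10)).astype(np.float32)
logvar = rng.normal(size=(8, 10)).astype(np.float32)
kl = kl_per_sample(mu, logvar)     # shape (8,): summed over latent_dim
loss_kl_term = 0.01 * np.mean(kl)  # beta * batch-mean KL, matching the PR's reduction
```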

src/tfmindi/tl/cluster.py

  • Added reduction: str = "pca" and vae_kwargs: dict | None = None parameters (keyword-only, fully backwards compatible)
  • VAE branch stores embedding in adata.obsm["X_vae"] and passes it as use_rep to neighbours and t-SNE, consistent with how X_pca is handled
  • recompute=False skips VAE training if X_vae already present, same caching behaviour as PCA
  • Device selection respects the existing _using_gpu flag: device="cuda" when the GPU backend is active, "auto" otherwise

pyproject.toml

  • Added [vae] optional dependency group: torch>=2.0, installable via pip install tfmindi[vae]

Tested

Run on a combined human + mouse dataset of 1,149,067 seqlets × 17,995 features on CPU:

Training VAE on cpu (latent_dim=10, epochs=15, beta=0.01)...
  epoch   1/15  loss=0.9186  recon=0.9040  kl=1.4568
  epoch   5/15  loss=0.4835  recon=0.3945  kl=8.9024
  epoch  10/15  loss=0.4511  recon=0.3585  kl=9.2634
  epoch  15/15  loss=0.4377  recon=0.3428  kl=9.4936
VAE embedding complete. Shape: (1149067, 10)
Computing neighborhood graph (use_rep='X_vae')...  [1min 58s]
Computing t-SNE embedding (use_rep='X_vae')...     [1h 12min]

For reference, PCA on the same dataset took 1h 9min. The VAE was run for only 15 epochs here as a proof of concept; 50 is the recommended default. GPU execution is implemented and should work, but has not been tested.

Usage

# Default behaviour unchanged
tm.tl.cluster_seqlets(adata, resolution=3.0)

# VAE alternative
tm.tl.cluster_seqlets(adata, resolution=3.0, reduction="vae")

# With custom settings
tm.tl.cluster_seqlets(
    adata,
    resolution=3.0,
    reduction="vae",
    vae_kwargs=dict(latent_dim=10, epochs=50, beta=0.1),
)

# Re-cluster at a different resolution — VAE training is skipped (X_vae cached)
tm.tl.cluster_seqlets(adata, resolution=5.0, reduction="vae")

Known limitations / future work

  • GPU path is implemented but untested — feedback welcome
  • VAE hyperparameters (latent_dim, beta, epochs) currently require manual tuning; automated hyperparameter search could be added in a follow-up
  • No quantitative comparison of cluster quality between PCA and VAE is included yet
