Feature: dimer detector. by SeppeDeWinter · Pull Request #16 · aertslab/TF-MINDI

SeppeDeWinter · 2025-10-07T10:29:47Z

Add functionality to detect dimers (and potentially multimers).

Example usage

Load pre-processed anndata object.

import logomaker  # type: ignore
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd  # type: ignore
import scanpy as sc  # type: ignore
import seaborn as sns  # type: ignore
from tqdm import tqdm  # type: ignore

import tfmindi as tm  # type: ignore
from tfmindi.tl import distance_bias  # type: ignore

with open("paper/sampled_motifs.txt") as f:
    motif_names = [line.strip() for line in f.readlines()]


motif_collection = tm.load_motif_collection(tm.fetch_motif_collection(), motif_names=motif_names)

motif_annotations = tm.load_motif_annotations(tm.fetch_motif_annotations())

motif_to_db = tm.load_motif_to_dbd(motif_annotations)

adata = tm.io.load_h5ad(
    <PATH_TO_ANNDATA>
)

I know this dataset contains SOX dimers; however, few are detected.

dbd_to_color = {
    dbd: plt.cm.tab20(i)  # type: ignore
    for i, dbd in enumerate(adata.obs["cluster_dbd"].value_counts().index)
}

dbd_to_color[np.nan] = "black"

fig, ax = plt.subplots(figsize=(16, 16))
ax.scatter(
    adata.obsm["X_tsne"][:, 0],
    adata.obsm["X_tsne"][:, 1],
    c=[dbd_to_color[dbd] for dbd in adata.obs["cluster_dbd"]],
    s=5,
)
ax.set_axis_off()
for dbd, color in dbd_to_color.items():
    ax.scatter([], [], color=color, label=dbd)
ax.legend()
fig.tight_layout()
fig.savefig("test_dist_bias_plots/seqlets_tsne_cluster_dbd.png", dpi=300)

Basically the HMG/Sox cluster in the upper right corner contains dimers at this point.

Dimers (or multimers; in general I call this fixed_distance_bias) are detected from patterns.
Note that the Tomtom procedure for generating patterns can include parts of the sequence outside the actual called seqlet (i.e. the patterns often look nice but do not always entirely represent the seqlets as shown in the tSNE). With the k-mer approach only the seqlets themselves are considered, so this is what I will use here. We can consider changing the Tomtom approach to do this as well.

In this step it is important not to use a subset of seqlets: only the seqlets in the patterns are considered for detecting distance bias.

patterns = tm.tl.create_patterns(adata, method="kmer")

Next we detect the distance bias. This is done by aggregating the contribution scores across all seqlets within a pattern (these seqlets are already aligned to each other) and considering a window (20 bp in this example) around the seqlet. In this window we try to detect peaks corresponding to another TFBS that always occurs at a fixed distance relative to the pattern instances.

bias_detectors = {
    k: distance_bias.detect_fixed_distance_bias(adata=adata, pattern=pattern, window=20)
    for k, pattern in patterns.items()
}

for pattern in tqdm(bias_detectors):
    bias_detectors[pattern].detect_distance_bias()
    profile, pattern_location, peaks = bias_detectors[pattern].profile_plot_data
    fig, ax = plt.subplots()
    ax.plot(np.arange(profile.shape[0]), profile, color="black")
    for start_end in pattern_location:
        # plot location of the pattern
        ax.axvline(start_end, color="red")
    for peak_left, peak_right in peaks:
        # plot location of identified peaks
        ax.axvline(peak_left, color="orange")
        ax.axvline(peak_right, color="orange")
    fig.savefig(f"test_dist_bias_plots/profile_pattern_{pattern}.png", dpi=300)

This is an example of a pattern that has another TFBS near it. The location of the pattern instances is indicated by the red lines; the called peak(s) are indicated by the orange lines.

This is an example of a pattern that has no TFBS near it (at a fixed distance); in this case no peaks are called.
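As a rough illustration of the idea described above (not the actual tfmindi implementation), the sketch below averages per-position contribution scores across aligned seqlet windows and then reports contiguous runs outside the pattern whose averaged score reaches a height cutoff. The function names, the `height` cutoff semantics and the toy data are assumptions made purely for illustration; plain Python is used so it runs without dependencies.

```python
from statistics import fmean


def aggregate_profile(windows):
    """Average contribution per position across aligned, equal-length windows."""
    return [fmean(column) for column in zip(*windows)]


def call_peaks(profile, pattern_start, pattern_end, height):
    """Return inclusive (left, right) index pairs of runs >= `height` outside the pattern.

    `pattern_start`/`pattern_end` delimit the pattern itself (end exclusive).
    The pattern is skipped because it trivially has a high contribution.
    """
    peaks = []
    run_start = None
    for i, value in enumerate(profile):
        inside_pattern = pattern_start <= i < pattern_end
        if value >= height and not inside_pattern:
            if run_start is None:
                run_start = i  # a new run begins here
        elif run_start is not None:
            peaks.append((run_start, i - 1))  # the run ended at the previous position
            run_start = None
    if run_start is not None:
        peaks.append((run_start, len(profile) - 1))
    return peaks


# Toy data (made up): 3 aligned windows, pattern at positions 4..6,
# and a partner site at positions 0..1 in every window.
windows = [
    [0.6, 0.5, 0.0, 0.1, 0.9, 1.0, 0.8, 0.0, 0.1],
    [0.4, 0.6, 0.1, 0.0, 1.0, 0.9, 0.9, 0.1, 0.0],
    [0.5, 0.4, 0.0, 0.0, 0.8, 1.0, 1.0, 0.0, 0.0],
]
profile = aggregate_profile(windows)
peaks = call_peaks(profile, pattern_start=4, pattern_end=7, height=0.25)
print(peaks)  # [(0, 1)]
```

A pattern without a fixed-distance partner would give a flat profile outside the pattern, so `call_peaks` would return an empty list, which corresponds to the "no peaks are called" case.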

The contribution score per seqlet instance can also be visualised using a heatmap. This plot shows the z-score (along each row) of the contribution score.

contribution, pattern_location, peaks = bias_detectors["0"].heatmap_plot_data
print(bias_detectors["0"].up_downstream_window)
# 9, 0 <---the instances of this pattern will be extended by 9 bp upstream in the next code block.

fig, ax = plt.subplots(figsize=(4, 10))
sns.heatmap(contribution, cmap="gray_r", robust=True, ax=ax, yticklabels=False, vmin=0.5)
for start_end in pattern_location:
    # plot location of the pattern
    ax.axvline(start_end, color="red")
for peak_left, peak_right in peaks:
    # plot location of identified peaks
    ax.axvline(peak_left, color="orange")
    ax.axvline(peak_right, color="orange")
fig.savefig("test_dist_bias_plots/heatmap_pattern_0.png", dpi=300)
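For readers unfamiliar with row-wise z-scoring, here is a minimal sketch of what "z-score along each row" means. It assumes the population standard deviation and a zero row for flat input; whether tfmindi makes the same choices is not stated in this PR, and `zscore_rows` is a hypothetical helper name.

```python
from statistics import fmean, pstdev


def zscore_rows(matrix):
    """Z-score each row independently: (x - row mean) / row standard deviation."""
    scaled = []
    for row in matrix:
        mean = fmean(row)
        std = pstdev(row)  # assumption: population standard deviation
        if std == 0:
            # A flat row carries no positional signal; return zeros instead of dividing by zero.
            scaled.append([0.0] * len(row))
        else:
            scaled.append([(x - mean) / std for x in row])
    return scaled


# Toy data (made up): each row is the contribution score of one seqlet window.
contribution = [
    [0.0, 0.0, 2.0, 2.0],
    [1.0, 1.0, 1.0, 1.0],  # flat window
]
z = zscore_rows(contribution)
print(z)
```

Because every row is scaled on its own, a strongly and a weakly contributing seqlet instance become comparable, which is what makes a single `vmin` (and later a single threshold) meaningful across rows.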

The called peaks will be used to extend the seqlets. We also take care of removing overlapping seqlets after performing the extension (a dimer might have been called as two separate seqlets, for example, whereas we want only a single seqlet). Overlaps are detected using ncls.

The function below takes care of this. An important parameter of this function is the threshold value, which decides for each seqlet instance whether or not another binding site is near it. For this, the maximum z-score within the detected peak window (orange lines above) is considered. Note that I set vmin=0.5 in the heatmap; the same value will be used as the threshold.

I also add additional 10 bp flanks to each called seqlet; this helps with generating nice patterns later on.

new_seqlet_df, new_seqlet_matrices = distance_bias.create_seqlet_matrices_with_distance_bias(
    adata=adata, fixed_distance_bias_detectors=bias_detectors.values(), threshold=0.5, extra_flanks_to_add=10
)
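The per-instance decision and the overlap handling described above can be sketched as follows. This is an illustration under stated assumptions, not the code of `create_seqlet_matrices_with_distance_bias`: the helper names are hypothetical, overlapping seqlets are merged into one (the PR only says overlaps are removed), and a simple sort-and-sweep replaces the ncls-based overlap detection.

```python
def extend_seqlet(start, end, zscores, window_start, peak, threshold, flank):
    """Extend one seqlet (half-open [start, end)) to cover `peak` if it has signal.

    `zscores` is the row-wise z-scored contribution of the window around the
    seqlet, `window_start` is the coordinate of zscores[0], and `peak` is an
    inclusive (left, right) index pair into `zscores`.
    """
    left, right = peak
    if max(zscores[left:right + 1]) > threshold:
        start = min(start, window_start + left)
        end = max(end, window_start + right + 1)
    # Flanks are added to every seqlet, extended or not.
    return start - flank, end + flank


def merge_overlaps(intervals):
    """Merge overlapping half-open intervals into single intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start < merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(interval) for interval in merged]


# Toy data (made up). Seqlet a has signal in the peak window and is extended;
# seqlet b is the partner site that was called separately; seqlet c is unrelated.
zs_a = [0.0, 0.2, 0.9, 0.7, 0.1] + [0.0] * 11
a = extend_seqlet(110, 116, zs_a, 100, (1, 4), threshold=0.5, flank=2)
b = extend_seqlet(101, 106, [0.0] * 16, 95, (1, 4), threshold=0.5, flank=2)
c = extend_seqlet(200, 206, [0.0] * 16, 190, (1, 4), threshold=0.5, flank=2)
new_seqlets = merge_overlaps([a, b, c])
print(a, b, c)
print(new_seqlets)
```

In the toy data the extended seqlet `a` swallows the separately called seqlet `b`, so only one seqlet remains for the dimer, which is the behaviour the overlap removal is meant to achieve.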

Now we can generate a new similarity matrix and perform clustering again.

new_similarity_matrix = tm.pp.calculate_motif_similarity(new_seqlet_matrices, motif_collection, chunk_size=10000)

# Create AnnData object for analysis
new_adata = tm.pp.create_seqlet_adata(
    new_similarity_matrix,
    new_seqlet_df,
    seqlet_matrices=new_seqlet_matrices,
    oh_sequences=adata.uns["unique_examples"]["oh"],
    contrib_scores=adata.uns["unique_examples"]["contrib"],
    motif_collection=motif_collection,
    motif_annotations=motif_annotations,
    motif_to_dbd=motif_to_db,
)

# Running PCA with default params is much faster; will update the code to allow the user to choose the algorithm.
sc.tl.pca(new_adata)
tm.tl.cluster_seqlets(new_adata, resolution=3)

dbd_to_color = {
    dbd: plt.cm.tab20(i)  # type: ignore
    for i, dbd in enumerate(new_adata.obs["cluster_dbd"].value_counts().index)
}

dbd_to_color[np.nan] = "black"

fig, ax = plt.subplots(figsize=(16, 16))
ax.scatter(
    new_adata.obsm["X_tsne"][:, 0],
    new_adata.obsm["X_tsne"][:, 1],
    c=[dbd_to_color[dbd] for dbd in new_adata.obs["cluster_dbd"]],
    s=5,
)
ax.set_axis_off()
for dbd, color in dbd_to_color.items():
    ax.scatter([], [], color=color, label=dbd)
ax.legend()
fig.tight_layout()
fig.savefig("test_dist_bias_plots/seqlets_tsne_cluster_dbd_w_dimer.png", dpi=300)

In this plot I annotated Sox dimers (dark blue) manually by inspecting the patterns. Note that we have many more dimers now.

LukasMahieu · 2025-10-13T13:30:28Z

Nice, these look good. I'm testing on the full tutorial data and it seems to work there too.
I'll write a small tutorial with the text from this PR. I'll also add the plots you created here as functions to the package in tfmindi.pl.
I do wonder if the class design is the cleanest approach here though. Maybe a more functional approach with a dataclass to store the results in would make this a bit more readable (e.g. like we have a Pattern and Seqlet we could have a DistanceBias object)?

What do you think? I could make these changes tomorrow.

Something like this:

import tfmindi as tm

adata = tm.load_h5ad("seqlets_clustered.h5ad")
patterns = tm.tl.create_patterns(adata, method="kmer")

# detect distance bias
bias_results = {}
for k, pattern in patterns.items():
    bias_results[k] = tm.tl.detect_distance_bias(
        adata=adata,
        pattern=pattern,
        window=20,
        height=0.25
    ) # returns DistanceBias objects

# add plotting functions
for pattern_id, result in bias_results.items():
    if result.has_bias:
        # Profile plot
        fig1 = tm.pl.distance_bias_profile(result, title=f"Pattern {pattern_id}")

        # Heatmap plot  
        fig2 = tm.pl.distance_bias_heatmap(result, figsize=(4, 10))

# extend seqlets
results_with_bias = [r for r in bias_results.values() if r.has_bias]
new_seqlets_df, new_seqlet_matrices = tm.tl.extend_seqlets_with_bias(
    adata=adata,
    bias_results=results_with_bias,
    threshold=0.5,
    extra_flanks=10
)
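To make the proposal more concrete, a result container along these lines could look like the sketch below. Only the `DistanceBias` name and the `has_bias` attribute come from the proposal above; every field, the `up_downstream_window` logic and the coordinate conventions (pattern end exclusive, peaks inclusive) are assumptions for illustration and not the refactored tfmindi API.

```python
from dataclasses import dataclass, field


@dataclass
class DistanceBias:
    """Hypothetical result container; field names are illustrative only."""

    pattern_id: str
    profile: list[float]
    pattern_location: tuple[int, int]  # (start, end) within the profile, end exclusive
    peaks: list[tuple[int, int]] = field(default_factory=list)  # inclusive (left, right)

    @property
    def has_bias(self) -> bool:
        """True when at least one peak was called next to the pattern."""
        return len(self.peaks) > 0

    @property
    def up_downstream_window(self) -> tuple[int, int]:
        """Number of bp the pattern would be extended upstream and downstream."""
        start, end = self.pattern_location
        upstream = max((start - left for left, _ in self.peaks if left < start), default=0)
        downstream = max((right + 1 - end for _, right in self.peaks if right >= end), default=0)
        return upstream, downstream


# Toy values (made up): a pattern at 20..29 with one peak upstream at 11..15.
result = DistanceBias(pattern_id="0", profile=[0.0] * 50, pattern_location=(20, 30), peaks=[(11, 15)])
no_bias = DistanceBias(pattern_id="1", profile=[0.0] * 50, pattern_location=(20, 30))
print(result.has_bias, result.up_downstream_window)
print(no_bias.has_bias, no_bias.up_downstream_window)
```

Keeping the results in a plain dataclass means the detection, plotting and extension steps can each be ordinary functions that take a `DistanceBias` as input, in line with how Pattern and Seqlet objects are mentioned above.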

SeppeDeWinter · 2025-10-13T13:35:21Z

Yep, makes sense too. I guess that fits better with the rest of the package indeed.

LukasMahieu · 2025-10-21T12:24:55Z

@SeppeDeWinter did the above mentioned mini-refactor and wrote the tutorial (see docs/notebooks for the tutorial).
Also, I added one function pl.pattern_logo to plot the logo of a single pattern since I needed this for the tutorial.

One thing before this can be merged: at the end of that tutorial I perform clustering as well as pattern detection again and plot the results. I would have hoped that the extended seqlets would still be detected as a SOX motif, and that the full SOX dimer would now show up in the pattern logo plot. However, this doesn't seem to be the case: we still get a separate SOX monomer showing up in the plot and no dimer. Any idea why this is the case? Is this because I did not perform manual annotation and there is no SOX dimer in the motif collection, so our new SOX dimer motifs get assigned to an "incorrect" cluster?


LukasMahieu · 2025-10-23T09:32:28Z

@SeppeDeWinter I tried a bunch of different things but I'm still stuck here, so it would be good if you could take another crack at it when you have time (see the tutorial notebook for what goes wrong).
I went through the code and to me everything seems to be correct (the correct seqlets get extended in the correct direction).
Yet, after calculating motif similarities and re-clustering, I'm back to the beginning. I tried different clustering resolutions and extra flanks when extending seqlets, yet that doesn't seem to help in finding the dimer...

SeppeDeWinter · 2025-10-23T12:22:55Z

@LukasMahieu I think your threshold value was too stringent.

Running with vmin=1 (the default threshold):

tm.pl.distance_bias_heatmap(result, title=f"Pattern {pattern_id} - {patterns[pattern_id].dbd}", x_label_rotation=90, vmin=1)

there is very little signal in the "orange peak".

This is with vmin=0 instead:

Running extend_biased_seqlets with threshold set to 0 does produce dimer motifs.
I have modified the code in the notebook to only return the seqlets that get modified (for easier debugging). I will send you this notebook on Slack.

These are the detected SOX motifs

--> many SOX dimers.

That being said, we might consider not thresholding at all and just extending all seqlet instances for the clusters for which we detect the distance bias?
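To get a feel for how strongly the threshold affects the outcome, one could tabulate the fraction of seqlet instances that would be extended at a few candidate thresholds. The sketch below assumes that one maximum z-score per seqlet instance (taken within the peak window) is already available; the helper name and the numbers are made up for illustration.

```python
def fraction_extended(max_zscores, threshold):
    """Fraction of seqlet instances whose peak-window maximum exceeds `threshold`."""
    if not max_zscores:
        return 0.0
    return sum(z > threshold for z in max_zscores) / len(max_zscores)


# Toy values (made up): one maximum z-score per seqlet instance.
max_zscores = [-0.3, 0.1, 0.4, 0.6, 0.8, 1.2, 1.5, 0.2]
fractions = {t: fraction_extended(max_zscores, t) for t in (0.0, 0.5, 1.0)}
for threshold, fraction in fractions.items():
    print(threshold, fraction)
```

With these toy numbers a threshold of 1 extends only a quarter of the instances while a threshold of 0 extends most of them, which mirrors the difference between the two heatmap settings discussed above.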

I have not tried reclustering all the seqlets, but I suppose that will also work (will try now).


scverse-bot and others added 2 commits September 30, 2025 06:24

Automated template update to v0.6.0

ccc407a

Feature: dimer detector.

95af42b

SeppeDeWinter requested a review from LukasMahieu October 7, 2025 10:29

Fix docstrings.

899d20c

SeppeDeWinter and others added 5 commits October 14, 2025 15:35

Add by parameter to create_patterns.

7f4a600

refactor distance bias detection and start write tutorial

cc2b5ad

refactored bias detection and notebooks + single pattern plotting

a5fdc85

Merge branch 'main' into dimer_detector

a1668ec

ruff fix

bb8bdce

LukasMahieu added 4 commits October 23, 2025 10:05

Merge pull request #20 from aertslab/feature-create_pattern_by_any_an…

7782099

…notation Allow for patterns to be generated by any annotation in .obs, not only leiden.

Merge branch 'main' into template-update-v2-aertslab-TF-MInDi-v0.6.0

3115ddc

Merge pull request #15 from scverse-bot/template-update-v2-aertslab-T…

be1dee3

…F-MInDi-v0.6.0 Update template to v0.6.0

temp tutorial

1b6a016

LukasMahieu and others added 6 commits October 24, 2025 11:06

filter legend based on colors in adata.obs[color_by]

aaeb60b

Merge pull request #25 from aertslab/bugfix_color_legend

884423e

filter legend based on colors in adata.obs[color_by]

add missing docstring.

a845b26

Add plot for setting z-score threshold for distance bias detection.

a0b8b9b

:-(

b77487c

SeppeDeWinter mentioned this pull request Oct 29, 2025

Feature: Concat TF-MInDi AnnData #26

Merged

LukasMahieu and others added 5 commits November 10, 2025 10:17

Add files via upload

9174ff6

Fix spelling of TF-MInDi to TF-MINDI in README

b041af3

Merge pull request #27 from aertslab/update-description

18cdf27

Update tfmindi description and add overview figure

Add merge functionality to docs.

7553f72

SeppeDeWinter and others added 30 commits November 19, 2025 19:14

Use binomial test to annotate seqlets.

75d7f55

Fix doc.

9166cb7

update notebooks with new assignments

7ba8297

Add MAFFT-based backend for pattern creation

1b2ab4a

Add iTaxoTools-mafftpy dependency and set python minimal version to 3…

4945a04

….10.2

Some formatting and type annotation

301d6d8

Replace MIT License with Academic Non-commercial License

da3b845

Merge pull request #32 from aertslab/patch-license

2597636

Replace MIT License with Academic Non-commercial License

Change overview figure: Remove TomTom and SCENIC+ and repalce with se…

8cce2f4

…qlet embedding.

Add logo and citation.

0311b73

Change to absolute paths.

4eb8e10

change logo paths and add logo to readthedocs

0863ded

Merge pull request #36 from aertslab/add_logo

29f153c

change logo paths and add logo to readthedocs

change heights

9b4ca07

upgrade sphinx tabs

8a5d250

Merge pull request #37 from aertslab/add_logo

3d134d8

change heights logos and fix build

pin sphinx to <9

94ee7f7

Merge pull request #38 from aertslab/add_logo

c6cd6bf

pin sphinx to <9

1.2.0 release

dd2a293

Merge pull request #39 from aertslab/1.2.0

3b200fe

1.2.0 release

Feature: dimer detector.

139e1f5

Fix docstrings.

7edf043

refactor distance bias detection and start write tutorial

820d6bc

refactored bias detection and notebooks + single pattern plotting

0d64e0d

temp tutorial

8adef10

Add plot for setting z-score threshold for distance bias detection.

314aed3

:-(

a5c61c2

Merge branch 'dimer_detector' of github.com:aertslab/TF-MInDi into di…

fca83f5

…mer_detector

Fix bug where wrong index column was used in adata.obs

b477fa1


Feature: dimer detector. #16
SeppeDeWinter wants to merge 53 commits into main from dimer_detector

3 participants
