Nice, these look good. I'm testing on the full tutorial data and it seems to work there too. What do you think? I could make these changes tomorrow. Something like this:

```python
import tfmindi as tm

adata = tm.load_h5ad("seqlets_clustered.h5ad")
patterns = tm.tl.create_patterns(adata, method="kmer")

# Detect distance bias for every pattern
bias_results = {}
for k, pattern in patterns.items():
    bias_results[k] = tm.tl.detect_distance_bias(
        adata=adata,
        pattern=pattern,
        window=20,
        height=0.25,
    )  # returns DistanceBias objects

# Plot only the patterns for which a distance bias was detected
for pattern_id, result in bias_results.items():
    if result.has_bias:
        # Profile plot
        fig1 = tm.pl.distance_bias_profile(result, title=f"Pattern {pattern_id}")
        # Heatmap plot
        fig2 = tm.pl.distance_bias_heatmap(result, figsize=(4, 10))

# Extend the seqlets of patterns that show a distance bias
results_with_bias = [r for r in bias_results.values() if r.has_bias]
new_seqlets_df, new_seqlet_matrices = tm.tl.extend_seqlets_with_bias(
    adata=adata,
    bias_results=results_with_bias,
    threshold=0.5,
    extra_flanks=10,
)
```

Yep, makes sense too; I guess that fits better with the rest of the package indeed.

@SeppeDeWinter I did the above-mentioned mini-refactor and wrote the tutorial (see docs/notebooks for the tutorial). One thing before this can be merged: at the end of that tutorial I perform clustering and pattern detection again and plot the results. I would have hoped that the extended seqlets would still be detected as a SOX motif, and that the full SOX dimer would now show up in the pattern logo plot. However, this doesn't seem to be the case: we still get a separate SOX monomer in the plot and no dimer. Any idea why? Is it because I did not perform manual annotation and there is no SOX dimer in the motif collection, so our new SOX-dimer motifs get assigned to an "incorrect" cluster?
…notation Allow for patterns to be generated by any annotation in .obs, not only leiden.
…F-MInDi-v0.6.0 Update template to v0.6.0

@SeppeDeWinter I tried a bunch of different things but I'm still stuck here, so it would be good if you could take another crack at it when you have time (see the tutorial notebook for what goes wrong).

@LukasMahieu I think your threshold value was too stringent. Running, using tm.pl.distance_bias_heatmap(result, title=f"Pattern {pattern_id} - {patterns[pattern_id].dbd}", x_label_rotation=90, vmin=1),
there is very little signal in the "orange peak". This is with
Running These are the detected SOX motifs
--> many SOX dimers. That being said, we might consider not thresholding at all and simply extending all seqlet instances of the clusters for which we detect a distance bias. I have not tried reclustering all the seqlets yet, but I suppose that will also work (will try now).
filter legend based on colors in adata.obs[color_by]
- concatenation of TF-MInDi anndata objects while preserving adata.var and adata.uns["unique_examples"]
- Has option `idx_match` so user can specify whether index columns in adata.obs ("example_oh_idx", "example_contrib_idx", "example_idx") refer to the same data across adatas or not
Related to issue:
Better user experience for concatenating multiple TF-MInDi objects.
Fixes #22
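A minimal sketch of what the idx_match option could mean for the index columns. It assumes each object's .obs is a pandas DataFrame and that the number of unique examples per object is known up front. concat_obs and n_examples are hypothetical names, not the TF-MInDi API, and the sketch leaves out merging adata.uns["unique_examples"] itself. When the indices do not refer to the same data, each object's index columns are shifted by the number of examples in the objects before it, so they stay unique after concatenation.

```python
import pandas as pd

# Index columns named in the PR description
IDX_COLS = ["example_oh_idx", "example_contrib_idx", "example_idx"]


def concat_obs(obs_list, n_examples, idx_match=True):
    """Hypothetical sketch: concatenate .obs tables from several objects.

    obs_list:   list of DataFrames that all contain IDX_COLS
    n_examples: number of unique examples in each object (assumption)
    idx_match:  if True, the index columns already refer to the same data
                and are kept as-is; if False, they are offset to stay unique.
    """
    frames = []
    offset = 0
    for obs, n in zip(obs_list, n_examples):
        obs = obs.copy()
        if not idx_match:
            # Shift indices past all examples of the preceding objects
            obs[IDX_COLS] = obs[IDX_COLS] + offset
            offset += n
        frames.append(obs)
    return pd.concat(frames, ignore_index=True)


obs_a = pd.DataFrame({c: [0, 1] for c in IDX_COLS})
obs_b = pd.DataFrame({c: [0, 2] for c in IDX_COLS})
print(concat_obs([obs_a, obs_b], [2, 3], idx_match=False)["example_idx"].tolist())
```

With idx_match=False the second object's indices 0 and 2 become 2 and 4, because the first object holds two examples.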
Update tfmindi description and add overview figure
Replace MIT License with Academic Non-commercial License
change logo paths and add logo to readthedocs
change logo heights and fix build
pin sphinx to <9
1.2.0 release



Add functionality to detect dimers (and potentially multimers)
Example usage
Load a pre-processed AnnData object.
In this dataset I know there are SOX dimers; however, few are detected.
Basically, at this point the HMG/Sox cluster in the upper-right corner contains the dimers.
Dimers (or multimers; in general I call this fixed_distance_bias) are detected from patterns. Note that the Tomtom procedure for generating patterns can take parts of sequences outside the actual called seqlet (i.e. the patterns often look nice but do not always entirely represent the seqlets as shown in the tSNE). With the k-mer approach only the seqlets themselves are considered, so this is what I will use here. We could consider changing the Tomtom approach to do this as well.
In this step it is important not to use a subset of seqlets: only the seqlets in the patterns are considered when detecting the distance bias.
Next we detect the distance bias. This is done by aggregating the contribution score across all seqlets within a pattern (these seqlets are already aligned to each other) and considering a window (20bp in this example) around the seqlet. In this window we will try to detect peaks corresponding to another TFBS that always occurs at a fixed distance relative to the pattern instances.
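Below is a minimal numpy sketch of the aggregation and peak-calling step. It is not the PR's implementation and rests on several assumptions:

- Contributions are a (n_seqlets, length) array, already aligned so the seqlet occupies the same columns in every row.
- The profile is the plain mean across rows.
- height is interpreted relative to the profile maximum.

aggregate_profile and call_peaks are hypothetical names. The simple local-maximum test stands in for a proper peak caller such as scipy.signal.find_peaks.

```python
import numpy as np


def aggregate_profile(contrib, seqlet_start, seqlet_end, window=20):
    """Mean contribution per position in a window around aligned seqlets.

    contrib has shape (n_seqlets, length); every row is aligned so the seqlet
    occupies columns [seqlet_start, seqlet_end). Returns (profile, offset),
    where offset is the column in contrib that profile[0] corresponds to.
    """
    lo = max(seqlet_start - window, 0)
    hi = min(seqlet_end + window, contrib.shape[1])
    return contrib[:, lo:hi].mean(axis=0), lo


def call_peaks(profile, offset, seqlet_start, seqlet_end, height=0.25):
    """Positions (in contrib columns) of local maxima outside the seqlet."""
    top = profile.max()
    if top <= 0:
        return []
    norm = profile / top  # scale so `height` is relative to the strongest signal
    peaks = []
    for i in range(1, len(norm) - 1):
        pos = i + offset
        if seqlet_start <= pos < seqlet_end:
            continue  # skip the pattern instances themselves
        if norm[i] >= height and norm[i] > norm[i - 1] and norm[i] >= norm[i + 1]:
            peaks.append(pos)
    return peaks


# Synthetic data: 100 aligned seqlets at columns 20-29, plus a weaker
# second binding site centred at column 43 (a fixed distance away).
x = np.arange(60)
row = np.zeros(60)
row[20:30] = 1.0
row += 0.5 * np.exp(-((x - 43) ** 2) / 2)
contrib = np.tile(row, (100, 1))

profile, offset = aggregate_profile(contrib, 20, 30, window=20)
print(call_peaks(profile, offset, 20, 30, height=0.25))  # [43]
```

For a pattern with no TFBS at a fixed distance, the region outside the seqlet stays flat, so call_peaks returns an empty list.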
This is an example of a pattern that has another TFBS near it. The location of the pattern instances is indicated by the red line; the called peak(s) are indicated by the orange line(s).
This is an example of a pattern that has no TFBS near it (at a fixed distance), in this case no peaks are called.
The contribution score per seqlet instance can also be visualised using a heatmap. This plot shows the z-score of the contribution score, computed along each row (i.e. per seqlet instance).
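As a small illustration of the per-row z-score shown in the heatmap, here is a numpy sketch that assumes one row per seqlet instance and the population standard deviation (ddof=0). The helper name rowwise_zscore is hypothetical. Flat rows are mapped to zero instead of dividing by zero.

```python
import numpy as np


def rowwise_zscore(contrib):
    """Z-score each row (seqlet instance) of a contribution matrix independently."""
    mean = contrib.mean(axis=1, keepdims=True)
    std = contrib.std(axis=1, keepdims=True)  # population std (ddof=0)
    std[std == 0] = 1.0  # flat rows become all zeros instead of NaN
    return (contrib - mean) / std


z = rowwise_zscore(np.array([[0.0, 1.0, 2.0], [5.0, 5.0, 5.0]]))
print(z.round(3))
```

Because each row is standardised on its own, instances with weak overall contribution can still show a clear second peak in the heatmap.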
The called peaks will be used to extend the seqlets. We also remove overlapping seqlets after the extension (for example, a dimer might have been called as two separate seqlets, whereas we want only a single seqlet). Overlaps are detected using ncls.
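The PR detects overlaps with ncls. The sketch below instead swaps in a plain-Python greedy pass to illustrate the idea, so it should not be read as the PR's method. It assumes:

- Seqlets are (example_idx, start, end, score) tuples with half-open coordinates.
- When two extended seqlets overlap, the one with the higher score is kept.

drop_overlaps is a hypothetical name.

```python
from collections import defaultdict


def drop_overlaps(seqlets):
    """Greedy overlap removal (illustrative stand-in for an ncls-based check).

    seqlets: list of (example_idx, start, end, score), coordinates [start, end).
    Seqlets are visited from highest to lowest score; one is kept only if it
    does not overlap any already-kept seqlet in the same example.
    """
    kept = defaultdict(list)  # example_idx -> list of kept (start, end)
    result = []
    for ex, start, end, score in sorted(seqlets, key=lambda t: t[3], reverse=True):
        if all(end <= ks or start >= ke for ks, ke in kept[ex]):
            kept[ex].append((start, end))
            result.append((ex, start, end, score))
    return sorted(result)


# Two extended seqlets covering the same dimer in example 0 collapse into one.
seqlets = [(0, 10, 40, 2.0), (0, 25, 55, 1.5), (0, 60, 80, 1.0), (1, 25, 55, 0.5)]
print(drop_overlaps(seqlets))
```

This pass is quadratic per example in the worst case. An interval index such as ncls avoids that on large seqlet sets, which is presumably why the PR uses it.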
The function below will take care of this. An important parameter in this function is the threshold value, which decides for each seqlet instance whether or not another binding site is near it. For this, the maximum z-score within the detected peak window (orange lines above) is considered. Note that I set vmin=0.5; this same value will be used as the threshold. I also add an additional 10 bp flank to each called seqlet, which helps with generating nice patterns later on.
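Here is a minimal sketch of the per-instance decision and the extension coordinates. It assumes:

- z holds row-wise z-scored contributions, aligned so the seqlet and the peak window sit in the same columns for every instance.
- Passing instances are extended to span both the seqlet and the peak window plus extra_flanks on each side, clipped to the sequence length.

extend_with_bias is a hypothetical name, and the sketch only computes coordinates. It is not the PR's extend_seqlets_with_bias.

```python
import numpy as np


def extend_with_bias(z, seqlet_start, seqlet_end, peak_start, peak_end,
                     threshold=0.5, extra_flanks=10):
    """Decide per instance whether a second site is present, and where to cut.

    Returns (has_partner, (new_start, new_end)), where has_partner is a boolean
    mask over rows and the coordinates are shared by all passing instances.
    """
    # Max z-score inside the called peak window, one value per instance
    has_partner = z[:, peak_start:peak_end].max(axis=1) >= threshold
    length = z.shape[1]
    # Span seqlet + peak window, pad with flanks, clip to the sequence bounds
    new_start = max(min(seqlet_start, peak_start) - extra_flanks, 0)
    new_end = min(max(seqlet_end, peak_end) + extra_flanks, length)
    return has_partner, (new_start, new_end)


z = np.zeros((3, 60))
z[0, 40:46] = 1.0  # clear second site
z[1, 40:46] = 0.2  # too weak for threshold=0.5
z[2, 42] = 0.6     # a single strong position is enough
mask, coords = extend_with_bias(z, 20, 30, 40, 46, threshold=0.5, extra_flanks=10)
print(mask.tolist(), coords)  # [True, False, True] (10, 56)
```

Lowering threshold (or dropping it entirely) would extend more instances, which matches the trade-off discussed in the review thread.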
Now we can generate a new similarity matrix and perform clustering again.
In this plot I annotated Sox dimers (dark blue) manually by inspecting the patterns. Note that we have many more dimers now.