Open
2 changes: 1 addition & 1 deletion docs/tutorials/Clustering.ipynb
@@ -286,7 +286,7 @@
"## Understanding key parameters\n",
"\n",
"- Determining an appropriate threshold for cutoff\n",
-" - Butina uses distances (which is 1 - distance) and the cutoff is dependent on the distance metric used. As mentioned earlier, Datamol uses Tanimoto with ECFP fingerprint. Therefore the distance cutoff is 1 - Tanimoto.\n",
+" - Butina uses distances (which is 1 - similarity) and the cutoff is dependent on the distance metric used. As mentioned earlier, Datamol uses Tanimoto with ECFP fingerprint. Therefore the distance cutoff is 1 - Tanimoto.\n",
" - Generally speaking, if you have a very small distance cutoff, compounds must be extremely similar (i.e. high Tanimoto score) in order to be grouped into one cluster. Therefore, with a small distance cutoff, you’ll get more clusters with fewer compounds per cluster. Vice versa is true.\n",
"\n",
"**Note:** This is an extremely general overview, in reality, the output greatly depends on both the size and diversity of the dataset being used. There is no “default” cutoff that is set in Datamol and instead, each user should set cutoffs according to their specific dataset and use case. \n",
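The corrected wording (distance = 1 - similarity) and the cutoff behaviour described in the hunk can be checked with a small dependency-free sketch. It is a simplified greedy Butina-style clustering, not the Datamol or RDKit implementation: `tanimoto`, `butina_sketch`, and the toy `fps` bit sets are hypothetical names and data introduced only for illustration, and the centroid choice is recomputed on every pass for brevity.

```python
def tanimoto(a, b):
    """Tanimoto similarity between two sets of 'on' fingerprint bits."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def butina_sketch(fingerprints, cutoff):
    """Greedy Butina-style clustering on distance = 1 - Tanimoto similarity.

    Simplified illustration only (assumption: neighbours are items whose
    distance is <= cutoff); not the Datamol/RDKit code path.
    """
    n = len(fingerprints)
    # Neighbours of i: every other item whose distance to i is within the cutoff.
    neighbours = [
        {
            j
            for j in range(n)
            if j != i
            and 1.0 - tanimoto(fingerprints[i], fingerprints[j]) <= cutoff
        }
        for i in range(n)
    ]
    unassigned = set(range(n))
    clusters = []
    while unassigned:
        # Next centroid: the unassigned item with the most unassigned
        # neighbours (ties go to the lowest index).
        centroid = max(
            sorted(unassigned), key=lambda i: len(neighbours[i] & unassigned)
        )
        members = [centroid] + sorted(neighbours[centroid] & unassigned)
        clusters.append(members)
        unassigned -= set(members)
    return clusters


# Toy "fingerprints" as sets of on-bits (made-up data).
fps = [
    {1, 2, 3, 4},
    {1, 2, 3, 5},
    {1, 2, 6, 7},
    {8, 9, 10, 11},
    {8, 9, 10, 12},
]

for cutoff in (0.2, 0.5, 0.7):
    print(cutoff, butina_sketch(fps, cutoff))
```

On this toy data the number of clusters falls as the distance cutoff grows, which matches the note in the hunk that a small cutoff yields more clusters with fewer compounds each. The actual counts on a real dataset depend on its size and diversity.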