2 changes: 1 addition & 1 deletion vignettes/AUCell.Rmd
@@ -332,7 +332,7 @@ We can use this propperty to explore the population of cells that are present in

The AUC estimates the proportion of genes in the gene-set that are highly expressed in each cell. Cells expressing many genes from the gene-set will have higher AUC values than cells expressing fewer (i.e. compensating for housekeeping genes, or genes that are highly expressed in all the cells in the dataset). Since the AUC represents the proportion of expressed genes in the signature, we can use the relative AUCs across the cells to explore the population of cells that are present in the dataset according to the expression of the gene-set.

- However, determining whether the signature is active (or not) in a given cell is not always trivial. The AUC is not an absolute value, but it depends on the the cell type (i.e. sell size, amount of transcripts), the specific dataset (i.e. sensitivity of the measures) and the gene-set. It is often not straight forward to obtain a pruned *signature* of clear *marker* genes that are completely "on" in the cell type of interest and off" in every other cell. In addition, at single-cell level, most genes are not expressed or detected at a constant level.
+ However, determining whether the signature is active (or not) in a given cell is not always trivial. The AUC is not an absolute value, but it depends on the cell type (i.e. cell size, amount of transcripts), the specific dataset (i.e. sensitivity of the measures) and the gene-set. It is often not straight forward to obtain a pruned *signature* of clear *marker* genes that are completely "on" in the cell type of interest and "off" in every other cell. In addition, at single-cell level, most genes are not expressed or detected at a constant level.

The ideal situation will be a bi-modal distribution, in which most cells in the dataset have a low "AUC" compared to a population of cells with a clearly higher value (i.e. see "Oligodendrocites" in the next figure). This is normally the case on gene sets that are active mostly in a population of cells with a good representation in the dataset (e.g. ~ 5-30% of cells in the dataset). Similar cases of "marker" gene sets but with different proportions of cells in the datasets are the "neurons" and "microglia" (see figure). When there are very few cells within the dataset, the distribution might look normal-like, but with some outliers to the higher end (e.g. microglia). While if the gene set is marker of a high percentage of cells in the dataset (i.e. neurons), the distribution might start approaching the look of a gene-set of housekeeping genes. As example, the 'housekeeping' gene-set in the figure includes genes that are detected in most cells in the dataset.
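
For readers of this hunk who want a concrete picture of the AUC the vignette text describes, the following is a minimal Python sketch, not AUCell's actual implementation. It assumes a single cell given as a gene-to-expression mapping, ranks genes by expression, and measures how early the gene-set members are recovered within the top-ranked genes. The function name `recovery_auc`, the `top_fraction` parameter and its default, the tie handling, and the normalisation by the best achievable area are all assumptions made for illustration.

```python
import math


def recovery_auc(expression, gene_set, top_fraction=0.05):
    """Simplified, illustrative AUC of a gene-set recovery curve for one cell.

    expression   -- dict mapping gene name to expression value (hypothetical input)
    gene_set     -- set of gene names forming the signature
    top_fraction -- share of the ranking used for the curve (assumed default)
    """
    # Rank genes from highest to lowest expression; ties keep input order here.
    ranked = sorted(expression, key=expression.get, reverse=True)
    n_top = max(1, math.ceil(top_fraction * len(ranked)))

    # Only signature genes that are present in this cell's ranking can be recovered.
    members = gene_set & set(ranked)
    if not members:
        return 0.0

    # Walk down the top of the ranking and accumulate the area under the
    # step curve "number of signature genes recovered so far".
    area = 0
    recovered = 0
    for gene in ranked[:n_top]:
        if gene in members:
            recovered += 1
        area += recovered

    # Normalise by the area obtained when every signature gene sits at the top.
    k = len(members)
    max_area = sum(min(i, k) for i in range(1, n_top + 1))
    return area / max_area


if __name__ == "__main__":
    genes = [f"g{i}" for i in range(20)]
    signature = {"g0", "g1", "g2"}
    # Toy cells: in the first the signature genes rank highest, in the second lowest.
    cell_on = {g: 20 - i for i, g in enumerate(genes)}
    cell_off = {g: i for i, g in enumerate(genes)}
    print(recovery_auc(cell_on, signature, top_fraction=0.25))
    print(recovery_auc(cell_off, signature, top_fraction=0.25))
```

Because the score depends only on each cell's own ranking, it is relative rather than absolute, which is why the paragraph above looks at the distribution of AUC values across cells instead of a fixed cut-off.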
