diff --git a/vignettes/AUCell.Rmd b/vignettes/AUCell.Rmd
index 2bca6db..fcf56d0 100755
--- a/vignettes/AUCell.Rmd
+++ b/vignettes/AUCell.Rmd
@@ -332,7 +332,7 @@ We can use this propperty to explore the population of cells that are present in
 
 The AUC estimates the proportion of genes in the gene-set that are highly expressed in each cell. Cells expressing many genes from the gene-set will have higher AUC values than cells expressing fewer (i.e. compensating for housekeeping genes, or genes that are highly expressed in all the cells in the dataset). Since the AUC represents the proportion of expressed genes in the signature, we can use the relative AUCs across the cells to explore the population of cells that are present in the dataset according to the expression of the gene-set.
 
-However, determining whether the signature is active (or not) in a given cell is not always trivial. The AUC is not an absolute value, but it depends on the the cell type (i.e. sell size, amount of transcripts), the specific dataset (i.e. sensitivity of the measures) and the gene-set. It is often not straight forward to obtain a pruned *signature* of clear *marker* genes that are completely "on" in the cell type of interest and off" in every other cell. In addition, at single-cell level, most genes are not expressed or detected at a constant level.
+However, determining whether the signature is active (or not) in a given cell is not always trivial. The AUC is not an absolute value, but it depends on the cell type (i.e. cell size, amount of transcripts), the specific dataset (i.e. sensitivity of the measures) and the gene-set. It is often not straight forward to obtain a pruned *signature* of clear *marker* genes that are completely "on" in the cell type of interest and "off" in every other cell. In addition, at single-cell level, most genes are not expressed or detected at a constant level.
 
 The ideal situation will be a bi-modal distribution, in which most cells in the dataset have a low "AUC" compared to a population of cells with a clearly higher value (i.e. see "Oligodendrocites" in the next figure). This is normally the case on gene sets that are active mostly in a population of cells with a good representation in the dataset (e.g. ~ 5-30% of cells in the dataset). Similar cases of "marker" gene sets but with different proportions of cells in the datasets are the "neurons" and "microglia" (see figure). When there are very few cells within the dataset, the distribution might look normal-like, but with some outliers to the higher end (e.g. microglia). While if the gene set is marker of a high percentage of cells in the dataset (i.e. neurons), the distribution might start approaching the look of a gene-set of housekeeping genes. As example, the 'housekeeping' gene-set in the figure includes genes that are detected in most cells in the dataset.
 