- scvi.data.poisson_gene_selection(adata, layer=None, n_top_genes=4000, use_gpu=None, accelerator='auto', device='auto', subset=False, inplace=True, n_samples=10000, batch_key=None, silent=False, minibatch_size=5000)
Rank and select genes based on the enrichment of zero counts.
Enrichment is assessed by comparing the data to a Poisson count model. This is based on M3Drop (tallulandrews/M3Drop). The method accounts for library size internally, so a raw count matrix should be provided.
Instead of a Z-test, enrichment of zeros is quantified by posterior probabilities from a binomial model, computed through sampling.
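The idea described above can be illustrated with a small, dependency-free sketch. It is not the scvi-tools implementation: the function name `zero_enrichment_sketch`, the helper `draw_binomial`, the toy data, and the exact way library size enters the Poisson rate are all assumptions made for illustration. It relies only on the fact that a Poisson variable with rate mu is zero with probability exp(-mu), and it compares binomial draws based on the observed and expected zero fractions.

```python
import math
import random


def draw_binomial(rng, n, p):
    """Draw one Binomial(n, p) sample as a sum of n Bernoulli trials."""
    return sum(rng.random() < p for _ in range(n))


def zero_enrichment_sketch(counts, n_samples=2000, seed=0):
    """Return (observed_zero_fraction, expected_zero_fraction, enrichment_prob) per gene.

    counts is a list of rows, one row of raw integer counts per cell.
    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    n_cells, n_genes = len(counts), len(counts[0])
    library_sizes = [sum(row) for row in counts]
    total_counts = sum(library_sizes)

    results = []
    for g in range(n_genes):
        column = [row[g] for row in counts]
        # Share of all counts that belongs to this gene.
        scaled_mean = sum(column) / total_counts
        observed = sum(1 for c in column if c == 0) / n_cells
        # Poisson zero probability exp(-rate), with rate = scaled_mean * library size,
        # averaged over cells (assumed form of the library-size correction).
        expected = sum(math.exp(-scaled_mean * s) for s in library_sizes) / n_cells
        # Fraction of paired binomial draws in which the observed-zero count
        # exceeds the expected-zero count.
        hits = sum(
            draw_binomial(rng, n_cells, observed) > draw_binomial(rng, n_cells, expected)
            for _ in range(n_samples)
        )
        results.append((observed, expected, hits / n_samples))
    return results


# Toy data: gene 0 is evenly expressed, gene 1 is concentrated in two cells.
toy_counts = [[2, 0]] * 6 + [[2, 8]] * 2
for gene, (obs, exp, prob) in enumerate(zero_enrichment_sketch(toy_counts)):
    print(gene, round(obs, 3), round(exp, 3), round(prob, 3))
```

In this toy example both genes have the same total count and therefore the same expected zero fraction, but only the second gene has more zeros than the Poisson model predicts, so only it receives a high enrichment probability.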
- Parameters
adata – AnnData object (with sparse X matrix).
n_top_genes (default: 4000) – How many variable genes to select.
use_gpu (default: None) – Use the default GPU if available (if True), the GPU with the given index (if int), the GPU with the given name (if str, e.g., 'cuda:0'), or the CPU (if False). Passing use_gpu != None overrides the accelerator and device arguments. This argument is deprecated in v1.0 and will be removed in v1.1; use accelerator and device instead.
accelerator (default: 'auto') – Supports passing different accelerator types ("cpu", "gpu", "tpu", "ipu", "hpu", "mps", "auto") as well as custom accelerator instances.
device (default: 'auto') – The device to use. Can be set to a non-negative index (int or str) or "auto" for automatic selection based on the chosen accelerator. If set to "auto" and accelerator is not determined to be "cpu", then device will be set to the first available device.
subset (default: False) – If True, subset the data in place to the highly variable genes; otherwise only flag the highly variable genes.
inplace (default: True) – Whether to place calculated metrics in .var or return them.
n_samples (default: 10000) – The number of Binomial samples to use to estimate the posterior probability of enrichment of zeros for each gene.
silent (default: False) – If True, disables the progress bar.
minibatch_size (default: 5000) – Size of the temporary matrix for incremental calculation. Larger is faster but requires more RAM or GPU memory. (The default should be fine unless there are hundreds of millions of cells or millions of genes.)
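To make the memory trade-off behind minibatch_size concrete, here is a rough sketch of an incremental calculation, assuming the quantity being accumulated is a per-gene average of Poisson zero probabilities over cells. The function name `expected_zero_fractions` and its arguments are hypothetical and do not come from scvi-tools; the point is only that each step materialises a temporary block of at most minibatch_size rows instead of the full cells-by-genes matrix.

```python
import math


def expected_zero_fractions(scaled_means, library_sizes, minibatch_size):
    """Average exp(-scaled_mean * library_size) over cells, one block of cells at a time.

    Hypothetical helper for illustration only.
    """
    n_cells = len(library_sizes)
    totals = [0.0] * len(scaled_means)
    for start in range(0, n_cells, minibatch_size):
        chunk = library_sizes[start:start + minibatch_size]
        # Temporary block: one row per cell in the chunk, one column per gene.
        block = [[math.exp(-m * size) for m in scaled_means] for size in chunk]
        # Fold the block into the running per-gene sums, then discard it.
        for row in block:
            for g, p in enumerate(row):
                totals[g] += p
    return [t / n_cells for t in totals]


sizes = [2, 2, 2, 2, 2, 2, 10, 10]
small_blocks = expected_zero_fractions([0.5, 0.25], sizes, minibatch_size=3)
one_block = expected_zero_fractions([0.5, 0.25], sizes, minibatch_size=8)
print(small_blocks, one_block)
```

The block size changes only how much is held in memory at once; the result is the same up to floating-point rounding.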
- Return type
Depending on inplace, returns the calculated metrics (DataFrame) or updates .var with the following fields:
- highly_variable – boolean indicator of highly variable genes
- fraction of observed zeros per gene
- expected fraction of observed zeros per gene
- prob_zero_enrichment – probability of zero enrichment, median across batches in the case of multiple batches
- prob_zero_enrichment_rank – rank of the gene according to probability of zero enrichment, median rank in the case of multiple batches
- prob_zero_enriched_nbatches – if batch_key is given, the number of batches in which the gene is detected as zero enriched