scvi.model.base.DifferentialComputation
- class scvi.model.base.DifferentialComputation(model_fn, adata_manager)
Unified class for differential computation.
This class takes a function from a model such as SCVI or TOTALVI, evaluates it on the AnnData input, and computes Bayes factors as described in [Lopez18], [Xu21], or [Boyeau19].
- Parameters
- model_fn
Function in model API to get values from.
- adata_manager
AnnDataManager created by setup_anndata().
Methods table
| get_bayes_factors(idx1, idx2, ...) | A unified method for differential expression inference. |
| scale_sampler(selection, ...) | Samples the posterior scale using the variational posterior distribution. |
Methods
get_bayes_factors
- DifferentialComputation.get_bayes_factors(idx1, idx2, mode='vanilla', batchid1=None, batchid2=None, use_observed_batches=False, n_samples=5000, use_permutation=False, m_permutation=10000, change_fn=None, m1_domain_fn=None, delta=0.5, pseudocounts=0.0, cred_interval_lvls=None)
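A unified method for differential expression inference.
Two modes coexist:
- the “vanilla” mode (described in [Lopez18] and [Xu21])
In this case, we perform hypothesis testing based on the hypotheses
\[M_1: h_1 > h_2 ~\text{and}~ M_2: h_1 \leq h_2,\]
where \(h_1\) and \(h_2\) denote the posterior quantities (e.g., normalized expression) of a gene in populations 1 and 2. DE can then be based on the study of the Bayes factors
\[\log \frac{p(M_1 \mid x_1, x_2)}{p(M_2 \mid x_1, x_2)}.\]
- the “change” mode (described in [Boyeau19])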
This mode consists of estimating an effect size random variable (e.g., log fold-change) and performing Bayesian hypothesis testing on this variable. The change_fn function computes the effect size variable \(r\) based on two inputs corresponding to the posterior quantities (e.g., normalized expression) in both populations.
Hypotheses:
\[M_1: r \in R_1 ~\text{(effect size } r \text{ in region inducing differential expression)}\]
\[M_2: r \notin R_1 ~\text{(no differential expression)}\]
To characterize the region \(R_1\), which induces DE, the user has two choices.
1. A common case is when the region \([-\delta, \delta]\) does not induce differential expression. If the user specifies a threshold delta, we suppose that \(R_1 = \mathbb{R} \setminus [-\delta, \delta]\).
2. Specify a custom indicator function:
\[f: \mathbb{R} \to \{0, 1\} ~\text{s.t.}~ r \in R_1 ~\text{iff}~ f(r) = 1.\]
Decision-making can then be based on the estimates of
\[p(M_1 \mid x_1, x_2).\]
Both modes require sampling the posterior distributions. To that purpose, we sample the posterior in the following way:
1. The posterior is sampled n_samples times for each subpopulation.
2. For computational efficiency (posterior sampling is quite expensive), instead of comparing the obtained samples element-wise, we can permute posterior samples. Remember that computing the Bayes factor requires sampling \(q(z_A \mid x_A)\) and \(q(z_B \mid x_B)\).
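The two sampling steps and the vanilla Bayes factor can be illustrated with a small, framework-free sketch. Plain Python floats stand in for posterior samples of one gene's normalized expression in each population. The helper names `pair_samples` and `vanilla_log_bayes_factor` and the clipping constant `eps` are hypothetical; this is not scvi-tools' internal code. Ties are counted toward \(M_2\), matching \(h_1 \leq h_2\).

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

Pair = Tuple[float, float]


def pair_samples(
    samples_a: Sequence[float],
    samples_b: Sequence[float],
    use_permutation: bool = False,
    m_permutation: int = 10000,
    rng: Optional[random.Random] = None,
) -> List[Pair]:
    """Hypothetical helper: pair posterior samples of one gene from A and B."""
    if not use_permutation:
        # Step 1 only: compare the i-th sample of A with the i-th sample of B.
        return list(zip(samples_a, samples_b))
    # Step 2: draw m_permutation random (A, B) pairs, mixing sample indices so
    # that more pairs are available without extra posterior sampling.
    if rng is None:
        rng = random.Random(0)
    return [(rng.choice(samples_a), rng.choice(samples_b)) for _ in range(m_permutation)]


def vanilla_log_bayes_factor(pairs: Sequence[Pair], eps: float = 1e-8) -> float:
    """Hypothetical helper: estimate log p(M1 | x1, x2) / p(M2 | x1, x2)."""
    # Fraction of pairs supporting M1: h1 > h2 (ties count toward M2: h1 <= h2).
    p_m1 = sum(h1 > h2 for h1, h2 in pairs) / len(pairs)
    p_m1 = min(max(p_m1, eps), 1.0 - eps)  # clip so both logs stay finite
    return math.log(p_m1) - math.log(1.0 - p_m1)


if __name__ == "__main__":
    gen = random.Random(42)
    h1 = [gen.gauss(2.0, 0.3) for _ in range(500)]  # toy samples, population 1
    h2 = [gen.gauss(1.0, 0.3) for _ in range(500)]  # toy samples, population 2
    print(vanilla_log_bayes_factor(pair_samples(h1, h2)))  # positive: h1 > h2 in most pairs
    print(len(pair_samples(h1, h2, use_permutation=True, m_permutation=2000)))  # 2000
```

Permuting reuses the same draws to build m_permutation pairs, which is cheaper than drawing more posterior samples, at the cost of pairs that are no longer independent.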
Currently, the code covers several batch handling configurations:
- If use_observed_batches=True, batches are treated as observations, and cells' normalized means are conditioned on the real batch observations.
- Case (cell group 1) and control (cell group 2) are conditioned on the same batch ids. This requires set(batchid1) == set(batchid2), or both batchid1 and batchid2 being None.
- Case and control are conditioned on different batch ids that do not intersect, i.e., set(batchid1) != set(batchid2) and len(set(batchid1).intersection(set(batchid2))) == 0.
This function does not cover other cases yet and will warn users in such cases.
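A minimal sketch of how the three supported batch configurations could be told apart. The function name `batch_configuration` and its string labels are hypothetical. The sketch does not expand a single None into the full list of batch ids, so that case is reported as unsupported here.

```python
from typing import Optional, Sequence, Union

BatchIds = Optional[Sequence[Union[int, float, str]]]


def batch_configuration(
    batchid1: BatchIds,
    batchid2: BatchIds,
    use_observed_batches: bool = False,
) -> str:
    """Hypothetical helper: label which supported batch setting applies."""
    if use_observed_batches:
        # Batches are observations; normalized means use each cell's real batch.
        return "observed"
    if batchid1 is None and batchid2 is None:
        return "same"
    if batchid1 is None or batchid2 is None:
        # Not resolved to "all batch ids" in this sketch, so treated as unsupported.
        return "unsupported"
    set1, set2 = set(batchid1), set(batchid2)
    if set1 == set2:
        return "same"
    if len(set1 & set2) == 0:
        return "disjoint"
    # Partially overlapping batch ids are not covered; the method warns instead.
    return "unsupported"


if __name__ == "__main__":
    print(batch_configuration(["b0", "b1"], ["b1", "b0"]))  # same
    print(batch_configuration(["b0"], ["b1", "b2"]))  # disjoint
    print(batch_configuration(["b0", "b1"], ["b1", "b2"]))  # unsupported
```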
- Parameters
- mode : Literal['vanilla', 'change'] (default: 'vanilla')
One of “vanilla” or “change”.
- idx1 : List[bool] | ndarray
Bool array masking the cells of subpopulation 1. Should be True where the cell belongs to the associated population.
- idx2 : List[bool] | ndarray
Bool array masking the cells of subpopulation 2. Should be True where the cell belongs to the associated population.
- batchid1 : Sequence[Union[int, float, str]] | None (default: None)
List of batch ids for which you want to perform DE analysis for subpopulation 1. By default, all ids are taken into account.
- batchid2 : Sequence[Union[int, float, str]] | None (default: None)
List of batch ids for which you want to perform DE analysis for subpopulation 2. By default, all ids are taken into account.
- use_observed_batches : bool | None (default: False)
Whether posterior values are conditioned on observed batches.
- n_samples : int (default: 5000)
Number of posterior samples.
- use_permutation : bool (default: False)
Activates step 2 described above. Simply put, pairs obtained from posterior sampling are randomly permuted so that the number of pairs used to compute Bayes factors becomes m_permutation.
- m_permutation : int (default: 10000)
Number of times posterior samples are “mixed” in step 2. Only relevant when use_permutation=True.
- change_fn : str | Callable | None (default: None)
Function computing the effect size based on both posterior values.
- m1_domain_fn : Callable | None (default: None)
Custom indicator function of effect size regions inducing differential expression.
- delta : float | None (default: 0.5)
Specific case of region inducing differential expression. In this case, we suppose that \([-\delta, \delta]\) does not induce differential expression, i.e., \(R_1 = \mathbb{R} \setminus [-\delta, \delta]\) (LFC case). If the provided value is None, a suitable threshold is determined from the distribution of LFCs across genes.
- pseudocounts : float | None (default: 0.0)
Pseudocount offset used in “change” mode. When None, observations from non-expressed genes are used to estimate its value.
- cred_interval_lvls : List[float] | ndarray | None (default: None)
List of credible interval levels to compute for the posterior LFC distribution.
- Return type
- Returns
Differential expression properties
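For the “change” mode, the roles of change_fn, pseudocounts, delta, m1_domain_fn, and cred_interval_lvls can be sketched on paired posterior samples of one gene. This is an illustration under stated assumptions, not scvi-tools' implementation: the default effect size is a base-2 log fold-change (the base is an assumption), delta must be a float (the None-based threshold estimation is not modeled), credible intervals use a crude empirical quantile, and the function names and result keys are hypothetical.

```python
import functools
import math
import statistics
from typing import Callable, Dict, Optional, Sequence, Tuple


def log2_fold_change(h1: float, h2: float, pseudocounts: float = 0.0) -> float:
    """Hypothetical effect size r; the base-2 logarithm is an assumption."""
    return math.log2(h1 + pseudocounts) - math.log2(h2 + pseudocounts)


def outside_delta(delta: float) -> Callable[[float], bool]:
    """Indicator f of R_1 = R minus [-delta, delta]: f(r) is True iff |r| > delta."""
    return lambda r: abs(r) > delta


def change_mode_summary(
    pairs: Sequence[Tuple[float, float]],
    change_fn: Optional[Callable[[float, float], float]] = None,
    m1_domain_fn: Optional[Callable[[float], bool]] = None,
    delta: float = 0.5,
    pseudocounts: float = 0.0,
    cred_interval_lvls: Sequence[float] = (0.5, 0.9),
) -> Dict[str, object]:
    """Hypothetical summary of the effect size r over paired posterior samples."""
    if change_fn is None:
        change_fn = functools.partial(log2_fold_change, pseudocounts=pseudocounts)
    if m1_domain_fn is None:
        # A custom m1_domain_fn takes precedence over delta in this sketch.
        m1_domain_fn = outside_delta(delta)
    r = sorted(change_fn(h1, h2) for h1, h2 in pairs)

    def quantile(q: float) -> float:
        # Crude empirical quantile of the sorted effect-size samples.
        return r[min(len(r) - 1, max(0, int(q * len(r))))]

    return {
        # Monte Carlo estimate of p(M_1 | x_1, x_2) = P(r in R_1).
        "proba_de": sum(m1_domain_fn(x) for x in r) / len(r),
        "lfc_mean": statistics.mean(r),
        "lfc_median": statistics.median(r),
        "intervals": {
            lvl: (quantile((1 - lvl) / 2), quantile((1 + lvl) / 2))
            for lvl in cred_interval_lvls
        },
    }


if __name__ == "__main__":
    toy_pairs = [(4.0, 1.0), (8.0, 2.0), (2.0, 0.5)]  # h1 = 4 * h2, so every r is 2.0
    print(change_mode_summary(toy_pairs)["proba_de"])  # 1.0
    print(change_mode_summary(toy_pairs, delta=3.0)["proba_de"])  # 0.0
```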
scale_sampler
- DifferentialComputation.scale_sampler(selection, n_samples=5000, n_samples_per_cell=None, batchid=None, use_observed_batches=False, give_mean=False)
Samples the posterior scale using the variational posterior distribution.
- Parameters
- selection : List[bool] | ndarray
Mask or list of cell ids to select.
- n_samples : int | None (default: 5000)
Total number of samples per batch (fill either n_samples or n_samples_per_cell).
- n_samples_per_cell : int | None (default: None)
Number of times each observation is sampled per batch (fill either n_samples or n_samples_per_cell).
- batchid : Sequence[Union[int, float, str]] | None (default: None)
Biological batches to sample from. By default (None), samples from all batches.
- use_observed_batches : bool | None (default: False)
Whether normalized means are conditioned on the observed batches.
- give_mean : bool | None (default: False)
Whether to return the mean of the sampled values.
- Return type
- Returns
Dictionary containing:
- scale: posterior aggregated scale samples of shape (n_samples, n_vars), where the first dimension corresponds to either n_bio_batches * n_cells * n_samples_per_cell or the requested total n_samples
- batch: associated batch ids
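The shape of the returned scale samples when n_samples_per_cell is given can be illustrated with a toy sampler. Here `toy_sample_scale` and `toy_scale_sampler` are hypothetical: the first stands in for the model's posterior by returning a random point on the simplex (normalized Gamma draws, so each row sums to 1), and the n_samples-only case is not modeled.

```python
import random
from typing import Callable, Dict, List, Sequence

ScaleFn = Callable[[int, int, random.Random], List[float]]


def toy_sample_scale(cell: int, batch: int, rng: random.Random, n_vars: int = 5) -> List[float]:
    # Hypothetical stand-in for a model's posterior scale: a Dirichlet(1, ..., 1)
    # draw built from normalized Gamma samples (cell and batch are ignored here).
    draws = [rng.gammavariate(1.0, 1.0) for _ in range(n_vars)]
    total = sum(draws)
    return [d / total for d in draws]


def toy_scale_sampler(
    cells: Sequence[int],
    batches: Sequence[int],
    sample_scale: ScaleFn = toy_sample_scale,
    n_samples_per_cell: int = 1,
    give_mean: bool = False,
    seed: int = 0,
) -> Dict[str, list]:
    """Hypothetical sketch: rows = n_bio_batches * n_cells * n_samples_per_cell."""
    rng = random.Random(seed)
    scale: List[List[float]] = []
    batch_ids: List[int] = []
    for b in batches:  # condition every selected cell on each biological batch
        for c in cells:
            for _ in range(n_samples_per_cell):
                scale.append(sample_scale(c, b, rng))
                batch_ids.append(b)
    if give_mean:
        # Average over the sample axis, giving one value per variable.
        n_vars = len(scale[0])
        mean = [sum(row[g] for row in scale) / len(scale) for g in range(n_vars)]
        return {"scale": mean, "batch": batch_ids}
    return {"scale": scale, "batch": batch_ids}


if __name__ == "__main__":
    out = toy_scale_sampler(cells=[0, 1, 2], batches=[0, 1], n_samples_per_cell=4)
    print(len(out["scale"]), len(out["scale"][0]))  # 24 5
```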