scvi.module.TOTALVAE#
- class scvi.module.TOTALVAE(n_input_genes, n_input_proteins, n_batch=0, n_labels=0, n_hidden=256, n_latent=20, n_layers_encoder=2, n_layers_decoder=1, n_continuous_cov=0, n_cats_per_cov=None, dropout_rate_decoder=0.2, dropout_rate_encoder=0.2, gene_dispersion='gene', protein_dispersion='protein', log_variational=True, gene_likelihood='nb', latent_distribution='normal', protein_batch_mask=None, encode_covariates=True, protein_background_prior_mean=None, protein_background_prior_scale=None, use_size_factor_key=False, use_observed_lib_size=True, library_log_means=None, library_log_vars=None, use_batch_norm='both', use_layer_norm='none')[source]#
Bases: BaseModuleClass
Total variational inference for CITE-seq data.
Implements the totalVI model of [Gayoso et al., 2021]. A minimal construction sketch follows the parameter list below.
- Parameters:
n_input_genes (int) – Number of input genes
n_input_proteins (int) – Number of input proteins
n_batch (int) – Number of batches
n_labels (int) – Number of labels
n_hidden (Tunable_[int]) – Number of nodes per hidden layer for encoder and decoder
n_latent (Tunable_[int]) – Dimensionality of the latent space
n_continuous_cov (int) – Number of continuous covarites
n_cats_per_cov (Optional[Iterable[int]]) – Number of categories for each extra categorical covariate
gene_dispersion (Tunable_[Literal['gene', 'gene-batch', 'gene-label']]) –
One of the following:
'gene' - dispersion parameter of NB is constant per gene across cells
'gene-batch' - dispersion can differ between different batches
'gene-label' - dispersion can differ between different labels
protein_dispersion (Tunable_[Literal['protein', 'protein-batch', 'protein-label']]) –
One of the following:
'protein' - protein_dispersion parameter is constant per protein across cells
'protein-batch' - protein_dispersion can differ between different batches (NOT TESTED)
'protein-label' - protein_dispersion can differ between different labels (NOT TESTED)
log_variational (bool) – If True, take log(data + 1) prior to encoding for numerical stability. This is not normalization.
gene_likelihood (Tunable_[Literal['zinb', 'nb']]) –
One of:
'nb' - negative binomial distribution
'zinb' - zero-inflated negative binomial distribution
latent_distribution (Tunable_[Literal['normal', 'ln']]) –
One of:
'normal' - isotropic normal
'ln' - logistic normal with normal params N(0, 1)
protein_batch_mask (Dict[Union[str, int], ndarray]) – Dictionary mapping each batch code to an array indicating, for each protein, whether it was observed in that batch
encode_covariates (bool) – Whether to concatenate covariates to expression in encoder
protein_background_prior_mean (Optional[ndarray]) – Array of shape (proteins, batches) giving the prior initialization for the protein background mean (log scale)
protein_background_prior_scale (Optional[ndarray]) – Array of shape (proteins, batches) giving the prior initialization for the protein background scale (log scale)
use_size_factor_key (bool) – Use the size_factor AnnDataField defined by the user as the scaling factor in the mean of the conditional distribution. Takes priority over use_observed_lib_size.
use_observed_lib_size (bool) – Use the observed library size for RNA as the scaling factor in the mean of the conditional distribution
library_log_means (Optional[ndarray]) – 1 x n_batch array of means of the log library sizes. Parameterizes prior on library size if not using observed library size.
library_log_vars (Optional[ndarray]) – 1 x n_batch array of variances of the log library sizes. Parameterizes prior on library size if not using observed library size.
n_layers_encoder (Tunable_[int]) – Number of hidden layers used for the encoder NN
n_layers_decoder (Tunable_[int]) – Number of hidden layers used for the decoder NN
dropout_rate_decoder (Tunable_[float]) – Dropout rate for the decoder neural network
dropout_rate_encoder (Tunable_[float]) – Dropout rate for the encoder neural network
use_batch_norm (Tunable_[Literal['encoder', 'decoder', 'none', 'both']]) – Where to apply batch normalization: in the encoder, the decoder, both, or neither
use_layer_norm (Tunable_[Literal['encoder', 'decoder', 'none', 'both']]) – Where to apply layer normalization: in the encoder, the decoder, both, or neither
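A minimal construction sketch, assuming a recent scvi-tools release. The toy dimensions, synthetic counts, and variable names below are illustrative only; the later method sketches reuse module, x, y, batch_index, and label from this snippet.

```python
import torch
from scvi.module import TOTALVAE

torch.manual_seed(0)

# Toy CITE-seq dimensions (illustrative): 100 genes, 10 proteins, 2 batches.
module = TOTALVAE(n_input_genes=100, n_input_proteins=10, n_batch=2)

# Synthetic count minibatch of 8 cells.
x = torch.poisson(torch.full((8, 100), 3.0))   # RNA counts
y = torch.poisson(torch.full((8, 10), 20.0))   # protein counts
batch_index = torch.zeros((8, 1), dtype=torch.long)  # all cells in batch 0
label = torch.zeros((8, 1), dtype=torch.long)        # single dummy label
```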
Attributes table#
training
Methods table#
generative – Run the generative step.
get_reconstruction_loss – Compute reconstruction loss.
get_sample_dispersion – Returns the tensors of dispersions for genes and proteins.
inference – Internal helper function to compute necessary inference quantities.
loss – Returns the reconstruction loss and the Kullback-Leibler divergences.
marginal_ll – Computes the marginal log likelihood of the data under the model.
sample – Sample from the generative model.
Attributes#
training
Methods#
generative
- TOTALVAE.generative(z, library_gene, batch_index, label, cont_covs=None, cat_covs=None, size_factor=None, transform_batch=None)[source]#
Run the generative step.
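A hedged sketch of the generative step, reusing the toy objects from the construction sketch above. The inference-output key names "z" and "library_gene" are assumptions about the dictionary returned by inference(); verify them against inference_outputs.keys() on your version.

```python
inference_outputs = module.inference(x, y, batch_index=batch_index)

# "z" (latent sample) and "library_gene" (RNA size factor) are assumed
# key names; inspect inference_outputs.keys() in your scvi-tools version.
generative_outputs = module.generative(
    z=inference_outputs["z"],
    library_gene=inference_outputs["library_gene"],
    batch_index=batch_index,
    label=label,
)
# generative_outputs should carry the px_ / py_ parameter dictionaries
# described under inference() below.
```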
get_reconstruction_loss
- TOTALVAE.get_reconstruction_loss(x, y, px_dict, py_dict, pro_batch_mask_minibatch=None)[source]#
Compute reconstruction loss.
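Continuing the sketch above: the gene and protein parameter dictionaries feed the reconstruction loss. That these dictionaries live under the keys "px_" and "py_" of the generative outputs, and that one loss tensor per modality is returned, are assumptions to verify on your version.

```python
# Assumed: generative_outputs holds the NB/mixture-NB parameter dicts under
# "px_"/"py_", and the method returns per-cell losses for genes and proteins.
recon_gene, recon_protein = module.get_reconstruction_loss(
    x, y, generative_outputs["px_"], generative_outputs["py_"]
)
print(recon_gene.shape, recon_protein.shape)  # one value per cell
```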
get_sample_dispersion
- TOTALVAE.get_sample_dispersion(x, y, batch_index=None, label=None, n_samples=1)[source]#
Returns the tensors of dispersions for genes and proteins.
- Parameters:
x (Tensor) – tensor of values with shape
(batch_size, n_input_genes)
y (Tensor) – tensor of values with shape
(batch_size, n_input_proteins)
batch_index (Optional[Tensor]) – array that indicates which batch the cells belong to with shape
batch_size
label (Optional[Tensor]) – tensor of cell-type labels with shape
(batch_size, n_labels)
n_samples (int) – number of samples
- Returns:
Tensors of dispersions of the negative binomial distribution
- Return type:
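A short usage sketch, reusing the toy minibatch from the construction sketch. The unpacking into one gene and one protein dispersion tensor follows the description above, but the exact return packaging is an assumption.

```python
# Assumed: one dispersion tensor for genes, one for proteins.
px_r, py_r = module.get_sample_dispersion(x, y, batch_index=batch_index)
```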
inference
- TOTALVAE.inference(x, y, batch_index=None, label=None, n_samples=1, cont_covs=None, cat_covs=None)[source]#
Internal helper function to compute necessary inference quantities.
We use the dictionary px_ to contain the parameters of the ZINB/NB for genes. The rate refers to the mean of the NB, dropout refers to the Bernoulli mixing parameters. scale refers to the quantity upon which differential expression is performed. For genes, this can be viewed as the mean of the underlying gamma distribution.
We use the dictionary py_ to contain the parameters of the Mixture NB distribution for proteins. rate_fore refers to the foreground mean, while rate_back refers to the background mean. scale refers to the foreground mean adjusted for background probability and scaled to reside in the simplex. back_alpha and back_beta are the posterior parameters for rate_back. fore_scale is the scaling factor that enforces rate_fore > rate_back.
px_["r"] and py_["r"] are the inverse dispersion parameters for genes and proteins, respectively.
- Parameters:
x (Tensor) – tensor of values with shape
(batch_size, n_input_genes)
y (Tensor) – tensor of values with shape
(batch_size, n_input_proteins)
batch_index (Optional[Tensor]) – array that indicates which batch the cells belong to with shape
batch_size
label (Optional[Tensor]) – tensor of cell-type labels with shape (batch_size, n_labels)
n_samples – Number of samples to draw from the approximate posterior
cont_covs – Continuous covariates to condition on
cat_covs – Categorical covariates to condition on
- Return type:
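A hedged sketch of drawing several Monte Carlo samples from the approximate posterior; that sampled quantities gain a leading dimension of size n_samples is an assumption consistent with other scvi-tools modules.

```python
# Multiple posterior samples for the same toy minibatch.
posterior = module.inference(x, y, batch_index=batch_index, n_samples=5)
# Sampled quantities (e.g. the latent sample) are expected to gain a
# leading dimension of size n_samples; verify on your version.
```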
loss
- TOTALVAE.loss(tensors, inference_outputs, generative_outputs, pro_recons_weight=1.0, kl_weight=1.0)[source]#
Returns the reconstruction loss and the Kullback-Leibler divergences.
- Parameters:
x – tensor of values with shape
(batch_size, n_input_genes)
y – tensor of values with shape
(batch_size, n_input_proteins)
batch_index – array that indicates which batch the cells belong to with shape
batch_size
label – tensor of cell-type labels with shape (batch_size, n_labels)
- Returns:
The reconstruction loss and the Kullback-Leibler divergences
- Return type:
Tuple[FloatTensor, FloatTensor, FloatTensor, FloatTensor]
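A hedged sketch of assembling the tensors dictionary and computing the objective, reusing the earlier toy objects. The registry keys come from scvi.REGISTRY_KEYS; note that the return container has differed across scvi-tools releases (a tuple of tensors in older versions, a structured loss object in newer ones).

```python
from scvi import REGISTRY_KEYS

tensors = {
    REGISTRY_KEYS.X_KEY: x,
    REGISTRY_KEYS.PROTEIN_EXP_KEY: y,
    REGISTRY_KEYS.BATCH_KEY: batch_index,
    REGISTRY_KEYS.LABELS_KEY: label,
}
loss_output = module.loss(tensors, inference_outputs, generative_outputs)
```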
marginal_ll
- TOTALVAE.marginal_ll(tensors, n_mc_samples)[source]#
Computes the marginal log likelihood of the data under the model.
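A hedged sketch, reusing the tensors dictionary from the loss sketch above; more Monte Carlo samples tighten the estimate at higher compute cost.

```python
# Monte Carlo estimate of the marginal log likelihood for one minibatch.
log_lik = module.marginal_ll(tensors, n_mc_samples=100)
```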
sample
- TOTALVAE.sample(tensors, n_samples=1)[source]#
Sample from the generative model.