scvi.module.VAE

class scvi.module.VAE(n_input, n_batch=0, n_labels=0, n_hidden=128, n_latent=10, n_layers=1, n_continuous_cov=0, n_cats_per_cov=None, dropout_rate=0.1, dispersion='gene', log_variational=True, gene_likelihood='zinb', latent_distribution='normal', encode_covariates=False, deeply_inject_covariates=True, use_batch_norm='both', use_layer_norm='none', use_size_factor_key=False, use_observed_lib_size=True, library_log_means=None, library_log_vars=None, var_activation=None, extra_encoder_kwargs=None, extra_decoder_kwargs=None)

Bases: BaseMinifiedModeModuleClass

Variational auto-encoder model.

This is an implementation of the scVI model described in [Lopez et al., 2018].

Parameters:
  • n_input (int) – Number of input genes

  • n_batch (int (default: 0)) – Number of batches. If 0, no batch correction is performed.

  • n_labels (int (default: 0)) – Number of labels

  • n_hidden (Tunable_[int] (default: 128)) – Number of nodes per hidden layer

  • n_latent (Tunable_[int] (default: 10)) – Dimensionality of the latent space

  • n_layers (Tunable_[int] (default: 1)) – Number of hidden layers used for encoder and decoder NNs

  • n_continuous_cov (int (default: 0)) – Number of continuous covariates

  • n_cats_per_cov (Optional[Iterable[int]] (default: None)) – Number of categories for each extra categorical covariate

  • dropout_rate (Tunable_[float] (default: 0.1)) – Dropout rate for neural networks

  • dispersion (Tunable_[Literal['gene', 'gene-batch', 'gene-label', 'gene-cell']] (default: 'gene')) –

    One of the following

    • 'gene' - dispersion parameter of NB is constant per gene across cells

    • 'gene-batch' - dispersion can differ between different batches

    • 'gene-label' - dispersion can differ between different labels

    • 'gene-cell' - dispersion can differ for every gene in every cell

  • log_variational (Tunable_[bool] (default: True)) – Whether to apply log(data + 1) to the input prior to encoding, for numerical stability. This is not a normalization step.

  • gene_likelihood (Tunable_[Literal['zinb', 'nb', 'poisson']] (default: 'zinb')) –

    One of

    • 'nb' - Negative binomial distribution

    • 'zinb' - Zero-inflated negative binomial distribution

    • 'poisson' - Poisson distribution

  • latent_distribution (Tunable_[Literal['normal', 'ln']] (default: 'normal')) –

    One of

    • 'normal' - Isotropic normal

    • 'ln' - Logistic normal with normal params N(0, 1)

  • encode_covariates (Tunable_[bool] (default: False)) – Whether to concatenate covariates to expression in encoder

  • deeply_inject_covariates (Tunable_[bool] (default: True)) – Whether to concatenate covariates to the output of hidden layers in the encoder/decoder, so that they are part of the input to each subsequent hidden layer. This option only applies when n_layers > 1.

  • use_batch_norm (Tunable_[Literal['encoder', 'decoder', 'none', 'both']] (default: 'both')) – Whether to use batch norm in layers.

  • use_layer_norm (Tunable_[Literal['encoder', 'decoder', 'none', 'both']] (default: 'none')) – Whether to use layer norm in layers.

  • use_size_factor_key (bool (default: False)) – Use size_factor AnnDataField defined by the user as scaling factor in mean of conditional distribution. Takes priority over use_observed_lib_size.

  • use_observed_lib_size (Tunable_[bool] (default: True)) – Use observed library size for RNA as scaling factor in mean of conditional distribution

  • library_log_means (Optional[ndarray] (default: None)) – 1 x n_batch array of means of the log library sizes. Parameterizes prior on library size if not using observed library size.

  • library_log_vars (Optional[ndarray] (default: None)) – 1 x n_batch array of variances of the log library sizes. Parameterizes prior on library size if not using observed library size.

  • var_activation (Tunable_[Callable] (default: None)) – Callable used to ensure positivity of the variational distributions’ variance. When None, defaults to torch.exp.

  • extra_encoder_kwargs (Optional[dict] (default: None)) – Extra keyword arguments passed into Encoder.

  • extra_decoder_kwargs (Optional[dict] (default: None)) – Extra keyword arguments passed into DecoderSCVI.
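
Several of the parameters above describe simple transformations that are easier to see in code. The following is a minimal standard-library sketch, not the library's implementation. The helper names (log1p_counts, softmax, library_log_stats) are hypothetical, the softmax mapping for 'ln' and the use of the population variance are assumptions, and nested lists stand in for arrays.

```python
import math


def log1p_counts(counts):
    """log(1 + x) per entry: the transform log_variational describes."""
    return [math.log1p(c) for c in counts]


def softmax(values):
    """Map a normal sample onto the simplex, giving a logistic-normal draw.

    Assumption: 'ln' corresponds to a softmax over the latent dimensions.
    """
    top = max(values)
    exps = [math.exp(v - top) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def library_log_stats(counts, batch_ids, n_batch):
    """Per-batch mean and variance of the log library sizes.

    counts: nested list of shape (n_cells, n_genes).
    batch_ids: one integer batch index per cell.
    Returns two nested lists of shape 1 x n_batch, the layout described
    for library_log_means and library_log_vars.
    Assumption: population variance (divide by n, not n - 1).
    """
    means, variances = [], []
    for b in range(n_batch):
        # Library size of a cell is its total count over all genes.
        logs = [
            math.log(sum(row))
            for row, bid in zip(counts, batch_ids)
            if bid == b
        ]
        mean = sum(logs) / len(logs)
        means.append(mean)
        variances.append(sum((v - mean) ** 2 for v in logs) / len(logs))
    return [means], [variances]
```

For example, library_log_stats([[1, 0], [1, 0], [2, 2]], [0, 0, 1], 2) returns one mean and one variance per batch. Every batch needs at least one cell with a nonzero total count.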

Attributes table

training

Methods table

generative(z, library, batch_index[, ...]) – Runs the generative model.

loss(tensors, inference_outputs, ...[, ...]) – Computes the loss function for the model.

marginal_ll(tensors, n_mc_samples[, ...]) – Computes the marginal log likelihood of the model.

sample(tensors[, n_samples, max_poisson_rate]) – Generate predictive samples from the posterior predictive distribution.

Attributes

VAE.training: bool

Methods

VAE.generative(z, library, batch_index, cont_covs=None, cat_covs=None, size_factor=None, y=None, transform_batch=None)

Runs the generative model.

VAE.loss(tensors, inference_outputs, generative_outputs, kl_weight=1.0)

Computes the loss function for the model.

VAE.marginal_ll(tensors, n_mc_samples, return_mean=False, n_mc_samples_per_pass=1)

Computes the marginal log likelihood of the model.

Parameters:
  • tensors – Dict of input tensors, typically corresponding to the items of the data loader.

  • n_mc_samples – Number of Monte Carlo samples to use for the estimation of the marginal log likelihood.

  • return_mean (default: False) – Whether to return the mean of marginal likelihoods over cells.

  • n_mc_samples_per_pass (default: 1) – Number of Monte Carlo samples to use per pass. This is useful to avoid memory issues.
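
The roles of n_mc_samples and n_mc_samples_per_pass can be illustrated with a toy importance-sampling estimator. This is a sketch under stated assumptions, not the scVI implementation: the model is a hypothetical 1-D Gaussian (z ~ N(0, 1), x | z ~ N(z, 1)), the proposal is its exact posterior, and the function names are made up for illustration.

```python
import math
import random


def log_normal_pdf(value, mean, var):
    """Log density of a univariate normal distribution."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (value - mean) ** 2 / var)


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def toy_marginal_ll(x, n_mc_samples, n_mc_samples_per_pass=1, seed=0):
    """Importance-sampling estimate of log p(x) for the toy model.

    Toy model (an assumption, not scVI): z ~ N(0, 1), x | z ~ N(z, 1).
    Proposal: the exact posterior q(z | x) = N(x / 2, 1 / 2).
    """
    rng = random.Random(seed)
    log_weights = []
    remaining = n_mc_samples
    while remaining > 0:
        # Handle at most n_mc_samples_per_pass samples per pass. In a real
        # model each pass would be one batched forward pass, which is
        # what bounds peak memory.
        n_pass = min(n_mc_samples_per_pass, remaining)
        for _ in range(n_pass):
            z = rng.gauss(x / 2.0, math.sqrt(0.5))
            log_joint = log_normal_pdf(z, 0.0, 1.0) + log_normal_pdf(x, z, 1.0)
            log_q = log_normal_pdf(z, x / 2.0, 0.5)
            log_weights.append(log_joint - log_q)
        remaining -= n_pass
    # log p(x) is approximated by log((1 / N) * sum_i w_i).
    return logsumexp(log_weights) - math.log(n_mc_samples)
```

Because the proposal here is the exact posterior, every importance weight equals p(x), so the estimate matches the closed-form marginal N(x; 0, 2) up to rounding for any number of samples. With an approximate posterior, as in a trained VAE, more samples would generally be needed for a tight estimate.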

VAE.sample(tensors, n_samples=1, max_poisson_rate=100000000.0)

Generate predictive samples from the posterior predictive distribution.

The posterior predictive distribution is denoted as \(p(\hat{x} \mid x)\), where \(x\) is the input data and \(\hat{x}\) is the sampled data.

We sample from this distribution by first sampling n_samples times from the posterior distribution \(q(z \mid x)\) for a given observation, and then sampling from the likelihood \(p(\hat{x} \mid z)\) for each of these.

Parameters:
  • tensors (dict[str, Tensor]) – Dictionary of tensors passed into forward().

  • n_samples (int (default: 1)) – Number of Monte Carlo samples to draw from the distribution for each observation.

  • max_poisson_rate (float (default: 100000000.0)) – The maximum value to which to clip the rate parameter of Poisson. Avoids numerical sampling issues when the parameter is very large due to the variance of the distribution.

Return type:

Tensor

Returns:

Tensor on CPU with shape (n_obs, n_vars) if n_samples == 1, else (n_obs, n_vars, n_samples).
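
The two-step sampling described for VAE.sample, together with the role of max_poisson_rate, can be sketched with the standard library alone. This is an illustration under toy assumptions rather than the library's code: each gene gets its own scalar latent value, the likelihood is Poisson with rate exp(z), and the names sample_poisson and toy_posterior_predictive are hypothetical. The sketch always returns nested lists laid out as (n_samples, n_obs, n_vars), which is its own choice of layout.

```python
import math
import random


def sample_poisson(rate, rng):
    """Draw one Poisson sample with Knuth's method (only for small rates)."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def toy_posterior_predictive(q_means, q_stds, n_samples=1,
                             max_poisson_rate=1e8, seed=0):
    """Two-step sampling: z ~ q(z | x), then x_hat ~ p(x_hat | z).

    Toy assumptions (not the scVI decoder): one latent value per gene
    and a Poisson likelihood with rate exp(z).
    q_means, q_stds: nested lists of shape (n_obs, n_vars).
    Returns a nested list of shape (n_samples, n_obs, n_vars).
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(n_samples):
        one_draw = []
        for means_row, stds_row in zip(q_means, q_stds):
            row = []
            for mean, std in zip(means_row, stds_row):
                # Step 1: sample the latent value from the posterior.
                z = rng.gauss(mean, std)
                # Clip the Poisson rate before sampling.
                rate = min(math.exp(z), max_poisson_rate)
                # Step 2: sample from the likelihood given z.
                row.append(sample_poisson(rate, rng))
            one_draw.append(row)
        draws.append(one_draw)
    return draws
```

Calling toy_posterior_predictive(means, stds, n_samples=4) on inputs of shape (3, 2) yields four independent draws, each a 3 x 2 grid of non-negative integer counts. Knuth's sampler is only practical for small rates, so keep the toy posterior means modest.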