scvi.module.TOTALVAE

class scvi.module.TOTALVAE(n_input_genes, n_input_proteins, n_batch=0, n_labels=0, n_hidden=256, n_latent=20, n_layers_encoder=2, n_layers_decoder=1, n_continuous_cov=0, n_cats_per_cov=None, dropout_rate_decoder=0.2, dropout_rate_encoder=0.2, gene_dispersion='gene', protein_dispersion='protein', log_variational=True, gene_likelihood='nb', latent_distribution='normal', protein_batch_mask=None, encode_covariates=True, protein_background_prior_mean=None, protein_background_prior_scale=None, use_size_factor_key=False, use_observed_lib_size=True, library_log_means=None, library_log_vars=None, use_batch_norm='both', use_layer_norm='none', extra_encoder_kwargs=None, extra_decoder_kwargs=None)[source]

Bases: BaseMinifiedModeModuleClass

Total variational inference for CITE-seq data.

Implements the totalVI model of [Gayoso et al., 2021].

Parameters:
  • n_input_genes (int) – Number of input genes

  • n_input_proteins (int) – Number of input proteins

  • n_batch (int (default: 0)) – Number of batches

  • n_labels (int (default: 0)) – Number of labels

  • n_hidden (int (default: 256)) – Number of nodes per hidden layer for encoder and decoder

  • n_latent (int (default: 20)) – Dimensionality of the latent space

  • n_layers_encoder (int (default: 2)) – Number of hidden layers used for the encoder neural network

  • n_layers_decoder (int (default: 1)) – Number of hidden layers used for the decoder neural network

  • n_continuous_cov (int (default: 0)) – Number of continuous covariates

  • n_cats_per_cov (Iterable[int] | None (default: None)) – Number of categories for each extra categorical covariate

  • dropout_rate_decoder (float (default: 0.2)) – Dropout rate for the decoder neural networks

  • dropout_rate_encoder (float (default: 0.2)) – Dropout rate for the encoder neural networks

  • gene_dispersion (Literal['gene', 'gene-batch', 'gene-label'] (default: 'gene')) –

    One of the following

    • 'gene' - dispersion parameter of NB is constant per gene across cells

    • 'gene-batch' - dispersion can differ between different batches

    • 'gene-label' - dispersion can differ between different labels

  • protein_dispersion (Literal['protein', 'protein-batch', 'protein-label'] (default: 'protein')) –

    One of the following

    • 'protein' - dispersion parameter is constant per protein across cells

    • 'protein-batch' - dispersion can differ between different batches (NOT TESTED)

    • 'protein-label' - dispersion can differ between different labels (NOT TESTED)

  • log_variational (bool (default: True)) – If True, take log(data + 1) prior to encoding for numerical stability. This is not normalization.

  • gene_likelihood (Literal['zinb', 'nb'] (default: 'nb')) –

    One of

    • 'nb' - Negative binomial distribution

    • 'zinb' - Zero-inflated negative binomial distribution

  • latent_distribution (Literal['normal', 'ln'] (default: 'normal')) –

    One of

    • 'normal' - Isotropic normal

    • 'ln' - Logistic normal with normal params N(0, 1)

  • protein_batch_mask (dict[str | int, ndarray] (default: None)) – Dictionary where each key is a batch code and each value is an array indicating, for each protein, whether it was observed in that batch.

  • encode_covariates (bool (default: True)) – Whether to concatenate covariates to expression in encoder

  • protein_background_prior_mean (ndarray | None (default: None)) – Array of shape (n_proteins, n_batch) giving the prior initialization for the protein background mean (log scale)

  • protein_background_prior_scale (ndarray | None (default: None)) – Array of shape (n_proteins, n_batch) giving the prior initialization for the protein background scale (log scale)

  • use_size_factor_key (bool (default: False)) – Use size_factor AnnDataField defined by the user as scaling factor in mean of conditional distribution. Takes priority over use_observed_lib_size.

  • use_observed_lib_size (bool (default: True)) – Use observed library size for RNA as scaling factor in mean of conditional distribution

  • library_log_means (ndarray | None (default: None)) – 1 x n_batch array of means of the log library sizes. Parameterizes prior on library size if not using observed library size.

  • library_log_vars (ndarray | None (default: None)) – 1 x n_batch array of variances of the log library sizes. Parameterizes prior on library size if not using observed library size.

  • use_batch_norm (Literal['encoder', 'decoder', 'none', 'both'] (default: 'both')) – Whether to use batch norm in layers.

  • use_layer_norm (Literal['encoder', 'decoder', 'none', 'both'] (default: 'none')) – Whether to use layer norm in layers.

  • extra_encoder_kwargs (dict | None (default: None)) – Extra keyword arguments passed into EncoderTOTALVI.

  • extra_decoder_kwargs (dict | None (default: None)) – Extra keyword arguments passed into DecoderTOTALVI.
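
The module is usually constructed and trained for you by the higher-level scvi.model.TOTALVI interface, but it can also be instantiated directly from the signature above. A minimal sketch, using hypothetical dataset dimensions (any CITE-seq dataset with raw RNA counts and a protein panel would supply these):

    from scvi.module import TOTALVAE

    # Hypothetical CITE-seq dimensions: 4,000 genes and a 14-protein ADT panel
    # measured across two experimental batches.
    module = TOTALVAE(
        n_input_genes=4000,
        n_input_proteins=14,
        n_batch=2,
        n_latent=20,                   # dimensionality of the shared latent space
        gene_likelihood="nb",          # negative binomial likelihood for RNA counts
        latent_distribution="normal",  # isotropic normal latent distribution
    )

All remaining arguments keep the defaults shown in the signature.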

Attributes table

training

Methods table

generative(z, library_gene, batch_index, label)

Run the generative step.

get_reconstruction_loss(x, y, px_dict, py_dict)

Compute reconstruction loss.

get_sample_dispersion(x, y[, batch_index, ...])

Returns the tensors of dispersions for genes and proteins.

loss(tensors, inference_outputs, ...[, ...])

Returns the reconstruction loss and the Kullback-Leibler divergences.

marginal_ll(tensors, n_mc_samples[, return_mean])

Computes the marginal log likelihood of the data under the model.

on_load(model)

Callback function run in load().

sample(tensors[, n_samples])

Sample from the generative model.

Attributes

TOTALVAE.training: bool

Methods

TOTALVAE.generative(z, library_gene, batch_index, label, cont_covs=None, cat_covs=None, size_factor=None, transform_batch=None)[source]

Run the generative step.

Return type:

dict[str, Tensor | dict[str, Tensor]]

TOTALVAE.get_reconstruction_loss(x, y, px_dict, py_dict, pro_batch_mask_minibatch=None)[source]

Compute reconstruction loss.

Return type:

tuple[Tensor, Tensor]

TOTALVAE.get_sample_dispersion(x, y, batch_index=None, label=None, n_samples=1)[source]

Returns the tensors of dispersions for genes and proteins.

Parameters:
  • x (Tensor) – tensor of values with shape (batch_size, n_input_genes)

  • y (Tensor) – tensor of values with shape (batch_size, n_input_proteins)

  • batch_index (Tensor | None (default: None)) – array that indicates which batch each cell belongs to, with shape batch_size

  • label (Tensor | None (default: None)) – tensor of cell-type labels with shape (batch_size, n_labels)

  • n_samples (int (default: 1)) – number of samples

Return type:

tuple[Tensor, Tensor]

Returns:

Tensors of dispersions of the negative binomial distribution for genes and proteins.

TOTALVAE.loss(tensors, inference_outputs, generative_outputs, pro_recons_weight=1.0, kl_weight=1.0)[source]

Returns the reconstruction loss and the Kullback-Leibler divergences.

Parameters:
  • tensors – dictionary of minibatch tensors: gene counts x with shape (batch_size, n_input_genes), protein counts y with shape (batch_size, n_input_proteins), batch indices, and cell-type labels

  • inference_outputs – outputs of the inference step

  • generative_outputs – outputs of the generative step

  • pro_recons_weight (float (default: 1.0)) – weight applied to the protein reconstruction term of the loss

  • kl_weight (float (default: 1.0)) – weight applied to the Kullback-Leibler terms of the loss

Return type:

tuple[FloatTensor, FloatTensor, FloatTensor, FloatTensor]

Returns:

The reconstruction loss and the Kullback-Leibler divergences.
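
During training, these quantities are produced for each minibatch by the module's forward pass, which chains the inference, generative, and loss steps. The sketch below relies on the generic scvi-tools module interface (the REGISTRY_KEYS tensor keys, the forward() call of the base module class, and the LossOutput object it returns) rather than anything documented on this page, and it uses synthetic counts, so treat it as an illustration only:

    import torch

    from scvi import REGISTRY_KEYS
    from scvi.module import TOTALVAE

    n_cells, n_genes, n_proteins = 128, 4000, 14
    module = TOTALVAE(n_input_genes=n_genes, n_input_proteins=n_proteins)

    # Synthetic raw counts standing in for one CITE-seq minibatch.
    tensors = {
        REGISTRY_KEYS.X_KEY: torch.randint(0, 20, (n_cells, n_genes)).float(),
        REGISTRY_KEYS.PROTEIN_EXP_KEY: torch.randint(0, 100, (n_cells, n_proteins)).float(),
        REGISTRY_KEYS.BATCH_KEY: torch.zeros(n_cells, 1),
        REGISTRY_KEYS.LABELS_KEY: torch.zeros(n_cells, 1),
    }

    # forward() runs inference, generative, and loss in sequence;
    # pro_recons_weight and kl_weight are forwarded to loss().
    inference_outputs, generative_outputs, losses = module(
        tensors, loss_kwargs={"pro_recons_weight": 1.0, "kl_weight": 1.0}
    )
    print(losses.loss)  # scalar objective minimized during training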

TOTALVAE.marginal_ll(tensors, n_mc_samples, return_mean=True)[source]

Computes the marginal log likelihood of the data under the model.

TOTALVAE.on_load(model)[source]

Callback function run in load().

TOTALVAE.sample(tensors, n_samples=1)[source]

Sample from the generative model.
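
A short, hedged sketch of drawing posterior-predictive counts for a minibatch. As in the loss example above, the tensor keys come from the generic scvi-tools interface rather than this page, and the returned object is left unpacked because its exact structure is not documented here:

    import torch

    from scvi import REGISTRY_KEYS
    from scvi.module import TOTALVAE

    module = TOTALVAE(n_input_genes=200, n_input_proteins=10)

    # Small synthetic minibatch of raw RNA and protein counts.
    tensors = {
        REGISTRY_KEYS.X_KEY: torch.randint(0, 20, (8, 200)).float(),
        REGISTRY_KEYS.PROTEIN_EXP_KEY: torch.randint(0, 100, (8, 10)).float(),
        REGISTRY_KEYS.BATCH_KEY: torch.zeros(8, 1),
        REGISTRY_KEYS.LABELS_KEY: torch.zeros(8, 1),
    }

    # One posterior-predictive draw of RNA and protein counts for the minibatch.
    samples = module.sample(tensors, n_samples=1)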