TOTALVI

class scvi.models.TOTALVI(n_input_genes, n_input_proteins, n_batch=0, n_labels=0, n_hidden=256, n_latent=20, n_layers_encoder=1, n_layers_decoder=1, dropout_rate_decoder=0.2, dropout_rate_encoder=0.2, gene_dispersion='gene', protein_dispersion='protein', log_variational=True, reconstruction_loss_gene='nb', latent_distribution='ln', protein_batch_mask=None, encoder_batch=True)

Bases: torch.nn.modules.module.Module

Total variational inference for CITE-seq data

Implements the totalVI model of [GayosoSteier20].

Parameters
  • n_input_genes (int) – Number of input genes

  • n_input_proteins (int) – Number of input proteins

  • n_batch (int) – Number of batches

  • n_labels (int) – Number of labels

  • n_hidden (int) – Number of nodes per hidden layer for the z encoder (protein+genes), genes library encoder, z->genes+proteins decoder

  • n_latent (int) – Dimensionality of the latent space

  • n_layers_encoder, n_layers_decoder (int) – Number of hidden layers used for the encoder and decoder NNs, respectively

  • dropout_rate_encoder, dropout_rate_decoder (float) – Dropout rates for the encoder and decoder neural networks, respectively

  • gene_dispersion (str) –

    One of the following

    • 'gene' - dispersion parameter of the NB is constant per gene across cells

    • 'gene-batch' - dispersion can differ between different batches

    • 'gene-label' - dispersion can differ between different labels

  • protein_dispersion (str) –

    One of the following

    • 'protein' - dispersion parameter is constant per protein across cells

    • 'protein-batch' - dispersion can differ between different batches (not tested)

    • 'protein-label' - dispersion can differ between different labels (not tested)

  • log_variational (bool) – Apply log(1 + data) before encoding, for numerical stability. This is not a normalization.

  • reconstruction_loss_gene (str) –

    One of

    • 'nb' - Negative binomial distribution

    • 'zinb' - Zero-inflated negative binomial distribution

  • latent_distribution (str) –

    One of

    • 'normal' - Isotropic normal

    • 'ln' - Logistic normal with normal params N(0, 1)

Examples

>>> from scvi.dataset import Dataset10X
>>> from scvi.models import TOTALVI
>>> dataset = Dataset10X(dataset_name="pbmc_10k_protein_v3", save_path=save_path)
>>> totalvae = TOTALVI(dataset.nb_genes, len(dataset.protein_names))

Methods Summary

forward(x, y, local_l_mean_gene, …[, …])

Returns the reconstruction loss and the Kullback-Leibler divergences

get_reconstruction_loss(x, y, px_, py_[, …])

Compute reconstruction loss

get_sample_dispersion(x, y[, batch_index, …])

Returns the tensors of dispersions for genes and proteins

get_sample_rate(x, y[, batch_index, label, …])

Returns the tensor of negative binomial means for genes

get_sample_scale(x, y[, batch_index, label, …])

Returns tuple of gene and protein scales.

inference(x, y[, batch_index, label, …])

Internal helper function to compute necessary inference quantities

sample_from_posterior_l(x, y[, batch_index, …])

Provides the tensor of library size from the posterior

sample_from_posterior_z(x, y[, batch_index, …])

Access the tensor of latent values from the posterior

Methods Documentation

forward(x, y, local_l_mean_gene, local_l_var_gene, batch_index=None, label=None)

Returns the reconstruction loss and the Kullback-Leibler divergences

Parameters
  • x (Tensor) – tensor of values with shape (batch_size, n_input_genes)

  • y (Tensor) – tensor of values with shape (batch_size, n_input_proteins)

  • local_l_mean_gene (Tensor) – tensor of means of the prior distribution of latent variable l with shape (batch_size, 1)

  • local_l_var_gene (Tensor) – tensor of variances of the prior distribution of latent variable l with shape (batch_size, 1)

  • batch_index (Optional[Tensor]) – tensor that indicates which batch each cell belongs to, with shape batch_size

  • label (Optional[Tensor]) – tensor of cell-type labels with shape (batch_size, n_labels)

Return type

Tuple[FloatTensor, FloatTensor, FloatTensor, FloatTensor]

Returns

the reconstruction loss and the Kullback-Leibler divergences
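
The divergence terms returned by forward compare variational posteriors with their priors. For the library size l, the prior is parameterized by local_l_mean_gene and local_l_var_gene. As a minimal sketch, assuming both the prior and the posterior over l are univariate normals (the helper name kl_normal is hypothetical and is not part of scvi), the closed-form divergence can be written as:

```python
import math


def kl_normal(mean_q, var_q, mean_p, var_p):
    """KL( N(mean_q, var_q) || N(mean_p, var_p) ) for univariate normals.

    Hypothetical helper for illustration; q plays the role of the posterior
    and p the role of the prior (e.g. local_l_mean_gene, local_l_var_gene).
    """
    if var_q <= 0 or var_p <= 0:
        raise ValueError("variances must be positive")
    return 0.5 * (
        math.log(var_p / var_q)
        + (var_q + (mean_q - mean_p) ** 2) / var_p
        - 1.0
    )
```

The divergence is zero when the posterior equals the prior and grows as the posterior mean moves away from the prior mean.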

get_reconstruction_loss(x, y, px_, py_, pro_batch_mask_minibatch=None)

Compute reconstruction loss

Return type

Tuple[Tensor, Tensor]

get_sample_dispersion(x, y, batch_index=None, label=None, n_samples=1)

Returns the tensors of dispersions for genes and proteins

Parameters
  • x (Tensor) – tensor of values with shape (batch_size, n_input_genes)

  • y (Tensor) – tensor of values with shape (batch_size, n_input_proteins)

  • batch_index (Optional[Tensor]) – tensor that indicates which batch each cell belongs to, with shape batch_size

  • label (Optional[Tensor]) – tensor of cell-type labels with shape (batch_size, n_labels)

  • n_samples (int) – number of samples

Return type

Tuple[Tensor, Tensor]

Returns

tensors of dispersions of the negative binomial distribution
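
The dispersion options documented for the class ('gene', 'gene-batch', 'gene-label') differ only in which index selects the per-gene values. The following is a rough sketch of that lookup for a single cell, using plain Python lists. The function select_dispersion and the layout of table are assumptions made for illustration and do not reflect how scvi stores these parameters.

```python
def select_dispersion(mode, table, batch_index=None, label=None):
    """Pick the per-gene dispersion values that apply to one cell.

    Hypothetical layout:
      - mode == 'gene':       table is a list of length n_genes
      - mode == 'gene-batch': table[b] is the per-gene list for batch b
      - mode == 'gene-label': table[c] is the per-gene list for label c
    """
    if mode == "gene":
        return table
    if mode == "gene-batch":
        return table[batch_index]
    if mode == "gene-label":
        return table[label]
    raise ValueError("unknown dispersion mode: %r" % (mode,))


# One shared vector versus one vector per batch (two genes, two batches).
shared = [0.5, 2.0]
per_batch = [[0.5, 2.0], [1.0, 4.0]]
```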

get_sample_rate(x, y, batch_index=None, label=None, n_samples=1)

Returns the tensor of negative binomial means for genes

Parameters
  • x (Tensor) – tensor of values with shape (batch_size, n_input_genes)

  • y (Tensor) – tensor of values with shape (batch_size, n_input_proteins)

  • batch_index (Optional[Tensor]) – tensor that indicates which batch each cell belongs to, with shape batch_size

  • label (Optional[Tensor]) – tensor of cell-type labels with shape (batch_size, n_labels)

  • n_samples (int) – number of samples

Return type

Tensor

Returns

tensor of means of the negative binomial distribution with shape (batch_size, n_input_genes)

get_sample_scale(x, y, batch_index=None, label=None, n_samples=1, transform_batch=None, eps=0, normalize_pro=False, sample_bern=True, include_bg=False)

Returns tuple of gene and protein scales.

These scales can also be transformed into a particular batch. This function is the core of differential expression.

Parameters
  • transform_batch (Optional[int]) – Index of the batch to “transform” all cells into

  • eps – Prior count to add to protein normalized expression (Default value = 0)

  • normalize_pro – bool, whether to make protein expression sum to one in a cell (Default value = False)

  • include_bg – bool, whether to include the background component of expression (Default value = False)
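
The options eps, normalize_pro and include_bg can be read as post-processing steps applied to the protein expression of one cell. The sketch below is an interpretation, not the library's code. It assumes the foreground mean is weighted by the probability of not being background, that the background component is added only when include_bg is set, and that eps is added after the optional normalization. The function protein_scale and the argument p_back are hypothetical names.

```python
def protein_scale(rate_fore, rate_back, p_back,
                  include_bg=False, normalize_pro=False, eps=0.0):
    """Illustrative protein scale for one cell (one entry per protein).

    rate_fore, rate_back: foreground and background means.
    p_back: probability that a protein's counts come from background.
    All names and the order of operations are assumptions.
    """
    values = []
    for fore, back, p in zip(rate_fore, rate_back, p_back):
        value = (1.0 - p) * fore          # foreground, adjusted for background
        if include_bg:
            value += p * back             # optionally keep the background part
        values.append(value)
    if normalize_pro:
        total = sum(values)
        if total > 0:
            values = [v / total for v in values]  # sum to one within the cell
    return [v + eps for v in values]      # prior count added last (assumption)
```

With normalize_pro=True and eps=0 the returned values sum to one, which is the behavior described for normalize_pro above.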

Return type

Tensor

Returns

tuple of gene and protein scales

inference(x, y, batch_index=None, label=None, n_samples=1, transform_batch=None)

Internal helper function to compute necessary inference quantities

We use the dictionary px_ to contain the parameters of the ZINB/NB for genes. rate refers to the mean of the NB, and dropout refers to the Bernoulli mixing parameters. scale refers to the quantity upon which differential expression is performed. For genes, this can be viewed as the mean of the underlying gamma distribution.

We use the dictionary py_ to contain the parameters of the Mixture NB distribution for proteins. rate_fore refers to the foreground mean, while rate_back refers to the background mean. scale refers to the foreground mean adjusted for background probability and scaled to reside in the simplex. back_alpha and back_beta are the posterior parameters for rate_back. fore_scale is the scaling factor that enforces rate_fore > rate_back.

px_["r"] and py_["r"] are the inverse dispersion parameters for genes and proteins, respectively.

Return type

Dict[str, Union[Tensor, Dict[str, Tensor]]]
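
To make the protein mixture described for py_ concrete, here is a minimal sketch of a two-component negative binomial mixture log-likelihood for a single count, in plain Python. It assumes that the foreground mean equals fore_scale times the background mean with fore_scale > 1, and that both components share one inverse dispersion r. The function names and the argument p_back (background mixing probability) are hypothetical, and the library's own implementation may differ.

```python
import math


def nb_log_pmf(x, mean, r):
    """Log pmf of a negative binomial with mean `mean` and inverse dispersion `r`."""
    return (
        math.lgamma(x + r) - math.lgamma(r) - math.lgamma(x + 1)
        + r * math.log(r / (r + mean))
        + x * math.log(mean / (r + mean))
    )


def mixture_nb_log_pmf(x, rate_back, fore_scale, r, p_back):
    """Log pmf of a background/foreground NB mixture for one protein count.

    Assumes rate_fore = fore_scale * rate_back with fore_scale > 1,
    and 0 < p_back < 1.
    """
    rate_fore = fore_scale * rate_back
    log_back = math.log(p_back) + nb_log_pmf(x, rate_back, r)
    log_fore = math.log(1.0 - p_back) + nb_log_pmf(x, rate_fore, r)
    # log-sum-exp of the two weighted components, for numerical stability
    m = max(log_back, log_fore)
    return m + math.log(math.exp(log_back - m) + math.exp(log_fore - m))
```

Because fore_scale > 1 forces the foreground mean above the background mean, large counts are attributed mostly to the foreground component and small counts mostly to background.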

sample_from_posterior_l(x, y, batch_index=None, give_mean=True)

Provides the tensor of library size from the posterior

Parameters
  • x (Tensor) – tensor of values with shape (batch_size, n_input_genes)

  • y (Tensor) – tensor of values with shape (batch_size, n_input_proteins)

Return type

Tensor

Returns

tensor of shape (batch_size, 1)

sample_from_posterior_z(x, y, batch_index=None, give_mean=False, n_samples=5000)

Access the tensor of latent values from the posterior

Parameters
  • x (Tensor) – tensor of values with shape (batch_size, n_input_genes)

  • y (Tensor) – tensor of values with shape (batch_size, n_input_proteins)

  • batch_index (Optional[Tensor]) – tensor of batch indices

  • give_mean (bool) – Whether to return the mean of the distribution instead of a sample

Return type

Tensor

Returns

tensor of shape (batch_size, n_latent)
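
A logistic normal latent variable ('ln') has no closed-form mean, which is one plausible reason for the n_samples argument. The sketch below assumes that the latent vector is the softmax of a normal sample and that the mean is estimated by averaging n_samples such draws. This is an illustration in plain Python, not the library's implementation, and the function names are hypothetical.

```python
import math
import random


def softmax(values):
    """Numerically stable softmax of a list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def logistic_normal_mean(mean, std, n_samples=5000, seed=0):
    """Monte Carlo estimate of E[softmax(z)] with z ~ N(mean, diag(std**2))."""
    rng = random.Random(seed)
    acc = [0.0] * len(mean)
    for _ in range(n_samples):
        draw = softmax([rng.gauss(m, s) for m, s in zip(mean, std)])
        acc = [a + d for a, d in zip(acc, draw)]
    return [a / n_samples for a in acc]
```

Each draw lies on the simplex, so the averaged estimate also sums to one; more samples reduce the Monte Carlo error at the cost of compute.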