scvi.module.TOTALVAE#

class scvi.module.TOTALVAE(n_input_genes, n_input_proteins, n_batch=0, n_labels=0, n_hidden=256, n_latent=20, n_layers_encoder=2, n_layers_decoder=1, n_continuous_cov=0, n_cats_per_cov=None, dropout_rate_decoder=0.2, dropout_rate_encoder=0.2, gene_dispersion='gene', protein_dispersion='protein', log_variational=True, gene_likelihood='nb', latent_distribution='normal', protein_batch_mask=None, encode_covariates=True, protein_background_prior_mean=None, protein_background_prior_scale=None, use_size_factor_key=False, use_observed_lib_size=True, library_log_means=None, library_log_vars=None, use_batch_norm='both', use_layer_norm='none')[source]#

Bases: BaseModuleClass

Total variational inference for CITE-seq data.

Implements the totalVI model of [GayosoSteier21].

Parameters:
n_input_genes : int

Number of input genes

n_input_proteins : int

Number of input proteins

n_batch : int (default: 0)

Number of batches

n_labels : int (default: 0)

Number of labels

n_hidden : int (default: 256)

Number of nodes per hidden layer for encoder and decoder

n_latent : int (default: 20)

Dimensionality of the latent space

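n_layers_encoder : int (default: 2)

Number of hidden layers used for the encoder NN

n_layers_decoder : int (default: 1)

Number of hidden layers used for the decoder NN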

n_continuous_cov : int (default: 0)

Number of continuous covariates

n_cats_per_cov : Iterable[int] | None (default: None)

Number of categories for each extra categorical covariate

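dropout_rate_decoder : float (default: 0.2)

Dropout rate for the decoder neural network

dropout_rate_encoder : float (default: 0.2)

Dropout rate for the encoder neural network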

gene_dispersion : str (default: 'gene')

One of the following

  • 'gene' - genes_dispersion parameter of NB is constant per gene across cells

  • 'gene-batch' - genes_dispersion can differ between different batches

  • 'gene-label' - genes_dispersion can differ between different labels

protein_dispersion : str (default: 'protein')

One of the following

  • 'protein' - protein_dispersion parameter is constant per protein across cells

  • 'protein-batch' - protein_dispersion can differ between different batches (not tested)

  • 'protein-label' - protein_dispersion can differ between different labels (not tested)

log_variational : bool (default: True)

Apply log(1 + data) to the input before encoding, for numerical stability. This is not a normalization step.

gene_likelihood : str (default: 'nb')

One of

  • 'nb' - Negative binomial distribution

  • 'zinb' - Zero-inflated negative binomial distribution

latent_distribution : str (default: 'normal')

One of

  • 'normal' - Isotropic normal

  • 'ln' - Logistic normal with normal params N(0, 1)

protein_batch_mask : {str | int: ndarray} | None (default: None)

Dictionary mapping each batch code to an array that indicates, for each protein, whether it was observed in that batch.

encode_covariates : bool (default: True)

Whether to concatenate covariates to expression in encoder

protein_background_prior_mean : ndarray | None (default: None)

Array of proteins by batches, the prior initialization for the protein background mean (log scale)

protein_background_prior_scale : ndarray | None (default: None)

Array of proteins by batches, the prior initialization for the protein background scale (log scale)

use_size_factor_key : bool (default: False)

Use size_factor AnnDataField defined by the user as scaling factor in mean of conditional distribution. Takes priority over use_observed_lib_size.

use_observed_lib_size : bool (default: True)

Use observed library size for RNA as scaling factor in mean of conditional distribution

library_log_means : ndarray | None (default: None)

1 x n_batch array of means of the log library sizes. Parameterizes prior on library size if not using observed library size.

library_log_vars : ndarray | None (default: None)

1 x n_batch array of variances of the log library sizes. Parameterizes prior on library size if not using observed library size.
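The array-valued arguments above can be prepared from raw counts before the module is constructed. The following is a minimal sketch, not the library's own routine: plain Python lists stand in for ndarrays, the toy counts and the helper names `library_log_stats` and `build_protein_batch_mask` are hypothetical, and it assumes a population variance and that a protein counts as unobserved in a batch when all of its counts there are zero.

```python
import math
import statistics


def library_log_stats(rna_counts, batch_ids, n_batch):
    """Per-batch mean and variance of the log total RNA count per cell.

    Hypothetical helper; assumes every cell has a nonzero total count.
    """
    means, variances = [], []
    for b in range(n_batch):
        log_lib = [
            math.log(sum(cell))
            for cell, bid in zip(rna_counts, batch_ids)
            if bid == b
        ]
        means.append(statistics.fmean(log_lib))
        # Population variance is an assumption made for this sketch.
        variances.append(statistics.pvariance(log_lib))
    # Wrap in an outer list to mimic the documented 1 x n_batch shape.
    return [means], [variances]


def build_protein_batch_mask(protein_counts, batch_ids, n_batch):
    """Map each batch code to per-protein observed/unobserved flags."""
    mask = {}
    for b in range(n_batch):
        rows = [cell for cell, bid in zip(protein_counts, batch_ids) if bid == b]
        # zip(*rows) iterates over proteins (columns) within the batch.
        mask[b] = [any(v > 0 for v in column) for column in zip(*rows)]
    return mask


# Toy data: 4 cells, 2 genes, 2 proteins, 2 batches.
rna_counts = [[1, 3], [2, 6], [10, 0], [5, 15]]
protein_counts = [[4, 0], [7, 0], [1, 2], [0, 3]]
batch_ids = [0, 0, 1, 1]

library_log_means, library_log_vars = library_log_stats(rna_counts, batch_ids, 2)
protein_batch_mask = build_protein_batch_mask(protein_counts, batch_ids, 2)
```

In this toy data the second protein has only zero counts in batch 0, so it is flagged as unobserved there. With real data the results would be converted to ndarrays before being passed as `library_log_means`, `library_log_vars`, and `protein_batch_mask`.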

Attributes table#

Methods table#

generative(z, library_gene, batch_index, label)

Run the generative model.

get_reconstruction_loss(x, y, px_dict, py_dict)

Compute reconstruction loss.

get_sample_dispersion(x, y[, batch_index, ...])

Returns the tensors of dispersions for genes and proteins.

inference(x, y[, batch_index, label, ...])

Internal helper function to compute necessary inference quantities.

loss(tensors, inference_outputs, ...[, ...])

Returns the reconstruction loss and the Kullback-Leibler divergences.

marginal_ll(tensors, n_mc_samples)

sample(tensors[, n_samples])

Generate samples from the learned model.

Attributes#

T_destination#

TOTALVAE.T_destination#

alias of TypeVar(‘T_destination’, bound=Mapping[str, Tensor])

device#

TOTALVAE.device#

dump_patches#

TOTALVAE.dump_patches: bool = False#

This allows better BC support for load_state_dict(). In state_dict(), the version number will be saved as in the attribute _metadata of the returned state dict, and thus pickled. _metadata is a dictionary with keys that follow the naming convention of state dict. See _load_from_state_dict on how to use this information in loading.

If new parameters/buffers are added/removed from a module, this number shall be bumped, and the module’s _load_from_state_dict method can compare the version number and do appropriate changes if the state dict is from before the change.

training#

TOTALVAE.training: bool#

Methods#

generative#

TOTALVAE.generative(z, library_gene, batch_index, label, cont_covs=None, cat_covs=None, size_factor=None, transform_batch=None)[source]#

Run the generative model.

This function should return the parameters associated with the likelihood of the data. This is typically written as \(p(x|z)\).

This function should return a dictionary with str keys and Tensor values.

Return type:

{str: Tensor | {str: Tensor}}

get_reconstruction_loss#

TOTALVAE.get_reconstruction_loss(x, y, px_dict, py_dict, pro_batch_mask_minibatch=None)[source]#

Compute reconstruction loss.

Return type:

Tuple[Tensor, Tensor]

get_sample_dispersion#

TOTALVAE.get_sample_dispersion(x, y, batch_index=None, label=None, n_samples=1)[source]#

Returns the tensors of dispersions for genes and proteins.

Parameters:
x : Tensor

tensor of values with shape (batch_size, n_input_genes)

y : Tensor

tensor of values with shape (batch_size, n_input_proteins)

batch_index : Tensor | None (default: None)

array that indicates which batch the cells belong to with shape batch_size

label : Tensor | None (default: None)

tensor of cell-type labels with shape (batch_size, n_labels)

n_samples : int (default: 1)

number of samples

Return type:

Tuple[Tensor, Tensor]

Returns:

tensors of dispersions of the negative binomial distribution

inference#

TOTALVAE.inference(x, y, batch_index=None, label=None, n_samples=1, cont_covs=None, cat_covs=None)[source]#

Internal helper function to compute necessary inference quantities.

We use the dictionary px_ to contain the parameters of the ZINB/NB for genes. The rate refers to the mean of the NB, dropout refers to Bernoulli mixing parameters. scale refers to the quantity upon which differential expression is performed. For genes, this can be viewed as the mean of the underlying gamma distribution.

We use the dictionary py_ to contain the parameters of the Mixture NB distribution for proteins. rate_fore refers to foreground mean, while rate_back refers to background mean. scale refers to foreground mean adjusted for background probability and scaled to reside in simplex. back_alpha and back_beta are the posterior parameters for rate_back. fore_scale is the scaling factor that enforces rate_fore > rate_back.

px_["r"] and py_["r"] are the inverse dispersion parameters for genes and protein, respectively.
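To make the roles of these quantities concrete, here is a minimal sketch of a negative binomial parameterized by its mean (rate) and inverse dispersion (r), and of a two-component protein mixture built from it. It is an illustration under stated assumptions, not the module's implementation: the function names are hypothetical, the background mixing weight is given as a probability `pi_back`, and the foreground mean is assumed to be `rate_back * fore_scale` with `fore_scale > 1`.

```python
import math


def nb_log_prob(x, mu, theta):
    """Log-pmf of a negative binomial with mean mu and inverse dispersion theta."""
    return (
        math.lgamma(x + theta)
        - math.lgamma(theta)
        - math.lgamma(x + 1)
        + theta * math.log(theta / (theta + mu))
        + x * math.log(mu / (theta + mu))
    )


def protein_mixture_log_prob(y, rate_back, fore_scale, theta, pi_back):
    """Log-pmf of a background/foreground mixture of negative binomials.

    Assumption for this sketch: rate_fore = rate_back * fore_scale, so any
    fore_scale > 1 keeps the foreground mean above the background mean.
    """
    rate_fore = rate_back * fore_scale
    log_back = math.log(pi_back) + nb_log_prob(y, rate_back, theta)
    log_fore = math.log1p(-pi_back) + nb_log_prob(y, rate_fore, theta)
    # Numerically stable log-sum-exp over the two components.
    m = max(log_back, log_fore)
    return m + math.log(math.exp(log_back - m) + math.exp(log_fore - m))


# Example: log-probability of observing 5 protein counts in one cell.
log_p = protein_mixture_log_prob(5, rate_back=2.0, fore_scale=1.5, theta=4.0, pi_back=0.3)
```

With this parameterization the variance of the negative binomial is mu + mu**2 / theta, so a larger inverse dispersion brings the distribution closer to a Poisson.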

Parameters:
x : Tensor

tensor of values with shape (batch_size, n_input_genes)

y : Tensor

tensor of values with shape (batch_size, n_input_proteins)

batch_index : Tensor | None (default: None)

array that indicates which batch the cells belong to with shape batch_size

label : Tensor | None (default: None)

tensor of cell-type labels with shape (batch_size, n_labels)

n_samples

Number of samples to sample from approximate posterior

cont_covs

Continuous covariates to condition on

cat_covs

Categorical covariates to condition on

Return type:

{str: Tensor | {str: Tensor}}

loss#

TOTALVAE.loss(tensors, inference_outputs, generative_outputs, pro_recons_weight=1.0, kl_weight=1.0)[source]#

Returns the reconstruction loss and the Kullback-Leibler divergences.

Parameters:
x

tensor of values with shape (batch_size, n_input_genes)

y

tensor of values with shape (batch_size, n_input_proteins)

batch_index

array that indicates which batch the cells belong to with shape batch_size

label

tensor of cell-types labels with shape (batch_size, n_labels)

Return type:

Tuple[FloatTensor, FloatTensor, FloatTensor, FloatTensor]

Returns:

the reconstruction loss and the Kullback-Leibler divergences
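As a rough illustration of how reconstruction terms and a Kullback-Leibler term can be combined under pro_recons_weight and kl_weight, consider the sketch below. It is a simplified assumption rather than the method's actual computation: the helper names are hypothetical, only the latent KL term against a standard normal prior is shown, and the real loss may include further divergence terms and may apply the weights differently.

```python
import math


def kl_normal_standard(mean, var):
    """KL( N(mean, diag(var)) || N(0, I) ), summed over latent dimensions."""
    return 0.5 * sum(m * m + v - 1.0 - math.log(v) for m, v in zip(mean, var))


def weighted_total(recon_gene, recon_protein, kl_z, pro_recons_weight=1.0, kl_weight=1.0):
    """Hypothetical combination: weighted protein reconstruction plus weighted KL."""
    return recon_gene + pro_recons_weight * recon_protein + kl_weight * kl_z


# Toy values for a single cell with a 2-dimensional latent space.
kl_z = kl_normal_standard(mean=[1.0, 0.0], var=[1.0, 1.0])
total = weighted_total(10.0, 4.0, kl_z, pro_recons_weight=0.5, kl_weight=0.25)
```

Setting kl_weight below 1 early in training is a common warm-up strategy for variational autoencoders, since it lets the reconstruction terms dominate before the prior is fully enforced.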

marginal_ll#

TOTALVAE.marginal_ll(tensors, n_mc_samples)[source]#

sample#

TOTALVAE.sample(tensors, n_samples=1)[source]#

Generate samples from the learned model.