scvi.model.base.VAEMixin

class scvi.model.base.VAEMixin

Universal variational auto-encoder (VAE) methods.

Methods table

differential_abundance([adata, sample_key, ...])

Compute the differential abundance between samples.

get_aggregated_posterior([adata, indices, ...])

Compute the aggregated posterior over the u latent representations.

get_elbo([adata, indices, batch_size, ...])

Compute the evidence lower bound (ELBO) on the data.

get_latent_representation([adata, indices, ...])

Compute the latent representation of the data.

get_marginal_ll([adata, indices, ...])

Compute the marginal log-likelihood of the data.

get_reconstruction_error([adata, indices, ...])

Compute the reconstruction error on the data.

Methods

VAEMixin.differential_abundance(adata=None, sample_key=None, batch_size=128, num_cells_posterior=None, dof=None)

Compute the differential abundance between samples.

Computes the log probabilities of each sample conditioned on the estimated aggregate posterior distribution of each cell.

Parameters:
  • adata (AnnData | MuData | None (default: None)) – The data object to compute the differential abundance for. For very large datasets, this should be a subset of the original data object.

  • sample_key (str | None (default: None)) – Key for the sample covariate.

  • batch_size (int (default: 128)) – Minibatch size for computing the differential abundance.

  • num_cells_posterior (int | None (default: None)) – Maximum number of cells used to compute aggregated posterior for each sample.

  • dof (float | None (default: None)) – Degrees of freedom for the Student’s t-distribution components for aggregated posterior. If None, components are Normal.

VAEMixin.get_aggregated_posterior(adata=None, indices=None, batch_size=None, dof=3.0)

Compute the aggregated posterior over the u latent representations.

Parameters:
  • adata (default: None) – AnnData object to use. Defaults to the AnnData object used to initialize the model.

  • indices (default: None) – Indices of cells to use.

  • batch_size (default: None) – Batch size to use for computing the latent representation.

  • dof (default: 3.0) – Degrees of freedom for the Student’s t-distribution components. If None, components are Normal.

Returns:

A mixture distribution of the aggregated posterior.
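To make the idea of a mixture over per-cell posteriors concrete, the sketch below evaluates the log-density of a one-dimensional point under an equal-weight mixture with one component per cell. It is a pure-Python illustration and not scvi-tools code: the function names, the equal weights, and the one-dimensional latent are assumptions made for the example, and the actual method returns a distribution object rather than numbers.

```python
import math


def component_log_prob(u, loc, scale, dof=None):
    """Log-density of one mixture component at u.

    dof=None gives a Normal component; otherwise a Student's t with
    `dof` degrees of freedom, location `loc`, and scale `scale`.
    """
    z = (u - loc) / scale
    if dof is None:
        return -0.5 * math.log(2.0 * math.pi) - math.log(scale) - 0.5 * z * z
    return (
        math.lgamma((dof + 1.0) / 2.0)
        - math.lgamma(dof / 2.0)
        - 0.5 * math.log(dof * math.pi)
        - math.log(scale)
        - (dof + 1.0) / 2.0 * math.log1p(z * z / dof)
    )


def aggregated_log_prob(u, locs, scales, dof=None):
    """Log-density of u under an equal-weight mixture of per-cell components."""
    logs = [component_log_prob(u, m, s, dof) for m, s in zip(locs, scales)]
    top = max(logs)
    # log-sum-exp for numerical stability, then subtract log(K) for equal weights
    return top + math.log(sum(math.exp(v - top) for v in logs)) - math.log(len(logs))


# Hypothetical per-cell posterior means and scales (illustrative values only).
locs = [-1.0, 0.0, 2.0]
scales = [0.5, 0.5, 0.5]

near_normal = aggregated_log_prob(0.1, locs, scales)
near_student = aggregated_log_prob(0.1, locs, scales, dof=3.0)
far_normal = aggregated_log_prob(10.0, locs, scales)
far_student = aggregated_log_prob(10.0, locs, scales, dof=3.0)
```

In this toy setting the Student's t components assign far more density to points that lie a long way from every cell, which is the practical difference between passing a finite dof and passing None.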

VAEMixin.get_elbo(adata=None, indices=None, batch_size=None, dataloader=None, return_mean=True, data_loader_kwargs=None, **kwargs)

Compute the evidence lower bound (ELBO) on the data.

The ELBO is the negative of the sum of the reconstruction error and the Kullback-Leibler (KL) divergences between the variational distributions and the priors; equivalently, it is the expected log-likelihood minus the KL terms. It is different from the marginal log-likelihood; specifically, it is a lower bound on the marginal log-likelihood plus a term that is constant with respect to the variational distribution. It still gives good insight into how well the model fits the data and is fast to compute.

Parameters:
  • adata (AnnData | None (default: None)) – AnnData object with var_names in the same order as the ones used to train the model. If None and dataloader is also None, it defaults to the object used to initialize the model.

  • indices (Sequence[int] | None (default: None)) – Indices of observations in adata to use. If None, defaults to all observations. Ignored if dataloader is not None.

  • batch_size (int | None (default: None)) – Minibatch size for the forward pass. If None, defaults to scvi.settings.batch_size. Ignored if dataloader is not None.

  • dataloader (Iterator[dict[str, Tensor | None]] | None (default: None)) – An iterator over minibatches of data on which to compute the metric. The minibatches should be formatted as a dictionary of Tensor with keys as expected by the model. If None, a dataloader is created from adata.

  • return_mean (bool (default: True)) – Whether to return the mean of the ELBO or the ELBO for each observation.

  • data_loader_kwargs (dict | None (default: None)) – Keyword args for data loader, in dict form.

  • **kwargs – Additional keyword arguments to pass into the forward method of the module.

Return type:

float

Returns:

Evidence lower bound (ELBO) of the data.

Notes

This is not the negative ELBO, so higher is better.
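The sign convention and the lower-bound property can be checked by hand on a toy model where everything has a closed form. The sketch below is not scvi-tools code: it assumes a hypothetical one-dimensional model with prior z ~ Normal(0, 1), likelihood x | z ~ Normal(z, 1), and a Normal variational distribution q(z) = Normal(m, s^2), and all function names are illustrative.

```python
import math


def expected_log_likelihood(x, m, s):
    """E_q[log p(x | z)] for p(x | z) = Normal(z, 1) and q(z) = Normal(m, s^2)."""
    return -0.5 * math.log(2.0 * math.pi) - 0.5 * ((x - m) ** 2 + s ** 2)


def kl_to_standard_normal(m, s):
    """KL(Normal(m, s^2) || Normal(0, 1))."""
    return 0.5 * (s ** 2 + m ** 2 - 1.0 - math.log(s ** 2))


def elbo(x, m, s):
    """ELBO = -(reconstruction error + KL), so higher is better."""
    reconstruction_error = -expected_log_likelihood(x, m, s)
    return -(reconstruction_error + kl_to_standard_normal(m, s))


def log_marginal(x):
    """Exact log p(x): in this toy model, marginally x ~ Normal(0, 2)."""
    return -0.5 * math.log(4.0 * math.pi) - x ** 2 / 4.0


x = 1.0
tight = elbo(x, m=x / 2.0, s=math.sqrt(0.5))  # q equals the exact posterior
loose = elbo(x, m=0.0, s=1.0)                 # q equals the prior
exact = log_marginal(x)
```

When q matches the exact posterior the bound is tight and the ELBO equals the marginal log-likelihood; any other q gives a strictly smaller value.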

VAEMixin.get_latent_representation(adata=None, indices=None, give_mean=True, mc_samples=5000, batch_size=None, return_dist=False, dataloader=None, **data_loader_kwargs)

Compute the latent representation of the data.

This is typically denoted as \(z_n\).

Parameters:
  • adata (AnnData | None (default: None)) – AnnData object with var_names in the same order as the ones used to train the model. If None and dataloader is also None, it defaults to the object used to initialize the model.

  • indices (Sequence[int] | None (default: None)) – Indices of observations in adata to use. If None, defaults to all observations. Ignored if dataloader is not None.

  • give_mean (bool (default: True)) – If True, returns the mean of the latent distribution. If False, returns an estimate of the mean using mc_samples Monte Carlo samples.

  • mc_samples (int (default: 5000)) – Number of Monte Carlo samples to use for the estimator for distributions with no closed-form mean (e.g., the logistic normal distribution). Not used if give_mean is True or if return_dist is True.

  • batch_size (int | None (default: None)) – Minibatch size for the forward pass. If None, defaults to scvi.settings.batch_size. Ignored if dataloader is not None.

  • return_dist (bool (default: False)) – If True, returns the mean and variance of the latent distribution. Otherwise, returns the mean of the latent distribution.

  • dataloader (Iterator[dict[str, Tensor | None]] (default: None)) – An iterator over minibatches of data on which to compute the metric. The minibatches should be formatted as a dictionary of Tensor with keys as expected by the model. If None, a dataloader is created from adata.

  • **data_loader_kwargs – Keyword args for data loader.

Return type:

ndarray | tuple[ndarray, ndarray]

Returns:

An array of shape (n_obs, n_latent) if return_dist is False. Otherwise, returns a tuple of arrays (n_obs, n_latent) with the mean and variance of the latent distribution.
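The role of mc_samples is easiest to see for a logistic normal latent, whose mean has no closed form. The following pure-Python sketch is an illustration under assumptions, not the library's implementation: it treats a logistic normal variable as the softmax of a diagonal Normal draw, and the function names and parameter values are hypothetical.

```python
import math
import random


def softmax(values):
    """Numerically stable softmax of a list of floats."""
    top = max(values)
    exps = [math.exp(v - top) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def mc_logistic_normal_mean(loc, scale, mc_samples=5000, seed=0):
    """Monte Carlo estimate of E[softmax(z)] for z ~ Normal(loc, diag(scale^2))."""
    rng = random.Random(seed)
    acc = [0.0] * len(loc)
    for _ in range(mc_samples):
        z = [rng.gauss(m, s) for m, s in zip(loc, scale)]
        for i, p in enumerate(softmax(z)):
            acc[i] += p
    return [a / mc_samples for a in acc]


# Hypothetical variational parameters for a single observation.
loc = [0.0, 1.0, -1.0]
scale = [1.0, 1.0, 1.0]

estimate = mc_logistic_normal_mean(loc, scale)
plug_in = softmax(loc)  # softmax of the mean, generally not the mean of the softmax
```

Comparing estimate with plug_in shows why sampling is needed here: pushing the Normal mean through the softmax does not, in general, give the mean of the transformed variable.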

VAEMixin.get_marginal_ll(adata=None, indices=None, n_mc_samples=1000, batch_size=None, return_mean=True, dataloader=None, data_loader_kwargs=None, **kwargs)

Compute the marginal log-likelihood of the data.

The computation here is a biased estimator of the marginal log-likelihood of the data.

Parameters:
  • adata (AnnData | None (default: None)) – AnnData object with var_names in the same order as the ones used to train the model. If None and dataloader is also None, it defaults to the object used to initialize the model.

  • indices (Sequence[int] | None (default: None)) – Indices of observations in adata to use. If None, defaults to all observations. Ignored if dataloader is not None.

  • n_mc_samples (int (default: 1000)) – Number of Monte Carlo samples to use for the estimator. Passed into the module’s marginal_ll method.

  • batch_size (int | None (default: None)) – Minibatch size for the forward pass. If None, defaults to scvi.settings.batch_size. Ignored if dataloader is not None.

  • return_mean (bool (default: True)) – Whether to return the mean of the marginal log-likelihood or the marginal log-likelihood for each observation.

  • dataloader (Iterator[dict[str, Tensor | None]] (default: None)) – An iterator over minibatches of data on which to compute the metric. The minibatches should be formatted as a dictionary of Tensor with keys as expected by the model. If None, a dataloader is created from adata.

  • data_loader_kwargs (dict | None (default: None)) – Keyword args for data loader, in dict form.

  • **kwargs – Additional keyword arguments to pass into the module’s marginal_ll method.

Return type:

float | Tensor

Returns:

If return_mean is True, returns the mean marginal log-likelihood. Otherwise, returns a tensor of shape (n_obs,) with the marginal log-likelihood for each observation.

Notes

This is not the negative log-likelihood, so higher is better.
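A common way to build such an estimator is importance sampling: draw latent samples from a proposal, weight them, and take the log of the average weight. The sketch below illustrates that idea on a hypothetical toy model (z ~ Normal(0, 1), x | z ~ Normal(z, 1)) where the exact answer is known. It is an assumption-laden illustration, not the module's marginal_ll method, and every name in it is made up for the example.

```python
import math
import random


def log_normal_pdf(value, loc, scale):
    """Log-density of Normal(loc, scale^2) at value."""
    z = (value - loc) / scale
    return -0.5 * math.log(2.0 * math.pi) - math.log(scale) - 0.5 * z * z


def estimate_marginal_ll(x, q_loc, q_scale, n_mc_samples=1000, seed=0):
    """Importance-sampling estimate of log p(x) with proposal q(z) = Normal(q_loc, q_scale^2)."""
    rng = random.Random(seed)
    log_weights = []
    for _ in range(n_mc_samples):
        z = rng.gauss(q_loc, q_scale)
        log_weights.append(
            log_normal_pdf(x, z, 1.0)            # log p(x | z)
            + log_normal_pdf(z, 0.0, 1.0)        # log p(z)
            - log_normal_pdf(z, q_loc, q_scale)  # log q(z)
        )
    top = max(log_weights)
    # log of the average weight, computed with log-sum-exp for stability
    return top + math.log(sum(math.exp(w - top) for w in log_weights)) - math.log(n_mc_samples)


x = 1.0
exact = log_normal_pdf(x, 0.0, math.sqrt(2.0))  # marginally x ~ Normal(0, 2)
with_posterior = estimate_marginal_ll(x, q_loc=x / 2.0, q_scale=math.sqrt(0.5))
with_prior = estimate_marginal_ll(x, q_loc=0.0, q_scale=1.0, n_mc_samples=5000)
```

Because the logarithm is applied to an average, the estimate is biased downward in expectation for any finite number of samples, and the bias shrinks as n_mc_samples grows. With the exact posterior as the proposal every weight equals p(x), so the estimate is exact regardless of sample count.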

VAEMixin.get_reconstruction_error(adata=None, indices=None, batch_size=None, dataloader=None, return_mean=True, data_loader_kwargs=None, **kwargs)

Compute the reconstruction error on the data.

The reconstruction error is the negative log-likelihood of the data given the latent variables. It is different from the marginal log-likelihood, but still gives good insight into how well the model fits the data and is fast to compute. It is typically written as \(-\log p(x \mid z)\), the negative log of the likelihood term given one posterior sample.

Parameters:
  • adata (AnnData | None (default: None)) – AnnData object with var_names in the same order as the ones used to train the model. If None and dataloader is also None, it defaults to the object used to initialize the model.

  • indices (Sequence[int] | None (default: None)) – Indices of observations in adata to use. If None, defaults to all observations. Ignored if dataloader is not None.

  • batch_size (int | None (default: None)) – Minibatch size for the forward pass. If None, defaults to scvi.settings.batch_size. Ignored if dataloader is not None.

  • dataloader (Iterator[dict[str, Tensor | None]] | None (default: None)) – An iterator over minibatches of data on which to compute the metric. The minibatches should be formatted as a dictionary of Tensor with keys as expected by the model. If None, a dataloader is created from adata.

  • return_mean (bool (default: True)) – Whether to return the mean reconstruction loss or the reconstruction loss for each observation.

  • data_loader_kwargs (dict | None (default: None)) – Keyword args for data loader, in dict form.

  • **kwargs – Additional keyword arguments to pass into the forward method of the module.

Return type:

dict[str, float]

Returns:

Reconstruction error for the data.

Notes

This is not the negative reconstruction error, so higher is better.