MultiVI 1 (Python class MULTIVI) multimodal generative model capable of integrating multiome, scRNA-seq and scATAC-seq data. After training, it can be used for many common downstream tasks, and also for imputation of a missing modality.

The advantages of multiVI are:

  • Comprehensive in capabilities. Able to perform DE gene, DA region analysis.

  • Scalable to very large datasets (>1 million cells).

  • Once trained with sufficient multimodal data, able to accurately input missing modalities.

The limitations of MultiVI include:

  • Effectively requires a GPU for fast inference.


  • /tutorials/notebooks/api_overview

  • /tutorials/notebooks/MultiVI_tutorial


MultiVI takes as input a multiome single-cell matrix \(X_{mult}\) with \(N\) cells and \(G\) genes and \(M\) genomic regions (peaks), or just a scRNA-seq gene expression matrix \(X_{rna}\) with \(N\) cells and \(G\) genes or just a scATAC-seq accessibility matrix \(X_{acc}\) with \(N\) cells and \(M\) genomic regions (peaks), which can be binary or have count data. Additionally, a design matrix \(S\) containing \(p\) observed covariates, such as day, donor, etc, is an optional input. While \(S\) can include both categorical covariates and continuous covariates, in the following, we assume it contains only one categorical covariate with \(K\) categories, which represents the common case of having multiple batches of data.

Generative process#

MultiVI posits that the observed UMI counts for cell \(n\), gene \(g\), \(x_{ng}\), and regions \(j\), \(y_{nj}\), are generated by the following process:

\begin{align} z_n &\sim {\mathrm{Normal}}\left( {0,I} \right) \\ \ell_n &\sim \mathrm{LogNormal}\left( \ell_\mu^\top s_n ,\ell_{\sigma^2}^\top s_n \right) \\ \rho _n &= f_w\left( z_n, s_n \right) \\ \pi_{ng} &= f_h^g(z_n, s_n) \\ x_{ng} &\sim \mathrm{ObservationModel}(\ell_n \rho_n, \theta_g, \pi_{ng}) \\ p_{nj} &= g_z^j\left(z_n, s_n\right) \\ \ell_n &= f_\ell\left(y_n\right)\\ y_{nj} &\sim \mathrm{Bernoulli}\left(p_{nj} \cdot \ell_n \cdot r_j \right) \end{align}

For RNA-seq, the prior parameters \(\ell_\mu\) and \(\ell_{\sigma^2}\) are computed per batch as the mean and variance of the log library size over cells. The expression data are generated from a count-based likelihood distribution, which here, we denote as the \(\mathrm{ObservationModel}\). While by default the \(\mathrm{ObservationModel}\) is a \(\mathrm{ZeroInflatedNegativeBinomial}\) (ZINB) distribution parameterized by its mean, inverse dispersion, and non-zero-inflation probability, respectively, users can pass gene_likelihood = "negative_binomial" to MultiVI, for example, to use a simpler \(\mathrm{NegativeBinomial}\) distribution.

For ATAC-seq, detecting a region as accessible (\(y_{nj} > 0\)) is generated by a Bernoulli random variable which depends on a cell-specific latent variable \(z_n\), which captures biological heterogeneity, and two auxiliary scaling factors \(\ell_n, r_j\), which account for cell-specific and region-specific technical effects, respectively.

The generative process of MultiVI uses neural networks to produce:

\begin{align} f_w(z_n, s_n) &: \mathbb{R}^{d} \times \{0, 1\}^K \to \Delta^{G-1}\\ f_h(z_n, s_n) &: \mathbb{R}^d \times \{0, 1\}^K \to (0, 1)^T\\ g_z(z_n, s_n) &: \mathbb{R}^{d} \times \{0, 1\}^K \to \left[0,1\right]^M \end{align}

which respectively decode the denoised gene expression, non-zero-inflation probability (only if using ZINB) and estimates the probability of accessibility.

The latent variables, along with their description are summarized in the following table:

Latent variable


Code variable (if different)

\(z_n \in \mathbb{R}^d\)

Low-dimensional representation capturing the state of a cell.


\(\rho_n \in \Delta^{G-1}\)

Denoised/normalized gene expression. This is a vector that sums to 1 within a cell, unless size_factor_key is not None in setup_anndata, in which case this is only forced to be non-negative via softplus.


\(\ell_n \in (0, \infty)\)

Library size for RNA.


\(\theta_g \in (0, \infty)\)

Inverse dispersion for negative binomial. This can be set to be gene/batch specific for example (and would thus be \(\theta_{kg}\)), by passing dispersion="gene-batch" during model intialization. Note that px_r also refers to the underlying real-valued torch parameter that is then exponentiated on every forward pass of the model.



Accessibility probability estimate


\(\ell_n \in \left[0,1\right]\)

Cell-wise scaling factor. Learned, but can be set manually with size_factor_key in setup_anndata.


\(r_j \in \left[0,1\right]\)

Region-wise scaling factor



MultiVI uses variational inference and specifically auto-encoding variational bayes (see Variational Inference) to learn both the model parameters (the neural network params, dispersion params, etc.) and an approximate posterior distribution with the following factorization:

\begin{align} q_\eta(z_n, \ell_n \mid x_n) := q_\eta(z_n \mid x_n, y_n, s_n)q_\eta(\ell_n \mid x_n). \end{align}

Here \(\eta\) is a set of parameters corresponding to inference neural networks (encoders), which we do not describe in detail here, but are described in the MultiVI paper. The underlying class used as the encoder for MultiVI is Encoder. \(z_n\) is calculated determinimistically as the average of two latent variables part of the variational approximation \(z^{acc}_n\) and \(z^{rna}_n\). These two variables are Normal and for that reason, \(z_n\) is Normal. This formalism permits to handle individual modalities by bypassing the average mechanism and considering each modality variation approximation.


Here we provide an overview of some of the tasks that MultiVI can perform. Please see scvi.model.MULTIVI for the full API reference.

Dimensionality reduction#

For dimensionality reduction, the mean of the approximate posterior \(q_\eta(z_n \mid x_n, s_n)\) is returned by default. This is achieved using the method:

>>> latent = model.get_latent_representation()
>>> adata.obsm["X_mvi"] = latent

Users may also return samples from this distribution, as opposed to the mean by passing the argument give_mean=False. The latent representation can be used to create a nearest neighbor graph with scanpy with:

>>> import scanpy as sc
>>> sc.pp.neighbors(adata, use_rep="X_mvi")
>>> adata.obsp["distances"]

Normalization/denoising/imputation of expression#

In get_normalized_expression() MultiVI returns the expected value of \(\rho_n\) under the approximate posterior. For one cell \(n\), this can be written as:

\begin{align} \mathbb{E}_{q_\eta(z_n \mid x_n)}\left[\ell_n'f_w\left( z_n, s_n \right) \right], \end{align}

where \(\ell_n'\) is by default set to 1. See the library_size parameter for more details. The expectation is approximated using Monte Carlo, and the number of samples can be passed as an argument in the code:

>>> model.get_normalized_expression(n_samples=10)

By default the mean over these samples is returned, but users may pass return_mean=False to retrieve all the samples.

Notably, this function also has the transform_batch parameter that allows counterfactual prediction of expression in an unobserved batch. See the Counterfactual prediction guide.

It is worth noting that when accessibility data is passed, MultiVI computes imputation of missing gene expression data. When gene expression data is passed, MultiVI computes denoised gene expression data.

Denoising/imputation of accessibility#

In get_accessibility_estimates() MultiVI returns the expected value of \(y_i\) under the approximate posterior. For one cell \(i\), this can be written as:

\begin{align} \mathbb{E}_{q_\eta(z_i \mid x_i)}\left[g_z\left( z_i, s_i \right) \right], \end{align}

The expectation is approximated by returning the mean of the variation approximation \(z_n\) and then, using this value to decode the accessibility probability estimate \(p_r\). Alternatively, to approximate the variational mean, a number of samples can be passed as an argument in the code:

>>> model.get_accessibility_estimates(n_samples_overall=10)

This value is used to compute the mean of the latent variable over these samples. Notably, this function also has the transform_batch parameter that allows counterfactual prediction of accessibility in an unobserved batch. See the Counterfactual prediction guide.

It is worth noting that when gene expression data is passed, MultiVI computes imputation of missing accessibility data. When accessibility data is passed, MultiVI computes denoised chromatin accessibility.

Differential expression#

Differential expression analysis is achieved with differential_expression(). MultiVI tests differences in magnitude of \(f_w\left( z_n, s_n \right)\). More info is in Differential Expression.

Differential accessibility#

Differential accessibility analysis is achieved with differential_accessibility(). MultiVI tests differences in accessibility of \(g_z\left( z_n, s_n \right)\).


Tal Ashuach*, Mariano I. Gabitto*, Michael I. Jordan, Nir Yosef (2021), MultiVI: deep generative model for the integration of multi-modal data, Biorxiv.