# MultiVI

**MultiVI** [^ref1] (Python class {class}`~scvi.model.MULTIVI`) multimodal generative model capable of
integrating multiome, scRNA-seq and scATAC-seq data. After training, it can be used for many common downstream tasks,
and also for imputation of a missing modality.

The advantages of multiVI are:

-   Comprehensive in capabilities. Able to perform DE gene, DA region analysis.
-   Scalable to very large datasets (>1 million cells).
-   Once trained with sufficient multimodal data, able to accurately input missing modalities.

The limitations of MultiVI include:

-   Effectively requires a GPU for fast inference.

```{topic} Tutorials:

-   {doc}`/tutorials/notebooks/quick_start/api_overview`
-   {doc}`/tutorials/notebooks/multimodal/MultiVI_tutorial`
```

## Preliminaries

MultiVI takes as input a multiome single-cell matrix $X_{mult}$ with $N$ cells and $G$ genes and
$M$ genomic regions (peaks), or just a scRNA-seq gene expression matrix $X_{rna}$ with $N$ cells and
$G$ genes or just a scATAC-seq accessibility matrix $X_{acc}$ with $N$ cells and $M$ genomic
regions (peaks), which can be binary or have count data.
Additionally, a design matrix $S$ containing $p$ observed covariates, such as day, donor, etc, is an optional input.
While $S$ can include both categorical covariates and continuous covariates, in the following, we assume it contains only one
categorical covariate with $K$ categories, which represents the common case of having multiple batches of data.

## Generative process

MultiVI posits that the observed UMI counts for cell $n$, gene $g$, $x_{ng}$, and regions $j$,
$y_{nj}$, are generated by the following process:

```{math}
:nowrap: true

\begin{align}
 z_n &\sim {\mathrm{Normal}}\left( {0,I} \right) \\
 \ell_n &\sim \mathrm{LogNormal}\left( \ell_\mu^\top s_n ,\ell_{\sigma^2}^\top s_n \right) \\
 \rho _n &= f_w\left( z_n, s_n \right) \\
 \pi_{ng} &= f_h^g(z_n, s_n) \\
 x_{ng} &\sim \mathrm{ObservationModel}(\ell_n \rho_n, \theta_g, \pi_{ng}) \\
 p_{nj} &= g_z^j\left(z_n, s_n\right) \\
  \ell_n &= f_\ell\left(y_n\right)\\
  y_{nj} &\sim \mathrm{Bernoulli}\left(p_{nj} \cdot \ell_n \cdot r_j \right)
 \end{align}
```

For RNA-seq, the prior parameters $\ell_\mu$ and $\ell_{\sigma^2}$ are computed per batch as the mean and
variance of the log library size over cells. The expression data are generated from a count-based likelihood
distribution, which here, we denote as the $\mathrm{ObservationModel}$. While by default the
$\mathrm{ObservationModel}$ is a $\mathrm{ZeroInflatedNegativeBinomial}$ (ZINB) distribution parameterized
by its mean, inverse dispersion, and non-zero-inflation probability, respectively, users can pass
`gene_likelihood = "negative_binomial"` to {class}`~scvi.model.MultiVI`, for example, to use a simpler
$\mathrm{NegativeBinomial}$ distribution.

For ATAC-seq, detecting a region as accessible ($y_{nj} > 0$) is
generated by a Bernoulli random variable which depends on a cell-specific latent variable $z_n$, which captures
biological heterogeneity, and two auxiliary scaling factors $\ell_n, r_j$, which account for cell-specific and
region-specific technical effects, respectively.

The generative process of MultiVI uses neural networks to produce:

```{math}
:nowrap: true

\begin{align}
   f_w(z_n, s_n) &: \mathbb{R}^{d} \times \{0, 1\}^K \to \Delta^{G-1}\\
   f_h(z_n, s_n) &: \mathbb{R}^d \times \{0, 1\}^K \to (0, 1)^T\\
   g_z(z_n, s_n) &: \mathbb{R}^{d} \times \{0, 1\}^K \to \left[0,1\right]^M
\end{align}
```

which respectively decode the denoised gene expression, non-zero-inflation probability (only if using ZINB) and
estimates the probability of accessibility.

The latent variables, along with their description are summarized in the following table:

```{eval-rst}
.. list-table::
   :widths: 20 90 15
   :header-rows: 1

   * - Latent variable
     - Description
     - Code variable (if different)
   * - :math:`z_n \in \mathbb{R}^d`
     - Low-dimensional representation capturing the state of a cell.
     - N/A
   * - :math:`\rho_n \in \Delta^{G-1}`
     - Denoised/normalized gene expression. This is a vector that sums to 1 within a cell, unless `size_factor_key is not None` in :class:`~scvi.model.MULTVI.setup_anndata`, in which case this is only forced to be non-negative via softplus.
     - ``px_scale``
   * - :math:`\ell_n \in (0, \infty)`
     - Library size for RNA.
     - N/A
   * - :math:`\theta_g \in (0, \infty)`
     - Inverse dispersion for negative binomial. This can be set to be gene/batch specific for example (and would thus be :math:`\theta_{kg}`), by passing ``dispersion="gene-batch"`` during model intialization. Note that ``px_r`` also refers to the underlying real-valued torch parameter that is then exponentiated on every forward pass of the model.
     - N/A
   * - :math:`p_r`
     - Accessibility probability estimate
     - N/A
   * - :math:`\ell_n \in \left[0,1\right]`
     - Cell-wise scaling factor. Learned, but can be set manually with `size_factor_key` in :class:`~scvi.model.MULTIVI.setup_anndata`.
     - ``d``
   * - :math:`r_j \in \left[0,1\right]`
     - Region-wise scaling factor
     - ``f``
```

## Inference

MultiVI uses variational inference and specifically auto-encoding variational bayes
(see {doc}`/user_guide/background/variational_inference`) to learn both the model parameters (the
neural network params, dispersion params, etc.) and an approximate posterior distribution with the following factorization:

```{math}
:nowrap: true

\begin{align}
   q_\eta(z_n, \ell_n \mid x_n) :=
   q_\eta(z_n \mid x_n, y_n, s_n)q_\eta(\ell_n \mid x_n).
\end{align}
```

Here $\eta$ is a set of parameters corresponding to inference neural networks (encoders), which we do not describe
in detail here, but are described in the MultiVI paper. The underlying class used as the encoder for MultiVI is
{class}`~scvi.nn.Encoder`. $z_n$ is calculated determinimistically as the average of two latent variables part of
the variational approximation $z^{acc}_n$ and $z^{rna}_n$. These two variables are Normal and for that
reason, $z_n$ is Normal. This formalism permits to handle individual modalities by bypassing the average mechanism
and considering each modality variation approximation.

## Tasks

Here we provide an overview of some of the tasks that MultiVI can perform. Please see {class}`scvi.model.MULTIVI` for the full API reference.

### Dimensionality reduction

For dimensionality reduction, the mean of the approximate posterior $q_\eta(z_n \mid x_n, s_n)$ is returned by default.
This is achieved using the method:

```
>>> latent = model.get_latent_representation()
>>> adata.obsm["X_mvi"] = latent
```

Users may also return samples from this distribution, as opposed to the mean by passing the argument `give_mean=False`.
The latent representation can be used to create a nearest neighbor graph with scanpy with:

```
>>> import scanpy as sc
>>> sc.pp.neighbors(adata, use_rep="X_mvi")
>>> adata.obsp["distances"]
```

### Normalization/denoising/imputation of expression

In {func}`~scvi.model.MULTIVI.get_normalized_expression` MultiVI returns the expected value of $\rho_n$ under the approximate posterior. For one cell $n$, this can be written as:

```{math}
:nowrap: true

\begin{align}
   \mathbb{E}_{q_\eta(z_n \mid x_n)}\left[\ell_n'f_w\left( z_n, s_n \right) \right],
\end{align}
```

where $\ell_n'$ is by default set to 1. See the `library_size` parameter for more details. The expectation is approximated using Monte Carlo, and the number of samples can be passed as an argument in the code:

```
>>> model.get_normalized_expression(n_samples=10)
```

By default the mean over these samples is returned, but users may pass `return_mean=False` to retrieve all the samples.

Notably, this function also has the `transform_batch` parameter that allows counterfactual prediction of expression in an unobserved batch. See the {doc}`/user_guide/background/counterfactual_prediction` guide.

It is worth noting that when accessibility data is passed, MultiVI computes imputation of missing gene expression data.
When gene expression data is passed, MultiVI computes denoised gene expression data.

### Denoising/imputation of accessibility

In {func}`~scvi.model.MULTIVI.get_accessibility_estimates` MultiVI returns the expected value of $y_i$ under the
approximate posterior. For one cell $i$, this can be written as:

```{math}
:nowrap: true

\begin{align}
   \mathbb{E}_{q_\eta(z_i \mid x_i)}\left[g_z\left( z_i, s_i \right) \right],
\end{align}
```

The expectation is approximated by returning the mean of the variation approximation $z_n$ and
then, using this value to decode the accessibility probability estimate $p_r$. Alternatively, to approximate
the variational mean, a number of samples can be passed as an argument in the code:

```
>>> model.get_accessibility_estimates(n_samples_overall=10)
```

This value is used to compute the mean of the latent variable over these samples. Notably, this function also has
the `transform_batch` parameter that allows counterfactual prediction of accessibility in an unobserved batch.
See the {doc}`/user_guide/background/counterfactual_prediction` guide.

It is worth noting that when gene expression data is passed, MultiVI computes imputation of missing accessibility data.
When accessibility data is passed, MultiVI computes denoised chromatin accessibility.

### Differential expression

Differential expression analysis is achieved with {func}`~scvi.model.MULTIVI.differential_expression`. MultiVI tests differences in magnitude of $f_w\left( z_n, s_n \right)$. More info is in {doc}`/user_guide/background/differential_expression`.

### Differential accessibility

Differential accessibility analysis is achieved with {func}`~scvi.model.MULTIVI.differential_accessibility`. MultiVI tests differences in accessibility of $g_z\left( z_n, s_n \right)$.

[^ref1]:
    Tal Ashuach\*, Mariano I. Gabitto\*, Michael I. Jordan, Nir Yosef (2021),
    _MultiVI: deep generative model for the integration of multi-modal data_,
    `Biorxiv`.