DestVI [1] (Deconvolution of Spatial Transcriptomics profiles using Variational Inference) posits a conditional generative model of spatial transcriptomics data down to the level of sub-cell-type variation. The model can be used to explore the spatial organization of a tissue and to understand gene expression variation between tissues and conditions.

The advantages of DestVI are:

  • Can stratify cells into discrete cell types and model continuous sub-cell-type variation.

  • Scalable to very large datasets (>1 million cells).

The limitations of DestVI include:

  • Effectively requires a GPU for fast inference.


DestVI requires training two models, the scLVM (single-cell latent variable model) and the stLVM (spatial transcriptomic latent variable model). The scLVM takes as input a scRNA-seq gene expression matrix of UMI counts \(X\) with \(N\) cells and \(G\) genes, along with a vector of cell type labels \(\vec{c}\). Subsequently, the stLVM takes the trained scLVM, along with a spatial gene expression matrix \(Y\) with \(S\) spots and \(G\) genes. Optionally, the user can specify the number of components used for the mixture model underlying the empirical prior.

Generative process


For cell \(n\), the scLVM assumes observed discrete cell type labels \(c_n\) and models continuous covariates \(\gamma_n\) of dimension \(d\) to explain variation in gene expression within a cell type. The scLVM posits that the observed UMI counts for cell \(n\) are generated by the following process:

\begin{align} \gamma_n &\sim \textrm{Normal}(0, I) \tag{1} \\ x_{ng} &\sim \textrm{NegativeBinomial}(l_nf^g(c_n, \gamma_n), p_g) \tag{2} \\ \end{align}

where \(l_n\) is the library size, \(f\) is a two-layer neural network which outputs a \(G\) dimensional vector, and \(p_g\) is the rate parameter of the negative binomial distribution for a given gene \(g\).


We are using the standard rate-shape parametrization of the negative binomial here, rather than the mean-dispersion parametrization used in scVI. This is to take advantage of the additive property of negative binomial distributions sharing the same shape parameter. In this case, the rate parameter for the negative binomial modeling the expression counts for a given gene and spot is equivalent to the sum of the rate parameters for each contributing cell.
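The additive property mentioned in the note above can be checked numerically. The following is a minimal sketch, not DestVI code. It assumes the convention \(P(X = k) = \binom{k + r - 1}{k}(1 - p)^r p^k\) with \(p \in (0, 1)\), where \(r\) plays the role of the first argument (the one that is summed across contributing cells) and \(p\) the role of the parameter shared across them. The function names `nb_pmf` and `pmf_of_sum` are illustrative only.

```python
import math


def nb_pmf(k, r, p):
    """P(X = k) for a negative binomial with first parameter r > 0 and shared parameter p in (0, 1)."""
    log_coef = math.lgamma(k + r) - math.lgamma(k + 1) - math.lgamma(r)
    return math.exp(log_coef + r * math.log1p(-p) + k * math.log(p))


def pmf_of_sum(k, r1, r2, p):
    """pmf of X1 + X2 at k, obtained by directly convolving the two pmfs."""
    return sum(nb_pmf(j, r1, p) * nb_pmf(k - j, r2, p) for j in range(k + 1))


# Two independent contributions that share p but have different first parameters.
r1, r2, p = 1.5, 2.25, 0.3

# Compare the convolution against a single negative binomial with r1 + r2.
max_gap = max(
    abs(pmf_of_sum(k, r1, r2, p) - nb_pmf(k, r1 + r2, p)) for k in range(30)
)
print(f"largest pmf discrepancy: {max_gap:.2e}")
```

The discrepancy is at the level of floating-point rounding, which is why the contributions of the cells in a spot can be summed inside a single negative binomial.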

This generative process is also summarized in the following graphical model:

scLVM graphical model.

The latent variables for the scLVM, along with their descriptions, are summarized in the following table:

| Latent variable | Description | Code variable (if different) |
|---|---|---|
| \(\gamma_n \in \mathbb{R}^d\) | Low-dimensional representation of sub-cell-type covariates. | |
| \(p_g \in (0, \infty)\) | Rate parameter for the negative binomial distribution. | |


For the stLVM, we also model the expression counts with a \(\mathrm{NegativeBinomial}\) distribution. However, for spatial data, we assume that the expression \(x_s\) of each spot \(s\) is a bulk mixture of cell types, with abundance \(\beta_{sc}\) for each cell type \(c\). We assume that for a given spot \(s\) and gene \(g\), the observation is generated as a function of the latent variables \((c, \gamma_s^c)\) by the following process:

\begin{align} \gamma_s^c &\sim \frac{1}{K} \sum_{k=1}^K q_\Phi(\gamma^c \mid u_{kc}, c) \tag{4} \\ x_{sg} &\sim \mathrm{NegativeBinomial}(l_s\alpha_g\sum_{c=1}^{C}\beta_{sc}f^g(c, \gamma_s^c), p_g) \tag{5} \end{align}

where \(l_s\) is the library size and \(\alpha_g\) is a correction term for differences between experimental assays. As in the scLVM, \(f\) is a decoder neural network and \(p_g\) is the rate parameter of the negative binomial distribution.

To prevent the latent variable \(\gamma_s^c\) from absorbing variation caused by experimental assay differences, we assign it an empirical prior informed by the scLVM and a corresponding set of cells of the same cell type in the scRNA-seq dataset. Above, \(\{u_{kc}\}_{k=1}^K\) designates a set of cells from cell type \(c\) in the scRNA-seq dataset, and \(q_\Phi\) designates the variational distribution from the scLVM. In the literature, this prior is referred to as a VampPrior (“variational aggregated mixture of posteriors” prior) [2]. More can be read on this prior in the DestVI paper.
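As a rough illustration of the empirical prior, the sketch below evaluates the log density of an equally weighted mixture of \(K\) posteriors at a candidate \(\gamma\). It assumes each \(q_\Phi(\gamma^c \mid u_{kc}, c)\) is a diagonal Gaussian whose mean and variance come from the scLVM encoder; the numbers are made up and the function names are hypothetical, not part of scvi-tools.

```python
import math


def gaussian_logpdf(x, mean, var):
    """Log density of a diagonal Gaussian evaluated at the vector x."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def vamp_prior_logpdf(gamma, component_means, component_vars):
    """log[(1/K) * sum_k N(gamma; mean_k, var_k)], computed with log-sum-exp for stability."""
    logs = [
        gaussian_logpdf(gamma, m, v)
        for m, v in zip(component_means, component_vars)
    ]
    top = max(logs)
    return top + math.log(sum(math.exp(l - top) for l in logs)) - math.log(len(logs))


# Made-up encoder outputs for K = 3 reference cells of one cell type, with d = 2.
means = [[0.0, 0.0], [1.0, -1.0], [0.5, 0.5]]
variances = [[1.0, 1.0], [0.5, 0.5], [2.0, 1.0]]

log_prior = vamp_prior_logpdf([0.2, -0.1], means, variances)
print(log_prior)
```

Because the mixture components are posteriors of real cells, values of \(\gamma_s^c\) far from anything seen in the scRNA-seq data receive a low prior density.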

Lastly, an additional latent variable, \(\eta_g\), is incorporated into the aggregated cell expression profile as a dummy cell type to represent gene-specific noise. The dummy cell type’s expression profile is defined as \(\epsilon_g := \mathrm{Softplus}(\eta_g)\), where \(\eta_g \sim \mathrm{Normal}(0, 1)\). Like the other cell types, the dummy cell type has its own abundance parameter \(\beta_{sc}\).
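To make the aggregation concrete, here is a small sketch of how the first negative binomial argument for one spot could be assembled from the pieces above. It is an illustration under stated assumptions rather than the scvi-tools implementation: `profiles[c][g]` stands in for the decoder output \(f^g(c, \gamma_s^c)\), the dummy cell type is assumed to enter the same sum as the real cell types, and all names and numbers are hypothetical.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def spot_nb_first_param(l_s, alpha, beta_s, profiles, beta_dummy, eta):
    """Return l_s * alpha_g * (sum_c beta_sc * f^g + beta_dummy * softplus(eta_g)) per gene."""
    out = []
    for g in range(len(alpha)):
        # Weighted sum of the cell-type-specific expression profiles.
        mix = sum(b * prof[g] for b, prof in zip(beta_s, profiles))
        # Dummy cell type that absorbs gene-specific noise (assumed placement).
        mix += beta_dummy * softplus(eta[g])
        out.append(l_s * alpha[g] * mix)
    return out


# Toy spot with C = 2 cell types and G = 3 genes.
profiles = [[0.1, 0.2, 0.7], [0.5, 0.3, 0.2]]  # stand-in for decoder outputs
params = spot_nb_first_param(
    l_s=10.0,
    alpha=[1.0, 1.0, 1.0],
    beta_s=[2.0, 1.0],
    profiles=profiles,
    beta_dummy=0.0,
    eta=[0.0, 0.0, 0.0],
)
print(params)
```

With the dummy abundance set to zero and \(\alpha_g = 1\), each entry reduces to the library size times the abundance-weighted mixture of cell type profiles.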

This generative process is also summarized in the following graphical model:

stLVM graphical model.

The latent variables for the stLVM, along with their descriptions, are summarized in the following table:

| Latent variable | Description | Code variable (if different) |
|---|---|---|
| \(\beta_{sc} \in (0, \infty)\) | Spot-specific cell type abundance. | |
| \(\gamma_s^c \in \mathbb{R}^d\) | Low-dimensional representation of sub-cell-type covariates for a given spot and cell type. | |
| \(\eta_g \in (-\infty, \infty)\) | Gene-specific noise. | |
| \(\alpha_g \in (0, \infty)\) | Correction term for technological differences. | |
| \(p_g \in (0, \infty)\) | Rate parameter for the negative binomial distribution. | |




DestVI uses variational inference, specifically auto-encoding variational Bayes (see Variational Inference), to learn both the model parameters (the neural network parameters, rate parameters, etc.) and an approximate posterior distribution for the scLVM. Like scvi.model.SCVI, the underlying class used as the encoder for DestVI is Encoder.


For the stLVM, DestVI infers point estimates for the latent variables \(\gamma^c, \alpha, \beta\) using a penalized likelihood method. Beyond vanilla MAP inference, a variance penalty across all genes is applied to regularize \(\alpha\). Additionally, rather than having just \(C\) parameters per spot to denote the estimated cell type abundances, the stLVM has a further \(dC\) parameters per spot to account for the latent space learned by the scLVM.

The loss is defined as:

\begin{align} L(l, \alpha, \beta, f^g, \gamma, p, \eta) := &-\log p(X \mid l, \alpha, \beta, f^g, \gamma, p, \eta) - \log p(\eta) \\ &+ \mathrm{Var}(\alpha) - \log p(\gamma \mid \mathrm{VampPrior}) \tag{6} \\ \end{align}
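The structure of this loss can be sketched in a few lines. The example below only shows how the four terms combine; it assumes the negative log-likelihood and the VampPrior log density have already been computed elsewhere, uses the population variance for \(\mathrm{Var}(\alpha)\), and applies no weighting between terms. These choices, and the function names, are assumptions for illustration and may differ from the actual implementation.

```python
import math


def variance(values):
    """Population variance, standing in for the penalty Var(alpha) across genes."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def standard_normal_logpdf(values):
    """Sum of independent Normal(0, 1) log densities, i.e. log p(eta)."""
    return sum(-0.5 * (math.log(2 * math.pi) + v * v) for v in values)


def penalized_loss(neg_log_lik, alpha, eta, log_p_gamma):
    """-log p(X | ...) - log p(eta) + Var(alpha) - log p(gamma | VampPrior)."""
    return neg_log_lik - standard_normal_logpdf(eta) + variance(alpha) - log_p_gamma


# Toy values: two genes, identical alpha (zero penalty), eta at the prior mode.
loss = penalized_loss(neg_log_lik=10.0, alpha=[1.0, 1.0], eta=[0.0, 0.0], log_p_gamma=-2.0)
print(loss)
```

The variance term is zero when every gene shares the same correction factor and grows as the \(\alpha_g\) spread apart, which discourages the model from explaining biological signal through gene-specific assay corrections.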

To avoid overfitting, DestVI amortizes inference using a neural network to parametrize the latent variables. Via the amortization parameter of scvi.module.MRDeconv, the user can specify which of \(\beta\) and \(\gamma^c\) will be parametrized by the neural network.


Cell type deconvolution

Once the model is trained, one can retrieve the estimated cell type proportions in each spot using the method:

>>> proportions = st_model.get_proportions()
>>> st_adata.obsm["proportions"] = proportions

These proportions are computed by normalizing across all learned cell type abundances, \(\beta_{sc}\), for a given spot \(s\). That is, the estimated proportion of cell type \(c\) for spot \(s\) is \(\frac{\beta_{sc}}{\sum_{c'} \beta_{sc'}}\).
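The normalization itself is simple arithmetic. A minimal sketch for a single spot, using made-up abundances and a hypothetical helper name, looks like this:

```python
def abundances_to_proportions(beta_s):
    """Divide each cell type abundance by the spot total so the values sum to 1."""
    total = sum(beta_s.values())
    return {cell_type: b / total for cell_type, b in beta_s.items()}


# Made-up abundances for one spot.
beta_s = {"B cells": 0.5, "Monocyte": 1.5, "T cells": 2.0}
proportions = abundances_to_proportions(beta_s)
print(proportions)
# → {'B cells': 0.125, 'Monocyte': 0.375, 'T cells': 0.5}
```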

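Subsequently, for a given cell type, users can plot a spatial heatmap of the cell type proportions using scanpy with:

>>> import scanpy as sc
>>> # sc.pl.embedding looks up `color` in .obs columns or gene names, so copy the proportions to .obs first.
>>> st_adata.obs["B cells"] = st_adata.obsm["proportions"]["B cells"]
>>> sc.pl.embedding(st_adata, basis="location", color="B cells")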

Intra cell type variation

Users can retrieve the values of \(\gamma\), the latent variables corresponding to the modeled cell-type-specific continuous covariates with:

>>> gamma = st_model.get_gamma()["B cells"]
>>> st_adata.obsm["B_cells_gamma"] = gamma

Cell-type-specific gene expression imputation

Assuming the user has identified key gene modules that vary within a cell type of interest, they can impute the spatial pattern of the cell-type-specific gene expression with:

>>> import numpy as np
>>> ct_name = "Monocyte"
>>> # Keep only the spots where this cell type is sufficiently abundant.
>>> indices = np.where(st_adata.obsm["proportions"][ct_name].values > 0.03)[0]
>>> imputed_counts = st_model.get_scale_for_ct(ct_name, indices=indices)[["Cxcl9", "Cxcl10", "Fcgr1"]]

Comparative analysis between samples

To perform differential expression across samples, one can apply a frequentist test by taking samples from the parameters of the generative distribution predicted for each spot in question. More details can be found in the DestVI paper.



[1] Romain Lopez, Baoguo Li, Hadas Keren-Shaul, Pierre Boyeau, Merav Kedmi, David Pilzer, Adam Jelinski, Eyal David, Allon Wagner, Yoseph Addad, Michael I. Jordan, Ido Amit, Nir Yosef (2021), Multi-resolution deconvolution of spatial transcriptomics data reveals continuous patterns of inflammation, bioRxiv.

[2] Jakub Tomczak, Max Welling (2018), VAE with a VampPrior, International Conference on Artificial Intelligence and Statistics.