DestVI [1] (Deconvolution of Spatial Transcriptomics profiles using Variational Inference) posits a conditional generative model of spatial transcriptomics down to the sub-cell-type variation level, which can be used to explore the spatial organization of a tissue and to understand gene expression variation between tissues and conditions.

The advantages of DestVI are:

  • Can stratify cells into discrete cell types and model continuous sub-cell-type variation.

  • Scalable to very large datasets (>1 million cells).

The limitations of DestVI include:

  • Effectively requires a GPU for fast inference.


DestVI requires training two models, the scLVM (single-cell latent variable model) and the stLVM (spatial transcriptomic latent variable model). The scLVM takes as input a scRNA-seq gene expression matrix of UMI counts \(X\) with \(N\) cells and \(G\) genes, along with a vector of cell type labels \(\vec{c}\). Subsequently, the stLVM takes the trained scLVM, along with a spatial gene expression matrix \(Y\) with \(S\) spots and \(G\) genes. Optionally, the user can specify the number of components used for the mixture model underlying the empirical prior.

Generative process#


For cell \(n\), the scLVM assumes observed discrete cell type labels \(c_n\) and models continuous covariates \(\gamma_n\) of dimension \(d\) to explain variation in gene expression within a cell type. The scLVM posits that the observed UMI counts for cell \(n\) are generated by the following process:

\begin{align} \gamma_n &\sim \textrm{Normal}(0, I) \tag{1} \\ x_{ng} &\sim \textrm{NegativeBinomial}(l_nf^g(c_n, \gamma_n), p_g) \tag{2} \\ \end{align}

where \(l_n\) is the library size, \(f\) is a two-layer neural network which outputs a \(G\) dimensional vector, and \(p_g\) is the rate parameter of the negative binomial distribution for a given gene \(g\).


We are using the standard rate-shape parametrization of the negative binomial here, rather than the mean-dispersion parametrization used in scVI. This is to take advantage of the additive property of negative binomial distributions sharing the same shape parameter. In this case, the rate parameter for the negative binomial modeling the expression counts for a given gene and spot is equivalent to the sum of the rate parameters for each contributing cell.
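The additive property mentioned above can be checked numerically. The following is a minimal sketch, not part of DestVI: it writes the distribution as NB(r, p), where r plays the role of the first argument in equation (2) (the one that adds up across cells) and p is the parameter shared by all contributions. The helper names `nb_pmf` and `convolve` and the numeric values are illustrative assumptions.

```python
import math


def nb_pmf(k, r, p):
    """PMF of NB(r, p) at count k; r > 0 may be non-integer, 0 < p < 1."""
    log_coef = math.lgamma(k + r) - math.lgamma(k + 1) - math.lgamma(r)
    return math.exp(log_coef + r * math.log(p) + k * math.log1p(-p))


def convolve(pmf_a, pmf_b):
    """PMF of the sum of two independent count variables on 0..len(pmf_a)-1."""
    n = len(pmf_a)
    return [sum(pmf_a[j] * pmf_b[k - j] for j in range(k + 1)) for k in range(n)]


# Illustrative values: two contributions sharing the same p.
r1, r2, p = 1.5, 2.25, 0.3
support = range(40)

pmf_sum = convolve(
    [nb_pmf(k, r1, p) for k in support],
    [nb_pmf(k, r2, p) for k in support],
)
pmf_direct = [nb_pmf(k, r1 + r2, p) for k in support]

# The two PMFs agree up to floating-point error.
max_abs_diff = max(abs(a - b) for a, b in zip(pmf_sum, pmf_direct))
print(max_abs_diff)
```

The convolution of the two distributions matches a single negative binomial whose first parameter is the sum of the individual ones, which is what allows a spot to be modeled as a sum over contributing cells.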

This generative process is also summarized in the following graphical model:

scLVM graphical model.#

The latent variables for the scLVM, along with their descriptions, are summarized in the following table:

| Latent variable | Description | Code variable (if different) |
|---|---|---|
| \(\gamma_n \in \mathbb{R}^d\) | Low-dimensional representation of sub-cell-type covariates. | |
| \(p_g \in (0, \infty)\) | Rate parameter for the negative binomial distribution. | |



For the stLVM, we also model the expression counts with a \(\mathrm{NegativeBinomial}\). However, for spatial data, we assume that each spot \(s\) has expression \(x_s\) composed of a bulk of cell types, with cell type abundance, \(\beta_{sc}\), for each cell type \(c\). We assume that for a given spot \(s\) and gene \(g\), the observation is generated as a function of the latent variables \((c, \gamma_s^c)\) by the following process:

\begin{align} \gamma_s^c &\sim \sum_{k=1}^K m_{kc} q_\Phi(\gamma^c \mid u_{kc}, c) \tag{4} \\ x_{sg} &\sim \mathrm{NegativeBinomial}(l_s\alpha_g\sum_{c=1}^{C}\beta_{sc}f^g(c, \gamma_s^c), p_g) \tag{5} \\ \end{align}

where \(l_s\) is the library size and \(\alpha_g\) is a correction term for differences between experimental assays. As in the scLVM, \(f\) is a decoder neural network, and \(p_g\) is the rate parameter for the negative binomial distribution.

To prevent the latent variable \(\gamma_s^c\) from incorporating variation attributed to experimental assay differences, we assign an empirical prior informed by the scLVM and the corresponding cells of the same cell type in the scRNA-seq dataset. To compute this prior, we subcluster the latent space of the scLVM for each cell type into \(K\) cell-type-specific clusters. For each cluster, we compute an empirical mean and variance. Above, \(\{u_{kc}\}_{k=1}^K\) designates the set of cell-type-specific subclusters from cell type \(c\) in the scRNA-seq dataset, and \(q_\Phi\) designates the empirical normal distribution given by the computed cluster mean and variance. The loss is weighted by the probability that a random cell from this cell type belongs to the respective cluster in the scRNA-seq dataset (mixture probability, \(m_{kc}\)). In the literature, this prior is referred to as a VampPrior (“variational aggregated mixture of posteriors” prior) [2]. More details on this prior can be found in the DestVI paper.
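The construction of the empirical prior for one cell type can be sketched as follows. This is a simplified illustration under stated assumptions, not the scvi-tools implementation: the cluster assignments are taken as given (the clustering algorithm is not specified here), the population variance is used, and the function name `empirical_mixture_prior` and the toy latent vectors are hypothetical.

```python
from collections import defaultdict
from statistics import fmean, pvariance


def empirical_mixture_prior(latents, clusters):
    """Summarize each subcluster of one cell type.

    latents:  list of d-dimensional latent vectors (one per cell).
    clusters: list of subcluster ids (one per cell).
    Returns {cluster_id: (mixture_weight, mean_vector, variance_vector)}.
    """
    groups = defaultdict(list)
    for z, k in zip(latents, clusters):
        groups[k].append(z)

    n_cells = len(latents)
    prior = {}
    for k, members in groups.items():
        per_dim = list(zip(*members))  # transpose: one tuple per latent dimension
        mean = [fmean(col) for col in per_dim]
        var = [pvariance(col) for col in per_dim]
        # Mixture weight m_kc: fraction of this cell type's cells in cluster k.
        prior[k] = (len(members) / n_cells, mean, var)
    return prior


# Toy latent space with d = 2 and K = 2 subclusters (made-up numbers).
latents = [[0.0, 1.0], [2.0, 3.0], [10.0, 10.0], [12.0, 14.0], [14.0, 12.0]]
clusters = [0, 0, 1, 1, 1]
prior = empirical_mixture_prior(latents, clusters)
for k, (weight, mean, var) in sorted(prior.items()):
    print(k, weight, mean, var)
```

Each entry corresponds to one mixture component: the weight plays the role of \(m_{kc}\), and the mean and variance parametrize the normal distribution \(q_\Phi\) for that subcluster.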

Lastly, an additional latent variable, \(\eta_g\), is incorporated into the aggregated cell expression profile as a dummy cell type to represent gene-specific noise. The dummy cell type's expression profile is defined as \(\epsilon_g := \mathrm{Softplus}(\eta_g)\), where \(\eta_g \sim \mathrm{Normal}(0, 1)\). Like the other cell types, the dummy cell type has an associated abundance parameter \(\beta_{sc}\). We expect each spot to contain only a subset of the cell types. To increase the sparsity of the cell type proportions, the stLVM supports L1 regularization on the cell type proportions \(\beta_{sc}\). By default, this loss is not used.
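To make the aggregation concrete, the first argument of the negative binomial in equation (5) can be computed for a single spot and gene as in the sketch below. It assumes the dummy cell type simply enters the sum as one extra term with its own abundance; the decoder outputs \(f^g(c, \gamma_s^c)\) are passed in as plain numbers, and the function name and all values are hypothetical.

```python
import math


def softplus(x):
    """Softplus(x) = log(1 + exp(x)), always positive."""
    return math.log1p(math.exp(x))


def spot_nb_first_argument(library_size, alpha_g, abundances, profiles_g,
                           dummy_abundance, eta_g):
    """l_s * alpha_g * sum_c beta_sc * f^g(c, gamma_s^c), with a dummy cell type.

    abundances: beta_sc for the real cell types in this spot.
    profiles_g: decoder outputs f^g(c, gamma_s^c) for the same cell types.
    The dummy cell type contributes dummy_abundance * Softplus(eta_g).
    """
    mixture = sum(b * f for b, f in zip(abundances, profiles_g))
    mixture += dummy_abundance * softplus(eta_g)
    return library_size * alpha_g * mixture


# Two real cell types plus the dummy cell type (made-up numbers).
value = spot_nb_first_argument(
    library_size=1000.0,
    alpha_g=2.0,
    abundances=[0.5, 1.5],
    profiles_g=[0.02, 0.01],
    dummy_abundance=0.1,
    eta_g=0.0,
)
print(value)
```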

This generative process is also summarized in the following graphical model:

stLVM graphical model.#

The latent variables for the stLVM, along with their descriptions, are summarized in the following table:

| Latent variable | Description | Code variable (if different) |
|---|---|---|
| \(\beta_{sc} \in (0, \infty)\) | Spot-specific cell type abundance. | |
| \(\gamma_s^c \in (-\infty, \infty)\) | Low-dimensional representation of sub-cell-type covariates for a given spot and cell type. | |
| \(\eta_g \in (0, \infty)\) | Gene-specific noise. | |
| \(\alpha_g \in (0, \infty)\) | Correction term for technological differences. | |
| \(p_g \in (0,\infty)\) | Rate parameter for the negative binomial distribution. | |




DestVI uses variational inference, specifically auto-encoding variational Bayes (see Variational Inference), to learn both the model parameters (the neural network parameters, rate parameters, etc.) and an approximate posterior distribution for the scLVM. Like scvi.model.SCVI, the underlying class used as the encoder for DestVI is Encoder.


For the stLVM, DestVI infers point estimates for the latent variables \(\gamma^c, \alpha, \beta\) using a penalized likelihood method. Beyond vanilla MAP inference, a variance penalty across all genes is applied to regularize \(\alpha\). Additionally, rather than having just \(C\) parameters per spot to denote the estimated cell type abundances, the stLVM also has \(dC\) parameters per spot to account for the latent space learned by the scLVM.

The loss is defined as:

\begin{align} L(l, \alpha, \beta, f^g, \gamma, p, \eta) := &-\log p(X \mid l, \alpha, \beta, f^g, \gamma, p, \eta) - \lambda_{\eta} \log p(\eta) \\ &+ \lambda_{\alpha} \mathrm{Var}(\alpha) - \log p(\gamma \mid \mathrm{VampPrior}) + \lambda_{\beta} \lVert \beta_{sc} \rVert_1 \tag{6} \\ \end{align}

where \(\mathrm{Var}(\alpha)\) refers to the empirical variance of the parameters \(\alpha\) across all genes. We used this as a practical form of regularization (a similar regularizer is used in the ZINB-WaVE model [3]).
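The two explicit regularizers in equation (6), \(\lambda_{\alpha} \mathrm{Var}(\alpha)\) and \(\lambda_{\beta} \lVert \beta_{sc} \rVert_1\), can be illustrated with a short sketch. It assumes the population variance for \(\mathrm{Var}(\alpha)\) and an L1 norm summed over all spots and cell types; neither detail is confirmed by the text above, and the function name and inputs are hypothetical.

```python
from statistics import pvariance


def regularization_penalty(alpha, beta, lambda_alpha, lambda_beta):
    """Variance penalty on alpha plus L1 penalty on beta.

    alpha: list of per-gene correction terms alpha_g.
    beta:  list of rows, one per spot, holding abundances beta_sc.
    """
    variance_term = lambda_alpha * pvariance(alpha)
    l1_term = lambda_beta * sum(abs(b) for row in beta for b in row)
    return variance_term + l1_term


# Made-up values: 3 genes, 2 spots, 2 cell types.
alpha = [1.0, 2.0, 3.0]
beta = [[0.5, 0.0], [1.0, 2.0]]
penalty = regularization_penalty(alpha, beta, lambda_alpha=3.0, lambda_beta=2.0)
print(penalty)
```

A larger weight on the L1 term penalizes every nonzero abundance more strongly, while a larger weight on the variance term pushes the per-gene correction terms toward a common value.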

\(\lambda_{\beta}\) (l1_reg in code), \(\lambda_{\eta}\) (eta_reg in code) and \(\lambda_{\alpha}\) (beta_reg in code) are hyperparameters used to scale the loss terms. Increasing \(\lambda_{\beta}\) leads to increased sparsity of cell type proportions. Increasing \(\lambda_{\alpha}\) leaves the model less flexibility to account for technical variation between the single-cell and spatial sequencing datasets. Increasing \(\lambda_{\eta}\) leads to more genes being explained by the dummy cell type (we recommend not changing the default value). To avoid overfitting, DestVI amortizes inference using a neural network to parametrize the latent variables. Via the amortization parameter of scvi.module.MRDeconv, the user can specify which of \(\beta\) and \(\gamma^c\) will be parametrized by the neural network.


Cell type deconvolution#

Once the model is trained, one can retrieve the estimated cell type proportions in each spot using the method:

>>> proportions = st_model.get_proportions()
>>> st_adata.obsm["proportions"] = proportions

These proportions are computed by normalizing across all learned cell type abundances, \(\beta_{sc}\), for a given spot \(s\). That is, the estimated proportion of cell type \(c\) for spot \(s\) is \(\frac{\beta_{sc}}{\sum_{c'} \beta_{sc'}}\).
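The normalization itself is a per-spot division by the row sum. The sketch below shows only that arithmetic on a made-up abundance matrix; it is not the body of `get_proportions`, and the function name is hypothetical.

```python
def proportions_from_abundances(beta):
    """Normalize each spot's abundances so that they sum to one.

    beta: list of rows, one per spot, with positive abundances beta_sc.
    """
    return [[b / sum(row) for b in row] for row in beta]


# Two spots, two cell types (made-up abundances).
beta = [[1.0, 3.0], [2.0, 2.0]]
proportions = proportions_from_abundances(beta)
print(proportions)
```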

Subsequently for a given cell type, users can plot a heatmap of the cell type proportions spatially using scanpy with:

>>> import scanpy as sc
>>> st_adata.obs['B cells'] = st_adata.obsm['proportions']['B cells']
>>> sc.pl.spatial(st_adata, color="B cells", spot_size=130)

Intra cell type variation#

Users can retrieve the values of \(\gamma\), the latent variables corresponding to the modeled cell-type-specific continuous covariates with:

>>> gamma = st_model.get_gamma()["B cells"]
>>> st_adata.obsm["B_cells_gamma"] = gamma

Cell-type-specific gene expression imputation#

Assuming the user has identified key gene modules that vary within a cell type of interest, they can impute the spatial pattern of the cell-type-specific gene expression with:

>>> import numpy as np
>>> ct_name = "Monocyte"
>>> # Filter spots with low abundance.
>>> indices = np.where(st_adata.obsm["proportions"][ct_name].values > 0.03)[0]
>>> imputed_counts = st_model.get_scale_for_ct(ct_name, indices=indices)[["Cxcl9", "Cxcl10", "Fcgr1"]]

Comparative analysis between samples#

To perform differential expression across samples, one can apply a frequentist test by taking samples from the parameters of the generative distribution predicted for each spot in question.

Utilities function#

To explore the output of the stLVM, we published a utilities package that provides functions for automatic thresholding of cell type proportions, a spatial PCA analysis to find the main axes of variation in spatial gene expression, and the frequentist test for differential expression described above. Further information can be found in destvi_utils.