DestVI
DestVI [1] (Deconvolution of Spatial Transcriptomics profiles using Variational Inference) posits a conditional generative model of spatial transcriptomics down to the level of sub-cell-type variation. The model can be used to explore the spatial organization of a tissue and to understand gene expression variation between tissues and conditions.
The advantages of DestVI are:
Can stratify cells into discrete cell types and model continuous sub-cell-type variation.
Scalable to very large datasets (>1 million cells).
The limitations of DestVI include:
Effectively requires a GPU for fast inference.
Preliminaries
DestVI requires training two models: the scLVM (single-cell latent variable model) and the stLVM (spatial transcriptomics latent variable model). The scLVM takes as input an scRNA-seq gene expression matrix of UMI counts \(X\) with \(N\) cells and \(G\) genes, along with a vector of cell type labels \(\vec{c}\). Subsequently, the stLVM takes as input the trained scLVM, along with a spatial gene expression matrix \(Y\) with \(S\) spots and \(G\) genes. Optionally, the user can specify the number of components used for the mixture model underlying the empirical prior.
Generative process
scLVM
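For cell \(n\), the scLVM assumes observed discrete cell type labels \(c_n\) and models continuous covariates \(\gamma_n\) of dimension \(d\) to explain variation in gene expression within a cell type. The scLVM posits that the observed UMI counts for cell \(n\) are generated by the following process:
\[
\begin{aligned}
\gamma_n &\sim \mathrm{Normal}(0, I), \\
x_{ng} &\sim \mathrm{NegativeBinomial}\left(l_n f^g(c_n, \gamma_n), p_g\right),
\end{aligned}
\]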
where \(l_n\) is the library size, \(f\) is a two-layer neural network that outputs a \(G\)-dimensional vector, and \(p_g\) is the rate parameter of the negative binomial distribution for gene \(g\).
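The NumPy sketch below simulates counts from a toy scLVM of the assumed form \(\gamma_n \sim \mathrm{Normal}(0, I)\), \(x_{ng} \sim \mathrm{NegativeBinomial}(l_n f^g(c_n, \gamma_n), p_g)\). This makes the generative process concrete.

Several parts of it are assumptions:

- `simulate_sclvm` is a hypothetical helper.
- The decoder weights, library sizes and \(p_g\) values are random placeholders.
- The negative binomial is sampled as a Gamma-Poisson mixture, with Gamma shape \(l_n f^g(c_n, \gamma_n)\) and Gamma rate \(p_g\). This is our reading of the rate-shape parametrization, not a detail confirmed by the scvi-tools code.

```python
import numpy as np


def simulate_sclvm(labels, n_cell_types, n_genes, d=5, hidden=32, seed=0):
    """Sample UMI counts from a toy version of the scLVM generative process.

    Decoder weights, library sizes and p_g are random placeholders (hypothetical),
    not parameters of a trained DestVI model.
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    n_cells = labels.shape[0]

    # gamma_n ~ Normal(0, I): continuous sub-cell-type covariates.
    gamma = rng.standard_normal((n_cells, d))

    # Two-layer decoder f(c_n, gamma_n) -> positive G-dimensional vector.
    one_hot = np.eye(n_cell_types)[labels]
    inputs = np.concatenate([gamma, one_hot], axis=1)
    w1 = rng.normal(scale=0.3, size=(inputs.shape[1], hidden))
    w2 = rng.normal(scale=0.3, size=(hidden, n_genes))
    h = np.maximum(inputs @ w1, 0.0)             # ReLU hidden layer
    f = np.logaddexp(0.0, h @ w2)                # softplus keeps outputs positive

    library = rng.uniform(1.0, 3.0, size=(n_cells, 1))   # l_n (toy values)
    p = rng.uniform(0.5, 2.0, size=n_genes)              # p_g, shared across cells

    # x_ng ~ NB(l_n f^g(c_n, gamma_n), p_g), drawn as a Gamma-Poisson mixture
    # with Gamma shape l_n f^g and Gamma rate p_g (i.e. scale 1 / p_g).
    lam = rng.gamma(shape=library * f, scale=1.0 / p)
    counts = rng.poisson(lam)
    return gamma, counts


gamma, counts = simulate_sclvm([0, 1, 1, 2], n_cell_types=3, n_genes=10)
print(counts)
```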
Note
We use the standard rate-shape parametrization of the negative binomial here, rather than the mean-dispersion parametrization used in scVI. This takes advantage of the additive property of negative binomial distributions. A sum of independent negative binomial variables that share the same rate parameter \(p_g\) is again negative binomial, and its shape parameter is the sum of the individual shape parameters. Hence, for a given gene and spot, the shape parameter of the negative binomial that models the expression counts is the sum of the shape parameters of the contributing cells.
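The additive property can be checked numerically. The sketch assumes the Gamma-Poisson reading of the negative binomial, with Gamma shape \(r\) and Gamma rate \(p\):

\[
P(k) = \frac{\Gamma(k + r)}{k!\,\Gamma(r)} \left(\frac{p}{1 + p}\right)^{r} \left(\frac{1}{1 + p}\right)^{k}
\]

This parametrization is an assumption, not taken from the scvi-tools code. The sketch convolves the PMFs of two negative binomials that share \(p\). It then compares the result with the PMF whose shape is the sum of the two shapes.

```python
import math


def nb_pmf(k, shape, rate):
    """NB pmf as a Gamma(shape, rate)-Poisson mixture (assumed parametrization)."""
    log_coef = math.lgamma(k + shape) - math.lgamma(k + 1) - math.lgamma(shape)
    log_q = math.log(rate) - math.log1p(rate)     # log(rate / (1 + rate))
    return math.exp(log_coef + shape * log_q - k * math.log1p(rate))


def convolve_pmfs(pmf_a, pmf_b):
    """PMF of the sum of two independent count variables, truncated to len(pmf_a).

    Entry k only needs pmf values up to index k, so the truncated result is exact.
    """
    n = len(pmf_a)
    return [sum(pmf_a[j] * pmf_b[k - j] for j in range(k + 1)) for k in range(n)]


# Two "cells" with different shapes but the same rate (playing the role of p_g).
shape_1, shape_2, rate, k_max = 2.5, 4.0, 0.4, 80
pmf_1 = [nb_pmf(k, shape_1, rate) for k in range(k_max)]
pmf_2 = [nb_pmf(k, shape_2, rate) for k in range(k_max)]

summed = convolve_pmfs(pmf_1, pmf_2)
direct = [nb_pmf(k, shape_1 + shape_2, rate) for k in range(k_max)]
max_error = max(abs(a - b) for a, b in zip(summed, direct))
print(f"max |difference| = {max_error:.2e}")
```

If the two variables had different rates, the convolution would no longer match any single negative binomial. Sharing \(p_g\) per gene is what lets the spot-level distribution stay in the same family.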
This generative process is also summarized in the following graphical model:
The latent variables for the scLVM, along with their descriptions, are summarized in the following table:

| Latent variable | Description | Code variable (if different) |
| --- | --- | --- |
| \(\gamma_n \in \mathbb{R}^d\) | Low-dimensional representation of sub-cell-type covariates. | |
| \(p_g \in (0, \infty)\) | Rate parameter for the negative binomial distribution. | |
stLVM
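For the stLVM, we also model the expression counts with a \(\mathrm{NegativeBinomial}\). However, for spatial data, we assume that the expression \(x_s\) of each spot \(s\) aggregates several cell types, with abundance \(\beta_{sc}\) for each cell type \(c\). We assume that, for a given spot \(s\) and gene \(g\), the observation is generated from the cell-type-specific latent variables \(\gamma_s^c\) by the following process:
\[
\begin{aligned}
\gamma_s^c &\sim \frac{1}{K} \sum_{k=1}^{K} q_\Phi(\gamma \mid u_{kc}, c), \\
x_{sg} &\sim \mathrm{NegativeBinomial}\left(l_s \alpha_g \sum_{c=1}^{C} \beta_{sc} f^g(c, \gamma_s^c), p_g\right),
\end{aligned}
\]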
where \(l_s\) is the library size and \(\alpha_g\) is a correction term for differences between experimental assays. As in the scLVM, \(f\) is a decoder neural network and \(p_g\) is the rate parameter of the negative binomial distribution.
We want to prevent the latent variable \(\gamma_s^c\) from absorbing variation caused by differences between experimental assays. To that end, we assign it an empirical prior informed by the scLVM and by a set of cells of the same cell type in the scRNA-seq dataset. Above, \(\{u_{kc}\}_{k=1}^K\) designates a set of cells from cell type \(c\) in the scRNA-seq dataset, and \(q_\Phi\) designates the variational distribution from the scLVM. In the literature, this prior is referred to as a VampPrior ("variational mixture of posteriors" prior) [2]. More details on this prior can be found in the DestVI paper [1].
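The VampPrior is an equal-weight mixture of the scLVM posteriors of the \(K\) selected cells. The sketch below evaluates its log-density, assuming each component \(q_\Phi(\gamma \mid u_{kc}, c)\) is a diagonal Gaussian with mean \(\mu_k\) and variance \(\sigma_k^2\), which is the usual VAE encoder output. The arrays `mu` and `var` are hypothetical stand-ins for those encoder outputs.

```python
import numpy as np


def vamp_prior_log_density(gamma, mu, var):
    """log p(gamma) for the mixture (1/K) * sum_k Normal(gamma; mu_k, diag(var_k)).

    gamma:   (d,) point at which to evaluate the prior
    mu, var: (K, d) means and variances of the K posterior components
    """
    gamma = np.asarray(gamma, dtype=float)
    mu = np.asarray(mu, dtype=float)
    var = np.asarray(var, dtype=float)
    # Log-density of each diagonal Gaussian component, summed over dimensions.
    comp = -0.5 * np.sum(np.log(2 * np.pi * var) + (gamma - mu) ** 2 / var, axis=1)
    # log((1/K) * sum_k exp(comp_k)), computed stably with the log-sum-exp trick.
    m = comp.max()
    return m + np.log(np.mean(np.exp(comp - m)))


# Two components; the second one is far away, so it contributes almost nothing at 0.
print(vamp_prior_log_density([0.0, 0.0], [[0.0, 0.0], [50.0, 50.0]], np.ones((2, 2))))
```

Working in log space with log-sum-exp avoids underflow when \(\gamma\) lies far from most of the \(K\) components, which is common with many components.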
Lastly, an additional latent variable \(\eta_g\) is incorporated into the aggregated cell expression profile as a dummy cell type that represents gene-specific noise. The expression profile of this dummy cell type is \(\epsilon_g := \mathrm{Softplus}(\eta_g)\), where \(\eta_g \sim \mathrm{Normal}(0, 1)\). Like the other cell types, the dummy cell type has its own abundance parameter \(\beta_{sc}\) in each spot.
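The sketch below puts the stLVM pieces together. It computes the shape parameter of the negative binomial for spot \(s\) and gene \(g\) as \(l_s \alpha_g \left(\sum_{c} \beta_{sc} f^g(c, \gamma_s^c) + \beta_{s,\text{noise}}\, \epsilon_g\right)\).

It rests on the following assumptions:

- `spot_nb_parameter` is a hypothetical helper.
- The decoder outputs are precomputed into an array `f`.
- The noise profile enters exactly like another cell type, including the \(l_s \alpha_g\) scaling.

The real implementation may combine these terms differently.

```python
import numpy as np


def softplus(x):
    """Numerically stable softplus, log(1 + exp(x))."""
    return np.logaddexp(0.0, x)


def spot_nb_parameter(library, alpha, beta, f, eta):
    """Shape parameter of the NB for every spot and gene (toy stLVM sketch).

    library: (S,) library sizes l_s
    alpha:   (G,) assay-correction terms alpha_g
    beta:    (S, C + 1) abundances; the last column is the noise dummy cell type
    f:       (S, C, G) decoder outputs f^g(c, gamma_s^c) for each spot and cell type
    eta:     (G,) noise latent variable; its profile is eps_g = softplus(eta_g)
    """
    S, C, G = f.shape
    eps = softplus(eta)                                   # (G,)
    # Append the noise profile as an extra "cell type" shared by all spots (assumption).
    profiles = np.concatenate([f, np.broadcast_to(eps, (S, 1, G))], axis=1)
    mixed = np.einsum("sc,scg->sg", beta, profiles)       # sum_c beta_sc * profile_scg
    return library[:, None] * alpha[None, :] * mixed


S, C, G = 3, 2, 4
out = spot_nb_parameter(np.ones(S), np.ones(G), np.ones((S, C + 1)), np.ones((S, C, G)), np.zeros(G))
print(out)
```

Because the aggregation is a weighted sum of per-cell-type shape parameters, the additive property described in the note above keeps the spot-level counts negative binomial.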
This generative process is also summarized in the following graphical model:
The latent variables for the stLVM, along with their descriptions, are summarized in the following table:

| Latent variable | Description | Code variable (if different) |
| --- | --- | --- |
| \(\beta_{sc} \in (0, \infty)\) | Spot-specific cell type abundance. | |
| \(\gamma_s^c \in \mathbb{R}^d\) | Low-dimensional representation of sub-cell-type covariates for a given spot and cell type. | |
| \(\eta_g \in (-\infty, \infty)\) | Gene-specific noise. | |
| \(\alpha_g \in (0, \infty)\) | Correction term for technological differences. | |
| \(p_g \in (0, \infty)\) | Rate parameter for the negative binomial distribution. | |
Inference
scLVM
DestVI uses variational inference, specifically auto-encoding variational Bayes (see Variational Inference), to learn both the model parameters (the neural network parameters, rate parameters, etc.) and an approximate posterior distribution for the scLVM. As with scvi.model.SCVI, the underlying class used as the encoder for DestVI is Encoder.
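Auto-encoding variational Bayes trains the scLVM by maximizing the evidence lower bound (ELBO). Assume a diagonal Gaussian posterior \(q(\gamma_n) = \mathrm{Normal}(\mu_n, \mathrm{diag}(\sigma_n^2))\) and the standard VAE prior \(\gamma_n \sim \mathrm{Normal}(0, I)\). Then the KL part of the ELBO has a closed form, and the reconstruction part is estimated from a reparameterized sample.

The NumPy sketch below shows both pieces. The encoder outputs and the `log_likelihood` callable are hypothetical placeholders, not scvi-tools objects.

```python
import numpy as np


def gaussian_kl_to_standard_normal(mu, var):
    """KL( Normal(mu, diag(var)) || Normal(0, I) ), summed over latent dimensions."""
    mu = np.asarray(mu, dtype=float)
    var = np.asarray(var, dtype=float)
    return 0.5 * np.sum(var + mu ** 2 - 1.0 - np.log(var), axis=-1)


def elbo_estimate(mu, var, log_likelihood, rng):
    """Single-sample ELBO estimate using the reparameterization trick.

    mu, var:        (n, d) encoder means and variances (hypothetical placeholders)
    log_likelihood: callable mapping latent samples gamma (n, d) to log p(x | gamma), shape (n,)
    """
    eps = rng.standard_normal(np.shape(mu))
    gamma = mu + np.sqrt(var) * eps          # differentiable sample from q
    return log_likelihood(gamma) - gaussian_kl_to_standard_normal(mu, var)


print(gaussian_kl_to_standard_normal([1.0, 0.0], [1.0, 1.0]))
```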
stLVM
For the stLVM, DestVI infers point estimates for the latent variables \(\gamma^c, \alpha, \beta\) using a penalized likelihood method. Beyond vanilla MAP inference, a variance penalty across all genes is applied to regularize \(\alpha\). Additionally, each spot has more than the \(C\) parameters that denote its estimated cell type abundances. The stLVM also has \(dC\) parameters per spot: the \(d\)-dimensional coordinates \(\gamma_s^c\) of each cell type in the latent space learned by the scLVM.
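The loss is defined as:
\[
\mathcal{L}(\alpha, \beta, \gamma, \eta) = -\sum_{s=1}^{S} \log p(x_s \mid \alpha, \beta_s, \gamma_s, \eta) - \sum_{s=1}^{S} \sum_{c=1}^{C} \log p(\gamma_s^c) - \sum_{g=1}^{G} \log p(\eta_g) + \lambda_\alpha \operatorname{Var}_g(\alpha_g),
\]
where \(p(\gamma_s^c)\) is the empirical prior, \(p(\eta_g)\) is the standard normal prior on the noise variable, and \(\lambda_\alpha\) weights the variance penalty on \(\alpha\).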
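Below is a minimal NumPy sketch of a penalized objective of this kind: the negative log-likelihood, minus the log-priors on \(\gamma\) and \(\eta\), plus a variance penalty on \(\alpha\).

It rests on the following assumptions, none of which is the scvi-tools implementation:

- The function names and argument layout are illustrative.
- The penalty weight `lambda_alpha` is an illustrative parameter.
- The negative binomial is read as a Gamma-Poisson mixture, with shape `nb_shape` and rate `rate` standing in for \(p_g\).

```python
import math

import numpy as np

_lgamma = np.vectorize(math.lgamma)


def nb_log_pmf(x, shape, rate):
    """log NB(x; shape, rate) for a Gamma(shape, rate)-Poisson mixture (assumed form)."""
    return (_lgamma(x + shape) - _lgamma(x + 1) - _lgamma(shape)
            + shape * (np.log(rate) - np.log1p(rate)) - x * np.log1p(rate))


def penalized_loss(x, nb_shape, rate, log_prior_gamma, eta, alpha, lambda_alpha=1.0):
    """Negative log-likelihood minus log-priors, plus a variance penalty on alpha.

    x, nb_shape:     (S, G) observed counts and NB shape parameter per spot and gene
    rate:            (G,) per-gene rate p_g, shared across spots
    log_prior_gamma: (S, C) log empirical-prior density of each gamma_s^c
    eta, alpha:      (G,) noise latent variable and assay-correction terms
    """
    nll = -np.sum(nb_log_pmf(x, nb_shape, rate))
    neg_log_prior_gamma = -np.sum(log_prior_gamma)
    # eta_g ~ Normal(0, 1): negative log-density summed over genes.
    neg_log_prior_eta = np.sum(0.5 * eta ** 2 + 0.5 * np.log(2 * np.pi))
    variance_penalty = lambda_alpha * np.var(alpha)
    return nll + neg_log_prior_gamma + neg_log_prior_eta + variance_penalty


x = np.array([[0, 3], [5, 1]])
loss = penalized_loss(x, np.ones((2, 2)), np.array([0.5, 2.0]), np.zeros((2, 1)), np.zeros(2), np.ones(2))
print(loss)
```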
To avoid overfitting, DestVI amortizes inference, using a neural network to parametrize the latent variables. Via the amortization parameter of scvi.module.MRDeconv, the user can specify which of \(\beta\) and \(\gamma^c\) are parametrized by the neural network.
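A quick parameter count shows why amortization helps against overfitting.

- Without amortization, every spot carries its own \(C\) abundances and \(dC\) latent coordinates, so the number of free parameters grows linearly with \(S\).
- With amortization, one network shared across spots maps each spot's expression profile to those values, and its size does not depend on \(S\).

The two-layer architecture, the hidden width and both helper functions are hypothetical. The noise cell type is ignored for simplicity.

```python
def per_spot_parameter_count(n_spots, n_cell_types, latent_dim):
    """Free parameters when beta and gamma are learned separately for every spot."""
    return n_spots * (n_cell_types + latent_dim * n_cell_types)


def amortized_parameter_count(n_genes, n_cell_types, latent_dim, hidden=128):
    """Weights and biases of a hypothetical two-layer network mapping a spot's
    G-dimensional expression to its C abundances and d*C latent coordinates."""
    n_out = n_cell_types + latent_dim * n_cell_types
    return (n_genes * hidden + hidden) + (hidden * n_out + n_out)


for n_spots in (1_000, 100_000):
    print(
        n_spots,
        per_spot_parameter_count(n_spots, n_cell_types=10, latent_dim=5),
        amortized_parameter_count(n_genes=2_000, n_cell_types=10, latent_dim=5),
    )
```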
Tasks
Cell type deconvolution
Once the model is trained, one can retrieve the estimated cell type proportions in each spot using the method:
>>> proportions = st_model.get_proportions()
>>> st_adata.obsm["proportions"] = proportions
These proportions are computed by normalizing the learned cell type abundances \(\beta_{sc}\) of a given spot \(s\) across cell types, i.e., the estimated proportion of cell type \(c\) in spot \(s\) is \(\frac{\beta_{sc}}{\sum_{c'} \beta_{sc'}}\).
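The normalization is a row-wise division. In this NumPy sketch, `proportions_from_abundances` is a hypothetical helper. It assumes the abundances are stored as an \((S, C + 1)\) array whose last column is the noise dummy cell type. As a further assumption, that column is dropped before normalizing unless `keep_noise=True`.

```python
import numpy as np


def proportions_from_abundances(beta, keep_noise=False):
    """Turn abundances beta_sc into per-spot proportions beta_sc / sum_c' beta_sc'.

    beta: (S, C + 1) non-negative abundances; the last column is assumed to be the
    noise dummy cell type and is dropped unless keep_noise is True.
    """
    beta = np.asarray(beta, dtype=float)
    if not keep_noise:
        beta = beta[:, :-1]
    return beta / beta.sum(axis=1, keepdims=True)


beta = np.array([[2.0, 1.0, 1.0, 0.5],
                 [0.0, 3.0, 1.0, 0.2]])
print(proportions_from_abundances(beta))
```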
Subsequently, for a given cell type, users can plot a spatial heatmap of its proportions using scanpy:
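>>> import scanpy as sc
>>> # Copy one cell type's proportions into .obs so that scanpy can color spots by it.
>>> st_adata.obs["B cells"] = st_adata.obsm["proportions"]["B cells"].values
>>> sc.pl.embedding(st_adata, basis="location", color="B cells")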
Intra-cell-type variation
Users can retrieve the values of \(\gamma\), the latent variables corresponding to the modeled cell-type-specific continuous covariates, with:
>>> gamma = st_model.get_gamma()["B cells"]
>>> st_adata.obsm["B_cells_gamma"] = gamma
Cell-type-specific gene expression imputation
Assuming the user has identified key gene modules that vary within a cell type of interest, they can impute the spatial pattern of the cell-type-specific gene expression with:
>>> import numpy as np
>>> ct_name = "Monocyte"
>>> # Keep only spots where this cell type's estimated proportion exceeds 3%.
>>> indices = np.where(st_adata.obsm["proportions"][ct_name].values > 0.03)[0]
>>> imputed_counts = st_model.get_scale_for_ct(ct_name, indices=indices)[["Cxcl9", "Cxcl10", "Fcgr1"]]
Comparative analysis between samples
To perform differential expression analysis across samples, one can apply a frequentist test to samples drawn from the generative distribution predicted for each spot in question. More details can be found in the DestVI paper [1].
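The following is an illustration only. The sketch draws samples from the per-spot negative binomial distributions of one gene in two groups of spots. It then compares the groups with Welch's t statistic and a two-sided normal-approximation p-value.

Everything in it is an assumption:

- The group parameters and both helpers are hypothetical.
- The Gamma-Poisson sampling reflects the parametrization used in the earlier sketches.
- Welch's t-test is a stand-in chosen for simplicity. The test actually used by DestVI is described in the paper and may differ.

```python
import math

import numpy as np


def sample_nb(shape_param, rate, n_samples, rng):
    """Draw NB samples per spot as a Gamma-Poisson mixture (Gamma shape, Gamma rate)."""
    lam = rng.gamma(shape=shape_param, scale=1.0 / rate, size=(n_samples, len(shape_param)))
    return rng.poisson(lam)


def welch_test(a, b):
    """Welch's t statistic with a two-sided normal-approximation p-value."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    se = math.sqrt(a.var(ddof=1) / a.size + b.var(ddof=1) / b.size)
    t = (a.mean() - b.mean()) / se
    p_value = math.erfc(abs(t) / math.sqrt(2.0))   # 2 * (1 - Phi(|t|))
    return t, p_value


rng = np.random.default_rng(0)
rate = 0.5                                  # shared p_g for this gene (toy value)
group_a = np.array([2.0, 2.5, 3.0])         # predicted NB shape for spots in group A
group_b = np.array([6.0, 5.5, 7.0])         # predicted NB shape for spots in group B
samples_a = sample_nb(group_a, rate, 200, rng).ravel()
samples_b = sample_nb(group_b, rate, 200, rng).ravel()
t, p_value = welch_test(samples_a, samples_b)
print(f"t = {t:.2f}, p = {p_value:.2e}")
```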
References:
[1] Romain Lopez, Baoguo Li, Hadas Keren-Shaul, Pierre Boyeau, Merav Kedmi, David Pilzer, Adam Jelinski, Eyal David, Allon Wagner, Yoseph Addad, Michael I. Jordan, Ido Amit, Nir Yosef (2021), Multi-resolution deconvolution of spatial transcriptomics data reveals continuous patterns of inflammation, bioRxiv.
[2] Jakub Tomczak, Max Welling (2018), VAE with a VampPrior, International Conference on Artificial Intelligence and Statistics, http://proceedings.mlr.press/v84/tomczak18a/tomczak18a.pdf.