Stereoscope
Stereoscope [1] posits a probabilistic model of spatial transcriptomics and an associated method for the deconvolution of cell type profiles using a single-cell RNA sequencing reference dataset.
The advantages of Stereoscope are:
Can stratify cells into discrete cell types.
Scalable to very large datasets (>1 million cells).
The limitations of Stereoscope include:
Effectively requires a GPU for fast inference.
Preliminaries
Stereoscope requires training two latent variable models (LVMs): one for the single-cell reference dataset and one for the spatial transcriptomics dataset, which incorporates the learned parameters of the single-cell reference LVM. The first LVM takes as input an scRNA-seq gene expression matrix of UMI counts \(Y\) with \(N\) cells and \(G\) genes, along with a vector of cell type labels \(\vec{z}\). The second LVM then takes the learned parameters of the first LVM, along with a spatial gene expression matrix \(X\) with \(S\) spots and \(G\) genes.
Generative process
Single-cell reference LVM
For cell \(c\), the LVM assumes an observed discrete cell type label \(z_c\) and models the UMI count observation for a given gene \(g\) as a negative binomial distribution. This LVM posits that the observed UMI counts for cell \(c\) and gene \(g\) are generated by the following process:
\[y_{gc} \sim \mathrm{NegativeBinomial}(s_c r_{g z_c}, p_g)\]
where \(s_c = \sum_{g\in G} y_{gc}\) is the observed library size of the cell, \(r_{g z_c}\) is the latent rate parameter for the cell type \(z_c\) and gene \(g\), and \(p_g\) is the latent variable representing the success probability for gene \(g\).
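As a rough illustration of this generative process, the following sketch draws synthetic counts with NumPy. All parameter values are made up for illustration, and the sketch assumes the convention in which a negative binomial with rate \(r\) and success probability \(p\) has mean \(r p / (1 - p)\); NumPy's sampler uses the complementary probability, so \(1 - p\) is passed to it.

```python
import numpy as np

rng = np.random.default_rng(0)

n_cells, n_genes, n_types = 500, 4, 3

# Hypothetical parameters (illustrative values, not learned from data).
r = rng.gamma(shape=2.0, scale=0.01, size=(n_genes, n_types))  # r_{gz} > 0
p = rng.uniform(0.2, 0.8, size=n_genes)                        # p_g in (0, 1)
z = rng.integers(0, n_types, size=n_cells)                     # labels z_c
s = rng.integers(1_000, 5_000, size=n_cells)                   # library sizes s_c

# Rate of the negative binomial for every (cell, gene) pair: s_c * r_{g, z_c}.
rate = s[:, None] * r[:, z].T  # shape (n_cells, n_genes)

# Assumed convention: mean = rate * p / (1 - p). NumPy's `p` argument is the
# complementary probability under this convention, hence `1 - p`.
y = rng.negative_binomial(n=rate, p=1.0 - p[None, :])

expected_mean = rate * p[None, :] / (1.0 - p[None, :])
print(y.shape, y.mean(), expected_mean.mean())
```

In a real analysis \(s_c\) is computed from the observed counts rather than drawn at random; it is simulated here only to keep the example self-contained.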
Note
We are using the standard rate-shape parametrization of the negative binomial here, rather than the mean-dispersion parametrization used in scVI. This is to take advantage of the additive property of negative binomial distributions sharing the same shape parameter (here, \(p_g\)). In this case, the rate parameter for the negative binomial modeling the expression counts for a given gene and spot is equivalent to the sum of the rate parameters for each contributing cell.
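The additive property can be checked numerically. The sketch below uses arbitrary illustrative values and compares the sum of two negative binomial draws that share \(p\) with a single draw whose rate is the sum of the two rates. It assumes the convention where the mean is \(r p / (1 - p)\), so NumPy's sampler receives \(1 - p\).

```python
import numpy as np

rng = np.random.default_rng(0)
n_draws = 200_000

r1, r2 = 2.5, 4.0  # rate parameters of two contributing cells (illustrative)
p = 0.7            # shared success probability (illustrative)

# NumPy parametrizes by the complementary probability, so pass 1 - p.
summed = (rng.negative_binomial(r1, 1.0 - p, size=n_draws)
          + rng.negative_binomial(r2, 1.0 - p, size=n_draws))
direct = rng.negative_binomial(r1 + r2, 1.0 - p, size=n_draws)

# Both samples should share (up to Monte Carlo error) the same moments.
print("means:    ", summed.mean(), direct.mean())
print("variances:", summed.var(), direct.var())
```

Matching means and variances do not prove that the two distributions are equal, but they are consistent with the closed-form result that the sum is again negative binomial with rate \(r_1 + r_2\).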
This generative process is also summarized in the following graphical model:
The latent variables for the single-cell reference LVM, along with their descriptions, are summarized in the following table:

| Latent variable | Description | Code variable (if different) |
|---|---|---|
| \(r_{gz} \in (0, \infty)\) | Rate parameter for the negative binomial distribution. | |
| \(p_g \in [0, 1]\) | Shape parameter for the negative binomial distribution. | |

Spatial transcriptomics LVM
For the second LVM, we also model the expression counts with a \(\mathrm{NegativeBinomial}\). However, for spatial data, we assume that each spot \(s\) has expression \(x_s\) composed of a bulk of cell types, with cell type abundance \(v_{sz}\) for each cell type \(z\). We assume that for a given spot \(s\) and gene \(g\), the observation is generated by the following process:
\[x_{sg} \sim \mathrm{NegativeBinomial}\left(\beta_g \sum_{z} v_{sz} r_{gz}, p_g\right)\]
where \(\beta_g\) is a gene-specific correction term for technical differences. The parameters \(r_{gz}\) and \(p_g\) are the learned parameters from the first LVM.
An additional latent variable, \(\eta_g\), is incorporated into the aggregated cell expression profile as a dummy cell type to represent gene-specific noise. The dummy cell type's expression profile is \(\varepsilon_g := \mathrm{Softplus}(\eta_g)\), where \(\eta_g \sim \mathrm{Normal}(0, 1)\); this prior prevents the model from incorrectly assigning explanatory power to this term. Like the other cell types, the dummy cell type has an associated abundance parameter, \(\gamma_s\).
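To make the aggregation concrete, here is a minimal NumPy sketch of how the per-spot rate could be assembled from these quantities. The function names `softplus` and `spot_rates` are hypothetical, and the sketch assumes that the noise term \(\gamma_s \varepsilon_g\) is added alongside the corrected cell type profiles rather than being scaled by \(\beta_g\); the text above does not settle that detail.

```python
import numpy as np


def softplus(x):
    # Numerically stable log(1 + exp(x)).
    return np.logaddexp(0.0, x)


def spot_rates(r, beta, v, eta, gamma):
    """Negative binomial rate for every (spot, gene) pair.

    r:     (G, Z) cell type rates r_{gz}, fixed from the reference model
    beta:  (G,)   gene-specific correction beta_g
    v:     (S, Z) cell type abundances v_{sz}
    eta:   (G,)   unconstrained noise variable eta_g
    gamma: (S,)   abundance gamma_s of the dummy cell type
    """
    eps = softplus(eta)                     # epsilon_g
    signal = (v @ r.T) * beta[None, :]      # beta_g * sum_z v_{sz} r_{gz}
    noise = gamma[:, None] * eps[None, :]   # gamma_s * epsilon_g (assumed form)
    return signal + noise


# Tiny illustrative example: 3 spots, 2 genes, 2 cell types.
r = np.array([[1.0, 2.0], [3.0, 4.0]])
beta = np.array([1.0, 0.5])
v = np.array([[1.0, 0.0], [0.5, 0.5], [0.0, 2.0]])
eta = np.zeros(2)
gamma = np.array([0.0, 1.0, 2.0])
rates = spot_rates(r, beta, v, eta, gamma)
print(rates.shape)
```

The returned array has one row per spot and one column per gene, matching the layout of the spatial count matrix \(X\).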
This generative process is also summarized in the following graphical model:
The latent variables for the spatial transcriptomics LVM, along with their descriptions, are summarized in the following table:

| Latent variable | Description | Code variable (if different) |
|---|---|---|
| \(v_{sz} \in (0, \infty)\) | Spot-specific cell type abundance. | |
| \(\eta_g \in (-\infty, \infty)\) | Gene-specific noise. Incorporated into the model as \(\varepsilon_g := \mathrm{Softplus}(\eta_g)\). | |
| \(\beta_g \in (0, \infty)\) | Correction term for technological differences. | |
| \(r_{gz} \in (0, \infty)\) | Rate parameter for the negative binomial distribution shared from the single-cell reference LVM. | |
| \(p_g \in [0, 1]\) | Shape parameter for the negative binomial distribution shared from the single-cell reference LVM. | |

Inference
Single-cell reference LVM
Stereoscope uses maximum likelihood estimation to estimate the parameters of the first LVM w.r.t. the negative binomial model of UMI observations. This is achieved via stochastic gradient ascent on the likelihood function using the PyTorch framework.
Spatial transcriptomics LVM
For the spatial transcriptomics LVM, Stereoscope uses maximum a posteriori (MAP) inference to estimate the parameters specific to the model. To be exact, the only parameter given a non-uniform prior is \(\eta_g\), which is posited as a gene-specific random effect with a standard Normal prior. Note that the \(r_{gz}\) and \(p_g\) parameters are not inferred in this step, but held fixed at the values learned by the single-cell reference LVM.
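The shape of such a MAP objective can be sketched in a few lines. This is an illustrative stand-in and not the library's implementation: `nb_log_prob` and `map_objective` are hypothetical helpers, the negative binomial is assumed to have mean \(r p / (1 - p)\), and the rate is passed in directly, whereas in the model it would itself depend on \(\eta_g\) and the other parameters being optimized.

```python
import math

import numpy as np

_lgamma = np.vectorize(math.lgamma)


def nb_log_prob(x, rate, p):
    """Log pmf of a negative binomial with mean rate * p / (1 - p) (assumed convention)."""
    return (_lgamma(x + rate) - _lgamma(x + 1.0) - _lgamma(rate)
            + x * np.log(p) + rate * np.log1p(-p))


def map_objective(x, rate, p, eta):
    """Log-likelihood of the counts plus the standard Normal log-prior on eta."""
    log_lik = nb_log_prob(x, rate, p).sum()
    log_prior = (-0.5 * eta**2 - 0.5 * math.log(2.0 * math.pi)).sum()
    return log_lik + log_prior


# Illustrative values only.
x = np.array([0, 3, 7])
print(map_objective(x, rate=2.0, p=0.4, eta=np.zeros(2)))
```

Because every other parameter has a uniform prior, the only difference from a maximum likelihood objective is the \(-\tfrac{1}{2}\eta_g^2\) penalty (up to a constant), which pulls the noise term toward its prior.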
Tasks
Cell type deconvolution
Once the model is trained, one can retrieve the estimated cell type proportions in each spot using the method:
>>> proportions = spatial_model.get_proportions()
>>> st_adata.obsm["proportions"] = proportions
These proportions are computed by normalizing across all learned cell type abundances, \(v_{sz}\), for a given spot \(s\). I.e. the estimated proportion of cell type \(z\) for spot \(s\) is \(\frac{v_{sz}}{\sum_{z'} v_{sz'}}\).
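The normalization itself is a row-wise division. A minimal NumPy sketch, with a hypothetical helper name and made-up abundances, might look like this:

```python
import numpy as np


def abundances_to_proportions(v):
    """Normalize non-negative abundances v (spots x cell types) so each row sums to 1."""
    v = np.asarray(v, dtype=float)
    return v / v.sum(axis=1, keepdims=True)


# Two spots, three cell types (illustrative values).
v = np.array([[2.0, 1.0, 1.0],
              [0.5, 0.5, 4.0]])
props = abundances_to_proportions(v)
print(props)
```

This only illustrates the formula; in practice the proportions come from the trained model as shown above.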
Subsequently, for a given cell type, users can plot a spatial heatmap of the cell type proportions using scanpy with:
>>> import scanpy as sc
>>> sc.pl.embedding(st_adata, basis="location", color="B cells")
References:
[1] Alma Andersson, Joseph Bergenstråhle, Michaela Asp, Ludvig Bergenstråhle, Aleksandra Jurek, José Fernández Navarro & Joakim Lundeberg (2020), Single-cell and spatial transcriptomics enables probabilistic inference of cell type topography, Communications Biology.