Note

This page was generated from DestVI_tutorial.ipynb. Interactive online version: Colab badge. Some tutorial content may look better in light mode.

Multi-resolution deconvolution of spatial transcriptomics#

In this tutorial, we through the steps of applying DestVI for deconvolution of 10x Visium spatial transcriptomics profiles using an accompanying single-cell RNA sequencing data.

Plan for this tutorial:

  1. Loading the data

  2. Training the scLVM to learn a basis of gene expression on the scRNA-seq data

  3. Training the stLVM to perform the deconvolution

  4. Visualize the learned cell type proportions

  5. Dig into the intra cell type information

[1]:
%%capture
import sys

#if True, will install via pypi, else will install from source
stable = True
IN_COLAB = "google.colab" in sys.modules

if IN_COLAB and stable:
    !pip install --quiet scvi-tools[tutorials]
elif IN_COLAB and not stable:
    !pip install --quiet --upgrade jsonschema
    !pip install --quiet git+https://github.com/yoseflab/scvi-tools@master#egg=scvi-tools[tutorials]

Let’s download our data from a comparative study of murine lymph nodes, comparing wild-type with a stimulation after injection of a mycobacteria. We have at disposal a 10x Visium dataset as well as a matching scRNA-seq dataset from the same tissue.

[7]:
import scanpy as sc
import numpy as np
import matplotlib.pyplot as plt

from scvi.model import CondSCVI, DestVI

sc.set_figure_params(figsize=(4, 4), frameon=False)

%config InlineBackend.print_figure_kwargs={'facecolor' : "w"}
%config InlineBackend.figure_format='retina'
[3]:
url1 = "https://github.com/romain-lopez/DestVI-reproducibility/blob/master/lymph_node/deconvolution/ST-LN-compressed.h5ad?raw=true"
url2 = "https://github.com/romain-lopez/DestVI-reproducibility/blob/master/lymph_node/deconvolution/scRNA-LN-compressed.h5ad?raw=true"
out1 = "data/ST-LN-compressed.h5ad"
out2 = "data/scRNA-LN-compressed.h5ad"

Data processing#

First, let’s load the single-cell data

[10]:
sc_adata = sc.read(out2, backup_url=url2)
[11]:
sc.pl.umap(sc_adata, color="broad_cell_types")
../../_images/tutorials_notebooks_DestVI_tutorial_8_0.png
[12]:
# let us filter some genes
G = 2000
sc.pp.filter_genes(sc_adata, min_counts=10)

sc_adata.layers["counts"] = sc_adata.X.copy()

sc.pp.highly_variable_genes(
    sc_adata,
    n_top_genes=G,
    subset=True,
    layer="counts",
    flavor="seurat_v3"
)

sc.pp.normalize_total(sc_adata, target_sum=10e4)
sc.pp.log1p(sc_adata)
sc_adata.raw = sc_adata

Now, let’s load the spatial data and choose a common gene subset

[13]:
st_adata = sc.read(out1, backup_url=url1)
[14]:
st_adata.layers["counts"] = st_adata.X.copy()

sc.pp.normalize_total(st_adata, target_sum=10e4)
sc.pp.log1p(st_adata)
st_adata.raw = st_adata
[15]:
# filter genes to be the same on the spatial data
intersect = np.intersect1d(sc_adata.var_names, st_adata.var_names)
st_adata = st_adata[:, intersect].copy()
sc_adata = sc_adata[:, intersect].copy()
G = len(intersect)
[16]:
sc.pl.embedding(st_adata, basis="location", color="lymph_node")
../../_images/tutorials_notebooks_DestVI_tutorial_14_0.png

Fit the scLVM#

[17]:
CondSCVI.setup_anndata(sc_adata, layer="counts", labels_key="broad_cell_types")

Here we would like to reweight each measurement by a scalar factor (e.g., the inverse proportion) in the loss of the model so that lowly abundant cell types get better fit by the model.

[18]:
sc_model = CondSCVI(sc_adata, weight_obs=True)
sc_model.view_anndata_setup()
Anndata setup with scvi-tools version 0.0.0.
Setup via `CondSCVI.setup_anndata` with arguments:
{'labels_key': 'broad_cell_types', 'layer': 'counts'}
     Summary Statistics     
┏━━━━━━━━━━━━━━━━━━┳━━━━━━━┓
┃ Summary Stat Key  Value ┃
┡━━━━━━━━━━━━━━━━━━╇━━━━━━━┩
│     n_cells       14989 │
│      n_vars       1888  │
│     n_labels       12   │
└──────────────────┴───────┘
               Data Registry                
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Registry Key     scvi-tools Location    ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│      X         adata.layers['counts']   │
│    labels     adata.obs['_scvi_labels'] │
└──────────────┴───────────────────────────┘
                         labels State Registry                         
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┓
┃        Source Location          Categories    scvi-tools Encoding ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━┩
│ adata.obs['broad_cell_types']     B cells              0          │
│                                 CD4 T cells            1          │
│                                 CD8 T cells            2          │
│                                 GD T cells             3          │
│                                 Macrophages            4          │
│                                Migratory DCs           5          │
│                                  Monocytes             6          │
│                                  NK cells              7          │
│                                    Tregs               8          │
│                                    cDC1s               9          │
│                                    cDC2s              10          │
│                                    pDCs               11          │
└───────────────────────────────┴───────────────┴─────────────────────┘
[19]:
sc_model.train(max_epochs=250)
GPU available: True, used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
Epoch 250/250: 100%|██████████| 250/250 [02:56<00:00,  1.42it/s, loss=6.32e+03, v_num=1]
[20]:
sc_model.history["elbo_train"].plot()
[20]:
<AxesSubplot:xlabel='epoch'>
../../_images/tutorials_notebooks_DestVI_tutorial_20_1.png

Deconvolution with stLVM#

[21]:
DestVI.setup_anndata(st_adata, layer="counts")
[22]:
st_model = DestVI.from_rna_model(st_adata, sc_model)
st_model.view_anndata_setup()
Anndata setup with scvi-tools version 0.0.0.
Setup via `DestVI.setup_anndata` with arguments:
{'layer': 'counts'}
     Summary Statistics     
┏━━━━━━━━━━━━━━━━━━┳━━━━━━━┓
┃ Summary Stat Key  Value ┃
┡━━━━━━━━━━━━━━━━━━╇━━━━━━━┩
│     n_cells       1092  │
│      n_vars       1888  │
└──────────────────┴───────┘
              Data Registry              
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Registry Key   scvi-tools Location   ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━┩
│      X        adata.layers['counts'] │
│    ind_x      adata.obs['_indices']  │
└──────────────┴────────────────────────┘
[23]:
st_model.train(max_epochs=2500)
GPU available: True, used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
/home/adam/miniconda3/envs/scvi-tools-dev/lib/python3.9/site-packages/pytorch_lightning/trainer/data_loading.py:432: UserWarning: The number of training samples (9) is smaller than the logging interval Trainer(log_every_n_steps=10). Set a lower value for log_every_n_steps if you want to see logs for the training epoch.
  rank_zero_warn(
Epoch 2500/2500: 100%|██████████| 2500/2500 [02:33<00:00, 16.32it/s, loss=1.91e+06, v_num=1]
[24]:
st_model.history["elbo_train"].plot()
[24]:
<AxesSubplot:xlabel='epoch'>
../../_images/tutorials_notebooks_DestVI_tutorial_25_1.png

The output of DestVI has two resolution. At the broader resolution, DestVI returns the cell type proportion in every spot. At the more granular resolution, DestVI can impute cell type specific gene expression in every spot.

Cell type proportions#

[25]:
st_adata.obsm["proportions"] = st_model.get_proportions()
[26]:
st_adata.obsm["proportions"]
[26]:
B cells CD4 T cells CD8 T cells GD T cells Macrophages Migratory DCs Monocytes NK cells Tregs cDC1s cDC2s pDCs
AAACCGGGTAGGTACC-1-0 0.637984 0.011602 0.131066 0.001483 0.078387 0.022070 0.037014 0.001217 0.034324 0.036291 0.001194 0.007368
AAACCTCATGAAGTTG-1-0 0.704364 0.062713 0.034917 0.004356 0.003038 0.073587 0.022697 0.000332 0.056630 0.015594 0.012193 0.009580
AAAGACTGGGCGCTTT-1-0 0.555716 0.050376 0.152898 0.004673 0.000936 0.156392 0.001197 0.007212 0.014315 0.018271 0.035730 0.002283
AAAGGGCAGCTTGAAT-1-0 0.052247 0.194508 0.314976 0.045681 0.070682 0.163615 0.013382 0.010573 0.015039 0.048264 0.070398 0.000634
AAAGTCGACCCTCAGT-1-0 0.974565 0.005891 0.002726 0.000824 0.002120 0.000420 0.002220 0.000250 0.001909 0.004891 0.002778 0.001408
... ... ... ... ... ... ... ... ... ... ... ... ...
TTGGTCACACTCGTAA-1-1 0.231858 0.163697 0.223010 0.033034 0.055156 0.096177 0.017553 0.005138 0.005169 0.132620 0.029938 0.006650
TTGTAAGGCCAGTTGG-1-1 0.359533 0.025405 0.119407 0.017364 0.131745 0.002346 0.058825 0.019540 0.041462 0.001687 0.193841 0.028846
TTGTAATCCGTACTCG-1-1 0.611863 0.091874 0.000777 0.001867 0.036414 0.137280 0.000565 0.005284 0.044451 0.042363 0.022307 0.004956
TTGTATCACACAGAAT-1-1 0.214328 0.127759 0.068367 0.022677 0.001081 0.282225 0.050427 0.001407 0.060480 0.013060 0.153828 0.004361
TTGTGTTTCCCGAAAG-1-1 0.499730 0.016280 0.000982 0.000347 0.224561 0.003535 0.004302 0.004139 0.006120 0.169780 0.062263 0.007962

1092 rows × 12 columns

[27]:
ct_list = ["B cells", "CD8 T cells", "Monocytes"]
for ct in ct_list:
  data = st_adata.obsm["proportions"][ct].values
  st_adata.obs[ct] = np.clip(data, 0, np.quantile(data, 0.99))
[28]:
sc.pl.embedding(st_adata, basis="location", color=ct_list)
../../_images/tutorials_notebooks_DestVI_tutorial_30_0.png

We observe a strong compartimentalization of the cell types in the lymph node (B cells / T cells), as expected. We also observe a differential localization of the monocytes (refer to the paper for further details).

Intra cell type information#

At the heart of DestVI is a multitude of latent variables (5 per cell type per spots). We refer to them as “gamma”, and we may manually examine them for downstream analysis

[29]:
# more globally, the values of the gamma are all summarized in this dictionary of data frames
for ct, g in st_model.get_gamma().items():
  st_adata.obsm["{}_gamma".format(ct)] = g
[30]:
st_adata.obsm["B cells_gamma"].head(5)
[30]:
0 1 2 3 4
AAACCGGGTAGGTACC-1-0 0.254131 -0.303706 -0.541926 -0.140051 0.613149
AAACCTCATGAAGTTG-1-0 0.298438 -0.300652 -0.707703 -0.455448 0.775106
AAAGACTGGGCGCTTT-1-0 0.284692 -0.276409 -0.933447 -0.522930 0.603220
AAAGGGCAGCTTGAAT-1-0 2.923985 0.419020 0.599740 -0.782254 1.880168
AAAGTCGACCCTCAGT-1-0 0.138194 -0.110206 0.057396 -0.246791 0.620383

Because those values may be hard to examine for end-users, we presented several methods for prioritizing the study of different cell types (based on PCA and Hotspot). If you’d like to use those methods, please refer to our DestVI reproducibility repository. If you have suggestions to improve those, and would like to see them in the main codebase, reach out to us.

In this tutorial, we assume that the user have identified key gene modules that vary within one cell type in the single-cell RNA sequencing data (e.g., using Hotspot). We provide here a code snippet for imputing the spatial pattern of the cell type specific gene expression, using the example of the IFN-I inflammation signal.

[31]:
plt.figure(figsize=(8, 8))

ct_name = "Monocytes"
gene_name = ["Cxcl9", "Cxcl10", "Fcgr1"]


# we must filter spots with low abundance (consult the paper for an automatic procedure)
indices = np.where(st_adata.obsm["proportions"][ct_name].values > 0.03)[0]

# impute genes and combine them
specific_expression = np.sum(st_model.get_scale_for_ct(ct_name, indices=indices)[gene_name], 1)
specific_expression = np.log(1 + 1e4 * specific_expression)

# plot (i) background (ii) g
plt.scatter(st_adata.obsm["location"][:, 0], st_adata.obsm["location"][:, 1], alpha=0.05)
plt.scatter(st_adata.obsm["location"][indices][:, 0], st_adata.obsm["location"][indices][:, 1],
            c=specific_expression, s=10, cmap="Reds")
plt.colorbar()
plt.title(f"Imputation of {gene_name} in {ct_name}")
plt.show()
/tmp/ipykernel_909515/3582198935.py:18: MatplotlibDeprecationWarning: Auto-removal of grids by pcolor() and pcolormesh() is deprecated since 3.5 and will be removed two minor releases later; please call grid(False) first.
  plt.colorbar()
../../_images/tutorials_notebooks_DestVI_tutorial_35_1.png