Note

This page was generated from DestVI_tutorial.ipynb. Interactive online version: Colab badge.

Multi-resolution deconvolution of spatial transcriptomics

In this tutorial, we through the steps of applying DestVI for deconvolution of 10x Visium spatial transcriptomics profiles using an accompanying single-cell RNA sequencing data.

Plan for this tutorial:

  1. Loading the data

  2. Training the scLVM to learn a basis of gene expression on the scRNA-seq data

  3. Training the stLVM to perform the deconvolution

  4. Visualize the learned cell type proportions

  5. Dig into the intra cell type information

[1]:
import sys

#if True, will install via pypi, else will install from source
stable = False
IN_COLAB = "google.colab" in sys.modules

if IN_COLAB and stable:
    !pip install --quiet scvi-tools[tutorials]
elif IN_COLAB and not stable:
    !pip install --quiet --upgrade jsonschema
    !pip install --quiet git+https://github.com/yoseflab/scvi-tools@master#egg=scvi-tools[tutorials]

Let’s download our data from a comparative study of murine lymph nodes, comparing wild-type with a stimulation after injection of a mycobacteria. We have at disposal a 10x Visium dataset as well as a matching scRNA-seq dataset from the same tissue.

[2]:
!wget --quiet https://github.com/romain-lopez/DestVI-reproducibility/blob/master/lymph_node/deconvolution/ST-LN-compressed.h5ad?raw=true -O ST-LN-compressed.h5ad
!wget --quiet https://github.com/romain-lopez/DestVI-reproducibility/blob/master/lymph_node/deconvolution/scRNA-LN-compressed.h5ad?raw=true -O scRNA-LN-compressed.h5ad
[3]:
import scanpy as sc
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
from matplotlib.lines import Line2D
import umap

import scvi
from scvi.model import CondSCVI, DestVI

import torch

%matplotlib inline
Global seed set to 0

Data processing

First, let’s load the single-cell data

[4]:
sc_adata = sc.read_h5ad("scRNA-LN-compressed.h5ad")
[5]:
sc.pl.umap(sc_adata, color="broad_cell_types")
../../_images/tutorials_notebooks_DestVI_tutorial_8_0.png
[6]:
# let us filter some genes
G = 2000
sc.pp.filter_genes(sc_adata, min_counts=10)

sc_adata.layers["counts"] = sc_adata.X.copy()

sc.pp.highly_variable_genes(
    sc_adata,
    n_top_genes=G,
    subset=True,
    layer="counts",
    flavor="seurat_v3"
)

sc.pp.normalize_total(sc_adata, target_sum=10e4)
sc.pp.log1p(sc_adata)
sc_adata.raw = sc_adata
/data/yosef2/users/jhong/miniconda3/envs/r_tutorial/lib/python3.7/site-packages/pandas/core/arrays/categorical.py:2487: FutureWarning: The `inplace` parameter in pandas.Categorical.remove_unused_categories is deprecated and will be removed in a future version.
  res = method(*args, **kwargs)
/data/yosef2/users/jhong/miniconda3/envs/r_tutorial/lib/python3.7/site-packages/scanpy/preprocessing/_highly_variable_genes.py:144: FutureWarning: Slicing a positional slice with .loc is not supported, and will raise TypeError in a future version.  Use .loc with labels or .iloc with positions instead.
  df.loc[: int(n_top_genes), 'highly_variable'] = True

Now, let’s load the spatial data and choose a common gene subset

[7]:
st_adata = sc.read_h5ad("ST-LN-compressed.h5ad")
[8]:
st_adata.layers["counts"] = st_adata.X.copy()

sc.pp.normalize_total(st_adata, target_sum=10e4)
sc.pp.log1p(st_adata)
st_adata.raw = st_adata
[9]:
# filter genes to be the same on the spatial data
intersect = np.intersect1d(sc_adata.var_names, st_adata.var_names)
st_adata = st_adata[:, intersect].copy()
sc_adata = sc_adata[:, intersect].copy()
G = len(intersect)
[10]:
sc.pl.embedding(st_adata, basis="location", color="lymph_node")
../../_images/tutorials_notebooks_DestVI_tutorial_14_0.png

Fit the scLVM

[11]:
CondSCVI.setup_anndata(sc_adata, layer="counts", labels_key="broad_cell_types")
INFO     No batch_key inputted, assuming all cells are same batch
INFO     Using labels from adata.obs["broad_cell_types"]
INFO     Using data from adata.layers["counts"]
INFO     Successfully registered anndata object containing 14989 cells, 1888 vars, 1 batches,
         12 labels, and 0 proteins. Also registered 0 extra categorical covariates and 0
         extra continuous covariates.
INFO     Please do not further modify adata until model is trained.

Here we would like to reweight each measurement by a scalar factor (e.g., the inverse proportion) in the loss of the model so that lowly abundant cell types get better fit by the model.

[12]:
sc_model = CondSCVI(sc_adata, weight_obs=True)
[13]:
sc_model.train(max_epochs=250)
GPU available: True, used: True
TPU available: False, using: 0 TPU cores
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1,2]
Epoch 250/250: 100%|██████████| 250/250 [06:26<00:00,  1.55s/it, loss=6.33e+03, v_num=1]
[14]:
sc_model.history["elbo_train"].plot()
[14]:
<AxesSubplot:xlabel='epoch'>
../../_images/tutorials_notebooks_DestVI_tutorial_20_1.png

Deconvolution with stLVM

[15]:
DestVI.setup_anndata(st_adata, layer="counts")
INFO     No batch_key inputted, assuming all cells are same batch
INFO     No label_key inputted, assuming all cells have same label
INFO     Using data from adata.layers["counts"]
INFO     Successfully registered anndata object containing 1092 cells, 1888 vars, 1 batches,
         1 labels, and 0 proteins. Also registered 0 extra categorical covariates and 0 extra
         continuous covariates.
INFO     Please do not further modify adata until model is trained.
[16]:
st_model = DestVI.from_rna_model(st_adata, sc_model)
[17]:
st_model.train(max_epochs=2500)
GPU available: True, used: True
TPU available: False, using: 0 TPU cores
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1,2]
Epoch 2500/2500: 100%|██████████| 2500/2500 [07:46<00:00,  5.36it/s, loss=1.91e+06, v_num=1]
[18]:
st_model.history["elbo_train"].plot()
[18]:
<AxesSubplot:xlabel='epoch'>
../../_images/tutorials_notebooks_DestVI_tutorial_25_1.png

The output of DestVI has two resolution. At the broader resolution, DestVI returns the cell type proportion in every spot. At the more granular resolution, DestVI can impute cell type specific gene expression in every spot.

Cell type proportions

[19]:
st_adata.obsm["proportions"] = st_model.get_proportions()
[20]:
st_adata.obsm["proportions"]
[20]:
B cells CD4 T cells CD8 T cells GD T cells Macrophages Migratory DCs Monocytes NK cells Tregs cDC1s cDC2s pDCs
AAACCGGGTAGGTACC-1-0 0.596260 0.012929 0.155701 0.001475 0.080470 0.067471 0.019320 0.000948 0.037274 0.007547 0.001749 0.018856
AAACCTCATGAAGTTG-1-0 0.522311 0.065364 0.006204 0.002666 0.053359 0.130832 0.000222 0.000284 0.132180 0.058891 0.017776 0.009912
AAAGACTGGGCGCTTT-1-0 0.348368 0.217058 0.031992 0.011700 0.010824 0.311752 0.000469 0.018961 0.019974 0.016495 0.001645 0.010762
AAAGGGCAGCTTGAAT-1-0 0.016202 0.139608 0.201433 0.048341 0.061167 0.289535 0.017216 0.021677 0.158993 0.036508 0.008806 0.000515
AAAGTCGACCCTCAGT-1-0 0.964411 0.012349 0.002764 0.002069 0.002382 0.000541 0.001075 0.000397 0.007611 0.002425 0.001693 0.002283
... ... ... ... ... ... ... ... ... ... ... ... ...
TTGGTCACACTCGTAA-1-1 0.260798 0.205414 0.174131 0.062072 0.060133 0.189190 0.007970 0.005396 0.004774 0.015370 0.003664 0.011089
TTGTAAGGCCAGTTGG-1-1 0.173581 0.098180 0.040271 0.003690 0.216839 0.226903 0.010227 0.048160 0.040025 0.051994 0.053003 0.037127
TTGTAATCCGTACTCG-1-1 0.503158 0.158881 0.001569 0.003402 0.026541 0.199877 0.001611 0.033534 0.020229 0.037379 0.009732 0.004087
TTGTATCACACAGAAT-1-1 0.165067 0.036548 0.070373 0.012850 0.003193 0.433961 0.021731 0.005168 0.180457 0.023681 0.039695 0.007275
TTGTGTTTCCCGAAAG-1-1 0.488664 0.033776 0.000881 0.001055 0.249014 0.046302 0.048985 0.003367 0.017212 0.092147 0.005592 0.013005

1092 rows × 12 columns

[21]:
ct_list = ["B cells", "CD8 T cells", "Monocytes"]
for ct in ct_list:
  data = st_adata.obsm["proportions"][ct].values
  st_adata.obs[ct] = np.clip(data, 0, np.quantile(data, 0.99))
[22]:
sc.pl.embedding(st_adata, basis="location", color=ct_list)
../../_images/tutorials_notebooks_DestVI_tutorial_30_0.png

We observe a strong compartimentalization of the cell types in the lymph node (B cells / T cells), as expected. We also observe a differential localization of the monocytes (refer to the paper for further details).

Intra cell type information

At the heart of DestVI is a multitude of latent variables (5 per cell type per spots). We refer to them as “gamma”, and we may manually examine them for downstream analysis

[23]:
# more globally, the values of the gamma are all summarized in this dictionary of data frames
for ct, g in st_model.get_gamma().items():
  st_adata.obsm["{}_gamma".format(ct)] = g
[24]:
st_adata.obsm["B cells_gamma"].head(5)
[24]:
0 1 2 3 4
AAACCGGGTAGGTACC-1-0 -0.613750 -0.214428 0.141364 1.067237 0.347522
AAACCTCATGAAGTTG-1-0 -0.613363 -0.105559 0.179267 0.959022 0.473459
AAAGACTGGGCGCTTT-1-0 -0.615548 -0.135828 0.067430 1.027307 0.443130
AAAGGGCAGCTTGAAT-1-0 0.641599 -0.084988 0.154415 0.285969 0.746916
AAAGTCGACCCTCAGT-1-0 -0.614170 -0.220829 0.079255 0.737150 0.446656

Because those values may be hard to examine for end-users, we presented several methods for prioritizing the study of different cell types (based on PCA and Hotspot). If you’d like to use those methods, please refer to our DestVI reproducibility repository. If you have suggestions to improve those, and would like to see them in the main codebase, reach out to us.

In this tutorial, we assume that the user have identified key gene modules that vary within one cell type in the single-cell RNA sequencing data (e.g., using Hotspot). We provide here a code snippet for imputing the spatial pattern of the cell type specific gene expression, using the example of the IFN-I inflammation signal.

[25]:
plt.figure(figsize=(8, 8))

ct_name = "Monocytes"
gene_name = ["Cxcl9", "Cxcl10", "Fcgr1"]


# we must filter spots with low abundance (consult the paper for an automatic procedure)
indices = np.where(st_adata.obsm["proportions"][ct_name].values > 0.03)[0]

# impute genes and combine them
specific_expression = np.sum(st_model.get_scale_for_ct(ct_name, indices=indices)[gene_name], 1)
specific_expression = np.log(1 + 1e4 * specific_expression)

# plot (i) background (ii) g
plt.scatter(st_adata.obsm["location"][:, 0], st_adata.obsm["location"][:, 1], alpha=0.05)
plt.scatter(st_adata.obsm["location"][indices][:, 0], st_adata.obsm["location"][indices][:, 1],
            c=specific_expression, s=10, cmap="Reds")
plt.colorbar()
plt.title(f"Imputation of {gene_name} in {ct_name}")
plt.show()
../../_images/tutorials_notebooks_DestVI_tutorial_35_0.png
[ ]:

[ ]: