Note

This page was generated from cell2location_lymph_node_spatial_tutorial.ipynb. Interactive online version: Colab badge.

Mapping human lymph node cell types to 10X Visium with Cell2location

This contributed tutorial shows how to use cell2location method for spatially resolving fine-grained cell types by integrating 10X Visium data with scRNA-seq reference of cell types. Cell2location is a principled Bayesian model that estimates which combination of cell types in which cell abundance could have given the mRNA counts in the spatial data, while modelling technical effects (platform/technology effect, contaminating RNA, unexplained variance).

Important

Cell2location is accessible via its own package, but is powered by scvi-tools. Please visit https://cell2location.readthedocs.io/en/latest/.

In this tutorial, we analyse a publicly available Visium dataset of the human lymph node from 10X Genomics, and spatially map a comprehensive atlas of 34 reference cell types derived by integration of scRNA-seq datasets from human secondary lymphoid organs.

  • Cell2location provides high sensitivity and resolution by borrowing statistical strength across locations. This is achieved by modelling similarity of location patterns between cell types using a hierarchical factorisation of cell abundance into tissue zones as a prior (see paper methods).

  • Using our statistical method based on Negative Binomial regression to robustly combine scRNA-seq reference data across technologies and batches results in improved spatial mapping accuracy. Given cell type annotation for each cell, the corresponding reference cell type signatures \(g_{f,g}\), which represent the average mRNA count of each gene \(g\) in each cell type \(f\), can be estimated from sc/snRNA-seq data using either 1) NB regression or 2) a hard-coded computation of per-cluster average mRNA counts for individual genes. We generally recommend using NB regression. This notebook shows use a dataset composed on multiple batches and technologies.When the batch effects are small, a faster hard-coded method of computing per cluster averages provides similarly high accuracy. We also recommend the hard-coded method for non-UMI technologies such as Smart-Seq 2.

  • Cell2location needs untransformed unnormalised spatial mRNA counts as input.

  • You also need to provide cell2location with the expected average cell abundance per location which is used as a prior to guide estimation of absolute cell abundance. This value depends on the tissue and can be estimated by counting nuclei for a few locations in the paired histology image but can be approximate (see paper methods for more guidance).

Workflow diagram

Figure 1.

Loading packages

[1]:
# If in colab install cell2location
import sys
IN_COLAB = "google.colab" in sys.modules

if IN_COLAB:
    !pip install --quiet git+https://github.com/BayraktarLab/cell2location#egg=cell2location[tutorials]

import numpy as np
import scanpy as sc
import anndata
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns

import cell2location
import scvi
Global seed set to 0

First, let’s define where we save the results of our analysis:

[2]:
results_folder = './results/lymph_nodes_analysis/'

# create paths and names to results folders for reference regression and cell2location models
ref_run_name = f'{results_folder}/reference_signatures'
run_name = f'{results_folder}/cell2location_map'

Loading Visium and scRNA-seq reference data

First let’s read spatial Visium data from 10X Space Ranger output. Here we use lymph node data generated by 10X and presented in Kleshchevnikov et al (section 4, Fig 4). This dataset can be conveniently downloaded and imported using scanpy. See this tutorial for a more extensive and practical example of data loading (multiple visium samples).

[3]:
adata_vis = sc.datasets.visium_sge(sample_id="V1_Human_Lymph_Node")
adata_vis.obs['sample'] = list(adata_vis.uns['spatial'].keys())[0]
Variable names are not unique. To make them unique, call `.var_names_make_unique`.
Variable names are not unique. To make them unique, call `.var_names_make_unique`.

Note Here we rename genes to ENSEMBL ID for correct matching between single cell and spatial data.

You can still plot gene expression by name using standard scanpy functions as follows:

sc.pl.spatial(color='PTPRC', gene_symbols='SYMBOL', ...)
[4]:
adata_vis.var['SYMBOL'] = adata_vis.var_names
adata_vis.var.set_index('gene_ids', drop=True, inplace=True)

Note

Mitochondia-encoded genes (gene names start with prefix mt- or MT-) are irrelevant for spatial mapping because their expression represents technical artifacts in the single cell and nucleus data rather than biological abundance of mitochondria. Yet these genes compose 15-40% of mRNA in each location. Hence, to avoid mapping artifacts we strongly recommend removing mitochondrial genes.

[5]:
# find mitochondria-encoded (MT) genes
adata_vis.var['MT_gene'] = [gene.startswith('MT-') for gene in adata_vis.var['SYMBOL']]

# remove MT genes for spatial mapping (keeping their counts in the object)
adata_vis.obsm['MT'] = adata_vis[:, adata_vis.var['MT_gene'].values].X.toarray()
adata_vis = adata_vis[:, ~adata_vis.var['MT_gene'].values]

Published scRNA-seq datasets of lymph nodes have typically lacked an adequate representation of germinal centre-associated immune cell populations due to age of patient donors. We, therefore, include scRNA-seq datasets spanning lymph nodes, spleen and tonsils in our single-cell reference to ensure that we captured the full diversity of immune cell states likely to exist in the spatial transcriptomic dataset.

Here we download this dataset, import into anndata and change variable names to ENSEMBL gene identifiers.

[6]:
# Read data
adata_ref = sc.read(
    f'./data/sc.h5ad',
    backup_url='https://cell2location.cog.sanger.ac.uk/paper/integrated_lymphoid_organ_scrna/RegressionNBV4Torch_57covariates_73260cells_10237genes/sc.h5ad'
)

Warning

Here we rename genes to ENSEMBL ID for correct matching between single cell and spatial data.

[7]:
adata_ref.var['SYMBOL'] = adata_ref.var.index
adata_ref.var.set_index('GeneID-2', drop=True, inplace=True)

# delete unnecessary raw slot (to be removed in a future version of the tutorial)
del adata_ref.raw

Note

Before we estimate the reference cell type signature we recommend to perform very permissive genes selection. We prefer this to standard highly-variable-gene selection because our procedure keeps markers of rare genes while removing most of the uninformative genes.

In this 2D histogram, orange rectangle highlights genes excluded based on the combination of number of cells expressing that gene (Y-axis) and average RNA count for cells where the gene was detected (X-axis).

In this case, the downloaded dataset was already filtered using this method, hence no density under the orange rectangle (to be changed in the future version of the tutorial).

[8]:
from cell2location.utils.filtering import filter_genes
selected = filter_genes(adata_ref, cell_count_cutoff=5, cell_percentage_cutoff2=0.03, nonz_mean_cutoff=1.12)

# filter the object
adata_ref = adata_ref[:, selected].copy()
../../_images/tutorials_notebooks_cell2location_lymph_node_spatial_tutorial_21_0.png

Estimation of reference cell type signatures (NB regression)

The signatures are estimated from scRNA-seq data, accounting for batch effect, using a Negative Binomial regression model.

Preparing anndata. First, prepare anndata object for the regression model:

[9]:
# prepare anndata for the regression model
scvi.data.setup_anndata(adata=adata_ref,
                        # 10X reaction / sample / batch
                        batch_key='Sample',
                        # cell type, covariate used for constructing signatures
                        labels_key='Subset',
                        # multiplicative technical effects (platform, 3' vs 5', donor effect)
                        categorical_covariate_keys=['Method']
                       )
scvi.data.view_anndata_setup(adata_ref)
INFO     Using batches from adata.obs["Sample"]
INFO     Using labels from adata.obs["Subset"]
INFO     Using data from adata.X
/nfs/team283/vk7/software/miniconda3farm5/envs/cell2loc_env2/lib/python3.9/site-packages/sklearn/utils/deprecation.py:86: FutureWarning: Function setup_anndata is deprecated; Please use the model-specific setup_anndata methods instead. The global method will be removed in version 0.15.0.
INFO     Successfully registered anndata object containing 73260 cells, 10237 vars, 23
         batches, 34 labels, and 0 proteins. Also registered 1 extra categorical covariates
         and 0 extra continuous covariates.
INFO     Please do not further modify adata until model is trained.
/nfs/team283/vk7/software/miniconda3farm5/envs/cell2loc_env2/lib/python3.9/site-packages/sklearn/utils/deprecation.py:86: FutureWarning: Function view_anndata_setup is deprecated; This method will be removed in 0.15.0. Please avoid building any new dependencies on it.
Anndata setup with scvi-tools version 0.14.1.
              Data Summary              
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━┓
┃             Data              Count ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━┩
│            Cells              73260 │
│             Vars              10237 │
│            Labels              34   │
│           Batches              23   │
│           Proteins              0   │
│ Extra Categorical Covariates    1   │
│ Extra Continuous Covariates     0   │
└──────────────────────────────┴───────┘
                    SCVI Data Registry                    
┏━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃     Data                scvi-tools Location           ┃
┡━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│       X                       adata.X                 │
│ batch_indices         adata.obs['_scvi_batch']        │
│    labels            adata.obs['_scvi_labels']        │
│   cat_covs     adata.obsm['_scvi_extra_categoricals'] │
└───────────────┴────────────────────────────────────────┘
                        Label Categories                        
┏━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┓
┃   Source Location       Categories     scvi-tools Encoding ┃
┡━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━┩
│ adata.obs['Subset']     B_Cycling               0          │
│                          B_GC_DZ                1          │
│                          B_GC_LZ                2          │
│                         B_GC_prePB              3          │
│                           B_IFN                 4          │
│                        B_activated              5          │
│                           B_mem                 6          │
│                          B_naive                7          │
│                          B_plasma               8          │
│                          B_preGC                9          │
│                          DC_CCR7+              10          │
│                          DC_cDC1               11          │
│                          DC_cDC2               12          │
│                           DC_pDC               13          │
│                            Endo                14          │
│                            FDC                 15          │
│                            ILC                 16          │
│                       Macrophages_M1           17          │
│                       Macrophages_M2           18          │
│                            Mast                19          │
│                         Monocytes              20          │
│                             NK                 21          │
│                            NKT                 22          │
│                           T_CD4+               23          │
│                         T_CD4+_TfH             24          │
│                       T_CD4+_TfH_GC            25          │
│                        T_CD4+_naive            26          │
│                       T_CD8+_CD161+            27          │
│                      T_CD8+_cytotoxic          28          │
│                        T_CD8+_naive            29          │
│                          T_TIM3+               30          │
│                           T_TfR                31          │
│                           T_Treg               32          │
│                            VSMC                33          │
└─────────────────────┴──────────────────┴─────────────────────┘
                           Batch Categories                           
┏━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┓
┃   Source Location          Categories        scvi-tools Encoding ┃
┡━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━┩
│ adata.obs['Sample']     4861STDY7135913               0          │
│                         4861STDY7135914               1          │
│                         4861STDY7208412               2          │
│                         4861STDY7208413               3          │
│                         4861STDY7462253               4          │
│                         4861STDY7462254               5          │
│                         4861STDY7462255               6          │
│                         4861STDY7462256               7          │
│                         4861STDY7528597               8          │
│                         4861STDY7528598               9          │
│                         4861STDY7528599              10          │
│                         4861STDY7528600              11          │
│                           BCP002_Total               12          │
│                           BCP003_Total               13          │
│                           BCP004_Total               14          │
│                           BCP005_Total               15          │
│                           BCP006_Total               16          │
│                           BCP008_Total               17          │
│                           BCP009_Total               18          │
│                      Human_colon_16S7255677          19          │
│                      Human_colon_16S7255678          20          │
│                      Human_colon_16S8000484          21          │
│                           Pan_T7935494               22          │
└─────────────────────┴────────────────────────┴─────────────────────┘
               Extra Categorical Variables                
┏━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┓
┃   Source Location    Categories  scvi-tools Encoding ┃
┡━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━┩
│ adata.obs['Method']     3GEX              0          │
│                         5GEX              1          │
│                                                      │
└─────────────────────┴────────────┴─────────────────────┘

Training model. Now we train the model to estimate reference cell type signatures:

[10]:
# create and train the regression model
from cell2location.models import RegressionModel
mod = RegressionModel(adata_ref)

# Use all data for training (validation not implemented yet, train_size=1)
mod.train(max_epochs=250, batch_size=2500, train_size=1, lr=0.002, use_gpu=True)
GPU available: True, used: True
TPU available: False, using: 0 TPU cores
/nfs/team283/vk7/software/miniconda3farm5/envs/cell2loc_env2/lib/python3.9/site-packages/pytorch_lightning/trainer/configuration_validator.py:99: UserWarning: you passed in a val_dataloader but have no validation_step. Skipping val loop
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
Epoch 250/250: 100%|█████████████████████████████████████████████████████████████████████████████████████████████| 250/250 [23:02<00:00,  5.53s/it, v_num=1, elbo_train=2.88e+8]

Determine if the model needs more training.

Here, we plot ELBO loss history during training, removing first 20 epochs from the plot. This plot should have a decreasing trend and level off by the end of training. If it is still decreasing, increase max_epochs.

[11]:
mod.plot_history(20)
../../_images/tutorials_notebooks_cell2location_lymph_node_spatial_tutorial_27_0.png
[12]:
# In this section, we export the estimated cell abundance (summary of the posterior distribution).
adata_ref = mod.export_posterior(
    adata_ref, sample_kwargs={'num_samples': 1000, 'batch_size': 2500, 'use_gpu': True}
)

# Save model
mod.save(f"{ref_run_name}", overwrite=True)

# Save anndata object with results
adata_file = f"{ref_run_name}/sc.h5ad"
adata_ref.write(adata_file)
adata_file
Sampling local variables, batch:   0%|                                                                                                                   | 0/30 [00:00<?, ?it/s]
Sampling global variables, sample: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████| 999/999 [00:11<00:00, 85.85it/s]
[12]:
'./results/lymph_nodes_analysis//reference_signatures/sc.h5ad'

Examine QC plots.

  1. Reconstruction accuracy to assess if there are any issues with inference.

  2. The estimated expression signatures are distinct from mean expression in each cluster because of batch effects. For scRNA-seq datasets which do not suffer from batch effect (this dataset does), cluster average expression can be used instead of estimating signatures with a model. When this plot is very different from a diagonal plot (e.g. very low values on Y-axis, density everywhere) it indicates problems with signature estimation.

[13]:
mod.plot_QC()
../../_images/tutorials_notebooks_cell2location_lymph_node_spatial_tutorial_30_0.png
../../_images/tutorials_notebooks_cell2location_lymph_node_spatial_tutorial_30_1.png

The model and output h5ad can be loaded later like this:

mod = cell2location.models.RegressionModel.load(f"{ref_run_name}", adata_ref)
adata_file = f"{ref_run_name}/sc.h5ad"
adata_ref = sc.read_h5ad(adata_file)

Extracting reference cell types signatures as a pd.DataFrame. All parameters of the a Negative Binomial regression model are exported into reference anndata object, however for spatial mapping we just need the estimated expression of every gene in every cell type. Here we extract that from standard output:

[14]:
# export estimated expression in each cluster
if 'means_per_cluster_mu_fg' in adata_ref.varm.keys():
    inf_aver = adata_ref.varm['means_per_cluster_mu_fg'][[f'means_per_cluster_mu_fg_{i}'
                                    for i in adata_ref.uns['mod']['factor_names']]].copy()
else:
    inf_aver = adata_ref.var[[f'means_per_cluster_mu_fg_{i}'
                                    for i in adata_ref.uns['mod']['factor_names']]].copy()
inf_aver.columns = adata_ref.uns['mod']['factor_names']
inf_aver.iloc[0:5, 0:5]
[14]:
B_Cycling B_GC_DZ B_GC_LZ B_GC_prePB B_IFN
GeneID-2
ENSG00000188976 0.422329 0.238066 0.304271 0.340599 0.148056
ENSG00000188290 0.002038 0.000724 0.000753 0.054379 0.039965
ENSG00000187608 0.384274 0.212048 0.274994 0.509900 3.942358
ENSG00000186891 0.019464 0.000763 0.053882 0.067154 0.010845
ENSG00000186827 0.007515 0.000530 0.006214 0.029150 0.011172

Cell2location: spatial mapping

Find shared genes and prepare anndata. Subset both anndata and reference signatures:

[15]:
# find shared genes and subset both anndata and reference signatures
intersect = np.intersect1d(adata_vis.var_names, inf_aver.index)
adata_vis = adata_vis[:, intersect].copy()
inf_aver = inf_aver.loc[intersect, :].copy()

# prepare anndata for cell2location model
scvi.data.setup_anndata(adata=adata_vis, batch_key="sample")
scvi.data.view_anndata_setup(adata_vis)
INFO     Using batches from adata.obs["sample"]
INFO     No label_key inputted, assuming all cells have same label
INFO     Using data from adata.X
/nfs/team283/vk7/software/miniconda3farm5/envs/cell2loc_env2/lib/python3.9/site-packages/sklearn/utils/deprecation.py:86: FutureWarning: Function setup_anndata is deprecated; Please use the model-specific setup_anndata methods instead. The global method will be removed in version 0.15.0.
INFO     Successfully registered anndata object containing 4035 cells, 10217 vars, 1 batches,
         1 labels, and 0 proteins. Also registered 0 extra categorical covariates and 0 extra
         continuous covariates.
INFO     Please do not further modify adata until model is trained.
/nfs/team283/vk7/software/miniconda3farm5/envs/cell2loc_env2/lib/python3.9/site-packages/sklearn/utils/deprecation.py:86: FutureWarning: Function view_anndata_setup is deprecated; This method will be removed in 0.15.0. Please avoid building any new dependencies on it.
Anndata setup with scvi-tools version 0.14.1.
              Data Summary              
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━┓
┃             Data              Count ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━┩
│            Cells              4035  │
│             Vars              10217 │
│            Labels               1   │
│           Batches               1   │
│           Proteins              0   │
│ Extra Categorical Covariates    0   │
│ Extra Continuous Covariates     0   │
└──────────────────────────────┴───────┘
             SCVI Data Registry              
┏━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃     Data          scvi-tools Location    ┃
┡━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│       X                 adata.X          │
│ batch_indices  adata.obs['_scvi_batch']  │
│    labels      adata.obs['_scvi_labels'] │
└───────────────┴───────────────────────────┘
                        Label Categories                        
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┓
┃      Source Location       Categories  scvi-tools Encoding ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━┩
│ adata.obs['_scvi_labels']      0                0          │
└───────────────────────────┴────────────┴─────────────────────┘
                         Batch Categories                          
┏━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┓
┃   Source Location        Categories       scvi-tools Encoding ┃
┡━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━┩
│ adata.obs['sample']  V1_Human_Lymph_Node           0          │
└─────────────────────┴─────────────────────┴─────────────────────┘

Important

To use cell2location spatial mapping model, you need to specify 2 user-provided hyperparameters (N_cells_per_location and detection_alpha) - for detailed guidance on setting these hyperparameters and their impact see the flow diagram and the note.

Choosing hyperparameter ``N_cells_per_location``! It is useful to adapt the expected cell abundance N_cells_per_location to every tissue. This value can be estimated from paired histology images and as described in the note above. Change the value presented in this tutorial (N_cells_per_location=30) to the value observed in your your tissue.

Choosing hyperparameter ``detection_alpha``! You can often use the default value of detection_alpha hyperparameter (detection_alpha=200), but for datasets where you observe within-batch variation in total RNA count that cannot be explained by tissue containing more cells (histological examination) use detection_alpha=20.

[16]:
# create and train the model
mod = cell2location.models.Cell2location(
    adata_vis, cell_state_df=inf_aver,
    # the expected average cell abundance: tissue-dependent
    # hyper-prior which can be estimated from paired histology:
    N_cells_per_location=30,
    # hyperparameter controlling normalisation of
    # within-experiment variation in RNA detection (using default here):
    detection_alpha=200
)

mod.train(max_epochs=30000,
          # train using full data (batch_size=None)
          batch_size=None,
          # use all data points in training because
          # we need to estimate cell abundance at all locations
          train_size=1,
          use_gpu=True)

# plot ELBO loss history during training, removing first 100 epochs from the plot
mod.plot_history(1000)
plt.legend(labels=['full data training']);
GPU available: True, used: True
TPU available: False, using: 0 TPU cores
/nfs/team283/vk7/software/miniconda3farm5/envs/cell2loc_env2/lib/python3.9/site-packages/pytorch_lightning/trainer/configuration_validator.py:99: UserWarning: you passed in a val_dataloader but have no validation_step. Skipping val loop
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
Epoch 30000/30000: 100%|█████████████████████████████████████████████████████████████████████████████████████| 30000/30000 [43:28<00:00, 11.50it/s, v_num=1, elbo_train=4.54e+7]
../../_images/tutorials_notebooks_cell2location_lymph_node_spatial_tutorial_38_2.png

Exporting estimated posterior distributions of cell abundance and saving results:

[17]:
# In this section, we export the estimated cell abundance (summary of the posterior distribution).
adata_vis = mod.export_posterior(
    adata_vis, sample_kwargs={'num_samples': 1000, 'batch_size': mod.adata.n_obs, 'use_gpu': True}
)

# Save model
mod.save(f"{run_name}", overwrite=True)

# mod = cell2location.models.Cell2location.load(f"{run_name}", adata_vis)

# Save anndata object with results
adata_file = f"{run_name}/sp.h5ad"
adata_vis.write(adata_file)
adata_file
Sampling local variables, batch: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:17<00:00, 17.00s/it]
Sampling global variables, sample: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████| 999/999 [00:16<00:00, 60.48it/s]
/nfs/team283/vk7/software/miniconda3farm5/envs/cell2loc_env2/lib/python3.9/site-packages/anndata/_core/anndata.py:1220: FutureWarning: The `inplace` parameter in pandas.Categorical.reorder_categories is deprecated and will be removed in a future version. Removing unused categories will always return a new Categorical object.
... storing 'sample' as categorical
/nfs/team283/vk7/software/miniconda3farm5/envs/cell2loc_env2/lib/python3.9/site-packages/anndata/_core/anndata.py:1220: FutureWarning: The `inplace` parameter in pandas.Categorical.reorder_categories is deprecated and will be removed in a future version. Removing unused categories will always return a new Categorical object.
... storing 'feature_types' as categorical
/nfs/team283/vk7/software/miniconda3farm5/envs/cell2loc_env2/lib/python3.9/site-packages/anndata/_core/anndata.py:1220: FutureWarning: The `inplace` parameter in pandas.Categorical.reorder_categories is deprecated and will be removed in a future version. Removing unused categories will always return a new Categorical object.
... storing 'genome' as categorical
/nfs/team283/vk7/software/miniconda3farm5/envs/cell2loc_env2/lib/python3.9/site-packages/anndata/_core/anndata.py:1220: FutureWarning: The `inplace` parameter in pandas.Categorical.reorder_categories is deprecated and will be removed in a future version. Removing unused categories will always return a new Categorical object.
... storing 'SYMBOL' as categorical
[17]:
'./results/lymph_nodes_analysis//cell2location_map/sp.h5ad'

The model and output h5ad can be loaded later like this:

mod = cell2location.models.Cell2location.load(f"{run_name}", adata_vis)
adata_file = f"{run_name}/sp.h5ad"
adata_vis = sc.read_h5ad(adata_file)

Assessing mapping quality. Examine reconstruction accuracy to assess if there are any issues with mapping. The plot should be roughly diagonal, strong deviations will signal problems that need to be investigated

[18]:
mod.plot_QC()
../../_images/tutorials_notebooks_cell2location_lymph_node_spatial_tutorial_43_0.png

When intergrating multiple spatial batches and when working with datasets that have substantial variation of detected RNA within slides (that cannot be explained by high cellular density in the histology), it is important to assess whether cell2location normalised those effects. You expect to see similar total cell abundance across batches but distinct RNA detection sensitivity (both estimated by cell2location). You expect total cell abundance to mirror high cellular density in the histology.

fig = mod.plot_spatial_QC_across_batches()

Visualising cell abundance in spatial coordinates

Note

We use 5% quantile of the posterior distribution, representing the value of cell abundance that the model has high confidence in (aka ‘at least this amount is present’).

[19]:
# add 5% quantile, representing confident cell abundance, 'at least this amount is present',
# to adata.obs with nice names for plotting
adata_vis.obs[adata_vis.uns['mod']['factor_names']] = adata_vis.obsm['q05_cell_abundance_w_sf']

# select one slide
from cell2location.utils import select_slide
slide = select_slide(adata_vis, 'V1_Human_Lymph_Node')

# plot in spatial coordinates
with mpl.rc_context({'axes.facecolor':  'black',
                     'figure.figsize': [4.5, 5]}):

    sc.pl.spatial(slide, cmap='magma',
                  # show first 8 cell types
                  color=['B_Cycling', 'B_GC_LZ', 'T_CD4+_TfH_GC', 'FDC',
                         'B_naive', 'T_CD4+_naive', 'B_plasma', 'Endo'],
                  ncols=4, size=1.3,
                  img_key='hires',
                  # limit color scale at 99.2% quantile of cell abundance
                  vmin=0, vmax='p99.2'
                 )
../../_images/tutorials_notebooks_cell2location_lymph_node_spatial_tutorial_46_0.png
[20]:
# Now we use cell2location plotter that allows showing multiple cell types in one panel
from cell2location.plt import plot_spatial

# select up to 6 clusters
clust_labels = ['T_CD4+_naive', 'B_naive', 'FDC']
clust_col = ['' + str(i) for i in clust_labels] # in case column names differ from labels

slide = select_slide(adata_vis, 'V1_Human_Lymph_Node')

with mpl.rc_context({'figure.figsize': (15, 15)}):
    fig = plot_spatial(
        adata=slide,
        # labels to show on a plot
        color=clust_col, labels=clust_labels,
        show_img=True,
        # 'fast' (white background) or 'dark_background'
        style='fast',
        # limit color scale at 99.2% quantile of cell abundance
        max_color_quantile=0.992,
        # size of locations (adjust depending on figure size)
        circle_diameter=6,
        colorbar_position='right'
    )
../../_images/tutorials_notebooks_cell2location_lymph_node_spatial_tutorial_47_0.png

Modules and their versions used for this analysis

Useful for debugging and reporting issues.

[21]:
cell2location.utils.list_imported_modules()
sys 3.9.6 (default, Aug 18 2021, 19:38:01)
[GCC 7.5.0]
re 2.2.1
ipykernel._version 6.3.1
json 2.0.9
jupyter_client._version 7.0.2
traitlets._version 5.1.0
traitlets 5.1.0
logging 0.5.1.2
platform 1.0.8
_ctypes 1.1.0
ctypes 1.1.0
zmq.backend.cython.constants 40304
zmq.backend.cython 40304
zmq.sugar.constants 40304
zmq.sugar.version 22.2.1
zmq.sugar 22.2.1
zmq 22.2.1
argparse 1.1
tornado 6.1
zlib 1.0
colorama 0.4.4
_curses b'2.2'
dateutil._version 2.8.2
dateutil 2.8.2
six 1.16.0
_decimal 1.70
decimal 1.70
jupyter_core.version 4.7.1
jupyter_core 4.7.1
entrypoints 0.3
jupyter_client 7.0.2
ipykernel 6.3.1
IPython.core.release 7.27.0
pygments 2.10.0
ptyprocess 0.7.0
pexpect 4.8.0
IPython.core.crashhandler 7.27.0
decorator 5.0.9
pickleshare 0.7.5
backcall 0.2.0
_sqlite3 2.6.0
sqlite3.dbapi2 2.6.0
sqlite3 2.6.0
wcwidth 0.2.5
prompt_toolkit 3.0.20
parso 0.8.2
jedi 0.18.0
urllib.request 3.9
IPython.core.magics.code 7.27.0
IPython 7.27.0
ipython_genutils._version 0.2.0
ipython_genutils 0.2.0
distutils 3.9.6
debugpy 1.4.1
xmlrpc.client 3.9
socketserver 0.4
http.server 0.6
pkg_resources._vendor.appdirs 1.4.3
pkg_resources.extern.appdirs 1.4.3
pkg_resources._vendor.packaging.__about__ 20.4
pkg_resources._vendor.packaging 20.4
pkg_resources.extern.packaging 20.4
pkg_resources._vendor.pyparsing 2.2.1
pkg_resources.extern.pyparsing 2.2.1
_pydevd_frame_eval.vendored.bytecode 0.12.0
_pydev_bundle.fsnotify 0.1.5
pydevd 2.4.1
numpy.version 1.20.3
numpy.core._multiarray_umath 3.1
numpy.core 1.20.3
numpy.linalg._umath_linalg 0.1.5
numpy.lib 1.20.3
numpy 1.20.3
packaging.__about__ 21.0
packaging 21.0
_csv 1.0
csv 1.0
scanpy._metadata 1.8.1
scipy.version 1.7.1
scipy 1.7.1
anndata._metadata 0.7.6
h5py.version 3.4.0
h5py 3.4.0
natsort 7.1.1
pytz 2021.1
numexpr.version 2.7.3
numexpr 2.7.3
pandas 1.3.2
anndata 0.7.6
stdlib_list v0.8.0
sinfo 0.3.4
yaml 5.4.1
llvmlite 0.37.0
numba.cloudpickle 1.6.0
numba.misc.appdirs 1.4.1
numba 0.54.0
joblib.externals.cloudpickle 1.6.0
joblib.externals.loky 2.9.0
joblib 1.0.1
scipy._lib.decorator 4.0.5
scipy.linalg._fblas b'$Revision: $'
scipy.linalg._flapack b'$Revision: $'
scipy.linalg._flinalg b'$Revision: $'
scipy.special.specfun b'$Revision: $'
scipy.ndimage 2.0
scipy.optimize.minpack2 b'$Revision: $'
scipy.sparse.linalg.isolve._iterative b'$Revision: $'
scipy.sparse.linalg.eigen.arpack._arpack b'$Revision: $'
scipy.optimize._lbfgsb b'$Revision: $'
scipy.optimize._cobyla b'$Revision: $'
scipy.optimize._slsqp b'$Revision: $'
scipy.optimize._minpack  1.10
scipy.optimize.__nnls b'$Revision: $'
scipy.linalg._interpolative b'$Revision: $'
scipy.integrate._odepack  1.9
scipy.integrate._quadpack  1.13
scipy.integrate.vode b'$Revision: $'
scipy.integrate._dop b'$Revision: $'
scipy.integrate.lsoda b'$Revision: $'
scipy.integrate._ode $Id$
scipy.interpolate._fitpack  1.7
scipy.interpolate.dfitpack b'$Revision: $'
scipy.stats.statlib b'$Revision: $'
scipy.stats.mvn b'$Revision: $'
sklearn.utils._joblib 1.0.1
sklearn.base 0.24.2
sklearn 0.24.2
texttable 1.6.4
igraph.version 0.9.6
igraph 0.9.6
leidenalg.version 0.8.7
leidenalg 0.8.7
PIL._version 8.3.1
PIL 8.3.1
defusedxml 0.7.1
xml.etree.ElementTree 1.3.0
cffi 1.14.6
PIL.Image 8.3.1
pyparsing 2.4.7
cycler 0.10.0
kiwisolver 1.3.2
matplotlib 3.4.3
tables 3.6.1
scanpy 1.8.1
seaborn.external.husl 2.1.0
statsmodels 0.12.2
ipywidgets._version 7.6.4
ipywidgets 7.6.4
seaborn 0.11.2
torch.version 1.9.0+cu102
tarfile 0.9.0
torch.cuda.nccl 2708
torch.backends.cudnn 7605
tqdm._dist_ver 4.62.2
tqdm.version 4.62.2
tqdm.cli 4.62.2
tqdm 4.62.2
torch 1.9.0+cu102
opt_einsum v3.3.0
pyro._version 1.7.0
pyro 1.7.0
theano.version 1.0.5
scipy._lib._uarray 0.5.1+49.g4c3f1d7.scipy
scipy.signal.spline 0.2
theano 1.0.5
pytorch_lightning.__about__ 1.3.8
torchmetrics.__about__ 0.5.0
torchmetrics 0.5.0
deprecate 0.3.0
fsspec 2021.08.1
tensorboard.version 2.6.0
tensorboard 2.6.0
setuptools.version 52.0.0.post20210125
setuptools._vendor.packaging.__about__ 20.4
setuptools._vendor.packaging 20.4
setuptools.extern.packaging 20.4
setuptools._vendor.ordered_set 3.1
setuptools.extern.ordered_set 3.1
setuptools 52.0.0.post20210125
google.protobuf 3.17.3
tensorboard.compat.tensorflow_stub.pywrap_tensorflow 0
tensorboard.compat.tensorflow_stub stub
pytorch_lightning 1.3.8
attr 21.2.0
_cffi_backend 1.14.6
pycparser.ply 3.9
pycparser.ply.yacc 3.10
pycparser.ply.lex 3.10
pycparser 2.20
threadpoolctl 2.2.0
docrep 0.3.2
scvi 0.14.1
cftime._cftime 1.5.0
cftime 1.5.0
netCDF4._netCDF4 1.5.7
netCDF4 1.5.7
xarray.tutorial master
xarray 0.19.0
arviz.data.base 0.10.0
arviz 0.10.0
fastprogress 0.2.7
pymc3 3.9.3
patsy.version 0.5.1
patsy 0.5.1
mizani 0.7.3
palettable 3.3.0
mizani.external.husl 4.0.3
statsmodels.__init__ 0.12.2
statsmodels.tools.web 0.12.2
statsmodels.api 0.12.2
plotnine 0.8.0
pynndescent 0.5.4
umap 0.5.1
[ ]: