scvi.data.pbmc_seurat_v4_cite_seq

scvi.data.pbmc_seurat_v4_cite_seq(save_path='data/', apply_filters=True, aggregate_proteins=True, mask_protein_batches=0, run_setup_anndata=True)

Dataset of PBMCs measured with CITE-seq (161764 cells).

This dataset was first presented in the Seurat v4 paper:

https://doi.org/10.1016/j.cell.2021.04.048

It contains cells from 8 volunteers in an HIV vaccine trial, each measured at 3 time points; thus, there are 24 batches in this dataset.

Parameters
save_path : str (default: 'data/')

Location to use when saving/loading the data.

apply_filters : bool (default: True)

Apply filters at the cell and protein level. At the cell level, this filters on protein library size, number of proteins detected, and percentage of mitochondrial counts, and removes cells labeled as doublets.

aggregate_proteins : bool (default: True)

Antibodies targeting the same surface protein are added together, and isotype controls are removed. See the source code for full details.

mask_protein_batches : int (default: 0)

Set the protein counts in this many batches to all zeros (considered missing by TOTALVI). This improves transfer learning with this dataset.

run_setup_anndata : bool (default: True)

If True, runs setup_anndata() on the dataset before returning.
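
As a rough illustration of what aggregate_proteins describes, the following stand-alone sketch sums the counts of antibodies that target the same protein and drops isotype controls. It does not reproduce the library's implementation: the helper name sum_antibodies_per_protein, the antibody names, and the antibody-to-protein mapping are all hypothetical and chosen only for this example.

```python
def sum_antibodies_per_protein(counts, antibody_to_protein, isotype_controls):
    """Sum per-cell counts of antibodies sharing a target; drop controls.

    counts: dict mapping antibody name -> list of per-cell counts.
    antibody_to_protein: hypothetical mapping antibody name -> target protein;
        antibodies missing from the mapping keep their own name.
    isotype_controls: set of antibody names to discard.
    """
    aggregated = {}
    for antibody, values in counts.items():
        if antibody in isotype_controls:
            continue  # isotype controls are removed entirely
        protein = antibody_to_protein.get(antibody, antibody)
        if protein in aggregated:
            # Second antibody for the same protein: add counts cell by cell.
            aggregated[protein] = [a + b for a, b in zip(aggregated[protein], values)]
        else:
            aggregated[protein] = list(values)
    return aggregated


# Toy data: three cells, two antibodies against CD3, one isotype control.
# All names below are made up for illustration.
counts = {
    "CD3-1": [5, 0, 2],
    "CD3-2": [1, 3, 0],
    "CD19": [0, 7, 1],
    "IgG1-control": [1, 1, 0],
}
result = sum_antibodies_per_protein(
    counts,
    antibody_to_protein={"CD3-1": "CD3", "CD3-2": "CD3"},
    isotype_controls={"IgG1-control"},
)
print(result)
```

For the exact rules applied to the real antibody panel, see the source code as noted under aggregate_proteins.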

Return type

AnnData

Returns

AnnData

Notes

This is not exactly the same dataset as the one that can be downloaded from:

https://satijalab.org/seurat/articles/multimodal_reference_mapping.html

The object linked in the tutorial above does not contain the actual UMI count data for RNA, so the UMI counts had to be downloaded separately from GEO (GSE164378). The counts in that object are an output of the scTransform method and should not be treated like UMI counts.

Examples

>>> import scvi
>>> adata = scvi.data.pbmc_seurat_v4_cite_seq()
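
The effect of mask_protein_batches can be pictured with a small stand-alone sketch: protein counts for every cell in the chosen batches are replaced by zeros, while cells in other batches are left untouched. This is only an assumption-laden illustration, not the library's code. The helper zero_out_protein_batches and the batch labels are hypothetical, and which batches the real function selects is not stated on this page; the sketch simply takes the first n batches in sorted order.

```python
def zero_out_protein_batches(protein_counts, batch_labels, n_masked):
    """Return a copy of protein_counts with rows zeroed for masked batches.

    protein_counts: list of per-cell rows (one count per protein).
    batch_labels: list with one batch label per cell.
    n_masked: number of batches whose protein counts become all zero.
    Assumption: the first n_masked batches in sorted order are masked.
    """
    masked = set(sorted(set(batch_labels))[:n_masked])
    return [
        [0] * len(row) if batch in masked else list(row)
        for row, batch in zip(protein_counts, batch_labels)
    ]


# Toy data: four cells, two proteins, three hypothetical batches.
protein_counts = [[4, 1], [0, 2], [3, 3], [5, 0]]
batch_labels = ["batch_a", "batch_a", "batch_b", "batch_c"]

masked_counts = zero_out_protein_batches(protein_counts, batch_labels, n_masked=1)
print(masked_counts)
```

With n_masked=0, the default of mask_protein_batches, the sketch returns the counts unchanged.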