'data/', apply_filters=True, aggregate_proteins=True, mask_protein_batches=0)

Dataset of PBMCs measured with CITE-seq (161764 cells).

This dataset was first presented in the Seurat v4 paper:

It contains cells from 8 volunteers in an HIV vaccine trial, each measured at 3 time points; thus, there are 24 batches in this dataset.

  • save_path (str (default: 'data/')) – Location to use when saving/loading the data.

  • apply_filters (bool (default: True)) – Apply filters at the cell and protein level. At the cell level, this filters on protein library size, number of proteins detected, and percent mitochondrial counts, and removes cells labeled as doublets.

  • aggregate_proteins (bool (default: True)) – Antibodies targeting the same surface protein are added together, and isotype controls are removed. See the source code for full details.

  • mask_protein_batches (int (default: 0)) – Set the protein counts in this many batches to all zeros, which TOTALVI treats as missing. This improves transfer learning with this dataset.
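The effect of aggregate_proteins and mask_protein_batches can be illustrated on a toy count matrix. The following is a minimal NumPy sketch, not the library's implementation: the helper names, the antibody and batch labels, and the rule for choosing which batches to mask (here, the first ones in sorted order) are all assumptions made for illustration. See the source code for the actual behavior.

```python
import numpy as np


def aggregate_antibodies(counts, antibody_names, target_of, isotype_controls):
    """Sum columns of antibodies sharing a target and drop isotype controls.

    Hypothetical helper: `target_of` maps an antibody name to its protein
    target; antibodies missing from the mapping keep their own name.
    """
    kept = [n for n in antibody_names if n not in isotype_controls]
    targets = sorted({target_of.get(n, n) for n in kept})
    out = np.zeros((counts.shape[0], len(targets)), dtype=counts.dtype)
    for j, name in enumerate(antibody_names):
        if name in isotype_controls:
            continue  # isotype controls are removed entirely
        out[:, targets.index(target_of.get(name, name))] += counts[:, j]
    return out, targets


def mask_batches(protein_counts, batch_labels, n_masked):
    """Return a copy with protein counts zeroed in `n_masked` batches.

    Assumption: the first `n_masked` batches in sorted order are masked.
    """
    masked = protein_counts.copy()
    chosen = np.unique(batch_labels)[:n_masked]
    masked[np.isin(batch_labels, chosen)] = 0
    return masked, chosen


# Toy data: 4 cells x 4 antibodies, with made-up names.
counts = np.array([[5, 2, 1, 9], [3, 7, 0, 4], [1, 4, 2, 0], [6, 8, 3, 1]])
antibody_names = ["CD3-1", "CD3-2", "CD4", "IgG-ctrl"]
target_of = {"CD3-1": "CD3", "CD3-2": "CD3"}
batch_labels = np.array(["P1_0", "P1_3", "P2_0", "P1_0"])

aggregated, targets = aggregate_antibodies(
    counts, antibody_names, target_of, isotype_controls={"IgG-ctrl"}
)
masked, chosen = mask_batches(aggregated, batch_labels, n_masked=1)
print(targets)
print(masked)
```

In this sketch the masked cells keep their rows but carry all-zero protein counts, which is the pattern the parameter description says TOTALVI interprets as missing protein data.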

Return type:





This is not exactly the same dataset as the one that can be downloaded from:

This is because the object linked in the tutorial above does not contain the actual UMI count data for RNA; the UMI counts had to be downloaded separately from GEO (GSE164378). The counts in that object are an output of the scTransform method and should not be treated as UMI counts.


>>> import scvi
>>> adata =