scvi.data.pbmc_dataset

scvi.data.pbmc_dataset(save_path='data/', remove_extracted_data=True)

Loads the PBMC dataset.

We considered scRNA-seq data from two batches of peripheral blood mononuclear cells (PBMCs) from a healthy donor (4K PBMCs and 8K PBMCs). We derived quality control metrics using the cellrangerRkit R package (v. 1.1.0). Quality metrics were extracted from CellRanger through the molecule-specific information file. After filtering, we extracted 12,039 cells with 10,310 sampled genes and obtained biologically meaningful clusters with the software Seurat. We then filtered out genes that we could not match with the bulk data used for differential expression, leaving g = 3346 genes.

Parameters:
  • save_path (str (default: 'data/')) – Location to use when saving/loading the data.

  • remove_extracted_data (bool (default: True)) – If True, removes the folder the data was extracted to.

Returns:

AnnData with batch info (.obs['batch']) and label info (.obs['labels']).

Return type:

AnnData

Examples

>>> import scvi
>>> adata = scvi.data.pbmc_dataset()
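
Once the dataset is loaded, a common first step is to check how cells are distributed across the batch and label columns described under Returns. The following is a minimal sketch, not part of scvi itself: `summarize_obs` is a hypothetical helper, and `toy` is a small stand-in object used only so the snippet runs without downloading anything. It assumes the object passed in exposes `.obs['batch']` and `.obs['labels']` as iterable columns, as the returned AnnData is documented to do.

```python
from collections import Counter
from types import SimpleNamespace


def summarize_obs(adata, keys=("batch", "labels")):
    """Count how many cells fall into each category of the given .obs columns.

    Hypothetical helper for illustration; it only iterates over the columns,
    so it works with plain lists as well as pandas Series.
    """
    return {key: dict(Counter(adata.obs[key])) for key in keys}


# Stand-in for the returned AnnData (an assumption made for this sketch).
# With the real dataset you would instead use:
#     adata = scvi.data.pbmc_dataset()
toy = SimpleNamespace(
    obs={
        "batch": [0, 0, 1, 1, 1],
        "labels": [2, 0, 2, 1, 2],
    }
)

summary = summarize_obs(toy)
print(summary)
```

Passing the real `adata` in place of `toy` should give per-batch and per-label cell counts in the same shape, though the actual category values depend on the dataset.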