scvi.data.brainlarge_dataset

scvi.data.brainlarge_dataset(save_path='data/', sample_size_gene_var=10000, max_cells_to_keep=None, n_genes_to_keep=720, loading_batch_size=100000)

Loads brain-large dataset.

This dataset contains 1.3 million brain cells from 10x Genomics. We randomly shuffle the data to get a 1M subset of cells and order genes by variance, retaining first the top 10,000 and then 720 sampled variable genes. This dataset is then subsampled in cells multiple times for the runtime and goodness-of-fit analyses. We report imputation scores only on the sample of 10k cells and 720 genes.

Parameters:
  • save_path (str (default: 'data/')) – Location to use when saving/loading the data.

  • sample_size_gene_var (int (default: 10000)) – Number of cells to use to estimate gene variances.

  • max_cells_to_keep (int | None (default: None)) – Maximum number of cells to keep.

  • n_genes_to_keep (int (default: 720)) – Number of genes to keep, ordered by decreasing variance.

  • loading_batch_size (int (default: 100000)) – Number of cells to use for each chunk loaded.
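How these parameters could interact is sketched below with NumPy on a toy count matrix. This is an illustration only, not the library's implementation: the helper names `select_top_variance_genes` and `load_in_chunks` are hypothetical, and the sketch assumes that gene variances are estimated from the first `sample_size_gene_var` cells and that the remaining cells are then read in blocks of `loading_batch_size` rows.

```python
import numpy as np


def select_top_variance_genes(counts, sample_size_gene_var=10000, n_genes_to_keep=720):
    """Hypothetical helper: indices of the highest-variance genes, by decreasing variance."""
    # Assumption: variance is estimated from the first `sample_size_gene_var` cells only.
    subsample = counts[:sample_size_gene_var]
    gene_var = subsample.var(axis=0)
    # argsort is ascending, so reverse it to order genes by decreasing variance.
    order = np.argsort(gene_var)[::-1]
    return order[:n_genes_to_keep]


def load_in_chunks(counts, gene_idx, max_cells_to_keep=None, loading_batch_size=100000):
    """Hypothetical helper: read cells block by block, keeping only the selected genes."""
    n_cells = counts.shape[0]
    if max_cells_to_keep is not None:
        n_cells = min(max_cells_to_keep, n_cells)
    chunks = []
    for start in range(0, n_cells, loading_batch_size):
        stop = min(start + loading_batch_size, n_cells)
        # Only this block of rows is materialized at a time.
        chunks.append(counts[start:stop][:, gene_idx])
    return np.concatenate(chunks, axis=0)


# Toy data: 50 cells x 8 genes with gene-specific Poisson rates.
rng = np.random.default_rng(0)
rates = np.array([0.1, 5.0, 0.5, 20.0, 1.0, 10.0, 0.2, 2.0])
toy_counts = rng.poisson(rates, size=(50, 8))

top_genes = select_top_variance_genes(toy_counts, sample_size_gene_var=20, n_genes_to_keep=3)
subset = load_in_chunks(toy_counts, top_genes, max_cells_to_keep=30, loading_batch_size=8)
print(subset.shape)  # (30, 3)
```

In this sketch a smaller `loading_batch_size` lowers peak memory at the cost of more read operations, while `sample_size_gene_var` trades the accuracy of the variance estimate against the cost of computing it.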

Return type:
  AnnData

Returns:
  AnnData with batch info (.obs['batch']) and label info (.obs['labels'])

Examples

>>> import scvi
>>> adata = scvi.data.brainlarge_dataset()