scvi.data.brainlarge_dataset

scvi.data.brainlarge_dataset(save_path='data/', run_setup_anndata=True, sample_size_gene_var=10000, max_cells_to_keep=None, n_genes_to_keep=720, loading_batch_size=100000)[source]

Loads brain-large dataset.

This dataset contains 1.3 million brain cells from 10x Genomics. We randomly shuffle the data to obtain a 1M subset of cells and order genes by variance, retaining the top 10,000 and then sampling 720 variable genes from those. The dataset is then subsampled in cells multiple times for the runtime and goodness-of-fit analyses. Imputation scores are reported only on the 10k-cell, 720-gene samples.

Parameters
save_path : str (default: 'data/')

Location to use when saving/loading the data.

run_setup_anndata : bool (default: True)

If true, runs setup_anndata() on the dataset before returning.

sample_size_gene_var : int (default: 10000)

Number of cells to use to estimate gene variances.

max_cells_to_keep : int | None (default: None)

Maximum number of cells to keep.

n_genes_to_keep : int (default: 720)

Number of genes to keep, ordered by decreasing variance.

loading_batch_size : int (default: 100000)

Number of cells to use for each chunk loaded.

Return type

AnnData

Returns

AnnData with batch info (.obs['batch']) and label info (.obs['labels'])

Examples

>>> import scvi
>>> adata = scvi.data.brainlarge_dataset()
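The variance-based gene selection described above can be sketched as follows. This is a minimal illustration with NumPy on toy data, not the library's internal implementation; the helper name and the toy counts matrix are hypothetical.

```python
import numpy as np

def select_variable_genes(counts, n_top=10000, n_keep=720, seed=0):
    """Hypothetical helper mirroring the described preprocessing:
    order genes by decreasing variance, keep the top ``n_top``,
    then randomly sample ``n_keep`` of those."""
    variances = counts.var(axis=0)
    # indices of genes sorted by decreasing variance
    top = np.argsort(variances)[::-1][:n_top]
    rng = np.random.default_rng(seed)
    # sample a subset of the high-variance genes without replacement
    return rng.choice(top, size=n_keep, replace=False)

# toy data: 50 cells x 20 genes of Poisson counts
rng = np.random.default_rng(42)
counts = rng.poisson(5.0, size=(50, 20)).astype(float)
genes = select_variable_genes(counts, n_top=10, n_keep=5)
print(genes.shape)
```

On the real dataset, sample_size_gene_var controls how many cells are used to estimate the per-gene variances, and n_genes_to_keep plays the role of n_keep here.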