BrainLargeDataset

class scvi.dataset.BrainLargeDataset(filename=None, save_path='data/', sample_size_gene_var=10000, max_cells_to_keep=None, nb_genes_to_keep=720, loading_batch_size=100000, delayed_populating=False)

Bases: scvi.dataset.dataset.DownloadableDataset

Loads the brain-large dataset.

This dataset contains 1.3 million brain cells from 10x Genomics. We randomly shuffled the data to obtain a 1M-cell subset, ordered genes by variance to retain the top 10,000, and then sampled 720 variable genes. This dataset was then subsampled multiple times in cells for the runtime and goodness-of-fit analyses. Imputation scores are reported only for the 10k-cell, 720-gene subsamples.
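The repeated cell subsampling described above can be sketched as follows. This is a minimal illustration, not scvi's code: the function `subsample_cells` and its `seed` argument are hypothetical. It shuffles the cell indices once and takes prefixes of several sizes, so smaller subsamples are nested inside larger ones.

```python
import random
from typing import Dict, List


def subsample_cells(n_cells: int, sizes: List[int], seed: int = 0) -> Dict[int, List[int]]:
    """Hypothetical helper: shuffle cell indices once, then take prefixes of each size."""
    if any(size > n_cells for size in sizes):
        raise ValueError("a subsample size exceeds the number of cells")
    rng = random.Random(seed)  # seeded so the subsamples are reproducible
    order = list(range(n_cells))
    rng.shuffle(order)
    # Prefixes of a single permutation give nested subsets (10 within 100 within 500).
    return {size: sorted(order[:size]) for size in sizes}


subsets = subsample_cells(1000, [10, 100, 500], seed=42)
```

Taking prefixes of one permutation is a design choice. It makes results at different subsample sizes directly comparable, whereas drawing each size independently would not.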

Parameters
  • filename (Optional[str]) – File name to use when saving/loading the data.

  • save_path (str) – Location to use when saving/loading the data.

  • sample_size_gene_var (int) – Number of cells to use to estimate gene variances.

  • max_cells_to_keep (Optional[int]) – Maximum number of cells to keep.

  • nb_genes_to_keep (int) – Number of genes to keep, ordered by decreasing variance.

  • loading_batch_size (int) – Number of cells to use for each chunk loaded.

  • delayed_populating (bool) – Switch for delayed populating mechanism.
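One plausible reading of how sample_size_gene_var, nb_genes_to_keep, loading_batch_size and max_cells_to_keep interact is sketched below on an in-memory toy matrix. This is an illustration under stated assumptions, not scvi's populate() implementation: the helpers `gene_variances` and `select_and_load` are hypothetical, and a real loader would read chunks from the downloaded file rather than from a Python list.

```python
from typing import Iterator, List, Optional

Matrix = List[List[float]]  # rows are cells, columns are genes


def gene_variances(rows: Matrix) -> List[float]:
    """Population variance of each gene (column) across the given cells."""
    n = len(rows)
    n_genes = len(rows[0])
    means = [sum(r[g] for r in rows) / n for g in range(n_genes)]
    return [sum((r[g] - means[g]) ** 2 for r in rows) / n for g in range(n_genes)]


def select_and_load(
    rows: Matrix,
    sample_size_gene_var: int,
    nb_genes_to_keep: int,
    loading_batch_size: int,
    max_cells_to_keep: Optional[int] = None,
) -> Iterator[Matrix]:
    """Hypothetical sketch: pick high-variance genes, then yield cells chunk by chunk."""
    # 1. Estimate gene variances from the first `sample_size_gene_var` cells only.
    variances = gene_variances(rows[:sample_size_gene_var])
    # 2. Keep the `nb_genes_to_keep` genes with the largest variance, in decreasing order.
    order = sorted(range(len(variances)), key=lambda g: variances[g], reverse=True)
    keep = order[:nb_genes_to_keep]
    # 3. Yield chunks of at most `loading_batch_size` cells, stopping at `max_cells_to_keep`.
    n_cells = len(rows) if max_cells_to_keep is None else min(len(rows), max_cells_to_keep)
    for start in range(0, n_cells, loading_batch_size):
        stop = min(start + loading_batch_size, n_cells)
        yield [[r[g] for g in keep] for r in rows[start:stop]]


data = [
    [0.0, 1.0, 5.0],
    [0.0, 3.0, 1.0],
    [0.0, 2.0, 9.0],
    [0.0, 2.0, 1.0],
]
chunks = list(select_and_load(data, sample_size_gene_var=4, nb_genes_to_keep=2, loading_batch_size=3))
```

Estimating variances on a sample of cells avoids a full pass over all 1.3 million cells before gene selection, and chunked loading bounds how many cells are held in memory at once.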

Examples

>>> gene_dataset = BrainLargeDataset()

Methods Summary

populate()

Populates a DownloadableDataset object’s data attributes.

Methods Documentation

populate()

Populates a DownloadableDataset object’s data attributes.

E.g., by calling one of GeneExpressionDataset’s populate_from... methods.
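The effect of the delayed_populating switch on populate() can be sketched with a toy class. This assumes the switch only controls whether populate() runs in the constructor, so a delayed dataset must be populated explicitly before use. The class `LazyDataset`, its `is_populated` property and its toy matrix are hypothetical and do not reproduce scvi's DownloadableDataset.

```python
from typing import List, Optional


class LazyDataset:
    """Hypothetical stand-in for a downloadable dataset with delayed populating."""

    def __init__(self, delayed_populating: bool = False) -> None:
        self._X: Optional[List[List[float]]] = None
        if not delayed_populating:
            self.populate()  # eager mode: load the data during construction

    def populate(self) -> None:
        # A real dataset would read its downloaded file here, e.g. through one of
        # the populate_from... methods; this sketch fills in a fixed toy matrix.
        self._X = [[1.0, 0.0], [0.0, 2.0]]

    @property
    def is_populated(self) -> bool:
        return self._X is not None

    @property
    def X(self) -> List[List[float]]:
        if self._X is None:
            raise RuntimeError("dataset not populated yet; call populate() first")
        return self._X


eager = LazyDataset()
lazy = LazyDataset(delayed_populating=True)
lazy.populate()  # deferred loading happens only when requested
```

Deferring populate() lets a caller construct the object cheaply, for instance to inspect or adjust settings, and pay the loading cost only when the data is actually needed.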