Dataset10X

class scvi.dataset.Dataset10X(dataset_name=None, filename=None, save_path='data/10X', url=None, type='filtered', dense=False, measurement_names_column=1, remove_extracted_data=False, delayed_populating=False)

Bases: scvi.dataset.dataset.DownloadableDataset

Loads a dataset from the 10x Genomics website.

Parameters
  • dataset_name (Optional[str]) – Name of the dataset file. Must be one of: “frozen_pbmc_donor_a”, “frozen_pbmc_donor_b”, “frozen_pbmc_donor_c”, “fresh_68k_pbmc_donor_a”, “cd14_monocytes”, “b_cells”, “cd34”, “cd56_nk”, “cd4_t_helper”, “regulatory_t”, “naive_t”, “memory_t”, “cytotoxic_t”, “naive_cytotoxic”, “pbmc8k”, “pbmc4k”, “t_3k”, “t_4k”, “neuron_9k”, “pbmc_1k_protein_v3”, “pbmc_10k_protein_v3”, “malt_10k_protein_v3”, “pbmc_1k_v2”, “pbmc_1k_v3”, “pbmc_10k_v3”, “hgmm_1k_v2”, “hgmm_1k_v3”, “hgmm_5k_v3”, “hgmm_10k_v3”, “neuron_1k_v2”, “neuron_1k_v3”, “neuron_10k_v3”, “heart_1k_v2”, “heart_1k_v3”, “heart_10k_v3”.

  • filename (Optional[str]) – Manual override of the filename to write to.

  • save_path (str) – Location to use when saving/loading the data.

  • url (Optional[str]) – Manual override of the remote download location. URLs are already provided for most 10X datasets and are formed automatically from dataset_name alone.

  • type (str) – Whether to load the filtered data or the raw data.

  • dense (bool) – Whether to load as dense or sparse. If False, data is cast to sparse using scipy.sparse.csr_matrix.

  • measurement_names_column (int) – Column in which to find measurement names in the corresponding .tsv file.

  • remove_extracted_data (bool) – Whether to remove extracted archives after populating the dataset.
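To make measurement_names_column concrete, the following sketch reads a small tab-separated file and picks one column from it. It assumes a layout with identifiers in column 0 and gene symbols in column 1. The helper read_measurement_names is hypothetical and is not part of scvi.

```python
import csv
import io


def read_measurement_names(tsv_text, measurement_names_column=1):
    """Return the chosen column of a tab-separated file as a list of names.

    Hypothetical helper for illustration only; not part of scvi.
    """
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    # Skip empty rows so a trailing newline does not raise an IndexError.
    return [row[measurement_names_column] for row in reader if row]


# Assumed layout: column 0 holds identifiers, column 1 holds gene symbols.
example_tsv = "ENSG00000243485\tMIR1302-10\nENSG00000237613\tFAM138A\n"

gene_symbols = read_measurement_names(example_tsv, measurement_names_column=1)
gene_ids = read_measurement_names(example_tsv, measurement_names_column=0)
```

With the default value of 1, the second column is used; passing 0 would select the first column instead.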

Examples

>>> from scvi.dataset import Dataset10X
>>> tenX_dataset = Dataset10X("neuron_9k")

Methods Summary

find_path_to_data()

Returns exact path for the data in the archive.

populate()

Populates a DownloadableDataset object’s data attributes.

Methods Documentation

find_path_to_data()

Returns exact path for the data in the archive.

This is required because 10X doesn’t have a consistent way of storing their data. Additionally, the function returns whether the data is stored in a compressed format.

Return type

Tuple[str, str]

Returns

Path in which the files are contained, and their suffix if compressed.
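As a rough illustration of this kind of lookup, the sketch below walks a directory tree and returns the first directory that holds a matrix file, together with a suffix that signals compression. It is not scvi's implementation: the helper find_matrix_dir, the file names "matrix.mtx" and "matrix.mtx.gz", and the example directory name are all assumptions made for the example.

```python
import os
import tempfile
from typing import Tuple


def find_matrix_dir(root_dir: str) -> Tuple[str, str]:
    """Return (directory, suffix) for the first directory holding a matrix file.

    Hypothetical helper, not scvi's implementation. Assumes the count matrix
    is named "matrix.mtx" or "matrix.mtx.gz".
    """
    for current_dir, _subdirs, files in os.walk(root_dir):
        if "matrix.mtx.gz" in files:
            return current_dir, ".gz"
        if "matrix.mtx" in files:
            return current_dir, ""
    raise FileNotFoundError("No matrix.mtx(.gz) found under " + root_dir)


# Build a throwaway directory tree that mimics an extracted archive.
with tempfile.TemporaryDirectory() as tmp:
    nested = os.path.join(tmp, "filtered_feature_bc_matrix")
    os.makedirs(nested)
    open(os.path.join(nested, "matrix.mtx.gz"), "w").close()
    found_dir, suffix = find_matrix_dir(tmp)
    found_name = os.path.basename(found_dir)
```

Returning the suffix alongside the path lets the caller append it to each expected file name without checking for compression again.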

populate()

Populates a DownloadableDataset object’s data attributes.

For example, by calling one of GeneExpressionDataset’s populate_from... methods.
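The constructor signature includes a delayed_populating flag that this page does not describe. Assuming it simply postpones the call to populate() until the caller makes it explicitly, the pattern could look like the toy sketch below. ToyDataset and its fixed matrix are hypothetical stand-ins and do not reflect scvi's actual classes.

```python
class ToyDataset:
    """Toy stand-in for illustration; not scvi's DownloadableDataset."""

    def __init__(self, delayed_populating=False):
        self.X = None
        # Assumed behaviour: populate immediately unless the caller opts out.
        if not delayed_populating:
            self.populate()

    def populate(self):
        # A real subclass would read files here; this toy uses a fixed matrix.
        self.X = [[0, 1], [2, 0]]


eager = ToyDataset()
lazy = ToyDataset(delayed_populating=True)
lazy_before = lazy.X  # still None: nothing has been loaded yet
lazy.populate()       # the caller triggers loading explicitly
```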