CsvDataset

class scvi.dataset.CsvDataset(filename, save_path='data/', url=None, new_n_genes=None, subset_genes=None, compression=None, sep=',', gene_by_cell=True, labels_file=None, batch_ids_file=None, delayed_populating=False)

Bases: scvi.dataset.dataset.DownloadableDataset

Loads a .csv file.

Parameters
  • filename (str) – File name to use when saving/loading the data.

  • save_path (str) – Location to use when saving/loading the data.

  • url (Optional[str]) – URL pointing to the data, which will be downloaded if it is not already in save_path.

  • new_n_genes (Optional[int]) – Number of subsampled genes.

  • subset_genes (Optional[Iterable[Union[int, str]]]) – List of genes for subsampling.

  • compression (Optional[str]) – For on-the-fly decompression of on-disk data. If ‘infer’ and filepath_or_buffer is path-like, then detect compression from the following extensions: ‘.gz’, ‘.bz2’, ‘.zip’, or ‘.xz’ (otherwise no decompression). If using ‘zip’, the ZIP file must contain only one data file to be read in.

  • batch_ids_file (Optional[str]) – Name of the .csv file with batch indices. The file contains two columns: the first holds cell names and the second holds batch indices (type int). The first row of the file is a header.
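The two-column layout described for batch_ids_file can be sketched with the standard library alone. This is only an illustration: the file name, the header labels ("cell", "batch"), the cell names, and the helper functions are hypothetical, and only the shape (cell name, integer batch index, one header row) comes from the parameter description above.

```python
import csv
import os
import tempfile


def write_batch_ids(path, cell_to_batch):
    """Write a two-column CSV: cell name, integer batch index, plus a header row."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["cell", "batch"])  # header labels are hypothetical
        for cell_name, batch_index in cell_to_batch.items():
            writer.writerow([cell_name, int(batch_index)])


def read_batch_ids(path):
    """Read the file back, skipping the header and parsing batch indices as int."""
    with open(path, newline="") as handle:
        reader = csv.reader(handle)
        next(reader)  # the first row is the header
        return {row[0]: int(row[1]) for row in reader}


# Hypothetical file location and toy cell names, for illustration only.
example_path = os.path.join(tempfile.mkdtemp(), "batch_ids.csv")
write_batch_ids(example_path, {"cell_0": 0, "cell_1": 0, "cell_2": 1})
batch_ids = read_batch_ids(example_path)
```

A file written this way could presumably be passed by name through batch_ids_file, provided it sits where the dataset expects to find it.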

Examples

>>> from scvi.dataset import CsvDataset
>>> # Loading a remote dataset
>>> remote_url = ("https://www.ncbi.nlm.nih.gov/geo/download/?acc=GSE100866&format=file&file="
... "GSE100866%5FCBMC%5F8K%5F13AB%5F10X%2DRNA%5Fumi%2Ecsv%2Egz")
>>> remote_csv_dataset = CsvDataset("GSE100866_CBMC_8K_13AB_10X-RNA_umi.csv.gz", save_path='data/',
... compression="gzip", url=remote_url)
>>> # Loading a local dataset
>>> local_csv_dataset = CsvDataset("GSE100866_CBMC_8K_13AB_10X-RNA_umi.csv.gz",
... save_path="data/", compression='gzip')
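The local example assumes a gzip-compressed CSV already exists under save_path. As a rough sketch, such a file can be produced with the standard library. The file name, gene and cell labels, counts, and the helper function are all made up for illustration, and the genes-as-rows layout is an assumption suggested by the default gene_by_cell=True rather than something stated on this page.

```python
import csv
import gzip
import os
import tempfile


def write_gzipped_csv(path, header, rows):
    """Write rows to a gzip-compressed CSV file in text mode."""
    with gzip.open(path, "wt", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        writer.writerows(rows)


# Toy data: rows are genes, columns are cells (assumed layout).
toy_dir = tempfile.mkdtemp()
toy_path = os.path.join(toy_dir, "toy_counts.csv.gz")
write_gzipped_csv(
    toy_path,
    ["", "cell_0", "cell_1"],
    [["gene_a", 3, 0], ["gene_b", 1, 5]],
)

# Read the file back to confirm that it decompresses to the same table.
with gzip.open(toy_path, "rt", newline="") as handle:
    round_trip = list(csv.reader(handle))
```

Under those assumptions, the file could then be loaded with something like CsvDataset("toy_counts.csv.gz", save_path=toy_dir, compression="gzip").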

Methods Summary

populate()

Populates a DownloadableDataset object’s data attributes.

Methods Documentation

populate()

Populates a DownloadableDataset object’s data attributes.

For example, by calling one of GeneExpressionDataset’s populate_from... methods.