CsvDataset
class scvi.dataset.CsvDataset(filename, save_path='data/', url=None, new_n_genes=None, subset_genes=None, compression=None, sep=',', gene_by_cell=True, labels_file=None, batch_ids_file=None, delayed_populating=False)
Bases: scvi.dataset.dataset.DownloadableDataset
Loads a .csv file.
- Parameters
  - filename (str) – File name to use when saving/loading the data.
  - save_path (str) – Location to use when saving/loading the data.
  - url (Optional[str]) – URL pointing to the data, which will be downloaded if it’s not already in save_path.
  - new_n_genes (Optional[int]) – Number of subsampled genes.
  - subset_genes (Optional[Iterable[Union[int, str]]]) – List of genes for subsampling.
  - compression (Optional[str]) – For on-the-fly decompression of on-disk data. If ‘infer’ and filepath_or_buffer is path-like, then detect compression from the following extensions: ‘.gz’, ‘.bz2’, ‘.zip’, or ‘.xz’ (otherwise no decompression). If using ‘zip’, the ZIP file must contain only one data file to be read in.
  - batch_ids_file (Optional[str]) – Name of the .csv file with batch indices. The file contains two columns: the first holds cell names and the second holds batch indices (type int). The first row of the file is a header.
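The layout described for batch_ids_file (a header row, then one row per cell with its name and an integer batch index) can be produced with the standard library alone. The following is a minimal sketch, not part of scvi: the helper name `write_batch_ids_file`, the header labels `cell` and `batch`, the cell names, and the file name `batch_ids.csv` are all assumptions made for illustration.

```python
import csv
import os
import tempfile


def write_batch_ids_file(path, cell_names, batch_indices):
    """Write a two-column CSV: cell name, then integer batch index.

    Hypothetical helper; the header labels below are assumed, since the
    documentation only states that the first row is a header.
    """
    if len(cell_names) != len(batch_indices):
        raise ValueError("cell_names and batch_indices must have the same length")
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["cell", "batch"])  # header row (labels are assumptions)
        for name, batch in zip(cell_names, batch_indices):
            writer.writerow([name, int(batch)])  # batch indices stored as ints


# Write an illustrative file into a temporary directory and read it back.
tmp_dir = tempfile.mkdtemp()
batch_path = os.path.join(tmp_dir, "batch_ids.csv")
write_batch_ids_file(batch_path, ["cell_0", "cell_1", "cell_2"], [0, 0, 1])

with open(batch_path, newline="") as handle:
    rows = list(csv.reader(handle))
```

A file written this way matches the described format and could presumably be supplied through the batch_ids_file argument; how the path is resolved relative to save_path is not stated here, so check that before relying on it.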
Examples
>>> from scvi.dataset import CsvDataset
>>> # Loading a remote dataset
>>> remote_url = (
...     "https://www.ncbi.nlm.nih.gov/geo/download/?acc=GSE100866&format=file&file="
...     "GSE100866%5FCBMC%5F8K%5F13AB%5F10X%2DRNA%5Fumi%2Ecsv%2Egz"
... )
>>> remote_csv_dataset = CsvDataset("GSE100866_CBMC_8K_13AB_10X-RNA_umi.csv.gz", save_path='data/',
...                                 compression="gzip", url=remote_url)
>>> # Loading a local dataset
>>> local_csv_dataset = CsvDataset("GSE100866_CBMC_8K_13AB_10X-RNA_umi.csv.gz",
...                                save_path="data/", compression='gzip')
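The examples pass compression explicitly. The ‘infer’ rule described for the compression parameter amounts to a lookup on the file extension, which the sketch below illustrates. It is an illustration of the documented rule only, not the code that scvi or pandas actually runs; the function name `infer_compression` and the mapping table are assumptions.

```python
import os

# Extensions named in the parameter description, mapped to compression names.
_EXTENSION_TO_COMPRESSION = {
    ".gz": "gzip",
    ".bz2": "bz2",
    ".zip": "zip",
    ".xz": "xz",
}


def infer_compression(filename):
    """Return a compression name based on the final extension, or None.

    Hypothetical helper mirroring the documented 'infer' behaviour:
    unknown extensions mean no decompression.
    """
    extension = os.path.splitext(filename)[1].lower()
    return _EXTENSION_TO_COMPRESSION.get(extension)


inferred = infer_compression("GSE100866_CBMC_8K_13AB_10X-RNA_umi.csv.gz")
plain = infer_compression("counts.csv")
```

Under this rule the file used in the examples would resolve to gzip, matching the value passed explicitly, while a plain .csv file would be read without decompression.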
Methods Summary
populate() – Populates a DownloadableDataset object’s data attributes.
Methods Documentation