scvi.data.synthetic_iid

scvi.data.synthetic_iid(batch_size=200, n_genes=100, n_proteins=100, n_regions=100, n_batches=2, n_labels=3, dropout_ratio=0.7, sparse_format=None, generate_coordinates=False, return_mudata=False, **kwargs)

Synthetic multimodal dataset.

RNA and accessibility data are generated from a zero-inflated negative binomial distribution, while protein data are generated from a negative binomial distribution. Each value is independently and identically distributed. This dataset is intended only for testing and is not meant for modeling or research.
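To make the sampling scheme concrete, the following is a minimal NumPy sketch of i.i.d. zero-inflated negative binomial counts. It is not the library's implementation: the helper name sample_zinb, the negative binomial parameters n_successes and p_success, and the choice to zero each entry with probability dropout_ratio are all assumptions made for illustration.

```python
import numpy as np


def sample_zinb(n_cells, n_features, dropout_ratio=0.7,
                n_successes=5, p_success=0.3, seed=0):
    """Draw i.i.d. zero-inflated negative binomial counts.

    Hypothetical helper: n_successes and p_success are illustrative
    values, not parameters taken from scvi-tools.
    """
    rng = np.random.default_rng(seed)
    # One independent negative binomial draw per (cell, feature) entry.
    counts = rng.negative_binomial(n_successes, p_success,
                                   size=(n_cells, n_features))
    # Assumption: each entry is kept with probability 1 - dropout_ratio
    # and set to zero otherwise.
    keep = rng.binomial(1, 1.0 - dropout_ratio, size=counts.shape)
    return counts * keep


counts = sample_zinb(n_cells=400, n_features=100)
zero_fraction = float((counts == 0).mean())
```

Under these assumptions, zero_fraction lands slightly above dropout_ratio, because the negative binomial itself occasionally produces a zero.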

Parameters:

batch_size (int (default: 200)) – The number of cells per batch. The total number of cells in the data is batch_size * n_batches.
n_genes (int (default: 100)) – The number of genes to generate.
n_proteins (int (default: 100)) – The number of proteins to generate.
n_regions (int (default: 100)) – The number of accessibility regions to generate.
n_batches (int (default: 2)) – The number of batches to generate.
n_labels (int (default: 3)) – The number of cell type labels, distributed uniformly across batches.
dropout_ratio (float (default: 0.7)) – The expected fraction of zeros artificially added to the RNA and accessibility data.
sparse_format (str | None (default: None)) –
Whether to store RNA, accessibility, and protein data as sparse arrays. One of the following:
- None: Store as a dense numpy.ndarray.
- "csr_matrix": Store as a scipy.sparse.csr_matrix.
- "csc_matrix": Store as a scipy.sparse.csc_matrix.
generate_coordinates (bool (default: False)) – Whether to generate spatial coordinates for the cells.
return_mudata (bool (default: False)) – Whether to return a MuData object. If False, an AnnData object is returned.
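The size parameters above determine the dimensions of each generated matrix. The sketch below works through that arithmetic in plain Python. The helper expected_shapes is hypothetical and not part of scvi-tools, and it assumes the usual AnnData layout of cells in rows and features in columns.

```python
def expected_shapes(batch_size=200, n_genes=100, n_proteins=100,
                    n_regions=100, n_batches=2):
    """Return the (cells, features) shape expected for each modality.

    Hypothetical helper for illustration only.
    """
    # Total cells: batch_size cells in each of n_batches batches.
    n_cells = batch_size * n_batches
    return {
        "rna": (n_cells, n_genes),
        "protein_expression": (n_cells, n_proteins),
        "accessibility": (n_cells, n_regions),
    }


shapes = expected_shapes()
small = expected_shapes(batch_size=50, n_genes=20, n_batches=3)
```

With the default arguments this gives 400 cells in total, so each modality is expected to have 400 rows.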

Return type:

AnnData | MuData

Returns:

AnnData (if return_mudata=False) with the following fields:

.obs["batch"]: Categorical batch labels in the format batch_{i}.
.obs["labels"]: Categorical cell type labels in the format label_{i}.
.obsm["protein_expression"]: Protein expression matrix.
.uns["protein_names"]: Array of protein names.
.obsm["accessibility"]: Accessibility expression matrix.
.obsm["coordinates"]: Spatial coordinates for the cells if generate_coordinates is True.

MuData (if return_mudata=True) with the following fields:

.obs["batch"]: Categorical batch labels in the format batch_{i}.
.obs["labels"]: Categorical cell type labels in the format label_{i}.
.mod["rna"]: RNA expression data.
.mod["protein_expression"]: Protein expression data.
.mod["accessibility"]: Accessibility expression data.
.obsm["coordinates"]: Spatial coordinates for the cells if generate_coordinates is True.

Examples

>>> import scvi
>>> adata = scvi.data.synthetic_iid()
