scvi.data.synthetic_iid


scvi.data.synthetic_iid(batch_size=200, n_genes=100, n_proteins=100, n_regions=100, n_batches=2, n_labels=3, dropout_ratio=0.7, sparse_format=None, generate_coordinates=False, return_mudata=False)

Synthetic multimodal dataset.

RNA and accessibility data are generated from a zero-inflated negative binomial (ZINB) distribution, while protein data are generated from a negative binomial distribution. Every value is independent and identically distributed. This dataset is intended only for testing and is not meant for modeling or research.
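
To make the generative description concrete, the following is a minimal standard-library sketch of drawing i.i.d. zero-inflated negative binomial counts. It is not the scvi-tools implementation: the helper names, the negative binomial parameters (5 successes, success probability 0.3), and the seed are assumptions chosen only for illustration.

```python
import random


def sample_negative_binomial(rng, n_successes, p_success):
    """Count failures observed before `n_successes` successes occur."""
    failures = 0
    successes = 0
    while successes < n_successes:
        if rng.random() < p_success:
            successes += 1
        else:
            failures += 1
    return failures


def sample_zinb_matrix(n_cells, n_features, dropout_ratio, seed=0):
    """Draw an n_cells x n_features list-of-lists of i.i.d. ZINB counts.

    Illustrative only: the parameters 5 and 0.3 are assumed values.
    """
    rng = random.Random(seed)
    matrix = []
    for _ in range(n_cells):
        row = []
        for _ in range(n_features):
            count = sample_negative_binomial(rng, n_successes=5, p_success=0.3)
            # With probability dropout_ratio, replace the count with an artificial zero.
            if rng.random() < dropout_ratio:
                count = 0
            row.append(count)
        matrix.append(row)
    return matrix


counts = sample_zinb_matrix(n_cells=400, n_features=100, dropout_ratio=0.7)
zero_fraction = sum(v == 0 for row in counts for v in row) / (400 * 100)
print(f"shape: {len(counts)} x {len(counts[0])}, zero fraction: {zero_fraction:.2f}")
```

Because every entry is drawn the same way and independently of all others, the matrix carries no batch or cell type structure, which is why data of this kind suits tests rather than modeling.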

Parameters:
  • batch_size (int (default: 200)) – The number of cells per batch such that the total number of cells in the data is batch_size * n_batches.

  • n_genes (int (default: 100)) – The number of genes to generate.

  • n_proteins (int (default: 100)) – The number of proteins to generate.

  • n_regions (int (default: 100)) – The number of accessibility regions to generate.

  • n_batches (int (default: 2)) – The number of batches to generate.

  • n_labels (int (default: 3)) – The number of cell type labels, distributed uniformly across batches.

  • sparse – Whether to store ZINB generated data as a scipy.sparse.csr_matrix. This name does not appear in the signature above; see sparse_format.

  • dropout_ratio (float (default: 0.7)) – The expected fraction of zeros artificially added to the RNA and accessibility data.

  • sparse_format (str | None (default: None)) –

    Whether to store RNA, accessibility, and protein data as sparse arrays. One of the following:

  • generate_coordinates (bool (default: False)) – Whether to generate spatial coordinates for the cells.

  • return_mudata (bool (default: False)) – If True, returns a MuData object; otherwise returns an AnnData object.
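
The size parameters above jointly determine how large the generated data is. A minimal sketch of that arithmetic follows, assuming each modality is stored as a cells-by-features matrix. The helper name expected_shapes is hypothetical and is not part of scvi-tools.

```python
def expected_shapes(batch_size=200, n_genes=100, n_proteins=100, n_regions=100, n_batches=2):
    """Return the (cells, features) shape implied for each modality.

    Hypothetical helper for illustration; it does not call scvi-tools.
    """
    # Total cells = cells per batch * number of batches.
    n_cells = batch_size * n_batches
    return {
        "rna": (n_cells, n_genes),
        "protein_expression": (n_cells, n_proteins),
        "accessibility": (n_cells, n_regions),
    }


shapes = expected_shapes(batch_size=50, n_batches=4, n_genes=20)
print(shapes)
```

With the defaults, this arithmetic gives 200 * 2 = 400 cells in total.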

Return type:

Union[AnnData, MuData]

Returns:

AnnData (if return_mudata=False) with the following fields:

  • .obs["batch"]: Categorical batch labels in the format batch_{i}.

  • .obs["labels"]: Categorical cell type labels in the format label_{i}.

  • .obsm["protein_expression"]: Protein expression matrix.

  • .uns["protein_names"]: Array of protein names.

  • .obsm["accessibility"]: Accessibility expression matrix.

  • .obsm["coordinates"]: Spatial coordinates for the cells if generate_coordinates is True.

MuData (if return_mudata=True) with the following fields:

  • .obs["batch"]: Categorical batch labels in the format batch_{i}.

  • .obs["labels"]: Categorical cell type labels in the format label_{i}.

  • .mod["rna"]: RNA expression data.

  • .mod["protein_expression"]: Protein expression data.

  • .mod["accessibility"]: Accessibility expression data.

  • .obsm["coordinates"]: Spatial coordinates for the cells if generate_coordinates is True.

Examples

>>> import scvi
>>> adata = scvi.data.synthetic_iid()
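
The fields listed under Returns can be inspected with a small helper. The sketch below is an illustration under stated assumptions: summarize_fields is a hypothetical name, and the stand-in object only mimics the documented .obs, .obsm, and .uns keys so that the block runs without scvi-tools installed. The zero-based indices in the batch and label names and the protein names are placeholders.

```python
from types import SimpleNamespace


def summarize_fields(adata):
    """Summarize the documented fields of an AnnData-like object.

    Hypothetical helper; it relies only on the keys listed under Returns.
    """
    return {
        "batches": sorted(set(adata.obs["batch"])),
        "labels": sorted(set(adata.obs["labels"])),
        "obsm_keys": sorted(adata.obsm.keys()),
        "n_proteins": len(adata.uns["protein_names"]),
        # Only present when generate_coordinates=True was requested.
        "has_coordinates": "coordinates" in adata.obsm,
    }


# Stand-in with 4 cells; the values are placeholders, not real output.
stand_in = SimpleNamespace(
    obs={
        "batch": ["batch_0", "batch_0", "batch_1", "batch_1"],
        "labels": ["label_0", "label_1", "label_2", "label_0"],
    },
    obsm={
        "protein_expression": [[0, 0, 0] for _ in range(4)],
        "accessibility": [[0, 0, 0, 0, 0] for _ in range(4)],
    },
    uns={"protein_names": ["p0", "p1", "p2"]},  # placeholder names
)

summary = summarize_fields(stand_in)
print(summary)
```

With scvi-tools installed, the same helper should also accept the adata object created in the example above, since it reads only the documented keys.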