scvi.data.synthetic_iid
- scvi.data.synthetic_iid(batch_size=200, n_genes=100, n_proteins=100, n_regions=100, n_batches=2, n_labels=3, dropout_ratio=0.7, sparse_format=None, return_mudata=False)
Synthetic multimodal dataset.
RNA and accessibility data are generated from a zero-inflated negative binomial, while protein data is generated from a negative binomial distribution. This dataset is just for testing purposes and not meant for modeling or research. Each value is independently and identically distributed.
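As a rough illustration of the sampling scheme described above, the following NumPy sketch draws negative binomial counts and then zeroes out entries independently at the dropout rate. It is not the library's implementation: the helper name `sample_zinb` and the `mean` and `inverse_dispersion` parameters are assumptions made for this example only.

```python
import numpy as np


def sample_zinb(n_cells, n_features, dropout_ratio=0.7,
                mean=5.0, inverse_dispersion=5.0, seed=0):
    """Hypothetical helper: i.i.d. zero-inflated negative binomial counts."""
    rng = np.random.default_rng(seed)
    # NumPy parametrizes the negative binomial by (n, p); this choice of p
    # gives the distribution the requested mean.
    p = inverse_dispersion / (inverse_dispersion + mean)
    counts = rng.negative_binomial(inverse_dispersion, p,
                                   size=(n_cells, n_features))
    # Each entry is kept with probability 1 - dropout_ratio, otherwise zeroed.
    keep = rng.binomial(1, 1.0 - dropout_ratio, size=counts.shape)
    return counts * keep


# 2 batches of 200 cells each, 100 genes (mirrors the documented defaults).
rna = sample_zinb(n_cells=400, n_features=100)
print(rna.shape)
print("fraction of zeros:", (rna == 0).mean())
```

Because the negative binomial itself also produces some zeros, the observed fraction of zeros is typically slightly above `dropout_ratio`.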
- Parameters
batch_size (int (default: 200)) – The number of cells per batch, such that the total number of cells in the data is batch_size * n_batches.
n_genes (int (default: 100)) – The number of genes to generate.
n_proteins (int (default: 100)) – The number of proteins to generate.
n_regions (int (default: 100)) – The number of accessibility regions to generate.
n_batches (int (default: 2)) – The number of batches to generate.
n_labels (int (default: 3)) – The number of cell type labels, distributed uniformly across batches.
sparse – Whether to store ZINB generated data as a scipy.sparse.csr_matrix.
dropout_ratio (float (default: 0.7)) – The expected proportion of zeros artificially added into the RNA and accessibility data.
sparse_format (Optional[str] (default: None)) – Whether to store RNA, accessibility, and protein data as sparse arrays. One of the following:
    None: Store as a dense numpy.ndarray.
    "csr_matrix": Store as a scipy.sparse.csr_matrix.
    "csc_matrix": Store as a scipy.sparse.csc_matrix.
return_mudata (bool (default: False)) – Returns a MuData if True, else AnnData.
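The three storage options listed for sparse_format can be pictured with a small SciPy sketch. The helper `to_format` below is hypothetical and only mirrors the documented option names; it makes no claim about how the library performs the conversion internally.

```python
import numpy as np
from scipy.sparse import csc_matrix, csr_matrix


def to_format(dense, sparse_format=None):
    """Hypothetical helper mapping the documented option names to containers."""
    if sparse_format is None:
        return dense  # dense numpy.ndarray, unchanged
    if sparse_format == "csr_matrix":
        return csr_matrix(dense)  # compressed sparse row
    if sparse_format == "csc_matrix":
        return csc_matrix(dense)  # compressed sparse column
    raise ValueError(f"Unsupported sparse_format: {sparse_format!r}")


dense = np.array([[0, 3, 0],
                  [0, 0, 0],
                  [1, 0, 2]])
csr = to_format(dense, "csr_matrix")
csc = to_format(dense, "csc_matrix")

# Both sparse containers store only the three non-zero entries.
print(type(to_format(dense)).__name__, csr.format, csc.format, csr.nnz)
```

Row-compressed storage tends to suit row slicing (selecting cells), while column-compressed storage tends to suit column slicing (selecting genes or regions).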
- Return type
AnnData or MuData
- Returns
AnnData (if return_mudata=False) with the following fields:
    .obs["batch"]: Categorical batch labels in the format batch_{i}.
    .obs["labels"]: Categorical cell type labels in the format label_{i}.
    .obsm["protein_expression"]: Protein expression matrix.
    .uns["protein_names"]: Array of protein names.
    .obsm["accessibility"]: Accessibility expression matrix.
MuData (if return_mudata=True) with the following fields:
    .obs["batch"]: Categorical batch labels in the format batch_{i}.
    .obs["labels"]: Categorical cell type labels in the format label_{i}.
    .mod["rna"]: RNA expression data.
    .mod["protein_expression"]: Protein expression data.
    .mod["accessibility"]: Accessibility expression data.
Examples
>>> import scvi
>>> adata = scvi.data.synthetic_iid()