scvi.data.setup_anndata

scvi.data.setup_anndata(adata, batch_key=None, labels_key=None, layer=None, protein_expression_obsm_key=None, protein_names_uns_key=None, categorical_covariate_keys=None, continuous_covariate_keys=None, copy=False)

Sets up AnnData object for models.

A mapping is created between the data fields used by models and their respective locations in adata. This method also computes the mean and variance of the log library size per batch, which are used for the library size prior.

None of the existing data in adata is modified; this method only adds fields to adata.
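
The per-batch library size statistics described above can be pictured with a small pure-Python sketch. This is not the library's implementation: the helper name library_size_prior is hypothetical, and the choices to take the natural log of each cell's total count and to use the population variance are assumptions made for illustration. It also assumes every cell has at least one count.

```python
import math
from collections import defaultdict


def library_size_prior(counts, batches):
    """Return {batch: (mean, variance)} of the log total counts per cell.

    Hypothetical helper for illustration only; assumes each row of
    `counts` sums to a positive number.
    """
    # Group the log library size of every cell by its batch label.
    log_sizes = defaultdict(list)
    for row, batch in zip(counts, batches):
        log_sizes[batch].append(math.log(sum(row)))

    prior = {}
    for batch, values in log_sizes.items():
        mean = sum(values) / len(values)
        # Population variance (divide by n); an assumption of this sketch.
        var = sum((v - mean) ** 2 for v in values) / len(values)
        prior[batch] = (mean, var)
    return prior


# Three cells, three genes, two batches.
counts = [[1, 2, 3], [4, 4, 4], [10, 5, 5]]
batches = ["a", "a", "b"]
prior = library_size_prior(counts, batches)
```

In this sketch every cell in a batch shares the same mean and variance, which matches the idea of storing one value per cell in adata.obs.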

Parameters
adata : AnnData

AnnData object containing raw counts. Rows represent cells, columns represent features.

batch_key : Optional[str] (default: None)

Key in adata.obs for batch information. Categories are automatically converted into integer categories and saved to adata.obs['_scvi_batch']. If None, the same batch is assigned to all the data.

labels_key : Optional[str] (default: None)

Key in adata.obs for label information. Categories are automatically converted into integer categories and saved to adata.obs['_scvi_labels']. If None, the same label is assigned to all the data.

layer : Optional[str] (default: None)

If not None, uses this as the key in adata.layers for raw count data. If None, the counts are read from adata.X.

protein_expression_obsm_key : Optional[str] (default: None)

Key in adata.obsm for protein expression data. Required for TOTALVI.

protein_names_uns_key : Optional[str] (default: None)

Key in adata.uns for protein names. If None, the column names of adata.obsm[protein_expression_obsm_key] are used if it is a DataFrame; otherwise sequential names are assigned to the proteins. Only relevant for TOTALVI, where it is optional.

categorical_covariate_keys : Optional[List[str]] (default: None)

Keys in adata.obs that correspond to categorical data. Used in some models.

continuous_covariate_keys : Optional[List[str]] (default: None)

Keys in adata.obs that correspond to continuous data. Used in some models.

copy : bool (default: False)

If True, a copy of adata is returned.

Return type

Optional[AnnData]

Returns

If copy is True, returns the AnnData copy with the fields added; otherwise returns None and adds the fields to adata in place. The following fields are added:

.uns['_scvi']

scvi setup dictionary

.obs['_local_l_mean']

per-batch library size mean

.obs['_local_l_var']

per-batch library size variance

.obs['_scvi_labels']

labels encoded as integers

.obs['_scvi_batch']

batch encoded as integers
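
The integer encoding stored in _scvi_batch and _scvi_labels can be pictured with the following pure-Python sketch. It is not the library's code: the helper encode_categories is hypothetical, and assigning codes in sorted order of the category names is an assumption made for illustration.

```python
def encode_categories(values):
    """Map each distinct category to an integer code.

    Hypothetical helper for illustration only; codes follow the sorted
    order of the category names, which is an assumption of this sketch.
    """
    categories = sorted(set(values))
    mapping = {category: code for code, category in enumerate(categories)}
    codes = [mapping[value] for value in values]
    return codes, mapping


# One batch label per cell, as it might appear in adata.obs["batch"].
codes, mapping = encode_categories(["batch_1", "batch_0", "batch_1"])
```

Keeping the mapping alongside the codes makes it possible to translate the integers back to the original category names later.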

Examples

Example: setting up a synthetic AnnData dataset with random gene data and no batch or label information

>>> import scanpy as sc
>>> import scvi
>>> import numpy as np
>>> adata = scvi.data.synthetic_iid(run_setup_anndata=False)
>>> adata
AnnData object with n_obs × n_vars = 400 × 100
    obs: 'batch', 'labels'
    uns: 'protein_names'
    obsm: 'protein_expression'

Filter cells and run preprocessing before setup_anndata

>>> sc.pp.filter_cells(adata, min_counts=0)

Since neither batch_key nor labels_key was passed, setup_anndata() will assume all cells have the same batch and label

>>> scvi.data.setup_anndata(adata)
INFO      No batch_key inputted, assuming all cells are same batch
INFO      No label_key inputted, assuming all cells have same label
INFO      Using data from adata.X
INFO      Computing library size prior per batch
INFO      Registered keys:['X', 'batch_indices', 'local_l_mean', 'local_l_var', 'labels']
INFO      Successfully registered anndata object containing 400 cells, 100 vars, 1 batches, 1 labels, and 0 proteins. Also registered 0 extra categorical covariates and 0 extra continuous covariates.

Example: setting up a synthetic AnnData dataset with random gene data, batch information, and protein expression

>>> adata = scvi.data.synthetic_iid(run_setup_anndata=False)
>>> scvi.data.setup_anndata(adata, batch_key='batch', protein_expression_obsm_key='protein_expression')
INFO      Using batches from adata.obs["batch"]
INFO      No label_key inputted, assuming all cells have same label
INFO      Using data from adata.X
INFO      Computing library size prior per batch
INFO      Using protein expression from adata.obsm['protein_expression']
INFO      Generating sequential protein names
INFO      Registered keys:['X', 'batch_indices', 'local_l_mean', 'local_l_var', 'labels', 'protein_expression']
INFO      Successfully registered anndata object containing 400 cells, 100 vars, 2 batches, 1 labels, and 100 proteins. Also registered 0 extra categorical covariates and 0 extra continuous covariates.