New in 0.15.0 (2022-02-28)

In this release, we have completely refactored the logic behind our data handling strategy (i.e., setup_anndata) to allow for:

  1. Readable data handling for existing models.

  2. Modular code for easy addition of custom data fields to incorporate into models.

  3. Avoidance of unexpected edge cases when more than one model is instantiated in one session.

Important Note: This change will not break pipelines for model users (with the exception of a small change to SCANVI). However, there are several breaking changes for model developers. The data handling tutorial goes over these changes in detail.

This refactor is centered around the new AnnDataManager class, which orchestrates any data processing necessary for scvi-tools and stores the resulting setup information on the manager itself, rather than adding additional fields to the AnnData input.
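The core idea, keeping setup information on a manager object instead of writing it into the input data, can be illustrated with a toy example. This is a minimal sketch under stated assumptions: ToyDataManager and its methods are hypothetical and only echo the names data_registry, summary_stats and get_from_registry mentioned in these notes; a plain dict stands in for an AnnData object. It also shows why the design avoids edge cases when more than one model is set up in one session: two managers with different setups on the same data do not overwrite each other.

```python
from copy import deepcopy


class ToyDataManager:
    """Hypothetical stand-in for the AnnDataManager idea (not the scvi-tools API).

    Setup information lives on the manager instead of being written back into
    the input data, so several managers can describe the same data at once.
    """

    def __init__(self, fields):
        # fields: registry key -> name of the column in the input data
        self.fields = dict(fields)
        self.data_registry = {}
        self.summary_stats = {}

    def register_fields(self, data):
        for registry_key, column in self.fields.items():
            # Record where the field lives; the input itself is never modified.
            self.data_registry[registry_key] = column
            self.summary_stats[f"n_{registry_key}"] = len(set(data[column]))
        return self

    def get_from_registry(self, data, registry_key):
        # Look the field up through this manager's own registry.
        return data[self.data_registry[registry_key]]


# Two "models" set up differently on the same data do not overwrite each other.
data = {"batch": ["a", "b", "a"], "labels": ["x", "x", "y"]}
snapshot = deepcopy(data)
manager_1 = ToyDataManager({"batch": "batch"}).register_fields(data)
manager_2 = ToyDataManager({"batch": "labels"}).register_fields(data)
```

Because each manager owns its registry, the input is left exactly as it was, and each model resolves the same registry key ("batch") to its own column.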

Schematic of data handling strategy with AnnDataManager

We also have an exciting new experimental Jax-based scVI implementation, JaxSCVI. While this implementation has limited functionality, we have found it to be substantially faster than the PyTorch-based implementation: for example, on a 10-core Intel CPU, Jax running on the CPU alone can be as fast as PyTorch on a GPU (RTX3090). We are planning further Jax integrations in upcoming releases.

Changes

Breaking changes

  • The keyword argument run_setup_anndata has been removed from built-in datasets since there is no longer a model-agnostic setup_anndata method (#1237).

  • The function scvi.model._metrics.clustering_scores has been removed due to incompatibility with the new data handling (#1237).

  • SCANVI now takes unlabeled_category as an argument to setup_anndata() rather than on initialization (#1237).

  • setup_anndata is now a class method on model classes and requires specific function calls to ensure proper AnnDataManager setup and model save/load. Any model inheriting from BaseModelClass will need to re-implement this method (#1237).

    • To adapt existing custom models to v0.15.0, one can reference the guidelines below. For examples of how this was done for the existing models in the codebase, please see the following PRs: (#1301, #1302).

        1. scvi._CONSTANTS has been changed to scvi.REGISTRY_KEYS.

        2. setup_anndata() functions are now class methods and follow a specific structure. Please refer to setup_anndata() for an example.

        3. scvi.data.get_from_registry() has been removed. This method can be replaced by scvi.data.AnnDataManager.get_from_registry().

        4. The setup dict stored directly on the AnnData object, adata["_scvi"], has been deprecated. Instead, this information now lives in scvi.data.AnnDataManager.registry.

            • The data registry can be accessed at scvi.data.AnnDataManager.data_registry.

            • Summary stats can be accessed at scvi.data.AnnDataManager.summary_stats.

            • Any field-specific information (e.g. adata.obs["categorical_mappings"]) now lives in field-specific state registries. These can be retrieved via the function get_state_registry().

        5. register_tensor_from_anndata() has been removed. To register tensors with no relevant AnnDataField subclass, create a new subclass of BaseAnnDataField and add it to the appropriate model's setup_anndata() function.
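The move of setup_anndata to a class method, together with SCANVI's unlabeled_category moving from initialization into setup, can be pictured with a toy model. This is a minimal sketch under stated assumptions: ToyLabelModel and _setup_store are hypothetical names, a dict stands in for an AnnData object, and keying the store by id(data) is a simplification chosen here. It is not how BaseModelClass is implemented; it only shows the calling pattern of "set up on the class, then construct".

```python
class ToyLabelModel:
    """Hypothetical model showing setup as a class method (not scvi-tools code)."""

    # Class-level store of setup results, keyed by the identity of the data
    # object. A simplification: id() values can be reused after an object is
    # garbage collected.
    _setup_store = {}

    @classmethod
    def setup_anndata(cls, data, labels_key, unlabeled_category):
        labels = data[labels_key]
        # Labeled categories in order of first appearance, minus the unlabeled one.
        categories = [c for c in dict.fromkeys(labels) if c != unlabeled_category]
        cls._setup_store[id(data)] = {
            "labels_key": labels_key,
            "unlabeled_category": unlabeled_category,
            "categories": categories,
        }

    def __init__(self, data):
        # The constructor no longer takes unlabeled_category; it reads the stored setup.
        try:
            self.setup = self._setup_store[id(data)]
        except KeyError:
            raise ValueError("call setup_anndata() on this data first") from None
        self.n_labels = len(self.setup["categories"])


data = {"labels": ["T", "B", "Unknown", "T"]}
ToyLabelModel.setup_anndata(data, labels_key="labels", unlabeled_category="Unknown")
model = ToyLabelModel(data)
```

In scvi-tools itself, unlabeled_category is likewise passed to SCANVI's setup_anndata() rather than to the SCANVI constructor; the toy only mirrors that ordering of calls.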
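Steps 4 and 5 above (field-specific state registries, and registering new data through a field subclass) can be illustrated with a small sketch. Everything here is hypothetical: ToyBaseField, ToyCategoricalField, setup_fields and their method names only mimic the general shape of a BaseAnnDataField subclass and do not reproduce its real abstract methods or signatures. The data handling tutorial describes the actual interface.

```python
from abc import ABC, abstractmethod


class ToyBaseField(ABC):
    """Hypothetical analogue of a BaseAnnDataField-style base class."""

    def __init__(self, registry_key, attr_key):
        self.registry_key = registry_key  # name the model uses for this field
        self.attr_key = attr_key  # column in the input data

    @abstractmethod
    def register_field(self, data):
        """Inspect the data and return this field's state registry (a dict)."""


class ToyCategoricalField(ToyBaseField):
    """Encodes a categorical column as integer codes."""

    def register_field(self, data):
        # Sorted category list, analogous to a stored categorical mapping.
        return {"categorical_mapping": sorted(set(data[self.attr_key]))}

    def transform(self, data, state_registry):
        lookup = {c: i for i, c in enumerate(state_registry["categorical_mapping"])}
        return [lookup[v] for v in data[self.attr_key]]


def setup_fields(data, fields):
    """Hypothetical setup step: one state registry per field, keyed by registry key."""
    return {field.registry_key: field.register_field(data) for field in fields}


data = {"cell_type": ["b", "a", "b"]}
labels_field = ToyCategoricalField(registry_key="labels", attr_key="cell_type")
state_registries = setup_fields(data, [labels_field])
codes = labels_field.transform(data, state_registries["labels"])
```

Keeping each field's state in its own registry means a new kind of data only requires a new field subclass added to a model's setup step, rather than edits to a shared setup dict.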

Contributors