New in 0.15.0 (2022-02-28)
In this release, we have completely refactored the logic behind our data handling strategy (i.e. setup_anndata) to allow for:
Readable data handling for existing models.
Modular code for easy addition of custom data fields to incorporate into models.
Avoidance of unexpected edge cases when more than one model is instantiated in one session.
Important Note: This change will not break pipelines for model users (with the exception of a small change to SCANVI).
However, there are several breaking changes for model developers. The data handling tutorial goes over these changes in detail.
This refactor is centered around the new AnnDataManager class, which orchestrates any data processing necessary for scvi-tools and stores the necessary information itself, rather than adding additional fields to the AnnData input.
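To make the idea concrete, here is a minimal, self-contained sketch of the pattern described above. ToyAnnData and ToyDataManager are hypothetical stand-ins, not scvi-tools classes, and the sketch only assumes the general design stated here: each manager reads from the same data object but keeps its own registry. As a result, two models set up with different batch columns cannot overwrite each other's setup, and the data object itself is never modified.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class ToyAnnData:
    """Hypothetical stand-in for AnnData: a count matrix plus per-cell metadata columns."""
    X: List[List[int]]
    obs: Dict[str, List[Any]]


@dataclass
class ToyDataManager:
    """Hypothetical manager in the spirit of AnnDataManager.

    Setup state lives in this object's own registry instead of being written
    back onto the data object, so several managers can share one ToyAnnData.
    """
    adata: ToyAnnData
    registry: Dict[str, Dict[str, Any]] = field(default_factory=dict)

    def register_categorical(self, registry_key: str, obs_key: str) -> None:
        # Encode the obs column as integer codes; the mapping stays in this manager only.
        values = self.adata.obs[obs_key]
        categories = sorted(set(values))
        self.registry[registry_key] = {
            "obs_key": obs_key,
            "mapping": categories,
            "codes": [categories.index(v) for v in values],
        }

    def get_from_registry(self, registry_key: str) -> List[int]:
        return self.registry[registry_key]["codes"]


adata = ToyAnnData(
    X=[[1, 0], [0, 2], [3, 1]],
    obs={"sample": ["s2", "s1", "s2"], "donor": ["d1", "d1", "d2"]},
)

# Two "models" on the same data, each using a different column as its batch.
manager_a = ToyDataManager(adata)
manager_a.register_categorical("batch", obs_key="sample")
manager_b = ToyDataManager(adata)
manager_b.register_categorical("batch", obs_key="donor")

print(manager_a.get_from_registry("batch"))  # [1, 0, 1]
print(manager_b.get_from_registry("batch"))  # [0, 0, 1]
print(sorted(adata.obs))  # ['donor', 'sample']: adata.obs was not modified
```

Keeping the registry on the manager rather than on the data object is what lets the same data be set up differently for different models in one session.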
We also have an exciting new experimental Jax-based scVI implementation via JaxSCVI. While this implementation has limited functionality, we have found it to be substantially faster than the PyTorch-based implementation. For example, on a 10-core Intel CPU, Jax running on the CPU alone can be as fast as PyTorch with a GPU (RTX3090). We plan further Jax integrations in upcoming releases.
Changes
Major refactor to data handling strategy with the introduction of AnnDataManager (#1237).
Prevent clobbering between models that use the same AnnData object by giving each model instance its own AnnDataManager mappings (#1342).
Add size_factor_key to SCVI, MULTIVI, SCANVI, and TOTALVI (#1334).
Add references to the scvi-tools journal publication to the README (#1338, #1339).
Addition of scvi.model.utils.mde() (#1372) for accelerated visualization of scvi-tools embeddings.
Furo docs theme (#1290).
Add scvi.model.JaxSCVI and scvi.module.JaxVAE, and drop the Numba dependency for checking whether data is count data (#1367).
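As a rough illustration of what size_factor_key affects: scVI-style models scale the decoder's normalized (per-gene proportion) output by a per-cell library size. The sketch below assumes that a value supplied through size_factor_key takes the place of the observed total count in that scaling. expected_counts is a hypothetical helper on plain Python lists, not scvi-tools code; real models operate on tensors and can also infer the library size instead of using the observed one.

```python
import math
from typing import List, Optional


def expected_counts(
    normalized_expression: List[float],
    observed_counts: List[int],
    size_factor: Optional[float] = None,
) -> List[float]:
    """Scale one cell's normalized expression by a per-cell library term.

    Simplified sketch: the library term is kept in log space, as in scVI-style
    decoders, and exponentiated before scaling.
    """
    if size_factor is not None:
        # Assumption: a user-provided size factor (what size_factor_key points to) is used directly.
        log_library = math.log(size_factor)
    else:
        # Assumed fallback: the observed library size (total counts in the cell).
        log_library = math.log(sum(observed_counts))
    return [math.exp(log_library) * p for p in normalized_expression]


proportions = [0.5, 0.25, 0.25]  # normalized expression for one cell (sums to 1)
counts = [2, 1, 1]  # observed counts for the same cell

print(expected_counts(proportions, counts))  # approximately [2.0, 1.0, 1.0]
print(expected_counts(proportions, counts, size_factor=2.0))  # approximately [1.0, 0.5, 0.5]
```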
Breaking changes
The keyword argument run_setup_anndata has been removed from built-in datasets since there is no longer a model-agnostic setup_anndata method (#1237).
The function scvi.model._metrics.clustering_scores has been removed due to incompatibility with the new data handling (#1237).
SCANVI now takes unlabeled_category as an argument to setup_anndata() rather than on initialization (#1237).
setup_anndata is now a class method on model classes and requires specific function calls to ensure proper AnnDataManager setup and model save/load. Any model inheriting from BaseModelClass will need to re-implement this method (#1237).
To adapt existing custom models to v0.15.0, one can reference the guidelines below. For examples of how this was done for the existing models in the codebase, please refer to the following PRs: #1301, #1302.
1. scvi._CONSTANTS has been changed to scvi.REGISTRY_KEYS.
2. setup_anndata() functions are now class functions and follow a specific structure. Please refer to setup_anndata() for an example.
3. scvi.data.get_from_registry() has been removed. This method can be replaced by scvi.data.AnnDataManager.get_from_registry().
4. The setup dict stored directly on the AnnData object, adata["_scvi"], has been deprecated. Instead, this information now lives in scvi.data.AnnDataManager.registry.
   - The data registry can be accessed at scvi.data.AnnDataManager.data_registry.
   - Summary stats can be accessed at scvi.data.AnnDataManager.summary_stats.
   - Any field-specific information (e.g. adata.obs["categorical_mappings"]) now lives in field-specific state registries. These can be retrieved via the function get_state_registry().
5. register_tensor_from_anndata() has been removed. To register tensors with no relevant AnnDataField subclass, create a new subclass of BaseAnnDataField and add it to the appropriate model's setup_anndata() function.
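The migration steps above can be pictured with a small self-contained mock. Everything in it (REGISTRY_KEYS_SKETCH, BaseFieldSketch and its subclasses, ManagerSketch, ToyModelSketch, and the plain dict standing in for an AnnData object) is hypothetical. It only mirrors the shape of the pattern: a class-method setup_anndata() declares a list of fields, hands them to a manager that records each field's location and state registry, and stores that manager for the model to pick up at initialization. NumericObsFieldSketch plays the role of step 5, a new field subclass for data that no existing field covers. This is a sketch under these assumptions, not scvi-tools' actual implementation; the PRs listed above show the real structure.

```python
from typing import Any, Dict, List, NamedTuple

AnnDataLike = Dict[str, Any]  # hypothetical stand-in: {"X": matrix, "obs": {column: values}}


class _RegistryKeys(NamedTuple):
    """Hypothetical registry-key constants (the role scvi.REGISTRY_KEYS plays)."""
    X_KEY: str = "X"
    BATCH_KEY: str = "batch"
    QC_KEY: str = "qc_score"


REGISTRY_KEYS_SKETCH = _RegistryKeys()


class BaseFieldSketch:
    """Hypothetical field base class: knows where its data lives and what state to record."""

    def __init__(self, registry_key: str, attr_key: str) -> None:
        self.registry_key = registry_key
        self.attr_key = attr_key

    def get_field_data(self, adata: AnnDataLike) -> Any:
        return adata["obs"][self.attr_key]

    def register_field(self, adata: AnnDataLike) -> Dict[str, Any]:
        return {}  # subclasses return their field-specific state registry


class LayerFieldSketch(BaseFieldSketch):
    def get_field_data(self, adata: AnnDataLike) -> Any:
        return adata[self.attr_key]


class CategoricalObsFieldSketch(BaseFieldSketch):
    def register_field(self, adata: AnnDataLike) -> Dict[str, Any]:
        return {"categorical_mapping": sorted(set(adata["obs"][self.attr_key]))}

    def get_field_data(self, adata: AnnDataLike) -> List[int]:
        mapping = self.register_field(adata)["categorical_mapping"]
        return [mapping.index(v) for v in adata["obs"][self.attr_key]]


class NumericObsFieldSketch(BaseFieldSketch):
    """A new field subclass for data that no existing field type covers (step 5)."""

    def register_field(self, adata: AnnDataLike) -> Dict[str, Any]:
        values = adata["obs"][self.attr_key]
        return {"min": min(values), "max": max(values)}


class ManagerSketch:
    """Hypothetical manager: keeps registries on itself and never writes to adata."""

    def __init__(self, fields: List[BaseFieldSketch]) -> None:
        self.fields = {f.registry_key: f for f in fields}
        self.state_registries: Dict[str, Dict[str, Any]] = {}
        self.summary_stats: Dict[str, int] = {}
        self.adata: AnnDataLike = {}

    def register_fields(self, adata: AnnDataLike) -> None:
        self.adata = adata
        for key, f in self.fields.items():
            self.state_registries[key] = f.register_field(adata)
        self.summary_stats = {"n_cells": len(adata["X"]), "n_vars": len(adata["X"][0])}

    def get_from_registry(self, registry_key: str) -> Any:
        # Data is fetched through the registered field, not from a dict stored on adata.
        return self.fields[registry_key].get_field_data(self.adata)

    def get_state_registry(self, registry_key: str) -> Dict[str, Any]:
        return self.state_registries[registry_key]


class ToyModelSketch:
    # Simplification: managers are kept on the class, keyed by id(adata).
    _managers: Dict[int, ManagerSketch] = {}

    @classmethod
    def setup_anndata(cls, adata: AnnDataLike, batch_key: str, qc_key: str) -> None:
        fields = [
            LayerFieldSketch(REGISTRY_KEYS_SKETCH.X_KEY, "X"),
            CategoricalObsFieldSketch(REGISTRY_KEYS_SKETCH.BATCH_KEY, batch_key),
            NumericObsFieldSketch(REGISTRY_KEYS_SKETCH.QC_KEY, qc_key),
        ]
        manager = ManagerSketch(fields)
        manager.register_fields(adata)
        cls._managers[id(adata)] = manager

    def __init__(self, adata: AnnDataLike) -> None:
        # The model picks up the manager that setup_anndata created for this adata.
        self.adata_manager = self._managers[id(adata)]


adata = {"X": [[1, 0], [0, 2], [3, 1]], "obs": {"sample": ["b", "a", "b"], "qc": [0.9, 0.5, 0.7]}}
ToyModelSketch.setup_anndata(adata, batch_key="sample", qc_key="qc")
mgr = ToyModelSketch(adata).adata_manager
print(mgr.get_from_registry(REGISTRY_KEYS_SKETCH.BATCH_KEY))  # [1, 0, 1]
print(mgr.get_state_registry(REGISTRY_KEYS_SKETCH.QC_KEY))  # {'min': 0.5, 'max': 0.9}
```

Because setup happens in a class method and the resulting state lives on the manager, a saved model only needs the manager's registries (not a dict attached to the data) to be reconstructed later.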