Add support for AnnData 0.10.0 #2271.
Add support for Python 3.11 #1977.
Upper bound Chex dependency to 0.1.8 due to NumPy installation conflicts #2132.
Add scBasset motif injection procedure #2010.
Add importance sampling based differential expression procedure #1872.
Log training loss through Lightning’s progress bar #2043.
Filter Jax undetected GPU warnings #2044.
Raise warning if MPS backend is selected for PyTorch, see https://github.com/pytorch/pytorch/issues/77764 #2045.
Add lower bound 0.12.1 for Numpyro dependency #2078.
Add new section in scBasset tutorial for motif scoring #2079.
Fix creation of minified adata by copying original uns dict #2000. This issue arises with anndata>=0.9.0.
Fix a bug where mu arguments were switched around #2024.
Fix bug in scvi.dataloaders.SemiSupervisedDataLoader.resample_labels() where the labeled dataloader was not being reinitialized on subsample #2032.
Use sphinx book theme for documentation #1673.
scvi.model.base.RNASeqMixin.posterior_predictive_sample() now outputs 3-d sparse tensors.
Update to Lightning 2.0 #1961.
Hyperopt is the new default searcher for the tuner #1961.
Switch back to using sphinx autodoc typehints #1970.
Disable the default seed; run scvi.settings.seed after import for reproducibility #1976.
Deprecate use_gpu in favor of the PyTorch Lightning arguments accelerator and devices, to be removed in v1.1 #1978.
Docs organization #1983.
Keyword arguments for encoders and decoders can now be passed in from the model level #1986.
Switch to cellxgene-census as the backend for the cellxgene data function #2030.
Refactor the heuristic for default max_epochs as a separate function.
Remove custom reusable doc decorator which was used for DE docs #1970.
Remove seqfish and seqfish plus datasets #2017.
Remove support for Python 3.8 (NEP 29) #2021.
Fix hyperlink to pymde docs #1944.
Use sphinx autodoc instead of sphinx-autodoc-typehints #1941.
Remove .flake8 and .prospector files #1923.
Fixed computation of ELBO during training plan logging when using global KL terms #1895.
New in 0.20.0 (2023-02-01)#
Latent mode support changed so that user data is no longer edited in-place #1756.
Minimum supported Python version is now 3.8 #1819.
Update tutorial formatting with pre-commit #1850.
Development in GitHub Codespaces is now supported #1836.
AnnTorchDataset fixed to work with sparse data #1824.
New in 0.19.0 (2022-10-31)#
All training plans require keyword args after the first positional argument #1749.
Remove confusing warning about KL warmup; log the KL weight instead #1773.
New in 0.18.0 (2022-10-12)#
Add latent mode support in SCVI #1672. This allows for loading a model using latent representations only (i.e., without the full counts). Not only does this speed up inference by using the cached latent distribution parameters (thus skipping the encoding step), but it also helps in scenarios where the full counts are not available but cached latent parameters are. We provide utility functions and methods to dynamically convert a model to latent mode.
Faster inference in PyTorch with torch.inference_mode.
Upgrade to Lightning 1.6 #1719.
Update CI workflow to separate static code checking from pytest #1710.
Add Python 3.10 to CI workflow #1711.
Use sphinxcontrib-bibtex for references #1731.
Replace the custom attrdict with a new implementation.
New in 0.17.4 (2022-09-20)#
Replace instances of super().__init__() with an argument, as the zero-argument form caused the autoreload extension to throw errors #1671.
Change the cell2location tutorial that was causing the docs build to fail #1674.
Use ints for the new PyTorch Lightning #1686.
Catch the case when torch.backends.mps is not implemented #1692.
New in 0.17.3 (2022-08-26)#
New in 0.17.2 (2022-08-26)#
Add a static method on the BaseModelClass to return the AnnDataManger’s full registry #1617.
Clarify docstrings for continuous and categorical covariate keys #1637.
Remove poetry lock, use newer build system #1645.
Fix an issue where max_epochs was never determined heuristically for totalVI; instead it would always default to 400 #1639.
New in 0.17.1 (2022-07-14)#
Make sure notebooks are up to date for real this time :).
New in 0.17.0 (2022-07-14)#
Experimental MuData support for TOTALVI via the method setup_mudata(). For several of the existing AnnDataField classes, there is now a MuData counterpart with an additional mod_key argument used to indicate the modality where the data lives (e.g. MuDataLayerField). These modified classes are simply wrapped versions of the original AnnDataField code via a new wrapper class.
Modification of the generative() method's outputs to return prior and likelihood properties as Distribution objects. Concerned modules include VAEC. This facilitates the manipulation of these distributions for model training and inference #1356.
Major changes to Jax support for scvi-tools models to generalize beyond JaxSCVI. Support for Jax remains experimental and is subject to breaking changes:
Enable basic device management in Jax-backed modules #1585.
Refactor metrics code and use MetricCollection to update metrics in bulk #1529.
Any methods relying on the output of generative from existing scvi-tools models (e.g. SCANVI) will need to be modified to accept torch.Distribution objects rather than tensors for each parameter.
The signature of compute_and_log_metrics() has changed to support the use of MetricCollection. The typical modification required will look like a call to self.compute_and_log_metrics(scvi_loss, self.train_metrics, "train"). The same is necessary for validation metrics, except with self.val_metrics and the mode "validation".
Fix issue with get_normalized_expression() with multiple samples and additional continuous covariates. This bug originated from generative() failing to match the dimensions of the continuous covariates with the input when using multiple samples in inference() in multiple module classes #1548.
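The bulk metric-update pattern behind the MetricCollection change above can be sketched with plain-Python stand-ins (the class names below are illustrative, not the scvi-tools or torchmetrics API):

```python
class RunningMean:
    """Tracks the running mean of the values it is updated with."""
    def __init__(self):
        self.total = 0.0
        self.count = 0

    def update(self, value):
        self.total += value
        self.count += 1

    def compute(self):
        return self.total / self.count


class MetricGroup:
    """Updates a named set of metrics in bulk, one mode at a time."""
    def __init__(self, names):
        self.metrics = {name: RunningMean() for name in names}

    def update(self, **values):
        # One call updates every metric in the group.
        for name, value in values.items():
            self.metrics[name].update(value)

    def compute(self):
        return {name: m.compute() for name, m in self.metrics.items()}


# One group per mode, mirroring self.train_metrics / self.val_metrics.
train_metrics = MetricGroup(["elbo", "kl"])
for elbo, kl in [(120.0, 10.0), (100.0, 8.0)]:  # two minibatches
    train_metrics.update(elbo=elbo, kl=kl)
averages = train_metrics.compute()  # averages over minibatches
```

Grouping metrics this way is what lets a single compute_and_log_metrics-style call handle a whole mode ("train" or "validation") at once.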
New in 0.16.4 (2022-06-14)#
Note: When applying any model using the adversarial training plan (e.g. TOTALVI, MULTIVI), you should make sure to use v0.16.4 instead of v0.16.3 or v0.16.2. This release fixes a critical bug in the training plan.
New in 0.16.3 (2022-06-04)#
Remove the sphinx max version pin and the jinja dependency (#1555).
New in 0.16.2 (2022-05-10)#
New in 0.16.1 (2022-04-22)#
Update scArches Pancreas tutorial, DestVI tutorial (#1520).
Fix issue where load_query_data would not properly add an obs column with the unlabeled category when the labels_key was not present in the query data.
Fix an issue with prepare_query_data() to ensure it does nothing when genes are completely matched (#1520).
New in 0.16.0 (2022-04-12)#
Bug fix in cell type amortization, which leads to on-par performance of the cell type amortization V_encoder with a free parameter for cell type proportions.
neg_log_likelihood_prior is no longer computed on a random subset of single cells; instead, cell-type-specific subclustering is used, with cluster variance var_vprior, cluster mean mean_vprior and cluster mixture proportion mp_vprior used for the computation. This leads to more stable results and faster computation time. Setting from_rna_model() to the expected resolution is critical in this algorithm.
We changed the weighting of the loss on the variances of beta and the prior of eta.
Due to the bug fixes listed above, this version of DestVI is not backwards compatible. Despite instability in training in the outdated version, we were able to reproduce results generated with that code. We therefore do not strictly encourage rerunning old experiments.
We published a new tutorial, which incorporates a new utility package, destvi_utils, that generates exploratory plots of the results of DestVI. We refer to the manual of this package for further documentation.
New in 0.15.5 (2022-04-06)#
New in 0.15.4 (2022-03-28)#
New in 0.15.3 (2022-03-24)#
Fix behavior when continuous_covariate_keys are used.
Fix dataframe rendering in dark mode docs (#1448)
Fix scvi.model.base.ArchesMixin.prepare_query_data() to work cross-device (e.g., model trained on cuda but method used on cpu; see #1451).
New in 0.15.2 (2022-03-15)#
New in 0.15.1 (2022-03-11)#
Fix setup for Flax-based modules (#1403).
Class docs are now one page on docs site (#1415).
Copied AnnData objects are assigned a new uuid and transfer is attempted (#1416).
New in 0.15.0 (2022-02-28)#
In this release, we have completely refactored the logic behind our data handling strategy (i.e.
setup_anndata) to allow for:
Readable data handling for existing models.
Modular code for easy addition of custom data fields to incorporate into models.
Avoidance of unexpected edge cases when more than one model is instantiated in one session.
Important Note: This change will not break pipelines for model users (with one small model-specific exception).
However, there are several breaking changes for model developers. The data handling tutorial goes over these
changes in detail.
This refactor is centered around the new
AnnDataManager class which orchestrates any data processing necessary
for scvi-tools and stores necessary information, rather than adding additional fields to the AnnData input.
We also have an exciting new experimental Jax-based scVI implementation via
JaxSCVI. While this implementation has limited functionality, we have found it to be substantially faster than the PyTorch-based implementation. For example, on a 10-core Intel CPU, Jax on only a CPU can be as fast as PyTorch with a GPU (RTX3090). We will be planning further Jax integrations in the next releases.
Furo docs theme (#1290)
The keyword argument run_setup_anndata has been removed from built-in datasets since there is no longer a model-agnostic setup_anndata function (#1237).
scvi.model._metrics.clustering_scores has been removed due to incompatibility with the new data handling (#1237).
setup_anndata is now a class method on model classes and requires specific function calls to ensure proper AnnDataManager setup and model save/load. Any model inheriting from BaseModelClass will need to re-implement this method (#1237).
To adapt existing custom models to v0.15.0, one can reference the guidelines below. For some examples of how this was done for the existing models in the codebase, please reference the following PRs: (#1301, #1302).
1. scvi._CONSTANTS has been changed to scvi.REGISTRY_KEYS.
2. setup_anndata() functions are now class functions and follow a specific structure. Please refer to setup_anndata() for an example.
3. scvi.data.get_from_registry() has been removed. This method can be replaced by scvi.data.AnnDataManager.get_from_registry().
4. The setup dict stored directly on the AnnData object, adata.uns["_scvi"], has been deprecated. Instead, this information now lives in the AnnDataManager:
- The data registry can be accessed at scvi.data.AnnDataManager.data_registry.
- Summary stats can be accessed at scvi.data.AnnDataManager.summary_stats.
- Any field-specific information (e.g. adata.obs["categorical_mappings"]) now lives in field-specific state registries. These can be retrieved via a method on scvi.data.AnnDataManager.
5. register_tensor_from_anndata() has been removed. To register tensors with no relevant AnnDataField subclass, create a new subclass of BaseAnnDataField and add it to the appropriate model's setup_anndata method.
New in 0.14.6 (2022-02-05)#
Bug fixes, minor improvements of docs, code formatting.
Update black formatting to stable release (#1324)
Refresh readme, move tasks image to docs (#1311).
Add 0.14.5 release note to index (#1296).
Upper bound setuptools due to PyTorch import bug (#1309).
New in 0.14.5 (2021-11-22)#
Bug fixes, new tutorials.
New in 0.14.4 (2021-11-16)#
Bug fixes, some tutorial improvements.
Fix kl_weight handling for Pyro-based models (#1242).
Fix model history on load with Pyro-based models (#1255).
Model construction tutorial uses new static setup anndata (#1257).
Add codebase overview figure to docs (#1231).
New in 0.14.3 (2021-10-19)#
New in 0.14.2 (2021-10-18)#
Bug fix and new tutorial.
New in 0.14.1 (2021-10-11)#
New in 0.14.0 (2021-10-07)#
In this release, we have completely revamped the scvi-tools documentation website by creating a new set of user guides that provide:
The math behind each method (in a succinct, online methods-like way)
The relationship between the math and the functions associated with each model
The relationship between math variables and code variables
Our previous User Guide has been renamed to Tutorials and contains all of our existing tutorials (including tutorials for developers).
Another noteworthy addition in this release is the implementation of the (amortized) Latent Dirichlet Allocation (aka LDA) model applied to single-cell gene expression data. We have also prepared a tutorial that demonstrates how to use this model, using a PBMC 10K dataset from 10x Genomics as an example application.
Lastly, in this release we have made a change to reduce user and developer confusion by making the previously global
setup_anndata method a static class-specific method instead. This provides more clarity on which parameters are applicable for this call, for each model class. Below is a before/after for the DESTVI and TOTALVI model classes:
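The before/after can be sketched with simplified stand-ins (hypothetical minimal classes and a dict in place of AnnData, not the real scvi-tools implementations):

```python
# Before v0.14.0: one global, model-agnostic function had to accept
# every model's parameters, so irrelevant options were always visible.
def setup_anndata(adata, batch_key=None, protein_expression_obsm_key=None):
    # protein_expression_obsm_key only matters for TOTALVI-like models,
    # but every caller sees it here.
    adata["fields"] = {"batch": batch_key,
                       "protein": protein_expression_obsm_key}


# After: each model class owns a static setup method exposing only
# the parameters that apply to that model.
class SCVI:
    @staticmethod
    def setup_anndata(adata, batch_key=None):
        adata["fields"] = {"batch": batch_key}


class TOTALVI:
    @staticmethod
    def setup_anndata(adata, batch_key=None, protein_expression_obsm_key=None):
        adata["fields"] = {"batch": batch_key,
                           "protein": protein_expression_obsm_key}


adata = {}  # stand-in for an AnnData object
TOTALVI.setup_anndata(adata, batch_key="batch",
                      protein_expression_obsm_key="protein_counts")
```

The per-class signature is what provides the clarity mentioned above: a user of SCVI never sees protein-specific parameters.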
Added fixes to support PyTorch Lightning 1.4 (#1103)
Simplified data handling in R tutorials with sceasy and addressed bugs in package installation (#1122).
Moved library size distribution computation to model init (#1123)
Updated Contribution docs to describe how we backport patches (#1129)
Implemented Latent Dirichlet Allocation as a PyroModule (#1132)
Made setup_anndata a static method on model classes rather than one global function (#1150).
Used PyTorch Lightning's seed_everything method to set the seed (#1151).
Added CITE-Seq datasets (#1182)
Early stopping now prints the reason for stopping when applicable (#1208)
New in 0.13.0 (2021-08-23)#
New in 0.12.2 (2021-08-11)#
Fix the OrderedDict typing import to support all Python 3.7 versions (#1114).
New in 0.12.1 (2021-07-29)#
Update the PyTorch Lightning version dependency.
New in 0.12.0 (2021-07-15)#
This release adds features for tighter integration with Pyro for model development, fixes for
SOLO, and other enhancements. Users of
SOLO are strongly encouraged to upgrade as previous bugs will affect performance.
Add “comparison” column to differential expression results (#1074).
Update CellAssign size factor usage; see the class docstring.
Update the minimum Python version.
Slight interface changes to the training plan: "elbo_train" and "elbo_test" are now the average over minibatches, as ELBO should be on the scale of the full data, and optim_kwargs can be set on initialization of the training plan (#1059, #1101).
Use the pandas read_pickle function for pbmc dataset metadata loading (#1099).
Add an n_samples_overall parameter to functions for denoised expression/accessibility/etc. This is used during differential expression (#1090).
Ignore configure optimizers warning when training Pyro-based models (#1064).
Fix scale of library size for simulated doublets and expression in SOLO when using observed library size to train the original SCVI model (#1078, #1085). Previously, library sizes in this case were not appropriately put on the log scale.
New in 0.11.0 (2021-05-23)#
From the user perspective, this release features the new differential expression functionality (to be described in a manuscript). For now, it is accessible from
differential_expression(). From the developer perspective, we made changes with respect to
scvi.dataloaders.DataSplitter and surrounding the Pyro backend. Finally, we also made changes to adapt our code to PyTorch Lightning version 1.3.
Require PyTorch Lightning > 1.3, add relevant fixes (#1054).
Add DestVI reference (#1060).
Add PeakVI links to README (#1046).
Automatic delta and eps computation in differential expression (#1043).
Allow the doublet ratio parameter to be changed for use in SOLO (#1066).
These breaking changes do not affect the user API, though they will impact model developers.
Use PyTorch Lightning data modules for scvi.dataloaders.DataSplitter (#1061). This induces a breaking change in the way the data splitter is used. It is no longer callable and now has a setup method; see TrainRunner and its source code, which is straightforward.
No longer require training plans to be initialized with n_obs_training (#1059). n_obs_training is now a property that can be set before actual training to rescale the loss.
Log Pyro loss as train_elbo and sum over steps (#1071).
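The n_obs_training property change described above can be sketched in plain Python (an illustrative stand-in, not the actual scvi-tools training plan): the minibatch loss is put on the scale of the full dataset by weighting with a settable property rather than a constructor argument.

```python
class TrainingPlan:
    """Toy training plan: n_obs_training is a settable property."""

    def __init__(self):
        # No n_obs_training needed at construction time.
        self._n_obs_training = None

    @property
    def n_obs_training(self):
        return self._n_obs_training

    @n_obs_training.setter
    def n_obs_training(self, n_obs):
        self._n_obs_training = n_obs

    def loss(self, per_cell_losses):
        # Mean per-cell loss, rescaled to the full-data scale.
        batch_mean = sum(per_cell_losses) / len(per_cell_losses)
        return batch_mean * self.n_obs_training


plan = TrainingPlan()        # no dataset size required here
plan.n_obs_training = 1000   # set just before training starts
full_scale_loss = plan.loss([2.0, 4.0])  # minibatch of 2 cells
```

Deferring the dataset size to a property means the plan can be constructed before the data split is known.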
New in 0.10.1 (2021-05-04)#
New in 0.10.0 (2021-04-20)#
PeakVI minor enhancements to differential accessibility and fix scArches support (#1019)
Add DestVI to the codebase (#1011)
Versioned tutorial links (#1005)
Remove old VAEC (#1006)
Use .numpy() to convert torch tensors to numpy ndarrays (#1016).
Support backed AnnData (#1017); just load the AnnData in backed mode.
Solo interface enhancements (#1009)
Updated README (#1028)
Use Python warnings instead of logger warnings (#1021)
Change totalVI protein background default to False if fewer than 10 proteins are used (#1034).
New default SCANVI max epochs if loaded with a pretrained SCVI model (#1025); restores the old behavior.
Fix marginal log likelihood computation, which was only being computed on the final minibatch of a dataloader. This bug was introduced in an earlier release.
Fix bug where extra categoricals were not properly extended when transferring setup to query data.
New in 0.9.1 (2021-03-20)#
Update the Pyro module backend to better enforce usage of model and guide; automate passing of the number of training examples to Pyro modules (#990).
Minimum Pyro version bumped (#988)
Improve docs clarity (#989)
Add glossary to developer user guide (#999)
Add a num threads config option to scvi.settings.
Add CellAssign tutorial (#1004)
New in 0.9.0 (2021-03-03)#
This release features our new software development kit for building new probabilistic models. Our hope is that others will be able to develop new models by importing scvi-tools into their own packages.
From the user perspective, there are two package-wide API breaking changes and one
SCANVI specific breaking change enumerated below. From the method developer perspective, the entire model backend has been revamped using PyTorch Lightning, and no old code will be compatible with this and future versions. Also, we dropped support for Python 3.6.
Breaking change: The n_epochs parameter in train() is now max_epochs, for consistency with PyTorch Lightning and to better reflect the functionality of the parameter.
use_cuda is now use_gpu, for consistency with PyTorch Lightning.
The validation frequency parameter is now check_val_every_n_epoch, for consistency with PyTorch Lightning.
Several train() methods in the codebase have been removed, and various arguments have been reorganized into plan_kwargs and trainer_kwargs. Generally speaking, plan_kwargs deal with model optimization, like KL warmup, while trainer_kwargs deal with the actual training loop, like early stopping.
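The kwargs split can be sketched with stand-in functions (illustrative names, not the real scvi-tools API): train() routes each dict to the right component.

```python
def make_training_plan(lr=1e-3, n_epochs_kl_warmup=400):
    """Stand-in for a training plan: optimization settings live here."""
    return {"lr": lr, "n_epochs_kl_warmup": n_epochs_kl_warmup}


def make_trainer(max_epochs, early_stopping=False):
    """Stand-in for a trainer: training-loop settings live here."""
    return {"max_epochs": max_epochs, "early_stopping": early_stopping}


def train(max_epochs=400, plan_kwargs=None, trainer_kwargs=None):
    # plan_kwargs configure optimization (e.g. KL warmup);
    # trainer_kwargs configure the loop (e.g. early stopping).
    plan = make_training_plan(**(plan_kwargs or {}))
    trainer = make_trainer(max_epochs, **(trainer_kwargs or {}))
    return plan, trainer


plan, trainer = train(
    max_epochs=100,
    plan_kwargs={"n_epochs_kl_warmup": 50},    # optimization settings
    trainer_kwargs={"early_stopping": True},   # training-loop settings
)
```

Keeping the two dicts separate makes it unambiguous whether an argument affects the objective or the loop.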
Breaking change: GPU handling#
use_cuda was removed from the init of each model and was not replaced by use_gpu. By default every model is initialized on CPU but can be moved to a device via model.to_device(). If a model is trained with use_gpu=True, the model will remain on the GPU after training.
When loading saved models, scvi-tools will always attempt to load the model on GPU unless otherwise specified.
We now support specifying which GPU device to use if there are multiple available GPUs.
n_epochs_unsupervised and n_epochs_semisupervised have been removed from train. They have been replaced with max_epochs for semisupervised training.
n_samples_per_label is a new argument which will subsample the number of labelled training examples to train on per label each epoch.
New Model Implementations#
Added callback for saving the best state of a model (#887)
Option to disable progress bar (#905)
load() documentation improvements (#913)
track is now public (#938)
get_likelihood_parameter() bug (#967)
model.history entries are now pandas DataFrames (#949)
New in 0.8.1 (2020-12-23)#
New in 0.8.0 (2020-12-17)#
It is now possible to iteratively update these models with new samples, without altering the model for the “reference” population. Here we use the scArches method. For usage, please see the tutorial in the user guide.
To enable scArches in our models, we added a few new options. The first is encode_covariates, which is an SCVI option to encode the one-hot batch covariate. We also allow users to exchange batch norm in the encoder and decoder with layer norm, which can be thought of as batch norm but per cell. As the layer norm we use has no parameters, it's a bit faster than models with batch norm. We don't find many differences between using batch norm or layer norm in our models, though we have kept the defaults the same in this case. To run scArches effectively, batch norm should be exchanged with layer norm.
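The parameter-free layer norm described above can be sketched in pure Python (an illustrative sketch, not the actual scvi-tools implementation): each cell's hidden vector is standardized across its own features, with no learned scale or shift.

```python
import math


def layer_norm(cell_vector, eps=1e-5):
    """Standardize one cell's vector across its own features."""
    n = len(cell_vector)
    mean = sum(cell_vector) / n
    var = sum((x - mean) ** 2 for x in cell_vector) / n
    return [(x - mean) / math.sqrt(var + eps) for x in cell_vector]


# Unlike batch norm, the statistics come from one cell at a time,
# so the result does not depend on which other cells are in the batch.
normed = layer_norm([1.0, 2.0, 3.0])
```

This per-cell independence is why layer norm suits the scArches setting, where query batches differ from the reference.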
Empirical initialization of protein background parameters with totalVI#
The learned prior parameters for the protein background were previously randomly initialized. Now they can be set empirically with a new option in TOTALVI. This option fits a two-component Gaussian mixture model per cell, separating those proteins that are background for the cell from those that are foreground, and aggregates the learned mean and variance of the smaller component across cells. This computation is done per batch, if the batch_key was registered. We emphasize this is just for the initialization of a learned parameter in the model.
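The two-component-mixture idea can be sketched independently of scvi-tools with a small stdlib-only EM fit (an illustrative sketch of the initialization heuristic on synthetic data, not the actual totalVI code, which operates per cell and per batch):

```python
import math
import random


def fit_two_component_gmm(values, n_iter=50):
    """EM for a 1-D two-component Gaussian mixture; returns the
    (mean, variance) of the smaller-mean ("background") component."""
    xs = sorted(values)
    n = len(xs)
    # Initialize the two means from the lower and upper quartiles.
    mu = [xs[n // 4], xs[3 * n // 4]]
    var = [1.0, 1.0]
    pi = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in values:
            p = [
                pi[k] / math.sqrt(2 * math.pi * var[k])
                * math.exp(-((x - mu[k]) ** 2) / (2 * var[k]))
                for k in range(2)
            ]
            s = p[0] + p[1]
            resp.append((p[0] / s, p[1] / s) if s > 0 else (0.5, 0.5))
        # M-step: re-estimate weights, means, and variances.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            pi[k] = nk / n
            mu[k] = sum(r[k] * x for r, x in zip(resp, values)) / nk
            var[k] = max(sum(r[k] * (x - mu[k]) ** 2
                             for r, x in zip(resp, values)) / nk, 1e-6)
    k_bg = 0 if mu[0] < mu[1] else 1  # background = smaller mean
    return mu[k_bg], var[k_bg]


# Synthetic log-scale protein values: background near 1.0, foreground near 4.0.
random.seed(0)
data = [random.gauss(1.0, 0.5) for _ in range(500)] + \
       [random.gauss(4.0, 0.7) for _ in range(500)]
bg_mean, bg_var = fit_two_component_gmm(data)
```

Aggregating the smaller component's mean and variance across cells, as the text describes, then provides the empirical starting point for the learned background prior.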
Use observed library size option#
Many of our models, such as TOTALVI, learn a latent library size variable. The option use_observed_lib_size may now be passed on model initialization. We have set this to True by default, as we see no regression in performance and training is a bit faster.
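What "observed library size" means can be sketched in plain Python (not the scvi-tools implementation): instead of learning a latent size factor, the model uses each cell's total count, typically on the log scale.

```python
import math

# Toy count matrix: rows are cells, columns are genes.
counts = [
    [3, 0, 7, 2],  # cell 0
    [1, 5, 0, 4],  # cell 1
]

# The observed library size is simply the per-cell total count.
library_sizes = [sum(cell) for cell in counts]
log_library_sizes = [math.log(size) for size in library_sizes]
```

Because the quantity is taken directly from the data, no encoder parameters are spent modeling it, which is why training is slightly faster.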
To facilitate these enhancements, saved TOTALVI models from previous versions will not load properly. This is due to an architecture change of the totalVI encoder, related to latent library size handling.
The default latent distribution for TOTALVI has also changed.
Autotune was removed from this release. We could not maintain the code given the new API changes and we will soon have alternative ways to tune hyperparameters.
Protein names during setup_anndata are now stored in adata.uns["_scvi"]["protein_names"], instead of the previous adata.uns key.
Fixed an issue where the unlabeled category affected the SCANVI architecture prior distribution. Unfortunately, by fixing this bug, loading previously trained (<v0.8.0) SCANVI models will fail.
New in 0.7.1 (2020-10-20)#
This small update provides access to our new Discourse forum from the documentation.
New in 0.7.0 (2020-10-14)#
scvi is now scvi-tools. Version 0.7 introduces many breaking changes. The best way to learn how to use scvi-tools is with our documentation and tutorials.
New high-level API and data loading, please see tutorials and examples for usage.
GeneExpressionDataset and associated classes have been removed.
Built-in datasets now return AnnData objects.
scvi-tools now relies entirely on the AnnData format.
scvi.models has been moved into scvi.core.
Posterior classes have been reduced to wrappers on DataLoaders.
scvi.inference has been split into new core modules, including scvi.core.trainers for trainer classes.
Usage of classes like AnnDataLoader now requires the AnnData data object as input.
The scvi-tools package used to be scvi. This page commemorates all the hard work on the scvi package by our numerous contributors.
add tqdm to within cluster DE genes @adam
restore tqdm to use simple bar instead of ipywidget @adam
move to numpydoc for docstrings @adam
update issues templates @adam
Poisson variable gene selection @valentine-svensson
BrainSmallDataset set default save_path_10X @gokcen-eraslan
train_size must be float between 0.0 and 1.0 @galen
bump dependency versions @galen
remove reproducibility notebook @galen
fix scanVI dataloading @pierre
bug in version for louvain in setup.py @adam
update highly variable gene selection to handle sparse matrices @adam
update DE docstrings @pierre
improve posterior save load to also handle subclasses @pierre
Create NB and ZINB distributions with torch and refactor code accordingly @pierre
typos in autozivae @achille
bug in csc sparse matrices in anndata data loader @adam
do not automatically upper case genes @adam
Made the intro tutorial more user friendly @adam
Tests for LDVAE notebook @adam
fix compatibility issues with sklearn and numba @romain
fix Anndata @francesco-brundu
docstring, totalVI, totalVI notebook and CITE-seq data @adam
fix type @eduardo-beltrame
fixing installation guide @jeff
improved error message for dispersion @stephen-flemming
synthetic correlated datasets, fixed bug in marginal log likelihood @oscar
autotune, dataset enhancements @gabriel
more consistent posterior API, docstring, validation set @adam
fix anndataset @michael-raevsky
linearly decoded VAE @valentine-svensson
support for scanpy, fixed bugs, dataset enhancements @achille
fix filtering bug, synthetic correlated datasets, docstring, differential expression @pierre
better docstring @jamie-morton
classifier based on library size for doublet detection @david-kelley
First scVI TensorFlow version @romain