Note
This page was generated from
scvi_hub_upload_and_large_files.ipynb.
Using scvi-hub to upload pretrained scvi-tools models#
In this tutorial, we will see how to use scvi-tools to upload pretrained models onto Hugging Face. We will also see how to handle large training datasets.
If you have not already, make sure to refer to our scvi_hub_intro_and_download tutorial, which is a prerequisite to this one. It introduces Hugging Face (HF) and the scvi-hub, and describes how to use them for downloading pretrained models from the HF Model Hub.
[ ]:
!pip install --quiet scvi-colab
from scvi_colab import install
install()
Imports#
Let’s start by adding the Python imports we need.
[ ]:
import scanpy as sc
import scvi
from scvi.hub import HubMetadata, HubModel, HubModelCardHelper
sc.set_figure_params(figsize=(4, 4))
# for white background of figures (only for docs rendering)
%config InlineBackend.print_figure_kwargs={'facecolor' : "w"}
%config InlineBackend.figure_format='retina'
Pretrain a demo model#
Let’s pretrain a model on some synthetic data, which we’ll upload to the scvi-hub later.
[5]:
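local_dir = "local/scvi_hub_upload"
# small synthetic dataset, enough for a demo
adata = scvi.data.synthetic_iid()
scvi.model.SCVI.setup_anndata(adata)
model = scvi.model.SCVI(adata)
# a single epoch is enough here; we only need a model to upload
model.train(max_epochs=1)
# save the model together with its training data
model.save(local_dir, save_anndata=True, overwrite=True)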
/Users/valehvpa/GitRepos/scvi-tools/scvi/data/_built_in_data/_synthetic.py:32: FutureWarning: X.dtype being converted to np.float32 from int64. In the next version of anndata (0.9) conversion will not be automatic. Pass dtype explicitly to avoid this warning. Pass `AnnData(X, dtype=X.dtype, ...)` to get the future behavour.
adata = AnnData(data)
INFO:pytorch_lightning.utilities.rank_zero:GPU available: True (mps), used: False
INFO:pytorch_lightning.utilities.rank_zero:TPU available: False, using: 0 TPU cores
INFO:pytorch_lightning.utilities.rank_zero:IPU available: False, using: 0 IPUs
INFO:pytorch_lightning.utilities.rank_zero:HPU available: False, using: 0 HPUs
/opt/homebrew/Caskroom/miniconda/base/envs/scvi-hub/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py:1789: UserWarning: MPS available but not used. Set `accelerator` and `devices` using `Trainer(accelerator='mps', devices=1)`.
rank_zero_warn(
/opt/homebrew/Caskroom/miniconda/base/envs/scvi-hub/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py:1892: PossibleUserWarning: The number of training batches (3) is smaller than the logging interval Trainer(log_every_n_steps=10). Set a lower value for log_every_n_steps if you want to see logs for the training epoch.
rank_zero_warn(
Epoch 1/1: 100%|██████████| 1/1 [00:00<00:00, 39.55it/s, loss=333, v_num=1]
INFO:pytorch_lightning.utilities.rank_zero:`Trainer.fit` stopped: `max_epochs=1` reached.
Epoch 1/1: 100%|██████████| 1/1 [00:00<00:00, 31.65it/s, loss=333, v_num=1]
Model Card and Metadata#
To upload pretrained models, you’ll need to create an instance of the HubModel class and then simply call its push_to_huggingface_hub method.
As you can see from the API reference, the HubModel init function requires metadata and a Model Card. There are a few ways you can provide these:
The metadata can be either an instance of the HubMetadata class that contains the required metadata for this model, or a path to a JSON file on disk where this metadata can be read from.
The Model Card can be an instance of the HubModelCardHelper class created for this model, or an instance of the HF Model Card object, or a path to a Markdown file on disk where the model card can be read from.
You can also use the HubModelCardHelper class to create a Model Card from the scvi-tools template, then save it on disk and change it as you wish before passing its path into the HubModel class.
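As a rough illustration of the JSON-file option, the sketch below writes a metadata file using only the standard library. The field names are copied from the HubMetadata repr shown further down in this tutorial; that the JSON keys should mirror those fields one-to-one is an assumption, and both the helper write_metadata_json and the file name metadata.json are hypothetical.

```python
import json
import tempfile
from pathlib import Path

# Field names taken from the HubMetadata repr in this tutorial.
# Assumption: the JSON file uses the same keys, one per field.
metadata = {
    "scvi_version": "0.19.0a0",
    "anndata_version": "0.8.0",
    "training_data_url": None,
    "model_parent_module": "scvi.model",
}


def write_metadata_json(directory: str, fields: dict) -> Path:
    """Write the fields to <directory>/metadata.json (hypothetical file name)."""
    path = Path(directory) / "metadata.json"
    path.write_text(json.dumps(fields, indent=2))
    return path


# Demo in a temporary directory so the saved model folder is not touched.
with tempfile.TemporaryDirectory() as tmp:
    metadata_path = write_metadata_json(tmp, metadata)
    round_tripped = json.loads(metadata_path.read_text())
```

Reading the file back, as done at the end, is a cheap way to confirm that the JSON is well formed before handing its path to HubModel.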
Here we’ll see how to create the HubMetadata and a Model Card from the data on disk.
[17]:
hm = HubMetadata.from_dir(local_dir, anndata_version="0.8.0")
hmch = HubModelCardHelper.from_dir(
    local_dir,
    license_info="cc-by-4.0",
    anndata_version="0.8.0",
    data_modalities=["rna", "protein"],
    data_is_annotated=False,
    description="This is a demo model used during upload demo.",
    references="None.",
)
INFO File local/scvi_hub_upload/model.pt already downloaded
INFO File local/scvi_hub_upload/model.pt already downloaded
[18]:
print(hmch.model_card.content)
---
license: cc-by-4.0
library_name: scvi-tools
tags:
- model_cls_name:SCVI
- scvi_version:0.19.0a0
- anndata_version:0.8.0
- modality:rna
- modality:protein
- annotated:False
---
# Description
This is a demo model used during upload demo.
# Model properties
Many model properties are in the model tags. Some more are listed below.
**model_init_params**:
```json
{
"n_hidden": 128,
"n_latent": 10,
"n_layers": 1,
"dropout_rate": 0.1,
"dispersion": "gene",
"gene_likelihood": "zinb",
"latent_distribution": "normal"
}
```
**model_setup_anndata_args**:
```json
{
"layer": null,
"batch_key": null,
"labels_key": null,
"size_factor_key": null,
"categorical_covariate_keys": null,
"continuous_covariate_keys": null
}
```
**model_summary_stats**:
| Summary Stat Key | Value |
|--------------------------|-------|
| n_batch | 1 |
| n_cells | 400 |
| n_extra_categorical_covs | 0 |
| n_extra_continuous_covs | 0 |
| n_labels | 1 |
| n_vars | 100 |
**model_data_registry**:
| Registry Key | scvi-tools Location |
|--------------|---------------------------|
| X | adata.X |
| batch | adata.obs['_scvi_batch'] |
| labels | adata.obs['_scvi_labels'] |
**model_parent_module**: scvi.model
**data_is_latent**: False
# Training data
This is an optional link to where the training data is stored if it is too large
to host on the huggingface Model hub.
<!-- This field is required for models that haven't been minified by converting to latent
mode. See the scvi-tools documentation for more details. -->
Training data url: N/A
# Training code
This is an optional link to the code used to train the model.
Training code url: N/A
# References
None.
Note: Suppose you want to change the content a little. To do that, save the card to disk, edit it manually as you wish, and then pass its path to HubModel.
hmch.model_card.save("local/my_model_card.md")  # then change the markdown file on disk...
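The manual edit can also be scripted. The following is a minimal sketch using only the standard library; the helper name fill_training_code_url and the example URL are hypothetical, while the placeholder line it searches for is copied from the card printed above. The demo runs on a stand-in file rather than on a real saved card.

```python
import tempfile
from pathlib import Path

PLACEHOLDER = "Training code url: N/A"  # line as it appears in the printed card


def fill_training_code_url(card_path: str, url: str) -> str:
    """Replace the N/A training-code placeholder in a saved card; return the new text."""
    path = Path(card_path)
    text = path.read_text()
    if PLACEHOLDER not in text:
        raise ValueError(f"{PLACEHOLDER!r} not found in {card_path}")
    updated = text.replace(PLACEHOLDER, f"Training code url: {url}", 1)
    path.write_text(updated)
    return updated


# Demo on a stand-in file; in practice this would be the saved model card.
with tempfile.TemporaryDirectory() as tmp:
    demo_card = Path(tmp) / "my_model_card.md"
    demo_card.write_text("# Training code\nTraining code url: N/A\n")
    edited = fill_training_code_url(str(demo_card), "https://example.com/train.py")
```

The path of the edited file is what you would then pass as the model card when creating the HubModel.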
Create a HubModel and upload it#
Now we can create the HubModel and push it to the HF Model Hub:
[24]:
hmo = HubModel(local_dir, metadata=hm, model_card=hmch)
hmo
HubModel with:
local_dir: local/scvi_hub_upload
model loaded? No
adata loaded? No
large_training_adata loaded? No
metadata:
HubMetadata(scvi_version='0.19.0a0', anndata_version='0.8.0', training_data_url=None, model_parent_module='scvi.model')
model_card:
(rich-text rendering of the model card; its content matches the card printed above)
To upload, you need to call:
hmo.push_to_huggingface_hub(
    repo_name=repo_name, repo_token=repo_token, repo_create=True
)
We won’t do it here but will explain the parameters you need to pass:
repo_name: The name/id of your repo.
repo_token: The token you need to authenticate yourself to the HF Model Hub. It must have “write” permissions. You can get this from your HF account page. Read this article to find out how. The token can either be passed in as plain text or as the full path to a file on disk where the token is stored.
repo_create: Whether you want scvi-tools to create the repo for you. If you want to create the repo yourself on the HF Model Hub or if it already exists, you can set this to False.
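To make the two ways of supplying the token concrete, here is a minimal sketch of how either form could be normalized to the token string. resolve_token is a hypothetical helper, not part of scvi-tools, the token value is a dummy, and scvi-tools may handle this differently internally.

```python
import os
import tempfile


def resolve_token(repo_token: str) -> str:
    """Return the token, reading it from disk if repo_token is a path to an existing file."""
    if os.path.isfile(repo_token):
        with open(repo_token) as fh:
            return fh.read().strip()
    return repo_token


# Demo with a dummy token stored in a temporary file.
with tempfile.TemporaryDirectory() as tmp:
    token_file = os.path.join(tmp, "hf_token.txt")
    with open(token_file, "w") as fh:
        fh.write("hf_dummy_token\n")
    from_file = resolve_token(token_file)

from_text = resolve_token("hf_dummy_token")
```

Keeping the token in a file has the practical advantage that it never appears in a notebook cell or its saved output.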
Large training data#
So far, all models we’ve seen have contained the dataset in the HF Hub Model object. However, in some cases this is not possible, or not desirable, because the training data is too large. For all files larger than 5GB, you are prompted to store your training data on separate storage and provide its URL to the HubModel. This tells scvi-tools where to pull the data from when loading it (or the model) into memory.
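Before uploading, it can help to check which files in the model directory cross the size limit. The sketch below does this with the standard library; files_at_or_over is a hypothetical helper, reading “5GB” as 5 GiB is an assumption, and the demo uses tiny stand-in files with a tiny limit.

```python
import os
import tempfile

FIVE_GB = 5 * 1024**3  # assumption: "5GB" read as 5 GiB; the real cutoff may differ


def files_at_or_over(directory, limit=FIVE_GB):
    """Return sorted names of files directly inside directory that are >= limit bytes."""
    return sorted(
        entry.name
        for entry in os.scandir(directory)
        if entry.is_file() and entry.stat().st_size >= limit
    )


# Demo with tiny files and a tiny limit, since creating a 5GB file is impractical.
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "model.pt"), "wb") as fh:
        fh.write(b"\0" * 10)
    with open(os.path.join(tmp, "adata.h5ad"), "wb") as fh:  # stand-in data file name
        fh.write(b"\0" * 100)
    oversized = files_at_or_over(tmp, limit=50)
```

With the default limit, calling the helper on your saved model directory would list any files that need to live on separate storage.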
There are four possible scenarios. Here we’re assuming that the minified data is <5GB, which is very likely to be the case.
Your training data is <5GB and it is not minified. 👉 In this case, both your model and data will be uploaded to the same HF Model.
Your training data is <5GB and it is minified. 👉 In this case, both your model and minified data will be uploaded to the same HF Model. Optionally, you can provide a link to your full (i.e., non-minified) training data.
Your training data is >=5GB and it is not minified. 👉 In this case, only your model will be uploaded to the HF Model. If you want to use your training data, you must provide a link to it (the link must be in the required metadata file, and can also appear in the model card). When needed, scvi-tools will automatically download your training data from the link you registered.
Your training data is >=5GB and it is minified. 👉 In this case, both your model and minified data will be uploaded to the same HF Model. Optionally, you can provide a link to your full (i.e., non-minified) training data.
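The four scenarios reduce to a small decision table, sketched below. UploadPlan and plan_upload are hypothetical names used only for illustration, the 5 GiB reading of “5GB” is an assumption, and the sketch relies on the assumption stated above that minified data stays under the limit.

```python
from dataclasses import dataclass

FIVE_GB = 5 * 1024**3  # assumption: "5GB" interpreted as 5 GiB


@dataclass(frozen=True)
class UploadPlan:
    data_uploaded_with_model: bool    # does a data file go into the same HF Model?
    training_data_url_required: bool  # is a link to the full data needed to use it?


def plan_upload(training_data_bytes: int, minified: bool) -> UploadPlan:
    """Map (size, minified) onto the four scenarios; assumes minified data is < 5GB."""
    if minified:
        # Scenarios 2 and 4: minified data travels with the model, link is optional.
        return UploadPlan(True, False)
    if training_data_bytes < FIVE_GB:
        # Scenario 1: model and full data are uploaded together.
        return UploadPlan(True, False)
    # Scenario 3: only the model is uploaded; using the data requires a link.
    return UploadPlan(False, True)
```

For example, plan_upload(10 * 1024**3, minified=False) corresponds to the third scenario, the only one in which the training data URL becomes mandatory.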
It is highly recommended to try to minify your data if possible. Please refer to our Minification tutorial for how to do that.
Note
It is always possible to use a dataset other than your training data: you can set model.adata prior to saving. However, the convention with scvi-hub is to provide access to the training data (full or minified form), so that users can reproduce the results of the model and perform their own analyses on the same data.
Model evaluation#
We recommend that you include some evaluation results in your Model Card. One way to do this is by using our scvi-criticism Python package. It provides a simple API to evaluate the goodness of fit of your model and generate various visualizations. Read more about it in the scvi-criticism documentation.