scvi.hub.HubModel#

class scvi.hub.HubModel(local_dir, metadata=None, model_card=None)[source]#

Wrapper for BaseModelClass backed by HuggingFace Hub.

Parameters:
  • local_dir (str) – Local directory where the data and pre-trained model reside.

  • metadata (HubMetadata | str | None (default: None)) – Either an instance of HubMetadata that contains the required metadata for this model, or a path to a file on disk where this metadata can be read from.

  • model_card (HubModelCardHelper | ModelCard | str | None (default: None)) – The model card for this pre-trained model. Model card is a markdown file that describes the pre-trained model/data and is displayed on HuggingFace. This can be either an instance of ModelCard or an instance of HubModelCardHelper that wraps the model card or a path to a file on disk where the model card can be read from.

Notes

See further usage examples in the following tutorials:

  1. Using scvi-hub to download pretrained scvi-tools models

  2. Using scvi-hub to upload pretrained scvi-tools models

Attributes table#

adata

Returns the data for this model.

large_training_adata

Returns the training data for this model, which might be too large to reside within the hub model.

local_dir

The local directory where the data and pre-trained model reside.

metadata

The metadata for this model.

model

Returns the model object for this hub model.

model_card

The model card for this model.

Methods table#

load_model([adata, accelerator, device])

Loads the model.

pull_from_huggingface_hub(repo_name[, ...])

Download the given model repo from huggingface.

pull_from_s3(s3_bucket, s3_path[, ...])

Download a HubModel from an S3 bucket.

push_to_huggingface_hub(repo_name, repo_token)

Push this model to huggingface.

push_to_s3(s3_bucket, s3_path[, push_anndata])

Upload the HubModel to an S3 bucket.

read_adata()

Reads the data from disk (self._adata_path) if it exists.

read_large_training_adata()

Downloads the large training adata, if it exists, then loads it into memory.

save([overwrite])

Save the model card and metadata to the model directory.

Attributes#

HubModel.adata[source]#

Returns the data for this model.

If the data has not been loaded yet, this will call read_adata(). Otherwise, it will simply return the loaded data.
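The load-on-first-access behaviour described for adata (and likewise for model and large_training_adata) can be illustrated with a small stand-in class. This is a minimal sketch, not the HubModel implementation: LazyDataHolder, its loader argument, and the read_calls counter are hypothetical names introduced only for illustration.

```python
class LazyDataHolder:
    """Hypothetical stand-in showing the load-on-first-access pattern."""

    def __init__(self, loader):
        self._loader = loader  # callable that produces the data
        self._adata = None     # nothing has been loaded yet
        self.read_calls = 0    # counts how often the loader ran

    def read_adata(self):
        # Populate the cached value; returns None, like the documented reader.
        self.read_calls += 1
        self._adata = self._loader()

    @property
    def adata(self):
        # Read only when nothing is cached, then reuse the cached object.
        if self._adata is None:
            self.read_adata()
        return self._adata


holder = LazyDataHolder(lambda: {"n_obs": 3})
first = holder.adata   # triggers read_adata()
second = holder.adata  # served from the cache
```

In this sketch the second access returns the same object without invoking the loader again.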

HubModel.large_training_adata[source]#

Returns the training data for this model, which might be too large to reside within the hub model.

If the data has not been loaded yet, this will call read_large_training_adata(), which will attempt to download from the source url. Otherwise, it will simply return the loaded data.

HubModel.local_dir[source]#

The local directory where the data and pre-trained model reside.

HubModel.metadata[source]#

The metadata for this model.

HubModel.model[source]#

Returns the model object for this hub model.

If the model has not been loaded yet, this will call load_model(). Otherwise, it will simply return the loaded model.

HubModel.model_card[source]#

The model card for this model.

Methods#

HubModel.load_model(adata=None, accelerator='auto', device='auto')[source]#

Loads the model.

Parameters:
  • adata (AnnData | None (default: None)) – The data to load the model with. If None, the model is loaded with the data at self._adata_path. If that file does not exist, large_training_adata is used instead. If neither is available, an error is raised.

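  • accelerator (str (default: 'auto')) – The accelerator type to load the model onto, such as "cpu" or "gpu". With "auto", an available accelerator is selected automatically.

  • device (int | str (default: 'auto')) – The device to use on the chosen accelerator. With "auto", a device is selected automatically.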
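The order in which load_model() looks for data can be sketched as a plain function. This is an illustrative sketch only: choose_adata and its three arguments are hypothetical, and the exception type is an assumption, since the documentation only states that an error is raised.

```python
def choose_adata(adata=None, local_adata=None, large_training_adata=None):
    """Return the first available data source, in the documented order.

    Hypothetical placeholders: ``local_adata`` stands for the data read from
    ``self._adata_path`` and ``large_training_adata`` for the downloaded
    training data. Each is ``None`` when that source is unavailable.
    """
    for candidate in (adata, local_adata, large_training_adata):
        if candidate is not None:
            return candidate
    # Assumption: the real exception type may differ.
    raise ValueError("No data available to load the model with.")
```

An explicitly passed adata always wins, so callers can override the data shipped with the hub model.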

classmethod HubModel.pull_from_huggingface_hub(repo_name, cache_dir=None, revision=None, pull_anndata=True, **kwargs)[source]#

Download the given model repo from huggingface.

The model, its card, data, and metadata are downloaded to a cached location on disk selected by huggingface. An instance of this class is then created from those files and returned.

Parameters:
  • repo_name (str) – ID of the huggingface repo from which this model is downloaded

  • cache_dir (str | None (default: None)) – The directory where the downloaded model artifacts will be cached

  • revision (str | None (default: None)) – The revision to pull from the repo. This can be a branch name, a tag, or a full-length commit hash. If None, the default (latest) revision is pulled.

  • pull_anndata (bool (default: True)) – Whether to pull the AnnData object associated with the model. If True but the file does not exist, no error is raised and the download continues without it.

  • kwargs – Additional keyword arguments to pass to snapshot_download().
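As a usage sketch, the helper below pins a download to a specific revision and skips the AnnData file. The helper name pull_pinned is hypothetical, and the class is passed in as an argument so the snippet does not require scvi-tools to be importable. Only the parameters documented above are used.

```python
def pull_pinned(hub_model_cls, repo_name, revision, cache_dir=None):
    """Pull one specific revision of a model repo, without its AnnData.

    ``hub_model_cls`` is expected to be ``scvi.hub.HubModel`` (or any class
    exposing the same ``pull_from_huggingface_hub`` classmethod).
    """
    return hub_model_cls.pull_from_huggingface_hub(
        repo_name,
        cache_dir=cache_dir,   # None lets huggingface pick the cache location
        revision=revision,     # branch name, tag, or full-length commit hash
        pull_anndata=False,    # skip the AnnData file
    )
```

With scvi-tools installed, this would be called as pull_pinned(HubModel, repo_name, revision) after `from scvi.hub import HubModel`, with repo_name and revision supplied by the caller.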

classmethod HubModel.pull_from_s3(s3_bucket, s3_path, pull_anndata=True, cache_dir=None, **kwargs)[source]#

Download a HubModel from an S3 bucket.

Requires boto3 to be installed.

Parameters:
  • s3_bucket (str) – The S3 bucket from which to download the model.

  • s3_path (str) – The S3 path to the saved model.

  • pull_anndata (bool (default: True)) – Whether to pull the AnnData object associated with the model.

  • cache_dir (str | None (default: None)) – The directory where the downloaded model files will be cached. Defaults to a temporary directory created with tempfile.mkdtemp().

  • **kwargs – Keyword arguments passed into boto3.client().

Return type:

HubModel

Returns:

The pretrained model specified by the given S3 bucket and path.

HubModel.push_to_huggingface_hub(repo_name, repo_token, repo_create=False, push_anndata=True, repo_create_kwargs=None, **kwargs)[source]#

Push this model to huggingface.

If the dataset is too large to upload to huggingface, this will raise an exception prompting the user to upload the data elsewhere. Otherwise, the data, model card, and metadata are all uploaded to the given model repo.

Parameters:
  • repo_name (str) – ID of the huggingface repo where this model needs to be uploaded

  • repo_token (str) – huggingface API token with write permissions

  • repo_create (bool (default: False)) – Whether to create the repo

  • push_anndata (bool (default: True)) – Whether to push the AnnData object associated with the model.

  • repo_create_kwargs (dict | None (default: None)) – Keyword arguments passed into create_repo() if repo_create=True.

  • **kwargs – Additional keyword arguments passed into upload_file().
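A minimal sketch of pushing a model while keeping the API token out of source code. The helper name publish and the environment variable name HF_TOKEN are assumptions made for this example. hub_model is expected to be a HubModel instance.

```python
import os


def publish(hub_model, repo_name, token_env_var="HF_TOKEN", create=False):
    """Push ``hub_model`` to ``repo_name`` using a token from the environment.

    ``token_env_var`` is a hypothetical variable name chosen for this sketch.
    """
    token = os.environ.get(token_env_var)
    if not token:
        raise RuntimeError(
            f"Set {token_env_var} to a huggingface token with write permissions."
        )
    hub_model.push_to_huggingface_hub(
        repo_name,
        repo_token=token,
        repo_create=create,  # True asks for the repo to be created
        push_anndata=True,   # also upload the AnnData object
    )
```

Reading the token from the environment avoids committing credentials alongside the upload script.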

HubModel.push_to_s3(s3_bucket, s3_path, push_anndata=True, **kwargs)[source]#

Upload the HubModel to an S3 bucket.

Requires boto3 to be installed.

Parameters:
  • s3_bucket (str) – The S3 bucket to which to upload the model.

  • s3_path (str) – The S3 path where the model will be saved.

  • push_anndata (bool (default: True)) – Whether to push the AnnData object associated with the model.

  • **kwargs – Keyword arguments passed into boto3.client().
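The S3 methods mirror each other, so an upload followed by a download can be sketched as below. The helper name s3_round_trip is hypothetical. The instance and the class are passed in as arguments so the snippet does not depend on scvi-tools or boto3 being installed.

```python
def s3_round_trip(hub_model, hub_model_cls, s3_bucket, s3_path, **client_kwargs):
    """Upload ``hub_model`` to S3, then download it again as a new instance.

    ``client_kwargs`` are forwarded unchanged to both calls, which pass them
    on to the S3 client (for example ``region_name``).
    """
    hub_model.push_to_s3(s3_bucket, s3_path, push_anndata=True, **client_kwargs)
    return hub_model_cls.pull_from_s3(
        s3_bucket, s3_path, pull_anndata=True, **client_kwargs
    )
```

Using the same bucket, path, and client arguments for both calls keeps the pushed and pulled locations consistent.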

HubModel.read_adata()[source]#

Reads the data from disk (self._adata_path) if it exists. Otherwise, this is a no-op.

Return type:

None

HubModel.read_large_training_adata()[source]#

Downloads the large training adata, if it exists, then loads it into memory. Otherwise, this is a no-op.

Return type:

None

Notes

The large training data url can be a cellxgene explorer session url. However, it cannot be a self-hosted session; it must be from the cellxgene portal (https://cellxgene.cziscience.com/).
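A caller could check a url against this restriction before storing it in the metadata. The sketch below is a hypothetical pre-check, not the validation scvi-tools performs: it only compares the url's host with the portal host quoted in the note.

```python
from urllib.parse import urlparse

# Host of the cellxgene portal url quoted in the note above.
PORTAL_HOST = "cellxgene.cziscience.com"


def looks_like_portal_url(url):
    """Return True if ``url`` is an https url served from the portal host."""
    parsed = urlparse(url)
    return parsed.scheme == "https" and parsed.hostname == PORTAL_HOST
```

A self-hosted session served from any other host fails this check.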

HubModel.save(overwrite=False)[source]#

Save the model card and metadata to the model directory.

Parameters:
  • overwrite (bool (default: False)) – Whether to overwrite existing files.

Return type:

None