scvi.hub.HubModel

scvi.hub.HubModel#

class scvi.hub.HubModel(local_dir, repo_name=None, metadata=None, model_card=None)[source]#

Wrapper for BaseModelClass backed by HuggingFace Hub.

Parameters:

repo_name (str | None (default: None)) – ID of the huggingface repo where this model is uploaded
local_dir (str) – Local directory where the data and pre-trained model reside.
metadata (HubMetadata | str | None (default: None)) – Either an instance of HubMetadata that contains the required metadata for this model, or a path to a file on disk where this metadata can be read from.
model_card (HubModelCardHelper | ModelCard | str | None (default: None)) – The model card for this pre-trained model. Model card is a Markdown file that describes the pre-trained model/data and is displayed on HuggingFace. This can be either an instance of ModelCard or an instance of HubModelCardHelper that wraps the model card or a path to a file on disk where the model card can be read from.

Notes

See further usage examples in the following tutorials:

Attributes table#

`adata`	Returns the data for this model.
`large_training_adata`	Returns the training data for this model.
`local_dir`	The local directory where the data and pre-trained model reside.
`metadata`	The metadata for this model.
`model`	Returns the model object for this hub model.
`model_card`	The model card for this model.
`repo_name`	ID of the huggingface repo where this model is uploaded.

Methods table#

`load_model`([adata, accelerator, device])	Loads the model.
`pull_from_huggingface_hub`(repo_name[, ...])	Download the given model repo from huggingface.
`pull_from_s3`(s3_bucket, s3_path[, ...])	Download a `HubModel` from an S3 bucket.
`push_to_huggingface_hub`(repo_name[, ...])	Push this model to huggingface.
`push_to_s3`(s3_bucket, s3_path[, push_anndata])	Upload the `HubModel` to an S3 bucket.
`read_adata`()	Reads the data from disk (`self._adata_path`).
`read_large_training_adata`()	Downloads the large training adata.
`read_mudata`()	Reads the data from disk (`self._mudata_path`).
`save`([overwrite])	Save the model card and metadata to the model directory.

Attributes#

HubModel.adata[source]#

Returns the data for this model.

If the data has not been loaded yet, this will call read_adata(). Otherwise, it will simply return the loaded data.

HubModel.large_training_adata[source]#

Returns the training data for this model.

If the data has not been loaded yet, this will call read_large_training_adata(), which will attempt to download from the source url. Otherwise, it will simply return the loaded data.

HubModel.local_dir[source]#

The local directory where the data and pre-trained model reside.

HubModel.metadata[source]#

The metadata for this model.

HubModel.model[source]#

Returns the model object for this hub model.

If the model has not been loaded yet, this will call load_model(). Otherwise, it will simply return the loaded model.
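The adata, large_training_adata, and model attributes are all described as loading on first access and returning the already loaded object afterwards. A minimal sketch of that pattern follows. The `LazyHub` class and its `reader` argument are hypothetical stand-ins for illustration, not scvi-tools code.

```python
class LazyHub:
    """Toy stand-in for load-on-first-access behaviour; not scvi-tools code."""

    def __init__(self, reader):
        self._reader = reader  # hypothetical callable that produces the data
        self._adata = None     # nothing is loaded at construction time
        self.read_calls = 0    # counts how often the reader actually runs

    def read_adata(self):
        # Perform the (potentially expensive) load and cache the result.
        self.read_calls += 1
        self._adata = self._reader()

    @property
    def adata(self):
        # Load only if nothing has been loaded yet; otherwise reuse the cache.
        if self._adata is None:
            self.read_adata()
        return self._adata


hub = LazyHub(lambda: {"n_obs": 3})
first = hub.adata   # triggers read_adata()
second = hub.adata  # returns the cached object
print(hub.read_calls)
```

Under this pattern the second access does not call the reader again, so `first` and `second` refer to the same object.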

HubModel.model_card[source]#

The model card for this model.

HubModel.repo_name[source]#

ID of the huggingface repo where this model is uploaded.

Methods#

HubModel.load_model(adata=None, accelerator='auto', device='auto')[source]#

Loads the model.

Parameters:

adata (AnnData | None (default: None)) – The data to load the model with. If None, the model is loaded with the data at self._adata_path. If that file does not exist, large_training_adata is used instead. If that is unavailable as well, an error is raised.
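accelerator (str (default: 'auto')) – The accelerator type to use (for example "cpu" or "gpu"). With "auto", an accelerator is selected automatically.
device (int | str (default: 'auto')) – The device to use. With "auto", a device is selected automatically based on the chosen accelerator.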
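The fallback order described for the adata parameter can be sketched as a small standalone function. This is an illustration of the documented order only, not the library's implementation: `resolve_adata`, its two loader callables, and the choice of `ValueError` are all assumptions made for the example.

```python
import os
from typing import Any, Callable, Optional


def resolve_adata(
    adata: Optional[Any],
    adata_path: str,
    load_from_disk: Callable[[str], Any],
    load_large_training: Callable[[], Optional[Any]],
) -> Any:
    """Pick the data to load a model with, following the documented order."""
    # 1. Data passed explicitly by the caller always wins.
    if adata is not None:
        return adata
    # 2. Otherwise use the file stored next to the model, if it exists.
    if os.path.isfile(adata_path):
        return load_from_disk(adata_path)
    # 3. Otherwise fall back to the large training data, if it is available.
    large = load_large_training()
    if large is not None:
        return large
    # 4. Nothing left to try (the exception type is an assumption).
    raise ValueError("No data available; pass adata explicitly.")
```

Passing the loaders in as callables keeps the sketch independent of any file format or download mechanism.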

classmethod HubModel.pull_from_huggingface_hub(repo_name, cache_dir=None, revision=None, pull_anndata=True, **kwargs)[source]#

Download the given model repo from huggingface.

The model, its card, data, and metadata are downloaded to a cache location on disk selected by huggingface, and an instance of this class is created from them and returned.

Parameters:

repo_name (str) – ID of the huggingface repo from which this model is downloaded
cache_dir (str | None (default: None)) – The directory where the downloaded model artifacts will be cached
revision (str | None (default: None)) – The revision to pull from the repo. This can be a branch name, a tag, or a full-length commit hash. If None, the default (latest) revision is pulled.
pull_anndata (bool (default: True)) – Whether to pull the AnnData object associated with the model. If True but the file does not exist, it will fail silently.
kwargs – Additional keyword arguments to pass to snapshot_download().
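A minimal usage sketch for the parameters above, assuming scvi-tools is installed and the repo exists. The wrapper name `pull_model` and its argument names are hypothetical; only the `pull_from_huggingface_hub` call and its keyword names come from the signature documented here.

```python
from typing import Optional


def pull_model(
    repo_name: str,
    revision: Optional[str] = None,
    cache_dir: Optional[str] = None,
    with_adata: bool = True,
):
    """Download a model repo and return the resulting HubModel (sketch)."""
    # Imported lazily so this helper can be defined without scvi-tools present.
    from scvi.hub import HubModel

    return HubModel.pull_from_huggingface_hub(
        repo_name,
        cache_dir=cache_dir,      # None lets huggingface pick the cache location
        revision=revision,        # None pulls the default (latest) revision
        pull_anndata=with_adata,  # False skips the associated AnnData object
    )
```

Calling `pull_model("some-org/some-model")` (a hypothetical repo ID) would return a HubModel whose `model` attribute loads the model on first access.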

classmethod HubModel.pull_from_s3(s3_bucket, s3_path, pull_anndata=True, cache_dir=None, unsigned=False, **kwargs)[source]#

Download a HubModel from an S3 bucket.

Requires boto3 to be installed.

Parameters:

s3_bucket (str) – The S3 bucket from which to download the model.
s3_path (str) – The S3 path to the saved model.
pull_anndata (bool (default: True)) – Whether to pull the AnnData object associated with the model.
cache_dir (str | None (default: None)) – The directory where the downloaded model files will be cached. Defaults to a temporary directory created with tempfile.mkdtemp().
unsigned (bool (default: False)) – Whether to use unsigned requests. If True and config is passed in kwargs, config will be overwritten.
**kwargs – Keyword arguments passed into client().

Return type:

HubModel

Returns:

The pretrained model specified by the given S3 bucket and path.
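As a usage sketch, the helper below pulls a model with unsigned requests, which is typically what anonymous access to a publicly readable bucket needs. It assumes scvi-tools and boto3 are installed. The helper name `pull_public_model` is hypothetical, and any bucket and path passed to it are the caller's own.

```python
from typing import Optional


def pull_public_model(s3_bucket: str, s3_path: str, cache_dir: Optional[str] = None):
    """Download a HubModel from S3 using unsigned requests (sketch)."""
    # Imported lazily so this helper can be defined without scvi-tools present.
    from scvi.hub import HubModel

    return HubModel.pull_from_s3(
        s3_bucket,
        s3_path,
        pull_anndata=True,    # also fetch the associated AnnData object
        cache_dir=cache_dir,  # None falls back to a temporary directory
        unsigned=True,        # per the docs, this overrides any `config` kwarg
    )
```

Because unsigned=True overwrites a `config` passed in kwargs, the sketch does not forward extra client arguments.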

HubModel.push_to_huggingface_hub(repo_name, repo_token=None, repo_create=False, repo_create_kwargs=None, collection_name=None, push_anndata=True, **kwargs)[source]#

Push this model to huggingface.

If the dataset is too large to upload to huggingface, this will raise an exception, prompting the user to upload the data elsewhere. Otherwise, the data, model card, and metadata are all uploaded to the given model repo.

Parameters:

repo_name (str) – ID of the huggingface repo where this model needs to be uploaded
repo_token (str | None (default: None)) – huggingface API token with write permissions. If None, uses the token from the HuggingFace cache or HF_TOKEN environment variable.
repo_create (bool (default: False)) – Whether to create the repo
repo_create_kwargs (dict | None (default: None)) – Keyword arguments passed into create_repo() if repo_create=True.
collection_name (str | None (default: None)) – The name of the collection to which the model belongs.
push_anndata (bool (default: True)) – Whether to push the AnnData object associated with the model.
**kwargs – Additional keyword arguments passed into upload_file().
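A short usage sketch for the parameters above. It takes an already constructed HubModel, so it needs no imports. The helper name `publish` and its argument names are hypothetical; the keyword names in the call come from the signature documented here.

```python
from typing import Optional


def publish(hub_model, repo_name: str, create_repo: bool = False, token: Optional[str] = None) -> None:
    """Push an existing HubModel to a huggingface model repo (sketch)."""
    hub_model.push_to_huggingface_hub(
        repo_name,
        repo_token=token,         # None falls back to the cached token or HF_TOKEN
        repo_create=create_repo,  # True asks for the repo to be created
        push_anndata=True,        # upload the associated AnnData object as well
    )
```

If the dataset is too large for huggingface, the documented behaviour is an exception, so a caller may want to catch it and retry with the data hosted elsewhere.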

HubModel.push_to_s3(s3_bucket, s3_path, push_anndata=True, **kwargs)[source]#

Upload the HubModel to an S3 bucket.

Requires boto3 to be installed.

Parameters:

s3_bucket (str) – The S3 bucket to which to upload the model.
s3_path (str) – The S3 path where the model will be saved.
push_anndata (bool (default: True)) – Whether to push the AnnData object associated with the model.
**kwargs – Keyword arguments passed into client().

HubModel.read_adata()[source]#

Reads the data from disk (self._adata_path).

Reads the file if it exists; otherwise, this is a no-op.

Return type:

None

HubModel.read_large_training_adata()[source]#

Downloads the large training adata.

If the large training data exists, it is loaded into memory; otherwise, this is a no-op.

Return type:

None

Notes

The large training data url can be a cellxgene explorer session url. However, it cannot be a self-hosted session. In other words, it must be from the cellxgene portal (https://cellxgene.cziscience.com/).

HubModel.read_mudata()[source]#

Reads the data from disk (self._mudata_path).

Reads the file if it exists; otherwise, this is a no-op.

Return type:

None

HubModel.save(overwrite=False)[source]#

Save the model card and metadata to the model directory.

Parameters:

overwrite (bool (default: False)) – Whether to overwrite existing files.

Return type:

None
