scvi.model.base.SemisupervisedTrainingMixin

class scvi.model.base.SemisupervisedTrainingMixin

General-purpose semisupervised training, prediction, and interpretability methods.

Methods table

get_ranked_markers([adata, attrs])

Get the ranked gene list based on highest attributions.

predict([adata, indices, soft, batch_size, ...])

Return cell label predictions.

shap_adata_predict(X)

SHAP operator: returns soft predictions given data X.

shap_predict([adata, indices, shap_args])

Run the SHAP interpreter on a trained model and return the SHAP values.

train([max_epochs, n_samples_per_label, ...])

Train the model.

Methods

SemisupervisedTrainingMixin.get_ranked_markers(adata=None, attrs=None)

Get the ranked gene list based on highest attributions.

Parameters:
  • adata (AnnData | MuData | None (default: None)) – AnnData or MuData object that has been registered via corresponding setup method in model class.

  • attrs (numpy.ndarray | None (default: None)) – Attributions matrix, with one row per cell and one column per gene.

Return type:

DataFrame

Returns:

pandas.DataFrame – A DataFrame containing the ranked attributions for each gene.

Examples

>>> attrs_df = model.get_ranked_markers(attrs=attrs)

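To make "ranked by highest attributions" concrete, the following plain-Python sketch aggregates a cells × genes attribution matrix per gene and sorts genes by their mean attribution. The function name `rank_markers`, the chosen summary statistics (mean, population standard deviation, number of cells with a nonzero attribution), and the toy gene names and values are all assumptions for illustration; this is not the library's implementation and it does not call scvi-tools.

```python
from statistics import mean, pstdev


def rank_markers(attrs, gene_names):
    """Hypothetical helper: rank genes by mean attribution across cells.

    attrs: list of per-cell rows, each holding one attribution per gene.
    gene_names: one name per column of attrs.
    Returns (gene, mean_attr, std_attr, n_cells) tuples, highest mean first.
    """
    n_genes = len(gene_names)
    if any(len(row) != n_genes for row in attrs):
        raise ValueError("each row of attrs needs one value per gene")
    rows = []
    for j, gene in enumerate(gene_names):
        column = [row[j] for row in attrs]
        # Number of cells in which this gene received a nonzero attribution
        n_cells = sum(1 for value in column if value != 0)
        rows.append((gene, mean(column), pstdev(column), n_cells))
    # Highest mean attribution first
    return sorted(rows, key=lambda r: r[1], reverse=True)


# Toy attribution matrix: 3 cells x 3 hypothetical genes
attrs = [
    [0.9, 0.0, 0.2],
    [0.7, 0.1, 0.0],
    [0.8, 0.0, 0.4],
]
for gene, mean_attr, std_attr, n_cells in rank_markers(attrs, ["gene_a", "gene_b", "gene_c"]):
    print(gene, round(mean_attr, 3), n_cells)
```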
SemisupervisedTrainingMixin.predict(adata=None, indices=None, soft=False, batch_size=None, use_posterior_mean=True, ig_interpretability=False, ig_args=None)

Return cell label predictions.

Parameters:
  • adata (default: None) – AnnData or MuData object that has been registered via corresponding setup method in model class.

  • indices (default: None) – Indices of the cells in adata to predict. If None, all cells are used.

  • soft (default: False) – If True, returns per-class probabilities.

  • batch_size (default: None) – Minibatch size for data loading into model. Defaults to scvi.settings.batch_size.

  • use_posterior_mean (default: True) – If True, uses the mean of the posterior distribution to predict cell-type labels. Otherwise, uses a sample from the posterior distribution, which makes the predictions stochastic.

  • ig_interpretability (default: False) – If True, runs integrated gradients interpretability per sample and returns a score matrix in which each gene of each sample is scored by its contribution to that sample's prediction.

  • ig_args (default: None) – Keyword args for IntegratedGradients
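The sketch below illustrates, on a hypothetical toy linear-softmax classifier, how per-class probabilities (`soft=True`) reduce to a hard label and how integrated gradients scores each gene's contribution to one cell's prediction. It does not use scvi-tools or the IntegratedGradients class named above; the classifier, its weights, the all-zeros baseline, and the midpoint Riemann-sum approximation with 200 steps are assumptions chosen so the gradient can be written analytically.

```python
import math


def softmax(logits):
    # Subtract the max logit for numerical stability
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_soft(x, weights, bias):
    """Per-class probabilities for one cell under a toy linear-softmax classifier."""
    logits = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]
    return softmax(logits)


def predict_hard(x, weights, bias):
    """Index of the most probable class."""
    probs = predict_soft(x, weights, bias)
    return max(range(len(probs)), key=probs.__getitem__)


def grad_target_prob(x, weights, bias, target):
    """Analytic gradient of p_target with respect to each input feature.

    With logits z_k = sum_i W[k][i] * x_i + b_k:
    dp_t/dx_i = p_t * (W[t][i] - sum_k p_k * W[k][i]).
    """
    probs = predict_soft(x, weights, bias)
    expected_w = [sum(p * row[i] for p, row in zip(probs, weights)) for i in range(len(x))]
    return [probs[target] * (weights[target][i] - expected_w[i]) for i in range(len(x))]


def integrated_gradients(x, baseline, weights, bias, target, steps=200):
    """Midpoint Riemann-sum approximation of integrated gradients for one cell."""
    total = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        grad = grad_target_prob(point, weights, bias, target)
        total = [t + g for t, g in zip(total, grad)]
    # Scale the averaged gradient by each feature's displacement from the baseline
    return [(xi - b) * t / steps for xi, b, t in zip(x, baseline, total)]


weights = [[1.5, -0.5, 0.0], [-1.0, 2.0, 0.5]]  # toy values: 2 classes x 3 genes
bias = [0.0, 0.0]
cell = [2.0, 0.5, 1.0]
baseline = [0.0, 0.0, 0.0]
target = predict_hard(cell, weights, bias)
scores = integrated_gradients(cell, baseline, weights, bias, target)
print(target, [round(s, 4) for s in scores])
```

A useful sanity check is the completeness property of integrated gradients: the per-gene scores should sum, up to discretization error, to the change in the target-class probability between the baseline and the cell.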

SemisupervisedTrainingMixin.shap_adata_predict(X)

SHAP operator: returns soft predictions given data X.

SemisupervisedTrainingMixin.shap_predict(adata=None, indices=None, shap_args=None)

Run the SHAP interpreter on a trained model and return the SHAP values.
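As a rough illustration of what SHAP values measure, the sketch below computes exact Shapley values for a single input by enumerating every feature coalition. The model is passed in as a batched function from rows to scalar outputs, in the spirit of an operator such as shap_adata_predict. The explainer that shap_predict actually uses is not described here, and this sketch does not call the shap library; replacing absent features with a single baseline (rather than a background dataset), the function name `exact_shapley`, and the toy model are assumptions. Enumeration costs O(2^n) model calls, so it is only feasible for a handful of features.

```python
from itertools import combinations
from math import factorial


def exact_shapley(f, x, baseline):
    """Hypothetical helper: exact Shapley values of f at x.

    f: batched prediction function, list of rows -> list of scalar outputs.
    Features outside a coalition take their baseline value.
    """
    n = len(x)

    def value(coalition):
        row = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return f([row])[0]

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude feature i
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_model(rows):
    # Hypothetical model with an interaction between features 0 and 1
    return [2.0 * r[0] + r[0] * r[1] - 0.5 * r[2] for r in rows]


x = [1.0, 3.0, 2.0]
baseline = [0.0, 0.0, 0.0]
phi = exact_shapley(toy_model, x, baseline)
print([round(v, 6) for v in phi])
```

The interaction term's contribution is split equally between features 0 and 1, and the values sum to the difference between the model output at x and at the baseline (the efficiency property).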

SemisupervisedTrainingMixin.train(max_epochs=None, n_samples_per_label=None, check_val_every_n_epoch=None, train_size=0.9, validation_size=None, shuffle_set_split=True, batch_size=128, accelerator='auto', devices='auto', datasplitter_kwargs=None, plan_kwargs=None, **trainer_kwargs)

Train the model.

Parameters:
  • max_epochs (int | None (default: None)) – Number of passes through the dataset for semisupervised training.

  • n_samples_per_label (float | None (default: None)) – Number of subsamples for each label class to sample per epoch. By default, there is no label subsampling.

  • check_val_every_n_epoch (int | None (default: None)) – Frequency, in epochs, with which metrics are computed on the validation set for both the unsupervised and semisupervised trainers. If you’d like a different frequency for the semisupervised trainer, set check_val_every_n_epoch in semisupervised_train_kwargs.

  • train_size (float (default: 0.9)) – Fraction of cells in the training set, in the range [0.0, 1.0].

  • validation_size (float | None (default: None)) – Fraction of cells in the validation set. If None, defaults to 1 - train_size. If train_size + validation_size < 1, the remaining cells form a test set.

  • shuffle_set_split (bool (default: True)) – Whether to shuffle indices before splitting. If False, the validation, train, and test sets are split in the sequential order of the data according to the validation_size and train_size fractions.

  • batch_size (int (default: 128)) – Minibatch size to use during training.

  • accelerator (str (default: 'auto')) – Supports passing different accelerator types (“cpu”, “gpu”, “tpu”, “ipu”, “hpu”, “mps”, “auto”) as well as custom accelerator instances.

  • devices (int | list[int] | str (default: 'auto')) – The devices to use. Can be set to a non-negative index (int or str), a sequence of device indices (list or comma-separated str), the value -1 to indicate all available devices, or “auto” for automatic selection based on the chosen accelerator. If set to “auto” and accelerator is not determined to be “cpu”, then devices will be set to the first available device.

  • datasplitter_kwargs (dict | None (default: None)) – Additional keyword arguments passed into SemiSupervisedDataSplitter.

  • plan_kwargs (dict | None (default: None)) – Keyword args for SemiSupervisedTrainingPlan. Keyword arguments passed to train() will overwrite values present in plan_kwargs, when appropriate.

  • **trainer_kwargs – Other keyword args for Trainer.
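The interaction of train_size, validation_size, and shuffle_set_split can be sketched in plain Python as below. The real splitting is done by SemiSupervisedDataSplitter; this sketch does not call it. The rounding rule (ceiling for the training count, floor for the validation count), taking validation cells first in the sequential order, the seeded shuffle, and the function name `split_indices` are assumptions made for illustration.

```python
import math
import random


def split_indices(n_cells, train_size=0.9, validation_size=None, shuffle=True, seed=0):
    """Hypothetical helper: split cell indices into train/validation/test lists.

    Assumed rounding: ceil for the training count, floor for the validation count.
    """
    if not 0.0 <= train_size <= 1.0:
        raise ValueError("train_size must be in [0.0, 1.0]")
    n_train = math.ceil(train_size * n_cells)
    if validation_size is None:
        # Everything not used for training goes to validation; no test set
        n_val = n_cells - n_train
    else:
        if not 0.0 <= validation_size <= 1.0 or train_size + validation_size > 1.0:
            raise ValueError("validation_size must be in [0.0, 1.0 - train_size]")
        n_val = math.floor(validation_size * n_cells)
    indices = list(range(n_cells))
    if shuffle:
        random.Random(seed).shuffle(indices)
    # Validation cells first, then training cells, then the leftover test cells
    val = indices[:n_val]
    train = indices[n_val:n_val + n_train]
    test = indices[n_val + n_train:]
    return train, val, test


train, val, test = split_indices(16, train_size=0.75, validation_size=0.125, shuffle=False)
print(len(train), len(val), len(test))
```

With shuffling disabled the three sets are contiguous runs of the original cell order, which is useful for reproducibility but can bias the split if the data are sorted by batch or label.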