# Minification

Minification is the process of reducing the amount of content in your dataset in a smart way. It can be useful for several reasons, and there are different ways to do it (we call these minification types). Currently, the only supported type replaces the count data with the parameters of the latent posterior distribution, as estimated by a trained model. This tutorial focuses on that type of minification.

There are multiple motivations for minifying the data in this way:

- The data is more compact, so it takes up less space on disk and in memory.
- Data transfer (sharing, uploading, downloading) is smoother owing to the smaller data size.
- By using the latent posterior parameters, we can skip the encoder network and save on computation time.

The reason this works is that most post-training routines for scvi-tools models do not in fact require the full counts. Once your model is trained, you essentially only need the model weights and the pre-computed embeddings to carry out analyses. There are certain exceptions to this, but those routines will alert you if you try to call them with a minified dataset.

<img src="https://raw.githubusercontent.com/scverse/scvi-tutorials/main/figures/minification.svg?raw=true" alt="Minification overview" />

Moreover, you can actually use the latent posterior and the decoder network to estimate the original counts! This is of course not the exact same thing as using your actual full counts, but we can show that it is a good approximation using posterior predictive metrics (paper link tbd).
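
To make the idea concrete, here is a minimal NumPy sketch of estimating expression from cached latent posterior parameters. It assumes a diagonal Gaussian posterior per cell (mean `qzm`, variance `qzv`) and uses a made-up linear-plus-softmax decoder (`toy_decoder`, `W`) as a stand-in. The real scVI decoder is a trained neural network, and all shapes and values below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

n_cells, n_latent, n_genes = 5, 10, 20  # illustrative sizes, not the tutorial's

# Stand-ins for the cached posterior parameters (mean and variance of q(z | x)).
qzm = rng.normal(size=(n_cells, n_latent))
qzv = rng.uniform(0.05, 0.5, size=(n_cells, n_latent))

# Per-cell library sizes, which minified data has to carry separately.
library_size = rng.integers(1_000, 5_000, size=n_cells).astype(float)

# Hypothetical decoder weights; a trained model would supply a real network here.
W = rng.normal(scale=0.3, size=(n_latent, n_genes))


def toy_decoder(z):
    """Map latent vectors to per-gene proportions that sum to 1 for each cell."""
    logits = z @ W
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum(axis=1, keepdims=True)


# Sample z ~ N(qzm, qzv) directly from the cached parameters: no encoder pass needed.
z = qzm + np.sqrt(qzv) * rng.standard_normal(qzm.shape)

# Expected counts = proportions scaled by each cell's library size.
expected_counts = toy_decoder(z) * library_size[:, None]
print(expected_counts.shape)
```

The point of the sketch is only that the encoder never appears: everything downstream of `z` can be computed from the cached parameters, the decoder, and the library sizes.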

Let's now see how to minify a dataset and use the corresponding model.

```{note}
Running the following cell will install tutorial dependencies on Google Colab only. It will have no effect on environments other than Google Colab.
```

In [1]:
!pip install --quiet scvi-colab
from scvi_colab import install

install()

Not currently in Google Colab environment.

Please run with `run_outside_colab=True` to override.

Returning with no further action.


In [2]:
import os
import tempfile

import scanpy as sc
import scvi
import seaborn as sns
import torch

In [3]:
scvi.settings.seed = 0
print("Last run with scvi-tools version:", scvi.__version__)

Seed set to 0


Last run with scvi-tools version: 1.1.0


```{note}
You can modify `save_dir` below to change where the data files for this tutorial are saved.
```

In [4]:
sc.set_figure_params(figsize=(6, 6), frameon=False)
sns.set_theme()
torch.set_float32_matmul_precision("high")
save_dir = tempfile.TemporaryDirectory()

%config InlineBackend.print_figure_kwargs={"facecolor": "w"}
%config InlineBackend.figure_format="retina"

## Get the data and model

Here we use the same data and model setup as the [scvi-tools API overview tutorial](https://docs.scvi-tools.org/en/stable/tutorials/notebooks/api_overview.html).

The dataset used is a subset of the heart cell atlas dataset:\
Litviňuková, M., Talavera-López, C., Maatz, H., Reichart, D., Worth, C. L., Lindberg, E. L., … & Teichmann, S. A. (2020). Cells of the adult human heart. Nature, 588(7838), 466-472.

Let's train the model as usual. We also save the model and data to disk, as we'll need them later.

In [5]:
adata = scvi.data.heart_cell_atlas_subsampled(save_path=save_dir.name)

INFO     Downloading file at /tmp/tmp6x2zn3yt/hca_subsampled_20k.h5ad


Downloading...: 100%|██████████| 65714/65714.0 [00:00<00:00, 108552.60it/s]




In [6]:
sc.pp.filter_genes(adata, min_counts=3)
adata.layers["counts"] = adata.X.copy()
sc.pp.normalize_total(adata, target_sum=1e4)
sc.pp.log1p(adata)
adata.raw = adata
sc.pp.highly_variable_genes(
    adata,
    n_top_genes=1200,
    subset=True,
    layer="counts",
    flavor="seurat_v3",
    batch_key="cell_source",
)

In [7]:
scvi.model.SCVI.setup_anndata(
    adata,
    layer="counts",
    categorical_covariate_keys=["cell_source", "donor"],
    continuous_covariate_keys=["percent_mito", "percent_ribo"],
)
model = scvi.model.SCVI(adata)





In [8]:
model.train(max_epochs=20)

GPU available: True (cuda), used: True


TPU available: False, using: 0 TPU cores


IPU available: False, using: 0 IPUs


HPU available: False, using: 0 HPUs


LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]


/env/lib/python3.11/site-packages/lightning/pytorch/trainer/connectors/data_connector.py:441: The 'train_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=19` in the `DataLoader` to improve performance.


`Trainer.fit` stopped: `max_epochs=20` reached.


Epoch 20/20: 100%|██████████| 20/20 [00:13<00:00,  1.47it/s, v_num=1, train_loss_step=273, train_loss_epoch=282]




In [9]:
model_path = os.path.join(save_dir.name, "scvi_hca")
model.save(model_path, save_anndata=True, overwrite=True)

In [10]:
model = scvi.model.SCVI.load(model_path)
model

INFO     File /tmp/tmp6x2zn3yt/scvi_hca/model.pt already downloaded






Note that, as expected, "Model's adata is minified" is False.

In [11]:
model.adata

AnnData object with n_obs × n_vars = 18641 × 1200
    obs: 'NRP', 'age_group', 'cell_source', 'cell_type', 'donor', 'gender', 'n_counts', 'n_genes', 'percent_mito', 'percent_ribo', 'region', 'sample', 'scrublet_score', 'source', 'type', 'version', 'cell_states', 'Used', '_scvi_batch', '_scvi_labels'
    var: 'gene_ids-Harvard-Nuclei', 'feature_types-Harvard-Nuclei', 'gene_ids-Sanger-Nuclei', 'feature_types-Sanger-Nuclei', 'gene_ids-Sanger-Cells', 'feature_types-Sanger-Cells', 'gene_ids-Sanger-CD45', 'feature_types-Sanger-CD45', 'n_counts', 'highly_variable', 'highly_variable_rank', 'means', 'variances', 'variances_norm', 'highly_variable_nbatches'
    uns: '_scvi_manager_uuid', '_scvi_uuid', 'cell_type_colors', 'hvg', 'log1p'
    obsm: '_scvi_extra_categorical_covs', '_scvi_extra_continuous_covs'
    layers: 'counts'

Notice that in addition to `adata.X`, we also have a layer (`counts`) and a `raw` attribute.

In [12]:
model.adata.raw

<anndata._core.raw.Raw at 0x7f8d3e9f6690>

Let's also save a reference to `model.adata`. We'll see later that this object remains unchanged because **minification is not an in-place procedure**.

In [13]:
bdata = model.adata
bdata is model.adata  # this should be True because we didn't copy the anndata object

True

## Minify

To minify the data, all we need to do is:

1. get the latent representation and store it in the adata
1. call `model.minify_adata()`

In [14]:
qzm, qzv = model.get_latent_representation(give_mean=False, return_dist=True)
model.adata.obsm["X_latent_qzm"] = qzm
model.adata.obsm["X_latent_qzv"] = qzv

model.minify_adata()

INFO     Input AnnData not setup with scvi-tools. attempting to transfer AnnData setup


INFO     Generating sequential column names


INFO     Generating sequential column names


In [15]:
model



As expected, "Model's adata is minified" is now True. Also, we can check the model's `minified_data_type`:

In [16]:
model.minified_data_type

'latent_posterior_parameters'

Let's check out the data now:

In [17]:
model.adata

AnnData object with n_obs × n_vars = 18641 × 1200
    obs: 'NRP', 'age_group', 'cell_source', 'cell_type', 'donor', 'gender', 'n_counts', 'n_genes', 'percent_mito', 'percent_ribo', 'region', 'sample', 'scrublet_score', 'source', 'type', 'version', 'cell_states', 'Used', '_scvi_batch', '_scvi_labels', '_scvi_observed_lib_size'
    var: 'gene_ids-Harvard-Nuclei', 'feature_types-Harvard-Nuclei', 'gene_ids-Sanger-Nuclei', 'feature_types-Sanger-Nuclei', 'gene_ids-Sanger-Cells', 'feature_types-Sanger-Cells', 'gene_ids-Sanger-CD45', 'feature_types-Sanger-CD45', 'n_counts', 'highly_variable', 'highly_variable_rank', 'means', 'variances', 'variances_norm', 'highly_variable_nbatches'
    uns: '_scvi_manager_uuid', 'cell_type_colors', 'hvg', 'log1p', '_scvi_adata_minify_type', '_scvi_uuid'
    obsm: '_scvi_extra_categorical_covs', '_scvi_extra_continuous_covs', 'X_latent_qzm', 'X_latent_qzv', '_scvi_latent_qzm', '_scvi_latent_qzv'
    layers: 'counts'

First, let's check that the original adata was not modified (minification is not done in place):

In [18]:
model.adata is bdata

False
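
The pattern behind this result can be sketched with a toy class. `ToyModel` and its `minify` method are hypothetical stand-ins, not scvi-tools code. The sketch only shows how rebinding an attribute to a new object leaves an earlier reference pointing at the untouched original.

```python
class ToyModel:
    """Hypothetical stand-in for a model that holds a reference to its data."""

    def __init__(self, data):
        self.data = data

    def minify(self):
        # Build a new, smaller object and rebind the attribute to it;
        # the original object is left untouched.
        self.data = {"latent": [sum(row) / len(row) for row in self.data["counts"]]}


original = {"counts": [[1, 2, 3], [4, 5, 6]]}
toy = ToyModel(original)
kept_reference = toy.data

toy.minify()

print(toy.data is kept_reference)  # the model now points at a new object
print(kept_reference == original)  # the old object still holds the full counts
```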

Next, we see that we still have the same number of obs and vars: 18641 × 1200. This seems strange! Didn't we say we minified the data? We did. We "emptied" the contents of `adata.X`, `adata.layers["counts"]`, and `adata.raw`, and in their place cached the much smaller latent posterior parameters in `adata.obsm["_scvi_latent_qzm"]` and `adata.obsm["_scvi_latent_qzv"]`. Let's double check that:

In [19]:
model.adata.X

<18641x1200 sparse matrix of type '<class 'numpy.float64'>'
	with 0 stored elements in Compressed Sparse Row format>

In [20]:
model.adata.layers["counts"]

<18641x1200 sparse matrix of type '<class 'numpy.float64'>'
	with 0 stored elements in Compressed Sparse Row format>

In [21]:
model.adata.raw is None

True

In [22]:
bdata

AnnData object with n_obs × n_vars = 18641 × 1200
    obs: 'NRP', 'age_group', 'cell_source', 'cell_type', 'donor', 'gender', 'n_counts', 'n_genes', 'percent_mito', 'percent_ribo', 'region', 'sample', 'scrublet_score', 'source', 'type', 'version', 'cell_states', 'Used', '_scvi_batch', '_scvi_labels'
    var: 'gene_ids-Harvard-Nuclei', 'feature_types-Harvard-Nuclei', 'gene_ids-Sanger-Nuclei', 'feature_types-Sanger-Nuclei', 'gene_ids-Sanger-Cells', 'feature_types-Sanger-Cells', 'gene_ids-Sanger-CD45', 'feature_types-Sanger-CD45', 'n_counts', 'highly_variable', 'highly_variable_rank', 'means', 'variances', 'variances_norm', 'highly_variable_nbatches'
    uns: '_scvi_manager_uuid', '_scvi_uuid', 'cell_type_colors', 'hvg', 'log1p'
    obsm: '_scvi_extra_categorical_covs', '_scvi_extra_continuous_covs', 'X_latent_qzm', 'X_latent_qzv'
    layers: 'counts'

Everything else is the same; all the other metadata is still there.
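
The "emptying" seen in the outputs above can be pictured with a small SciPy sketch. The matrix values are made up, and the snippet only illustrates why an all-zero sparse matrix keeps its shape while storing nothing. It is not the code scvi-tools runs internally.

```python
import numpy as np
from scipy import sparse

# A small stand-in count matrix (4 cells x 6 genes), stored sparsely.
counts = sparse.csr_matrix(np.array([
    [0, 3, 0, 1, 0, 0],
    [2, 0, 0, 0, 5, 0],
    [0, 0, 4, 0, 0, 1],
    [1, 0, 0, 0, 0, 0],
]))

# An all-zero sparse matrix of the same shape stores no values at all.
emptied = sparse.csr_matrix(counts.shape)

print(counts.shape, counts.nnz)
print(emptied.shape, emptied.nnz)
```

Because the shape is preserved, `n_obs` and `n_vars` are unchanged, while the number of stored elements drops to zero.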

But is the data really smaller now? Let's check:

In [23]:
minified_model_path = os.path.join(save_dir.name, "scvi_hca_minified")
model.save(minified_model_path, save_anndata=True, overwrite=True)

In [24]:
before = os.path.getsize(os.path.join(model_path, "adata.h5ad")) // (1024 * 1024)
after = os.path.getsize(os.path.join(minified_model_path, "adata.h5ad")) // (1024 * 1024)

print(f"AnnData size before minification: {before} MB")
print(f"AnnData size after minification: {after} MB")

AnnData size before minification: 212 MB
AnnData size after minification: 8 MB
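
For a rough sense of where the savings come from, the sketch below compares the number of values in a full cell-by-gene matrix with the number of latent posterior parameters. The value `n_latent = 10` is an assumption for illustration and should be replaced with your model's latent dimensionality. Actual file sizes also depend on sparsity, compression, and the remaining metadata, so this is not meant to reproduce the numbers above.

```python
n_cells, n_genes = 18641, 1200  # shape reported by model.adata above
n_latent = 10  # assumed latent dimensionality; check your model's value

dense_entries = n_cells * n_genes        # one value per cell-gene pair
latent_entries = 2 * n_cells * n_latent  # a mean and a variance per latent dimension

print(f"count-matrix entries: {dense_entries:,}")
print(f"latent posterior entries: {latent_entries:,}")
print(f"ratio: {dense_entries / latent_entries:.0f}x")
```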


We also see a new `uns` key called `_scvi_adata_minify_type`. It specifies the type of minification and has the same value as `model.minified_data_type`. Checking for this key is a quick way to tell whether your data is minified. We also expose a utility function for that check.

In [25]:
model.adata.uns["_scvi_adata_minify_type"]

'latent_posterior_parameters'

In [26]:
scvi.data._utils._is_minified(model.adata)

True

Last but not least, you might have noticed a new `obs` column called `_scvi_observed_lib_size`. We store the pre-computed per-cell library sizes in this column and use them during inference, because the minified data no longer contains the full counts.
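
As a minimal sketch of what a per-cell library size is, the snippet below sums made-up counts over the gene axis. It assumes the usual definition (total counts per cell). The exact form in which scvi-tools stores the value is not shown here.

```python
import numpy as np

# Made-up counts for 3 cells x 4 genes.
counts = np.array([
    [0, 2, 5, 1],
    [3, 0, 0, 4],
    [1, 1, 1, 1],
])

# Library size = total counts per cell (sum over the gene axis).
observed_lib_size = counts.sum(axis=1)
print(observed_lib_size)
```

Once the counts are emptied, this quantity can no longer be recomputed from the data, which is why it has to be cached beforehand.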

Another claim we made earlier is that analysis functions are faster when you use the minified data. Let's time how long they take. Here we'll look at the `get_likelihood_parameters` method.

In [27]:
model_orig = scvi.model.SCVI.load(model_path)

print("Running `get_likelihood_parameters` without minified data...")
%timeit model_orig.get_likelihood_parameters(n_samples=3, give_mean=True)

INFO     File /tmp/tmp6x2zn3yt/scvi_hca/model.pt already downloaded


Running `get_likelihood_parameters` without minified data...




2.93 s ± 46.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [28]:
print("Running `get_likelihood_parameters` with minified data...")
%timeit model.get_likelihood_parameters(n_samples=3, give_mean=True)

Running `get_likelihood_parameters` with minified data...






2.94 s ± 47.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


On this dataset the time savings are negligible: both runs take about 2.9 s per loop, and the difference is within the measurement noise.
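
`%timeit` is an IPython magic, so it is only available in notebooks and IPython shells. In a plain Python script, a comparable measurement can be sketched with the standard-library `timeit` module. The `workload` function below is a hypothetical stand-in for the model call being timed.

```python
import timeit


def workload():
    """Hypothetical stand-in for a call such as model.get_likelihood_parameters(...)."""
    return sum(i * i for i in range(10_000))


# 7 repeats of 1 call each, mirroring the "7 runs, 1 loop each" reported above.
times = timeit.repeat(workload, repeat=7, number=1)

mean = sum(times) / len(times)
std = (sum((t - mean) ** 2 for t in times) / len(times)) ** 0.5
print(f"{mean * 1e3:.3f} ms ± {std * 1e3:.3f} ms per loop (7 runs, 1 loop each)")
```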

## Save and load

Just like a regular model, you can save the model and its minified data, and load them back in:

In [29]:
model.save(minified_model_path, overwrite=True, save_anndata=True)

# load saved model with saved (minified) adata
loaded_model = scvi.model.SCVI.load(minified_model_path)
loaded_model

INFO     File /tmp/tmp6x2zn3yt/scvi_hca_minified/model.pt already downloaded






Next, let's load the model with non-minified data.

In [30]:
loaded_model = scvi.model.SCVI.load(model_path, adata=bdata)
loaded_model

INFO     File /tmp/tmp6x2zn3yt/scvi_hca/model.pt already downloaded




So if you want to "undo" the minification procedure, so to speak, you can always load your model with the non-minified data (if you still have it), or any other non-minified data for that matter, as long as it's compatible with the model of course.

Last but not least, let's see what happens if we try to load a model whose adata was not minified, with a dataset that is minified:

In [31]:
scvi.data._utils._is_minified(model.adata)

True

In [32]:
try:
    scvi.model.SCVI.load(model_path, adata=model.adata)
except KeyError as e:
    print("KeyError: " + str(e))

INFO     File /tmp/tmp6x2zn3yt/scvi_hca/model.pt already downloaded


KeyError: 'state_registry'


As we see, this is not allowed. This is because when you try to load a model with another dataset, we try to validate that dataset against the model's registry. In this case, the data is not compatible with the model registry because it has attributes pertaining to minification, which this model is not aware of.

## Support

Minification is not supported for all models yet. A model supports this functionality if and only if it inherits from the `BaseMinifiedModeModelClass` class. A model that does not support this:

- does not have a `minify_adata()` method
- cannot be loaded with minified data. If you try to do this, you will see this error:
  "The MyModel model currently does not support minified data."

To support minification for your own model, inherit your model class from the `BaseMinifiedModeModelClass` and your module class from the `BaseMinifiedModeModuleClass`.
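
The inheritance rule can be illustrated with stand-in classes. The `BaseMinifiedModeModelClass` defined below is a local placeholder that only borrows the name of the scvi-tools class, and `MyMinifiableModel` and `MyPlainModel` are hypothetical. With the real library you would inherit from the actual base classes instead.

```python
class BaseMinifiedModeModelClass:
    """Stand-in for the scvi-tools base class of the same name (not the real one)."""

    def minify_adata(self):
        return "minified"


class MyMinifiableModel(BaseMinifiedModeModelClass):
    """Hypothetical model that opts in to minification by inheriting the base class."""


class MyPlainModel:
    """Hypothetical model without minification support."""


def supports_minification(model_cls):
    # Support follows from inheritance, which is also what provides minify_adata().
    return issubclass(model_cls, BaseMinifiedModeModelClass)


for cls in (MyMinifiableModel, MyPlainModel):
    print(cls.__name__, supports_minification(cls), hasattr(cls, "minify_adata"))
```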