Decipher#

Decipher [Nazaret et al., 2024] (Python class Decipher) is a probabilistic model for interpretable representation learning in single-cell RNA-seq data. Decipher learns a low-dimensional latent representation for visualization and a higher-dimensional latent representation for refined cell-state information, and it provides utilities for imputed expression and trajectory-associated gene patterns.

The advantages of Decipher are:

  • It learns an interpretable latent space v, which is two-dimensional by default and can be visualized directly.

  • It also learns an intermediate latent space z, which can capture more detailed cell-state structure than the visualization space.

  • It includes helper methods for imputed gene expression, Decipher time, and gene expression patterns along a trajectory.

The limitations of Decipher include:

  • The current scvi-tools implementation registers only the count matrix, with an optional raw-count layer, and does not yet expose condition covariates through setup_anndata().

  • The tutorial currently demonstrates the basic Decipher model; a fuller implementation aligned with the original method is still under development.

  • As with other latent-variable models, trajectory interpretation depends on the quality of the learned latent representation and the user-specified trajectory.

Preliminaries#

Decipher takes as input a scRNA-seq count matrix \(X\) with \(N\) cells and \(G\) genes. The AnnData object is registered with setup_anndata(); by default, the model uses adata.X, but a count layer can be provided with the layer argument.

The model does not require batch annotations or other covariates in the current implementation. After registration, a Decipher model can be created and trained with the usual scvi-tools pattern:

>>> from scvi.external import Decipher
>>> Decipher.setup_anndata(adata, layer="counts")
>>> model = Decipher(adata)
>>> model.train()

Model Overview#

Decipher uses two latent representations:

  • \(v_i\), a low-dimensional interpretable representation of cell \(i\).

  • \(z_i\), an intermediate latent representation that links \(v_i\) to gene expression.

The generative model starts from a standard normal prior on \(v_i\). A decoder maps \(v_i\) to the parameters of a normal distribution over \(z_i\), and a second decoder maps \(z_i\) to gene proportions. These gene proportions are multiplied by the observed library size of the cell and used as the mean of a negative binomial observation model with learned gene-specific inverse dispersion.
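The generative process described above can be sketched in plain NumPy. This is an illustration only, not the scvi-tools implementation: the linear maps `W_vz_loc`, `W_vz_scale`, and `W_zx` are hypothetical stand-ins for the neural-network decoders, and the library sizes and inverse dispersions are made-up values.

```python
import numpy as np

rng = np.random.default_rng(0)
n_cells, n_genes, dim_v, dim_z = 5, 20, 2, 10

# Hypothetical linear "decoders"; the real model uses neural networks.
W_vz_loc = rng.normal(size=(dim_v, dim_z))
W_vz_scale = rng.normal(size=(dim_v, dim_z))
W_zx = rng.normal(size=(dim_z, n_genes))


def softmax(a):
    """Row-wise softmax, shifted for numerical stability."""
    a = a - a.max(axis=-1, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=-1, keepdims=True)


# 1. v_i ~ Normal(0, I)
v = rng.normal(size=(n_cells, dim_v))

# 2. z_i ~ Normal(loc(v_i), scale(v_i)); softplus keeps the scale positive.
z_loc = v @ W_vz_loc
z_scale = np.log1p(np.exp(v @ W_vz_scale))
z = rng.normal(z_loc, z_scale)

# 3. The second decoder maps z_i to gene proportions that sum to 1 per cell.
proportions = softmax(z @ W_zx)

# 4. The negative binomial mean is library size times gene proportions.
library_size = rng.integers(1_000, 5_000, size=(n_cells, 1))
mu = library_size * proportions

# 5. Negative binomial with mean mu and gene-specific inverse dispersion theta.
#    NumPy uses (n, p); n = theta and p = theta / (theta + mu) give mean mu.
theta = np.exp(rng.normal(size=n_genes))
p = theta / (theta + mu)
counts = rng.negative_binomial(theta, p)
```

Because the proportions sum to one for each cell, the expected total count of a simulated cell equals its library size.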

The default dimensions are dim_v=2 and dim_z=10. The two-dimensional v representation is intended for direct visualization, while z is intended to retain more information for reconstruction and downstream cell-state analyses.

Inference#

Decipher is implemented as a Pyro module and trained with stochastic variational inference. The guide uses encoder networks in the reverse direction: log-transformed counts are encoded to \(z_i\), and the concatenation of \(z_i\) and the log-transformed counts is encoded to \(v_i\). The beta parameter scales the KL term for \(v_i\), which controls the regularization strength of the interpretable latent space.
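The data flow of the guide and the effect of beta can be illustrated with a small NumPy sketch. It is not the Pyro implementation: the linear encoders are hypothetical, `log1p` is assumed as the log transform, the mean of the \(z_i\) encoder is used in place of a sample, and `beta = 0.1` is an arbitrary illustrative value.

```python
import numpy as np


def softplus(a):
    """Smooth positive transform, used here to produce standard deviations."""
    return np.log1p(np.exp(a))


def kl_diag_normal_vs_standard(loc, scale):
    """KL(N(loc, diag(scale^2)) || N(0, I)), summed over latent dimensions."""
    return 0.5 * np.sum(scale**2 + loc**2 - 1.0 - 2.0 * np.log(scale), axis=-1)


rng = np.random.default_rng(0)
n_cells, n_genes, dim_v, dim_z = 4, 30, 2, 10

counts = rng.poisson(2.0, size=(n_cells, n_genes))
x_log = np.log1p(counts)  # assumed form of the log transform

# Hypothetical linear encoders standing in for the encoder networks.
W_x_to_z = rng.normal(scale=0.1, size=(n_genes, dim_z))
W_v_loc = rng.normal(scale=0.1, size=(dim_z + n_genes, dim_v))
W_v_scale = rng.normal(scale=0.1, size=(dim_z + n_genes, dim_v))

# Step 1: log-transformed counts -> z (mean only, for brevity).
z = x_log @ W_x_to_z

# Step 2: concatenation of z and log-transformed counts -> parameters of q(v).
v_input = np.concatenate([z, x_log], axis=-1)
v_loc = v_input @ W_v_loc
v_scale = softplus(v_input @ W_v_scale)

# The KL term for v against its standard normal prior, scaled by beta.
beta = 0.1
kl_v = kl_diag_normal_vs_standard(v_loc, v_scale)
weighted_kl_v = beta * kl_v
```

A smaller beta weakens the pull of \(v_i\) toward the standard normal prior, while a larger beta regularizes the interpretable space more strongly.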

During training, Decipher uses validation predictive log likelihood as the default early-stopping monitor when early stopping is enabled.

Tasks#

Here we provide an overview of common tasks. Please see Decipher for the full API reference.

Latent Representation#

The default latent representation is \(v_i\):

>>> adata.obsm["X_decipher_v"] = model.get_latent_representation()

The intermediate representation \(z_i\) can be returned with give_z=True:

>>> adata.obsm["X_decipher_z"] = model.get_latent_representation(give_z=True)

Imputed Expression#

compute_imputed_gene_expression() decodes cells through the model and returns imputed gene expression on the scale of each cell’s observed library size. When compute_covariances=True, the method also returns covariances between the imputed expression and the stored Decipher v and z representations.
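To make the two quantities concrete, the sketch below rescales gene proportions by library size and then computes an empirical cross-covariance between genes and latent coordinates across cells. The arrays are random placeholders and the helper `cross_covariance` is hypothetical; how scvi-tools computes these covariances internally is not specified here.

```python
import numpy as np


def cross_covariance(a, b):
    """Empirical covariance between the columns of a and the columns of b.

    Rows are cells; the result has shape (a.shape[1], b.shape[1]).
    """
    a_centered = a - a.mean(axis=0)
    b_centered = b - b.mean(axis=0)
    return a_centered.T @ b_centered / (a.shape[0] - 1)


rng = np.random.default_rng(0)
n_cells, n_genes, dim_v = 50, 6, 2

# Placeholder decoder output: gene proportions that sum to 1 in each cell.
proportions = rng.dirichlet(np.ones(n_genes), size=n_cells)
library_size = rng.integers(1_000, 5_000, size=(n_cells, 1))

# Imputed expression on the scale of each cell's observed library size.
imputed = library_size * proportions

# Placeholder latent coordinates, standing in for a stored v representation.
v = rng.normal(size=(n_cells, dim_v))

# One row per gene, one column per latent dimension.
gene_v_cov = cross_covariance(imputed, v)
```

A large positive or negative entry suggests that the gene's imputed expression changes systematically along that latent axis.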

Trajectories and Gene Patterns#

The Decipher utilities include a Trajectory object for representing paths through the Decipher latent space. Given a trajectory and cluster assignments, compute_decipher_time() estimates a Decipher time for cells on the trajectory with K-nearest-neighbor regression. The compute_gene_patterns() method then decodes points along the trajectory to summarize gene expression patterns with means and quantiles.
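The idea of assigning a time to each cell by K-nearest-neighbor regression can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the compute_decipher_time() implementation: the straight-line trajectory, the use of cumulative arc length as the time of each trajectory point, and the helper `knn_regress_time` are all hypothetical.

```python
import numpy as np


def knn_regress_time(train_points, train_times, query_points, k=3):
    """Predict a time for each query point as the mean time of its k nearest
    training points (unweighted K-nearest-neighbor regression)."""
    diffs = query_points[:, None, :] - train_points[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    nearest = np.argsort(dists, axis=1)[:, :k]
    return train_times[nearest].mean(axis=1)


# Hypothetical trajectory: 11 evenly spaced points on a straight path in a
# two-dimensional latent space.
trajectory = np.stack([np.linspace(0.0, 1.0, 11), np.zeros(11)], axis=1)

# Time of each trajectory point, taken here as cumulative arc length.
segment_lengths = np.linalg.norm(np.diff(trajectory, axis=0), axis=1)
trajectory_times = np.concatenate([[0.0], np.cumsum(segment_lengths)])

# Placeholder latent coordinates for three cells near the path.
cells_v = np.array([[0.5, 0.2], [0.0, -0.1], [1.0, 0.05]])

cell_times = knn_regress_time(trajectory, trajectory_times, cells_v, k=3)
```

Cells near the start of the path receive small times and cells near the end receive large ones; in practice the cluster assignments restrict which cells are placed on a given trajectory.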