PoissonVI#
PoissonVI [Martens et al., 2023] (Python class POISSONVI) is
a variational model for quantitative scATAC-seq fragment counts. It follows the
peakVI-style single-cell ATAC workflow, but uses a Poisson observation model to analyze
fragment counts rather than binarized accessibility.
The advantages of PoissonVI are:
It models quantitative fragment-count information, which can retain signal that is discarded when scATAC-seq matrices are binarized.
It exposes familiar peakVI-style tasks, including latent representation learning, normalized accessibility, region factors, and differential accessibility.
It supports registered batch and covariate annotations through
setup_anndata().
The limitations of PoissonVI include:
It expects fragment-count-like input. Read-count matrices with strong even-count artifacts should be converted to fragment counts before training.
It is designed for chromatin accessibility regions, so
differential_expression()is not implemented; usedifferential_accessibility()instead.The nonlinear latent space is useful for integration and clustering, but individual latent dimensions are not directly interpretable.
Preliminaries#
PoissonVI takes as input a cell-by-region matrix \(X\) with \(N\) cells and \(M\) genomic
regions. The values should be quantitative fragment counts. For 10x scATAC-seq data, the
tutorial demonstrates converting read counts to approximate fragment counts with
reads_to_fragments() and storing them in a layer before calling
setup_anndata().
The registered AnnData object may include a batch key, labels, size factors, and other categorical or continuous covariates:
>>> POISSONVI.setup_anndata(adata, layer="fragments", batch_key="batch")
>>> model = POISSONVI(adata)
>>> model.train()
The setup method checks that the registered matrix has fragment-count-like behavior.
Model Overview#
PoissonVI reuses the variational autoencoder machinery used by other scvi-tools models, with defaults chosen to mirror peakVI where appropriate. A cell-level latent variable \(z_i\) captures biological variation in chromatin accessibility. The decoder maps this latent representation, along with registered covariates, to region-specific Poisson rates.
In the implementation, PoissonVI uses VAE with
gene_likelihood="poisson", gene-level dispersion settings that are unused by the
Poisson likelihood, layer normalization in both encoder and decoder, and LeakyReLU
activation functions.
Inference#
PoissonVI uses amortized variational inference. The encoder approximates the posterior distribution over the latent accessibility representation, and the decoder is optimized to reconstruct the registered fragment-count matrix under a Poisson likelihood.
The model also learns region-specific decoder factors. By default,
get_normalized_accessibility() temporarily removes the
region-factor bias when returning normalized accessibility, which emphasizes
cell-specific accessibility variation. Passing normalize_regions=True reintroduces the
region factors and produces estimates closer to the observed input scale.
Tasks#
Here we provide an overview of common tasks. Please see POISSONVI
for the full API reference.
Dimensionality Reduction#
The latent representation can be stored in adata.obsm and used with Scanpy for
neighbors, clustering, and visualization:
>>> adata.obsm["X_poissonvi"] = model.get_latent_representation()
Normalized Accessibility#
get_normalized_accessibility() returns normalized
accessibility estimates for cells and regions:
>>> accessibility = model.get_normalized_accessibility()
The method supports posterior sampling, region subsetting, transform_batch, and
importance weights. The learned region factors can be retrieved with
get_region_factors().
Differential Accessibility#
Differential accessibility is performed with
differential_accessibility(). The method supports both
"change" and "vanilla" modes, one-sided or two-sided tests, and optional batch
correction.