scvi.criticism.PosteriorPredictiveCheck

class scvi.criticism.PosteriorPredictiveCheck(adata, models_dict, count_layer_key=None, n_samples=10, indices=None)

EXPERIMENTAL. Posterior predictive checks for comparing scRNA-seq generative models.

Parameters:
  • adata (AnnData) – AnnData object with raw counts in either adata.X or adata.layers.

  • models_dict (dict[str, BaseModelClass]) – Dictionary of models to compare.

  • count_layer_key (str | None (default: None)) – Key in adata.layers to use as raw counts. If None, defaults to adata.X.

  • n_samples (int (default: 10)) – Number of posterior predictive samples to generate.

  • indices (list | None (default: None)) – Indices of observations in adata to subset to before generating posterior predictive samples and computing metrics. If None, defaults to all observations in adata.
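A minimal usage sketch, built only from the signatures listed on this page. The helper name run_posterior_predictive_checks is hypothetical, and the models in models_dict are assumed to be already trained on adata. Because the metric methods are documented as returning None, the sketch assumes the results are collected on a metrics attribute of the object; that attribute is not described on this page, so treat it as an assumption.

```python
def run_posterior_predictive_checks(adata, models_dict, count_layer_key=None):
    """Hypothetical helper: run two of the documented checks on trained models.

    adata: AnnData with raw counts in adata.X or adata.layers.
    models_dict: mapping of a display name to a trained model.
    """
    # Imported lazily so the helper can be defined without scvi-tools installed.
    from scvi.criticism import PosteriorPredictiveCheck

    ppc = PosteriorPredictiveCheck(
        adata,
        models_dict,
        count_layer_key=count_layer_key,  # None means adata.X holds the counts
        n_samples=10,
    )

    # Each call is documented as returning None.
    ppc.coefficient_of_variation(dim="cells")
    ppc.zero_fraction()

    # Assumption: results are stored on a `metrics` attribute (not stated above).
    return getattr(ppc, "metrics", None)
```

calibration_error is left out of the sketch because, as noted in its documentation, it does not work on sparse data and can use a lot of memory.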

Methods table

  • calibration_error([confidence_intervals]) – Calibration error for each observed count.

  • coefficient_of_variation([dim]) – Calculate the coefficient of variation (CV) for each model and the raw counts.

  • differential_expression(de_groupby[, ...]) – Compute differential expression (DE) metrics.

  • zero_fraction() – Fraction of zeros in raw counts for a specific gene.

Methods

PosteriorPredictiveCheck.calibration_error(confidence_intervals=None)

Calibration error for each observed count.

For a series of credible intervals derived from the posterior predictive samples, this computes the fraction of observed counts that fall within each interval. The calibration error is the squared difference between that observed fraction and the nominal width of the interval.

For this metric, lower is better.
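The idea can be sketched in plain Python. This is an illustration under stated assumptions, not the library's implementation: the function names are hypothetical, the quantile list is paired symmetrically (lowest with highest, and so on) to form the intervals, and the squared differences are summed over intervals.

```python
def quantile(sorted_vals, q):
    """Linearly interpolated quantile of an ascending list, with q in [0, 1]."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])


def calibration_error(observed, samples, quantiles):
    """Sum of squared (empirical coverage - nominal width) over intervals.

    observed: one observed count per entry.
    samples: for each entry, a list of posterior predictive draws.
    quantiles: e.g. [0.01, 0.02, 0.98, 0.99]; paired outside-in (assumption).
    """
    qs = sorted(quantiles)
    error = 0.0
    for i in range(len(qs) // 2):
        lower_q, upper_q = qs[i], qs[-(i + 1)]
        nominal_width = upper_q - lower_q
        hits = 0
        for y, draws in zip(observed, samples):
            s = sorted(draws)
            if quantile(s, lower_q) <= y <= quantile(s, upper_q):
                hits += 1
        coverage = hits / len(observed)
        error += (coverage - nominal_width) ** 2
    return error


# Toy data: two observed counts, each with draws 0..10.
draws = list(range(11))
toy_error = calibration_error([5, 9], [draws, draws], [0.1, 0.2, 0.8, 0.9])
```

In the toy data, the 0.1 to 0.9 interval covers both counts (coverage 1.0 against a nominal 0.8) and the 0.2 to 0.8 interval covers one of two (0.5 against 0.6), so the error is roughly 0.04 + 0.01.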

Parameters:

confidence_intervals (list[float] | float (default: None)) – List of confidence intervals to compute the calibration error for, e.g., [0.01, 0.02, 0.98, 0.99].

Return type:

None

Notes

This method does not work on sparse data and can use a large amount of memory.

PosteriorPredictiveCheck.coefficient_of_variation(dim='cells')

Calculate the coefficient of variation (CV) for each model and the raw counts.

For each posterior predictive sample, the CV is computed over either the cells or the features dimension. The resulting CVs are then averaged over all samples.
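As a rough illustration of that two-step computation, the sketch below uses nested Python lists indexed as [sample][cell][feature]. The function names are hypothetical, and the population standard deviation is an assumption; the library may use a different estimator or data layout.

```python
from statistics import fmean, pstdev


def cv(values):
    """Coefficient of variation: standard deviation divided by the mean."""
    mean = fmean(values)
    return pstdev(values) / mean if mean != 0 else float("nan")


def mean_cv_over_samples(samples, dim="cells"):
    """Compute the CV over `dim` within each sample, then average over samples.

    samples: nested lists indexed as [sample][cell][feature].
    dim="cells" yields one value per feature; dim="features" one per cell.
    """
    per_sample = []
    for matrix in samples:
        if dim == "cells":
            vectors = list(zip(*matrix))  # one vector of cell values per feature
        elif dim == "features":
            vectors = matrix  # one vector of feature values per cell
        else:
            raise ValueError("dim must be 'cells' or 'features'")
        per_sample.append([cv(v) for v in vectors])
    # Average the per-sample CVs position by position.
    return [fmean(column) for column in zip(*per_sample)]


# Two samples, two cells, two features.
toy_samples = [
    [[1, 2], [3, 2]],
    [[2, 4], [2, 4]],
]
cv_over_cells = mean_cv_over_samples(toy_samples, dim="cells")
cv_over_features = mean_cv_over_samples(toy_samples, dim="features")
```

With dim="cells" the result has one entry per feature, and with dim="features" one entry per cell.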

Parameters:

dim (Literal['cells', 'features'] (default: 'cells')) – Dimension to compute CV over.

Return type:

None

PosteriorPredictiveCheck.differential_expression(de_groupby, de_method='t-test', n_samples=1, cell_scale_factor=10000.0, p_val_thresh=0.001, n_top_genes_fallback=100)

Compute differential expression (DE) metrics.

If n_samples > 1, all metrics are averaged over the posterior predictive samples.

Parameters:
  • de_groupby (str) – The column name in adata_obs_raw that contains the groupby information.

  • de_method (str (default: 't-test')) – The DE method to use. See scanpy.tl.rank_genes_groups() for more details.

  • n_samples (int (default: 1)) – The number of posterior predictive samples to use for the DE analysis.

  • cell_scale_factor (float (default: 10000.0)) – The cell scale factor to use for normalization before DE.

  • p_val_thresh (float (default: 0.001)) – The p-value threshold to use for the DE analysis.

  • n_top_genes_fallback (int (default: 100)) – The number of top genes to use for the DE analysis if no gene has a p-value below p_val_thresh.

PosteriorPredictiveCheck.zero_fraction()

Fraction of zeros in raw counts for a specific gene.
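The underlying quantity is simple to illustrate. The sketch below computes, for each gene, the share of cells with a zero count in a small cells-by-genes matrix; the function name is hypothetical, and how the method aggregates this across models and posterior predictive samples is not shown here.

```python
def zero_fraction_per_gene(counts):
    """Fraction of cells with a zero count, computed separately for each gene.

    counts: list of cells, each a list of per-gene counts.
    """
    n_cells = len(counts)
    return [
        sum(1 for value in gene_counts if value == 0) / n_cells
        for gene_counts in zip(*counts)  # iterate over genes (columns)
    ]


# Four cells, three genes.
toy_counts = [
    [0, 3, 0],
    [0, 1, 2],
    [5, 0, 0],
    [0, 2, 0],
]
fractions = zero_fraction_per_gene(toy_counts)  # [0.75, 0.25, 0.75]
```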

Return type:

None