Posterior

class scvi.inference.Posterior(model, gene_dataset, shuffle=False, indices=None, use_cuda=True, data_loader_kwargs={})[source]

Bases: object

The functional data unit.

A Posterior instance is instantiated with a model and a gene_dataset, as well as additional arguments for PyTorch’s DataLoader. A subset of indices can be specified, for purposes such as splitting the data into train/test or labelled/unlabelled (for semi-supervised learning). Each instance of the Trainer class can therefore have multiple Posterior instances to train a model. A Posterior instance also comes with many methods and utilities for its corresponding data.
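As an illustration of the indices argument, the sketch below builds disjoint train/test index lists that could each be handed to a separate Posterior. The helper name split_indices and its parameters are hypothetical and are not part of scVI; the trainer classes perform their own split.

```python
import random


def split_indices(n_cells, train_size=0.5, seed=0):
    """Shuffle cell indices and split them into (train, test) lists."""
    indices = list(range(n_cells))
    # A dedicated Random instance keeps the split reproducible.
    random.Random(seed).shuffle(indices)
    n_train = int(train_size * n_cells)
    return indices[:n_train], indices[n_train:]


train_idx, test_idx = split_indices(10, train_size=0.5)
```

Each list would then be passed as indices=train_idx or indices=test_idx when constructing a Posterior over the same gene_dataset.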

Parameters
  • model – A model instance from class VAE, VAEC, SCANVI

  • gene_dataset (GeneExpressionDataset) – A gene_dataset instance like CortexDataset()

  • shuffle – Specifies if a RandomSampler or a SequentialSampler should be used

  • indices – Specifies how the data should be split with regards to train/test or labelled/unlabelled

  • use_cuda – Default: True

  • data_loader_kwargs – Keyword arguments passed to the DataLoader

Examples

Let us instantiate a trainer with a gene_dataset and a model.

An UnsupervisedTrainer instance has two Posterior attributes: train_set and test_set. For each of these subsets of the original gene_dataset instance, we can examine the differential expression, log-likelihood, entropy of batch mixing, … or display the t-SNE of the data in the latent space through the scVI model.

>>> from scvi.dataset import CortexDataset
>>> from scvi.models import VAE
>>> from scvi.inference import UnsupervisedTrainer
>>> gene_dataset = CortexDataset()
>>> vae = VAE(gene_dataset.nb_genes, n_batch=gene_dataset.n_batches * False,
... n_labels=gene_dataset.n_labels, use_cuda=True)
>>> trainer = UnsupervisedTrainer(vae, gene_dataset)
>>> trainer.train(n_epochs=50)
>>> trainer.train_set.differential_expression_stats()
>>> trainer.train_set.reconstruction_error()
>>> trainer.train_set.entropy_batch_mixing()
>>> trainer.train_set.show_t_sne(n_samples=1000, color_by="labels")

Attributes Summary

indices

Returns the current dataloader indices used by the object

nb_cells

Returns the number of studied cells.

posterior_type

Returns the posterior class name

Methods Summary

accuracy()

apply_t_sne(latent[, n_samples])

rtype

Tuple

clustering_scores([prediction_algorithm])

rtype

Tuple

corrupted()

Corrupts gene counts.

differential_expression_score(idx1, idx2[, …])

Unified method for differential expression inference.

differential_expression_stats([M_sampling])

Output average over statistics in a symmetric way (a against b), forget the sets if permutation is True

elbo()

Returns the Evidence Lower Bound associated to the object.

entropy_batch_mixing(**kwargs)

Returns the object’s entropy batch mixing.

generate([n_samples, genes, batch_size])

Create observation samples from the Posterior Predictive distribution

generate_denoised_samples([n_samples, …])

Return samples from an adjusted posterior predictive.

generate_feature_correlation_matrix([…])

Wrapper of generate_denoised_samples() to create a gene-gene corr matrix

generate_parameters([n_samples, give_mean])

Estimates data’s count means, dispersions and dropout logits.

get_bayes_factors(idx1, idx2[, mode, …])

A unified method for differential expression inference.

get_latent([give_mean])

Output posterior z mean or sample, batch index, and label

get_sample_scale([transform_batch, …])

Returns the frequencies of expression for the data.

get_stats()

rtype

ndarray

imputation([n_samples, transform_batch])

Imputes px_rate over self cells

imputation_benchmark([n_samples, show_plot, …])

Visualizes the model imputation performance.

imputation_list([n_samples])

Imputes data’s gene counts from corrupted data.

imputation_score([original_list, …])

Computes median absolute imputation error.

knn_purity()

Computes kNN purity as described in [Lopez18]

marginal_ll([n_mc_samples])

Estimates the marginal likelihood of the object’s data.

nn_overlap_score(**kwargs)

Quantify how much the similarity between cells in the mRNA latent space resembles their similarity at the protein level.

one_vs_all_degenes([subset, cell_labels, …])

Performs one population vs all others Differential Expression Analysis

raw_data()

Returns raw data for classification

reconstruction_error()

Returns the reconstruction error associated to the object.

save_posterior(dir_path)

Saves the posterior properties in folder dir_path.

scale_sampler(selection[, n_samples, …])

Samples the posterior scale using the variational posterior distribution.

sequential([batch_size])

Returns a copy of the object that iterates over the data sequentially.

show_t_sne([n_samples, color_by, save_name, …])

to_cuda(tensors)

Converts list of tensors to cuda.

uncorrupted()

Uncorrupts gene counts.

update(data_loader_kwargs)

Updates the dataloader

update_sampler_indices(idx)

Updates the dataloader indices.

within_cluster_degenes(states[, …])

Performs Differential Expression within clusters for different cell states

Attributes Documentation

indices

Returns the current dataloader indices used by the object

Return type

ndarray

nb_cells

Returns the number of studied cells.

Return type

int

posterior_type

Returns the posterior class name

Return type

str

Methods Documentation

accuracy()[source]

static apply_t_sne(latent, n_samples=1000)[source]

Return type

Tuple

clustering_scores(prediction_algorithm='knn')[source]

Return type

Tuple

corrupted()[source]

Corrupts gene counts.

Return type

Posterior

differential_expression_score(idx1, idx2, mode='vanilla', batchid1=None, batchid2=None, use_observed_batches=False, n_samples=5000, use_permutation=False, M_permutation=10000, all_stats=True, change_fn=None, m1_domain_fn=None, delta=0.5, cred_interval_lvls=None, **kwargs)[source]

Unified method for differential expression inference.

This function is an extension of the get_bayes_factors method that provides additional gene information to the user.

Two modes coexist:

  • the “vanilla” mode follows protocol described in [Lopez18]

In this case, we perform hypothesis testing based on the hypotheses

\[M_1: h_1 > h_2 ~\text{and}~ M_2: h_1 \leq h_2\]

DE can then be based on the study of the Bayes factors

\[\log p(M_1 | x_1, x_2) / p(M_2 | x_1, x_2)\]

  • the “change” mode consists in estimating an effect size random variable (e.g., log fold-change) and performing Bayesian hypothesis testing on this variable. The change_fn function computes the effect size variable r based on two inputs corresponding to the normalized means in both populations.

Hypotheses:

\[M_1: r \in R_1 ~\text{(effect size r in region inducing differential expression)}\]
\[M_2: r \notin R_1 ~\text{(no differential expression)}\]

To characterize the region \(R_1\), which induces DE, the user has two choices.

  1. A common case is when the region \([-\delta, \delta]\) does not induce differential expression. If the user specifies a threshold delta, we suppose that \(R_1 = \mathbb{R} \setminus [-\delta, \delta]\)

  2. Specify a custom indicator function

\[f: \mathbb{R} \mapsto \{0, 1\} ~\text{s.t.}~ r \in R_1 ~\text{iff.}~ f(r) = 1\]

Decision-making can then be based on the estimates of

\[p(M_1 \mid x_1, x_2)\]

Both modes require sampling the posteriors of the normalized means. To that end, we sample the Posterior in the following way:

  1. The posterior is sampled n_samples times for each subpopulation

  2. For computational efficiency (posterior sampling is quite expensive), instead of comparing the obtained samples element-wise, we can permute posterior samples. Remember that computing the Bayes Factor requires sampling \(q(z_A \mid x_A)\) and \(q(z_B \mid x_B)\)

Currently, the code covers several batch handling configurations:

1. If use_observed_batches=True, then batches are considered as observations and cells’ normalized means are conditioned on real batch observations

2. If case (cell group 1) and control (cell group 2) are conditioned on the same batch ids. Examples:

>>> set(batchid1) == set(batchid2)

or

>>> batchid1 = batchid2 = None

3. If case and control are conditioned on different batch ids that do not intersect, i.e.,

>>> set(batchid1) != set(batchid2)

and

>>> len(set(batchid1).intersection(set(batchid2))) == 0

This function does not cover other cases yet and will warn users in such cases.
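The “vanilla” computation described above can be sketched in plain Python. This is an illustration under stated assumptions, not scVI's implementation: the helper name vanilla_bayes_factor, the n_pairs argument (standing in for M_permutation) and the eps stabilizer are all hypothetical, and h1, h2 are assumed to be posterior samples of the normalized mean of one gene in each population.

```python
import math
import random


def vanilla_bayes_factor(h1, h2, n_pairs=None, seed=0, eps=1e-8):
    """Estimate log p(M_1 | x_1, x_2) / p(M_2 | x_1, x_2) with M_1: h1 > h2.

    If n_pairs is None, samples are compared element-wise; otherwise
    n_pairs random (h1, h2) pairs are drawn, mimicking the permutation step.
    """
    if n_pairs is None:
        pairs = list(zip(h1, h2))
    else:
        rng = random.Random(seed)
        pairs = [(rng.choice(h1), rng.choice(h2)) for _ in range(n_pairs)]
    # Fraction of pairs supporting M_1 estimates p(M_1 | x_1, x_2).
    p_m1 = sum(a > b for a, b in pairs) / len(pairs)
    # eps (an assumption) avoids log(0) when all pairs agree.
    return math.log(p_m1 + eps) - math.log(1.0 - p_m1 + eps)


bf_elementwise = vanilla_bayes_factor([0.3, 0.4, 0.5, 0.2], [0.1, 0.45, 0.2, 0.1])
bf_permuted = vanilla_bayes_factor([2.0, 3.0], [0.0, 1.0], n_pairs=50)
```

In the first call three of the four pairs satisfy h1 > h2, so the estimate is close to log(0.75 / 0.25). A positive value favours M_1, a negative value favours M_2.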

Parameters
  • mode (Optional[str]) – one of [“vanilla”, “change”]

  • idx1 (Union[List[bool], ndarray]) – bool array masking subpopulation cells 1. Should be True where cell is from associated population

  • idx2 (Union[List[bool], ndarray]) – bool array masking subpopulation cells 2. Should be True where cell is from associated population

  • batchid1 (Union[List[int], ndarray, None]) – List of batch ids for which you want to perform DE Analysis for subpopulation 1. By default, all ids are taken into account

  • batchid2 (Union[List[int], ndarray, None]) – List of batch ids for which you want to perform DE Analysis for subpopulation 2. By default, all ids are taken into account

  • use_observed_batches (Optional[bool]) – Whether normalized means are conditioned on observed batches

  • n_samples (int) – Number of posterior samples

  • use_permutation (bool) – Activates step 2 described above. Simply formulated, pairs obtained from posterior sampling (when calling sample_scale_from_batch) will be randomly permuted so that the number of pairs used to compute Bayes Factors becomes M_permutation.

  • M_permutation (int) – Number of times we will “mix” posterior samples in step 2. Only makes sense when use_permutation=True

  • change_fn (Union[str, Callable, None]) – function computing effect size based on both normalized means

  • m1_domain_fn (Optional[Callable]) – custom indicator function of effect size regions inducing differential expression

  • delta (Optional[float]) – specific case of region inducing differential expression. In this case, we suppose that \([-\delta, \delta]\) does not induce differential expression, i.e. \(R_1 = \mathbb{R} \setminus [-\delta, \delta]\) (LFC case)

  • cred_interval_lvls (Union[List[float], ndarray, None]) – List of credible interval levels to compute for the posterior LFC distribution

  • all_stats (bool) – whether additional metrics should be provided

  • **kwargs – Other keyword arguments for get_sample_scale()

Return type

DataFrame

Returns

diff_exp_results, whose most important columns are:

  • proba_de (probability of being differentially expressed in change mode)

  • bayes_factor (bayes factors in the vanilla mode)

  • scale1 and scale2 (means of the scales in population 1 and 2)

  • When using the change mode, the mean, median, std of the posterior LFC

differential_expression_stats(M_sampling=100)[source]

Output average over statistics in a symmetric way (a against b), forget the sets if permutation is True

Parameters

M_sampling (int) – number of samples

Return type

Tuple

Returns

Tuple (px_scales, all_labels) where (i) px_scales: scales of shape (M_sampling, n_genes) (ii) all_labels: labels of shape (M_sampling,)

elbo()[source]

Returns the Evidence Lower Bound associated to the object.

Return type

Tensor

entropy_batch_mixing(**kwargs)[source]

Returns the object’s entropy batch mixing.

Return type

Tensor
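For intuition, an entropy of batch mixing can be computed from the batch ids found in each cell's neighborhood in latent space. The sketch below assumes those neighborhoods are already available as lists of batch ids; the function names are hypothetical, and scVI's own choices (how cells are sampled, how many neighbors are used) are not reproduced here.

```python
import math
from collections import Counter


def batch_entropy(batch_ids):
    """Shannon entropy (in nats) of the batch frequencies in one neighborhood."""
    counts = Counter(batch_ids)
    n = len(batch_ids)
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def mean_neighborhood_entropy(neighborhoods):
    """Average the per-neighborhood entropies over all neighborhoods."""
    return sum(batch_entropy(nb) for nb in neighborhoods) / len(neighborhoods)


well_mixed = batch_entropy([0, 1, 0, 1])   # both batches equally represented
unmixed = batch_entropy([0, 0, 0, 0])      # a single batch
score = mean_neighborhood_entropy([[0, 1, 0, 1], [0, 0, 0, 0]])
```

Higher values indicate neighborhoods in which batches are more evenly represented.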

generate(n_samples=100, genes=None, batch_size=128)[source]

Create observation samples from the Posterior Predictive distribution

Parameters
  • n_samples (int) – Number of required samples for each cell

  • genes (Union[list, ndarray, None]) – Indices of genes of interest

  • batch_size (int) – Desired batch size to generate data

Return type

Tuple[Tensor, Tensor]

Returns

x_new (torch.Tensor)

tensor with shape (n_cells, n_genes, n_samples)

x_old (torch.Tensor)

tensor with shape (n_cells, n_genes)

generate_denoised_samples(n_samples=25, batch_size=64, rna_size_factor=1000, transform_batch=None)[source]

Return samples from an adjusted posterior predictive.

Parameters
  • n_samples (int) – How many samples per cell

  • batch_size (int) – Mini-batch size for sampling. Lower means less GPU memory footprint

  • rna_size_factor (int) – size factor for RNA prior to sampling gamma distribution

  • transform_batch (Optional[int]) – int of which batch to condition on for all cells

Return type

ndarray

generate_feature_correlation_matrix(n_samples=10, batch_size=64, rna_size_factor=1000, transform_batch=None, correlation_type='spearman')[source]

Wrapper of generate_denoised_samples() to create a gene-gene corr matrix

Parameters
  • n_samples (int) – How many samples per cell

  • batch_size (int) – Mini-batch size for sampling. Lower means less GPU memory footprint

  • rna_size_factor (int) – size factor for RNA prior to sampling gamma distribution

  • transform_batch (Union[int, List[int], None]) –

    Batches to condition on. If transform_batch is:

    • None, then real observed batch is used

    • int, then batch transform_batch is used

    • list of int, then values are averaged over provided batches.

  • correlation_type (str) – One of “pearson”, “spearman”

Return type

ndarray

Returns

Gene-gene correlation matrix
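To make the two correlation_type options concrete, here is a standard-library sketch of a gene-gene correlation matrix computed from a small table of samples (rows are samples, columns are genes). The function names are hypothetical and the input is assumed to be already denoised; Spearman is obtained by applying Pearson to ranks.

```python
import math


def ranks(values):
    """Average 1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def correlation_matrix(samples, correlation_type="spearman"):
    """Return the genes x genes correlation matrix as nested lists."""
    columns = [list(col) for col in zip(*samples)]
    if correlation_type == "spearman":
        columns = [ranks(col) for col in columns]
    return [[pearson(a, b) for b in columns] for a in columns]


samples = [[1.0, 10.0, 3.0], [2.0, 20.0, 2.0], [3.0, 40.0, 1.0]]
corr = correlation_matrix(samples, correlation_type="spearman")
```

Here gene 0 and gene 1 increase together while gene 2 decreases, so the rank correlations are close to 1 and -1 respectively.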

generate_parameters(n_samples=1, give_mean=False)[source]

Estimates data’s count means, dispersions and dropout logits.

Return type

Tuple

get_bayes_factors(idx1, idx2, mode='vanilla', batchid1=None, batchid2=None, use_observed_batches=False, n_samples=5000, use_permutation=False, M_permutation=10000, change_fn=None, m1_domain_fn=None, delta=0.5, cred_interval_lvls=None, **kwargs)[source]

A unified method for differential expression inference.

Two modes coexist:

  • the “vanilla” mode follows protocol described in [Lopez18]

In this case, we perform hypothesis testing based on the hypotheses

\[M_1: h_1 > h_2 ~\text{and}~ M_2: h_1 \leq h_2\]

DE can then be based on the study of the Bayes factors

\[\log p(M_1 | x_1, x_2) / p(M_2 | x_1, x_2)\]

  • the “change” mode consists in estimating an effect size random variable (e.g., log fold-change) and performing Bayesian hypothesis testing on this variable. The change_fn function computes the effect size variable r based on two inputs corresponding to the normalized means in both populations.

Hypotheses:

\[M_1: r \in R_1 ~\text{(effect size r in region inducing differential expression)}\]
\[M_2: r \notin R_1 ~\text{(no differential expression)}\]

To characterize the region \(R_1\), which induces DE, the user has two choices.

  1. A common case is when the region \([-\delta, \delta]\) does not induce differential expression. If the user specifies a threshold delta, we suppose that \(R_1 = \mathbb{R} \setminus [-\delta, \delta]\)

  2. Specify a custom indicator function

\[f: \mathbb{R} \mapsto \{0, 1\} ~\text{s.t.}~ r \in R_1 ~\text{iff.}~ f(r) = 1\]

Decision-making can then be based on the estimates of

\[p(M_1 \mid x_1, x_2)\]

Both modes require sampling the posteriors of the normalized means. To that end, we sample the Posterior in the following way:

  1. The posterior is sampled n_samples times for each subpopulation

  2. For computational efficiency (posterior sampling is quite expensive), instead of comparing the obtained samples element-wise, we can permute posterior samples. Remember that computing the Bayes Factor requires sampling \(q(z_A \mid x_A)\) and \(q(z_B \mid x_B)\)

Currently, the code covers several batch handling configurations:

1. If use_observed_batches=True, then batches are considered as observations and cells’ normalized means are conditioned on real batch observations

2. If case (cell group 1) and control (cell group 2) are conditioned on the same batch ids. Examples:

>>> set(batchid1) == set(batchid2)

or

>>> batchid1 = batchid2 = None

3. If case and control are conditioned on different batch ids that do not intersect, i.e.,

>>> set(batchid1) != set(batchid2)

and

>>> len(set(batchid1).intersection(set(batchid2))) == 0

This function does not cover other cases yet and will warn users in such cases.
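The “change” mode can be illustrated with a short sketch that estimates \(p(M_1 \mid x_1, x_2)\) for one gene from paired samples of the normalized means. Everything here is an assumption made for illustration: the helper name change_mode_proba is hypothetical, the effect size is taken to be a log2 fold-change, and the default region is \(R_1 = \mathbb{R} \setminus [-\delta, \delta]\).

```python
import math


def change_mode_proba(scale1, scale2, delta=0.5, m1_domain_fn=None):
    """Fraction of paired samples whose effect size falls in the region R_1."""
    if m1_domain_fn is None:
        # Default region: |r| > delta, i.e. r outside [-delta, delta].
        def m1_domain_fn(r):
            return abs(r) > delta

    # Effect size r: log2 fold-change between the two normalized means.
    effects = [math.log2(a) - math.log2(b) for a, b in zip(scale1, scale2)]
    return sum(1 for r in effects if m1_domain_fn(r)) / len(effects)


scale1 = [4.0, 1.0, 2.0, 1.0]
scale2 = [1.0, 1.0, 1.0, 4.0]
proba_de = change_mode_proba(scale1, scale2, delta=0.5)
proba_up = change_mode_proba(scale1, scale2, m1_domain_fn=lambda r: r > 0.5)
```

The effect sizes here are 2, 0, 1 and -2, so three of four samples lie outside [-0.5, 0.5], while only two exceed 0.5 on the positive side. Passing a custom indicator, as in the second call, corresponds to the second way of characterizing \(R_1\).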

Parameters
  • mode (Optional[str]) – one of [“vanilla”, “change”]

  • idx1 (Union[List[bool], ndarray]) – bool array masking subpopulation cells 1. Should be True where cell is from associated population

  • idx2 (Union[List[bool], ndarray]) – bool array masking subpopulation cells 2. Should be True where cell is from associated population

  • batchid1 (Union[List[int], ndarray, None]) – List of batch ids for which you want to perform DE Analysis for subpopulation 1. By default, all ids are taken into account

  • batchid2 (Union[List[int], ndarray, None]) – List of batch ids for which you want to perform DE Analysis for subpopulation 2. By default, all ids are taken into account

  • use_observed_batches (Optional[bool]) – Whether normalized means are conditioned on observed batches

  • n_samples (int) – Number of posterior samples

  • use_permutation (bool) – Activates step 2 described above. Simply formulated, pairs obtained from posterior sampling (when calling sample_scale_from_batch) will be randomly permuted so that the number of pairs used to compute Bayes Factors becomes M_permutation.

  • M_permutation (int) – Number of times we will “mix” posterior samples in step 2. Only makes sense when use_permutation=True

  • change_fn (Union[str, Callable, None]) – function computing effect size based on both normalized means

  • m1_domain_fn (Optional[Callable]) – custom indicator function of effect size regions inducing differential expression

  • delta (Optional[float]) – specific case of region inducing differential expression. In this case, we suppose that \([-\delta, \delta]\) does not induce differential expression, i.e. \(R_1 = \mathbb{R} \setminus [-\delta, \delta]\) (LFC case)

  • cred_interval_lvls (Union[List[float], ndarray, None]) – List of credible interval levels to compute for the posterior LFC distribution

  • **kwargs – Other keyword arguments for get_sample_scale()

Return type

Dict[str, ndarray]

Returns

Differential expression properties

get_latent(give_mean=True)[source]

Output posterior z mean or sample, batch index, and label

Parameters
  • give_mean (Optional[bool]) – whether to return the z mean (True) or a z sample (False) (Default value = True)

Return type

Tuple[ndarray, ndarray, ndarray]

Returns

latent (ndarray)

low-dim representation

batch_indices (ndarray)

batch indices corresponding to each cell

labels (ndarray)

label corresponding to each cell

get_sample_scale(transform_batch=None, gene_list=None, library_size=1, return_df=None, n_samples=1, return_mean=True)[source]

Returns the frequencies of expression for the data.

This is denoted as \(\rho_n\) in the scVI paper.

Parameters
  • transform_batch (Optional[int]) –

    Batch to condition on. If transform_batch is:

    • None, then real observed batch is used

    • int, then batch transform_batch is used

  • gene_list (Union[List[int], ndarray, None]) – Return frequencies of expression for a subset of genes. This can save memory when working with large datasets and few genes are of interest.

  • library_size (float) – Scale the expression frequencies to a common library size. This allows gene expression levels to be interpreted on a common scale of relevant magnitude.

  • return_df (Optional[bool]) – Return a DataFrame instead of an np.ndarray. Includes gene names as columns. Requires either n_samples=1 or return_mean=True. When gene_list is not None and contains more than one gene, this option defaults to True. Otherwise, it defaults to False.

  • n_samples (int) – Get sample scale from multiple samples.

  • return_mean (bool) – Whether to return the mean of the samples.

Return type

Union[ndarray, DataFrame]

Returns

  • denoised_expression - array of decoded expression adjusted for library size

If n_samples > 1 and return_mean is False, then the shape is (samples, cells, genes). Otherwise, shape is (cells, genes). Return type is np.ndarray unless return_df is True.

get_stats()[source]

Return type

ndarray

imputation(n_samples=1, transform_batch=None)[source]

Imputes px_rate over self cells

Parameters
  • n_samples (Optional[int]) – number of posterior samples

  • transform_batch (Union[int, List[int], None]) –

    Batches to condition on. If transform_batch is:

    • None, then real observed batch is used

    • int, then batch transform_batch is used

    • list of int, then px_rates are averaged over provided batches.

Return type

ndarray

Returns

px_rates array of shape (n_samples, n_cells, n_genes), squeezed

imputation_benchmark(n_samples=8, show_plot=True, title_plot='imputation', save_path='')[source]

Visualizes the model imputation performance.

Parameters
  • n_samples (int) – (Default value = 8)

  • show_plot (bool) – (Default value = True)

  • title_plot (str) – (Default value = “imputation”)

  • save_path (str) – (Default value = “”)

Return type

Tuple

imputation_list(n_samples=1)[source]

Imputes data’s gene counts from corrupted data.

Parameters

n_samples (int) – (Default value = 1)

Return type

tuple

imputation_score(original_list=None, imputed_list=None, n_samples=1)[source]

Computes median absolute imputation error.

Return type

float
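A median absolute error between original and imputed values can be sketched as follows. The function name is hypothetical and both inputs are assumed to be flat sequences of equal length holding the values at the corrupted entries; how scVI gathers those entries is not shown.

```python
import statistics


def median_absolute_error(original, imputed):
    """Median of |original - imputed| over paired entries."""
    return statistics.median(abs(o - i) for o, i in zip(original, imputed))


# Absolute errors are 1, 1, 0 and 3, whose median is 1.
error = median_absolute_error([5, 0, 3, 8], [4, 1, 3, 5])
```

The median makes the score robust to a few badly imputed entries, unlike a mean absolute error.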

knn_purity()[source]

Computes kNN purity as described in [Lopez18]

Return type

Tensor

marginal_ll(n_mc_samples=1000)[source]

Estimates the marginal likelihood of the object’s data.

Parameters

n_mc_samples (Optional[int]) – Number of MC estimates to use

Return type

Tensor

Returns

Marginal LL

nn_overlap_score(**kwargs)[source]

Quantify how much the similarity between cells in the mRNA latent space resembles their similarity at the protein level.

Compute the overlap fold enrichment between the protein and mRNA-based cell 100-nearest neighbor graph and the Spearman correlation of the adjacency matrices.

Parameters

**kwargs

Return type

Tuple

one_vs_all_degenes(subset=None, cell_labels=None, use_observed_batches=False, min_cells=10, n_samples=5000, use_permutation=False, M_permutation=10000, output_file=False, mode='vanilla', change_fn=None, m1_domain_fn=None, delta=0.5, cred_interval_lvls=None, save_dir='./', filename='one2all', **kwargs)[source]

Performs one population vs all others Differential Expression Analysis

It takes labels or cell types to characterize the different populations.

Parameters
  • subset (Union[List[bool], ndarray, None]) – None or bool array masking the subset of cells you are interested in (True when you want to select the cell). In that case, it should have the same length as gene_dataset

  • cell_labels (Union[List, ndarray, None]) – optional: Labels of cells

  • min_cells (int) – Minimum number of cells a cluster must contain for its Bayes Factors to be computed

  • n_samples (int) – Number of times the posterior will be sampled for each pop

  • use_permutation (bool) – Activates pair random permutations. Simply formulated, pairs obtained from posterior sampling (when calling sample_scale_from_batch) will be randomly permuted so that the number of pairs used to compute Bayes Factors becomes M_permutation.

  • M_permutation (int) – Number of times we will “mix” posterior samples in step 2. Only makes sense when use_permutation=True

  • use_observed_batches (bool) – see differential_expression_score

  • mode (Optional[str]) – see differential_expression_score

  • change_fn (Union[str, Callable, None]) – see differential_expression_score

  • m1_domain_fn (Optional[Callable]) – see differential_expression_score

  • delta (Optional[float]) – see differential_expression_score

  • cred_interval_lvls (Union[List[float], ndarray, None]) – List of credible interval levels to compute for the posterior LFC distribution

  • output_file (bool) – whether to save the results to a file

  • save_dir (str) – (Default value = “./”)

  • filename – (Default value = “one2all”)

  • **kwargs – Other keyword arguments for get_sample_scale()

Return type

tuple

Returns

Tuple (de_res, de_cluster) where (i) de_res is a list of length nb_clusters (based on provided labels or on hardcoded cell types), (ii) de_res[i] contains Bayes Factors for population number i vs all the rest, (iii) de_cluster returns the associated names of clusters. Only clusters with at least min_cells elements are included in these results.
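The one-vs-all setup amounts to building, for each cluster with enough cells, a pair of boolean masks that can serve as idx1 and idx2 in differential_expression_score. The sketch below is only an illustration of that idea; the helper name one_vs_all_masks is hypothetical and the optional subset argument is left out.

```python
from collections import Counter


def one_vs_all_masks(cell_labels, min_cells=10):
    """Map each sufficiently large cluster to its (idx1, idx2) boolean masks."""
    counts = Counter(cell_labels)
    masks = {}
    for label in sorted(counts):
        if counts[label] < min_cells:
            continue  # too few cells: this cluster is skipped
        idx1 = [lab == label for lab in cell_labels]   # cells in the cluster
        idx2 = [not inside for inside in idx1]          # all other cells
        masks[label] = (idx1, idx2)
    return masks


labels = ["a", "a", "b", "a", "c", "b"]
masks = one_vs_all_masks(labels, min_cells=2)
```

With min_cells=2, cluster "c" (a single cell) is dropped, mirroring the rule that only clusters with at least min_cells elements appear in the results.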

raw_data()[source]

Returns raw data for classification

Return type

Tuple

reconstruction_error()[source]

Returns the reconstruction error associated to the object.

Return type

Tensor

save_posterior(dir_path)[source]

Saves the posterior properties in folder dir_path.

To ensure safety, this method requires that dir_path does not exist. The posterior can then be retrieved later on with the function load_posterior

Parameters

dir_path (str) – non-existing directory in which the posterior properties will be saved.
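The requirement that dir_path must not already exist can be reproduced with the standard library. This is a sketch of the safety check only, with a hypothetical helper name; what save_posterior actually writes into the directory is not shown.

```python
import os
import tempfile


def prepare_save_dir(dir_path):
    """Create dir_path, raising FileExistsError if it is already there."""
    os.makedirs(dir_path, exist_ok=False)


base = tempfile.mkdtemp()
target = os.path.join(base, "posterior_dir")
prepare_save_dir(target)  # first call succeeds

try:
    prepare_save_dir(target)  # second call must refuse to overwrite
    second_call_failed = False
except FileExistsError:
    second_call_failed = True
```

Refusing to reuse an existing directory prevents a previously saved posterior from being silently overwritten.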

scale_sampler(selection, n_samples=5000, n_samples_per_cell=None, batchid=None, use_observed_batches=False, give_mean=False, **kwargs)[source]

Samples the posterior scale using the variational posterior distribution.

Parameters
  • n_samples (Optional[int]) – Number of samples in total per batch (fill either n_samples or n_samples_per_cell)

  • n_samples_per_cell (Optional[int]) – Number of times we sample from each observation per batch (fill either n_samples or n_samples_per_cell)

  • batchid (Union[List[int], ndarray, None]) – Biological batch from which to sample. Default (None) samples from all batches

  • use_observed_batches (Optional[bool]) – Whether normalized means are conditioned on observed batches or if observed batches are to be used

  • selection (Union[List[bool], ndarray]) – Mask or list of cell ids to select

  • **kwargs – Other keyword arguments for get_sample_scale()

Return type

dict

Returns

Dictionary containing:

  • scale – Posterior aggregated scale samples of shape (n_samples, n_genes), where n_samples corresponds to either n_bio_batches * n_cells * n_samples_per_cell or the total number of samples requested

  • batch – associated batch ids

sequential(batch_size=128)[source]

Returns a copy of the object that iterates over the data sequentially.

Parameters

batch_size (Optional[int]) – New batch size.

Return type

Posterior

show_t_sne(n_samples=1000, color_by='', save_name='', latent=None, batch_indices=None, labels=None, n_batch=None)[source]

to_cuda(tensors)[source]

Converts list of tensors to cuda.

Parameters

tensors (List[Tensor]) – tensors to convert

Return type

List[Tensor]

uncorrupted()[source]

Uncorrupts gene counts.

Return type

Posterior

update(data_loader_kwargs)[source]

Updates the dataloader

Parameters

data_loader_kwargs (dict) – dataloader updates.

Return type

Posterior

Returns

Updated posterior

update_sampler_indices(idx)[source]

Updates the dataloader indices.

More precisely, this method can be used to temporarily change which cells __iter__ will yield. This is particularly useful for computational considerations when one is only interested in a subset of the cells of the Posterior object. This method should be used carefully: the data loader must be reset to its original value after use.

Parameters

idx (Union[List, ndarray]) – Indices in [0, len(dataset)) to sample from

Examples

>>> old_loader = self.data_loader
>>> cell_indices = np.array([1, 2, 3])
>>> self.update_sampler_indices(cell_indices)
>>> for tensors in self:
...     pass  # your code
>>> # Do not forget next line!
>>> self.data_loader = old_loader
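Because forgetting to restore the data loader is easy, the save/restore pattern above can be wrapped in a context manager. The sketch below is a hypothetical convenience, not an scVI API: temporary_indices and the stand-in class FakePosterior exist only for this illustration, and the helper relies solely on the data_loader attribute and update_sampler_indices method used in the example above.

```python
from contextlib import contextmanager


@contextmanager
def temporary_indices(posterior, idx):
    """Restrict iteration to idx, then restore the original data loader."""
    old_loader = posterior.data_loader
    posterior.update_sampler_indices(idx)
    try:
        yield posterior
    finally:
        # Runs even if the body raises, so the loader is always reset.
        posterior.data_loader = old_loader


class FakePosterior:
    """Stand-in used only to exercise the helper; not part of scVI."""

    def __init__(self):
        self.data_loader = "full"

    def update_sampler_indices(self, idx):
        self.data_loader = ("subset", tuple(idx))


post = FakePosterior()
with temporary_indices(post, [1, 2, 3]) as restricted:
    inside = restricted.data_loader
```

After the with block, post.data_loader is back to its original value without an explicit reset line.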

within_cluster_degenes(states, cell_labels=None, min_cells=10, batch1=None, batch2=None, use_observed_batches=False, subset=None, n_samples=5000, use_permutation=False, M_permutation=10000, mode='vanilla', change_fn=None, m1_domain_fn=None, delta=0.5, cred_interval_lvls=None, output_file=False, save_dir='./', filename='within_cluster', **kwargs)[source]

Performs Differential Expression within clusters for different cell states

Parameters
  • cell_labels (Union[List, ndarray, None]) – optional: Labels of cells

  • min_cells (int) – Minimum number of cells a cluster must contain for its Bayes Factors to be computed

  • states (Union[List[bool], ndarray]) – States of the cells.

  • batch1 (Union[List[int], ndarray, None]) – List of batch ids for which you want to perform DE Analysis for subpopulation 1. By default, all ids are taken into account

  • batch2 (Union[List[int], ndarray, None]) – List of batch ids for which you want to perform DE Analysis for subpopulation 2. By default, all ids are taken into account

  • subset (Union[List[bool], ndarray, None]) – MASK: Subset of cells you are interested in.

  • n_samples (int) – Number of times the posterior will be sampled for each pop

  • use_permutation (bool) – Activates pair random permutations. Simply formulated, pairs obtained from posterior sampling (when calling sample_scale_from_batch) will be randomly permuted so that the number of pairs used to compute Bayes Factors becomes M_permutation.

  • M_permutation (int) – Number of times we will “mix” posterior samples in step 2. Only makes sense when use_permutation=True

  • output_file (bool) – whether to save the results to a file

  • save_dir (str) – (Default value = “./”)

  • filename – (Default value = “within_cluster”)

  • use_observed_batches (bool) – see differential_expression_score

  • mode (Optional[str]) – see differential_expression_score

  • change_fn (Union[str, Callable, None]) – see differential_expression_score

  • m1_domain_fn (Optional[Callable]) – see differential_expression_score

  • delta (Optional[float]) – see differential_expression_score

  • cred_interval_lvls (Union[List[float], ndarray, None]) – see differential_expression_score

  • **kwargs – Other keyword arguments for get_sample_scale()

Return type

tuple

Returns

Tuple (de_res, de_cluster) where (i) de_res is a list of length nb_clusters (based on provided labels or on hardcoded cell types), (ii) de_res[i] contains Bayes Factors for population number i vs all the rest, (iii) de_cluster returns the associated names of clusters. Only clusters with at least min_cells elements are included in these results.
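One way to picture the within-cluster comparison is as a pair of masks per cluster: cells of the cluster in one state against cells of the same cluster in the other state. The sketch below assumes that reading of states (a boolean per cell); the helper name within_cluster_masks is hypothetical, and the min_cells and subset filtering are omitted.

```python
def within_cluster_masks(cell_labels, states, cluster):
    """Return (idx1, idx2): cluster cells with state True vs state False."""
    idx1 = [lab == cluster and bool(s) for lab, s in zip(cell_labels, states)]
    idx2 = [lab == cluster and not s for lab, s in zip(cell_labels, states)]
    return idx1, idx2


cell_labels = [0, 0, 1, 1, 0]
states = [True, False, True, False, True]
idx1, idx2 = within_cluster_masks(cell_labels, states, cluster=0)
```

The two masks never overlap, and cells from other clusters are excluded from both.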