Posterior

class scvi.inference.Posterior(model, gene_dataset, shuffle=False, indices=None, use_cuda=True, data_loader_kwargs={})

Bases: object

The functional data unit.

A Posterior instance is instantiated with a model and a gene_dataset, as well as additional arguments for PyTorch’s DataLoader. A subset of indices can be specified, for purposes such as splitting the data into train/test or labelled/unlabelled sets (for semi-supervised learning). Each instance of the Trainer class can therefore hold multiple Posterior instances to train a model. A Posterior instance also comes with many methods and utilities for its corresponding data.

Parameters

model – A model instance from class VAE, VAEC, SCANVI
gene_dataset (GeneExpressionDataset) – A gene_dataset instance like CortexDataset()
shuffle – Specifies whether a RandomSampler or a SequentialSampler should be used
indices – Specifies how the data should be split with regard to train/test or labelled/unlabelled
use_cuda – Default: True
data_loader_kwargs – Keyword arguments to pass to the DataLoader
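
As a rough illustration of what the indices argument might hold, the sketch below builds disjoint train and test index lists from a shuffled range of cell indices. The helper split_indices and its train_fraction and seed parameters are hypothetical and not part of scvi; this only mirrors the train/test splitting idea described above.

```python
import random


def split_indices(n_cells, train_fraction=0.8, seed=0):
    """Shuffle cell indices and split them into train and test subsets.

    Hypothetical helper for illustration only; not an scvi function.
    """
    rng = random.Random(seed)
    order = list(range(n_cells))
    rng.shuffle(order)
    n_train = int(train_fraction * n_cells)
    # Sort each subset so the indices are easy to read back.
    return sorted(order[:n_train]), sorted(order[n_train:])


train_idx, test_idx = split_indices(n_cells=10, train_fraction=0.8)
```

Each list could then plausibly be passed as indices to a separate Posterior, so that both subsets share the same model and gene_dataset.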

Examples

Let us instantiate a trainer with a gene_dataset and a model.

An UnsupervisedTrainer instance has two Posterior attributes: train_set and test_set. For each of these subsets of the original gene_dataset instance, we can examine the differential expression, log-likelihood, entropy of batch mixing, … or display the t-SNE of the data in the latent space through the scVI model.

>>> from scvi.dataset import CortexDataset
>>> from scvi.inference import UnsupervisedTrainer
>>> from scvi.models import VAE
>>> gene_dataset = CortexDataset()
>>> vae = VAE(gene_dataset.nb_genes, n_batch=gene_dataset.n_batches * False,
... n_labels=gene_dataset.n_labels, use_cuda=True)
>>> trainer = UnsupervisedTrainer(vae, gene_dataset)
>>> trainer.train(n_epochs=50)

>>> trainer.train_set.differential_expression_stats()
>>> trainer.train_set.reconstruction_error()
>>> trainer.train_set.entropy_batch_mixing()
>>> trainer.train_set.show_t_sne(n_samples=1000, color_by="labels")

Attributes Summary

`indices`	Returns the current dataloader indices used by the object
`nb_cells`	Returns the number of studied cells.
`posterior_type`	Returns the posterior class name

Methods Summary

`accuracy`()
`apply_t_sne`(latent[, n_samples])	Return type: Tuple
`clustering_scores`([prediction_algorithm])	Return type: Tuple
`corrupted`()	Corrupts gene counts.
`differential_expression_score`(idx1, idx2[, …])	Unified method for differential expression inference.
`differential_expression_stats`([M_sampling])	Output average over statistics in a symmetric way (a against b), forget the sets if permutation is True
`elbo`()	Returns the Evidence Lower Bound associated to the object.
`entropy_batch_mixing`(**kwargs)	Returns the object’s entropy batch mixing.
`generate`([n_samples, genes, batch_size])	Create observation samples from the Posterior Predictive distribution
`generate_denoised_samples`([n_samples, …])	Return samples from an adjusted posterior predictive.
`generate_feature_correlation_matrix`([…])	Wrapper of generate_denoised_samples() to create a gene-gene corr matrix
`generate_parameters`([n_samples, give_mean])	Estimates data’s count means, dispersions and dropout logits.
`get_bayes_factors`(idx1, idx2[, mode, …])	A unified method for differential expression inference.
`get_latent`([give_mean])	Output posterior z mean or sample, batch index, and label
`get_sample_scale`([transform_batch, …])	Returns the frequencies of expression for the data.
`get_stats`()	Return type: ndarray
`imputation`([n_samples, transform_batch])	Imputes px_rate over self cells
`imputation_benchmark`([n_samples, show_plot, …])	Visualizes the model imputation performance.
`imputation_list`([n_samples])	Imputes data’s gene counts from corrupted data.
`imputation_score`([original_list, …])	Computes median absolute imputation error.
`knn_purity`()	Computes kNN purity as described in [Lopez18]
`marginal_ll`([n_mc_samples])	Estimates the marginal likelihood of the object’s data.
`nn_overlap_score`(**kwargs)	Quantify how much the similarity between cells in the mRNA latent space resembles their similarity at the protein level.
`one_vs_all_degenes`([subset, cell_labels, …])	Performs one population vs all others Differential Expression Analysis
`raw_data`()	Returns raw data for classification
`reconstruction_error`()	Returns the reconstruction error associated to the object.
`save_posterior`(dir_path)	Saves the posterior properties in folder dir_path.
`scale_sampler`(selection[, n_samples, …])	Samples the posterior scale using the variational posterior distribution.
`sequential`([batch_size])	Returns a copy of the object that iterate over the data sequentially.
`show_t_sne`([n_samples, color_by, save_name, …])
`to_cuda`(tensors)	Converts list of tensors to cuda.
`uncorrupted`()	Uncorrupts gene counts.
`update`(data_loader_kwargs)	Updates the dataloader
`update_sampler_indices`(idx)	Updates the dataloader indices.
`within_cluster_degenes`(states[, …])	Performs Differential Expression within clusters for different cell states

Attributes Documentation

indices

Returns the current dataloader indices used by the object

Return type: ndarray

nb_cells

Returns the number of studied cells.

Return type: int

posterior_type

Returns the posterior class name

Return type: str

Methods Documentation

accuracy()

static apply_t_sne(latent, n_samples=1000)

Return type: Tuple

clustering_scores(prediction_algorithm='knn')

Return type: Tuple

corrupted()

Corrupts gene counts.

Return type: Posterior

differential_expression_score(idx1, idx2, mode='vanilla', batchid1=None, batchid2=None, use_observed_batches=False, n_samples=5000, use_permutation=False, M_permutation=10000, all_stats=True, change_fn=None, m1_domain_fn=None, delta=0.5, cred_interval_lvls=None, **kwargs)

Unified method for differential expression inference.

This function is an extension of the get_bayes_factors method that provides additional gene information to the user.

Two modes coexist:

- the “vanilla” mode follows the protocol described in [Lopez18].

In this case, we perform hypothesis testing based on the hypotheses

\[M_1: h_1 > h_2 ~\text{and}~ M_2: h_1 \leq h_2\]

DE can then be based on the study of the Bayes factors

\[\log \frac{p(M_1 \mid x_1, x_2)}{p(M_2 \mid x_1, x_2)}\]

- the “change” mode (described in [Boyeau19]) consists in estimating an effect size random variable (e.g., log fold-change) and performing Bayesian hypothesis testing on this variable. The change_fn function computes the effect size variable r based on two inputs corresponding to the normalized means in both populations.

Hypotheses:

\[M_1: r \in R_1 ~\text{(effect size r in region inducing differential expression)}\]

\[M_2: r \notin R_1 ~\text{(no differential expression)}\]

To characterize the region \(R_1\), which induces DE, the user has two choices:

1. A common case is when the region \([-\delta, \delta]\) does not induce differential expression. If the user specifies a threshold delta, we suppose that \(R_1 = \mathbb{R} \setminus [-\delta, \delta]\)

2. Specify a custom indicator function

\[f: \mathbb{R} \mapsto \{0, 1\} ~\text{s.t.}~ r \in R_1 ~\text{iff.}~ f(r) = 1\]

Decision-making can then be based on the estimates of

\[p(M_1 \mid x_1, x_2)\]
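
The two decision rules above can be sketched from paired posterior samples of the normalized means. This is a minimal illustration under stated assumptions, not the library's implementation: the function names, the eps stabilizer, and the choice of a log2 fold-change as effect size are all assumptions (the text only says "e.g., log fold-change").

```python
import math


def vanilla_bayes_factor(h1_samples, h2_samples, eps=1e-8):
    """Log Bayes factor log p(M_1 | x) / p(M_2 | x), estimated from paired samples.

    p(M_1 | x) is approximated by the fraction of pairs with h1 > h2.
    eps is an assumed stabilizer that avoids log(0).
    """
    pairs = list(zip(h1_samples, h2_samples))
    p_m1 = sum(a > b for a, b in pairs) / len(pairs)
    return math.log(p_m1 + eps) - math.log(1.0 - p_m1 + eps)


def change_proba_de(h1_samples, h2_samples, delta=0.5):
    """Estimate p(M_1 | x) in change mode, with R_1 = R \\ [-delta, delta].

    The effect size r is taken to be a log2 fold-change (an assumption).
    """
    lfc = [math.log2(a) - math.log2(b) for a, b in zip(h1_samples, h2_samples)]
    return sum(abs(r) > delta for r in lfc) / len(lfc)


# Toy posterior samples of normalized means for one gene in two populations.
h1 = [0.4, 0.5, 0.6, 0.1]
h2 = [0.1, 0.1, 0.1, 0.2]
bf = vanilla_bayes_factor(h1, h2)
proba_de = change_proba_de(h1, h2, delta=0.5)
```

In the toy data, three of the four pairs satisfy h1 > h2, so the Bayes factor is close to log(3), and every log2 fold-change lies outside [-0.5, 0.5].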

Both modes require sampling the posterior of the normalized means. To that purpose, we sample the Posterior in the following way:

1. The posterior is sampled n_samples times for each subpopulation.
2. For computational efficiency (posterior sampling is quite expensive), instead of comparing the obtained samples element-wise, we can permute posterior samples. Remember that computing the Bayes factor requires sampling \(q(z_A \mid x_A)\) and \(q(z_B \mid x_B)\).
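
The permutation step can be pictured as drawing random pairs from the two sets of posterior samples rather than pairing them element-wise. The sketch below is an assumption-laden illustration: sample_pairs, its seed argument, and sampling with replacement are not taken from the library.

```python
import random


def sample_pairs(samples_a, samples_b, use_permutation=False,
                 m_permutation=10000, seed=0):
    """Pair posterior samples from two subpopulations.

    Without permutation, samples are paired element-wise. With permutation,
    m_permutation pairs are drawn at random (with replacement, an assumption),
    so many pairs are obtained from comparatively few posterior samples.
    """
    if not use_permutation:
        return list(zip(samples_a, samples_b))
    rng = random.Random(seed)
    return [(rng.choice(samples_a), rng.choice(samples_b))
            for _ in range(m_permutation)]


a = [0.1, 0.2, 0.3]
b = [0.4, 0.5, 0.6]
elementwise = sample_pairs(a, b)
permuted = sample_pairs(a, b, use_permutation=True, m_permutation=50)
```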

Currently, the code covers several batch handling configurations:

1. If use_observed_batches=True, then batches are considered as observations and cells’ normalized means are conditioned on real batch observations

2. If case (cell group 1) and control (cell group 2) are conditioned on the same batch ids. Examples:

>>> set(batchid1) == set(batchid2)

or

>>> batchid1 = batchid2 = None

3. If case and control are conditioned on different batch ids that do not intersect i.e.,

>>> set(batchid1) != set(batchid2)

and

>>> len(set(batchid1).intersection(set(batchid2))) == 0

This function does not cover other cases yet and will warn users in such cases.
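
The three supported configurations can be summarized by a small classifier. This is only a sketch of the case analysis above: batch_configuration, its return labels, and the all_batches argument (used to resolve None into "all ids") are hypothetical.

```python
def batch_configuration(batchid1, batchid2, all_batches,
                        use_observed_batches=False):
    """Classify how case and control are conditioned on batch ids.

    None is resolved to every id in all_batches, following the
    "by default, all ids are taken into account" convention.
    """
    if use_observed_batches:
        return "observed"      # case 1: condition on real batch observations
    set1 = set(all_batches if batchid1 is None else batchid1)
    set2 = set(all_batches if batchid2 is None else batchid2)
    if set1 == set2:
        return "same"          # case 2: identical batch ids
    if set1.isdisjoint(set2):
        return "disjoint"      # case 3: non-intersecting batch ids
    return "unsupported"       # partial overlap: not covered


config_default = batch_configuration(None, None, all_batches=[0, 1])
config_disjoint = batch_configuration([0], [1], all_batches=[0, 1])
config_overlap = batch_configuration([0, 1], [1], all_batches=[0, 1])
```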

Parameters

mode (Optional[str]) – one of [“vanilla”, “change”]
idx1 (Union[List[bool], ndarray]) – bool array masking subpopulation cells 1. Should be True where cell is from associated population
idx2 (Union[List[bool], ndarray]) – bool array masking subpopulation cells 2. Should be True where cell is from associated population
batchid1 (Union[List[int], ndarray, None]) – List of batch ids for which you want to perform DE Analysis for subpopulation 1. By default, all ids are taken into account
batchid2 (Union[List[int], ndarray, None]) – List of batch ids for which you want to perform DE Analysis for subpopulation 2. By default, all ids are taken into account
use_observed_batches (Optional[bool]) – Whether normalized means are conditioned on observed batches
n_samples (int) – Number of posterior samples
use_permutation (bool) – Activates step 2 described above. Simply formulated, pairs obtained from posterior sampling (when calling sample_scale_from_batch) will be randomly permuted so that the number of pairs used to compute Bayes Factors becomes M_permutation.
M_permutation (int) – Number of times we will “mix” posterior samples in step 2. Only makes sense when use_permutation=True
change_fn (Union[str, Callable, None]) – function computing effect size based on both normalized means
m1_domain_fn (Optional[Callable]) – custom indicator function of effect size regions inducing differential expression
delta (Optional[float]) – specific case of region inducing differential expression. In this case, we suppose that \([-\delta, \delta]\) does not induce differential expression, i.e. \(R_1 = \mathbb{R} \setminus [-\delta, \delta]\) (LFC case)
cred_interval_lvls (Union[List[float], ndarray, None]) – List of credible interval levels to compute for the posterior LFC distribution
all_stats (bool) – whether additional metrics should be provided
**kwargs – Other keyword arguments for get_sample_scale

Return type

DataFrame

Returns

diff_exp_results. The most important columns are:

proba_de (probability of being differentially expressed in change mode)
bayes_factor (bayes factors in the vanilla mode)
scale1 and scale2 (means of the scales in population 1 and 2)
When using the change mode, the mean, median, std of the posterior LFC

differential_expression_stats(M_sampling=100)

Output average over statistics in a symmetric way (a against b), forget the sets if permutation is True

Parameters: M_sampling (int) – number of samples
Return type: Tuple
Returns: Tuple (px_scales, all_labels) where (i) px_scales: scales of shape (M_sampling, n_genes) (ii) all_labels: labels of shape (M_sampling,)

elbo()

Returns the Evidence Lower Bound associated to the object.

Return type: Tensor

entropy_batch_mixing(**kwargs)

Returns the object’s entropy batch mixing.

Return type: Tensor

generate(n_samples=100, genes=None, batch_size=128)

Create observation samples from the Posterior Predictive distribution

Parameters

n_samples (int) – Number of required samples for each cell
genes (Union[list, ndarray, None]) – Indices of genes of interest
batch_size (int) – Desired Batch size to generate data

Return type

Tuple[Tensor, Tensor]

Returns

x_new (torch.Tensor): tensor with shape (n_cells, n_genes, n_samples)
x_old (torch.Tensor): tensor with shape (n_cells, n_genes)

generate_denoised_samples(n_samples=25, batch_size=64, rna_size_factor=1000, transform_batch=None)

Return samples from an adjusted posterior predictive.

Parameters

n_samples (int) – How many samples per cell
batch_size (int) – Mini-batch size for sampling. Lower means less GPU memory footprint
rna_size_factor (int) – size factor for RNA prior to sampling gamma distribution
transform_batch (Optional[int]) – int of which batch to condition on for all cells

Return type

ndarray

Returns

generate_feature_correlation_matrix(n_samples=10, batch_size=64, rna_size_factor=1000, transform_batch=None, correlation_type='spearman')

Wrapper of generate_denoised_samples() to create a gene-gene correlation matrix

Parameters

n_samples (int) – How many samples per cell
batch_size (int) – Mini-batch size for sampling. Lower means less GPU memory footprint
rna_size_factor (int) – size factor for RNA prior to sampling gamma distribution
transform_batch (Union[int, List[int], None]) –
Batches to condition on. If transform_batch is:
- None, then real observed batch is used
- int, then batch transform_batch is used
- list of int, then values are averaged over provided batches.
correlation_type (str) – One of “pearson”, “spearman”

Return type

ndarray

Returns

Gene-gene correlation matrix

generate_parameters(n_samples=1, give_mean=False)

Estimates data’s count means, dispersions and dropout logits.

Return type: Tuple

get_bayes_factors(idx1, idx2, mode='vanilla', batchid1=None, batchid2=None, use_observed_batches=False, n_samples=5000, use_permutation=False, M_permutation=10000, change_fn=None, m1_domain_fn=None, delta=0.5, cred_interval_lvls=None, **kwargs)

A unified method for differential expression inference.

Two modes coexist:

- the “vanilla” mode follows the protocol described in [Lopez18].

In this case, we perform hypothesis testing based on the hypotheses

\[M_1: h_1 > h_2 ~\text{and}~ M_2: h_1 \leq h_2\]

DE can then be based on the study of the Bayes factors

\[\log \frac{p(M_1 \mid x_1, x_2)}{p(M_2 \mid x_1, x_2)}\]

- the “change” mode (described in [Boyeau19]) consists in estimating an effect size random variable (e.g., log fold-change) and performing Bayesian hypothesis testing on this variable. The change_fn function computes the effect size variable r based on two inputs corresponding to the normalized means in both populations.

Hypotheses:

\[M_1: r \in R_1 ~\text{(effect size r in region inducing differential expression)}\]

\[M_2: r \notin R_1 ~\text{(no differential expression)}\]

To characterize the region \(R_1\), which induces DE, the user has two choices:

1. A common case is when the region \([-\delta, \delta]\) does not induce differential expression. If the user specifies a threshold delta, we suppose that \(R_1 = \mathbb{R} \setminus [-\delta, \delta]\)

2. Specify a custom indicator function

\[f: \mathbb{R} \mapsto \{0, 1\} ~\text{s.t.}~ r \in R_1 ~\text{iff.}~ f(r) = 1\]

Decision-making can then be based on the estimates of

\[p(M_1 \mid x_1, x_2)\]

Both modes require sampling the posterior of the normalized means. To that purpose, we sample the Posterior in the following way:

1. The posterior is sampled n_samples times for each subpopulation.
2. For computational efficiency (posterior sampling is quite expensive), instead of comparing the obtained samples element-wise, we can permute posterior samples. Remember that computing the Bayes factor requires sampling \(q(z_A \mid x_A)\) and \(q(z_B \mid x_B)\).

Currently, the code covers several batch handling configurations:

1. If use_observed_batches=True, then batches are considered as observations and cells’ normalized means are conditioned on real batch observations

2. If case (cell group 1) and control (cell group 2) are conditioned on the same batch ids. Examples:

>>> set(batchid1) == set(batchid2)

or

>>> batchid1 = batchid2 = None

3. If case and control are conditioned on different batch ids that do not intersect i.e.,

>>> set(batchid1) != set(batchid2)

and

>>> len(set(batchid1).intersection(set(batchid2))) == 0

This function does not cover other cases yet and will warn users in such cases.

Parameters

mode (Optional[str]) – one of [“vanilla”, “change”]
idx1 (Union[List[bool], ndarray]) – bool array masking subpopulation cells 1. Should be True where cell is from associated population
idx2 (Union[List[bool], ndarray]) – bool array masking subpopulation cells 2. Should be True where cell is from associated population
batchid1 (Union[List[int], ndarray, None]) – List of batch ids for which you want to perform DE Analysis for subpopulation 1. By default, all ids are taken into account
batchid2 (Union[List[int], ndarray, None]) – List of batch ids for which you want to perform DE Analysis for subpopulation 2. By default, all ids are taken into account
use_observed_batches (Optional[bool]) – Whether normalized means are conditioned on observed batches
n_samples (int) – Number of posterior samples
use_permutation (bool) – Activates step 2 described above. Simply formulated, pairs obtained from posterior sampling (when calling sample_scale_from_batch) will be randomly permuted so that the number of pairs used to compute Bayes Factors becomes M_permutation.
M_permutation (int) – Number of times we will “mix” posterior samples in step 2. Only makes sense when use_permutation=True
change_fn (Union[str, Callable, None]) – function computing effect size based on both normalized means
m1_domain_fn (Optional[Callable]) – custom indicator function of effect size regions inducing differential expression
delta (Optional[float]) – specific case of region inducing differential expression. In this case, we suppose that \([-\delta, \delta]\) does not induce differential expression, i.e. \(R_1 = \mathbb{R} \setminus [-\delta, \delta]\) (LFC case)
cred_interval_lvls (Union[List[float], ndarray, None]) – List of credible interval levels to compute for the posterior LFC distribution
**kwargs – Other keyword arguments for get_sample_scale()

Return type

Dict[str, ndarray]

Returns

Differential expression properties

get_latent(give_mean=True)

Output posterior z mean or sample, batch index, and label

Parameters

give_mean (Optional[bool]) – If True, output the z mean; otherwise output a z sample (Default value = True)

Return type

Tuple[ndarray, ndarray, ndarray]

Returns

latent (ndarray): low-dimensional representation
batch_indices (ndarray): batch indices corresponding to each cell
labels (ndarray): label corresponding to each cell

get_sample_scale(transform_batch=None, gene_list=None, library_size=1, return_df=None, n_samples=1, return_mean=True)

Returns the frequencies of expression for the data.

This is denoted as \(\rho_n\) in the scVI paper.

Parameters

transform_batch (Optional[int]) –
Batch to condition on. If transform_batch is:
- None, then real observed batch is used
- int, then batch transform_batch is used
gene_list (Union[List[int], ndarray, None]) – Return frequencies of expression for a subset of genes. This can save memory when working with large datasets and few genes are of interest.
library_size (float) – Scale the expression frequencies to a common library size. This allows gene expression levels to be interpreted on a common scale of relevant magnitude.
return_df (Optional[bool]) – Return a DataFrame instead of an np.ndarray. Includes gene names as columns. Requires either n_samples=1 or return_mean=True. When gene_list is not None and contains more than one gene, this option is True. Otherwise, it defaults to False.
n_samples (int) – Get sample scale from multiple samples.
return_mean (bool) – Whether to return the mean of the samples.

Return type

Union[ndarray, DataFrame]

Returns

denoised_expression - array of decoded expression adjusted for library size

If n_samples > 1 and return_mean is False, then the shape is (samples, cells, genes). Otherwise, shape is (cells, genes). Return type is np.ndarray unless return_df is True.
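
The shape rule and the library_size scaling described above can be written out as two small helpers. Both functions are hypothetical and only restate the documented behaviour; treating the frequencies of one cell as summing to 1 is an assumption made for the toy example.

```python
def expected_scale_shape(n_cells, n_genes, n_samples=1, return_mean=True):
    """Shape of the returned array according to the rule stated above."""
    if n_samples > 1 and not return_mean:
        return (n_samples, n_cells, n_genes)
    return (n_cells, n_genes)


def scale_to_library(frequencies, library_size=1.0):
    """Rescale one cell's expression frequencies to a common library size."""
    return [library_size * f for f in frequencies]


shape_mean = expected_scale_shape(100, 20, n_samples=5, return_mean=True)
shape_all = expected_scale_shape(100, 20, n_samples=5, return_mean=False)
# Toy frequencies for a single cell, assumed to sum to 1.
scaled = scale_to_library([0.2, 0.3, 0.5], library_size=1e4)
```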

get_stats()

Return type: ndarray

imputation(n_samples=1, transform_batch=None)

Imputes px_rate over self cells

Parameters

n_samples (Optional[int]) – number of posterior samples
transform_batch (Union[int, List[int], None]) –
Batches to condition on. If transform_batch is:
- None, then real observed batch is used
- int, then batch transform_batch is used
- list of int, then px_rates are averaged over provided batches.

Return type

ndarray

Returns

(n_samples, n_cells, n_genes) px_rates squeezed array
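
The three transform_batch cases can be illustrated with a toy decoder. In this sketch, decode is a hypothetical callable that returns per-gene rates for a given batch id; it stands in for the model and is not an scvi function.

```python
def rates_for_transform_batch(decode, transform_batch, observed_batch):
    """Resolve transform_batch into per-gene rates.

    decode(batch_id) is a hypothetical stand-in for decoding one cell
    conditioned on a batch; it returns a list with one rate per gene.
    """
    if transform_batch is None:
        return decode(observed_batch)       # use the real observed batch
    if isinstance(transform_batch, int):
        return decode(transform_batch)      # condition on a single batch
    # List of batch ids: average the rates gene by gene.
    rates = [decode(b) for b in transform_batch]
    n_genes = len(rates[0])
    return [sum(r[g] for r in rates) / len(rates) for g in range(n_genes)]


def toy_decode(batch_id):
    # Two genes whose rates depend on the batch id (toy values).
    return [1.0 + batch_id, 2.0 * (batch_id + 1)]


observed = rates_for_transform_batch(toy_decode, None, observed_batch=1)
single = rates_for_transform_batch(toy_decode, 0, observed_batch=1)
averaged = rates_for_transform_batch(toy_decode, [0, 1], observed_batch=1)
```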

imputation_benchmark(n_samples=8, show_plot=True, title_plot='imputation', save_path='')

Visualizes the model imputation performance.

Parameters

n_samples (int) – (Default value = 8)
show_plot (bool) – (Default value = True)
title_plot (str) – (Default value = “imputation”)
save_path (str) – (Default value = “”)

Return type

Tuple

Returns

imputation_list(n_samples=1)

Imputes data’s gene counts from corrupted data.

Parameters: n_samples (int) – (Default value = 1)
Return type: tuple
Returns

imputation_score(original_list=None, imputed_list=None, n_samples=1)

Computes median absolute imputation error.

Parameters

original_list (Optional[List]) – (Default value = None)
imputed_list (Optional[List]) – (Default value = None)
n_samples (int) – (Default value = 1)

Return type

float

Returns
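
A median absolute error between original and imputed values can be computed as below. This is a minimal sketch that assumes both inputs are flat, equal-length sequences of numbers; how the library actually structures original_list and imputed_list is not specified here.

```python
from statistics import median


def median_absolute_error(original, imputed):
    """Median of |original - imputed| over paired entries."""
    if len(original) != len(imputed):
        raise ValueError("original and imputed must have the same length")
    return median(abs(o - i) for o, i in zip(original, imputed))


# Absolute errors are 0.5, 1.0, 0.0 and 2.0, so the median is 0.75.
score = median_absolute_error([3, 0, 5, 2], [2.5, 1.0, 5.0, 4.0])
```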

knn_purity()

Computes kNN purity as described in [Lopez18]

Return type: Tensor

marginal_ll(n_mc_samples=1000)

Estimates the marginal likelihood of the object’s data.

Parameters: n_mc_samples (Optional[int]) – Number of MC estimates to use
Return type: Tensor
Returns: Marginal LL

nn_overlap_score(**kwargs)

Quantify how much the similarity between cells in the mRNA latent space resembles their similarity at the protein level.

Compute the overlap fold enrichment between the protein and mRNA-based cell 100-nearest neighbor graph and the Spearman correlation of the adjacency matrices.

Parameters: **kwargs –
Return type: Tuple
Returns

one_vs_all_degenes(subset=None, cell_labels=None, use_observed_batches=False, min_cells=10, n_samples=5000, use_permutation=False, M_permutation=10000, output_file=False, mode='vanilla', change_fn=None, m1_domain_fn=None, delta=0.5, cred_interval_lvls=None, save_dir='./', filename='one2all', **kwargs)

Performs one population vs all others Differential Expression Analysis

It takes labels or cell types to characterize the different populations.

Parameters

subset (Union[List[bool], ndarray, None]) – None, or bool array masking the subset of cells you are interested in (True where you want to select a cell). In that case, it should have the same length as gene_dataset
cell_labels (Union[List, ndarray, None]) – optional: Labels of cells
min_cells (int) – Minimum number of cells used to compute Bayes Factors
n_samples (int) – Number of times the posterior will be sampled for each pop
use_permutation (bool) – Activates pair random permutations. Simply formulated, pairs obtained from posterior sampling (when calling sample_scale_from_batch) will be randomly permuted so that the number of pairs used to compute Bayes Factors becomes M_permutation.
M_permutation (int) – Number of times we will “mix” posterior samples in step 2. Only makes sense when use_permutation=True
use_observed_batches (bool) – see differential_expression_score
mode (Optional[str]) – see differential_expression_score
change_fn (Union[str, Callable, None]) – see differential_expression_score
m1_domain_fn (Optional[Callable]) – see differential_expression_score
delta (Optional[float]) – see differential_expression_score
cred_interval_lvls (Union[List[float], ndarray, None]) – List of credible interval levels to compute for the posterior LFC distribution
output_file (bool) – Bool: save file?
save_dir (str) – (Default value = './')
filename – (Default value = 'one2all')
**kwargs – Other keyword arguments for get_sample_scale

Return type

tuple

Returns

Tuple (de_res, de_cluster) where (i) de_res is a list of length nb_clusters (based on provided labels or on hardcoded cell types), (ii) de_res[i] contains Bayes Factors for population number i vs all the rest, (iii) de_cluster returns the associated names of clusters. Only clusters with at least min_cells elements to compute predicted Bayes Factors are included in these results.
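
The one-vs-all setup can be pictured as building, for each label, a mask for that population and the complementary mask for all other cells, skipping populations with fewer than min_cells cells. The helper below is a hypothetical sketch of that bookkeeping only; it does not compute Bayes Factors.

```python
from collections import Counter


def one_vs_all_masks(cell_labels, min_cells=10):
    """Build (idx1, idx2) boolean masks per label, one population vs the rest.

    Labels with fewer than min_cells cells are skipped.
    """
    counts = Counter(cell_labels)
    masks = {}
    for label in sorted(counts):
        if counts[label] < min_cells:
            continue
        idx1 = [lab == label for lab in cell_labels]   # the population
        idx2 = [not m for m in idx1]                   # all other cells
        masks[label] = (idx1, idx2)
    return masks


labels = [0, 0, 0, 1, 1, 2]
masks = one_vs_all_masks(labels, min_cells=2)
```

Each (idx1, idx2) pair has the form expected by the idx1 and idx2 arguments of differential_expression_score.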

raw_data()

Returns raw data for classification

Return type: Tuple

reconstruction_error()

Returns the reconstruction error associated to the object.

Return type: Tensor

save_posterior(dir_path)

Saves the posterior properties in folder dir_path.

To ensure safety, this method requires that dir_path does not exist. The posterior can then be retrieved later on with the function load_posterior

Parameters: dir_path (str) – non-existing directory in which the posterior properties will be saved.
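
The "directory must not exist yet" contract can be mirrored with the standard library. This sketch only demonstrates the idea of refusing to overwrite; make_fresh_dir is a hypothetical helper, and which exception save_posterior itself raises is not stated in this reference.

```python
import os
import tempfile


def make_fresh_dir(dir_path):
    """Create dir_path, failing if it already exists (no silent overwrite)."""
    os.makedirs(dir_path, exist_ok=False)
    return dir_path


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "posterior_run")
    make_fresh_dir(target)              # first call creates the directory
    try:
        make_fresh_dir(target)          # second call must be rejected
        second_call_failed = False
    except FileExistsError:
        second_call_failed = True
```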

scale_sampler(selection, n_samples=5000, n_samples_per_cell=None, batchid=None, use_observed_batches=False, give_mean=False, **kwargs)

Samples the posterior scale using the variational posterior distribution.

Parameters

n_samples (Optional[int]) – Number of samples in total per batch (fill either n_samples_total or n_samples_per_cell)
n_samples_per_cell (Optional[int]) – Number of times we sample from each observation per batch (fill either n_samples_total or n_samples_per_cell)
batchid (Union[List[int], ndarray, None]) – Biological batch from which to sample. Default (None) samples from all batches
use_observed_batches (Optional[bool]) – Whether normalized means are conditioned on observed batches or if observed batches are to be used
selection (Union[List[bool], ndarray]) – Mask or list of cell ids to select
**kwargs – Other keyword arguments for get_sample_scale()

Return type

dict

Returns

Dictionary containing:
- scale: Posterior aggregated scale samples of shape (n_samples, n_genes), where n_samples corresponds to either n_bio_batches * n_cells * n_samples_per_cell or n_samples_total
- batch: associated batch ids

sequential(batch_size=128)

Returns a copy of the object that iterates over the data sequentially.

Parameters: batch_size (Optional[int]) – New batch size.
Return type: Posterior

show_t_sne(n_samples=1000, color_by='', save_name='', latent=None, batch_indices=None, labels=None, n_batch=None)

to_cuda(tensors)

Converts list of tensors to cuda.

Parameters: tensors (List[Tensor]) – tensors to convert
Return type: List[Tensor]

uncorrupted()

Uncorrupts gene counts.

Return type: Posterior

update(data_loader_kwargs)

Updates the dataloader

Parameters: data_loader_kwargs (dict) – dataloader updates.
Return type: Posterior
Returns: Updated posterior

update_sampler_indices(idx)

Updates the dataloader indices.

More precisely, this method can be used to temporarily change which cells __iter__ will yield. This is particularly useful for computational considerations when one is only interested in a subset of the cells of the Posterior object. This method should be used carefully and requires resetting the dataloader to its original value after use.

Parameters: idx (Union[List, ndarray]) – Indices (in [0, len(dataset)]) to sample from
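
Since the dataloader must be restored after use, one possible pattern is to wrap the change in a context manager so the reset also happens when an exception is raised. ToyPosterior is a hypothetical stand-in that only mimics the data_loader attribute and update_sampler_indices method named here; temporary_indices is likewise not part of scvi.

```python
from contextlib import contextmanager


class ToyPosterior:
    """Hypothetical stand-in with a data_loader attribute (a plain list here)."""

    def __init__(self, n_cells):
        self.data_loader = list(range(n_cells))

    def update_sampler_indices(self, idx):
        self.data_loader = list(idx)

    def __iter__(self):
        return iter(self.data_loader)


@contextmanager
def temporary_indices(posterior, idx):
    """Temporarily restrict iteration to idx, then restore the old loader."""
    old_loader = posterior.data_loader
    posterior.update_sampler_indices(idx)
    try:
        yield posterior
    finally:
        # Runs even if the body raises, so the reset is never forgotten.
        posterior.data_loader = old_loader


post = ToyPosterior(5)
with temporary_indices(post, [1, 2, 3]) as restricted:
    seen = list(restricted)
restored = list(post)
```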

Examples

>>> import numpy as np
>>> old_loader = self.data_loader
>>> cell_indices = np.array([1, 2, 3])
>>> self.update_sampler_indices(cell_indices)
>>> for tensors in self:
...     pass  # your code
>>> # Do not forget next line!
>>> self.data_loader = old_loader

within_cluster_degenes(states, cell_labels=None, min_cells=10, batch1=None, batch2=None, use_observed_batches=False, subset=None, n_samples=5000, use_permutation=False, M_permutation=10000, mode='vanilla', change_fn=None, m1_domain_fn=None, delta=0.5, cred_interval_lvls=None, output_file=False, save_dir='./', filename='within_cluster', **kwargs)

Performs Differential Expression within clusters for different cell states

Parameters

cell_labels (Union[List, ndarray, None]) – optional: Labels of cells
min_cells (int) – Minimum number of cells used to compute Bayes Factors
states (Union[List[bool], ndarray]) – States of the cells.
batch1 (Union[List[int], ndarray, None]) – List of batch ids for which you want to perform DE Analysis for subpopulation 1. By default, all ids are taken into account
batch2 (Union[List[int], ndarray, None]) – List of batch ids for which you want to perform DE Analysis for subpopulation 2. By default, all ids are taken into account
subset (Union[List[bool], ndarray, None]) – MASK: Subset of cells you are interested in.
n_samples (int) – Number of times the posterior will be sampled for each pop
use_permutation (bool) – Activates pair random permutations. Simply formulated, pairs obtained from posterior sampling (when calling sample_scale_from_batch) will be randomly permuted so that the number of pairs used to compute Bayes Factors becomes M_permutation.
M_permutation (int) – Number of times we will “mix” posterior samples in step 2. Only makes sense when use_permutation=True
output_file (bool) – Bool: save file?
save_dir (str) – (Default value = './')
filename – (Default value = 'within_cluster')
use_observed_batches (bool) – see differential_expression_score
mode (Optional[str]) – see differential_expression_score
change_fn (Union[str, Callable, None]) – see differential_expression_score
m1_domain_fn (Optional[Callable]) – see differential_expression_score
delta (Optional[float]) – see differential_expression_score
cred_interval_lvls (Union[List[float], ndarray, None]) – See differential_expression_score
**kwargs – Other keyword arguments for get_sample_scale()

Return type

tuple

Returns

Tuple (de_res, de_cluster) where (i) de_res is a list of length nb_clusters (based on provided labels or on hardcoded cell types), (ii) de_res[i] contains Bayes Factors for population number i vs all the rest, (iii) de_cluster returns the associated names of clusters. Only clusters with at least min_cells elements to compute predicted Bayes Factors are included in these results.