Posterior
class scvi.inference.Posterior(model, gene_dataset, shuffle=False, indices=None, use_cuda=True, data_loader_kwargs={})
Bases: object
The functional data unit.
A Posterior instance is instantiated with a model and a gene_dataset, as well as additional arguments for PyTorch's DataLoader. A subset of indices can be specified, for purposes such as splitting the data into train/test or labelled/unlabelled sets (for semi-supervised learning). Each instance of the Trainer class can therefore hold multiple Posterior instances to train a model. A Posterior instance also provides many methods and utilities for its corresponding data.
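As a minimal sketch of the kind of index subsets mentioned above, the following splits cell indices into disjoint train and test arrays with NumPy. The helper `split_indices` and its `train_fraction` and `seed` parameters are hypothetical illustrations, not part of scVI.

```python
import numpy as np


def split_indices(n_cells, train_fraction=0.8, seed=0):
    """Hypothetical helper: randomly split range(n_cells) into train/test index arrays."""
    rng = np.random.default_rng(seed)
    permuted = rng.permutation(n_cells)
    n_train = int(round(train_fraction * n_cells))
    # Sort each subset so a sequential sampler would visit cells in dataset order.
    train_idx = np.sort(permuted[:n_train])
    test_idx = np.sort(permuted[n_train:])
    return train_idx, test_idx


train_idx, test_idx = split_indices(10, train_fraction=0.7)
print(train_idx, test_idx)
```

Each array could then be passed as `indices` to its own Posterior, e.g. `Posterior(model, gene_dataset, indices=train_idx)`, giving two views of the same dataset.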
- Parameters
model – A model instance from class VAE, VAEC or SCANVI
gene_dataset (GeneExpressionDataset) – A gene_dataset instance, such as CortexDataset()
shuffle – Specifies whether a RandomSampler or a SequentialSampler should be used
indices – Specifies how the data should be split with regard to train/test or labelled/unlabelled sets
use_cuda – Whether to use CUDA. Default: True
data_loader_kwargs – Keyword arguments to be passed to the DataLoader
Examples
Let us instantiate a trainer with a gene_dataset and a model:
>>> from scvi.dataset import CortexDataset
>>> from scvi.models import VAE
>>> from scvi.inference import UnsupervisedTrainer
>>> gene_dataset = CortexDataset()
>>> # n_batches * False evaluates to 0, i.e. no batch covariate is used
>>> vae = VAE(gene_dataset.nb_genes, n_batch=gene_dataset.n_batches * False,
...           n_labels=gene_dataset.n_labels)
>>> trainer = UnsupervisedTrainer(vae, gene_dataset, use_cuda=True)
>>> trainer.train(n_epochs=50)
An UnsupervisedTrainer instance has two Posterior attributes: train_set and test_set. For each of these subsets of the original gene_dataset, we can examine the differential expression, log-likelihood, entropy batch mixing, … or display the t-SNE of the data in the latent space learned by the scVI model:
>>> trainer.train_set.differential_expression_stats()
>>> trainer.train_set.reconstruction_error()
>>> trainer.train_set.entropy_batch_mixing()
>>> trainer.train_set.show_t_sne(n_samples=1000, color_by="labels")
Attributes Summary
Returns the current dataloader indices used by the object
Returns the number of cells studied.
Returns the posterior class name
Methods Summary
accuracy()
apply_t_sne(latent[, n_samples])
clustering_scores([prediction_algorithm])
Corrupts gene counts.
differential_expression_score(idx1, idx2[, …]) – Unified method for differential expression inference.
differential_expression_stats([M_sampling]) – Outputs statistics averaged in a symmetric way (a against b), forgetting the sets if permutation is True.
elbo() – Returns the Evidence Lower Bound associated with the object.
entropy_batch_mixing(**kwargs) – Returns the object's entropy batch mixing.
generate([n_samples, genes, batch_size]) – Creates observation samples from the posterior predictive distribution.
generate_denoised_samples([n_samples, …]) – Returns samples from an adjusted posterior predictive.
Wrapper of generate_denoised_samples() to create a gene-gene correlation matrix.
generate_parameters([n_samples, give_mean]) – Estimates the data's count means, dispersions and dropout logits.
get_bayes_factors(idx1, idx2[, mode, …]) – A unified method for differential expression inference.
get_latent([give_mean]) – Outputs the posterior z mean or a sample, the batch index, and the label.
get_sample_scale([transform_batch, …]) – Returns the frequencies of expression for the data.
imputation([n_samples, transform_batch]) – Imputes px_rate over the cells of this object.
imputation_benchmark([n_samples, show_plot, …]) – Visualizes the model imputation performance.
imputation_list([n_samples]) – Imputes the data's gene counts from corrupted data.
imputation_score([original_list, …]) – Computes the median absolute imputation error.
Computes kNN purity as described in [Lopez18].
marginal_ll([n_mc_samples]) – Estimates the marginal likelihood of the object's data.
nn_overlap_score(**kwargs) – Quantifies how much the similarity between cells in the mRNA latent space resembles their similarity at the protein level.
one_vs_all_degenes([subset, cell_labels, …]) – Performs one-population-vs-all-others differential expression analysis.
raw_data() – Returns raw data for classification.
Returns the reconstruction error associated with the object.
save_posterior(dir_path) – Saves the posterior properties in folder dir_path.
scale_sampler(selection[, n_samples, …]) – Samples the posterior scale using the variational posterior distribution.
sequential([batch_size]) – Returns a copy of the object that iterates over the data sequentially.
show_t_sne([n_samples, color_by, save_name, …])
to_cuda(tensors) – Converts a list of tensors to CUDA.
Uncorrupts gene counts.
update(data_loader_kwargs) – Updates the dataloader.
Updates the dataloader indices.
within_cluster_degenes(states[, …]) – Performs differential expression within clusters for different cell states.
Attributes Documentation
Methods Documentation
differential_expression_score(idx1, idx2, mode='vanilla', batchid1=None, batchid2=None, use_observed_batches=False, n_samples=5000, use_permutation=False, M_permutation=10000, all_stats=True, change_fn=None, m1_domain_fn=None, delta=0.5, cred_interval_lvls=None, **kwargs)
Unified method for differential expression inference.
This function extends the get_bayes_factors method by providing additional gene information to the user.
Two modes coexist:
The "vanilla" mode follows the protocol described in [Lopez18].
In this case, we perform hypothesis testing based on the hypotheses
\[M_1: h_1 > h_2 ~\text{and}~ M_2: h_1 \leq h_2\]
DE can then be based on the study of the Bayes factors
\[\log \frac{p(M_1 \mid x_1, x_2)}{p(M_2 \mid x_1, x_2)}\]
The "change" mode (described in [Boyeau19])
consists in estimating an effect size random variable (e.g., log fold-change) and performing Bayesian hypothesis testing on this variable. The change_fn function computes the effect size variable r based on two inputs corresponding to the normalized means in both populations.
Hypotheses:
\[M_1: r \in R_1 ~\text{(effect size r in region inducing differential expression)}\]
\[M_2: r \notin R_1 ~\text{(no differential expression)}\]
To characterize the region \(R_1\), which induces DE, the user has two choices.
1. A common case is when the region \([-\delta, \delta]\) does not induce differential expression. If the user specifies a threshold delta, we suppose that \(R_1 = \mathbb{R} \setminus [-\delta, \delta]\).
2. The user specifies a custom indicator function (m1_domain_fn)
\[f: \mathbb{R} \to \{0, 1\} ~\text{s.t.}~ r \in R_1 ~\text{iff.}~ f(r) = 1\]
Decision-making can then be based on the estimates of
\[p(M_1 \mid x_1, x_2)\]
Both modes require sampling the posteriors of the normalized means. To that end, we sample the Posterior in the following way:
1. The posterior is sampled n_samples times for each subpopulation.
2. For computational efficiency (posterior sampling is quite expensive), instead of comparing the obtained samples element-wise, we can permute posterior samples. Remember that computing the Bayes factor requires sampling \(q(z_A \mid x_A)\) and \(q(z_B \mid x_B)\).
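The two decision rules can be sketched with NumPy on precomputed posterior samples of the normalized means. This is an illustration under stated assumptions, not scVI's implementation: `h1` and `h2` are hypothetical arrays of shape (n_samples, n_genes) standing in for paired posterior samples of the two subpopulations, the effect size r is assumed to be the natural-log fold-change, and a small `eps` guards against log(0).

```python
import numpy as np


def vanilla_log_bayes_factor(h1, h2, eps=1e-8):
    """Per-gene estimate of log p(M_1 | x_1, x_2) / p(M_2 | x_1, x_2), with M_1: h1 > h2."""
    p_m1 = np.mean(h1 > h2, axis=0)  # Monte Carlo estimate of p(M_1 | x_1, x_2)
    return np.log(p_m1 + eps) - np.log(1.0 - p_m1 + eps)


def change_mode_proba_de(h1, h2, delta=0.5):
    """Per-gene estimate of p(M_1 | x_1, x_2) where R_1 is everything outside [-delta, delta]."""
    r = np.log(h1) - np.log(h2)  # effect size samples (assumed log fold-change)
    in_r1 = np.abs(r) > delta  # indicator f(r) = 1 iff r lies in R_1
    return in_r1.mean(axis=0)


rng = np.random.default_rng(0)
n_samples = 5000
# Gene 0: identical distributions; gene 1: population 1 is about 4x higher.
h1 = rng.gamma(shape=10.0, scale=[1.0, 4.0], size=(n_samples, 2))
h2 = rng.gamma(shape=10.0, scale=1.0, size=(n_samples, 2))
log_bf = vanilla_log_bayes_factor(h1, h2)
proba_de = change_mode_proba_de(h1, h2, delta=0.5)
print(log_bf, proba_de)
```

Gene 0 should get a log Bayes factor near zero, while gene 1 gets a large positive one and a higher probability of differential expression.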
Currently, the code covers several batch handling configurations:
1. If use_observed_batches=True, then batches are considered as observations and cells' normalized means are conditioned on real batch observations.
2. Case (cell group 1) and control (cell group 2) are conditioned on the same batch ids, for example:
>>> set(batchid1) == set(batchid2)
or
>>> batchid1 = batchid2 = None
3. Case and control are conditioned on different batch ids that do not intersect, i.e.,
>>> set(batchid1) != set(batchid2)
and
>>> len(set(batchid1).intersection(set(batchid2))) == 0
This function does not cover other cases yet and will warn users in such cases.
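The supported batch configurations can be sketched as a small classifier over the arguments. The function `batch_case` is hypothetical; it mirrors the three cases listed above and, like the documented method, warns (here through `warnings.warn`) when none of them applies. Treating a mix of `None` and an explicit list as unsupported is an assumption.

```python
import warnings


def batch_case(batchid1=None, batchid2=None, use_observed_batches=False):
    """Return 1, 2 or 3 for the documented batch-handling cases, or None with a warning."""
    if use_observed_batches:
        return 1  # normalized means conditioned on the observed batches
    if batchid1 is None and batchid2 is None:
        return 2  # both groups use the default (all) batch ids
    if batchid1 is None or batchid2 is None:
        # Assumption: a default (None) mixed with an explicit list is not covered.
        warnings.warn("Batch configuration not covered.")
        return None
    set1, set2 = set(batchid1), set(batchid2)
    if set1 == set2:
        return 2  # same batch ids for case and control
    if len(set1.intersection(set2)) == 0:
        return 3  # disjoint batch ids
    warnings.warn("Batch configuration not covered.")  # partially overlapping ids
    return None
```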
- Parameters
mode (Optional[str]) – one of ["vanilla", "change"]
idx1 (Union[List[bool], ndarray]) – bool array masking subpopulation cells 1. Should be True where the cell is from the associated population
idx2 (Union[List[bool], ndarray]) – bool array masking subpopulation cells 2. Should be True where the cell is from the associated population
batchid1 (Union[List[int], ndarray, None]) – List of batch ids for which you want to perform DE analysis for subpopulation 1. By default, all ids are taken into account
batchid2 (Union[List[int], ndarray, None]) – List of batch ids for which you want to perform DE analysis for subpopulation 2. By default, all ids are taken into account
use_observed_batches (Optional[bool]) – Whether normalized means are conditioned on observed batches
use_permutation (bool) – Activates step 2 described above. Simply put, pairs obtained from posterior sampling (when calling sample_scale_from_batch) will be randomly permuted so that the number of pairs used to compute Bayes factors becomes M_permutation.
M_permutation (int) – Number of times we will "mix" posterior samples in step 2. Only makes sense when use_permutation=True
change_fn (Union[str, Callable, None]) – function computing effect size based on both normalized means
m1_domain_fn (Optional[Callable]) – custom indicator function of effect size regions inducing differential expression
delta (Optional[float]) – specific case of region inducing differential expression. In this case, we suppose that \([-\delta, \delta]\) does not induce differential expression, i.e. \(R_1 = \mathbb{R} \setminus [-\delta, \delta]\) (LFC case)
cred_interval_lvls (Union[List[float], ndarray, None]) – List of credible interval levels to compute for the posterior LFC distribution
all_stats (bool) – Whether additional metrics should be provided
**kwargs – Other keyword arguments for get_sample_scale
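For the change mode with cred_interval_lvls set, a per-gene summary of the posterior LFC can be sketched with NumPy quantiles. This is an illustration, not scVI's code: `lfc` is a hypothetical array of effect-size samples of shape (n_samples, n_genes), the dictionary keys are made-up names, and an equal-tailed interval is assumed (the library may construct its intervals differently).

```python
import numpy as np


def summarize_lfc(lfc, cred_interval_lvls=(0.5, 0.9)):
    """Per-gene mean, median, std and equal-tailed credible intervals of LFC samples."""
    summary = {
        "lfc_mean": lfc.mean(axis=0),
        "lfc_median": np.median(lfc, axis=0),
        "lfc_std": lfc.std(axis=0),
    }
    for level in cred_interval_lvls:
        alpha = (1.0 - level) / 2.0  # probability mass left out in each tail
        lower, upper = np.quantile(lfc, [alpha, 1.0 - alpha], axis=0)
        summary[f"lfc_ci{level}_low"] = lower
        summary[f"lfc_ci{level}_high"] = upper
    return summary


rng = np.random.default_rng(0)
# Hypothetical LFC samples: gene 0 centred on 0, gene 1 centred on 2.
lfc = rng.normal(loc=[0.0, 2.0], scale=0.5, size=(4000, 2))
stats = summarize_lfc(lfc, cred_interval_lvls=[0.9])
print(stats["lfc_ci0.9_low"], stats["lfc_ci0.9_high"])
```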
- Returns
diff_exp_results. The most important columns are:
proba_de (probability of being differentially expressed, in change mode)
bayes_factor (Bayes factors, in vanilla mode)
scale1 and scale2 (means of the scales in populations 1 and 2)
When using the change mode, the mean, median, and std of the posterior LFC are also reported.
differential_expression_stats(M_sampling=100)
Outputs statistics averaged in a symmetric way (a against b), forgetting the sets if permutation is True.
generate(n_samples=100, genes=None, batch_size=128)
Create observation samples from the Posterior Predictive distribution.
- Returns
x_new (torch.Tensor) – tensor with shape (n_cells, n_genes, n_samples)
x_old (torch.Tensor) – tensor with shape (n_cells, n_genes)
generate_denoised_samples(n_samples=25, batch_size=64, rna_size_factor=1000, transform_batch=None)
Return samples from an adjusted posterior predictive.
generate_feature_correlation_matrix(n_samples=10, batch_size=64, rna_size_factor=1000, transform_batch=None, correlation_type='spearman')
Wrapper of generate_denoised_samples() to create a gene-gene correlation matrix.
- Parameters
batch_size (int) – Mini-batch size for sampling. Lower means a smaller GPU memory footprint
rna_size_factor (int) – Size factor for RNA prior to sampling the gamma distribution
transform_batch (Union[int, List[int], None]) – Batches to condition on. If transform_batch is:
None, then the real observed batch is used
int, then batch transform_batch is used
list of int, then values are averaged over the provided batches.
- Returns
Gene-gene correlation matrix
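A minimal NumPy sketch of the Spearman option: rank each gene's values across cells, then take the Pearson correlation of those ranks. Here `samples` is a hypothetical (n_cells, n_genes) array standing in for denoised samples; how the library pools its n_samples draws is not shown. Ties are broken by order rather than averaged, which differs from the tie-corrected coefficient computed by scipy.stats.spearmanr.

```python
import numpy as np


def spearman_gene_correlation(samples):
    """Gene-gene Spearman correlation for a (n_cells, n_genes) array (ties not averaged)."""
    # argsort of argsort gives the 0-based rank of each cell within each gene column
    ranks = np.argsort(np.argsort(samples, axis=0), axis=0).astype(float)
    # np.corrcoef treats rows as variables, so transpose to put genes on the rows
    return np.corrcoef(ranks.T)


rng = np.random.default_rng(0)
x = rng.gamma(2.0, 1.0, size=200)
samples = np.column_stack([x, x ** 3, -x])  # monotone transforms of one gene
corr = spearman_gene_correlation(samples)
print(corr)
```

Because Spearman correlation depends only on ranks, the monotone transforms give correlations of exactly +1 and -1.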
generate_parameters(n_samples=1, give_mean=False)
Estimates the data's count means, dispersions and dropout logits.
get_bayes_factors(idx1, idx2, mode='vanilla', batchid1=None, batchid2=None, use_observed_batches=False, n_samples=5000, use_permutation=False, M_permutation=10000, change_fn=None, m1_domain_fn=None, delta=0.5, cred_interval_lvls=None, **kwargs)
A unified method for differential expression inference.
Two modes coexist:
The "vanilla" mode follows the protocol described in [Lopez18].
In this case, we perform hypothesis testing based on the hypotheses
\[M_1: h_1 > h_2 ~\text{and}~ M_2: h_1 \leq h_2\]
DE can then be based on the study of the Bayes factors
\[\log \frac{p(M_1 \mid x_1, x_2)}{p(M_2 \mid x_1, x_2)}\]
The "change" mode (described in [Boyeau19])
consists in estimating an effect size random variable (e.g., log fold-change) and performing Bayesian hypothesis testing on this variable. The change_fn function computes the effect size variable r based on two inputs corresponding to the normalized means in both populations.
Hypotheses:
\[M_1: r \in R_1 ~\text{(effect size r in region inducing differential expression)}\]
\[M_2: r \notin R_1 ~\text{(no differential expression)}\]
To characterize the region \(R_1\), which induces DE, the user has two choices.
1. A common case is when the region \([-\delta, \delta]\) does not induce differential expression. If the user specifies a threshold delta, we suppose that \(R_1 = \mathbb{R} \setminus [-\delta, \delta]\).
2. The user specifies a custom indicator function (m1_domain_fn)
\[f: \mathbb{R} \to \{0, 1\} ~\text{s.t.}~ r \in R_1 ~\text{iff.}~ f(r) = 1\]
Decision-making can then be based on the estimates of
\[p(M_1 \mid x_1, x_2)\]
Both modes require sampling the posteriors of the normalized means. To that end, we sample the Posterior in the following way:
1. The posterior is sampled n_samples times for each subpopulation.
2. For computational efficiency (posterior sampling is quite expensive), instead of comparing the obtained samples element-wise, we can permute posterior samples. Remember that computing the Bayes factor requires sampling \(q(z_A \mid x_A)\) and \(q(z_B \mid x_B)\).
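The permutation step can be sketched as follows: rather than pairing the i-th sample of population A with the i-th sample of population B, draw M_permutation random index pairs into the two sample sets, which also works when the two sets have different sizes. This NumPy version assumes uniform sampling with replacement and uses a hypothetical `permuted_pairs` helper; it is not the library's exact procedure.

```python
import numpy as np


def permuted_pairs(scales_a, scales_b, m_permutation=10000, seed=0):
    """Return m_permutation randomly paired rows drawn from two posterior sample arrays."""
    rng = np.random.default_rng(seed)
    idx_a = rng.integers(0, scales_a.shape[0], size=m_permutation)
    idx_b = rng.integers(0, scales_b.shape[0], size=m_permutation)
    return scales_a[idx_a], scales_b[idx_b]


rng = np.random.default_rng(1)
scales_a = rng.gamma(2.0, 1.0, size=(300, 4))  # e.g. 300 posterior samples, 4 genes
scales_b = rng.gamma(2.0, 1.0, size=(500, 4))  # a different number of samples
pa, pb = permuted_pairs(scales_a, scales_b, m_permutation=10000)
# Per-gene estimate of p(h_A > h_B) computed from the permuted pairs
p_m1 = np.mean(pa > pb, axis=0)
print(p_m1)
```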
Currently, the code covers several batch handling configurations:
1. If use_observed_batches=True, then batches are considered as observations and cells' normalized means are conditioned on real batch observations.
2. Case (cell group 1) and control (cell group 2) are conditioned on the same batch ids, for example:
>>> set(batchid1) == set(batchid2)
or
>>> batchid1 = batchid2 = None
3. Case and control are conditioned on different batch ids that do not intersect, i.e.,
>>> set(batchid1) != set(batchid2)
and
>>> len(set(batchid1).intersection(set(batchid2))) == 0
This function does not cover other cases yet and will warn users in such cases.
- Parameters
mode (Optional[str]) – one of ["vanilla", "change"]
idx1 (Union[List[bool], ndarray]) – bool array masking subpopulation cells 1. Should be True where the cell is from the associated population
idx2 (Union[List[bool], ndarray]) – bool array masking subpopulation cells 2. Should be True where the cell is from the associated population
batchid1 (Union[List[int], ndarray, None]) – List of batch ids for which you want to perform DE analysis for subpopulation 1. By default, all ids are taken into account
batchid2 (Union[List[int], ndarray, None]) – List of batch ids for which you want to perform DE analysis for subpopulation 2. By default, all ids are taken into account
use_observed_batches (Optional[bool]) – Whether normalized means are conditioned on observed batches
use_permutation (bool) – Activates step 2 described above. Simply put, pairs obtained from posterior sampling (when calling sample_scale_from_batch) will be randomly permuted so that the number of pairs used to compute Bayes factors becomes M_permutation.
M_permutation (int) – Number of times we will "mix" posterior samples in step 2. Only makes sense when use_permutation=True
change_fn (Union[str, Callable, None]) – function computing effect size based on both normalized means
m1_domain_fn (Optional[Callable]) – custom indicator function of effect size regions inducing differential expression
delta (Optional[float]) – specific case of region inducing differential expression. In this case, we suppose that \([-\delta, \delta]\) does not induce differential expression, i.e. \(R_1 = \mathbb{R} \setminus [-\delta, \delta]\) (LFC case)
cred_interval_lvls (Union[List[float], ndarray, None]) – List of credible interval levels to compute for the posterior LFC distribution
**kwargs – Other keyword arguments for get_sample_scale()
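change_fn and m1_domain_fn accept callables. The sketch below shows what such callables could look like, assuming change_fn receives the two normalized-mean sample arrays and returns effect-size samples r, and m1_domain_fn maps r to a boolean indicator f(r). The function names and the pseudocount are illustrative assumptions; check the library source for the exact expected signatures before passing them in.

```python
import numpy as np


def log2_fold_change(scale1, scale2, pseudocount=1e-6):
    """Hypothetical change_fn: effect size r = log2((scale1 + eps) / (scale2 + eps))."""
    return np.log2(scale1 + pseudocount) - np.log2(scale2 + pseudocount)


def upregulated_only(r, threshold=1.0):
    """Hypothetical m1_domain_fn: f(r) = 1 iff r > threshold, a one-sided region R_1."""
    return r > threshold


# Two posterior samples (rows) for two genes (columns) in each population
scale1 = np.array([[0.4, 0.1], [0.5, 0.1]])
scale2 = np.array([[0.1, 0.1], [0.1, 0.2]])
r = log2_fold_change(scale1, scale2)
proba_de = upregulated_only(r).mean(axis=0)  # Monte Carlo estimate of p(M_1 | x_1, x_2)
print(proba_de)
```

Under these assumptions they could be passed as `mode="change", change_fn=log2_fold_change, m1_domain_fn=upregulated_only`.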
- Returns
Differential expression properties
get_latent(give_mean=True)
Outputs the posterior z mean or a sample, the batch index, and the label.
get_sample_scale(transform_batch=None, gene_list=None, library_size=1, return_df=None, n_samples=1, return_mean=True)
Returns the frequencies of expression for the data.
This is denoted as \(\rho_n\) in the scVI paper.
- Parameters
transform_batch (Optional[int]) – Batch to condition on. If transform_batch is:
None, then the real observed batch is used
int, then batch transform_batch is used
gene_list (Union[List[int], ndarray, None]) – Return frequencies of expression for a subset of genes. This can save memory when working with large datasets and few genes are of interest.
library_size (float) – Scale the expression frequencies to a common library size. This allows gene expression levels to be interpreted on a common scale of relevant magnitude.
return_df (Optional[bool]) – Return a DataFrame instead of an np.ndarray. Includes gene names as columns. Requires either n_samples=1 or return_mean=True. When gene_list is not None and contains more than one gene, this option is True. Otherwise, it defaults to False.
n_samples (int) – Get sample scale from multiple samples.
return_mean (bool) – Whether to return the mean of the samples.
- Returns
denoised_expression – array of decoded expression adjusted for library size
If n_samples > 1 and return_mean is False, then the shape is (samples, cells, genes). Otherwise, the shape is (cells, genes). The return type is np.ndarray unless return_df is True.
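The library_size scaling and the output-shape rule can be sketched with NumPy. Here `rho` is a hypothetical array of per-sample expression frequencies of shape (n_samples, n_cells, n_genes) whose gene axis sums to 1 (the \(\rho_n\) above); the function mimics the documented shape behaviour and is not the library's code.

```python
import numpy as np


def scale_output(rho, library_size=1.0, return_mean=True):
    """Scale frequencies to a common library size and apply the documented shape rule."""
    scaled = library_size * rho  # each cell's expression now sums to library_size
    if scaled.shape[0] > 1 and not return_mean:
        return scaled  # (samples, cells, genes)
    return scaled.mean(axis=0)  # (cells, genes)


rng = np.random.default_rng(0)
rho = rng.dirichlet(np.ones(5), size=(3, 4))  # 3 samples, 4 cells, 5 genes
all_samples = scale_output(rho, library_size=1e4, return_mean=False)
mean_scale = scale_output(rho, library_size=1e4)
print(all_samples.shape, mean_scale.shape)
```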
imputation(n_samples=1, transform_batch=None)
Imputes px_rate over the cells of this object.
- Parameters
n_samples (Optional[int]) – number of posterior samples
transform_batch (Union[int, List[int], None]) – Batches to condition on. If transform_batch is:
None, then the real observed batch is used
int, then batch transform_batch is used
list of int, then px_rates are averaged over the provided batches.
- Returns
(n_samples, n_cells, n_genes) px_rates, as a squeezed array
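When transform_batch is a list, the per-batch rates are averaged before the result is squeezed. The sketch below shows that reduction with a hypothetical `decode_rate(batch)` callable standing in for running the decoder with a given batch index (not an scVI function); the None case, which uses the observed batches, is not shown.

```python
import numpy as np


def averaged_rates(decode_rate, transform_batch):
    """Average decoder rates over the provided batches, then squeeze singleton axes."""
    batches = [transform_batch] if isinstance(transform_batch, int) else list(transform_batch)
    # each decode_rate(b) returns an array of shape (n_samples, n_cells, n_genes)
    stacked = np.stack([decode_rate(b) for b in batches], axis=0)
    return np.squeeze(stacked.mean(axis=0))


def fake_decode_rate(batch):
    """Toy stand-in: the rate depends only on the batch index."""
    return np.full((1, 3, 2), float(batch + 1))


rates = averaged_rates(fake_decode_rate, [0, 1])  # mean of batch 0 and batch 1 rates
print(rates.shape)
```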
imputation_benchmark(n_samples=8, show_plot=True, title_plot='imputation', save_path='')
Visualizes the model imputation performance.
imputation_score(original_list=None, imputed_list=None, n_samples=1)
Computes the median absolute imputation error.
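Assuming original_list and imputed_list hold, per cell, the original and imputed values at the corrupted entries (for instance as produced by imputation_list), the score reduces to a median of absolute differences pooled over all cells. A NumPy sketch with a hypothetical helper name:

```python
import numpy as np


def median_abs_imputation_error(original_list, imputed_list):
    """Median absolute error over all corrupted entries, pooled across cells."""
    original = np.concatenate(original_list)
    imputed = np.concatenate(imputed_list)
    return float(np.median(np.abs(original - imputed)))


original_list = [np.array([3.0, 0.0]), np.array([5.0])]
imputed_list = [np.array([2.5, 1.0]), np.array([5.0])]
score = median_abs_imputation_error(original_list, imputed_list)
print(score)  # absolute errors 0.5, 1.0, 0.0 -> median 0.5
```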
nn_overlap_score(**kwargs)
Quantifies how much the similarity between cells in the mRNA latent space resembles their similarity at the protein level.
Computes the overlap fold enrichment between the protein-based and mRNA-based cell 100-nearest-neighbor graphs, and the Spearman correlation of their adjacency matrices.
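A sketch of an overlap fold enrichment between two k-nearest-neighbour graphs built on the same cells (e.g. protein features vs an mRNA latent space). The definition used here, shared directed edges divided by the |A| * |B| / n² expected for random graphs of the same size, is an assumption for illustration; the Spearman correlation of the adjacency matrices is omitted. Distances are brute force, so this only suits small n, and k is a parameter (the documented method uses 100).

```python
import numpy as np


def knn_sets(x, k):
    """Indices of the k nearest neighbours (Euclidean, excluding self) for each row of x."""
    d = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)  # a cell is not its own neighbour
    return np.argsort(d, axis=1)[:, :k]


def overlap_fold_enrichment(x_a, x_b, k=10):
    """Shared kNN edges relative to the overlap expected between random graphs."""
    n = x_a.shape[0]
    nn_a, nn_b = knn_sets(x_a, k), knn_sets(x_b, k)
    edges_a = {(i, int(j)) for i in range(n) for j in nn_a[i]}
    edges_b = {(i, int(j)) for i in range(n) for j in nn_b[i]}
    shared = len(edges_a & edges_b)
    # random expectation of shared directed edges: |A| * |B| / n**2
    return shared * n ** 2 / (len(edges_a) * len(edges_b))


rng = np.random.default_rng(0)
protein = rng.normal(size=(100, 5))
unrelated = rng.normal(size=(100, 5))
print(overlap_fold_enrichment(protein, protein), overlap_fold_enrichment(protein, unrelated))
```

Identical spaces give the maximum n / k, while unrelated spaces give a value near 1.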
one_vs_all_degenes(subset=None, cell_labels=None, use_observed_batches=False, min_cells=10, n_samples=5000, use_permutation=False, M_permutation=10000, output_file=False, mode='vanilla', change_fn=None, m1_domain_fn=None, delta=0.5, cred_interval_lvls=None, save_dir='./', filename='one2all', **kwargs)
Performs one-population-vs-all-others differential expression analysis.
It takes labels or cell types to characterize the different populations.
- Parameters
subset (Union[List[bool], ndarray, None]) – None, or a bool array masking the subset of cells you are interested in (True when you want to select a cell). In that case, it should have the same length as gene_dataset
cell_labels (Union[List, ndarray, None]) – optional: Labels of cells
min_cells (int) – Minimum number of cells a cluster must contain for Bayes factors to be computed
n_samples (int) – Number of times the posterior will be sampled for each population
use_permutation (bool) – Activates pair random permutations. Simply put, pairs obtained from posterior sampling (when calling sample_scale_from_batch) will be randomly permuted so that the number of pairs used to compute Bayes factors becomes M_permutation.
M_permutation (int) – Number of times we will "mix" posterior samples in step 2. Only makes sense when use_permutation=True
use_observed_batches (bool) – see differential_expression_score
mode (Optional[str]) – see differential_expression_score
change_fn (Union[str, Callable, None]) – see differential_expression_score
m1_domain_fn (Optional[Callable]) – see differential_expression_score
delta (Optional[float]) – see differential_expression_score
cred_interval_lvls (Union[List[float], ndarray, None]) – List of credible interval levels to compute for the posterior LFC distribution
**kwargs – Other keyword arguments for get_sample_scale
- Returns
Tuple (de_res, de_cluster):
(i) de_res is a list of length nb_clusters (based on provided labels or on hardcoded cell types);
(ii) de_res[i] contains Bayes factors for population number i vs all the rest;
(iii) de_cluster returns the associated names of clusters.
Only clusters with at least min_cells elements to compute predicted Bayes factors are included in these results.
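The one-vs-all design can be sketched as building, for each label, a pair of boolean masks (cells in the cluster vs all other selected cells) and skipping clusters smaller than min_cells. The generator below is hypothetical; each mask pair is the kind of (idx1, idx2) input a two-population comparison such as differential_expression_score takes.

```python
import numpy as np


def one_vs_all_masks(cell_labels, subset=None, min_cells=10):
    """Yield (label, idx1, idx2) boolean masks for each sufficiently large cluster."""
    cell_labels = np.asarray(cell_labels)
    if subset is None:
        subset = np.ones(len(cell_labels), dtype=bool)
    subset = np.asarray(subset, dtype=bool)
    for label in np.unique(cell_labels[subset]):
        idx1 = (cell_labels == label) & subset  # cells of this cluster
        idx2 = (cell_labels != label) & subset  # all other selected cells
        if idx1.sum() >= min_cells:
            yield label, idx1, idx2


labels = np.array([0] * 12 + [1] * 5 + [2] * 15)
pairs = list(one_vs_all_masks(labels, min_cells=10))
print([int(label) for label, _, _ in pairs])  # cluster 1 has only 5 cells and is skipped
```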
save_posterior(dir_path)
Saves the posterior properties in folder dir_path.
To ensure safety, this method requires that dir_path does not already exist. The posterior can then be retrieved later with the function load_posterior.
scale_sampler(selection, n_samples=5000, n_samples_per_cell=None, batchid=None, use_observed_batches=False, give_mean=False, **kwargs)
Samples the posterior scale using the variational posterior distribution.
- Parameters
n_samples (Optional[int]) – Total number of samples per batch (fill in either n_samples or n_samples_per_cell)
n_samples_per_cell (Optional[int]) – Number of times we sample from each observation per batch (fill in either n_samples or n_samples_per_cell)
batchid (Union[List[int], ndarray, None]) – Biological batch(es) to sample from. Default (None): sample from all batches
use_observed_batches (Optional[bool]) – Whether normalized means are conditioned on observed batches or if observed batches are to be used
selection (Union[List[bool], ndarray]) – Mask or list of cell ids to select
**kwargs – Other keyword arguments for get_sample_scale()
- Returns
Dictionary containing:
scale – Posterior aggregated scale samples of shape (n_samples, n_genes), where n_samples corresponds to either n_bio_batches * n_cells * n_samples_per_cell, or n_samples_total
batch – associated batch ids
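The bookkeeping behind n_samples vs n_samples_per_cell can be sketched as follows: each selected cell is sampled n_samples_per_cell times in each biological batch, giving n_bio_batches * n_cells * n_samples_per_cell rows, and when only n_samples is given the per-cell count is derived from that per-batch budget. The derivation rule (integer division, at least one sample per cell) is an assumption, and `sampling_plan` is a hypothetical helper.

```python
def sampling_plan(n_cells, n_bio_batches, n_samples=5000, n_samples_per_cell=None):
    """Return (samples per cell, total rows of the aggregated scale array)."""
    if n_samples_per_cell is None:
        # Assumption: spread the per-batch budget evenly over cells, at least 1 each.
        n_samples_per_cell = max(n_samples // n_cells, 1)
    total_rows = n_bio_batches * n_cells * n_samples_per_cell
    return n_samples_per_cell, total_rows


print(sampling_plan(n_cells=100, n_bio_batches=2))  # budget of 5000 per batch -> 50 per cell
print(sampling_plan(n_cells=100, n_bio_batches=2, n_samples_per_cell=3))
```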
sequential(batch_size=128)
Returns a copy of the object that iterates over the data sequentially.
show_t_sne(n_samples=1000, color_by='', save_name='', latent=None, batch_indices=None, labels=None, n_batch=None)
update_sampler_indices(idx)
Updates the dataloader indices.
More precisely, this method can be used to temporarily change which cells __iter__ will yield. This is particularly useful for computational considerations when one is only interested in a subset of the cells of the Posterior object. This method should be used carefully and requires resetting the dataloader to its original value after use.
Examples
>>> import numpy as np
>>> old_loader = self.data_loader
>>> cell_indices = np.array([1, 2, 3])
>>> self.update_sampler_indices(cell_indices)
>>> for tensors in self:
...     pass  # your code
>>> # Do not forget the next line!
>>> self.data_loader = old_loader
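Because the original data loader must be restored by hand, one option is to wrap the pattern in a context manager so the reset also happens if the loop raises. `temporary_indices` is a hypothetical helper, not part of scVI; it only relies on the data_loader attribute and the update_sampler_indices method used in the example above. `ToyPosterior` exists only to make the sketch runnable.

```python
from contextlib import contextmanager


@contextmanager
def temporary_indices(posterior, cell_indices):
    """Restrict iteration to cell_indices, restoring the original data loader afterwards."""
    old_loader = posterior.data_loader
    posterior.update_sampler_indices(cell_indices)
    try:
        yield posterior
    finally:
        posterior.data_loader = old_loader  # runs even if the body raises


class ToyPosterior:
    """Minimal stand-in exposing data_loader and update_sampler_indices."""

    def __init__(self, n_cells):
        self.data_loader = list(range(n_cells))

    def update_sampler_indices(self, idx):
        self.data_loader = [int(i) for i in idx]

    def __iter__(self):
        return iter(self.data_loader)


toy = ToyPosterior(5)
with temporary_indices(toy, [1, 2, 3]) as p:
    seen = list(p)  # only the selected cells are visited
print(seen, list(toy))
```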
within_cluster_degenes(states, cell_labels=None, min_cells=10, batch1=None, batch2=None, use_observed_batches=False, subset=None, n_samples=5000, use_permutation=False, M_permutation=10000, mode='vanilla', change_fn=None, m1_domain_fn=None, delta=0.5, cred_interval_lvls=None, output_file=False, save_dir='./', filename='within_cluster', **kwargs)
Performs differential expression within clusters for different cell states.
- Parameters
cell_labels (Union[List, ndarray, None]) – optional: Labels of cells
min_cells (int) – Minimum number of cells required to compute Bayes factors
states (Union[List[bool], ndarray]) – States of the cells.
batch1 (Union[List[int], ndarray, None]) – List of batch ids for which you want to perform DE analysis for subpopulation 1. By default, all ids are taken into account
batch2 (Union[List[int], ndarray, None]) – List of batch ids for which you want to perform DE analysis for subpopulation 2. By default, all ids are taken into account
subset (Union[List[bool], ndarray, None]) – Mask selecting the subset of cells you are interested in.
n_samples (int) – Number of times the posterior will be sampled for each population
use_permutation (bool) – Activates pair random permutations. Simply put, pairs obtained from posterior sampling (when calling sample_scale_from_batch) will be randomly permuted so that the number of pairs used to compute Bayes factors becomes M_permutation.
M_permutation (int) – Number of times we will "mix" posterior samples in step 2. Only makes sense when use_permutation=True
use_observed_batches (bool) – see differential_expression_score
mode (Optional[str]) – see differential_expression_score
change_fn (Union[str, Callable, None]) – see differential_expression_score
m1_domain_fn (Optional[Callable]) – see differential_expression_score
delta (Optional[float]) – see differential_expression_score
cred_interval_lvls (Union[List[float], ndarray, None]) – see differential_expression_score
**kwargs – Other keyword arguments for get_sample_scale()
- Returns
Tuple (de_res, de_cluster):
(i) de_res is a list of length nb_clusters (based on provided labels or on hardcoded cell types);
(ii) de_res[i] contains Bayes factors for population number i vs all the rest;
(iii) de_cluster returns the associated names of clusters.
Only clusters with at least min_cells elements to compute predicted Bayes factors are included in these results.
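Within-cluster DE can be sketched as comparing, inside each cluster, the cells in one state against the cells in the other. The helper below is hypothetical and assumes boolean `states` (True and False as the two states, matching the documented Union[List[bool], ndarray] type); requiring min_cells in both groups is an assumption about how small clusters are skipped.

```python
import numpy as np


def within_cluster_masks(states, cell_labels, min_cells=10):
    """Yield (label, idx1, idx2): state-True vs state-False cells inside each cluster."""
    states = np.asarray(states, dtype=bool)
    cell_labels = np.asarray(cell_labels)
    for label in np.unique(cell_labels):
        in_cluster = cell_labels == label
        idx1 = in_cluster & states  # cells of this cluster in the first state
        idx2 = in_cluster & ~states  # cells of this cluster in the other state
        if idx1.sum() >= min_cells and idx2.sum() >= min_cells:
            yield label, idx1, idx2


labels = np.repeat([0, 1], 30)  # two clusters of 30 cells
states = np.tile([True, False], 30)  # alternating states: 15/15 in each cluster
states[30:50] = True  # cluster 1 now has 25 True and only 5 False cells
results = list(within_cluster_masks(states, labels, min_cells=10))
print([int(label) for label, _, _ in results])
```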