scvi.module.VAE
- class scvi.module.VAE(n_input, n_batch=0, n_labels=0, n_hidden=128, n_latent=10, n_layers=1, n_continuous_cov=0, n_cats_per_cov=None, dropout_rate=0.1, dispersion='gene', log_variational=True, gene_likelihood='zinb', latent_distribution='normal', encode_covariates=False, deeply_inject_covariates=True, batch_representation='one-hot', use_batch_norm='both', use_layer_norm='none', use_size_factor_key=False, use_observed_lib_size=True, extra_payload_autotune=False, library_log_means=None, library_log_vars=None, var_activation=None, extra_encoder_kwargs=None, extra_decoder_kwargs=None, batch_embedding_kwargs=None)
Bases: EmbeddingModuleMixin, BaseMinifiedModeModuleClass
Variational auto-encoder [Lopez et al., 2018].
- Parameters:
  - n_input (int) – Number of input features.
  - n_batch (int, default: 0) – Number of batches. If 0, no batch correction is performed.
  - n_labels (int, default: 0) – Number of labels.
  - n_hidden (int, default: 128) – Number of nodes per hidden layer. Passed into Encoder and DecoderSCVI.
  - n_latent (int, default: 10) – Dimensionality of the latent space.
  - n_layers (int, default: 1) – Number of hidden layers. Passed into Encoder and DecoderSCVI.
  - n_continuous_cov (int, default: 0) – Number of continuous covariates.
  - n_cats_per_cov (list[int] | None, default: None) – A list of integers containing the number of categories for each categorical covariate.
  - dropout_rate (float, default: 0.1) – Dropout rate. Passed into Encoder but not DecoderSCVI.
  - dispersion (Literal['gene', 'gene-batch', 'gene-label', 'gene-cell'], default: 'gene') – Flexibility of the dispersion parameter when gene_likelihood is either "nb" or "zinb". One of the following:
    - "gene": parameter is constant per gene across cells.
    - "gene-batch": parameter is constant per gene per batch.
    - "gene-label": parameter is constant per gene per label.
    - "gene-cell": parameter is constant per gene per cell.
  - log_variational (bool, default: True) – If True, use log1p() on input data before encoding for numerical stability (not normalization).
  - gene_likelihood (Literal['zinb', 'nb', 'poisson', 'normal'], default: 'zinb') – Distribution to use for reconstruction in the generative process. One of the following:
    - "nb": NegativeBinomial.
    - "zinb": ZeroInflatedNegativeBinomial.
    - "poisson": Poisson.
    - "normal": Normal.
  - latent_distribution (Literal['normal', 'ln'], default: 'normal') – Distribution to use for the latent space. One of the following:
    - "normal": isotropic normal.
    - "ln": logistic normal with normal params N(0, 1).
  - encode_covariates (bool, default: False) – If True, covariates are concatenated to gene expression prior to passing through the encoder(s). Else, only gene expression is used.
  - deeply_inject_covariates (bool, default: True) – If True and n_layers > 1, covariates are concatenated to the outputs of hidden layers in the encoder(s) (if encode_covariates is True) and the decoder prior to passing through the next layer.
  - batch_representation (Literal['one-hot', 'embedding'], default: 'one-hot') – EXPERIMENTAL. Method for encoding batch information. One of the following:
    - "one-hot": represent batches with one-hot encodings.
    - "embedding": represent batches with continuously-valued embeddings using Embedding.
    Note that batch representations are only passed into the encoder(s) if encode_covariates is True.
  - use_batch_norm (Literal['encoder', 'decoder', 'none', 'both'], default: 'both') – Specifies where to use BatchNorm1d in the model. One of the following:
    - "none": don't use batch norm in either the encoder(s) or the decoder.
    - "encoder": use batch norm only in the encoder(s).
    - "decoder": use batch norm only in the decoder.
    - "both": use batch norm in both the encoder(s) and the decoder.
    Note: if use_layer_norm is also specified, both will be applied (first BatchNorm1d, then LayerNorm).
  - use_layer_norm (Literal['encoder', 'decoder', 'none', 'both'], default: 'none') – Specifies where to use LayerNorm in the model. One of the following:
    - "none": don't use layer norm in either the encoder(s) or the decoder.
    - "encoder": use layer norm only in the encoder(s).
    - "decoder": use layer norm only in the decoder.
    - "both": use layer norm in both the encoder(s) and the decoder.
    Note: if use_batch_norm is also specified, both will be applied (first BatchNorm1d, then LayerNorm).
  - use_size_factor_key (bool, default: False) – If True, use the obs column defined by the size_factor_key parameter in the model's setup_anndata method as the scaling factor in the mean of the conditional distribution. Takes priority over use_observed_lib_size.
  - use_observed_lib_size (bool, default: True) – If True, use the observed library size for RNA as the scaling factor in the mean of the conditional distribution.
  - extra_payload_autotune (bool, default: False) – If True, return extra matrices in the loss output to be used during autotuning.
  - library_log_means (ndarray | None, default: None) – ndarray of shape (1, n_batch) of means of the log library sizes that parameterize the prior on library size if use_size_factor_key is False and use_observed_lib_size is False.
  - library_log_vars (ndarray | None, default: None) – ndarray of shape (1, n_batch) of variances of the log library sizes that parameterize the prior on library size if use_size_factor_key is False and use_observed_lib_size is False.
  - var_activation (Callable[[Tensor], Tensor], default: None) – Callable used to ensure positivity of the variance of the variational distribution. Passed into Encoder. Defaults to exp().
  - extra_encoder_kwargs (dict | None, default: None) – Additional keyword arguments passed into Encoder.
  - extra_decoder_kwargs (dict | None, default: None) – Additional keyword arguments passed into DecoderSCVI.
  - batch_embedding_kwargs (dict | None, default: None) – Keyword arguments passed into Embedding if batch_representation is set to "embedding".
Notes
Lifecycle: the argument batch_representation is experimental in v1.2.
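A minimal construction sketch (the dimensions here are hypothetical; in a typical workflow this module is built for you by scvi.model.SCVI rather than instantiated directly):

```python
from scvi.module import VAE

# Hypothetical dataset shape: 2000 genes measured across 3 batches.
module = VAE(
    n_input=2000,
    n_batch=3,
    n_hidden=128,
    n_latent=10,
    gene_likelihood="zinb",
)

# The module is a torch.nn.Module, so standard PyTorch tooling applies.
n_params = sum(p.numel() for p in module.parameters())
print(f"trainable parameters: {n_params}")
```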
Attributes table
- training (bool)
Methods table
- generative(z, library, batch_index, ...) – Run the generative process.
- loss(tensors, inference_outputs, generative_outputs, ...) – Compute the loss.
- marginal_ll(tensors, n_mc_samples, ...) – Compute the marginal log-likelihood of the data under the model.
- sample(tensors, ...) – Generate predictive samples from the posterior predictive distribution.
Attributes
- VAE.training: bool
Methods
- VAE.generative(z, library, batch_index, cont_covs=None, cat_covs=None, size_factor=None, y=None, transform_batch=None)
Run the generative process.
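A sketch of the encode/decode round trip, reusing the module from the construction example above. The toy tensors are hypothetical, and the output keys "z", "library", and "px" are assumed to follow the scvi-tools convention for the latent representation, library size, and reconstruction distribution:

```python
import torch

# Hypothetical minibatch: 8 cells x 2000 genes of raw counts, all from batch 0.
x = torch.distributions.Poisson(5.0).sample((8, 2000))
batch_index = torch.zeros(8, 1, dtype=torch.long)

# Encode the data, then decode the resulting latent variables.
inference_outputs = module.inference(x, batch_index=batch_index)
generative_outputs = module.generative(
    z=inference_outputs["z"],
    library=inference_outputs["library"],
    batch_index=batch_index,
)
px = generative_outputs["px"]  # distribution over reconstructed counts
```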
- VAE.loss(tensors, inference_outputs, generative_outputs, kl_weight=1.0)
Compute the loss.
- Return type:
LossOutput
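A sketch of computing the loss through the module's forward pass, which runs inference and the generative process before calling loss(). The dictionary keys "X", "batch", and "labels" are assumed to match the registry keys scvi-tools uses for counts, batch indices, and labels; x and batch_index are reused from the generative sketch above:

```python
import torch

tensors = {
    "X": x,
    "batch": batch_index,
    "labels": torch.zeros(8, 1, dtype=torch.long),
}

# forward() returns inference outputs, generative outputs, and a LossOutput.
_, _, loss_output = module(tensors)
print(loss_output.loss)  # scalar objective: reconstruction loss plus weighted KL
```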
- VAE.marginal_ll(tensors, n_mc_samples, return_mean=False, n_mc_samples_per_pass=1)
Compute the marginal log-likelihood of the data under the model.
- Parameters:
  - tensors (dict[str, Tensor]) – Dictionary of tensors passed into forward().
  - n_mc_samples (int) – Number of Monte Carlo samples to use for the estimation of the marginal log-likelihood.
  - return_mean (bool, default: False) – Whether to return the mean of the marginal likelihoods over cells.
  - n_mc_samples_per_pass (int, default: 1) – Number of Monte Carlo samples to use per pass. This is useful to avoid memory issues.
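A sketch reusing the tensors dictionary from the loss example above; the Monte Carlo sample counts are arbitrary:

```python
# Estimate the marginal log-likelihood from 100 Monte Carlo samples,
# evaluated 10 per pass to keep memory bounded.
mll = module.marginal_ll(
    tensors,
    n_mc_samples=100,
    n_mc_samples_per_pass=10,
    return_mean=True,
)
print(float(mll))
```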
- VAE.sample(tensors, n_samples=1, max_poisson_rate=100000000.0)
Generate predictive samples from the posterior predictive distribution.
The posterior predictive distribution is denoted as \(p(\hat{x} \mid x)\), where \(x\) is the input data and \(\hat{x}\) is the sampled data.
We sample from this distribution by first sampling n_samples times from the posterior distribution \(q(z \mid x)\) for a given observation, and then sampling from the likelihood \(p(\hat{x} \mid z)\) for each of these.
- Parameters:
  - tensors (dict[str, Tensor]) – Dictionary of tensors passed into forward().
  - n_samples (int, default: 1) – Number of Monte Carlo samples to draw from the distribution for each observation.
  - max_poisson_rate (float, default: 100000000.0) – The maximum value at which to clip the rate parameter of Poisson. Avoids numerical sampling issues when the parameter is very large due to the variance of the distribution.
- Return type:
Tensor
- Returns:
Tensor on CPU with shape (n_obs, n_vars) if n_samples == 1, else (n_obs, n_vars, n_samples).
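A sketch reusing the tensors dictionary from the loss example above:

```python
# Draw 5 posterior predictive samples for each of the 8 cells.
samples = module.sample(tensors, n_samples=5)
print(samples.shape)  # expected: torch.Size([8, 2000, 5])
```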