Preliminaries
scArches is an approach that works with cVAEs. Suppose we have \(G\)-dimensional gene expression data represented by \(x\) and one categorical covariate with \(K\)
categories, represented by a one-hot vector \(s\) (e.g., the second category is represented as \([0, 1, 0, \ldots, 0]\)).
The first layer of the encoder with \(H\) hidden neurons of a cVAE with ReLU activation can be written as
\begin{align}
f_1(x,s) = {\textrm{max}}(0,W_x^{(1)} x + W_s^{(1)} s),
\end{align}
where \(W_x^{(1)} \in \mathbb{R}^{H \times G}\) and \(W_s^{(1)} \in \mathbb{R}^{H \times K}\) .
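As a minimal NumPy sketch of the first-layer computation above (the dimensions and weights are illustrative, not the actual scvi-tools implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

G, K, H = 5, 3, 4               # genes, covariate categories, hidden units
W_x = rng.normal(size=(H, G))   # weights for the expression input
W_s = rng.normal(size=(H, K))   # weights for the one-hot covariate

x = rng.normal(size=G)          # one cell's expression vector
s = np.eye(K)[1]                # one-hot for the second category: [0, 1, 0]

# First encoder layer with ReLU: f1 = max(0, W_x x + W_s s)
f1 = np.maximum(0.0, W_x @ x + W_s @ s)
print(f1.shape)  # (4,)
```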
Architectural surgery
Now suppose our cVAE has been trained on data. The so-called “architectural surgery” with scArches augments the first layer with new parameters corresponding
to \(L\) unseen categories in the query data (i.e., batches in single-cell language), which are represented in the one-hot vector \(s'\) .
The first layer of the encoder is now specified as
\begin{align}
f_1(x,s,s') = {\textrm{max}}(0,W_x^{(1)} x + W_s^{(1)} s + W_{s'}^{(1)} s'),
\end{align}
where \(W_{s'} \in \mathbb{R}^{H \times L}\) is a new randomly initialized matrix.
We note that in practice, these are implemented as a single weight matrix acting on the concatenated input \([x, s, s']\), not three separate matrices.
Also, with scArches, the same architectural surgery is applied to the decoder, which is not shown here for brevity.
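The surgery on the covariate weights can be sketched in NumPy as appending \(L\) randomly initialized columns to the trained matrix (sizes here are illustrative, and this is not the scvi-tools code itself):

```python
import numpy as np

rng = np.random.default_rng(0)

H, K, L = 4, 3, 2                  # hidden units, reference / query categories
W_s = rng.normal(size=(H, K))      # trained covariate weights (reference stage)
W_s_new = rng.normal(size=(H, L))  # randomly initialized query weights

# "Surgery": concatenate into one (H, K + L) matrix, as done in practice
W_s_aug = np.concatenate([W_s, W_s_new], axis=1)

# A reference category keeps its original contribution ...
s_ref = np.eye(K + L)[0]
assert np.allclose(W_s_aug @ s_ref, W_s[:, 0])

# ... while a query category selects one of the new columns
s_query = np.eye(K + L)[K]         # first unseen category
assert np.allclose(W_s_aug @ s_query, W_s_new[:, 0])
```

Because the one-hot vectors of reference and query categories select disjoint columns, the new parameters leave the reference mapping untouched.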
Some of the cVAEs in scvi-tools use the categorical one-hot encodings in all hidden layers of the encoder.
For example, the option deeply_inject_covariates=True can be used in SCVI.
Empirically, this improves removal of nuisance variation due to these covariates.
In this case of “deep injection,” there are new query parameters in each hidden layer. With two hidden layers,
this is written as
\begin{align}
f_2(x,s, s') = {\textrm{max}}(0,W^{(2)} f_1(x,s, s') + W^{(2)}_s s + W^{(2)}_{s'} s'),
\end{align}
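As a hedged NumPy sketch of deep injection with two hidden layers (illustrative sizes, not the scvi-tools implementation), the covariates re-enter each layer with fresh weight matrices:

```python
import numpy as np

rng = np.random.default_rng(0)
G, K, L, H = 5, 3, 2, 4   # genes, reference / query categories, hidden units

x = rng.normal(size=G)
s = np.eye(K)[1]          # reference-category one-hot
s_new = np.eye(L)[0]      # query-category one-hot (s' in the text)

# Layer 1: expression weights plus covariate weights for s and s'
W1_x, W1_s, W1_sn = (rng.normal(size=(H, d)) for d in (G, K, L))
f1 = np.maximum(0.0, W1_x @ x + W1_s @ s + W1_sn @ s_new)

# Layer 2 ("deep injection"): the covariates enter again with new matrices
W2, W2_s, W2_sn = (rng.normal(size=(H, d)) for d in (H, K, L))
f2 = np.maximum(0.0, W2 @ f1 + W2_s @ s + W2_sn @ s_new)
```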
Training
By default, training on the query data updates only the new query-category-specific parameters;
all parameters from the reference-building stage are frozen.
This results in a model in which the latent representation \(z\) (the encoder output) for reference data does not change after the
query step.
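The effect of freezing can be sketched with a toy NumPy gradient loop (a hypothetical least-squares objective, purely for illustration): only the query parameters receive updates, so the reference weights are bit-for-bit unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)
H, K, L = 4, 3, 2
W_ref = rng.normal(size=(H, K))   # trained on reference data, frozen
W_ref_before = W_ref.copy()
W_query = np.zeros((H, L))        # new query parameters, trainable

s_query = np.eye(L)[0]            # one unseen query batch
target = rng.normal(size=H)       # toy regression target
lr = 0.1
for _ in range(200):
    out = W_query @ s_query
    # dL/dW_query for L = ||out - target||^2; W_ref gets no update at all
    grad = np.outer(2.0 * (out - target), s_query)
    W_query -= lr * grad

assert np.allclose(W_ref, W_ref_before)  # reference weights are frozen
```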