# Solo

Solo [1] (Python class SOLO) is a doublet detection method for scRNA-seq count data. It trains a classifier on the latent space of a trained SCVI model to distinguish observed cells from simulated doublets.

The advantages of Solo are:

• Can perform doublet detection on pre-trained SCVI models.

• Scalable to very large datasets (>1 million cells).

The limitations of Solo include:

• For an analysis that only needs doublet detection, Solo will be slower than other methods, since it first requires a trained SCVI model.

## Overview

Solo starts with a trained SCVI instance. First, Solo simulates doublets using the original data. Second, Solo trains a classifier on the model's latent space.

### Doublet simulation

A simulated doublet $$d_n$$ is generated via the following process:

\begin{align} d_n = x_{1} + x_{2}, \end{align}

where $$x_{1}$$ and $$x_{2}$$ are drawn i.i.d. from the empirical data distribution $$p_{\textrm{data}}(x)$$ over single-cell transcriptomes (count data).

The number of doublets to generate is controlled by the doublet_ratio parameter of from_scvi_model().
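The simulation step above can be illustrated with a minimal, standard-library sketch. It is not the scvi-tools implementation: the function name `simulate_doublets` and the toy count matrix are hypothetical, and the sketch assumes that doublet_ratio means the number of simulated doublets per observed cell.

```python
import random


def simulate_doublets(counts, doublet_ratio=2, seed=0):
    """Simulate doublets by summing the counts of two randomly drawn cells.

    counts: list of cells, each a list of non-negative integer gene counts.
    Returns (doublets, parents); parents[i] holds the indices of the two
    cells that were summed to form doublets[i].
    """
    rng = random.Random(seed)
    n_cells = len(counts)
    # Assumption: doublet_ratio is the number of doublets per observed cell.
    n_doublets = int(doublet_ratio * n_cells)
    doublets, parents = [], []
    for _ in range(n_doublets):
        # Draw x_1 and x_2 independently (with replacement) from the observed cells.
        i = rng.randrange(n_cells)
        j = rng.randrange(n_cells)
        # d_n = x_1 + x_2, gene by gene.
        doublets.append([a + b for a, b in zip(counts[i], counts[j])])
        parents.append((i, j))
    return doublets, parents


# Toy count matrix (made up for illustration): 3 cells x 4 genes.
counts = [
    [0, 3, 1, 0],
    [2, 0, 0, 5],
    [1, 1, 4, 0],
]
doublets, parents = simulate_doublets(counts, doublet_ratio=2, seed=0)
```

Because the two parent cells are drawn independently, the same cell can occasionally be paired with itself, which matches the i.i.d. sampling in the formula above.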

### Classifier training

After doublet simulation, the doublets are encoded through the scVI encoder, which outputs latent representations $$z'_{1:D}$$, where $$D$$ is the number of simulated doublets.

These vectors are assigned a label of 1, while the latent representations of the original data, $$z_{1:N}$$, are assigned a label of 0. A simple multilayer perceptron classifier (scvi.module.Classifier) is trained on these labeled representations, and the doublet score for each originally observed cell is the doublet probability predicted by this classifier.
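The labeling and scoring scheme can be sketched as follows. This is an illustration under stated assumptions, not the library's code: a plain logistic regression stands in for the multilayer perceptron, the 2-D "latent" vectors are made up rather than produced by an scVI encoder, and the helper names `train_doublet_classifier` and `doublet_scores` are hypothetical.

```python
import math
import random


def sigmoid(t):
    # Numerically stable logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def train_doublet_classifier(z_cells, z_doublets, lr=0.1, n_epochs=500):
    """Fit logistic regression: label 0 for observed cells, 1 for simulated doublets."""
    data = [(z, 0.0) for z in z_cells] + [(z, 1.0) for z in z_doublets]
    dim = len(z_cells[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(n_epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for z, y in data:
            err = sigmoid(sum(wk * zk for wk, zk in zip(w, z)) + b) - y
            for k in range(dim):
                grad_w[k] += err * z[k]
            grad_b += err
        # Full-batch gradient descent on the mean binary cross-entropy.
        w = [wk - lr * g / len(data) for wk, g in zip(w, grad_w)]
        b -= lr * grad_b / len(data)
    return w, b


def doublet_scores(z_cells, w, b):
    # Score of each observed cell = predicted probability of the doublet class.
    return [sigmoid(sum(wk * zk for wk, zk in zip(w, z)) + b) for z in z_cells]


# Made-up 2-D latent vectors standing in for encoder outputs.
rng = random.Random(0)
z_cells = [[rng.gauss(0.0, 0.5), rng.gauss(0.0, 0.5)] for _ in range(50)]
z_doublets = [[rng.gauss(3.0, 0.5), rng.gauss(3.0, 0.5)] for _ in range(100)]
# One observed cell that sits in the doublet region of the latent space.
z_cells.append([3.0, 3.0])

w, b = train_doublet_classifier(z_cells, z_doublets)
scores = doublet_scores(z_cells, w, b)
```

In this toy setup the last observed cell lies among the simulated doublets, so it receives a higher doublet score than the other observed cells even though it was labeled 0 during training. This is the behavior the scoring scheme relies on.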