scVIVA for representing cells and their environment in spatial transcriptomics

scVIVA for representing cells and their environment in spatial transcriptomics#

In this tutorial, we go through the steps of training scVIVA, a deep generative model that leverages both cell-intrinsic and neighboring gene expression profiles to produce stochastic embeddings of cell states as well as normalized gene expression profiles. We show how to obtain informative fine-grained partitions of cells that reflects both their internal state and the surrounding tissue and use the generative model to test hypotheses of differential expression between tissue niches.

Plan for this tutorial:

Loading the data
Training a scVIVA model
Visualizing the latent space
Perform DE analysis across niches

%load_ext autoreload
%autoreload 2

# Install from GitHub for now
!pip install --quiet scvi-colab
!pip install --quiet adjustText
from scvi_colab import install

install()

import os
import random
import tempfile

import numpy as np  # type: ignore
import scanpy as sc  # type: ignore
import scvi  # type: ignore
import torch  # type: ignore
from rich import print  # type: ignore

sc.set_figure_params(figsize=(4, 4))
save_dir = tempfile.TemporaryDirectory()
%config InlineBackend.print_figure_kwargs={'facecolor' : "w"}
%config InlineBackend.figure_format='retina'

scvi.settings.seed = 0
random.seed(42)
np.random.seed(42)
torch.manual_seed(42)
print("Last run with scvi-tools version:", scvi.__version__)

Last run with scvi-tools version: 1.4.2

# Quickly check the correct folder is used.
scvi.__file__

'/opt/anaconda3/envs/scvi_new/lib/python3.13/site-packages/scvi/__init__.py'

Data loading#

In this tutorial we load a human breast cancer section, generated with 10X Xenium. The cell segmentation originally performed on this data resulted in many erroneously assigned transcripts and therefore re-segmented the cells using the ProSeg algorithm, which is a scalable algorithm for transcriptome-informed segmentation.

adata_path = os.path.join(save_dir.name, "adata_for_tuto_s1.h5ad")
adata = sc.read(
    adata_path,
    backup_url="https://exampledata.scverse.org/scvi-tools/adata_for_tuto_s1.h5ad",
)
adata

AnnData object with n_obs × n_vars = 117305 × 313
    obs: 'sample', 'x', 'y', 'z', 'observed_x', 'observed_y', 'observed_z', 'fov', 'n_counts', 'index', 'cell_type'
    var: 'mt'
    uns: '_scvi_manager_uuid', '_scvi_uuid', 'log1p', 'pca'
    obsm: 'X_pca', 'X_scANVI', 'spatial'
    varm: 'PCs'
    layers: 'counts', 'counts_log1p', 'counts_wo_bg', 'min_max_scaled', 'min_max_scaled_raw'

The authors identified distinct tumor domains in this specimen, corresponding to in situ ductal carcinoma (DCIS) and invasive tumor:

sc.pl.spatial(adata, color="cell_type", spot_size=30)

../../../_images/03074ae5ad51a0d22e36e0f72b4c8bc2b1617d1d0d757c95da3229f08f96f84f.png

adata.obs["cell_type"].value_counts()

cell_type
Stromal                    32426
Invasive_Tumor             18185
DCIS                       15993
CD4+_T_Cells                9387
Macrophages_1               8644
CD8+_T_Cells                7493
Endothelial                 7000
B_Cells                     5359
Myoepi_ACTA2+               5324
Myoepi_KRT15+               2636
Macrophages_2               2616
Perivascular-Like            817
Unlabeled                    516
IRF7+_DCs                    454
LAMP3+_DCs                   218
Mast_Cells                   203
Stromal_&_T_Cell_Hybrid       28
T_Cell_&_Tumor_Hybrid          6
Name: count, dtype: int64

Train scVIVA model#

We first define the neighborhood of each cell using a k-nn graph. We set \(k=20\). Then, the environment features are defined in two ways - the first is the cell-type composition of its cellular neighborhood. The second is the average gene expression state of neighboring cells, with a separate profile for each of the present cell types. These cell-intrinsic gene expression states can be learned with a spatially unaware model such as scANVI, or with resolVI.

Here we assume that scANVI has already been trained on the data, and that the embeddings are stored in the AnnData object. We refer to the scANVI tutorial for training the model.

Environment features computations occur in the preprocessing_anndata method, that adds the relevant keys to the AnnData object.

setup_kwargs = {
    "sample_key": "sample",  # column in adata.obs that contains the individual slide ID
    "labels_key": "cell_type",  # column in adata.obs that contains the cell type labels
    "cell_coordinates_key": "spatial",  # spatial coordinates key in adata.obsm
    "expression_embedding_key": "X_scANVI",  # expression embedding key in adata.obsm
}

scvi.external.SCVIVA.preprocessing_anndata(
    adata,
    k_nn=20,  # number of nearest neighbors for spatial graph construction
    **setup_kwargs,
)

Saved niche_indexes and niche_distances in adata.obsm

Saved niche_composition in adata.obsm

Saved niche_activation in adata.obsm

Then, as in all scvi-tools model, we need to register the AnnData.

scvi.external.SCVIVA.setup_anndata(
    adata,
    layer="counts",  # adata layer that contains the raw counts
    batch_key="sample",  # column in adata.obs that contains the batch covariate
    **setup_kwargs,
)

INFO     Using column names from columns of adata.obsm['niche_composition']                                        
INFO     Generating sequential column names                                                                        
INFO     Generating sequential column names                                                                        
INFO     Generating sequential column names                                                                        
INFO     Generating sequential column names                                                                        
INFO     Generating sequential column names

We instantiate a scVIVA model:

nichevae = scvi.external.SCVIVA(adata)

nichevae

scVIVA model with the following parameters: 
n_hidden: 128, n_latent: 10, n_layers: 1, dropout_rate: 0.1, dispersion: gene, gene_likelihood: poisson, 
latent_distribution: normal.
Training status: Not Trained
Model's adata is minified?: False

nichevae.train(
    max_epochs=600,
    early_stopping=True,
    check_val_every_n_epoch=1,
    batch_size=512,
    plan_kwargs={
        "lr": 5e-4,
    },
)

Monitored metric elbo_validation did not improve in the last 45 records. Best score: 238.892. Signaling Trainer to stop.

We can plot the training curves:

nichevae.history.keys()

dict_keys(['kl_weight', 'validation_loss', 'elbo_validation', 'reconstruction_loss_validation', 'kl_local_validation', 'kl_global_validation', 'niche_compo_validation', 'niche_reconst_validation', 'train_loss', 'elbo_train', 'reconstruction_loss_train', 'kl_local_train', 'kl_global_train', 'niche_compo_train', 'niche_reconst_train'])

Let’s plot for instance the validation ELBO, niche composition and state losses:

nichevae.history["elbo_validation"].plot()

<Axes: xlabel='epoch'>

../../../_images/75721f2bf5eaaff7f51dc493f0ee71ae86f76df21a63cfd916d5e3ca904ef00f.png

nichevae.history["niche_compo_validation"].plot()

<Axes: xlabel='epoch'>

../../../_images/7826bd29062ac533f7158c310020b0b6956fcfed404a156a9edec50976c177c5.png

nichevae.history["niche_reconst_validation"].plot()

<Axes: xlabel='epoch'>

../../../_images/9e98eca7521714d4df3e8f71f93a494ff6189c7bf1ca73ac54e0e78cf4c69fb9.png

nichevae.history["kl_local_validation"].plot()

<Axes: xlabel='epoch'>

../../../_images/45b14565f5dd15df148f28d256bf24289a0272896b456f859c6de1158596a125.png

nichevae.history["reconstruction_loss_validation"].plot()

<Axes: xlabel='epoch'>

../../../_images/30553702e280fd261948efa9306f42d2ed309ee1a0093467265c66fa7010c817.png

After training the model, we can compute and store the latent space:

adata.obsm["X_scVIVA"] = nichevae.get_latent_representation()

We may visualize the latent space in UMAP coordinates, coloring by cell type.

sc.pp.neighbors(adata, use_rep="X_scVIVA", n_neighbors=30)
sc.tl.umap(adata)

sc.pl.umap(adata, color="cell_type", frameon=False)

../../../_images/78455e160914ef13e50ae1558d56620c52449d0275679a8a31486453d432f924.png

Differential expression analysis#

We now use the generative model to test hypotheses of differential expression between the niches. We’ll focus on endothelial cells.

adata_endothelial = adata[adata.obs["cell_type"] == "Endothelial"].copy()
adata_not_endo = adata[adata.obs["cell_type"] != "Endothelial"].copy()

print(adata_endothelial)

AnnData object with n_obs × n_vars = 7000 × 313
    obs: 'sample', 'x', 'y', 'z', 'observed_x', 'observed_y', 'observed_z', 'fov', 'n_counts', 'index', 
'cell_type', '_scvi_batch', '_scvi_labels', '_scvi_sample'
    var: 'mt'
    uns: '_scvi_manager_uuid', '_scvi_uuid', 'log1p', 'pca', 'cell_type_colors', 'cell_type_to_int', 'neighbors', 
'umap'
    obsm: 'X_pca', 'X_scANVI', 'spatial', 'niche_indexes', 'niche_distances', 'niche_composition', 
'niche_activation', 'X_scVIVA', 'X_umap'
    varm: 'PCs'
    layers: 'counts', 'counts_log1p', 'counts_wo_bg', 'min_max_scaled', 'min_max_scaled_raw'
    obsp: 'distances', 'connectivities'

We perform coarse Leiden clustering on the endothelial latent space, in a bid to find spatially confined populations of endothelial cells.

sc.pp.neighbors(adata_endothelial, use_rep="X_scVIVA", n_neighbors=30, random_state=42)
sc.tl.umap(adata_endothelial)
sc.tl.leiden(
    adata_endothelial,
    key_added="leiden_scVIVA",
    resolution=0.3,
    flavor="igraph",
    n_iterations=-1,
    random_state=42,
)

adata_endothelial.obs["leiden_scVIVA"].unique()  # check the number of clusters

['0', '1', '2', '3', '4']
Categories (5, object): ['0', '1', '2', '3', '4']

We focus on clusters 0 and 1, which are located in the stromal and tumor regions, respectively. We then perform differential expression analysis between these two clusters.

sc.pl.spatial(
    adata_endothelial,
    color="leiden_scVIVA",
    spot_size=80,
    groups=["1", "0"],
)

../../../_images/f6ba11fe1a7a0dcfe76a897a257d255896b423b806c0d1016cca8601434ac92e.png

adata.obs["leiden_scVIVA"] = "Unknown"
adata.obs.loc[adata.obs["cell_type"] == "Endothelial", "leiden_scVIVA"] = adata_endothelial.obs[
    "leiden_scVIVA"
]

adata.obs["leiden_scVIVA"].value_counts()

leiden_scVIVA
Unknown    110305
1            2693
4            2059
3            1517
0             475
2             256
Name: count, dtype: int64

We now run the differential expression function, between the cell groups \(\textit{G1}=tumor~endothelial\) and \(\textit{G2}=stromal~endothelial\). We first set the number of nearest neighbors to compute the non-endothelial neighbors of \(\textit{G1}\) and \(\textit{G2}\), called \(\textit{N1}\) and \(\textit{N2}\), respectively.

Setting niche_mode=True, we compute 4 different DE tests: \(\{\textit{G1}~vs~\textit{G2}\}\), \(\{\textit{G1}~vs~\textit{N1}\}\), \(\{\textit{N1}~vs~\textit{G2}\}\) and \(\{\textit{N1}~vs~\textit{N2}\}\) (in this order). We set a test-specific treshold for significant log-fold change DELTA.

Other parameters include the number of samples to draw from the posterior N_SAMPLES_DE, PSEUDOCOUNTS for stability and FDR for the FDR correction. More details can be found in Boyeau et al. PNAS 2023.

delta_niches = 0.05  # smaller delta for niche comparison
delta_markers = 0.15  # bigger delta for G1-N1 comparison
DELTA = [delta_niches, delta_markers, delta_niches, delta_niches]


K_NN_DE = 6

GROUP = "leiden_scVIVA"
G1 = "1"
G2 = "0"
PSEUDOCOUNTS = 1e-4
N_SAMPLES_DE = 1e5
FDR = 0.2

DE_1_0 = nichevae.differential_expression(
    adata,
    groupby=GROUP,
    group1=G1,
    group2=G2,
    k_nn=K_NN_DE,
    delta=DELTA,
    niche_mode=True,
    n_samples_overall=N_SAMPLES_DE,
    fdr_target=FDR,
    pseudocounts=PSEUDOCOUNTS,
)

Computing adjusted nearest neighbors...

Using 1 samples

Mean number of neighbors: 3.3 ± 2.1

Computing DE...

Running DE for group1_group2

Running DE for group1_neighbors1

Running DE for neighbors1_group2

Running DE for neighbors1_neighbors2

Computing g1 confidence scores...

Let’s analysize the DE test: \(\textit{G1}=tumor~endothelial\) vs \(\textit{G2}=stromal~endothelial\). The DE function returns a Dataclass object DE_1_0.

We can access the Gaussian process classifier properties with the gpc attribute:

DE_1_0.gpc

GaussianProcessClassifier(kernel=1**2 * RationalQuadratic(alpha=0.1, length_scale=10),
                          n_restarts_optimizer=10, random_state=0)

In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook.
On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.

DE_1_0.gpc_info()

Training score:  0.9242424242424242

Marginal likelihood:  -28.747479809035976

Kernel:  31.6**2 * RationalQuadratic(alpha=1, length_scale=10)

The \(\textit{G1}\) vs \(\textit{G2}\) differential expression results are stored in the g1_g2 attribute:

DE_1_0.g1_g2

	proba_de	proba_not_de	bayes_factor	scale1	scale2	pseudocounts	delta	lfc_mean	lfc_median	lfc_std	...	raw_mean2	non_zeros_proportion1	non_zeros_proportion2	raw_normalized_mean1	raw_normalized_mean2	is_de_fdr_0.2	comparison	group1	group2	proba_de_g1_n1
ADH1B	0.99923	0.00077	7.168337	0.000262	0.047370	0.0001	0.05	-8.181890	-8.232571	2.553259	...	15.056842	0.034163	0.760000	2.550718	410.002928	True	1 vs 0	1	0	0.000000
ADIPOQ	0.99874	0.00126	6.675375	0.000054	0.012345	0.0001	0.05	-7.992774	-8.159674	2.537889	...	4.280000	0.008541	0.513684	0.565162	117.251658	True	1 vs 0	1	0	0.000000
LEP	0.99862	0.00138	6.584284	0.000031	0.002542	0.0001	0.05	-5.699508	-5.836974	2.050291	...	0.802105	0.008912	0.233684	0.412827	26.084750	True	1 vs 0	1	0	0.000000
ESM1	0.99737	0.00263	5.938134	0.011158	0.000127	0.0001	0.05	5.849895	5.865199	2.047015	...	0.046316	0.405867	0.037895	110.493364	1.485543	True	1 vs 0	1	0	0.905005
TIMP4	0.99232	0.00768	4.861425	0.000553	0.008132	0.0001	0.05	-3.605672	-3.641601	1.537198	...	2.728421	0.092462	0.585263	5.884608	92.643020	True	1 vs 0	1	0	0.000000
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
KLRD1	0.48121	0.51879	-0.075195	0.000205	0.000220	0.0001	0.05	0.026786	0.034280	1.188139	...	0.109474	0.023765	0.063158	1.585760	4.190337	False	1 vs 0	1	0	0.065707
SELL	0.47591	0.52409	-0.096435	0.000272	0.000282	0.0001	0.05	-0.024120	-0.012895	1.019317	...	0.092632	0.037133	0.056842	2.244403	2.984867	False	1 vs 0	1	0	0.000000
PDE4A	0.45742	0.54258	-0.170734	0.001671	0.001647	0.0001	0.05	-0.002758	0.005811	0.454352	...	0.465263	0.253249	0.322105	16.375647	15.766655	False	1 vs 0	1	0	0.372335
TCL1A	0.45349	0.54651	-0.186579	0.000028	0.000026	0.0001	0.05	0.149420	0.153122	1.306990	...	0.010526	0.008541	0.010526	0.506623	0.479167	False	1 vs 0	1	0	0.006273
PIGR	0.42050	0.57950	-0.320721	0.000021	0.000023	0.0001	0.05	-0.128784	-0.115181	1.381333	...	0.012632	0.004085	0.010526	0.269003	0.261401	False	1 vs 0	1	0	0.000000

313 rows × 23 columns

Where the probability of true DE according to the Gaussian process classifier is stored in the proba_de_g1_n1 column:

DE_1_0.g1_g2["proba_de_g1_n1"]

ADH1B     0.000000
ADIPOQ    0.000000
LEP       0.000000
ESM1      0.905005
TIMP4     0.000000
            ...   
KLRD1     0.065707
SELL      0.000000
PDE4A     0.372335
TCL1A     0.006273
PIGR      0.000000
Name: proba_de_g1_n1, Length: 313, dtype: float64

We may also access the other tests results in the same way: DE_1_0.g1_n1, DE_1_0.n1_g2 and DE_1_0.n1_n2.

We can then filter genes to upregulated genes, i.e. such that the median Log-Fold Change over the samples is positive, and the proba_de (ratio of LFC greater than the defined delta treshold over the total number of posterior samples) is greater than a given filter - here we set it to 0.8.

PROBA_TRES = 0.8

g1_g3_genes = DE_1_0.g1_g2[
    (DE_1_0.g1_g2["lfc_median"] > 0) & (DE_1_0.g1_g2["proba_de"] > PROBA_TRES)
].index

We then display the results: median Log-Fold Change (LFC) of upregulated genes in \(\textit{G1}\) vs \(\textit{G2}\) displayed on the x-axis, while we compare differential expression computed between \(\textit{N1}\) and \(\textit{G2}\) on the y-axis.

Genes are colored by their marker label (yellow=significantly upregulated in \(\textit{G1}\) vs \(\textit{N1}\), green otherwise).

We also display the classifier decision boundary (the predicted probability of being in the yellow class).

PLOT_MARGIN = 0.2

DE_1_0.plot(
    filter=g1_g3_genes,  # selected genes to plot
    # path_to_save="DE_plot.svg",
    margin=PLOT_MARGIN,  # margin around the plot
    legend_loc="upper right",  # location of the legend
)

../../../_images/da211458994f4cdb75d34f2c1d8bcc3a6ac76e95867d1808fc973d949d78528b.png

You can select the marker genes (positive class for the classifier, yellow in the plot):

DE_1_0.gpc.confident_genes

Index(['ESM1', 'PRDM1', 'TCIM', 'ZEB1', 'HDC', 'CDC42EP1', 'DNTTIP1',
       'NDUFA4L2', 'SVIL', 'FOXC2', 'DAPK3', 'FBLIM1', 'RAMP2', 'OXTR', 'KDR',
       'SNAI1', 'CLEC14A', 'SOX18', 'IL3RA', 'CCPG1', 'TCF4', 'DUSP5', 'MYLK'],
      dtype='object')

We can also check the predicted class probabilities of the Gaussian process classifier:

DE_1_0.g1_g2["proba_de_g1_n1"].loc[DE_1_0.gpc.confident_genes]

ESM1        0.905005
PRDM1       0.459986
TCIM        0.975001
ZEB1        0.994582
HDC         0.208575
CDC42EP1    0.975575
DNTTIP1     0.198414
NDUFA4L2    0.998732
SVIL        0.811520
FOXC2       0.970905
DAPK3       0.704933
FBLIM1      0.996895
RAMP2       0.983532
OXTR        0.899196
KDR         0.981578
SNAI1       0.998695
CLEC14A     0.978961
SOX18       0.995286
IL3RA       0.997017
CCPG1       0.831721
TCF4        0.997643
DUSP5       0.377894
MYLK        0.997606
Name: proba_de_g1_n1, dtype: float64

Then we can further filter the confident gene list, by setting a treshold on the classifier predictions - for instance 0.9:

DE_1_0.g1_g2["proba_de_g1_n1"].loc[DE_1_0.gpc.confident_genes][
    DE_1_0.g1_g2["proba_de_g1_n1"].loc[DE_1_0.gpc.confident_genes] > 0.9
].index

Index(['ESM1', 'TCIM', 'ZEB1', 'CDC42EP1', 'NDUFA4L2', 'FOXC2', 'FBLIM1',
       'RAMP2', 'KDR', 'SNAI1', 'CLEC14A', 'SOX18', 'IL3RA', 'TCF4', 'MYLK'],
      dtype='object')

Finally, we can plot spatial maps of the selected genes. We first compute global percentiles of the gene expression values to set an upper bound for the color scale.

def get_gene_percentiles_list(adata, gene_list, p, layer=None):
    """
    Calculate the p-percentile of gene expression for a list of genes in an AnnData object.

    Parameters
    ----------
        adata (AnnData): The AnnData object containing expression data.
        gene_list (list): List of gene names for which to compute percentiles.
        p (float): Percentile to compute (between 0 and 100).
        layer (str or None): The layer from which to retrieve expression data.
                             If None, uses `adata.X`.

    Returns
    -------
        list: A list of p-percentile values for the genes, in the same order as gene_list.
              If a gene is not found, its value will be `None`.
    """
    percentiles = []

    for gene in gene_list:
        if gene in adata.var_names:
            if layer:
                data = adata[:, gene].layers[layer].flatten()
            else:
                data = adata[:, gene].X.flatten()

            # Compute the percentile
            percentiles.append(np.percentile(data, p))
        else:
            percentiles.append(None)  # Handle genes not in adata.var_names

    return percentiles

We display ESM1, KDR, SNAI1, critical genes for angiogenesis in invasive cancer. We aslo display FOXA1, that is both upregulated in \(\textit{G1}\) and \(\textit{N1}\), to show how our procedure can filter such genes.

gene_list_invasive = ["ESM1", "KDR", "SNAI1", "FOXA1"]
percentiles_invasive = get_gene_percentiles_list(
    adata, gene_list_invasive, 99.9, layer="min_max_scaled"
)

We first plot these genes in endothelial cells:

plot_endo = True

sc.pl.spatial(
    adata=adata_endothelial if plot_endo else adata_not_endo,
    spot_size=100 if plot_endo else 40,
    color=gene_list_invasive,
    frameon=False,
    use_raw=False,
    wspace=0.4,
    vmax=percentiles_invasive,
    layer="min_max_scaled",
    cmap="plasma",
)

../../../_images/cf17c878926c5a361406bec543687afd58d93f3f3c990a9d513767deed8bf151.png

Then in all cells but endothelial:

plot_endo = False

sc.pl.spatial(
    adata=adata_endothelial if plot_endo else adata_not_endo,
    spot_size=100 if plot_endo else 40,
    color=gene_list_invasive,
    frameon=False,
    use_raw=False,
    wspace=0.4,
    vmax=percentiles_invasive,
    layer="min_max_scaled",
    cmap="plasma",
)

../../../_images/0a84fae9d05fa1cdac369c30ba812cec9cf225191513c709bdf6289ec9bb1dae.png

	kernel kernel: kernel instance, default=None The kernel specifying the covariance function of the GP. If None is passed, the kernel "1.0 * RBF(1.0)" is used as default. Note that the kernel's hyperparameters are optimized during fitting. Also kernel cannot be a `CompoundKernel`.	1*2 Ration...ngth_scale=10)
	optimizer optimizer: 'fmin_l_bfgs_b', callable or None, default='fmin_l_bfgs_b' Can either be one of the internally supported optimizers for optimizing the kernel's parameters, specified by a string, or an externally defined optimizer passed as a callable. If a callable is passed, it must have the signature:: def optimizer(obj_func, initial_theta, bounds): # * 'obj_func' is the objective function to be maximized, which # takes the hyperparameters theta as parameter and an # optional flag eval_gradient, which determines if the # gradient is returned additionally to the function value # * 'initial_theta': the initial value for theta, which can be # used by local optimizers # * 'bounds': the bounds on the values of theta .... # Returned are the best found hyperparameters theta and # the corresponding value of the target function. return theta_opt, func_min Per default, the 'L-BFGS-B' algorithm from scipy.optimize.minimize is used. If None is passed, the kernel's parameters are kept fixed. Available internal optimizers are:: 'fmin_l_bfgs_b'	'fmin_l_bfgs_b'
	n_restarts_optimizer n_restarts_optimizer: int, default=0 The number of restarts of the optimizer for finding the kernel's parameters which maximize the log-marginal likelihood. The first run of the optimizer is performed from the kernel's initial parameters, the remaining ones (if any) from thetas sampled log-uniform randomly from the space of allowed theta-values. If greater than 0, all bounds must be finite. Note that n_restarts_optimizer=0 implies that one run is performed.	10
	max_iter_predict max_iter_predict: int, default=100 The maximum number of iterations in Newton's method for approximating the posterior during predict. Smaller values will reduce computation time at the cost of worse results.	100
	warm_start warm_start: bool, default=False If warm-starts are enabled, the solution of the last Newton iteration on the Laplace approximation of the posterior mode is used as initialization for the next call of _posterior_mode(). This can speed up convergence when _posterior_mode is called several times on similar problems as in hyperparameter optimization. See :term:`the Glossary `.	False
	copy_X_train copy_X_train: bool, default=True If True, a persistent copy of the training data is stored in the object. Otherwise, just a reference to the training data is stored, which might cause predictions to change if the data is modified externally.	True
	random_state random_state: int, RandomState instance or None, default=None Determines random number generation used to initialize the centers. Pass an int for reproducible results across multiple function calls. See :term:`Glossary `.	0
	multi_class multi_class: {'one_vs_rest', 'one_vs_one'}, default='one_vs_rest' Specifies how multi-class classification problems are handled. Supported are 'one_vs_rest' and 'one_vs_one'. In 'one_vs_rest', one binary Gaussian process classifier is fitted for each class, which is trained to separate this class from the rest. In 'one_vs_one', one binary Gaussian process classifier is fitted for each pair of classes, which is trained to separate these two classes. The predictions of these binary predictors are combined into multi-class predictions. Note that 'one_vs_one' does not support predicting probability estimates.	'one_vs_rest'
	n_jobs n_jobs: int, default=None The number of jobs to use for the computation: the specified multiclass problems are computed in parallel. ``None`` means 1 unless in a :obj:`joblib.parallel_backend` context. ``-1`` means using all processors. See :term:`Glossary ` for more details.	None
	kernel__k1	1**2
	kernel__k2	RationalQuadr...ngth_scale=10)
	kernel__k1__constant_value	1.0
	kernel__k1__constant_value_bounds	(0.001, ...)
	kernel__k2__length_scale	10
	kernel__k2__alpha	0.1
	kernel__k2__length_scale_bounds	(10.0, ...)
	kernel__k2__alpha_bounds	(0.001, ...)

scVIVA for representing cells and their environment in spatial transcriptomics

Contents

scVIVA for representing cells and their environment in spatial transcriptomics#

Data loading#

Train scVIVA model#

Differential expression analysis#