Using Python in R with `reticulate`

In this tutorial, we demonstrate how to perform basic Python operations in R using the reticulate library. This includes converting between R and Python data frame objects and running Python functions. Since scvi-tools is written in Python, such an interface is necessary to take advantage of these models within the R environment.

Import Libraries

library(reticulate)
library(anndata)
library(sceasy)

library(Seurat)
library(SeuratData)

Operating between Python and R

First, we will create a dummy list and convert it between R and Python. Note that R is 1-indexed while Python is 0-indexed, so when retrieving elements the user should be conscious of what kind of object they are operating on.

lst <- list(1, 2, 3)
print(lst)
print(typeof(lst))

[[1]]
[1] 1

[[2]]
[1] 2

[[3]]
[1] 3

[1] "list"
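To make the indexing difference concrete, here is a small plain-Python sketch of the same three-element list. It runs in Python itself rather than through reticulate, and the variable names are illustrative only.

```python
# Plain-Python counterpart of the R list above.
lst = [1, 2, 3]

# Python: index 0 is the first element (R would use lst[[1]]).
first = lst[0]

# Python: index -1 is the last element. In R, a negative index
# drops elements instead, so lst[-1] removes the first element.
last = lst[-1]

print(first, last)
```

Once the list has been converted with r_to_py(), it follows the Python rules sketched here, not the R ones.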

We will convert this R list to a Python list with r_to_py(), a function provided by the reticulate library. It works for fundamental R types such as vectors, lists, arrays, data frames, functions, and primitives. For any Python object, typeof(obj) returns "environment". To see the Python type, we can call class(obj) instead.

py_lst <- r_to_py(lst)
print(py_lst)
print(typeof(py_lst))
print(class(py_lst))

[1.0, 2.0, 3.0]
[1] "environment"
[1] "python.builtin.list"   "python.builtin.object"
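The two entries in the class() output line up with the Python type and its base class. The plain-Python sketch below prints the method resolution order of a list. That reticulate builds its class vector from this order is an assumption based on the output above, not something this tutorial states.

```python
# The same values the converted list holds on the Python side.
py_lst = [1.0, 2.0, 3.0]

# Walk the type's method resolution order: the type itself, then its bases.
mro_names = [cls.__name__ for cls in type(py_lst).__mro__]

print(mro_names)
```

The names `list` and `object` match the `python.builtin.list` and `python.builtin.object` entries shown by class(py_lst).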

We can call the methods of a Python object by replacing the usual dot notation with $. So something like lst.append(5) becomes lst$append(5).

py_lst$append(5)
print(py_lst)

None

[1.0, 2.0, 3.0, 5.0]

Note that arguments passed to these functions can be either Python or R objects. R objects passed as arguments are automatically converted to the corresponding Python type via the r_to_py() function. However, this can sometimes produce unexpected results. For example, a bare 0 in R is a double, so it is converted to a Python float, which raises an error when we try to pop an element below. We work around this by explicitly casting the R value to an integer with as.integer(0) or by using the 0L syntax, either of which results in the proper type conversion.

# This will fail.
py_lst$pop(0)

Error in py_call_impl(callable, dots$args, dots$keywords): TypeError: integer argument expected, got float
Traceback:

1. py_lst$pop(0)
2. py_call_impl(callable, dots$args, dots$keywords)

py_lst$pop(0L)
print(py_lst)

1.0

[2.0, 3.0, 5.0]
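The same failure can be reproduced in Python alone, which shows that the error comes from Python's list.pop() rather than from reticulate. This is a minimal sketch. The float 0.0 stands in for what an unqualified R 0 becomes after conversion, and the exact wording of the error message varies across Python versions.

```python
py_lst = [1.0, 2.0, 3.0, 5.0]

# An unqualified R `0` arrives in Python as the float 0.0,
# and list.pop() only accepts integer indices.
try:
    py_lst.pop(0.0)
    raised = False
except TypeError as err:
    raised = True
    print("TypeError:", err)

# R's `0L` or as.integer(0) arrives as the int 0, which works.
removed = py_lst.pop(0)
print(removed, py_lst)
```

The failed call leaves the list unchanged, so the subsequent integer pop removes the first element as expected.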

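Finally, we will convert back into an R object with the function py_to_r(), which performs the inverse of r_to_py(). Note that a Python list whose elements all share one type comes back as an R vector rather than an R list, as the output below shows.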

lst <- py_to_r(py_lst)
print(lst)

[1] 2 3 5

Import Python libraries

Now, we load the scanpy library via reticulate using the import() function. The boolean convert argument determines whether the output of Python functions is automatically converted to an equivalent R object via the py_to_r() function. Here, we set it to FALSE intentionally, since we often want to retain the Python format for further manipulation in Python (e.g. with scanpy). Additionally, this keeps data type conversion explicit, avoiding type confusion.

sc <- import('scanpy', convert = FALSE)

Load Dataset with SeuratData

data("pbmc3k")
pbmc <- pbmc3k

pbmc

An object of class Seurat 
13714 features across 2700 samples within 1 assay 
Active assay: RNA (13714 features, 0 variable features)

To make use of scvi-tools, we use a third-party library called sceasy to convert the Seurat object into an AnnData object, the primary format used by scanpy and scvi-tools.

adata <- convertFormat(pbmc, from="seurat", to="anndata", main_layer="counts", drop_single_values=FALSE)
adata

AnnData object with n_obs × n_vars = 2700 × 13714
    obs: 'orig.ident', 'nCount_RNA', 'nFeature_RNA', 'seurat_annotations'
    var: 'name'

We can access the AnnData fields in the same way we call instance functions, with the $ syntax.

adata$obs$head()

               orig.ident  nCount_RNA  nFeature_RNA seurat_annotations
AAACATACAACCAC     pbmc3k      2419.0           779       Memory CD4 T
AAACATTGAGCTAC     pbmc3k      4903.0          1352                  B
AAACATTGATCAGC     pbmc3k      3147.0          1129       Memory CD4 T
AAACCGTGCTTCCG     pbmc3k      2639.0           960         CD14+ Mono
AAACCGTGTATGCG     pbmc3k       980.0           521                 NK

class(adata$obs)

'pandas.core.frame.DataFrame'
'pandas.core.generic.NDFrame'
'pandas.core.base.PandasObject'
'pandas.core.accessor.DirNamesMixin'
'pandas.core.base.SelectionMixin'
'pandas.core.indexing.IndexingMixin'
'pandas.core.arraylike.OpsMixin'
'python.builtin.object'

head(py_to_r(adata$obs))

A data.frame: 6 × 4
	orig.ident	nCount_RNA	nFeature_RNA	seurat_annotations
	<fct>	<dbl>	<int>	<fct>
AAACATACAACCAC	pbmc3k	2419	779	Memory CD4 T
AAACATTGAGCTAC	pbmc3k	4903	1352	B
AAACATTGATCAGC	pbmc3k	3147	1129	Memory CD4 T
AAACCGTGCTTCCG	pbmc3k	2639	960	CD14+ Mono
AAACCGTGTATGCG	pbmc3k	980	521	NK
AAACGCACTGGTAC	pbmc3k	2163	781	Memory CD4 T

Above, we loaded the anndata R library. It is important to know whether you are dealing with a Python AnnData object or an R AnnDataR6 object. We can distinguish the two with class(), and use the py_to_r() and r_to_py() functions to convert between them. Generally, it is recommended to use the R AnnDataR6 object to manipulate fields.

class(adata)

'anndata._core.anndata.AnnData'
'python.builtin.object'

class(py_to_r(adata))

'AnnDataR6'
'R6'

# Convert adata object to R AnnDataR6 object.
adata <- py_to_r(adata)

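We can set fields in the AnnData object using the $ syntax. Here, we run total-count normalization using scanpy and save the result to a new layer in the AnnData object. Note that target_sum = 1e+09 scales each cell to 10^9 total counts; counts per million (CPM) in the strict sense would use target_sum = 1e+06. For the sake of demonstration, we do not use the in-place update option that scanpy provides. Note that this only works well with the AnnDataR6 object.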

X_norm <- sc$pp$normalize_total(adata, target_sum = 1e+09, inplace = FALSE)["X"]
adata$layers["X_norm"] <- X_norm

head(as.data.frame(adata$layers["X_norm"]))

A data.frame: 6 × 13714
	AL627309.1	AP006222.2	RP11-206L10.2	RP11-206L10.9	LINC00115	NOC2L	KLHL17	PLEKHN1	RP11-54O7.17	HES4	⋯	MT-ND4L	MT-ND4	MT-ND5	MT-ND6	MT-CYB	AC145212.1	AL592183.1	AL354822.1	PNRC2.1	SRSF10.1
	<dbl>	<dbl>	<dbl>	<dbl>	<dbl>	<dbl>	<dbl>	<dbl>	<dbl>	<dbl>	⋯	<dbl>	<dbl>	<dbl>	<dbl>	<dbl>	<dbl>	<dbl>	<dbl>	<dbl>	<dbl>
AAACATACAACCAC	0	0	0	0	0	0	0	0	0	0	⋯	0.0	4133939.5	413393.9	0	1653575.8	0	0.0	0	0	0
AAACATTGAGCTAC	0	0	0	0	0	0	0	0	0	0	⋯	0.0	6730573.5	203956.8	0	1631654.1	0	203956.8	0	0	0
AAACATTGATCAGC	0	0	0	0	0	0	0	0	0	0	⋯	0.0	953288.9	635525.9	0	1271051.9	0	0.0	0	0	0
AAACCGTGCTTCCG	0	0	0	0	0	0	0	0	0	0	⋯	378931.4	1136794.2	757862.8	0	757862.8	0	0.0	0	0	0
AAACCGTGTATGCG	0	0	0	0	0	0	0	0	0	0	⋯	0.0	0.0	2040816.4	0	1020408.2	0	0.0	0	0	0
AAACGCACTGGTAC	0	0	0	0	0	0	0	0	0	0	⋯	0.0	1849283.4	0.0	0	1386962.5	0	0.0	0	0	0
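The arithmetic behind these values can be sketched in plain Python. This is not scanpy's implementation: scale_rows_to_total is a hypothetical helper that ignores scanpy's other options, the toy matrix is made up, and leaving all-zero rows at zero is a choice of this sketch.

```python
def scale_rows_to_total(rows, target_sum):
    """Scale each row (one cell's counts) so that it sums to target_sum."""
    normalized = []
    for counts in rows:
        total = sum(counts)
        # Sketch-only choice: rows with no counts stay all zero.
        scale = target_sum / total if total > 0 else 0.0
        normalized.append([c * scale for c in counts])
    return normalized

# Hypothetical toy matrix: 2 cells x 3 genes (not taken from pbmc3k).
toy = [[10, 0, 30], [1, 1, 2]]
norm = scale_rows_to_total(toy, target_sum=1e9)
row_sums = [sum(row) for row in norm]
print(row_sums)

# Cross-check against the table above: the first cell has nCount_RNA = 2419.
# Assuming a raw MT-ND4 count of 10 (inferred, not shown in the tutorial),
# the scaled value is about 4.13e6, in line with the MT-ND4 entry.
mt_nd4 = 10 * 1e9 / 2419
print(mt_nd4)
```

Every row of the result sums to target_sum, which is why the entries in the layer above are so large compared with the raw counts.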

Now you should be comfortable interoperating between R and Python. Once you configure your AnnData object to contain all the necessary fields for your model of choice, you can initialize and train the model with the AnnData object. Visit our tutorials page for examples of running scvi-tools in R.

Session Info

sI <- sessionInfo()
sI$loadedOnly <- NULL
print(sI, locale=FALSE)

R version 4.0.3 (2020-10-10)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: Ubuntu 16.04.6 LTS

Matrix products: default
BLAS/LAPACK: /data/yosef2/users/jhong/miniconda3/envs/r_tutorial/lib/libopenblasp-r0.3.12.so

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] stxBrain.SeuratData_0.1.1 pbmc3k.SeuratData_3.1.4  
[3] ifnb.SeuratData_3.0.0     SeuratData_0.2.1         
[5] SeuratObject_4.0.2        Seurat_4.0.4             
[7] sceasy_0.0.6              anndata_0.7.5.3          
[9] reticulate_1.22          
