Using Python in R with reticulate
In this tutorial, we demonstrate how to perform basic Python operations in R using the reticulate library. This includes converting between R and Python data frame objects and calling Python functions. Since scvi-tools is written in Python, such an interface is necessary to use these models from within an R environment.
Import Libraries
library(reticulate)
library(anndata)
library(sceasy)
library(Seurat)
library(SeuratData)
Operating between Python and R
First, we create a dummy list and convert it between R and Python. Note that R is 1-indexed while Python is 0-indexed, so when retrieving elements the user should be conscious of what kind of object they are operating on.
lst <- list(1, 2, 3)
print(lst)
print(typeof(lst))
[[1]]
[1] 1
[[2]]
[1] 2
[[3]]
[1] 3
[1] "list"
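For comparison, here is a minimal sketch of the same list on the Python side, written in plain Python rather than through reticulate. It assumes the R numerics arrive as Python floats, which is what the converted output later in this tutorial shows, and it illustrates that the first element sits at index 0 rather than 1.

```python
# Plain-Python counterpart of the R list created above (illustrative only).
lst = [1.0, 2.0, 3.0]

first = lst[0]   # Python is 0-indexed: index 0 is the first element
last = lst[-1]   # negative indices count from the end of the list

print(first, last)
print(type(lst).__name__)
```

In R, lst[[1]] returns the first element, and lst[-1] drops the first element rather than returning the last one.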
We will convert this R list to a Python list with r_to_py(), a function provided by the reticulate library. This works for various fundamental R types such as vectors, lists, arrays, data frames, functions, and primitives. For any Python object, typeof(obj) returns "environment". To see the Python type, call class(obj) instead.
py_lst <- r_to_py(lst)
print(py_lst)
print(typeof(py_lst))
print(class(py_lst))
[1.0, 2.0, 3.0]
[1] "environment"
[1] "python.builtin.list" "python.builtin.object"
We can call instance methods of a Python object by replacing the usual dot notation with $. For example, lst.append(5) in Python becomes lst$append(5) in R.
py_lst$append(5)
print(py_lst)
None
[1.0, 2.0, 3.0, 5.0]
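The None printed above is the return value of append, which modifies the list in place. A minimal plain-Python sketch of the same call (not run through reticulate, and assuming the R value 5 arrives as the float 5.0) shows this behavior:

```python
py_lst = [1.0, 2.0, 3.0]

# list.append mutates the list in place and returns None,
# so the result is only useful for its side effect.
result = py_lst.append(5.0)

print(result)
print(py_lst)
```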
Note that arguments passed to these functions can be either Python or R objects. R objects passed as arguments are automatically converted to the corresponding Python type via r_to_py(). However, this can sometimes produce unexpected results. For example, a bare 0 in R is a double, so it is converted to a Python float, which raises an error when we try to pop an element below. We work around this by explicitly casting the R value to an integer with as.integer(0) or by using the 0L literal syntax, either of which results in the proper type conversion.
# This will fail.
py_lst$pop(0)
Error in py_call_impl(callable, dots$args, dots$keywords): TypeError: integer argument expected, got float
Traceback:
1. py_lst$pop(0)
2. py_call_impl(callable, dots$args, dots$keywords)
py_lst$pop(0L)
print(py_lst)
1.0
[2.0, 3.0, 5.0]
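The same failure can be reproduced in plain Python, which may help when deciding whether an error comes from reticulate or from Python itself. The sketch below is not run through reticulate; it simply passes a float and then an int to list.pop. The exact wording of the TypeError message varies between Python versions.

```python
py_lst = [1.0, 2.0, 3.0, 5.0]

# A float index is rejected, mirroring py_lst$pop(0) called from R.
try:
    py_lst.pop(0.0)
    float_index_failed = False
except TypeError as err:
    float_index_failed = True
    print("TypeError:", err)

# An int index works, mirroring py_lst$pop(0L) called from R.
removed = py_lst.pop(0)
print(removed, py_lst)
```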
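Finally, we convert back into an R object with py_to_r(), which performs the inverse of r_to_py(). In this case the result prints as a numeric vector rather than a list, because the Python list contains only floats.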
lst <- py_to_r(py_lst)
print(lst)
[1] 2 3 5
Import Python libraries
Now we load the scanpy library via reticulate using the import() function. The convert boolean argument determines whether the output of Python functions is automatically converted to the equivalent R object via py_to_r(). Here, we intentionally set it to FALSE, since we often want to keep results in their Python form for further manipulation in Python (e.g. with scanpy). This also keeps data type conversion explicit, which avoids type confusion.
sc <- import('scanpy', convert = FALSE)
Load Dataset with SeuratData
data("pbmc3k")
pbmc <- pbmc3k
pbmc
An object of class Seurat
13714 features across 2700 samples within 1 assay
Active assay: RNA (13714 features, 0 variable features)
In order to make use of scvi-tools, we use a third-party library called sceasy to convert the Seurat object into an AnnData object, the primary format used by scanpy and scvi-tools.
adata <- convertFormat(pbmc, from="seurat", to="anndata", main_layer="counts", drop_single_values=FALSE)
adata
AnnData object with n_obs × n_vars = 2700 × 13714
obs: 'orig.ident', 'nCount_RNA', 'nFeature_RNA', 'seurat_annotations'
var: 'name'
We can access the AnnData fields in the same way we call instance methods, with the $ syntax.
adata$obs$head()
orig.ident nCount_RNA nFeature_RNA seurat_annotations
AAACATACAACCAC pbmc3k 2419.0 779 Memory CD4 T
AAACATTGAGCTAC pbmc3k 4903.0 1352 B
AAACATTGATCAGC pbmc3k 3147.0 1129 Memory CD4 T
AAACCGTGCTTCCG pbmc3k 2639.0 960 CD14+ Mono
AAACCGTGTATGCG pbmc3k 980.0 521 NK
class(adata$obs)
- 'pandas.core.frame.DataFrame'
- 'pandas.core.generic.NDFrame'
- 'pandas.core.base.PandasObject'
- 'pandas.core.accessor.DirNamesMixin'
- 'pandas.core.base.SelectionMixin'
- 'pandas.core.indexing.IndexingMixin'
- 'pandas.core.arraylike.OpsMixin'
- 'python.builtin.object'
head(py_to_r(adata$obs))
| | orig.ident | nCount_RNA | nFeature_RNA | seurat_annotations |
|---|---|---|---|---|
| | <fct> | <dbl> | <int> | <fct> |
| AAACATACAACCAC | pbmc3k | 2419 | 779 | Memory CD4 T |
| AAACATTGAGCTAC | pbmc3k | 4903 | 1352 | B |
| AAACATTGATCAGC | pbmc3k | 3147 | 1129 | Memory CD4 T |
| AAACCGTGCTTCCG | pbmc3k | 2639 | 960 | CD14+ Mono |
| AAACCGTGTATGCG | pbmc3k | 980 | 521 | NK |
| AAACGCACTGGTAC | pbmc3k | 2163 | 781 | Memory CD4 T |
Above, we loaded the anndata R library. It is important to know whether we are dealing with a Python AnnData object or an R AnnDataR6 object. We can distinguish the two with class(), and convert between them with py_to_r() and r_to_py(). Generally, it is recommended to use the R AnnDataR6 object to manipulate fields.
class(adata)
- 'anndata._core.anndata.AnnData'
- 'python.builtin.object'
class(py_to_r(adata))
- 'AnnDataR6'
- 'R6'
# Convert adata object to R AnnDataR6 object.
adata <- py_to_r(adata)
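We can set fields in the AnnData object using the $ syntax. Here, we run total-count normalization using scanpy and save the result to a new layer in the AnnData object. Note that target_sum = 1e+09 scales each cell to a total of 1e9 counts; counts per million (CPM) would correspond to target_sum = 1e+06. For the sake of demonstration, we do not use the in-place update option that scanpy provides. Note that this only works well when using the AnnDataR6 object.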
X_norm <- sc$pp$normalize_total(adata, target_sum = 1e+09, inplace = FALSE)["X"]
adata$layers["X_norm"] <- X_norm
head(as.data.frame(adata$layers["X_norm"]))
| | AL627309.1 | AP006222.2 | RP11-206L10.2 | RP11-206L10.9 | LINC00115 | NOC2L | KLHL17 | PLEKHN1 | RP11-54O7.17 | HES4 | ⋯ | MT-ND4L | MT-ND4 | MT-ND5 | MT-ND6 | MT-CYB | AC145212.1 | AL592183.1 | AL354822.1 | PNRC2.1 | SRSF10.1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | ⋯ | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> |
| AAACATACAACCAC | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ⋯ | 0.0 | 4133939.5 | 413393.9 | 0 | 1653575.8 | 0 | 0.0 | 0 | 0 | 0 |
| AAACATTGAGCTAC | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ⋯ | 0.0 | 6730573.5 | 203956.8 | 0 | 1631654.1 | 0 | 203956.8 | 0 | 0 | 0 |
| AAACATTGATCAGC | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ⋯ | 0.0 | 953288.9 | 635525.9 | 0 | 1271051.9 | 0 | 0.0 | 0 | 0 | 0 |
| AAACCGTGCTTCCG | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ⋯ | 378931.4 | 1136794.2 | 757862.8 | 0 | 757862.8 | 0 | 0.0 | 0 | 0 | 0 |
| AAACCGTGTATGCG | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ⋯ | 0.0 | 0.0 | 2040816.4 | 0 | 1020408.2 | 0 | 0.0 | 0 | 0 | 0 |
| AAACGCACTGGTAC | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ⋯ | 0.0 | 1849283.4 | 0.0 | 0 | 1386962.5 | 0 | 0.0 | 0 | 0 | 0 |
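To make the arithmetic behind the normalized layer concrete, the following plain-Python sketch scales each cell (row) so that its counts sum to a target total. It is not scanpy's implementation: the function name scale_rows_to_total, the toy count matrix, and the handling of all-zero cells are assumptions made for illustration only.

```python
def scale_rows_to_total(rows, target_sum):
    """Scale each row (cell) so that its counts sum to target_sum.

    Hypothetical helper for illustration; not part of scanpy.
    """
    normalized = []
    for row in rows:
        total = sum(row)
        if total == 0:
            # Assumption of this sketch: all-zero cells stay all-zero.
            normalized.append([0.0 for _ in row])
            continue
        scale = target_sum / total
        normalized.append([value * scale for value in row])
    return normalized


# Toy count matrix: two cells (rows) by three genes (columns); values are made up.
counts = [[1, 3, 0], [2, 2, 4]]
x_norm = scale_rows_to_total(counts, target_sum=8)
print(x_norm)
```

With target_sum = 1e9, a hypothetical gene with 10 counts in a cell of 2419 total counts would scale to roughly 4.13e6, which is the order of magnitude seen in the table above.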
Now you should be comfortable interoperating between R and Python. Once you configure your AnnData object to contain all the necessary fields for your model of choice, you can initialize and train with the AnnData object. Visit our tutorials page for examples of running scvi-tools in R.
Session Info
sI <- sessionInfo()
sI$loadedOnly <- NULL
print(sI, locale=FALSE)
R version 4.0.3 (2020-10-10)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: Ubuntu 16.04.6 LTS
Matrix products: default
BLAS/LAPACK: /data/yosef2/users/jhong/miniconda3/envs/r_tutorial/lib/libopenblasp-r0.3.12.so
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] stxBrain.SeuratData_0.1.1 pbmc3k.SeuratData_3.1.4
[3] ifnb.SeuratData_3.0.0 SeuratData_0.2.1
[5] SeuratObject_4.0.2 Seurat_4.0.4
[7] sceasy_0.0.6 anndata_0.7.5.3
[9] reticulate_1.22