Using Python in R with reticulate#

In this tutorial, we will demonstrate how to perform basic Python operations in R using the library reticulate. This includes converting between R and Python dataframe objects and running python functions. Since scvi-tools is written in Python, such an interface is necessary to take advantage of these models within the R environment.

Import Libraries#

library(reticulate)
library(anndata)
library(sceasy)

library(Seurat)
library(SeuratData)

Operating between Python and R#

First, we will create a dummy list, and convert between R and Python. Note that R is 1-indexed while Python is 0-indexed, so when retrieiving elements the user should be conscious of what kind of object they are operating on.

lst <- list(1, 2, 3)
print(lst)
print(typeof(lst))
[[1]]
[1] 1

[[2]]
[1] 2

[[3]]
[1] 3

[1] "list"

We will convert this R list to a Python list via a function provided by the reticulate library called r_to_py(). This works for various fundamental R types like vectors, lists, arrays, data frames, functions, and primitives. Any python object will have typeof(obj) as environment. To see the Python type, we can call class(obj) instead.

py_lst <- r_to_py(lst)
print(py_lst)
print(typeof(py_lst))
print(class(py_lst))
[1.0, 2.0, 3.0]
[1] "environment"
[1] "python.builtin.list"   "python.builtin.object"

We can call instance functions of a Python object by replacing the usual dot notation with $ instead. So something like lst.append(5) would become lst$append(5).

py_lst$append(5)
print(py_lst)
None
[1.0, 2.0, 3.0, 5.0]

Note, arguments passed into these functions can either be Python or R objects. R objects passed in as arguments will automatically converted to the corresponding Python type via the r_to_py() function. However, this can sometimes result in unexpected results. For example, 0 in R will be automatically inferred as a float, which can result in an error when trying to pop an element below. We workaround this by explictly casting the R term to an integer with as.integer(0) or using the 0L syntax, which results in the proper type conversion.

# This will fail.
py_lst$pop(0)
Error in py_call_impl(callable, dots$args, dots$keywords): TypeError: integer argument expected, got float
Traceback:

1. py_lst$pop(0)
2. py_call_impl(callable, dots$args, dots$keywords)
py_lst$pop(0L)
print(py_lst)
1.0
[2.0, 3.0, 5.0]

Finally, we will convert back into an R list with the function py_to_r() which executes the inverse of r_to_py().

lst <- py_to_r(py_lst)
print(lst)
[1] 2 3 5

Import Python libraries#

Now, we load the scanpy library via reticulate using the import() function. The convert boolean argument determines whether the output of Python functions is automatically converted to an R object equivalent via the py_to_r() function. Here, we set it to FALSE intentionally since often times we would like to retain the Python format for further manipulation in Python (e.g. with scanpy). Additionally, this keeps data type conversion more explicit, avoiding type confusion.

sc <- import('scanpy', convert = FALSE)

Load Dataset with SeuratData#

data("pbmc3k")
pbmc <- pbmc3k
pbmc
An object of class Seurat 
13714 features across 2700 samples within 1 assay 
Active assay: RNA (13714 features, 0 variable features)

In order to make use of scvi-tools, we use a third-party library called sceasy to convert the SeuratObject into an AnnData object, the primary format used by scanpy and scvi-tools.

adata <- convertFormat(pbmc, from="seurat", to="anndata", main_layer="counts", drop_single_values=FALSE)
adata
AnnData object with n_obs × n_vars = 2700 × 13714
    obs: 'orig.ident', 'nCount_RNA', 'nFeature_RNA', 'seurat_annotations'
    var: 'name'

We can access the AnnData fields in the same way we call instance functions, with the $ syntax.

adata$obs$head()
               orig.ident  nCount_RNA  nFeature_RNA seurat_annotations
AAACATACAACCAC     pbmc3k      2419.0           779       Memory CD4 T
AAACATTGAGCTAC     pbmc3k      4903.0          1352                  B
AAACATTGATCAGC     pbmc3k      3147.0          1129       Memory CD4 T
AAACCGTGCTTCCG     pbmc3k      2639.0           960         CD14+ Mono
AAACCGTGTATGCG     pbmc3k       980.0           521                 NK
class(adata$obs)
  1. 'pandas.core.frame.DataFrame'
  2. 'pandas.core.generic.NDFrame'
  3. 'pandas.core.base.PandasObject'
  4. 'pandas.core.accessor.DirNamesMixin'
  5. 'pandas.core.base.SelectionMixin'
  6. 'pandas.core.indexing.IndexingMixin'
  7. 'pandas.core.arraylike.OpsMixin'
  8. 'python.builtin.object'
head(py_to_r(adata$obs))
A data.frame: 6 × 4
orig.identnCount_RNAnFeature_RNAseurat_annotations
<fct><dbl><int><fct>
AAACATACAACCACpbmc3k2419 779Memory CD4 T
AAACATTGAGCTACpbmc3k49031352B
AAACATTGATCAGCpbmc3k31471129Memory CD4 T
AAACCGTGCTTCCGpbmc3k2639 960CD14+ Mono
AAACCGTGTATGCGpbmc3k 980 521NK
AAACGCACTGGTACpbmc3k2163 781Memory CD4 T

Above, we loaded the anndata R library. It is important to know when dealing with a Python AnnData object and an R AnnDataR6 Object. We can distinguish these by using the class() method, then using the py_to_r(), r_to_py() functions to interoperate between the two. Generally, it is recommended to use the R AnnDataR6 object to manipulate fields.

class(adata)
  1. 'anndata._core.anndata.AnnData'
  2. 'python.builtin.object'
class(py_to_r(adata))
  1. 'AnnDataR6'
  2. 'R6'
# Convert adata object to R AnnDataR6 object.
adata <- py_to_r(adata)

We can set fields in the AnnData object using the $ syntax. Here, we run CPM normalization using scanpy and save it to a new layer in the AnnData object. For the sake of demonstration, we do not use the inplace update option that scanpy provides. Note, this only works well if using the AnnDataR6 object.

X_norm <- sc$pp$normalize_total(adata, target_sum = 1e+09, inplace = FALSE)["X"]
adata$layers["X_norm"] <- X_norm
head(as.data.frame(adata$layers["X_norm"]))
A data.frame: 6 × 13714
AL627309.1AP006222.2RP11-206L10.2RP11-206L10.9LINC00115NOC2LKLHL17PLEKHN1RP11-54O7.17HES4MT-ND4LMT-ND4MT-ND5MT-ND6MT-CYBAC145212.1AL592183.1AL354822.1PNRC2.1SRSF10.1
<dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl>
AAACATACAACCAC0000000000 0.04133939.5 413393.901653575.80 0.0000
AAACATTGAGCTAC0000000000 0.06730573.5 203956.801631654.10203956.8000
AAACATTGATCAGC0000000000 0.0 953288.9 635525.901271051.90 0.0000
AAACCGTGCTTCCG0000000000378931.41136794.2 757862.80 757862.80 0.0000
AAACCGTGTATGCG0000000000 0.0 0.02040816.401020408.20 0.0000
AAACGCACTGGTAC0000000000 0.01849283.4 0.001386962.50 0.0000

Now you should be comfortable interoperating between R and Python. Once you configure your AnnData object to contain all the necessary fields for your model of choice, you can intialize and train with the AnnData object. Visit our tutorials page for examples of running scvi-tools in R.

Session Info#

sI <- sessionInfo()
sI$loadedOnly <- NULL
print(sI, locale=FALSE)
R version 4.0.3 (2020-10-10)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: Ubuntu 16.04.6 LTS

Matrix products: default
BLAS/LAPACK: /data/yosef2/users/jhong/miniconda3/envs/r_tutorial/lib/libopenblasp-r0.3.12.so

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] stxBrain.SeuratData_0.1.1 pbmc3k.SeuratData_3.1.4  
[3] ifnb.SeuratData_3.0.0     SeuratData_0.2.1         
[5] SeuratObject_4.0.2        Seurat_4.0.4             
[7] sceasy_0.0.6              anndata_0.7.5.3          
[9] reticulate_1.22