Data Types in R and Python
An R User’s Learning Note on Python
26/09/2021
1 Summary
This is a short note about data types and how they sometimes behave differently in R and Python. Throughout the note, R and Python code are compared side by side. I keep it mainly as a study note, but I hope it will be of interest to fellow R users learning Python, and to Python users learning R.
My hope is that, as this learning journey continues and the notes grow, they can serve as a bridge between R and Python users. Please refer to this article for more general discussions on
RMarkdown does a brilliant job of combining code, results, tables and data visualisations in a single document. In the RMarkdown version of this article, I am able to distinguish R and Python code chunks with different colours. Here is the RMarkdown version of the same document if you are interested.
2 Data Types
2.1 Python
2.2 R
3 Data Types in Practice
The following code chunks demonstrate some interesting differences:
Python automatically distinguishes between integers and floats, based on how the number is written.
In R, a number is of class numeric by default; to get an integer, you need to request it explicitly by adding the suffix “L”.
3.1 Python
a = 5
type(a)
## <class 'int'>
b = 5.0
type(b)
## <class 'float'>
# in Python, Boolean values have only the first letter capitalized: True and False
boolP = TRUE
## Error in py_call_impl(callable, dots$args, dots$keywords): NameError: name 'TRUE' is not defined
boolP = True
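To make the int/float distinction a little more concrete, here is a minimal sketch of my own (the variable names are only illustrative). It shows that the two types compare equal by value but remain distinct types, and that Python's bool is a subclass of int.

```python
# Python infers the numeric type from how the literal is written.
a = 5      # no decimal point -> int
b = 5.0    # decimal point -> float

# The values compare equal, but the types stay distinct.
same_value = (a == b)              # True
same_type = (type(a) is type(b))   # False

# True division always returns a float;
# floor division of two ints returns an int.
true_div = a / 1     # 5.0 (float)
floor_div = a // 1   # 5 (int)

# bool is a subclass of int, so True behaves like 1 in arithmetic.
bool_is_int = isinstance(True, int)   # True
true_plus_one = True + 1              # 2

print(type(a).__name__, type(b).__name__)
print(same_value, same_type, bool_is_int, true_plus_one)
```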
3.2 R
a <- 5
class(a)
## [1] "numeric"
b <- 5.0
class(b)
## [1] "numeric"
# specifically require the integer class
c <- 5L
class(c)
## [1] "integer"
# in R, logical values are written in all capitals: TRUE and FALSE
logicR <- True
## Error in eval(expr, envir, enclos): object 'True' not found
logicR <- TRUE
4 Assignment: “<-” vs “=”
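In the above code chunks, “=” is used to assign values in Python, while “<-” is used in R.
You can also use “=” for assignment in R, and at the top level the two behave the same. They differ inside a function call:
There, “=” binds a value to a named argument and creates no variable in the calling environment, whereas “<-” performs an assignment in the calling environment and then passes the value on to the function.
It is worth noting that “<-” is recommended in both the Google R Style Guide and Hadley Wickham’s style guide.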
# calculate the sum of a vector ranging from 1 to 10
sum(x = 1:10)
## [1] 55
x
## Error in eval(expr, envir, enclos): object 'x' not found
## Because "x = 1:10" only names an argument of sum(); no variable x is created in the calling environment
# same function, this time use "<-" instead of "=" to assign the vector
sum(x <- 1:10)
## [1] 55
x
##  [1]  1  2  3  4  5  6  7  8  9 10
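For readers coming from the Python side, there seems to be a rough parallel. The sketch below is my own and assumes Python 3.8 or later for the “:=” operator; total() is a hypothetical helper standing in for R's sum(). Inside a call, “=” only names a parameter, while “:=” assigns in the calling scope and passes the value on, much like sum(x <- 1:10).

```python
def total(values):
    """Hypothetical helper standing in for R's sum()."""
    return sum(values)

# "=" inside a call names a parameter; it creates no variable in the caller.
result_kw = total(values=range(1, 11))

try:
    values  # never defined at this level
    leaked = True
except NameError:
    leaked = False

# ":=" (Python 3.8+) assigns x in the calling scope AND passes the value on.
result_walrus = total(x := list(range(1, 11)))

print(result_kw, leaked)   # 55 False
print(result_walrus, x)    # 55 [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```

The analogy is loose: Python has no counterpart to using “=” and “<-” interchangeably at the top level, so “:=” is best read as an illustration of assignment inside a call only.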
5 Operation on Data Types
The following code chunks demonstrate some interesting differences:
What happens when you add two strings?
What happens when you multiply a string by n?
5.1 Python code
x="cute"
y="bunny"
z="hop"
x + y
## 'cutebunny'
z * 5
## 'hophophophophop'
5.2 R code
With the reticulate package, it is easy to use an object created in a Python code chunk from R code: simply refer to it as py$python_object_created.
# call object x created in the above Python code chunk
py$x
## [1] "cute"
# adding or multiplying strings in R raises an error
py$x + py$y
## Error in py$x + py$y: non-numeric argument to binary operator
py$z * 5
## Error in py$z * 5: non-numeric argument to binary operator
# to concatenate two strings, use paste() (separated by a space by default) or paste0() (no separator)
paste(py$x,py$y)
## [1] "cute bunny"
paste0(py$x,py$y)
## [1] "cutebunny"
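Going the other way, an R user may wonder what the closest Python counterparts of paste() and paste0() are. As far as I can tell, str.join() covers both; the sketch below is my own illustration, reusing the values of x and y from the Python chunk above. It also shows that “+” in Python works only between strings, so mixing a string and a number still raises an error.

```python
x = "cute"
y = "bunny"

# Rough counterpart of paste(x, y): join with a single space.
pasted = " ".join([x, y])       # 'cute bunny'

# Rough counterpart of paste0(x, y): join with no separator.
pasted0 = "".join([x, y])       # 'cutebunny'

# Rough counterpart of paste(x, y, sep = "-").
pasted_sep = "-".join([x, y])   # 'cute-bunny'

# "+" concatenates two strings, but mixing str and int raises a TypeError.
try:
    x + 5
    mixed_ok = True
except TypeError:
    mixed_ok = False

# An explicit conversion makes the intent clear.
labelled = x + str(5)           # 'cute5'

print(pasted, pasted0, pasted_sep, mixed_ok, labelled)
```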
6 What is next?
I hope this is of some help. Next, I will study and compare data structures in the two languages.