Data Types in R and Python

An R User’s Learning Note on Python

Mena WANG

26/09/2021

1 Summary

This is a short note about data types and how they sometimes behave differently in R and Python. Throughout the note, R and Python code are compared side by side. I keep it mainly as a study note, but I hope it is of interest to fellow R users learning Python, and to Python users learning R.

My hope is that, as this learning journey continues and the notes grow, they can serve as a bridge between R and Python users. Please refer to this article for a more general discussion of

  • why I started this learning journey
  • the advantages of each language, and
  • the benefit of going bilingual

RMarkdown does a brilliant job of reporting code, results, tables and DataViz in a combined document. In the RMarkdown version of this article, I am able to distinguish R and Python code chunks with different colours. Here is the RMarkdown version of the same document if you are interested.

2 Data Types

2.1 Python

  • float: Python uses 8 bytes (or 64 bits) to represent floating point numbers. Unlike the integer type, the float type uses a fixed number of bytes (more here)
  • int: Python uses a variable number of bits to store integers, adding more as the value grows, so an int has no fixed upper limit (more here)
  • bool: True & False values (only capitalize the 1st letter, see below for how R differs)
  • str: strings
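
The points in this list can be checked from Python itself. The following is a minimal sketch using only the standard library; the variable names are illustrative and not part of the original note.

```python
import sys

# A float is a 64-bit IEEE 754 double, which carries 53 bits of precision
float_mantissa_bits = sys.float_info.mant_dig

# An int grows as needed, so a value far beyond 64 bits is still exact
big = 2 ** 100
big_bits = big.bit_length()

# bool has two values, True and False, and is a subclass of int
bool_is_int = issubclass(bool, int)

print(float_mantissa_bits)   # 53
print(big_bits)              # 101
print(type(big).__name__)    # int
print(bool_is_int)           # True
```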

2.2 R

  • numeric: 64-bit double conforming to the IEEE 754 standard
  • integer: 32-bit signed integers, with a range of about ±2147483647 (the bit pattern for -2147483648 is reserved for NA). R does not have native support for 64-bit integers. However, the bit64 package provides support for them. (more here)
  • logical: TRUE & FALSE values (capitalize all letters, see above for how Python differs)
  • character: strings
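
To make the 32-bit limit concrete, here is a small Python sketch of what "fits in a signed 32-bit integer" means. The helper `fits_in_int32` is a hypothetical name introduced for illustration. It only checks the storage size and does not model how R itself treats particular values.

```python
# Bounds of a signed 32-bit integer (the storage size of R's integer type)
INT32_MIN = -(2 ** 31)
INT32_MAX = 2 ** 31 - 1

def fits_in_int32(n: int) -> bool:
    """Return True if n can be stored in 4 bytes as a signed integer."""
    try:
        n.to_bytes(4, byteorder="big", signed=True)
        return True
    except OverflowError:
        return False

print(INT32_MIN, INT32_MAX)          # -2147483648 2147483647
print(fits_in_int32(INT32_MAX))      # True
print(fits_in_int32(INT32_MAX + 1))  # False

# A Python int simply keeps growing past this boundary
print(INT32_MAX + 1)                 # 2147483648
```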

3 Data Types in Practice

Some interesting differences are demonstrated in the following code chunks.

Python automatically distinguishes between integer and float:

  • 5 is recognized as integer,
  • while 5.0 as float
  • boolP = TRUE results in error (should use True)

In R, one needs to explicitly request an integer by adding the suffix “L”:

  • both 5 and 5.0 are recognized as numeric;
  • to create an integer, append “L” to the number (for example, 5L).
  • logicR <- True results in error (should use TRUE)

3.1 Python

a = 5
type(a)
## <class 'int'>

b = 5.0
type(b)
## <class 'float'>

# in Python, boolean type values should only have the first letter capitalized
boolP = TRUE
## Error in py_call_impl(callable, dots$args, dots$keywords): NameError: name 'TRUE' is not defined

boolP = True        
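
As a small extension of the chunk above, the sketch below shows how the int/float distinction carries through comparison and division. It is an illustrative example rather than part of the original note.

```python
a = 5
b = 5.0

# The two objects are equal in value but have different types
same_value = (a == b)
same_type = (type(a) is type(b))

# True division always returns a float; floor division of two ints stays int
true_div = 10 / 2
floor_div = 10 // 2

# float.is_integer() reports whether a float holds a whole number
whole = b.is_integer()

print(same_value, same_type)      # True False
print(type(true_div).__name__)    # float
print(type(floor_div).__name__)   # int
print(whole)                      # True
```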

3.2 R

a <- 5
class(a)
## [1] "numeric"

b <- 5.0
class(b)
## [1] "numeric"

# specifically require the integer class
c <- 5L
class(c)
## [1] "integer"

# in R, logical values should have all letters capitalized
logicR <- True
## Error in eval(expr, envir, enclos): object 'True' not found

logicR <- TRUE
        

4 Assignment: “<-” vs “=”

In the above code chunks, “=” is used to assign values in Python, while “<-” is used in R.

Actually, you can also use “=” in R, but “=” and “<-” behave differently inside a function call:

At the top level, both assign a value in the current environment. The difference appears when they are used inside the parentheses of a function call.

  • “<-” performs an assignment, so it creates an object in the calling environment (here, the user’s workspace)
  • “=” only names an argument passed to the function, so no object is created in the workspace (please see demo in the following R code chunk)

It is worth noting that “<-” is recommended for assignment in both the Google R Style Guide and Hadley Wickham’s style guide.

# calculate the sum of a vector ranging from 1 to 10
sum(x = 1:10)
## [1] 55

x
## Error in eval(expr, envir, enclos): object 'x' not found
## Because "x = 1:10" only passes a named argument to sum(); no object x is created


# same function, this time use "<-" instead of "=" to assign the vector
sum(x <- 1:10)
## [1] 55

x
##  [1]  1  2  3  4  5  6  7  8  9 10        
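
For Python readers, a rough parallel exists: a keyword argument inside a call only names a parameter, while the assignment expression operator “:=” (Python 3.8 or later) assigns in the calling scope as a side effect. The function `total` below is a hypothetical helper written for this sketch.

```python
def total(values):
    """Sum an iterable; 'values' is only the parameter name."""
    return sum(values)

# A keyword argument binds the parameter inside the call only
result_kw = total(values=range(1, 11))
values_defined = "values" in globals()

# The walrus operator assigns x in the calling scope while passing the value on
result_walrus = total(x := list(range(1, 11)))

print(result_kw)        # 55
print(values_defined)   # False
print(result_walrus)    # 55
print(x)                # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```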

5 Operation on Data Types

Operators also behave differently on strings, as demonstrated in the following code chunks.

Adding two strings?

  • R: error
  • Python: strings concatenated

Multiplying a string by n?

  • R: error
  • Python: repeat the string n times

5.1 Python code

x = "cute"
y = "bunny"
z = "hop"

x + y
## 'cutebunny'

z * 5
## 'hophophophophop'        
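
Two related behaviours are worth knowing alongside + and *. The sketch below is illustrative only: adding a string to a number raises a TypeError unless the number is converted first, and str.join concatenates strings with a separator.

```python
x = "cute"
y = "bunny"

# Mixing str and int with + raises TypeError; Python does not convert implicitly
try:
    x + 5
    mixed_ok = True
except TypeError:
    mixed_ok = False

# Convert explicitly with str() before concatenating
label = x + str(5)

# str.join concatenates the items with the given separator between them
spaced = " ".join([x, y])

# An f-string is another common way to build a string from parts
formatted = f"{x}-{y}"

print(mixed_ok)    # False
print(label)       # cute5
print(spaced)      # cute bunny
print(formatted)   # cute-bunny
```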

5.2 R code

With the reticulate package, it is easy to use an object created in a Python code chunk from R code: simply refer to it as py$python_object_created.

# call object x created in the above Python code chunk
py$x
## [1] "cute"


# adding or multiplying strings in R results in an error
py$x + py$y
## Error in py$x + py$y: non-numeric argument to binary operator


py$z * 5
## Error in py$z * 5: non-numeric argument to binary operator


# to concatenate two strings, use paste() (space-separated by default) or paste0() (no separator)
paste(py$x,py$y)
## [1] "cute bunny"
paste0(py$x,py$y)
## [1] "cutebunny"        

6 What is next?

I hope this is of some help. Next, I will study and compare data structures in the two languages.
