Automation with Python - Chapter 1
An R User’s Note on Learning Python
1 Introduction
1.1 Why Study Python?
I love using R! R is brilliant for statistical methods, data wrangling and visualization. Since many R developers are statisticians or mathematicians, we get to try out their new research findings through R. For example, earlier this week I attended a presentation on anomaly detection in which applied mathematician Sevvandi Kandanaarachchi introduced her research on applying Item Response Theory to construct unsupervised anomaly detection ensembles (a preprint of the research article is available). The algorithm is also published as an R package, outlierensembles, for any R user to test and apply. Moreover, RMarkdown is a cool reporting tool: an RMarkdown version of this article is available, with a floating TOC and highlighted code chunks. :)
However, Python is a general-purpose language and is hence much more versatile. Recently, I started to think about how to deploy statistical or ML models in production, say on a website or in a mobile App. After some research, I reached the understanding that in order to deploy a statistical/ML model in production, some knowledge of Python would be very helpful, especially when you need to work with other developers and engineers on various cloud platforms. For more discussion on R vs Python, this is a good read.
So here I am, an R user learning Python.
1.2 Learning Materials Used
I have a very basic understanding of Python, but only in terms of data visualization. In 2020, when Covid-19 case numbers started to surge in Australia, I started making some DataViz to help interpret the numbers. Because a) R is not very supportive of dual-axis graphs (for good reason: they should be used with caution) and b) I was curious about how Python works, I used Python (especially the Matplotlib and Seaborn packages) to make some of the Covid-19 visuals (please see an example of a dual-axis graph I made with Python).
Beyond DataViz, my knowledge of Python is very limited. So the first material I am using is a beginner-friendly course, Using Python for Automation, on LinkedIn Learning by Madecraft and Sam Pettus. I know very little about the topic, automation, and am curious to learn more. Upon finishing the first chapter of this course, I will move on to more data science specific topics.
1.3 The Content of This Document
The above-mentioned course has four chapters. This document records my notes for Chapter 1, Automate File, Folder and Terminal Interactions. The notes are really simple, mainly for my own records and for some reflection from the perspective of an R user. Please refer to the original course on LinkedIn Learning for a fuller understanding of the topic.
Where relevant, I will produce some R code to compare with the Python code. Therefore, hopefully it might be of interest to R users learning Python, and vice versa.
1.4 IDE
The IDE I use is RStudio, which allows you to run Python code through the reticulate package.
2 Read a txt File
The first task in the course is to read a txt file. The txt file has some hypothetical data with three fields: name, age and P/F (denoting whether the person passed or failed a test). The values are separated by spaces. A subset of the data is shown below for demonstration.
Mary 25 P
John 32 P
Dylan 19 F
Julia 23 F
Chad 17 F
...
2.1 Python Approach
The following code is offered in the Python course mentioned above.
# "r"ead the file
f=open("Exercise Files/inputFile.txt","r")
print(f.read())
# close the file after the task
f.close()
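A common alternative to calling close() by hand is to open the file in a with block, which closes it automatically even if an error occurs along the way. The following is a minimal sketch rather than the course's code; the temporary file and its three sample rows are stand-ins (assumptions) for Exercise Files/inputFile.txt so that the example runs anywhere.

```python
import tempfile
from pathlib import Path

# Hypothetical stand-in for "Exercise Files/inputFile.txt"
sample = "Mary 25 P\nJohn 32 P\nDylan 19 F\n"
path = Path(tempfile.mkdtemp()) / "inputFile.txt"
path.write_text(sample)

# The file is closed automatically when the with-block ends
with open(path, "r") as f:
    contents = f.read()

print(contents)
print(f.closed)  # True: no explicit f.close() was needed
```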
2.2 R Approach
Using the read_delim() function from the readr package, we can read the text file as follows. The three columns in inputFile are recognized by setting delim to a space.
library(readr)
library(dplyr)
f <- read_delim("Exercise Files/inputFile.txt", delim = " ", col_names = FALSE)
# show the first five lines of the data
f %>%
  head(n = 5)
# remove the object from the environment
rm(f)
2.3 Python vs R: Interesting Difference
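Even with such a simple task, an interesting difference between the R and Python approaches has already emerged: the relationship between “objects” and “methods/functions”. In the Python code, read() and close() are methods called on the file object f (f.read(), f.close()), whereas in the R code, functions such as head() and rm() stand on their own and take the object as an argument.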
This then leads to a very interesting discussion about functional versus object-oriented programming architecture for data science. Here is a good read on the topic. Maybe I will write more about it after I have gone deeper in my Python learning journey too.
BTW, you can do object-oriented programming in R too. But OOP is a bit more challenging in R than in other languages.
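To make the object/method contrast concrete, here is a small illustrative sketch (not from the course). It splits one sample row in two ways: by calling the method on the string object, and by calling the same behaviour as a plain function with the string passed in, which is closer to how R code usually reads.

```python
line = "Mary 25 P"

# Method style: the behaviour is reached through the object
fields_method = line.split()

# Function style: the same behaviour, with the object passed in as an argument
fields_function = str.split(line)

print(fields_method)                     # ['Mary', '25', 'P']
print(fields_method == fields_function)  # True
```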
3 Print Part of the txt File
Here we would like to print only part of the file: people who passed the test.
3.1 Python Approach
The approach introduced in the course is to take the 3rd element of each line and check whether it is P. Note that Python counts from 0, so [2] refers to the 3rd element.
f=open("Exercise Files/inputFile.txt","r")
for line in f:
    # split each line by space
    line_split=line.split()
    # check whether the 3rd element is P
    if line_split[2]=="P":
        print(line)
# close the file after the task
f.close()
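Two details of the loop above are easy to miss: each line still carries its trailing newline, so print(line) leaves a blank line between records, and a blank line in the file would make line_split[2] raise an IndexError. The sketch below guards against both. The helper name passed_lines and the temporary sample file are hypothetical, introduced only for illustration.

```python
import tempfile
from pathlib import Path

# Hypothetical sample data, including one blank line
sample = "Mary 25 P\nJohn 32 P\n\nDylan 19 F\nJulia 23 F\n"
path = Path(tempfile.mkdtemp()) / "inputFile.txt"
path.write_text(sample)

def passed_lines(file_path):
    """Return the lines whose third field is 'P', skipping malformed lines."""
    kept = []
    with open(file_path, "r") as f:
        for line in f:
            fields = line.split()
            # Skip blank or short lines instead of failing on fields[2]
            if len(fields) >= 3 and fields[2] == "P":
                kept.append(line.rstrip("\n"))
    return kept

for line in passed_lines(path):
    print(line)
```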
3.2 R Approach
Since read_delim automatically recognizes three columns in the file, we can refer to the 3rd column by its default name, X3.
f <- read_delim("Exercise Files/inputFile.txt", delim = " ", col_names = FALSE)
f %>%
  filter(X3 == "P")
4 Separate and Save Files
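Now let’s try to split the records into two files: one for the people who passed the test and one for those who failed.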
4.1 Python Approach
Note that we need to create the passFile and failFile objects first, and then operate on them through open(), write() and close().
f = open("Exercise Files/inputFile.txt","r")
# Create pass and fail file respectively, and write on them
passFile = open("Exercise Files/passFile.txt","w")
failFile = open("Exercise Files/failFile.txt","w")
for line in f:
    line_split=line.split()
    # if P, save to passFile; else save to failFile
    if line_split[2]=="P":
        passFile.write(line)
    else:
        failFile.write(line)
f.close()
passFile.close()
failFile.close()
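The three open()/close() pairs can also be handled by a single with statement, which closes all three files when the block ends. This is a sketch under assumptions, not the course's code: the temporary folder and sample rows stand in for the Exercise Files folder, and lines with fewer than three fields are simply skipped.

```python
import tempfile
from pathlib import Path

# Hypothetical stand-in for the "Exercise Files" folder
folder = Path(tempfile.mkdtemp())
(folder / "inputFile.txt").write_text("Mary 25 P\nJohn 32 P\nDylan 19 F\n")

with open(folder / "inputFile.txt", "r") as f, \
     open(folder / "passFile.txt", "w") as pass_file, \
     open(folder / "failFile.txt", "w") as fail_file:
    for line in f:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        # Pick the destination file based on the third field
        target = pass_file if fields[2] == "P" else fail_file
        target.write(line)

pass_text = (folder / "passFile.txt").read_text()
fail_text = (folder / "failFile.txt").read_text()
print(pass_text)
print(fail_text)
```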
4.2 R Approach
We don’t need to create a passFile or failFile object in advance; we just filter and write to disk directly. The functions filter() and write_csv() are independent of the objects/data files.
f <- read_delim("Exercise Files/inputFile.txt", delim = " ", col_names = FALSE)
# save the R files as .csv so they won't replace the ones created by Python
f %>%
  filter(X3 == "P") %>%
  write_csv("Exercise Files/passFile.csv")
f %>%
  filter(X3 == "F") %>%
  write_csv("Exercise Files/failFile.csv")
5 Executing Terminal Commands
I couldn’t execute the following Python code in RStudio. The error message was:
CalledProcessError: Command '['python3', 'example_chapter1.py']' returned non-zero exit status 9009
(I received the same error message when running the code in Jupyter Notebook. I will move forward with the course, but bear this in mind and come back to it later.)
import subprocess
# example_chapter1.py contains a simple print() command that is supposed to be repeated five times
for i in range(0,5):
    subprocess.check_call(["python3","example_chapter1.py"])
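On Windows, exit status 9009 usually means that the command itself could not be found, so one possible explanation (an assumption, since the operating system is not stated here) is that python3 does not resolve to a real interpreter on that machine. A sketch of a workaround is to launch the script with sys.executable, the path of the interpreter that is already running, instead of relying on the name python3. The temporary script below is a hypothetical stand-in for example_chapter1.py.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical stand-in for example_chapter1.py
script = Path(tempfile.mkdtemp()) / "example_chapter1.py"
script.write_text('print("hello from example_chapter1")\n')

outputs = []
for i in range(5):
    # sys.executable is the full path of the interpreter running this code
    result = subprocess.run(
        [sys.executable, str(script)],
        check=True,           # raise CalledProcessError on a non-zero exit status
        capture_output=True,  # collect stdout/stderr instead of printing them
        text=True,            # decode the output as str
    )
    outputs.append(result.stdout.strip())

print(outputs)
```

When Python is embedded in another program, as it is with reticulate, it is worth printing sys.executable first to confirm that it points at a Python interpreter.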
6 Organizing Directories
6.1 A Function to Identify File Type
First, let’s create a function that identifies the category of a file based on its suffix.
SUBDIRECTORIES = {
    "DOCUMENTS": ['.pdf', '.rtf', '.txt'],
    "AUDIO": ['.m4a', '.m4b', '.mp3'],
    "VIDEOS": ['.mov', '.avi', '.mp4'],
    "IMAGES": ['.jpg', '.jpeg', '.png']
}

def pickDirectory(value):
    for category, suffixes in SUBDIRECTORIES.items():
        for suffix in suffixes:
            if suffix == value:
                return category
    return 'MISC'  # if the file type doesn't exist in our dictionary
Let’s try the function pickDirectory().
print(pickDirectory('.pdf'))
## DOCUMENTS
print(pickDirectory('.png'))
## IMAGES
print(pickDirectory('.py'))
## MISC
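pickDirectory() expects the suffix itself and matches it exactly, so '.PDF' would fall into MISC. A possible variation, sketched here with hypothetical names (SUFFIX_TO_CATEGORY, pick_directory_for), inverts the dictionary once into a suffix-to-category lookup and uses pathlib to extract a lower-cased suffix from a full file name.

```python
from pathlib import Path

SUBDIRECTORIES = {
    "DOCUMENTS": ['.pdf', '.rtf', '.txt'],
    "AUDIO": ['.m4a', '.m4b', '.mp3'],
    "VIDEOS": ['.mov', '.avi', '.mp4'],
    "IMAGES": ['.jpg', '.jpeg', '.png'],
}

# Invert the mapping once: suffix -> category
SUFFIX_TO_CATEGORY = {
    suffix: category
    for category, suffixes in SUBDIRECTORIES.items()
    for suffix in suffixes
}

def pick_directory_for(filename):
    """Return the category for a file name, or 'MISC' if the suffix is unknown."""
    suffix = Path(filename).suffix.lower()
    return SUFFIX_TO_CATEGORY.get(suffix, "MISC")

print(pick_directory_for("report.PDF"))   # DOCUMENTS
print(pick_directory_for("holiday.png"))  # IMAGES
print(pick_directory_for("script.py"))    # MISC
```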
In the rest of the session, Sam showed how to reorganize files into relevant folders, and the code worked from RStudio. :) Please check out the course instructions for details.
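The course's own reorganizing code is not reproduced here. As a rough idea of what such a step can look like, the sketch below moves each file in a folder into a subfolder named after its category. Everything in it is an assumption made for illustration: the shortened CATEGORY_BY_SUFFIX mapping, the organize() helper, and the temporary demo folder with empty files.

```python
import shutil
import tempfile
from pathlib import Path

# Shortened, hypothetical suffix-to-category mapping
CATEGORY_BY_SUFFIX = {
    ".pdf": "DOCUMENTS", ".txt": "DOCUMENTS",
    ".mp3": "AUDIO", ".mp4": "VIDEOS", ".png": "IMAGES",
}

def organize(folder):
    """Move every file in folder into a subfolder named after its category."""
    folder = Path(folder)
    # Take a snapshot of the listing first, because the loop creates subfolders
    for item in list(folder.iterdir()):
        if not item.is_file():
            continue
        category = CATEGORY_BY_SUFFIX.get(item.suffix.lower(), "MISC")
        target_dir = folder / category
        target_dir.mkdir(exist_ok=True)
        shutil.move(str(item), str(target_dir / item.name))

# Build a throwaway folder with a few empty files to demonstrate
demo = Path(tempfile.mkdtemp())
for name in ["notes.txt", "song.mp3", "photo.png", "script.py"]:
    (demo / name).touch()

organize(demo)
moved = sorted(p.relative_to(demo).as_posix() for p in demo.rglob("*") if p.is_file())
print(moved)
```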
Chapter 1 is a good learning experience! Moving forward, I will continue with some Data Science focused Python courses, and maybe come back in the future for Chapter 4 on API.