Why you should start with Python in digital analytics
Python for digital analytics. This topic has been on my mind for a while. Last week I was in Hungary for the Digital Analytics event Superweek. After an amazing event on top of the Galya-tető mountain I was convinced to share my thoughts on this interesting topic. During the event I spoke with several people about the benefits of Python and how to use it in digital analytics. So, during an early morning walk in the snow with beautiful landscapes, I decided that it was time to share some thoughts and knowledge about this with the digital analytics community. Are you curious about the benefits and interested in why Python can be very helpful? I hope you get excited!
In the next part of this article I will share my thoughts on why you should start with Python if you work in digital analytics. So let’s move on!
Python makes your life easier
The first reason I started using Python in my daily work was the limitations of spreadsheets. We are all familiar with the frustrating error in Excel: Microsoft Excel is not responding… #%^!* you did not save the file and all the work is gone. I was already convinced because of this point but let’s continue. Excel can handle a maximum of 1,048,576 rows per worksheet, and with a few hundred thousand rows the calculations already get slower. Python can handle huge datasets (millions of rows), which makes it easier and faster to work with large amounts of data.
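As a minimal sketch of working past spreadsheet limits, the snippet below sums a CSV in chunks with pandas, so the full file never has to fit in memory at once. The column names (`page`, `sessions`), the function name and the tiny in-memory sample are my own assumptions for illustration; in practice you would pass the path of a large export file.

```python
import io

import pandas as pd

# Hypothetical sample standing in for a large export file.
csv_data = io.StringIO(
    "page,sessions\n"
    "/home,10\n"
    "/pricing,4\n"
    "/home,6\n"
    "/blog,3\n"
    "/pricing,1\n"
)


def sessions_per_page(source, chunksize=2):
    """Sum sessions per page while reading the CSV chunk by chunk."""
    partial_sums = [
        chunk.groupby("page")["sessions"].sum()
        for chunk in pd.read_csv(source, chunksize=chunksize)
    ]
    # Combine the per-chunk sums into one total per page.
    return pd.concat(partial_sums).groupby(level=0).sum()


totals = sessions_per_page(csv_data)
print(totals)
```

The `chunksize` of 2 is only there to force several chunks on this small sample; for a real file a much larger value would make more sense.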
And the life of your colleagues as well
I like to work with notebooks, where you can run the Python code in different cells. Your formulas and code are not hidden in a notebook, which helps to structure your analysis. You can split your notebook into different parts, for instance reading data, preprocessing data, analyzing and visualizing. The huge benefit of this is that your colleagues can easily check the steps you took to create the analysis and validate your work. A notebook laid out this way keeps the code structured and readable.
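To give an idea of what such a structure could look like, here is a small sketch with one step per cell. The `# %%` markers are the cell separators some editors recognize, and the data, column names and variable names are made up for illustration.

```python
# %% Read data (a small hypothetical sample instead of a real export)
import pandas as pd

raw = pd.DataFrame(
    {
        "date": ["2020-01-27", "2020-01-27", "2020-01-28", "2020-01-28"],
        "channel": ["organic", "paid", "organic", None],
        "sessions": [120, 80, 150, 40],
    }
)

# %% Preprocess data: drop rows without a channel and parse the dates
clean = raw.dropna(subset=["channel"]).copy()
clean["date"] = pd.to_datetime(clean["date"])

# %% Analyze: total sessions per channel
sessions_by_channel = clean.groupby("channel")["sessions"].sum()

# %% Visualize (a printed table here; a chart would go in this cell)
print(sessions_by_channel)
```

Because every step lives in its own cell, a colleague can rerun and inspect each intermediate result separately.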
Standardize your work
Reproducing your work is another benefit of using Python. You can copy your notebooks with code or simply change the data source to run the same code on different data. Furthermore, we all have standard steps in our analysis process. What is the shape of the data, are there missing or duplicate values, how is the data distributed? These are all questions to answer at the beginning of an analysis. Automating this standard work can save a lot of time.
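The standard questions above could be wrapped in one reusable function, roughly like the sketch below. The function name `first_look` and the sample data are my own assumptions; the pandas calls (`shape`, `isna`, `duplicated`) are standard.

```python
import pandas as pd


def first_look(df):
    """Return the standard checks: shape, missing values and duplicate rows."""
    return {
        "rows": df.shape[0],
        "columns": df.shape[1],
        "missing_per_column": df.isna().sum().to_dict(),
        "duplicate_rows": int(df.duplicated().sum()),
    }


# Hypothetical sample; swap in any other data source to rerun the same checks.
df = pd.DataFrame(
    {
        "page": ["/home", "/home", "/pricing", None],
        "sessions": [10, 10, 4, 7],
    }
)

report = first_look(df)
print(report)
```

Running the same function on every new dataset is one simple way to standardize the first minutes of an analysis.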
Explore your data
One of the things I always try to do before working on a new dataset is to dig into the data. I want to understand my new dataset. Python has some helpful functions to explore the data. For numeric columns, the .describe() function gives you the count of non-missing values, mean, standard deviation, minimum, maximum and quartiles (the 50% value is the median). Comparing the count with the number of rows shows how many values are missing per column. Only one line of code! When you do analysis in Google Analytics or Excel it is harder to see how your data is distributed. Adding these steps to your analysis will definitely increase the quality of the output.
# simple way to explore your new dataset
df.describe(include='all')
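Here is a small, self-contained sketch of that one-liner on made-up data, including how the count row can be used to spot missing values. The column names and values are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical sample with one missing value in each column.
df = pd.DataFrame(
    {
        "device": ["mobile", "desktop", "mobile", None],
        "sessions": [10.0, 20.0, None, 30.0],
    }
)

# include="all" also summarizes non-numeric columns (unique, top, freq).
summary = df.describe(include="all")
print(summary)

# "count" only counts non-missing values, so compare it with the number of rows.
missing = len(df) - summary.loc["count"]
print(missing)
```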
New skills = new possibilities
In my team we started using Python to read large CSV files and concatenate or join them. Without Python it is hard to do hit-level data analysis because of the export limits in Google Analytics. Another option is to send a hit ID as a custom dimension, but again you will reach the processing limits of your spreadsheet tool or Google Analytics. That is the reason why we made connections with Google BigQuery to do analysis on the raw analytics data. Thanks to the connection with BigQuery we could analyze millions of rows of hit-level data and get some very useful insights into, for example, the most visited funnels to a specific page on the website. Working with the raw data helps you to better understand how data is collected and processed. That is a great benefit!
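As a rough sketch of the concatenate, join and funnel steps described above, the snippet below uses two tiny in-memory CSV exports instead of real files or a BigQuery connection. All names here (`session_id`, `hit_number`, `page`, `channel`) are hypothetical and not the actual Google Analytics or BigQuery schema.

```python
import io

import pandas as pd

# Hypothetical hit-level exports, for example one file per day.
day_1 = io.StringIO(
    "session_id,hit_number,page\n"
    "s1,1,/home\n"
    "s1,2,/pricing\n"
    "s2,1,/blog\n"
    "s2,2,/pricing\n"
)
day_2 = io.StringIO(
    "session_id,hit_number,page\n"
    "s3,1,/home\n"
    "s3,2,/pricing\n"
)

# Concatenate the exports into one table of hits.
hits = pd.concat([pd.read_csv(f) for f in (day_1, day_2)], ignore_index=True)

# Join a hypothetical session-level table onto the hits.
sessions = pd.DataFrame(
    {"session_id": ["s1", "s2", "s3"], "channel": ["organic", "paid", "organic"]}
)
hits = hits.merge(sessions, on="session_id", how="left")

# Build the ordered page path per session.
paths = (
    hits.sort_values(["session_id", "hit_number"])
    .groupby("session_id")["page"]
    .agg(" > ".join)
)

# Count the most common paths that end on a specific page.
funnels = paths[paths.str.endswith("/pricing")].value_counts()
print(funnels)
```

With real data the same pattern applies, only the reading step changes: the hits would come from your export files or from a query result rather than from these sample strings.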
Concern 1: steep learning curve
The learning curve is steep and endless. Sure, it is hard to learn a new language and sometimes it will crush your brain. I believe it is all worth it. Besides that, there are so many courses and good reads on the internet. So, no excuses! If you are dedicated and start using Python you will see the benefits soon.
Concern 2: Python is for Data Scientists
Maybe you are thinking: Python, that is a programming language used by Data Scientists. Of course, when you start with Python you cannot call yourself a Data Scientist, but as Doug Hall mentioned during his great talk at Superweek 2020, you can definitely use Data Science skills as a digital analyst. As you can read above, you can do more than just Machine Learning.
Is Python the holy grail? Maybe not. Writing Python code is not always the best option. For some types of analysis a simpler tool such as Excel can be more helpful. Especially in the beginning, it will sometimes take a lot of time to write your code. You need to decide which option is best. In the long term you will see that Python helps you work more efficiently.
What is next?
Thanks for reading my first post about this topic. Next time I will explain how to get started with writing code. I hope you are excited and even more curious about Python. Let's stay in touch and see you soon!