How to Calculate Percentage Contributions Over Time in Python + R


One data science summarisation job that can be a little trickier than those we have discussed so far is calculating the percentage contribution to a total over time. This has many applications in business and science. In today's example we simply want to calculate the percentage of a month's snowfall that fell on each day of that month.

In Python we will use the .transform() method, which lets us sum the snowfall for each month and return those totals on the original daily index. We can then divide each day's snowfall by the total for its month, convert the result to a percentage, and fill any NaNs (such as those produced when a month has no snowfall at all) with zero.

Gathering aggregate values to calculate percentages in Python
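
As a rough illustration of the approach described above, here is a minimal sketch using pandas. The snowfall figures and the column names (snowfall_cm, monthly_total, pct_of_month) are made up for this example and are not taken from the original dataset, and grouping on the index converted to monthly periods is just one way to define the month.

```python
import pandas as pd

# Hypothetical daily snowfall readings (cm); values are invented for illustration.
df = pd.DataFrame(
    {
        "date": pd.to_datetime(
            [
                "2023-01-05", "2023-01-20",
                "2023-02-10", "2023-02-11",
                "2023-03-01", "2023-03-02",
            ]
        ),
        "snowfall_cm": [10.0, 30.0, 5.0, 15.0, 0.0, 0.0],
    }
).set_index("date")

# Group the daily rows by calendar month, sum the snowfall in each group,
# and let transform() return the monthly total on every daily row.
month = df.index.to_period("M")
df["monthly_total"] = df.groupby(month)["snowfall_cm"].transform("sum")

# Daily share of the monthly total as a percentage. A month with no snowfall
# gives 0 / 0 = NaN, which is replaced with zero.
df["pct_of_month"] = (df["snowfall_cm"] / df["monthly_total"] * 100).fillna(0)

print(df)
```

Because transform() returns a result with the same index as the input, the monthly totals line up with the daily rows without any explicit join.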

Doing the same job in R requires a different approach, since we don't have an index or a similar .transform() method. Instead, we can use either a combination of a group by and summarise or the summarise_by_time function from the timetk package. We'll use the latter in the code example below. Although we don't have an index, we can use the bind_cols() function to join the dates to the percent contributions. This works because the percentage for every row in the original table is calculated within the summarisation, so the result has the same number of rows as the original dataset.

Gathering aggregate values to calculate percentages in R

Calculating percentage contributions over time is important for many applications in data science and can be used not only to investigate seasonal changes in natural phenomena such as snowfall but also patterns in human behaviour such as spending.

Remember to like and subscribe to the Data Science Code in Python + R newsletter to get your weekly reminder and commit to learning a little more Python and/or R every week. Each newsletter covers how to do one data science job in both Python and R.


More articles by Matt Rosinski
