Mastering Row-Level Transformations in Pandas with apply()

Mastering Row-Level Transformations in Pandas with apply()

Row-level transformations are crucial in data processing and feature engineering, allowing us to modify datasets dynamically. In this article, you’ll learn how to use Pandas' apply() method for row-wise operations, including:

  • Creating derived columns based on calculations.
  • Handling missing values effectively.
  • Applying conditional logic for business insights.

Let’s dive in!



Why Use apply() for Row-Level Transformations?

The apply() method allows applying a function to each row or column of a Pandas DataFrame. It’s useful when:

  • Built-in Pandas operations aren't sufficient.
  • Row-level calculations are required.
  • Custom logic or conditions must be applied to individual records.


You can download the datasets from the following GitHub link: GitHub Datasets

Step 1: Load the Dataset

Let’s load the dataset and preview the first 10 records.

import pandas as pd

# Load the dataset
toyota_sales_data = pd.read_csv("data/car_sales/toyota_sales_data.csv")

# Preview the dataset
print(toyota_sales_data.head())        

This dataset contains sale amount and commission percentage, which we will use for transformations.


Step 2: Understanding the apply() Method

Before using apply(), let’s check its documentation:

help(pd.DataFrame.apply)        

Key parameters of apply():

  • func → Function to apply.
  • axis=0 (default) → Apply the function to columns.
  • axis=1 → Apply the function to rows.

Since we are performing row-wise transformations, we will set axis=1.


Step 3: Creating a Derived Column - Commission Amount

Problem Statement:

We need to calculate the Commission Amount for each sale using:

Method 1: Using a Custom Function

# Define a function to calculate commission amount
def calculate_commission(sale):
    return sale["sale_amount"] * sale["commission_percentage"]

# Apply function to calculate commission amount for each row
toyota_sales_data["commission_amount"] = toyota_sales_data.apply(calculate_commission, axis=1)

# Preview the data
print(toyota_sales_data.head())        

Step 4: Handling Missing Values

Some commission_percentage values are missing (NaN). If not handled, NaN values will propagate.

Solution: Assign a Default Commission Rate

If commission_percentage is missing, we assume a default value of 0%.

# Enhanced function to handle NaN values
def calculate_commission_safe(sale):
    commission_pct = sale["commission_percentage"] if pd.notnull(sale["commission_percentage"]) else 0
    return sale["sale_amount"] * commission_pct

# Apply function
toyota_sales_data["commission_amount"] = toyota_sales_data.apply(calculate_commission_safe, axis=1)

# Preview data
print(toyota_sales_data.head())        

Step 5: Using a Lambda Function

We can simplify the function using a Lambda expression.

toyota_sales_data["commission_amount"] = toyota_sales_data.apply(
    lambda sale: sale["sale_amount"] * (sale["commission_percentage"] if pd.notnull(sale["commission_percentage"]) else 0),
    axis=1
)

# Preview data
print(toyota_sales_data.head())        

Step 6: Adding a Flag Column for High Commissions

Problem Statement:

We want to identify sales where the commission amount exceeds $1,000.

Solution:

Create a new column high_commission, which stores:

  • True if Commission Amount > $1,000
  • False otherwise

# Define function to flag high commission amounts
def flag_high_commission(sale):
    return sale["commission_amount"] > 1000

# Apply function to create a new column
toyota_sales_data["high_commission"] = toyota_sales_data.apply(flag_high_commission, axis=1)

# Preview data
print(toyota_sales_data.head())        

Step 7: Removing Unnecessary Columns

Since we no longer need sale_date and sale_status, let's drop them.

# Drop unnecessary columns
toyota_sales_data = toyota_sales_data.drop(columns=["sale_date", "sale_status"])

# Preview data
print(toyota_sales_data.head())        

Step 8: Combining Multiple Transformations

We can calculate commission and flag high commissions in one step:

# Define combined transformation
def calculate_commission_and_flag(sale):
    commission_pct = sale["commission_percentage"] if pd.notnull(sale["commission_percentage"]) else 0
    commission_amount = sale["sale_amount"] * commission_pct
    high_commission = commission_amount > 1000
    return pd.Series([commission_amount, high_commission], index=["commission_amount", "high_commission"])

# Apply function to create two new columns
toyota_sales_data[["commission_amount", "high_commission"]] = toyota_sales_data.apply(calculate_commission_and_flag, axis=1)

# Preview data
print(toyota_sales_data.head())        

Best Practices for Using apply()

  • Use apply() when built-in functions are not sufficient.
  • Use lambda for simple calculations and functions for complex logic.
  • Ensure missing values (NaN) are handled properly.
  • Apply transformations in a single pass for efficiency.


Practice Assignment

💡 Want to practice? Attempt the Implementing Custom Transformations with apply() in Pandas Assignment👉 Click Here.

💬 Need help? Leave a comment, and we’ll assist you!


What’s Next?

In the next lecture, we will explore Adding and Updating Columns in a Pandas DataFrame. This is a crucial aspect of data manipulation, allowing us to enhance, modify, and restructure our datasets effectively.


Click 👇 to Enroll in the Python for Beginners: Learn Python with Hands-on Projects. It only costs $10 and you can reach out to us for $10 Coupon.

Conclusion

In this article, you learned:

  • How to use apply() for row-level transformations.
  • How to handle missing values while performing calculations.
  • How to flag data points based on custom conditions.
  • How to perform multiple transformations efficiently.

Mastering apply() enables powerful transformations in Pandas, making your data processing workflows more efficient!


📢 Engage With Us!

Authored by Siva Kalyan Geddada , Abhinav Sai Penmetsa

🔄 Share this article with anyone interested in data engineering, Python, or data analysis. 💬 Have questions or need help? Comment below! Let's discuss.

🚀 Follow us for more hands-on data science tutorials!

To view or add a comment, sign in

More articles by ITVersity, Inc.

Insights from the community

Others also viewed

Explore topics