Mastering Row-Level Transformations in Pandas with apply()

ITVersity, Inc.

making IT resourceful (𝗗𝗮𝘁𝗮 𝗘𝗻𝗴𝗶𝗻𝗲𝗲𝗿𝗶𝗻𝗴 𝗟𝗔𝗕𝗦)

Published Feb 20, 2025

Row-level transformations are crucial in data processing and feature engineering, allowing us to modify datasets dynamically. In this article, you’ll learn how to use Pandas' apply() method for row-wise operations, including:

Creating derived columns based on calculations.
Handling missing values effectively.
Applying conditional logic for business insights.

Let’s dive in!

Why Use apply() for Row-Level Transformations?

The apply() method allows applying a function to each row or column of a Pandas DataFrame. It’s useful when:

Built-in Pandas operations aren't sufficient.
Row-level calculations are required.
Custom logic or conditions must be applied to individual records.

You can download the datasets from the following GitHub link: GitHub Datasets

Step 1: Load the Dataset

Let’s load the dataset and preview the first 10 records.

import pandas as pd

# Load the dataset
toyota_sales_data = pd.read_csv("data/car_sales/toyota_sales_data.csv")

# Preview the dataset
print(toyota_sales_data.head())

This dataset contains sale amount and commission percentage, which we will use for transformations.

Step 2: Understanding the apply() Method

Before using apply(), let’s check its documentation:

help(pd.DataFrame.apply)

Key parameters of apply():

func → Function to apply.
axis=0 (default) → Apply the function to columns.
axis=1 → Apply the function to rows.

Since we are performing row-wise transformations, we will set axis=1.

Step 3: Creating a Derived Column - Commission Amount

Problem Statement:

We need to calculate the Commission Amount for each sale using:

Method 1: Using a Custom Function

# Define a function to calculate commission amount
def calculate_commission(sale):
    return sale["sale_amount"] * sale["commission_percentage"]

# Apply function to calculate commission amount for each row
toyota_sales_data["commission_amount"] = toyota_sales_data.apply(calculate_commission, axis=1)

# Preview the data
print(toyota_sales_data.head())

Step 4: Handling Missing Values

Some commission_percentage values are missing (NaN). If not handled, NaN values will propagate.

Solution: Assign a Default Commission Rate

If commission_percentage is missing, we assume a default value of 0%.

# Enhanced function to handle NaN values
def calculate_commission_safe(sale):
    commission_pct = sale["commission_percentage"] if pd.notnull(sale["commission_percentage"]) else 0
    return sale["sale_amount"] * commission_pct

# Apply function
toyota_sales_data["commission_amount"] = toyota_sales_data.apply(calculate_commission_safe, axis=1)

# Preview data
print(toyota_sales_data.head())

Step 5: Using a Lambda Function

We can simplify the function using a Lambda expression.

Step 6: Adding a Flag Column for High Commissions

Problem Statement:

We want to identify sales where the commission amount exceeds $1,000.

Solution:

Create a new column high_commission, which stores:

True if Commission Amount > $1,000
False otherwise

# Define function to flag high commission amounts
def flag_high_commission(sale):
    return sale["commission_amount"] > 1000

# Apply function to create a new column
toyota_sales_data["high_commission"] = toyota_sales_data.apply(flag_high_commission, axis=1)

# Preview data
print(toyota_sales_data.head())

Step 7: Removing Unnecessary Columns

Since we no longer need sale_date and sale_status, let's drop them.

# Drop unnecessary columns
toyota_sales_data = toyota_sales_data.drop(columns=["sale_date", "sale_status"])

# Preview data
print(toyota_sales_data.head())

Step 8: Combining Multiple Transformations

We can calculate commission and flag high commissions in one step:

# Define combined transformation
def calculate_commission_and_flag(sale):
    commission_pct = sale["commission_percentage"] if pd.notnull(sale["commission_percentage"]) else 0
    commission_amount = sale["sale_amount"] * commission_pct
    high_commission = commission_amount > 1000
    return pd.Series([commission_amount, high_commission], index=["commission_amount", "high_commission"])

# Apply function to create two new columns
toyota_sales_data[["commission_amount", "high_commission"]] = toyota_sales_data.apply(calculate_commission_and_flag, axis=1)

# Preview data
print(toyota_sales_data.head())

Best Practices for Using apply()

Use apply() when built-in functions are not sufficient.
Use lambda for simple calculations and functions for complex logic.
Ensure missing values (NaN) are handled properly.
Apply transformations in a single pass for efficiency.

Practice Assignment

💡 Want to practice? Attempt the Implementing Custom Transformations with apply() in Pandas Assignment👉 Click Here.

💬 Need help? Leave a comment, and we’ll assist you!

What’s Next?

In the next lecture, we will explore Adding and Updating Columns in a Pandas DataFrame. This is a crucial aspect of data manipulation, allowing us to enhance, modify, and restructure our datasets effectively.

Click 👇 to Enroll in the Python for Beginners: Learn Python with Hands-on Projects. It only costs $10 and you can reach out to us for $10 Coupon.

Conclusion

In this article, you learned:

How to use apply() for row-level transformations.
How to handle missing values while performing calculations.
How to flag data points based on custom conditions.
How to perform multiple transformations efficiently.

Mastering apply() enables powerful transformations in Pandas, making your data processing workflows more efficient!

📢 Engage With Us!

✨ Authored by Siva Kalyan Geddada , Abhinav Sai Penmetsa

🔄 Share this article with anyone interested in data engineering, Python, or data analysis. 💬 Have questions or need help? Comment below! Let's discuss.

🚀 Follow us for more hands-on data science tutorials!

Mastering Row-Level Transformations in Pandas with apply()

ITVersity, Inc.

making IT resourceful (𝗗𝗮𝘁𝗮 𝗘𝗻𝗴𝗶𝗻𝗲𝗲𝗿𝗶𝗻𝗴 𝗟𝗔𝗕𝗦)

Why Use apply() for Row-Level Transformations?

Step 1: Load the Dataset

Step 2: Understanding the apply() Method

Step 3: Creating a Derived Column - Commission Amount

Problem Statement:

Method 1: Using a Custom Function

Step 4: Handling Missing Values

Solution: Assign a Default Commission Rate

Step 5: Using a Lambda Function

Recommended by LinkedIn

Step 6: Adding a Flag Column for High Commissions

Problem Statement:

Solution:

Step 7: Removing Unnecessary Columns

Step 8: Combining Multiple Transformations

Best Practices for Using apply()

Practice Assignment

What’s Next?

Conclusion

📢 Engage With Us!

AI, Data and Cloud Updates

2,604 followers

More articles by ITVersity, Inc.

Insights from the community

Others also viewed

How to index data into Vector DB from highly unstructured pdfs

Preparing Data for EDA

6-Month Roadmap to Master Data Structures and Algorithms: From Beginner to Advanced

R Basics in Data Science

Data Visualization and EDA

Pandas vs. Polars: Would You Replace Pandas with Polars?

Mastering Pandas: Key Functions You Need to Know

Data cleaning tools

Python Pandas

An Introduction to Principal Component Analysis (PCA) for Developers

Explore topics

Why Use apply() for Row-Level Transformations?

Step 1: Load the Dataset

Step 2: Understanding the apply() Method

Step 3: Creating a Derived Column - Commission Amount

Problem Statement:

Method 1: Using a Custom Function

Step 4: Handling Missing Values

Solution: Assign a Default Commission Rate

Step 5: Using a Lambda Function

Recommended by LinkedIn

Step 6: Adding a Flag Column for High Commissions

Problem Statement:

Solution:

Step 7: Removing Unnecessary Columns

Step 8: Combining Multiple Transformations

Best Practices for Using apply()

Practice Assignment

What’s Next?

Conclusion

📢 Engage With Us!

AI, Data and Cloud Updates

2,604 followers

More articles by ITVersity, Inc.

ITVersity AI and Data Newsletter - Edition - May 6, 2025

Top Online Resources & Job Updates - Apr 25, 2025

ITVersity AI and Data Newsletter - Edition - April 22, 2025

Top Online Resources & Job Updates - Edition - Apr 18, 2025

ITVersity AI and Data Newsletter - Edition - April 15, 2025

ITVersity AI and Data Newsletter - Edition - April 8, 2025

Top Online Resources & Job Updates - Edition - Apr 4, 2025

The Future of AI Unfolds!

Top Online Resources & Job Updates - Edition - Mar 28, 2025

ITVersity AI and Data Newsletter - Edition - Mar 25, 2025

Insights from the community

Others also viewed

How to index data into Vector DB from highly unstructured pdfs

Preparing Data for EDA

6-Month Roadmap to Master Data Structures and Algorithms: From Beginner to Advanced

R Basics in Data Science

Data Visualization and EDA

Pandas vs. Polars: Would You Replace Pandas with Polars?

Mastering Pandas: Key Functions You Need to Know

Data cleaning tools

Python Pandas

An Introduction to Principal Component Analysis (PCA) for Developers

Explore topics