Mastering Row-Level Transformations in Pandas with apply()
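Row-level transformations are central to data processing and feature engineering because they let us derive new values from the existing columns of each record. In this article, you’ll learn how to use Pandas' apply() method for row-wise operations, including creating derived columns, handling missing values, using lambda functions, adding flag columns, and combining multiple transformations in one function.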
Let’s dive in!
Why Use apply() for Row-Level Transformations?
The apply() method allows applying a function to each row or column of a Pandas DataFrame. It’s useful when:
You can download the datasets from the following GitHub link: GitHub Datasets
Step 1: Load the Dataset
Let’s load the dataset and preview the first five records (head() returns five rows by default).
import pandas as pd
# Load the dataset
toyota_sales_data = pd.read_csv("data/car_sales/toyota_sales_data.csv")
# Preview the dataset
print(toyota_sales_data.head())
The dataset includes the sale_amount and commission_percentage columns, which we will use for the transformations.
Step 2: Understanding the apply() Method
Before using apply(), let’s check its documentation:
help(pd.DataFrame.apply)
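Key parameters of apply() include func (the function to apply), axis (0 or "index" passes each column to the function; 1 or "columns" passes each row), and result_type (controls how list-like return values are expanded when axis=1).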
Since we are performing row-wise transformations, we will set axis=1.
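To make the effect of axis concrete, here is a minimal sketch on a made-up two-row DataFrame. The frame demo and its columns a and b are hypothetical and are not part of the Toyota dataset.

```python
import pandas as pd

# Hypothetical two-row frame, used only to illustrate the axis parameter
demo = pd.DataFrame({"a": [1, 2], "b": [10, 20]})

# axis=0 (the default): the function receives each column as a Series
column_sums = demo.apply(lambda col: col.sum(), axis=0)

# axis=1: the function receives each row as a Series
row_sums = demo.apply(lambda row: row.sum(), axis=1)

print(column_sums)  # indexed by column label: a, b
print(row_sums)     # indexed by row label: 0, 1
```

With axis=1 the result has one value per row, which is why it can be assigned directly to a new column.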
Step 3: Creating a Derived Column - Commission Amount
Problem Statement:
We need to calculate the Commission Amount for each sale using:
commission_amount = sale_amount * commission_percentage
Method 1: Using a Custom Function
# Define a function to calculate commission amount
def calculate_commission(sale):
    return sale["sale_amount"] * sale["commission_percentage"]
# Apply function to calculate commission amount for each row
toyota_sales_data["commission_amount"] = toyota_sales_data.apply(calculate_commission, axis=1)
# Preview the data
print(toyota_sales_data.head())
Step 4: Handling Missing Values
Some commission_percentage values are missing (NaN). If they are not handled, the NaN propagates into commission_amount, because any arithmetic with NaN yields NaN.
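The propagation can be reproduced on a small made-up sample. The two rows below are hypothetical values chosen for illustration, and the percentage is assumed to be stored as a fraction.

```python
import numpy as np
import pandas as pd

# Hypothetical sample rows; the second one has no commission percentage
sample = pd.DataFrame({
    "sale_amount": [20000.0, 30000.0],
    "commission_percentage": [0.05, np.nan],
})

# Same arithmetic as calculate_commission, without any NaN handling
sample["commission_amount"] = sample.apply(
    lambda sale: sale["sale_amount"] * sale["commission_percentage"], axis=1
)

# True marks the rows whose commission could not be computed
print(sample["commission_amount"].isna().tolist())
```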
Solution: Assign a Default Commission Rate
If commission_percentage is missing, we assume a default value of 0%.
# Enhanced function to handle NaN values
def calculate_commission_safe(sale):
    commission_pct = sale["commission_percentage"] if pd.notnull(sale["commission_percentage"]) else 0
    return sale["sale_amount"] * commission_pct
# Apply function
toyota_sales_data["commission_amount"] = toyota_sales_data.apply(calculate_commission_safe, axis=1)
# Preview data
print(toyota_sales_data.head())
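For a calculation this simple, the same default can also be applied without apply(), using fillna() and column arithmetic. This is an alternative sketch rather than the method used in this article, and the sample rows are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical sample rows; the second one has no commission percentage
sample = pd.DataFrame({
    "sale_amount": [20000.0, 30000.0],
    "commission_percentage": [0.05, np.nan],
})

# Replace missing percentages with 0, then multiply the two columns directly
sample["commission_amount"] = (
    sample["sale_amount"] * sample["commission_percentage"].fillna(0)
)

print(sample)
```

Column arithmetic like this usually runs faster than a row-wise apply() on large datasets, while apply() remains the more flexible choice when the per-row logic has branches that are hard to express with column operations.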
Step 5: Using a Lambda Function
We can simplify the function using a Lambda expression.
toyota_sales_data["commission_amount"] = toyota_sales_data.apply(
    lambda sale: sale["sale_amount"] * (sale["commission_percentage"] if pd.notnull(sale["commission_percentage"]) else 0),
    axis=1
)
# Preview data
print(toyota_sales_data.head())
Step 6: Adding a Flag Column for High Commissions
Problem Statement:
We want to identify sales where the commission amount exceeds $1,000.
Solution:
Create a new column high_commission, which stores True when commission_amount exceeds 1000 and False otherwise.
# Define function to flag high commission amounts
def flag_high_commission(sale):
    return sale["commission_amount"] > 1000
# Apply function to create a new column
toyota_sales_data["high_commission"] = toyota_sales_data.apply(flag_high_commission, axis=1)
# Preview data
print(toyota_sales_data.head())
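Once the flag exists, it can be used directly as a filter. The sketch below uses three hypothetical commission amounts to show that the comparison is strict, so a commission of exactly 1000 is not flagged.

```python
import pandas as pd

# Hypothetical commission amounts: below, exactly at, and above the threshold
sample = pd.DataFrame({"commission_amount": [800.0, 1000.0, 1500.0]})

# Same row-wise comparison as flag_high_commission
sample["high_commission"] = sample.apply(
    lambda sale: sale["commission_amount"] > 1000, axis=1
)

# A boolean column can be used as a mask to keep only the flagged rows
high_sales = sample[sample["high_commission"]]

print(high_sales)
```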
Step 7: Removing Unnecessary Columns
Since we no longer need sale_date and sale_status, let's drop them.
# Drop unnecessary columns
toyota_sales_data = toyota_sales_data.drop(columns=["sale_date", "sale_status"])
# Preview data
print(toyota_sales_data.head())
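One practical note: running the drop a second time, for example when re-executing a notebook cell, raises a KeyError because the columns are already gone. A possible workaround, sketched here on a hypothetical frame (the sale_id column and all values are assumptions), is the errors="ignore" option of drop().

```python
import pandas as pd

# Hypothetical frame with the two columns we want to remove
sample = pd.DataFrame({
    "sale_id": [1, 2],
    "sale_date": ["2024-01-05", "2024-01-06"],
    "sale_status": ["Completed", "Pending"],
})

# First drop removes the columns
trimmed = sample.drop(columns=["sale_date", "sale_status"])

# errors="ignore" skips labels that are no longer present instead of raising
trimmed_again = trimmed.drop(columns=["sale_date", "sale_status"], errors="ignore")

print(list(trimmed_again.columns))
```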
Step 8: Combining Multiple Transformations
We can calculate commission and flag high commissions in one step:
# Define combined transformation
def calculate_commission_and_flag(sale):
    commission_pct = sale["commission_percentage"] if pd.notnull(sale["commission_percentage"]) else 0
    commission_amount = sale["sale_amount"] * commission_pct
    high_commission = commission_amount > 1000
    return pd.Series([commission_amount, high_commission], index=["commission_amount", "high_commission"])
# Apply function to create two new columns
toyota_sales_data[["commission_amount", "high_commission"]] = toyota_sales_data.apply(calculate_commission_and_flag, axis=1)
# Preview data
print(toyota_sales_data.head())
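As a variation on the combined transformation, the function can return a plain tuple and let apply() expand it into columns through result_type="expand". This is an alternative sketch, not the approach used above. The sample rows and the function name commission_and_flag are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical sample rows; the first one has no commission percentage
sample = pd.DataFrame({
    "sale_amount": [20000.0, 30000.0],
    "commission_percentage": [np.nan, 0.05],
})

def commission_and_flag(sale):
    # Fall back to 0 when the percentage is missing
    pct = sale["commission_percentage"] if pd.notnull(sale["commission_percentage"]) else 0
    amount = sale["sale_amount"] * pct
    return amount, amount > 1000

# result_type="expand" turns each returned tuple into one row of a new DataFrame
expanded = sample.apply(commission_and_flag, axis=1, result_type="expand")
expanded.columns = ["commission_amount", "high_commission"]

# Attach the two new columns next to the original ones
result = pd.concat([sample, expanded], axis=1)

print(result)
```

Returning a tuple avoids building a pd.Series for every row, at the cost of naming the columns in a separate step.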
Best Practices for Using apply()
Practice Assignment
💡 Want to practice? Attempt the Implementing Custom Transformations with apply() in Pandas Assignment👉 Click Here.
💬 Need help? Leave a comment, and we’ll assist you!
What’s Next?
In the next lecture, we will explore Adding and Updating Columns in a Pandas DataFrame. This is a crucial aspect of data manipulation, allowing us to enhance, modify, and restructure our datasets effectively.
Click 👇 to enroll in Python for Beginners: Learn Python with Hands-on Projects. It costs only $10, and you can reach out to us for a $10 coupon.
Conclusion
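In this article, you learned how to use apply() with axis=1 to create a derived column, handle missing values with a default, shorten a transformation with a lambda, add a boolean flag column, drop columns you no longer need, and return several new columns from one function.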
Mastering apply() enables powerful transformations in Pandas, making your data processing workflows more efficient!
📢 Engage With Us!
✨ Authored by Siva Kalyan Geddada , Abhinav Sai Penmetsa
🔄 Share this article with anyone interested in data engineering, Python, or data analysis. 💬 Have questions or need help? Comment below! Let's discuss.
🚀 Follow us for more hands-on data science tutorials!