Data Analysts: A Gentle & Practical Guide to Data Manipulation with Pandas

Adalbert Ngongang

Stats Enthusiast | Data Advocate | Strategic Thinker | AI Observer

Published Dec 10, 2024

“The true power of a data analyst lies in how they manipulate and breathe life into raw data.”

Welcome to your practical guide to Pandas, a versatile Python library that empowers data analysts to transform raw data into actionable insights. Whether you're just starting out in data analysis or transitioning into this field, this guide introduces key data manipulation techniques with Pandas in a way that’s clear, practical, and relevant.

By the end, you’ll have a solid foundation in Pandas and, hopefully, the confidence to take on real-world data challenges. Let’s go!

1. What is Pandas and Why Should You Care?

Imagine trying to organise a chaotic mountain of raw data. Doing so manually would be tedious and full of errors. Enter Pandas—your Swiss Army knife for working with data in Python.

Why do analysts love Pandas?

Simple yet powerful structures: Two core building blocks, the Series for 1D data (think of it as a single column in a table) and the DataFrame for 2D tabular data (like a full Excel sheet or SQL table).
Automatic alignment: Operations between Series or DataFrames match values by their index labels rather than by position, inserting NaN (missing) where labels don't line up. This handles gaps gracefully and reduces the risk of common manual errors such as mismatched rows.
Rich functionality: From slicing, filtering, and grouping data to reshaping and merging datasets, Pandas covers most everyday data manipulation tasks.
Time series support: Got timestamps or dates? Pandas can parse them, index by them, and resample them into daily, monthly, or other periods.
Plays well with others: It integrates seamlessly with NumPy, Matplotlib, Scikit-learn, and other Python data libraries.

In essence, Pandas is to data analysis in Python what Excel is to spreadsheets, but significantly more flexible and scalable.

2. Getting Hands-On: Installation and Setup

Before we get into the fun stuff, let’s set up Pandas.

Installation

Not installed yet? A quick pip command is all you need:

pip install pandas

Importing

The conventional way to import Pandas is:

import pandas as pd

This shorthand (pd) is widely used, so sticking to it will help as you learn from others or share your code.

3. Understanding the Core Pandas Structures

To harness Pandas effectively, you need a good grasp of its building blocks:

3.1 Series: The 1D Powerhouse

A Series is like a single column in a spreadsheet, but smarter: every value carries a label (its index), and operations apply to the whole column at once.

Here’s how you can create one:

# From a list
data = [10, 20, 30]
s = pd.Series(data)  
print(s)

Want custom labels instead of default numeric indexing? No problem:

s = pd.Series(data, index=['a', 'b', 'c'])
print(s)
# Access elements: by label or by position
print(s['a'])      # By label
print(s.iloc[0])   # By position (plain s[0] relies on a positional fallback that recent pandas versions deprecate)

3.2 DataFrame: Your Tabular Best Friend

A DataFrame is a table with rows and columns. Think of it as an Excel sheet, but with Python’s flexibility.

Here’s how to create a DataFrame:

# From a dictionary
data = {'Name': ['Alice', 'Bob'], 'Age': [25, 30], 'City': ['London', 'Paris']}
df = pd.DataFrame(data)
print(df)

Access any column by its name:

# Accessing a column
print(df['Name'])  # Outputs a Series
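Adding rows and columns is nearly as direct. The sketch below rebuilds the same two-person table; the 'Country' column and the 'Carol' row are illustrative values, not part of the original example.

```python
import pandas as pd

data = {'Name': ['Alice', 'Bob'], 'Age': [25, 30], 'City': ['London', 'Paris']}
df = pd.DataFrame(data)

# Add a column by assigning a list with one value per row
df['Country'] = ['UK', 'France']

# Add a row: build a one-row DataFrame and concatenate it.
# DataFrame.append was removed in pandas 2.0, so pd.concat is the usual route.
new_row = pd.DataFrame([{'Name': 'Carol', 'Age': 35, 'City': 'Berlin', 'Country': 'Germany'}])
df = pd.concat([df, new_row], ignore_index=True)

# Access a single row by its index label
print(df.loc[2])
print(df.shape)
```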

4. Importing Data: From File to DataFrame

Not all data comes pre-loaded. You’ll often read files (CSV, Excel, etc.) into Pandas.

# Reading a CSV file
df = pd.read_csv('data.csv')

It’s that simple! For other formats, Pandas has dedicated readers such as pd.read_excel for Excel (reading .xlsx files requires an engine such as openpyxl to be installed) and pd.read_json for JSON.
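Real files are rarely perfectly tidy, so read_csv takes parameters to adapt. The sketch below uses io.StringIO as a stand-in for a file on disk, with a made-up semicolon-separated orders table; sep, usecols, and parse_dates are genuine read_csv parameters.

```python
import io
import pandas as pd

# Stand-in for a file on disk; read_csv accepts any file-like object
csv_text = """order_id;order_date;amount;notes
1;2024-01-05;19.99;first order
2;2024-01-06;5.50;
3;2024-02-10;42.00;gift
"""

df = pd.read_csv(
    io.StringIO(csv_text),
    sep=";",                                       # this file uses semicolons, not commas
    usecols=["order_id", "order_date", "amount"],  # load only the columns you need
    parse_dates=["order_date"],                    # convert text dates to datetime
)
print(df.dtypes)
```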

5. Exploring Your Data

Before manipulating data, you need to understand what you’re working with.

First rows: Get a quick preview of the dataset.

print(df.head(5))  # First 5 rows

Dimensions & Summary:

print(df.shape)  # Output: (rows, columns)

df.info()  # Detailed metadata: column names, types, non-null counts (info() prints its own report and returns None, so no print() is needed)

print(df.describe())  # Statistical summary for numeric columns
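Two more quick checks are often worth running alongside these: column types and category counts. A minimal sketch on a small made-up table (dtypes, value_counts, and describe(include="all") are the pandas calls being shown):

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Chloe", "Dev"],
    "Age": [25, 30, 35, 40],
    "City": ["London", "Paris", "London", "Berlin"],
})

print(df.dtypes)                   # data type of each column
print(df["City"].value_counts())   # how often each category appears
print(df.describe())               # numeric columns only, by default
print(df.describe(include="all"))  # adds count/unique/top/freq for text columns
```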

6. Selecting and Filtering Data

Now that you’ve seen your data, let’s dive into slicing, dicing, and filtering.

6.1 Columns

Extract a column with ease:

# Single column (Series)
ages = df['Age']

# Multiple columns (DataFrame)
subset = df[['Name', 'City']]

6.2 Rows

This is where indexing comes into play: loc selects rows by index label, and iloc selects them by integer position.

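# Select by label using loc
print(df.loc[0])  # Row whose index label is 0 (the first row with the default index)
print(df.loc[0:2, ['Name', 'Age']])  # Labels 0 through 2 inclusive (loc slices include the end label), Name and Age columns

# Select by position using iloc
print(df.iloc[0])  # First row by position, whatever its label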
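The difference between loc and iloc matters most when slicing: loc includes the end label, while iloc excludes the end position, just like Python lists. A small sketch on an illustrative four-row table:

```python
import pandas as pd

df = pd.DataFrame(
    {"Name": ["Alice", "Bob", "Chloe", "Dev"], "Age": [25, 30, 35, 40]}
)

# loc slices by label and INCLUDES the end label: rows 0, 1 and 2
by_label = df.loc[0:2, ["Name", "Age"]]

# iloc slices by position and EXCLUDES the end, like Python lists: rows 0 and 1
by_position = df.iloc[0:2, [0, 1]]

print(len(by_label), len(by_position))  # 3 2

# With a non-numeric index, label and position are clearly different things
df_named = df.set_index("Name")
print(df_named.loc["Bob", "Age"])   # lookup by row label and column name
print(df_named.iloc[1, 0])          # the same cell by row and column position
```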

6.3 Filtering with Conditions

Want rows with specific criteria?

# Filter rows where Age > 25
filtered = df[df['Age'] > 25]

Combine conditions for even more precision:

filtered = df[(df['Age'] > 25) & (df['City'] == 'London')]
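A few more filtering tools follow the same pattern. This sketch, on an illustrative four-row table, shows why each condition needs its own parentheses, plus OR (|), NOT (~), isin, and the string-based query method:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Chloe", "Dev"],
    "Age": [25, 30, 35, 40],
    "City": ["London", "Paris", "London", "Berlin"],
})

# Parentheses are required: in Python, & and | bind tighter than > and ==
older_londoners = df[(df["Age"] > 25) & (df["City"] == "London")]

# | means OR, ~ means NOT
paris_or_young = df[(df["City"] == "Paris") | (df["Age"] < 30)]
not_london = df[~(df["City"] == "London")]

# isin tests membership in a list of values
selected = df[df["City"].isin(["Paris", "Berlin"])]

# query expresses the first filter as a readable string
same_as_first = df.query("Age > 25 and City == 'London'")
```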

7. Basic Data Manipulation

Data manipulation is where Pandas really shines.

7.1 Adding Columns

Introduce new insights by creating calculated fields:

df['Age_Double'] = df['Age'] * 2

7.2 Dropping Columns or Rows

Remove unneeded data:

# Drop the Age_Double column
df = df.drop('Age_Double', axis=1)
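The drop call above removes a column; rows work the same way. A minimal sketch on an illustrative table, using the columns= and index= keywords as more readable alternatives to axis=, and a boolean filter for dropping rows by condition:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Chloe"],
    "Age": [25, 30, 35],
    "Age_Double": [50, 60, 70],
})

# columns= is an explicit alternative to axis=1
df = df.drop(columns=["Age_Double"])

# Drop rows by index label (here, the row labelled 1)
df = df.drop(index=[1])

# Drop rows that fail a condition by keeping only the rest
df = df[df["Age"] < 35]
print(df)
```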

7.3 Sorting Data

Reorganise your data for clarity:

# Sort by Age (ascending)
sorted_df = df.sort_values('Age')
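sort_values also handles descending order and multiple sort keys. A sketch on an illustrative table:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Chloe", "Dev"],
    "Age": [25, 30, 25, 40],
    "City": ["London", "Paris", "Berlin", "London"],
})

# Largest first
oldest_first = df.sort_values("Age", ascending=False)

# Several keys, each with its own direction: City A to Z, then Age high to low
by_city_then_age = df.sort_values(["City", "Age"], ascending=[True, False])

# Renumber the index after sorting if you no longer need the old labels
by_city_then_age = by_city_then_age.reset_index(drop=True)
print(by_city_then_age)
```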

8. Handling Missing Data

Missing data doesn’t have to derail your analysis. Pandas gives you straightforward tools to find it, fill it, or drop it.

# Fill NaN values (assumes your dataset has a 'Salary' column)
df['Salary'] = df['Salary'].fillna(0)

You can also count the missing values in each column to see where the gaps are:

df.isnull().sum()  # Counts missing values per column
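Filling with 0 is only one option, and for a column like salary it can drag averages down. This sketch, on a made-up table with gaps, shows counting gaps, dropping incomplete rows, filling with the median, and filling different columns with different values:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Chloe", "Dev"],
    "Salary": [50000, np.nan, 70000, np.nan],
    "City": ["London", "Paris", None, "Berlin"],
})

print(df.isna().sum())  # isna() is an alias of isnull()

# Option 1: drop rows that are missing a value in specific columns
complete_salaries = df.dropna(subset=["Salary"])

# Option 2: fill with a statistic such as the median instead of 0
median_filled = df["Salary"].fillna(df["Salary"].median())

# Option 3: use a different fill value per column
filled = df.fillna({"Salary": 0, "City": "Unknown"})
```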

9. Grouping and Aggregation

Want to summarise your data by categories? Use groupby:

grouped = df.groupby('City')['Age'].mean()
print(grouped)

Need more advanced summaries? Try agg for multiple metrics:

grouped_stats = df.groupby('City').agg({'Age': ['mean', 'max'], 'Salary': 'sum'})
print(grouped_stats)
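The dictionary form of agg returns two-level column headers (for example Age/mean and Age/max). If you prefer flat, self-describing column names, named aggregation (available since pandas 0.25) is an alternative. A sketch with made-up figures:

```python
import pandas as pd

df = pd.DataFrame({
    "City": ["London", "Paris", "London", "Paris", "Berlin"],
    "Age": [25, 30, 35, 40, 45],
    "Salary": [50000, 60000, 70000, 80000, 90000],
})

# Named aggregation: output_column=(input_column, function)
summary = (
    df.groupby("City")
      .agg(mean_age=("Age", "mean"),
           max_age=("Age", "max"),
           total_salary=("Salary", "sum"))
      .reset_index()  # turn the City group keys back into a regular column
)
print(summary)
```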

10. Combining DataFrames

Concatenation

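Stack DataFrames vertically (the default) or side by side (with axis=1):

# df1 and df2 stand for your own DataFrames
result = pd.concat([df1, df2])  # vertical stack; pd.concat([df1, df2], axis=1) places them side by side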
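Here is a self-contained version with two small made-up tables, showing both directions. Note that a horizontal concat lines pieces up by index label, so mismatched indexes produce NaN rather than an error.

```python
import pandas as pd

df1 = pd.DataFrame({"Name": ["Alice", "Bob"], "Age": [25, 30]})
df2 = pd.DataFrame({"Name": ["Chloe"], "Age": [35]})

# Vertical (axis=0, the default): rows of df2 go below df1.
# ignore_index=True renumbers rows 0..n-1 instead of keeping 0, 1, 0
stacked = pd.concat([df1, df2], ignore_index=True)

# Horizontal (axis=1): columns placed side by side, aligned on the index
extra = pd.DataFrame({"City": ["London", "Paris"]})
side_by_side = pd.concat([df1, extra], axis=1)
print(stacked)
print(side_by_side)
```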

Merging (Think SQL Joins)

Combine data based on common columns:

result = pd.merge(df1, df2, on='key', how='inner')
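To see what how= changes, here is a sketch with two hypothetical tables that share a 'key' column but only partly overlap. indicator=True is a real merge option that labels where each row came from, which is handy for checking a join:

```python
import pandas as pd

# Hypothetical tables sharing a 'key' column
df1 = pd.DataFrame({"key": [1, 2, 3], "customer": ["Alice", "Bob", "Chloe"]})
df2 = pd.DataFrame({"key": [2, 3, 4], "order_total": [20.0, 35.0, 50.0]})

# inner: only keys present in both tables (2 and 3)
inner = pd.merge(df1, df2, on="key", how="inner")

# left: every row of df1; unmatched rows get NaN in df2's columns
left = pd.merge(df1, df2, on="key", how="left")

# outer + indicator=True: all keys, with a _merge column saying
# "left_only", "right_only", or "both"
audit = pd.merge(df1, df2, on="key", how="outer", indicator=True)
print(audit)
```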

11. Wrapping Up: Your Path Forward

You’ve seen the basics of Pandas: how to explore, manipulate, and structure data efficiently. The best way to solidify your skills is by practising.

Your next steps:

Experiment with real-world datasets (there are plenty available online for free!).
Explore additional topics like pivoting, time series, or performance optimisation.
Build small analysis projects to gain confidence with Pandas tools.

Ready to take the next step? Share your questions or the data challenges you’d love to tackle in the comments. Let’s grow together! 🚀
