CSV Files
WHAT IS A CSV FILE?
• A CSV (comma-separated values) file stores tabular data as plain text.
• Think of it as a heavily simplified spreadsheet (like Excel) in which only the cell contents are kept, with no formatting or formulas.
• The csv module is part of Python's standard library and lets Python parse and write these files.
• The text inside a CSV file is laid out in rows, and each row is split into columns by commas.
• Every line in the file is a row of the spreadsheet, and the commas mark where one cell ends and the next begins.
CSV MODULE
• The csv module is useful for working with data exported from
spreadsheets and databases into text files formatted with fields and
records. This layout is commonly called comma-separated values (CSV)
because commas are often used to separate the fields in a record.
• To import or export such spreadsheet and database data in Python,
you can use the csv module, which reads and writes the comma-separated
values format.
STEPS
• First, save the Excel file with the ‘.csv’ extension.
• Second, place the CSV file in the same folder as the Python file (or pass its full path when opening it).
• Then write the code that reads or writes the CSV file.
READING A CSV FILE
• There are two ways to read a CSV file with the csv module.
• You can use the reader function or the DictReader class.
• Using the DictReader class:
• The example opens the file ‘mpg.csv’ and reads it with DictReader().
• DictReader() returns each row as a dictionary keyed by the column names in the header row.
• If the rows are collected in a list m, then m[:3] gives the first three rows.
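A minimal sketch of reading with DictReader. The slide's ‘mpg.csv’ is not available here, so a small in-memory sample stands in for it; its column names and values are assumptions made for illustration.

```python
import csv
import io

# Hypothetical stand-in for 'mpg.csv'; columns and values are made up.
SAMPLE = """manufacturer,model,displ,cty
audi,a4,1.8,18
audi,a4 quattro,2.0,20
ford,mustang,4.0,17
honda,civic,1.6,28
"""

# With a real file you would use: open('mpg.csv', newline='')
with io.StringIO(SAMPLE) as f:
    m = list(csv.DictReader(f))  # one dictionary per data row

first_three = m[:3]
for row in first_three:
    print(row)
```

Every value comes back as a string, so numeric columns such as cty need an explicit conversion (for example float(row['cty'])) before doing arithmetic.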
READING A CSV FILE
• USING THE reader() FUNCTION:
csv.reader() returns each row as a list of strings, splitting the
line into column values at the commas.
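The same kind of data read with csv.reader, as a sketch. The sample text is an assumption standing in for a real file.

```python
import csv
import io

# Hypothetical sample data; a real script would open a .csv file instead.
SAMPLE = "manufacturer,model,cty\naudi,a4,18\nford,mustang,17\n"

with io.StringIO(SAMPLE) as f:
    rows = list(csv.reader(f))  # each row is a list of strings

header, data = rows[0], rows[1:]
print(header)
print(data)
```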
Writing a CSV File
• The csv module also offers two ways to write a CSV file: the writer
function and the DictWriter class.
• USING THE DictWriter CLASS:
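A short sketch of writing with DictWriter. It writes to an in-memory buffer so the result can be inspected; the records and field names are hypothetical.

```python
import csv
import io

# Hypothetical records; the keys must match the fieldnames below.
records = [
    {"name": "Asha", "item": "pen", "cost": 22.5},
    {"name": "Ravi", "item": "book", "cost": 2.5},
]

# With a real file you would use: open('out.csv', 'w', newline='')
buffer = io.StringIO()
writer = csv.DictWriter(
    buffer, fieldnames=["name", "item", "cost"], lineterminator="\n"
)
writer.writeheader()       # writes the column names as the first line
writer.writerows(records)  # writes one line per dictionary

text = buffer.getvalue()
print(text)
```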
LOOPING THROUGH ROWS
A for loop visits the rows one at a time: on each pass, the row variable holds the next row, and the indented line
below the for statement prints it.
Open the file with open('filename.csv') first, then
loop over the reader built from it.
LOOPING THROUGH ROWS
• Create an empty list, ‘model_no’.
• For each row, append the value of row[2] (the third column) to the list.
• Once the loop finishes, print the list; the result is a single list holding that column's values.
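The loop described above might look like the following sketch. The sample data is an assumption in which the third column (row[2]) holds the model.

```python
import csv
import io

# Hypothetical sample; the third column is the one collected.
SAMPLE = "manufacturer,year,model\naudi,1999,a4\naudi,2008,a4\nford,1999,mustang\n"

model_no = []
with io.StringIO(SAMPLE) as f:
    reader = csv.reader(f)
    next(reader)                 # skip the header row
    for row in reader:
        model_no.append(row[2])  # third column of each row

print(model_no)
```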
EXTRACTING INFORMATION FROM CSV FILE
• To get the values of a particular column, index each row with row[]
(a column name with DictReader, a position with reader).
• For example, the ‘model’ column can be extracted
this way.
CONVERTING LIST TO SETS IN CSV FILE
First import the csv module, then read the column values into a list.
Passing that list to set() removes the duplicates, so each distinct
value appears only once.
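A sketch of extracting one column and removing duplicates with set(). The ‘model’ column name follows the slides; the data itself is made up.

```python
import csv
import io

# Hypothetical sample with a repeated model value.
SAMPLE = "manufacturer,model\naudi,a4\naudi,a4\nford,mustang\n"

with io.StringIO(SAMPLE) as f:
    models = [row["model"] for row in csv.DictReader(f)]

unique_models = set(models)   # duplicates collapse to one entry
print(sorted(unique_models))  # sorted only to get a stable display order
```

A set has no defined order, which is why the sketch sorts the values before printing.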
PANDAS
• Pandas is an open source Python library for data analysis.
PANDAS DATA STRUCTURES
Pandas introduces two new data structures to Python:
• Series
• DataFrame
SERIES
• A Series is a one-dimensional labelled array capable of holding any data type.
• It is similar to an array, a list, or a single column in a table.
• Every item in a Series has an index label.
• By default the labels run from 0 to N, where N is the length of
the Series minus one.
SERIES
CREATE A SERIES WITH AN ARBITRARY LIST
Pass a list to the pd.Series() constructor to arrange its values as a Series.
In the output, each value is shown next to its assigned index.
For a list of strings the dtype is typically reported as ‘object’, the dtype pandas
has traditionally used for strings (newer pandas versions may show a
dedicated string dtype instead).
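A minimal sketch of building a Series from a list. The list contents are assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical values; any list works.
animals = ["Tiger", "Bear", "Moose"]
s = pd.Series(animals)       # default index labels 0, 1, 2
print(s)

numbers = pd.Series([1, 2, 3])
print(numbers.dtype)         # an integer dtype for a list of ints
```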
SERIES
Alternatively, specify the index yourself when creating the Series.
Pass the labels with the index=[] argument, one label per element of the list.
The Series constructor can also convert a dictionary, using the keys of the
dictionary as the index and the dictionary values as the data.
SERIES EXAMPLE
To see the index labels of a Series, use its ‘index’ attribute.
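A sketch of the two ways of labelling a Series and of reading the labels back. All labels and values are hypothetical.

```python
import pandas as pd

# Explicit labels passed through the index argument.
s = pd.Series(["Tiger", "Bear", "Moose"], index=["India", "America", "Canada"])

# A dictionary: keys become the index, values become the data.
sports = {"Archery": "Bhutan", "Golf": "Scotland", "Sumo": "Japan"}
d = pd.Series(sports)

print(s.index)
print(d.index)
```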
SERIES EXAMPLE
If a Series of strings contains None, the output shows None as it is
(in an object-dtype Series).
If a Series of numbers contains None,
pandas converts it to NaN
(not a number) and stores the
values as floats.
• NaN is not the same as the None
keyword.
• NaN does not compare equal to anything, not even itself, so an equality test cannot find it.
• In numpy, use isnan() to
check whether a value is NaN.
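A small sketch of how None turns into NaN in a numeric Series and how to test for it. The values are made up.

```python
import numpy as np
import pandas as pd

numbers = pd.Series([1, 2, None])  # None is stored as NaN, dtype becomes float
print(numbers)

print(np.nan == np.nan)    # False: NaN is not equal to itself
print(np.isnan(np.nan))    # True: the reliable test for a single value
print(numbers.isna())      # element-wise test on the whole Series
```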
QUERYING A SERIES
A Series can be queried in two ways:
• loc[] : query by index label.
• iloc[] : query by integer position.
Both are attributes used with square brackets, not function calls.
To get a particular element by its
numeric position, use ‘iloc[]’.
To get a particular element by its
label, use ‘loc[]’.
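A sketch of both query styles on one Series. The data is hypothetical.

```python
import pandas as pd

s = pd.Series({
    "Archery": "Bhutan",
    "Golf": "Scotland",
    "Sumo": "Japan",
    "Taekwondo": "South Korea",
})

by_position = s.iloc[3]   # fourth element, counted from 0
by_label = s.loc["Golf"]  # element whose index label is 'Golf'
print(by_position, by_label)
```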
Pandas csv
DATAFRAME
• A DataFrame is a tabular data structure made up of rows and columns.
• A DataFrame can be seen as a group of Series objects that share an index (the row
labels); each Series is one column, identified by its column name.
• A pandas DataFrame has three main components: the data, the index, and
the columns.
DATAFRAME EXAMPLE
The pd.DataFrame() constructor combines several Series objects into one
two-dimensional table.
head() displays the first five rows of the
dataset by default.
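A minimal sketch of building a DataFrame from Series that share an index. The shop labels, column names and values are assumptions for illustration.

```python
import pandas as pd

shops = ["shop 1", "shop 2", "shop 3"]

# Each Series becomes one column; the shared index gives the row labels.
name = pd.Series(["Asha", "Ravi", "Meena"], index=shops)
item = pd.Series(["pen", "book", "bag"], index=shops)
cost = pd.Series([22.5, 2.5, 25.0], index=shops)

df = pd.DataFrame({"name": name, "item": item, "cost": cost})
print(df.head())         # up to the first five rows
first_two = df.head(2)   # head(n) limits the output to n rows
```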
EXTRACTING VALUES FROM DATAFRAME
To extract rows by label, use the loc[] attribute.
For example, df.loc['shop 2'] returns the record stored under the ‘shop 2’ index label.
To get only one column of that row,
pass two values to df.loc[]: the row label
first, then the column name.
EXTRACTING VALUES FROM DATAFRAME
Assigning to a new column name, as with a ‘place’ column, adds that column to the
DataFrame. Any column can be added this way.
To display two or more columns together with the index,
pass a list of column names to df[]. Selecting the cost and student columns, for instance, shows
only those two columns for all index labels.
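A sketch of label-based extraction, adding a column, and selecting several columns. The column names ‘student’, ‘cost’ and ‘place’ follow the slide text; the values are made up.

```python
import pandas as pd

df = pd.DataFrame(
    {"student": ["Asha", "Ravi", "Meena"], "cost": [22.5, 2.5, 25.0]},
    index=["shop 1", "shop 2", "shop 3"],
)

row = df.loc["shop 2"]                 # whole row as a Series
cost_shop2 = df.loc["shop 2", "cost"]  # single cell: row label, column name

df["place"] = ["Delhi", "Pune", "Goa"]  # adds a new column

subset = df[["student", "cost"]]        # list of names selects several columns
print(subset)
```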
RENAME A COLUMN NAME
To rename a column, use the
‘df.rename(columns={})’ syntax.
In the dictionary, each key is the existing column name that has to be
renamed.
The matching value is the new column
name.
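A short sketch of rename(). The column names are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"student": ["Asha", "Ravi"], "cost": [22.5, 2.5]})

# key = existing column name, value = new column name
renamed = df.rename(columns={"cost": "price"})
print(renamed.columns.tolist())
```

Without inplace=True, rename() returns a new DataFrame and leaves df as it was.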
INPLACE
• In methods that accept it, inplace=False (the default) leaves the underlying data untouched and returns a modified copy.
• With inplace=True the object itself is modified and the method returns None, so nothing is printed.
• The missing output is a hint that the change happened in place.
DROP
To drop a column, use the drop() method with the name
of the column to remove.
With inplace=True the column is removed
from the DataFrame itself and nothing
is printed.
• axis=1 drops a column.
• axis=0 drops a row.
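A sketch contrasting drop() with and without inplace, and with both axis values. The DataFrame is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "student": ["Asha", "Ravi", "Meena"],
        "cost": [22.5, 2.5, 25.0],
        "place": ["Delhi", "Pune", "Goa"],
    },
    index=["shop 1", "shop 2", "shop 3"],
)

# Default (inplace=False): df is unchanged, a new DataFrame is returned.
without_place = df.drop("place", axis=1)

# inplace=True: df itself loses the column and the call returns None.
result = df.drop("cost", axis=1, inplace=True)

# axis=0 drops a row by its index label.
without_row = df.drop("shop 1", axis=0)
```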
QUERYING A DATAFRAME
The comparison df['cost'] > 20 returns a Boolean Series: True for each row that
satisfies the condition and False otherwise.
where() takes a Boolean mask as its condition, applies it to the
DataFrame or Series, and returns a new object of the
same shape in which the entries that fail the condition are replaced by NaN.
count() counts the non-NaN values, so on the masked cost column it gives the number of
rows that satisfy the condition.
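A sketch of a Boolean mask, where() and count() together. The threshold of 20 follows the slide; the data is hypothetical.

```python
import pandas as pd

df = pd.DataFrame(
    {"student": ["Asha", "Ravi", "Meena"], "cost": [22.5, 2.5, 25.0]},
    index=["shop 1", "shop 2", "shop 3"],
)

mask = df["cost"] > 20     # Boolean Series, one entry per row
masked = df.where(mask)    # same shape; failing rows become NaN
n_matching = masked["cost"].count()  # NaN entries are not counted

print(mask)
print(masked)
print(n_matching)
```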
FILTER THE ROWS WITH NaN VALUE
dropna() removes the rows that contain NaN (not a
number) values.
The same rows can also be selected directly by indexing the DataFrame with
the Boolean mask, which skips the where() and dropna() steps.
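A sketch showing that where() followed by dropna() and direct Boolean indexing keep the same rows. The data is made up.

```python
import pandas as pd

df = pd.DataFrame(
    {"student": ["Asha", "Ravi", "Meena"], "cost": [22.5, 2.5, 25.0]},
    index=["shop 1", "shop 2", "shop 3"],
)

# Two-step form: mask the rows, then drop the ones that became NaN.
filtered = df.where(df["cost"] > 20).dropna()

# One-step form: index the DataFrame with the Boolean mask.
direct = df[df["cost"] > 20]

print(filtered)
print(direct)
```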
QUERYING DATAFRAME USING LOGICAL OPERATION
The & (and) operator combines two
conditions and keeps the rows that satisfy
both.
The | (or) operator combines
two conditions and keeps the rows that
satisfy either one.
Wrap each condition in parentheses, because & and | bind more tightly than comparison operators.
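A sketch of combining conditions with & and |. The columns and thresholds are assumptions for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "student": ["Asha", "Ravi", "Meena"],
        "cost": [22.5, 2.5, 25.0],
        "place": ["Delhi", "Pune", "Goa"],
    },
    index=["shop 1", "shop 2", "shop 3"],
)

# Both conditions must hold.
both = df[(df["cost"] > 20) & (df["place"] == "Delhi")]

# At least one condition must hold.
either = df[(df["cost"] > 24) | (df["place"] == "Pune")]

print(both)
print(either)
```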
USE THIS DATA FOR INDEXING A DATAFRAME
INDEXING A DATAFRAME
The index attribute shows the index (row labels)
of the DataFrame.
set_index() sets a column as the index of
the DataFrame.
reset_index() undoes set_index(): the index goes back to being a column
and a default integer index takes its place.
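A sketch of reading, setting and resetting the index. The DataFrame is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"student": ["Asha", "Ravi", "Meena"], "cost": [22.5, 2.5, 25.0]})

labels = list(df.index)               # default integer labels

by_student = df.set_index("student")  # 'student' becomes the index
restored = by_student.reset_index()   # 'student' is a column again

print(by_student)
print(restored)
```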
HANDLE MISSING VALUES IN PANDAS
• isnull() returns True for each value that is
null and False otherwise.
• tail() displays the last five
rows of the data by default.
HANDLE MISSING VALUES IN PANDAS
notnull() returns True for each value that is not
null and False for each value that is null.
HANDLE MISSING VALUES IN PANDAS
fillna() replaces the missing values in the data loaded from the CSV file
with a value you choose. Here ‘Various’ is
used to fill the missing values.
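A sketch of the missing-value helpers on a small made-up table. The fill value ‘Various’ follows the slide; the column names are assumptions.

```python
import pandas as pd

df = pd.DataFrame({
    "title": ["A", "B", "C"],
    "genre": ["Drama", None, "Comedy"],  # one missing value
})

missing = df["genre"].isnull()   # True where the value is missing
present = df["genre"].notnull()  # the exact opposite
filled = df.fillna("Various")    # missing values replaced, df unchanged
last_rows = df.tail(2)           # last two rows (tail() alone gives five)

print(filled)
```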
GROUPBY
• Use groupby whenever you want to analyse a pandas Series or DataFrame by
some category.
Census.csv is a CSV file.
Grouping by the CTYNAME column and taking the mean of BIRTHS2012 gives the average of BIRTHS2012
for each value of CTYNAME.
GROUPBY EXAMPLE
To get the mean of BIRTHS2012 for a single county such as
‘Ada County’, select that label from the grouped result.
GROUPBY EXAMPLE
To calculate the mean of every numeric
column for each CTYNAME, call mean() on the
grouped DataFrame without selecting a column first.
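A sketch of the groupby patterns described above. Census.csv is not available here, so a tiny made-up table with the same column names (CTYNAME, BIRTHS2012) stands in; BIRTHS2013 and all the numbers are assumptions.

```python
import pandas as pd

# Hypothetical stand-in for Census.csv.
df = pd.DataFrame({
    "CTYNAME": ["Ada County", "Ada County", "Bear Lake County"],
    "BIRTHS2012": [100, 200, 30],
    "BIRTHS2013": [110, 190, 40],
})

# Mean of one column per county.
mean_births = df.groupby("CTYNAME")["BIRTHS2012"].mean()

# Mean for a single county, picked by its label.
ada = mean_births.loc["Ada County"]

# Mean of every numeric column per county.
all_means = df.groupby("CTYNAME").mean()

print(mean_births)
print(all_means)
```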
AGG() Function
• agg() lets you specify several aggregation functions at once.
For example, agg() can compute the
count, min, max and mean of each group in one call.
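A sketch of agg() with the four aggregations named above, again on a made-up stand-in for the census data.

```python
import pandas as pd

# Hypothetical data with the slide's column names.
df = pd.DataFrame({
    "CTYNAME": ["Ada County", "Ada County", "Bear Lake County"],
    "BIRTHS2012": [100, 200, 30],
})

# One result column per aggregation function.
summary = df.groupby("CTYNAME")["BIRTHS2012"].agg(["count", "min", "max", "mean"])
print(summary)
```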
SCALES
NOMINAL SCALES EXAMPLE
.astype() converts data from one dtype to another.
ORDINAL SCALES EXAMPLE
For ordinal data, mark the category type as ordered
and list the categories in rank order, so that values can be compared and sorted.
SCALES EXAMPLE
Before conversion, the column is reported with the object dtype.
After astype() changes the dtype from ‘object’ to category,
the reported dtype is category.
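A sketch of nominal and ordinal categories with astype(). The grade values and their ordering are assumptions for illustration.

```python
import pandas as pd

grades = pd.Series(["B", "A", "C", "A"])

# Nominal scale: categories without any order.
nominal = grades.astype("category")

# Ordinal scale: categories listed from lowest to highest, marked ordered.
order = pd.CategoricalDtype(categories=["C", "B", "A"], ordered=True)
ordinal = grades.astype(order)

above_c = ordinal > "C"   # comparisons only work on ordered categories
print(nominal.dtype, ordinal.dtype)
print(above_c)
```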
PIVOT TABLE
• A pivot gives a clearer representation in which each unique variable becomes a column
and an index of dates identifies the individual observations.
• To reshape the data into this form, use the pivot function.
PIVOT TABLE
With pivot_table you can pass a list to aggfunc=[] to apply
several aggregate operations at once.
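A sketch of pivot and of pivot_table with a list of aggregations. The long-format table, its column names and its values are assumptions.

```python
import pandas as pd

# Hypothetical long-format data: one row per (date, variable) pair.
df = pd.DataFrame({
    "date": ["2024-01-01", "2024-01-01", "2024-01-02", "2024-01-02"],
    "variable": ["A", "B", "A", "B"],
    "value": [1.0, 2.0, 3.0, 4.0],
})

# Reshape: dates as the index, one column per unique variable.
wide = df.pivot(index="date", columns="variable", values="value")

# Aggregate: several functions at once through aggfunc.
table = df.pivot_table(index="variable", values="value", aggfunc=["mean", "max"])

print(wide)
print(table)
```

With a list in aggfunc the result has two-level column labels such as ('mean', 'value').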
DATE FUNCTIONALITY IN PANDAS
• Timestamp : represents a single point in time.
• Period : represents a single time span.
DATE FUNCTIONALITY IN PANDAS
DatetimeIndex: the index type made up of Timestamp values.
PeriodIndex: the index type made up of Period values.
In the first example, the letters of ‘abc’ are the values of a Series
indexed by Timestamp values.
In the second, the letters of ‘abc’ are the values of a Series
indexed by Period values.
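A sketch of Timestamp, Period and the index types they produce. The dates are arbitrary examples.

```python
import pandas as pd

ts = pd.Timestamp("2016-09-01 10:05")  # a single point in time
per = pd.Period("2016-01")             # a span: the whole of January 2016

# Values 'a', 'b', 'c' indexed by timestamps -> DatetimeIndex
t = pd.Series(
    list("abc"),
    index=[pd.Timestamp("2016-09-01"), pd.Timestamp("2016-09-02"), pd.Timestamp("2016-09-03")],
)

# Values 'a', 'b', 'c' indexed by periods -> PeriodIndex
p = pd.Series(
    list("abc"),
    index=[pd.Period("2016-09"), pd.Period("2016-10"), pd.Period("2016-11")],
)

print(type(t.index).__name__)
print(type(p.index).__name__)
```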
CONVERTING TO DATETIME
To convert values to datetime format,
use ‘to_datetime()’.
TIMEDELTAS
• Timedeltas: differences in time.
Subtracting one timestamp from another gives the Timedelta between
the two.
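A sketch of to_datetime() and of Timedelta arithmetic. The date strings are arbitrary examples.

```python
import pandas as pd

# Strings converted to datetime values.
dates = pd.to_datetime(["2016-09-01", "2016-09-15"])

# Difference between two timestamps is a Timedelta.
delta = pd.Timestamp("2016-09-03") - pd.Timestamp("2016-09-01")

# A Timedelta can also be added to a timestamp.
later = pd.Timestamp("2016-09-02 08:10") + pd.Timedelta(days=12, hours=3)

print(dates)
print(delta)
print(later)
```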
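MERGING DATAFRAMES
The merge() function combines two DataFrames on a shared key. The how argument selects the type of join:
OUTER JOIN: keeps every key found in either DataFrame.
INNER JOIN: keeps only the keys found in both DataFrames.
LEFT JOIN: keeps every key of the left DataFrame.
RIGHT JOIN: keeps every key of the right DataFrame.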
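A sketch of the four join types with pd.merge(). The two small DataFrames and their column names are hypothetical.

```python
import pandas as pd

# Hypothetical datasets sharing the 'name' key.
staff = pd.DataFrame({
    "name": ["Kelly", "Sally", "James"],
    "role": ["Director", "Tutor", "Grader"],
})
students = pd.DataFrame({
    "name": ["James", "Mike", "Sally"],
    "school": ["Business", "Law", "Engineering"],
})

outer = pd.merge(staff, students, how="outer", on="name")  # all names
inner = pd.merge(staff, students, how="inner", on="name")  # names in both
left = pd.merge(staff, students, how="left", on="name")    # all staff
right = pd.merge(staff, students, how="right", on="name")  # all students

print(outer)
```

Rows without a match on the other side get NaN in the columns that come from that side.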
  • 2. WHAT IS A CSV FILE ? • CSV files are used to store a large number of variables – or data. • Incredibly simplified spreadsheets – think Excel – only the content is stored in plaintext. • The CSV module is a built-in function that allows Python to parse these types of files. • The text inside a CSV file is laid out in rows, and each of those has columns, all separated by commas. • Every line in the file is a row in the spreadsheet, while the commas are used to define and separate cells.
  • 3. CSV MODULE • The csv module is useful for working with data exported from spreadsheets and databases into text files formatted with fields and records, commonly referred to as comma-separated value (CSV) format because commas are often used to separate the fields in a record. • If you want to import or export spreadsheets and databases for use in the Python interpreter, you must rely on the CSV module, or Comma Separated Values format.
  • 4. STEPS • First, save the excel file with ‘.csv’ extension . • Second, save the csv file in same folder where the python file is there. • And then write the code for reading and writing of the csv file.
  • 5. READING A CSV FILE • There are two ways to read a CSV file. • You can use the csv module’s reader function or you can use the DictReader class. • Using DictReader class: • Here we have open the csv file ‘mpg.csv’ and try to open the file and read the file using DictReader() class. • DictReader() is used to output the data in dictionary format. Here, m[:3] prints the first three row from starting.
  • 6. READING A CSV FILE • USING READER() CLASS: Here , we read the code using the reader() class which separate the row and column value with comma. Output:
  • 7. Writing a CSV File • The csv module also has two methods that you can use to write a CSV file, you can use the writer function or the DictWriter class. • USING DictWriter() CLASS:
  • 8. LOOPING THROUGH ROWS The for loop which defines that for the following indented lines, the row variable should contain each element from the list, and the second line which will print this row variable. We can open the csv file using open(filename.csv) and then perform the operation.
  • 9. LOOPING THROUGH ROWS In this, we create an empty list ‘model_no’ • After creating empty list we append the data of row[2] in the list and print the list. • Once run, this code will print a single list
  • 10. EXTRACTING INFORMATION FROM CSV FILE • If you want information about a particular column then extract it using row[]. • Here in this code, we extract the information about ‘model’ column.
  • 11. CONVERTING LIST TO SETS IN CSV FILE Here in this code, ‘set’ function is used to remove the duplicay of the value and print only the value once. First we import the csv module while manipulating with csv file.
  • 12. PANDAS • Pandas is an open source Python library for data analysis.
  • 13. PANDAS DATA STRUCTURES Pandas introduces two new data structures to Python : • Series • DataFrame
  • 15. SERIES • Series is a one-dimensional labelled array capable of holding any data type. • A Series is a one-dimensional object similar to an array, list, or column in a table. • It will assign a labelled index to each item in the Series. • By default, each item will receive an index label from 0 to N, where N is the length of the Series minus one.
  • 16. SERIES CREATE A SERIES WITH AN ARBITRARY LIST In the output the value in list is arranged in series with the index assigned. The dtype in output is ‘object’ as the strings is taken as object data type. You can arrange the values in the list in series form using pd.series() data structure.
  • 17. SERIES Alternatively , specify an index to use when creating the Series. In this, we can specify the index of the elements which are in the list and then print it, for naming the index we use index=[] .
  • 18. The Series constructor can convert a dictonary as well, using the keys of the dictionary as its index. In this, series constructor convert the dictionary key to use as its index .
  • 19. SERIES EXAMPLE If you want to output the index of the values in the series then use , ‘index’ keyword.
  • 20. SERIES EXAMPLE If one of the elements in the series is ‘None’ then in the output it prints ‘None’ only. If one of the elements in the series is ‘none’ and all elements are numeric then it prints in the output as ‘NaN’ (not a number) value. • NaN is not same as None keyword. • In numpy we use isnan() to check NaN value is there or not.
  • 21. QUERYING A SERIES We can basically query in the series using:  loc() : used when we query about the label  iloc() : used when we query the data using numeric value. When you want to query about the particular element in series using numeric position use ‘iloc[]’ . When you want to query about the particular element in series using label use ‘loc[]’ .
  • 26. DATAFRAME • A DataFrame is a tabular data structure made up of rows and columns. • A DataFrame can be thought of as a group of Series objects (the columns) that share an index. • A pandas DataFrame consists of three main components: the data, the index, and the columns.
  • 27. DATAFRAME EXAMPLE Here the pd.DataFrame() constructor combines the different Series objects and outputs the result in two-dimensional form. head() displays the first five records of the dataset.
  • 28. EXTRACTING VALUES FROM DATAFRAME To extract elements by label, use the loc[] indexer. In this code, we look up the customers under the ‘shop 2’ index label. To extract only a particular column for a given index label, pass two values to df.loc[]: the row label and the column name.
  • 29. EXTRACTING VALUES FROM DATAFRAME Here the ‘place’ column is added to the DataFrame; any column can be added in this form. To display two or more columns along with the index, select them together. Here only the cost and student columns are shown, for all indices.
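The DataFrame operations from the last few slides can be sketched together. The shop labels follow the slides, but the column names ‘customer’ and ‘cost’, the ‘place’ values, and all the records are hypothetical. The DataFrame is built here from a dictionary of Series that share one index, which is one of several ways to construct it.

```python
import pandas as pd

# Hypothetical records; every Series shares the same index of shop labels.
shops = ["shop 1", "shop 2", "shop 3"]
customer = pd.Series(["Asha", "Ben", "Chen"], index=shops)
cost = pd.Series([22.5, 2.5, 5.0], index=shops)

# Each Series becomes one column of the DataFrame.
df = pd.DataFrame({"customer": customer, "cost": cost})
print(df.head())  # first five rows (here, all three)

# One label returns the whole row; two values return a single cell.
row = df.loc["shop 2"]
cell = df.loc["shop 2", "customer"]

# Assigning to a new column name adds that column.
df["place"] = ["Pune", "Delhi", "Goa"]

# A list of column names selects several columns, keeping the index.
subset = df[["customer", "cost"]]
print(subset)
```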
  • 30. RENAME A COLUMN NAME To rename a column, use the ‘df.rename(columns={})’ syntax. In the dictionary, the key is the existing column name to be renamed and the value is the new column name.
  • 31. INPLACE • In any method, if inplace is False, the operation does not affect the underlying data; a modified copy is returned instead. • If inplace is True, the underlying data is modified and nothing is printed out. • Empty output is therefore a hint that the operation happened in place.
  • 32. DROP To drop a column, use the drop() function with the name of the column. Here inplace=True is used, so the change happens in place and nothing is printed. • axis=1 is used to drop a column. • axis=0 is used to drop a row.
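A short sketch of rename(), inplace and drop() on a hypothetical two-row DataFrame. The column names and values are assumptions made for the example.

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame(
    {"customer": ["Asha", "Ben"], "cost": [22.5, 2.5]},
    index=["shop 1", "shop 2"],
)

# Without inplace=True, rename() returns a new DataFrame and leaves df as it was.
renamed = df.rename(columns={"cost": "price"})

# With inplace=True, drop() changes df itself and returns None.
result = df.drop("customer", axis=1, inplace=True)  # axis=1: drop a column

# axis=0 drops a row by its index label; this call returns a copy.
without_row = df.drop("shop 1", axis=0)

print(renamed.columns.tolist(), result, df.columns.tolist())
```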
  • 33. QUERYING A DATAFRAME Here the condition cost>20 is applied to the DataFrame; it returns True or False for each row depending on whether the row satisfies the condition. where() takes the Boolean mask, applies it to the DataFrame or Series, and returns a new DataFrame or Series of the same shape, with NaN wherever the condition is False. Here count() is used to count the non-NaN values of cost in the DataFrame.
  • 34. FILTER THE ROWS WITH NaN VALUE The dropna() function removes the rows that contain NaN (not a number) values. The rows can also be filtered or dropped using the alternative way of writing the code shown here.
  • 35. QUERYING DATAFRAME USING LOGICAL OPERATION Here the & (and) operator combines two conditions and outputs the rows that satisfy both. Here the | (or) operator combines two conditions and outputs the rows that satisfy either one.
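The querying steps above (Boolean mask, where(), count(), dropna(), and the & and | operators) can be sketched on a hypothetical table of items and costs. Indexing the DataFrame directly with the mask is shown as one common shortcut; it is an assumption that this matches the alternative form on the slide.

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame(
    {"item": ["pen", "book", "bag", "cap"], "cost": [10.0, 25.0, 40.0, 15.0]}
)

# A comparison yields a Boolean mask: one True/False per row.
mask = df["cost"] > 20

# where() keeps the shape and puts NaN in rows where the mask is False.
masked = df.where(mask)

# count() ignores NaN, so this counts the rows that passed the condition.
n_matches = masked["cost"].count()

# dropna() removes the rows that contain NaN.
only_matches = masked.dropna()

# Shortcut: index the DataFrame directly with the mask.
direct = df[df["cost"] > 20]

# Combine conditions with & (and) and | (or); each condition needs parentheses.
both = df[(df["cost"] > 12) & (df["cost"] < 30)]
either = df[(df["cost"] < 12) | (df["cost"] > 30)]

print(n_matches)
print(both)
print(either)
```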
  • 36. USE THIS DATA FOR INDEXING A DATAFRAME
  • 37. INDEXING A DATAFRAME The index attribute displays the index (row labels) of the DataFrame. set_index() sets a column as the index of the DataFrame. reset_index() resets the index that was set using set_index().
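A minimal sketch of the three indexing tools, using a hypothetical DataFrame with ‘name’, ‘city’ and ‘age’ columns.

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame(
    {"name": ["Asha", "Ben"], "city": ["Pune", "Delhi"], "age": [30, 41]}
)

# The index attribute shows the current row labels (0 and 1 by default).
print(df.index)

# set_index() returns a new DataFrame with the chosen column as the index.
by_name = df.set_index("name")

# reset_index() moves the index back into a column and restores 0, 1, ...
restored = by_name.reset_index()

print(by_name)
print(restored)
```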
  • 38. HANDLE MISSING VALUES IN PANDAS Output: • The isnull() function returns True for a value if it is null and False otherwise. • The tail() function displays the last five rows of the data.
  • 39. HANDLE MISSING VALUES IN PANDAS Output: The notnull() function returns True if the value is not null and False if it is null.
  • 40. HANDLE MISSING VALUES IN PANDAS fillna() fills the missing values in the data read from the CSV file with a specified value. Here ‘Various’ is used to fill the missing values. Output:
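The missing-value helpers can be sketched as follows. Instead of reading a CSV file, this example builds a small hypothetical DataFrame in memory; the column names ‘title’ and ‘genre’ are assumptions, and ‘Various’ is the fill value mentioned on the slide.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing genre.
df = pd.DataFrame(
    {"title": ["A", "B", "C"], "genre": ["Drama", np.nan, "Comedy"]}
)

# isnull() marks missing cells with True; notnull() is the opposite.
null_mask = df.isnull()
not_null_mask = df.notnull()

# fillna() returns a copy with the missing cells replaced.
filled = df.fillna("Various")

# tail() shows the last five rows (here, all three).
print(df.tail())
print(filled)
```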
  • 42. GROUPBY • The groupby function is used whenever you want to analyse a pandas Series or DataFrame by some category. Census.csv is a CSV file. In this line of code, we find the mean of the BIRTHS2012 column for each value of the CTYNAME column.
  • 43. GROUPBY EXAMPLE To find the mean of the BIRTHS2012 column for the CTYNAME value ‘Ada county’, use this form. Output:
  • 44. GROUPBY EXAMPLE To calculate the mean across all the columns for each CTYNAME, use this line of code.
  • 45. AGG() Function • The agg() function lets you specify multiple aggregation functions at once. In this line of code, agg() is used to compute the count, min, max and mean of the values.
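A sketch of the groupby and agg() steps above. The column names CTYNAME and BIRTHS2012 come from the slides; the rows themselves and the DEATHS2012 column are made up for illustration and are not taken from Census.csv.

```python
import pandas as pd

# Hypothetical rows standing in for the census data.
df = pd.DataFrame(
    {
        "CTYNAME": ["Ada County", "Ada County", "Bear County"],
        "BIRTHS2012": [100, 200, 50],
        "DEATHS2012": [40, 60, 10],
    }
)

# Mean of one column for each CTYNAME.
mean_births = df.groupby("CTYNAME")["BIRTHS2012"].mean()

# Pick out the result for a single group by its label.
ada_mean = mean_births.loc["Ada County"]

# Mean of every remaining (numeric) column for each CTYNAME.
all_means = df.groupby("CTYNAME").mean()

# Several aggregations at once with agg().
summary = df.groupby("CTYNAME")["BIRTHS2012"].agg(["count", "min", "max", "mean"])

print(mean_births)
print(summary)
```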
  • 48. NOMINAL SCALES EXAMPLE Output: .astype() converts data from one data type to another.
  • 49. ORDINAL SCALES EXAMPLE Output: To give the categories of the resulting data an order, use the ordered attribute.
  • 50. SCALES EXAMPLE Here, the dtype returned is object. Here, the dtype returned is category, because the dtype was changed from ‘object’ to category using astype.
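The nominal and ordinal cases can be sketched with a hypothetical Series of grades. pd.CategoricalDtype with ordered=True is one way to attach an order to the categories; it is an assumption that the slides used the same approach.

```python
import pandas as pd

# Hypothetical grades for illustration.
grades = pd.Series(["B", "A", "C", "A"])

# Nominal scale: categories without any order.
nominal = grades.astype("category")

# Ordinal scale: categories listed from lowest to highest, with ordered=True.
grade_type = pd.CategoricalDtype(categories=["C", "B", "A"], ordered=True)
ordinal = grades.astype(grade_type)

# Ordered categories support comparisons, min() and max().
above_c = ordinal > "C"

print(nominal.dtype)
print(ordinal)
print(above_c)
```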
  • 52. PIVOT TABLE • A pivot table gives a better representation of the data, where the columns are the unique variables and an index of dates identifies individual observations. • To reshape the data into this form, use the pivot function. OUTPUT:
  • 53. PIVOT TABLE Here, aggfunc=[] is used to pass a list of the aggregate operations you want to apply.
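A sketch of both reshaping calls on a hypothetical long-format table. The column names ‘date’, ‘variable’ and ‘value’ are assumptions. pivot() only reshapes, while pivot_table() also aggregates and accepts a list in aggfunc.

```python
import pandas as pd

# Hypothetical long-format data: one row per (date, variable) observation.
df = pd.DataFrame(
    {
        "date": ["2020-01-01", "2020-01-01", "2020-01-02", "2020-01-02"],
        "variable": ["A", "B", "A", "B"],
        "value": [1.0, 2.0, 3.0, 4.0],
    }
)

# pivot(): dates become the index, each unique variable becomes a column.
wide = df.pivot(index="date", columns="variable", values="value")

# pivot_table(): apply several aggregate operations through aggfunc=[...].
summary = df.pivot_table(index="variable", values="value", aggfunc=["mean", "max"])

print(wide)
print(summary)
```

With a list in aggfunc, the resulting columns have two levels: the aggregation name and the original column name.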
  • 54. DATE FUNCTIONALITY IN PANDAS • Timestamp: represents a single point in time. • Period: represents a single time span.
  • 55. DATE FUNCTIONALITY IN PANDAS DatetimeIndex is the index made of timestamps. PeriodIndex is the index made of periods. In the first example, (‘abc’) is the index assigned to the timestamp values. In the second example, (‘abc’) is the index assigned to the period values.
  • 56. CONVERTING TO DATETIME To convert values into datetime format, use ‘to_datetime()’.
  • 57. TIMEDELTAS • Timedeltas are differences in time. Here we find the difference between two timestamps.
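The date and time objects from the last few slides can be sketched together. All dates and values here are hypothetical.

```python
import pandas as pd

# A Timestamp is a single point in time; a Period is a span (here, one month).
ts = pd.Timestamp("2016-09-01 10:05")
period = pd.Period("2016-01", freq="M")

# to_datetime() converts strings to datetimes; used as an index it is a DatetimeIndex.
dates = pd.to_datetime(["2016-09-01", "2016-09-02", "2016-09-03"])
by_time = pd.Series([10, 20, 30], index=dates)

# A Series indexed by periods has a PeriodIndex.
months = pd.PeriodIndex(["2016-01", "2016-02"], freq="M")
by_period = pd.Series([10, 20], index=months)

# Subtracting two timestamps gives a Timedelta.
delta = pd.Timestamp("2016-09-03") - pd.Timestamp("2016-09-01")

print(type(by_time.index).__name__, type(by_period.index).__name__)
print(delta)
```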
  • 59. MERGING DATAFRAMES Use this dataset to merge the dataframes
  • 60. OUTER JOIN The merge() function is used to merge the two DataFrames.
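A minimal sketch of an outer join, assuming two hypothetical DataFrames that share a ‘name’ column. An outer join keeps every name from both sides and fills the gaps with NaN.

```python
import pandas as pd

# Hypothetical data: Ben appears in both tables, the others in only one.
staff = pd.DataFrame({"name": ["Asha", "Ben"], "role": ["HR", "Dev"]})
students = pd.DataFrame({"name": ["Ben", "Chen"], "school": ["Law", "Arts"]})

# how="outer" keeps the union of the keys found in either DataFrame.
merged = pd.merge(staff, students, how="outer", on="name")

# Look rows up by name to inspect the result.
by_name = merged.set_index("name")

print(merged)
```

Changing how to "inner", "left" or "right" keeps only the matching keys, the keys of the first DataFrame, or the keys of the second, respectively.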