In this article, we will analyze COVID-19 data and visualize it using Plotly Express in Python. We will create a range of bar charts, line graphs, bubble charts, and scatter plots, all of them interactive. Plotly Express is the primary visualization tool throughout the project. Analysis and visualization of this kind help people understand complex situations and reason about how they may develop.
This analysis looks at the worldwide COVID-19 outbreak from the perspective of data science and visual analytics. Visualizing cases, deaths, recoveries, and tests helps assess the impact of preventive measures and supports managing outbreaks with the available health resources.
Tools and Technologies Used in the Project: Google Colab(Runtime type - GPU).
Requirements to Build the Project:
Stepwise Implementation
Step 1: Importing Necessary Libraries
Once all the required libraries are installed, import them into the workspace; they provide the functionality needed for analysis and visualization.
Example: importing libraries
Python
# Data analysis and manipulation
import pandas as pd
# Data visualization
import matplotlib.pyplot as plt
import plotly.express as px
import plotly.graph_objs as go
import plotly.io as pio
# Plotly offline mode for notebooks
import plotly.offline as py
py.init_notebook_mode(connected=True)
# Use the Colab renderer (change this when running outside Google Colab)
pio.renderers.default = 'colab'
Step 2: Importing the Datasets
Importing three datasets into this project
- covid- This dataset contains Country/Region, Continent, Population, TotalCases, NewCases, TotalDeaths, NewDeaths, TotalRecovered, NewRecovered, ActiveCases, Serious,Critical (a single column), Tot Cases/1M pop, Deaths/1M pop, TotalTests, Tests/1M pop, WHO Region, iso_alpha.
- covid_grouped- This dataset contains Date (from 20-01-22 to 20-07-27), Country/Region, Confirmed, Deaths, Recovered, Active, New cases, New deaths, New recovered, WHO Region, iso_alpha.
- coviddeath- This dataset contains real-world counts of Covid-19 deaths and the conditions behind them.
To import a dataset into the workspace, use the pandas read_csv() function.
Syntax:
read_csv(path)
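As a minimal sketch of how read_csv() behaves, the snippet below reads CSV text from an in-memory buffer instead of a file, so it runs without covid.csv being present. The two rows and the country names are made-up illustrative values, not figures from the dataset.

```python
import io
import pandas as pd

# Made-up sample rows for illustration only (not real figures from covid.csv)
csv_text = """Country/Region,Continent,TotalCases
A-land,Europe,1200
B-land,Asia,3400
"""

# read_csv accepts a file path or any file-like object
sample = pd.read_csv(io.StringIO(csv_text))

print(sample.head())   # first rows of the frame
print(sample.dtypes)   # column types inferred by pandas
```

Passing a path such as "covid.csv" works the same way; only the source of the text changes.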
Example: Importing datasets
Python
# Importing Dataset1
dataset1 = pd.read_csv("covid.csv")
dataset1.head() # returns first 5 rows
Output:

Next, basic information about the dataset, such as its shape and size, helps us decide how to work with it during analysis.
Example: getting dataset information
Python
# Returns tuple of shape (Rows, columns)
print(dataset1.shape)
# Returns size of dataframe
print(dataset1.size)
Output:
(209, 17)
3553
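The size of a DataFrame is simply rows multiplied by columns, which is why 209 × 17 gives 3553 here. A small sketch on a hypothetical three-row frame shows the relationship, and that missing values are still counted in size:

```python
import pandas as pd

# Hypothetical 3x2 frame used only to illustrate shape vs. size
df = pd.DataFrame({"a": [1, 2, 3], "b": [4.0, None, 6.0]})

rows, cols = df.shape
print(df.shape)  # tuple of (rows, columns)
print(df.size)   # rows * columns, missing values included
```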
Example: dataset information
Python
# Information about Dataset1
# return concise summary of dataframe
dataset1.info()
Output:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 209 entries, 0 to 208
Data columns (total 17 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Country/Region 209 non-null object
1 Continent 208 non-null object
2 Population 208 non-null float64
3 TotalCases 209 non-null int64
4 NewCases 4 non-null float64
5 TotalDeaths 188 non-null float64
6 NewDeaths 3 non-null float64
7 TotalRecovered 205 non-null float64
8 NewRecovered 3 non-null float64
9 ActiveCases 205 non-null float64
10 Serious,Critical 122 non-null float64
11 Tot Cases/1M pop 208 non-null float64
12 Deaths/1M pop 187 non-null float64
13 TotalTests 191 non-null float64
14 Tests/1M pop 191 non-null float64
15 WHO Region 184 non-null object
16 iso_alpha 209 non-null object
dtypes: float64(12), int64(1), object(4)
memory usage: 27.9+ KB
Similarly other datasets can be imported and explored.
Example: Importing dataset
Python
# Importing Dataset2
dataset2 = pd.read_csv("covid_grouped.csv")
dataset2.head() # return first 5 rows of dataset2
Output:

Example: Getting dataset information
Python
# Returns tuple of shape (Rows, columns)
print(dataset2.shape)
# Returns size of dataframe
print(dataset2.size)
Output:
(35156, 11)
386716
Example: Dataset information
Python
# Information about Dataset2
dataset2.info() # return concise summary of dataframe
Output:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 35156 entries, 0 to 35155
Data columns (total 11 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Date 35156 non-null object
1 Country/Region 35156 non-null object
2 Confirmed 35156 non-null int64
3 Deaths 35156 non-null int64
4 Recovered 35156 non-null int64
5 Active 35156 non-null int64
6 New cases 35156 non-null int64
7 New deaths 35156 non-null int64
8 New recovered 35156 non-null int64
9 WHO Region 35156 non-null object
10 iso_alpha 35156 non-null object
dtypes: int64(7), object(4)
memory usage: 3.0+ MB
Step 3: Dataset cleaning
Data cleaning is the process of identifying incomplete, incorrect, or irrelevant parts of a dataset and then correcting or removing them.
Example: Getting dataset columns
Python
# Columns labels of a Dataset1
dataset1.columns
Output:
Index(['Country/Region', 'Continent', 'Population', 'TotalCases', 'NewCases', 'TotalDeaths', 'NewDeaths', 'TotalRecovered', 'NewRecovered', 'ActiveCases', 'Serious,Critical', 'Tot Cases/1M pop', 'Deaths/1M pop', 'TotalTests', 'Tests/1M pop', 'WHO Region', 'iso_alpha'], dtype='object')
We don't need the 'NewCases', 'NewDeaths', and 'NewRecovered' columns because they consist almost entirely of NaN values (only 3 or 4 non-null entries each). Drop these columns with the pandas drop() method.
Syntax:
DataFrame.drop(labels, axis=1)
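Rather than picking the sparse columns by eye from the info() output, they can also be found programmatically. The sketch below is one possible approach on a made-up frame; the 50% threshold and the variable names are assumptions chosen for illustration, not part of the original project.

```python
import pandas as pd

# Made-up frame: 'NewCases' is almost entirely missing, like the columns dropped above
df = pd.DataFrame({
    "Country/Region": ["A", "B", "C", "D"],
    "TotalCases": [10, 20, 30, 40],
    "NewCases": [None, None, None, 5.0],
})

# Fraction of missing values per column
missing_share = df.isna().mean()

# Columns where more than half of the values are missing (threshold is an assumption)
sparse_cols = missing_share[missing_share > 0.5].index.tolist()

# drop() returns a new frame unless inplace=True is passed, so df is left unchanged
cleaned = df.drop(columns=sparse_cols)

print(sparse_cols)
print(cleaned.columns.tolist())
```

Returning a new frame instead of using inplace=True keeps the original data available if a dropped column turns out to be needed later.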
Example: Cleaning dataframe
Python
# Drop the NewCases, NewDeaths and NewRecovered columns from dataset1
dataset1.drop(['NewCases', 'NewDeaths', 'NewRecovered'],
              axis=1, inplace=True)
# Select a random sample of 5 rows from dataset1
dataset1.sample(5)
Output:

Let's create a table using the create_table function from Plotly's figure factory module.
Example: Creating a table using Plotly figure factory
Python
# Import create_table Figure Factory
from plotly.figure_factory import create_table
colorscale = [[0, '#4d004c'], [.5, '#f2e5ff'], [1, '#ffffff']]
table = create_table(dataset1.head(15), colorscale=colorscale)
py.iplot(table)
Output:

Step 4: Bar graphs- Comparisons between COVID infected countries in terms of total cases, total deaths, total recovered & total tests
With a single line of code, Plotly Express produces an interactive graph. Hovering the cursor over any point in a plot shows its labels and values. We can analyze each aspect of the dataset through the relationships between its columns.
First, plot total cases by country for the top 15 countries only, colored by total cases, with 'Country/Region' and 'Continent' as hover data.
Example: Bar graph
Python
px.bar(dataset1.head(15), x = 'Country/Region',
y = 'TotalCases',color = 'TotalCases',
height = 500,hover_data = ['Country/Region', 'Continent'])
Output:

The plot shows the data for the top 15 countries. Now plot total cases for the same 15 countries again, this time colored by total deaths, with 'Country/Region' and 'Continent' as hover data.
Example: Bar graph
Python
px.bar(dataset1.head(15), x = 'Country/Region', y = 'TotalCases',
color = 'TotalDeaths', height = 500,
hover_data = ['Country/Region', 'Continent'])
Output:

Let's analyze the same plot colored by the total number of recovered cases.
Example: Bar graph
Python
px.bar(dataset1.head(15), x = 'Country/Region', y = 'TotalCases',
color = 'TotalRecovered', height = 500,
hover_data = ['Country/Region', 'Continent'])
Output:

Visualize the same again by coloring the total number of tests.
Example: Bar graph
Python
px.bar(dataset1.head(15), x = 'Country/Region', y = 'TotalCases',
color = 'TotalTests', height = 500, hover_data = ['Country/Region', 'Continent'])
Output:

We have now visualized the top 15 countries by total cases, colored in turn by cases, deaths, recoveries, and tests, and can compare the countries by looking at the plots.
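Note that head(15) returns the first 15 rows in file order, so it matches the "top 15" only if the CSV is already sorted by total cases, which is an assumption about the file. If that is in doubt, a sketch like the following selects the largest rows explicitly; the frame here is made up for illustration.

```python
import pandas as pd

# Made-up, unsorted values for illustration only
df = pd.DataFrame({
    "Country/Region": ["A", "B", "C", "D"],
    "TotalCases": [50, 400, 120, 300],
})

# nlargest picks the n rows with the highest values in the given column
top2 = df.nlargest(2, "TotalCases")
print(top2)
```

The result can be passed to px.bar() in place of dataset1.head(15).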
Let's create a horizontal bar chart with 'TotalTests' on the X-axis and 'Country/Region' on the Y-axis by passing orientation='h', and color the plot by 'TotalTests'.
Example: Bar graph
Python
px.bar(dataset1.head(15), x = 'TotalTests', y = 'Country/Region',
color = 'TotalTests',orientation ='h', height = 500,
hover_data = ['Country/Region', 'Continent'])
Output:

Let's plot 'TotalTests' against 'Continent' and color the plot by 'TotalTests'.
Example: Bar graph
Python
px.bar(dataset1.head(15), x = 'TotalTests', y = 'Continent',
color = 'TotalTests',orientation ='h', height = 500,
hover_data = ['Country/Region', 'Continent'])
Output:
Total Tests & Continents (Horizontal Orientation)
Step 5: Data Visualization through Bubble Charts-Continent Wise
Let's create a scatter plot to look at the statistics by continent. First, plot the total number of cases by continent, with 'Country/Region' and 'Continent' as hover data.
Example: Scatter plot
Python
px.scatter(dataset1, x='Continent',y='TotalCases',
hover_data=['Country/Region', 'Continent'],
color='TotalCases', size='TotalCases', size_max=80)
Output:

Passing log_y=True draws the Y-axis on a logarithmic scale. The underlying data values are unchanged; only the axis scaling differs, which makes it easier to compare values that span several orders of magnitude.
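To see why a logarithmic axis helps, the sketch below uses only the standard library and made-up values: on a linear axis the gaps between the values differ hugely, while on a base-10 log axis they become comparable. It illustrates the idea behind log_y=True, not Plotly's internal implementation.

```python
import math

# Made-up values spanning several orders of magnitude
values = [10, 1_000, 100_000]

# Position of each value on a base-10 logarithmic axis
log_positions = [math.log10(v) for v in values]

# Gaps between neighbouring values on a linear axis and on a log axis
linear_gaps = [b - a for a, b in zip(values, values[1:])]
log_gaps = [b - a for a, b in zip(log_positions, log_positions[1:])]

print(linear_gaps)  # very uneven spacing
print(log_gaps)     # roughly even spacing
```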
Example: Scatter plot
Python
px.scatter(dataset1.head(57), x='Continent',y='TotalCases',
hover_data=['Country/Region', 'Continent'],
color='TotalCases', size='TotalCases', size_max=80, log_y=True)
Output:

Example: Scatter plot
Python
px.scatter(dataset1.head(54), x='Continent',y='TotalTests',
hover_data=['Country/Region', 'Continent'],
color='TotalTests', size='TotalTests', size_max=80)
Output:

Example: Scatter plot
Python
px.scatter(dataset1.head(50), x='Continent',y='TotalTests',
hover_data=['Country/Region', 'Continent'],
color='TotalTests', size='TotalTests', size_max=80, log_y=True)
Output:

Step 6: Data Visualization through Bubble Charts-Country Wise
Let's move on to country-wise visualization. First, plot 'Country/Region' against the total number of cases for the first 100 rows of the dataset, with color and bubble size set by total cases and 'Country/Region', 'Continent' as hover data.
Example: Bubble chart
Python
px.scatter(dataset1.head(100), x='Country/Region', y='TotalCases',
hover_data=['Country/Region', 'Continent'],
color='TotalCases', size='TotalCases', size_max=80)
Output:

Now plot 'Country/Region' against the total number of cases for the top 30 countries only, coloring each bubble by country, using a logarithmic Y-axis, and taking 'Country/Region', 'Continent' as hover data.
Example: Bubble chart
Python
px.scatter(dataset1.head(30), x='Country/Region', y='TotalCases',
hover_data=['Country/Region', 'Continent'],
color='Country/Region', size='TotalCases', size_max=80, log_y=True)
Output:

Now plot the country/region against the total number of deaths, and do the same for the other aspects of COVID-19 in dataset1.
Example: Bubble chart
Python
px.scatter(dataset1.head(10), x='Country/Region', y= 'TotalDeaths',
hover_data=['Country/Region', 'Continent'],
color='Country/Region', size= 'TotalDeaths', size_max=80)
Output:

Example: Bubble chart
Python
px.scatter(dataset1.head(30), x='Country/Region', y= 'Tests/1M pop',
hover_data=['Country/Region', 'Continent'],
color='Country/Region', size= 'Tests/1M pop', size_max=80)
Output:

Example: Country/Region VS Tests/1M pop (color-scale of Tests/1M pop)
Python
px.scatter(dataset1.head(30), x='Country/Region', y= 'Tests/1M pop',
hover_data=['Country/Region', 'Continent'],
color='Tests/1M pop', size= 'Tests/1M pop', size_max=80)
Output:

Example: Bubble chart
Python
px.scatter(dataset1.head(30), x='TotalCases', y= 'TotalDeaths',
hover_data=['Country/Region', 'Continent'],
color='TotalDeaths', size= 'TotalDeaths', size_max=80)
Output:

The result suggests a roughly linear relationship between the total number of cases and the total number of deaths: countries with more cases tend to have more deaths.
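A visual impression of linearity can be backed up with a correlation coefficient. The sketch below computes the Pearson correlation with pandas on made-up numbers; on the real data the same call would be applied to dataset1['TotalCases'] and dataset1['TotalDeaths'], and the result there is not known in advance.

```python
import pandas as pd

# Made-up values for illustration only (not taken from covid.csv)
df = pd.DataFrame({
    "TotalCases": [100, 200, 300, 400, 500],
    "TotalDeaths": [3, 5, 10, 11, 16],
})

# Pearson correlation: values near 1 indicate a strong positive linear relationship
r = df["TotalCases"].corr(df["TotalDeaths"])
print(round(r, 3))
```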
Example: Bubble chart
Python
px.scatter(dataset1.head(30), x='TotalCases', y= 'TotalDeaths',
hover_data=['Country/Region', 'Continent'],
color='TotalDeaths', size= 'TotalDeaths', size_max=80,
log_x=True, log_y=True)
Output:
TotalCases VS TotalDeaths (with log_y=True)
Example: Bubble chart
Python
px.scatter(dataset1.head(30), x='TotalTests', y= 'TotalCases',
hover_data=['Country/Region', 'Continent'],
color='TotalTests', size= 'TotalTests', size_max=80,
log_x=True, log_y=True)
Output:
TotalTests VS TotalCases
Step 7: Advanced Data Visualization- Bar graphs for All top infected Countries
In this task, we will explore covid-19 data using bar charts, working with dataset2 because it has a Date column.
Example: Bar chart
Python
px.bar(dataset2, x="Date", y="Confirmed", color="Confirmed",
hover_data=["Confirmed", "Date", "Country/Region"], height=400)
Output:

The graph above shows confirmed cases by date for all countries. We can see the rapid, near-exponential growth of cases over time. Using a logarithmic Y-axis makes this clearer.
Example: Bar chart
Python
px.bar(dataset2, x="Date", y="Confirmed", color="Confirmed",
hover_data=["Confirmed", "Date", "Country/Region"],log_y=True, height=400)
Output:

Let's plot deaths instead of confirmed cases in the same way, colored by the number of deaths.
Example: Bar chart
Python
px.bar(dataset2, x="Date", y="Deaths", color="Deaths",
hover_data=["Confirmed", "Date", "Country/Region"],
log_y=False, height=400)
Output:

Step 8: Countries Specific COVID Data Visualization: (United States)
In this task, we will analyze the data for the United States.
Example: Refining dataset to get only USA data
Python
df_US= dataset2.loc[dataset2["Country/Region"]=="US"]
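The filter above is a boolean mask applied with .loc. The sketch below shows the same pattern on a made-up frame and, as an optional extra step, parses the Date column so the plots get a true time axis. The ISO date format used here is an assumption about the file, and the values are illustrative only.

```python
import pandas as pd

# Made-up rows for illustration only (not taken from covid_grouped.csv)
df = pd.DataFrame({
    "Date": ["2020-01-22", "2020-01-22", "2020-01-23", "2020-01-23"],
    "Country/Region": ["US", "Italy", "US", "Italy"],
    "Confirmed": [1, 0, 1, 0],
})

# Boolean mask keeps only the US rows; copy() avoids modifying a view of df
us = df.loc[df["Country/Region"] == "US"].copy()

# Parse the dates and sort chronologically (assumes an ISO-like date format)
us["Date"] = pd.to_datetime(us["Date"])
us = us.sort_values("Date")

print(us)
```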
Now let us plot and study the covid situation in the USA.
Example: Bar chart
Python
px.bar(df_US, x="Date", y="Confirmed", color="Confirmed", height=400)
Output:

Here we can clearly see how the confirmed cases increased in the United States over time (January 2020 to July 2020). Similarly, we can check the same for recovered cases and deaths.
Example: Bar chart
Python
px.bar(df_US,x="Date", y="Recovered", color="Recovered", height=400)
Output:

Similarly, we can plot the same data as line graphs.
Example: Line plot
Python
px.line(df_US,x="Date", y="Recovered", height=400)
Output:

Example: Line plot
Python
px.line(df_US,x="Date", y="Deaths", height=400)
Output:

Example: Line plot
Python
px.line(df_US,x="Date", y="Confirmed", height=400)
Output:

Example: Line plot
Python
px.line(df_US,x="Date", y="New cases", height=400)
Output:

Now let us create bar charts and study the USA situation using that.
Example: Bar chart
Python
px.bar(df_US,x="Date", y="New cases", height=400)
Output:

Finally, let us draw a scatter plot of deaths against confirmed cases.
Example: Scatter plot
Python
px.scatter(df_US, x="Confirmed", y="Deaths", height=400)
Output:

Step 9: Visualization of Data in terms of Maps
We can use choropleth maps to visualize the data geographically; maps are a natural way to present it because COVID-19 is a global phenomenon. Plotly supports several projections, including orthographic, equirectangular, and natural earth. We use dataset2 for this purpose because it has a Date column, which lets us follow how the virus spread across the world from January to July 2020.
A choropleth map displays geographical areas or regions that are colored, shaded, or patterned in relation to a data variable. It provides an easy way to visualize how a measurement varies across a geographic area.
Equi-rectangular Projection:
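Syntax: choropleth()
Parameters:
- data_frame: the dataset
- locations: column holding the country codes (here iso_alpha)
- color: column used to color the regions
- hover_name: column shown as the hover title
- color_continuous_scale: a named color scale such as RdYlGn, Blues, Viridis...
- animation_frame: column defining the animation frames (here Date)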
Example: Creating map
Python
px.choropleth(dataset2,
locations="iso_alpha",
color="Confirmed",
hover_name="Country/Region",
color_continuous_scale="Blues",
animation_frame="Date")
Output:
This creates an animation containing visualizations from January to July 2020. Playing this animation will make it more clear how the virus spread around the world. The darker the color, the higher the confirmed cases are.
Example: Creating map
Python
px.choropleth(dataset2,
locations='iso_alpha',
color="Deaths",
hover_name="Country/Region",
color_continuous_scale="Viridis",
animation_frame="Date" )
Output:
This code creates an animation of deaths by date. Playing it shows how deaths increased around the world.
Natural Earth projection is a compromise pseudo-cylindrical map projection for world maps.
Example: Natural earth projection
Python
px.choropleth(dataset2,
locations='iso_alpha',
color="Recovered",
hover_name="Country/Region",
color_continuous_scale="RdYlGn",
projection="natural earth",
animation_frame="Date" )
Output:
Playing the animation makes it clearer how the number of recovered cases changes over time. Let's also look at how animation can be added to a bar graph. We can animate the bar graph using the Date column in dataset2.
Example: Bar graph animation
Python
px.bar(dataset2, x="WHO Region", y="Confirmed", color="WHO Region",
animation_frame="Date", hover_name="Country/Region")
Output:
When the animation runs, it steps through the dates from January to July 2020. It shows six bars, one per WHO region, each in its own color, representing the confirmed cases.
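In the animated chart above, every country contributes its own segment to its region's bar. If a single total per region and date is preferred, the data can be aggregated first. The sketch below shows one way to do this with groupby on a made-up frame; the variable name region_totals is hypothetical.

```python
import pandas as pd

# Made-up rows for illustration only (not taken from covid_grouped.csv)
df = pd.DataFrame({
    "Date": ["2020-01-22", "2020-01-22", "2020-01-22", "2020-01-23"],
    "WHO Region": ["Europe", "Europe", "Americas", "Europe"],
    "Country/Region": ["A", "B", "C", "A"],
    "Confirmed": [1, 2, 5, 4],
})

# One row per (Date, WHO Region) with the summed confirmed cases
region_totals = df.groupby(["Date", "WHO Region"], as_index=False)["Confirmed"].sum()

print(region_totals)
```

The aggregated frame can then be passed to px.bar() with the same x, y, color, and animation_frame arguments; hover_name="Country/Region" would have to be dropped because that column no longer exists after aggregation.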
Step 10: Visualize text using Word Cloud
Covid-19 affects people in different ways, so we create a word cloud to visualize the leading causes of covid-19 deaths. To visualize the text, follow these steps:
- Convert the column values into a list.
- Join the list into one single string.
- Convert the string into a word cloud.
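The steps above can be sketched without the wordcloud package by counting word frequencies, which is roughly the quantity a word cloud uses to size its words. The frame below is made up for illustration, and the dropna() call is an added precaution: join() raises an error if the list contains missing values.

```python
from collections import Counter

import pandas as pd

# Made-up rows for illustration only (not taken from coviddeath.csv)
df = pd.DataFrame({
    "Condition": ["Influenza and pneumonia", "Diabetes", None, "Influenza and pneumonia"],
})

# Step 1: column -> list, skipping missing values so join() does not fail
conditions = df["Condition"].dropna().astype(str).tolist()

# Step 2: list -> one single string
text = " ".join(conditions)

# Step 3 (stand-in for the word cloud): count how often each word occurs
word_counts = Counter(text.lower().split())

print(word_counts.most_common(3))
```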
Dataset3: This dataset contains real world examples of number of Covid-19 deaths and the reasons behind the deaths.
Example: Importing dataset
Python
dataset3= pd.read_csv("coviddeath.csv")
dataset3.head()
Output:

Example: Dataset information
Python
Output:

Example: Getting dataset information
Python
dataset3.groupby(["Condition"]).count()
Output:
Conditions
Example: Creating wordcloud
Python
# import word cloud
from wordcloud import WordCloud
sentences = dataset3["Condition"].tolist()
sentences_as_a_string = ' '.join(sentences)
# Convert the string into WordCloud
plt.figure(figsize=(20, 20))
plt.imshow(WordCloud().generate(sentences_as_a_string))
Output:

From the output, the most prominent terms are Influenza and Pneumonia, which suggests they are the most frequently listed condition. Next, we convert the 'Condition Group' column to a list stored in the variable "column2_tolist", then join the list into a single string stored in the variable "column_to_string" by using .join().
Example: Creating wordcloud
Python
column2_tolist= dataset3["Condition Group"].tolist()
# Convert the list to one single string
column_to_string= " ".join(column2_tolist)
# Convert the string into WordCloud
plt.figure(figsize=(20,20))
plt.imshow(WordCloud().generate(column_to_string))
Output:

Here, respiratory diseases appear as the major cause of death, followed by circulatory (cardiovascular) diseases.