Building a Web Scraper and Data Visualization using Pandas with Python.

This article demonstrates the fundamentals of web scraping and data visualization using pandas with the Python programming language.

What is web scraping?

In general terms, web scraping refers to the extraction of data from a web page. It is widely used in data science and data analysis. The objective of web scraping is to extract data from websites and make meaning out of it.

A web scraper starts by capturing the entire content of a web page and then lets the user select the specific data they need. For example, someone scraping a stock exchange website might only want the data related to a specific company's stock.

This article will help you with a basic understanding of programming a website scraper and of data visualization using the Pandas module.

To program the scraper, we are going to use the Pandas, Requests and BeautifulSoup4 libraries.

To import the libraries, use the statements listed below.

import pandas

import requests

from bs4 import BeautifulSoup

Pandas is a Python library used for data manipulation, analysis and visualization.

Beautiful Soup is a Python library used for parsing HTML and XML, and for extracting data from a website's markup.

Requests is used to send HTTP requests and handle the responses.

We are going to use the website https://forecast.weather.gov/ as an example. As we move along, we will extract weather data from this website and represent it in the form of a table and in a CSV file.

Copy the URL of the forecast page and use the code below to send the request.

page = requests.get("https://forecast.weather.gov/MapClick.php?lat=36.37410569300005&lon=-119.27022999999997#.XpQCcsgzbDc")

The response we get will be saved in the variable ‘page’.
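The call above assumes the request always succeeds. As a minimal sketch (not part of the original walkthrough), the request can be wrapped with a timeout and a status check. The helper name `fetch_page`, the 10-second timeout and the injectable `get` parameter are illustrative assumptions.

```python
import requests


def fetch_page(url, timeout=10, get=requests.get):
    """Hypothetical helper: return the response body, raising on HTTP errors.

    `get` defaults to requests.get; it is a parameter only so the helper
    can be exercised without a real network connection.
    """
    response = get(url, timeout=timeout)
    # raise_for_status() raises requests.HTTPError for 4xx/5xx responses
    response.raise_for_status()
    return response.content


# Usage (performs a real network request):
# content = fetch_page("https://forecast.weather.gov/MapClick.php?lat=36.37410569300005&lon=-119.27022999999997")
```

The returned bytes can be passed to BeautifulSoup in the same way as `page.content`.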

The data we need are the ‘Days’, ‘Temperature’ and ‘Short Description’ fields.

Let’s look at the page’s HTML source to find the tags that hold the relevant data, as highlighted in the image below.

[Image: page source with the relevant tags highlighted]

Next, let’s parse the HTML response stored in the ‘page’ variable using the Beautiful Soup module.

soup = BeautifulSoup(page.content, 'html.parser')

We’ll filter down the data as per our requirement, as shown below.

The quoted strings (the id and class names) are the values we copied from the webpage’s elements. They will differ for each website.

week = soup.find(id='seven-day-forecast-body')

items = week.find_all(class_='tombstone-container')
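To see what `find` and `find_all` return without contacting the website, here is a small offline sketch. The markup is a hypothetical, simplified stand-in that only reuses the id and class names from the article; the real page is more complex.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup (assumption: not the real page source)
SAMPLE_HTML = """
<div id="seven-day-forecast-body">
  <div class="tombstone-container">
    <p class="period-name">Tuesday</p>
    <p class="short-desc">Sunny</p>
    <p class="temp">High: 75 °F</p>
  </div>
  <div class="tombstone-container">
    <p class="period-name">Wednesday</p>
    <p class="short-desc">Clear</p>
    <p class="temp">High: 78 °F</p>
  </div>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# find() returns the first matching element, or None if nothing matches
week = soup.find(id="seven-day-forecast-body")

# find_all() returns every matching descendant as a list-like result
items = week.find_all(class_="tombstone-container")

print(len(items))                                       # 2
print(items[0].find(class_="period-name").get_text())   # Tuesday
```

Note that `class_` has a trailing underscore because `class` is a reserved word in Python.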

Now we need to extract the text from the elements we have just selected: for each item, the period name, the short description and the temperature.

period_names = [item.find(class_='period-name').get_text() for item in items]

shortdesc = [item.find(class_='short-desc').get_text() for item in items]

temp = [item.find(class_='temp').get_text() for item in items]

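In plain English, the first line means: for each item in the list of ‘tombstone-container’ elements, find the element with the class ‘period-name’, extract its text, and store the results in the list ‘period_names’. The other two lines do the same for ‘short-desc’ and ‘temp’.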
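The list comprehensions are equivalent to an explicit loop. The sketch below writes one of them out and adds a guard for items where the element is missing, since `find` returns None in that case and calling `get_text()` on None raises an error. The two-item markup is a hypothetical example, not the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the second container has no 'period-name' element
html = """
<div class="tombstone-container"><p class="period-name">Tuesday</p></div>
<div class="tombstone-container"><p class="short-desc">Sunny</p></div>
"""
items = BeautifulSoup(html, "html.parser").find_all(class_="tombstone-container")

period_names = []
for item in items:
    tag = item.find(class_="period-name")
    # Keep a None placeholder so the list stays aligned with the items
    period_names.append(tag.get_text() if tag is not None else None)

print(period_names)  # ['Tuesday', None]
```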

Cool! We are done with data extraction. If we print the variables ‘period_names’, ‘shortdesc’ and ‘temp’, we get the output below.

OUTPUT:

['Overnight', 'Tuesday', 'TuesdayNight', 'Wednesday', 'WednesdayNight', 'Thursday', 'ThursdayNight', 'Friday', 'FridayNight']

['Mostly Clear', 'Sunny', 'Clear', 'Sunny', 'Mostly Clear', 'Mostly Sunny', 'Slight ChanceShowers', 'ChanceShowers', 'Mostly Cloudy']


['Low: 49 °F', 'High: 75 °F', 'Low: 49 °F', 'High: 78 °F', 'Low: 53 °F', 'High: 79 °F', 'Low: 53 °F', 'High: 74 °F', 'Low: 51 °F']
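Labels such as 'TuesdayNight' and 'ChanceShowers' run two words together. A likely cause (an assumption about the page's markup) is that the words are separated by a line-break tag rather than a space, and `get_text()` simply joins the text pieces. A minimal sketch of the effect, using the `separator` and `strip` arguments of `get_text`:

```python
from bs4 import BeautifulSoup

# Hypothetical snippet: two words separated only by a <br> tag
snippet = '<p class="period-name">Tuesday<br>Night</p>'
tag = BeautifulSoup(snippet, "html.parser").find(class_="period-name")

joined = tag.get_text()                            # pieces joined with nothing between them
spaced = tag.get_text(separator=" ", strip=True)   # pieces stripped, then joined with a space

print(joined)  # TuesdayNight
print(spaced)  # Tuesday Night
```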

THIS DOES NOT LOOK BEAUTIFUL!!

Let’s beautify it using pandas!

We will use the ‘DataFrame’ class from the Pandas module, as given below.

weather = pandas.DataFrame(

    {

        'period': period_names,

        'Description': shortdesc,

        'Temperature': temp,

    }

)



print(weather)

Output: Doesn’t it look lovely?

[Image: the weather DataFrame printed as a table]
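The ‘Temperature’ column still holds strings such as 'Low: 49 °F', so it cannot be averaged or sorted numerically as it stands. One possible way to pull out the numbers is sketched below on the first three rows of the output shown earlier. The new column names `temp_f` and `is_low` are illustrative assumptions.

```python
import pandas

# First three rows of the output shown earlier in the article
weather = pandas.DataFrame(
    {
        "period": ["Overnight", "Tuesday", "TuesdayNight"],
        "Description": ["Mostly Clear", "Sunny", "Clear"],
        "Temperature": ["Low: 49 °F", "High: 75 °F", "Low: 49 °F"],
    }
)

# Extract the (optionally negative) integer from each temperature string
weather["temp_f"] = (
    weather["Temperature"].str.extract(r"(-?\d+)", expand=False).astype(int)
)

# Flag the rows that report a low rather than a high
weather["is_low"] = weather["Temperature"].str.startswith("Low")

print(weather[["period", "temp_f", "is_low"]])
```

With a numeric column in place, calls such as `weather["temp_f"].mean()` or `weather.sort_values("temp_f")` become possible.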


Alright! With one more line of code, I’ll show you how to save this data to a CSV file. And boom! We’re done!


weather.to_csv('result.csv')

Output file:

[Image: contents of result.csv]
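By default, `to_csv` also writes the DataFrame's row index as an unnamed first column. If that column is not wanted, `index=False` leaves it out. The sketch below writes a two-row example to a temporary directory (an assumption made so the example does not overwrite an existing result.csv) and reads it back.

```python
import os
import tempfile

import pandas

weather = pandas.DataFrame(
    {
        "period": ["Overnight", "Tuesday"],
        "Temperature": ["Low: 49 °F", "High: 75 °F"],
    }
)

# Write to a temporary directory instead of the working directory
path = os.path.join(tempfile.mkdtemp(), "result.csv")
weather.to_csv(path, index=False)  # index=False omits the row-index column

# Read the file back to confirm what was stored
restored = pandas.read_csv(path)
print(restored.columns.tolist())  # ['period', 'Temperature']
```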


More articles by Umair Khan Patel
