Web Scraping using Python

Abhishek Bhagat
Dec 18, 2021

This blog will explain web scraping: how to scrape data from a website, build a DataFrame from it, and move that DataFrame into a database table. I have used Python as the programming language. I will explain each step in detail so that anyone with a basic understanding of Python will be able to scrape data and build on it.

Web scraping refers to the extraction of data from a website. The information from the web is collected and then exported into a more useful format, be it a spreadsheet, an API, or tables in a database.

The act of web scraping isn’t illegal in itself. However, some rules need to be followed. Web scraping becomes illegal when we scrape data that contains sensitive information (e.g., private data such as usernames, passwords, or personal health and medical information) or copyrighted material (e.g., YouTube videos, Flickr photos). We also need to respect a site’s Terms of Service, which may explicitly prohibit web scraping, and you should consult a lawyer if you are doing it for profit.

Let’s do some basic setup before we begin scraping.

Check the version of Google Chrome installed on your system.
Go to Settings -> Help -> About Google Chrome and check the version.

Download the ChromeDriver that matches your version of Chrome and your operating system. It is available from the official ChromeDriver downloads page.

Below is the code for the initial setup. In the code, replace the {location} variable with the path to your ChromeDriver. You will also need to install the Python libraries if they are not already installed on your system. Use the code below to install them.

# To install Selenium and BeautifulSoup
!pip install selenium
!pip install beautifulsoup4

Once the libraries are installed and you have replaced the location, run the script with any Python IDE:

Initial set-up code
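A minimal sketch of this initial set-up is shown below. It assumes a Selenium 3-style driver constructor (Selenium 4 would pass a Service object instead) and that the location string is replaced with the path to your downloaded ChromeDriver executable.

# Initial set-up (sketch): import libraries and start a Chrome session
from selenium import webdriver
from bs4 import BeautifulSoup

# Replace {location} with the path to your ChromeDriver executable,
# e.g. "C:/drivers/chromedriver.exe" or "/usr/local/bin/chromedriver"
location = "{location}"

# Start a Chrome browser controlled by Selenium (Selenium 3-style constructor)
driver = webdriver.Chrome(executable_path=location)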

Below are the Selenium functions that return matched elements:

find_element_by_id: returns only the first matched element
find_element_by_name
find_element_by_xpath
find_element_by_class_name

The plural forms (find_elements_by_*) return a list of all matched elements. For more details on these functions, refer to the Selenium documentation.
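For illustration, here is how these locators are typically called; the id and XPath below come from the login example later in this post, while the name and class values are placeholders:

# Single-element locators return the first match (or raise NoSuchElementException)
username_box = driver.find_element_by_id("user_login")
username_box = driver.find_element_by_xpath('//*[@id="user_login"]')
username_box = driver.find_element_by_name("user[login]")       # placeholder name attribute
buttons = driver.find_elements_by_class_name("button")          # plural form returns a list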

Here, I have taken the Estimize company website to scrape data for reference. It is an open financial estimates platform.

Create a free account; a premium account is not required.
Click on Login, then hover over the username box, right-click, and go to Inspect.
Right-click on user[login] as shown in Figure 1, choose “Copy”, and click on “Copy XPath”. This is the XPath for the username field.
reference: //*[@id="user_login"]
Similarly, hover over the password box, right-click, choose “Inspect”, and get the XPath for the password field.
reference: //*[@id="user_password"]
Similarly, hover over the Login button, right-click, choose “Inspect”, and get the XPath for Login.
reference: //*[@id="new_user"]/input[3]

Figure 1: Inspecting the username field and copying its XPath

I have created a function to log in, which you can call to sign in. You can use this function for any website; however, you will have to change the XPaths for the username, password, and submit button according to the website.

Sign-In Function
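Below is a sketch of such a sign-in function. The XPaths are the ones copied above for Estimize; the login URL and the credentials in the usage example are placeholders that you should replace with your own.

def sign_in(driver, username, password):
    # Open the login page (URL assumed; adjust for your target site)
    driver.get("https://www.estimize.com/users/sign_in")

    # Fill in the username and password fields using the XPaths copied earlier
    driver.find_element_by_xpath('//*[@id="user_login"]').send_keys(username)
    driver.find_element_by_xpath('//*[@id="user_password"]').send_keys(password)

    # Click the submit button
    driver.find_element_by_xpath('//*[@id="new_user"]/input[3]').click()

# Example usage (placeholder credentials)
sign_in(driver, "your_email@example.com", "your_password")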

Now, let’s scrape some data from the website. We will collect analyst information and build a DataFrame on top of it. We will then move this DataFrame to a MySQL database. We will scrape data for one ticker; here, the ticker represents a company. Refer to the code below.

Web-scraping and Data Frame
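A sketch of this step is shown below. The ticker page URL pattern and the HTML tag and class names used to locate analyst rows are assumptions for illustration only; inspect the actual page and replace the selectors with the real ones.

import pandas as pd
from bs4 import BeautifulSoup

ticker = "AAPL"  # the ticker represents a company

# Load the ticker's page after signing in (URL pattern assumed)
driver.get("https://www.estimize.com/" + ticker.lower())

# Parse the rendered page with BeautifulSoup
soup = BeautifulSoup(driver.page_source, "html.parser")

# Collect analyst information (tag/class names below are placeholders)
records = []
for row in soup.find_all("div", class_="analyst-row"):
    name = row.find("span", class_="analyst-name")
    estimate = row.find("span", class_="analyst-estimate")
    if name and estimate:
        records.append({"ticker": ticker,
                        "analyst": name.get_text(strip=True),
                        "estimate": estimate.get_text(strip=True)})

# Build a DataFrame on top of the scraped records
df = pd.DataFrame(records)
print(df.head())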

We will now connect to a MySQL database to push our data into a table. We also need to install mysql-connector-python, and we need to create a database in MySQL.

# To install mysql-connector-python
!pip install -U mysql-connector-python
# To create the database (run in MySQL)
CREATE DATABASE Test_DB;

We will now move the DataFrame to the database using the code below.

Connect to Database and python
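A sketch of this step using mysql-connector-python is shown below. The connection credentials and table name are placeholders, and the column list matches the DataFrame built above.

import mysql.connector

# Connect to the MySQL server and the database created earlier
conn = mysql.connector.connect(
    host="localhost",
    user="root",               # placeholder credentials
    password="your_password",
    database="Test_DB",
)
cursor = conn.cursor()

# Create a table for the analyst data if it does not already exist
cursor.execute("""
    CREATE TABLE IF NOT EXISTS analyst_estimates (
        ticker VARCHAR(10),
        analyst VARCHAR(100),
        estimate VARCHAR(50)
    )
""")

# Insert the DataFrame rows into the table
insert_sql = ("INSERT INTO analyst_estimates (ticker, analyst, estimate) "
              "VALUES (%s, %s, %s)")
cursor.executemany(insert_sql, df[["ticker", "analyst", "estimate"]].values.tolist())
conn.commit()
cursor.close()
conn.close()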

Conclusion

Web scraping is fun to learn, and it is very useful when we want to get data from websites. However, we have to be careful while scraping and should not scrape private data. You can connect with me if you need any help with the code. You can reach out to me on LinkedIn @ Abhishek Bhagat | LinkedIn
