Web Scraping Political Content from Social Media Sites: An Exploratory
Data Analysis Approach
Abstract
Social media platforms such as Twitter and Facebook are among the most
popular social networking and microblogging applications, and they play
an important role in quickly disseminating up-to-date information to a
large user base. In addition to being valuable sources of entertainment
and platforms for business campaigns, social media apps significantly
impact political activities in
developing democracies. However, social media networks often become
sources of rapid dissemination of fake news, viral videos, hate speech,
and false articles, leading to political propaganda. Existing studies
have yet to address how Pakistan’s three major political parties use
social media platforms for this purpose. In this study, we used
exploratory data analysis (EDA) to explore and analyze the initial
content of social networks and to gain insights for further analysis.
We developed a web scraper, a valuable tool used in data
science, to extract unstructured content from the parties’ official
Twitter and Facebook accounts, which are primarily used to spread
political propaganda publicly. The web scraper automatically extracts
engagement information from Facebook posts (likes, shares, comments,
and views) and from tweets (likes, comments, and retweets). The
collected data is then processed and analyzed using statistical methods
to gain knowledge and insights from social media sites. One
month of data analysis suggests that Pakistan Tehreek-e-Insaf (PTI)
posted 79.37% more content on Facebook, while Pakistan Muslim League
Nawaz (PML (N)) tweeted 89.30% more on Twitter, than the other parties.
This activity is part of their political propaganda to build a
narrative and shape public opinion among their followers and voters in
Pakistan.
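
The statistical step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the `Post` record layout, the function names, the placeholder label "Party C", and all sample values are assumptions made up for illustration, and the code uses only the Python standard library. It computes each party's percentage of posts per platform and the average likes per post, two simple summaries that an exploratory analysis of scraped posts might start from.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass
class Post:
    """One scraped post or tweet (hypothetical record layout)."""
    party: str
    platform: str  # "facebook" or "twitter"
    likes: int
    shares: int    # shares on Facebook, retweets on Twitter
    comments: int


def post_share_by_party(posts):
    """Return {platform: {party: percentage of that platform's posts}}."""
    counts = defaultdict(Counter)
    for post in posts:
        counts[post.platform][post.party] += 1
    result = {}
    for platform, by_party in counts.items():
        total = sum(by_party.values())
        result[platform] = {
            party: round(100 * n / total, 2) for party, n in by_party.items()
        }
    return result


def mean_likes_by_party(posts):
    """Return {party: average likes per post} across both platforms."""
    totals = defaultdict(lambda: [0, 0])  # party -> [sum of likes, post count]
    for post in posts:
        totals[post.party][0] += post.likes
        totals[post.party][1] += 1
    return {party: s / n for party, (s, n) in totals.items()}


# Made-up sample records, for illustration only (not the study's data).
sample = [
    Post("PTI", "facebook", 120, 30, 12),
    Post("PTI", "facebook", 80, 10, 5),
    Post("PTI", "facebook", 100, 20, 8),
    Post("PML (N)", "facebook", 60, 5, 3),
    Post("PML (N)", "twitter", 40, 15, 4),
    Post("PML (N)", "twitter", 50, 25, 6),
    Post("PTI", "twitter", 70, 35, 9),
    Post("Party C", "twitter", 20, 2, 1),
]

shares = post_share_by_party(sample)
print(shares["facebook"])
print(shares["twitter"])
print(mean_likes_by_party(sample))
```

Grouping by platform before computing percentages keeps the Facebook and Twitter comparisons separate, which mirrors how the abstract reports one figure per platform.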