Did Your Comments Say It All? Predicting Ghana’s Election Through YouTube Sentiment
My newfound interest is playing with YouTube comments, or more generally, identifying patterns in any available data. In this follow-up exploration of online sentiment, I extended my YouTube sentiment analysis approach from the U.S. election to Ghana’s 2024 presidential election. As it stands now, the prediction record is 2 for 2! Check out the older post, which focused on the U.S. election. Using a dataset of 100 videos per candidate, John Dramani Mahama (NDC) and Mahamudu Bawumia (NPP), I analyzed public sentiment through YouTube comments to predict the election outcome. In this blog, I continue to ask whether sentiment from YouTube comments alone can genuinely forecast something as significant as a presidential election outcome.
Disclaimer: This exercise was born out of curiosity. While my prediction turned out to be accurate, this approach has limitations. There may be more efficient methods to analyze public sentiment on election outcomes, and improvements to the code are welcome.
Collecting Data
I collected video data using the YouTube API, searching for keywords like "John Dramani Mahama 2024" and "Mahamudu Bawumia 2024." The comments were then classified as positive, neutral, or negative using the TextBlob library. Key steps included:
Gathering video statistics and metadata.
Extracting comments while filtering out those with limited public engagement.
Applying sentiment analysis to aggregate public sentiment scores for each candidate.
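The post doesn't include code for the engagement filter in the second step. Below is a minimal sketch of one way it could work, assuming "limited engagement" means a low public comment count. It uses the Data API's `videos().list(part="statistics", id=...)` call. The function name `filter_by_comment_count` and the threshold of 10 comments are hypothetical choices, not the values used in this project.

```python
from typing import Iterable, List


def filter_by_comment_count(youtube, video_ids: Iterable[str], min_comments: int = 10) -> List[str]:
    """Keep only videos whose public comment count meets a threshold.

    `youtube` is a client built with googleapiclient's build("youtube", "v3", ...).
    min_comments=10 is a hypothetical threshold chosen for illustration.
    """
    ids = list(video_ids)
    kept: List[str] = []
    # videos().list accepts up to 50 comma-separated IDs per request
    for start in range(0, len(ids), 50):
        batch = ids[start:start + 50]
        response = youtube.videos().list(
            part="statistics",
            id=",".join(batch),
        ).execute()
        for item in response.get("items", []):
            stats = item.get("statistics", {})
            # commentCount may be missing (e.g. when comments are disabled); treat it as 0
            count = int(stats.get("commentCount", 0))
            if count >= min_comments:
                kept.append(item["id"])
    return kept
```

Fetching statistics in batches of IDs keeps the number of requests small compared with one call per video.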
Analyzing Sentiment
With the TextBlob library, I developed a function to classify each comment as positive, neutral, or negative based on polarity scores. The process was straightforward but had some limitations due to the simplicity of the sentiment scoring. Here’s how each score was interpreted:
Positive Score: Comments expressing support or enthusiasm for the candidate.
Negative Score: Comments with critical or unfavorable views.
Neutral Score: Comments that appeared impartial or unrelated to either candidate.
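The rule above amounts to thresholding TextBlob's polarity score at zero. The sketch below isolates that mapping in a hypothetical helper named `classify_polarity` so it can be checked on its own. The optional `neutral_band` parameter is an assumption added for illustration. The project itself used the strict zero cut-off, which `neutral_band=0.0` reproduces.

```python
def classify_polarity(polarity: float, neutral_band: float = 0.0) -> str:
    """Map a polarity score in [-1.0, 1.0] to a sentiment label.

    neutral_band=0.0 reproduces the strict > 0 / < 0 rule used in this post;
    a wider band (e.g. 0.05, an arbitrary choice) treats weak scores as neutral.
    """
    if polarity > neutral_band:
        return "positive"
    if polarity < -neutral_band:
        return "negative"
    return "neutral"
```

In the notebook's setting, the input would be `TextBlob(text).sentiment.polarity`. A small band keeps weakly scored comments from tipping the positive or negative totals.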
A key limitation was that several videos had comments disabled, reducing the available data and potentially affecting the representativeness of the sample. Additionally, sentiment analysis can oversimplify complex political expressions, potentially misinterpreting sarcasm or nuanced opinions.
Aggregating and Predicting
After classifying each comment, I aggregated the positive, neutral, and negative scores for each candidate. Despite the constraints of a small dataset, John Dramani Mahama emerged with a slightly higher positive sentiment score, which led to a prediction of his victory. The edit timeline below shows that the prediction was made by the 6th of December, before the election. The edit on the 11th of December removed my API key so that I could share the notebook publicly.
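Raw counts favor whichever candidate attracts more comments. One alternative aggregation is to compare the share of positive comments per candidate. Here is a minimal sketch that assumes each comment has already been labeled. The helper name `summarize` is hypothetical, and the counts in the usage lines are made up for illustration, not this project's data.

```python
from collections import Counter
from typing import Iterable


def summarize(labels: Iterable[str]) -> dict:
    """Count sentiment labels and report the positive share of all classified comments."""
    counts = Counter(labels)
    total = sum(counts.values())
    share = counts["positive"] / total if total else 0.0
    return {"counts": dict(counts), "total": total, "positive_share": share}


# Hypothetical counts, not this project's data
a = summarize(["positive"] * 300 + ["neutral"] * 500 + ["negative"] * 200)
b = summarize(["positive"] * 250 + ["neutral"] * 200 + ["negative"] * 50)
# a has more positive comments (300 vs 250), but b has the higher positive share (0.5 vs 0.3)
```

The two measures can point to different candidates, so it is worth stating which one a prediction is based on.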
Election Outcome: Prediction Success!
The official announcement declaring John Dramani Mahama the winner confirmed the prediction, which was exciting to see. However, this success doesn’t necessarily validate the model as a reliable predictor. The data is still limited: I increased the sample to 100 videos per candidate, compared with the 10 I used for the U.S. election, but this approach might still not generalize well. The constraints of basic sentiment analysis are another limitation. More sophisticated techniques, such as deep learning, or a broader dataset spanning multiple social media platforms might yield more consistent results.
Want to dive in?
Overview: This project was developed in Google Colab and uses the YouTube Data API, accessed with an API key obtained from the Google Developers Console. Remember to generate one of your own and keep it private.
1. Preparing the Code in Google Colab
The project was done in Google Colab. I used Google’s YouTube Data API to fetch the comments, then analyzed the sentiment for each candidate.
Key Code Snippets and Explanations
API Key Setup
In the Google Colab notebook, the API key was set up as follows:
```python
from googleapiclient.discovery import build

api_key = "YOUR_API_KEY_HERE"  # Replace with your own key and keep it private
youtube = build("youtube", "v3", developerKey=api_key)
```
Explanation:
- This code imports the `build` function from `googleapiclient`, which allows access to YouTube’s Data API. The `api_key` variable should contain your API key, which grants access to the API.
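Because the key had to be removed from the notebook before sharing, you may prefer not to hard-code it at all. One option is to read it from an environment variable. This is a sketch, and the variable name `YOUTUBE_API_KEY` and the helper `load_api_key` are assumptions. In Colab, the Secrets panel (`google.colab.userdata.get`) is another option.

```python
import os


def load_api_key(env_var: str = "YOUTUBE_API_KEY") -> str:
    """Read the YouTube Data API key from an environment variable instead of hard-coding it."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable to your API key.")
    return key

# Usage (requires google-api-python-client):
# from googleapiclient.discovery import build
# youtube = build("youtube", "v3", developerKey=load_api_key())
```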
Searching for Videos by Candidate Name
Here’s a snippet of the search functionality:
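```python
# Initialize variables
query = "John Dramani Mahama 2024"  # Search query for the NDC presidential candidate
max_results_per_page = 50  # Maximum allowed results per API request
total_videos = 100  # Desired number of videos
retrieved_videos = 0  # Counter for retrieved videos
video_ids = []  # List to store video IDs
next_page_token = None  # Token for pagination

# Iterate to fetch up to 100 videos
while retrieved_videos < total_videos:
    # Executes a search request
    search_response = youtube.search().list(
        part="snippet",
        q=query,
        type="video",
        maxResults=max_results_per_page,
        pageToken=next_page_token
    ).execute()

    # Store the video IDs from this page and update the counter
    for item in search_response.get("items", []):
        video_ids.append(item["id"]["videoId"])
    retrieved_videos = len(video_ids)

    # Stop when there are no more pages of results
    next_page_token = search_response.get("nextPageToken")
    if not next_page_token:
        break
```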
Explanation:
- This code retrieves up to 100 YouTube videos related to "John Dramani Mahama 2024" for analysis. It initializes `query` for the search term, `max_results_per_page` (50, the API's per-request limit), and `total_videos` (100). The `while` loop sends paginated requests with `youtube.search().list()`. `part="snippet"` requests video metadata such as titles, descriptions, and publish dates. `type="video"` restricts results to videos. `pageToken` retrieves the next page of results when one is available. The video IDs from each page are stored in `video_ids` until 100 videos are retrieved or no more pages are available. This prepares the dataset for further analysis, such as sentiment evaluation.
Here’s the code to search for videos related to “Mahamudu Bawumia 2024”:
```python
# Searching for videos related to "Mahamudu Bawumia 2024" (single request: first page only)
bawumia_response = youtube.search().list(
    part="snippet",
    q="Mahamudu Bawumia 2024",  # Search query for the NPP presidential candidate
    type="video",
    maxResults=50  # Maximum results per page
).execute()

# Initialize variables for pagination and results
bawumia_video_ids = []  # List to store video IDs
next_page_token = None  # Token for pagination
total_videos = 100  # Target number of videos

# Loop to fetch up to 100 video IDs
while len(bawumia_video_ids) < total_videos:
    # Fetch results with pagination
    response = youtube.search().list(
        part="snippet",
        q="Mahamudu Bawumia 2024",
        type="video",
        maxResults=50,  # Maximum allowed per request
        pageToken=next_page_token  # Token for pagination
    ).execute()

    # Collect the video IDs from this page
    bawumia_video_ids.extend(item["id"]["videoId"] for item in response.get("items", []))

    # Stop when there are no more pages of results
    next_page_token = response.get("nextPageToken")
    if not next_page_token:
        break
```
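The two search loops differ only in the query string. Below is a minimal sketch of folding them into one reusable function. It assumes the same `youtube` client built during the API key setup, and `search_video_ids` is a hypothetical name, not part of the original notebook.

```python
from typing import List


def search_video_ids(youtube, query: str, total: int = 100, page_size: int = 50) -> List[str]:
    """Collect up to `total` video IDs for `query`, following nextPageToken between pages."""
    video_ids: List[str] = []
    page_token = None
    while len(video_ids) < total:
        response = youtube.search().list(
            part="snippet",
            q=query,
            type="video",
            # Never request more than the remaining number of IDs
            maxResults=min(page_size, total - len(video_ids)),
            pageToken=page_token,
        ).execute()
        video_ids.extend(item["id"]["videoId"] for item in response.get("items", []))
        page_token = response.get("nextPageToken")
        if not page_token:  # no more pages of results
            break
    return video_ids[:total]

# Usage, with `youtube` built as in the API key setup:
# mahama_ids = search_video_ids(youtube, "John Dramani Mahama 2024")
# bawumia_ids = search_video_ids(youtube, "Mahamudu Bawumia 2024")
```

Keeping the pagination logic in one place means both candidates are collected the same way. It also avoids spending API quota on an extra search request outside the loop.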
Extracting Comments and Performing Sentiment Analysis
After obtaining video IDs, the next step is to fetch comments and perform sentiment analysis using the TextBlob library:
```python
from textblob import TextBlob

# Function to classify sentiment
def get_sentiment(text):
    analysis = TextBlob(text)
    if analysis.sentiment.polarity > 0:
        return "positive"
    elif analysis.sentiment.polarity < 0:
        return "negative"
    else:
        return "neutral"
```
Explanation:
This code defines a function, `get_sentiment`, that uses the TextBlob library to analyze the sentiment of a given text. Here's how it works:
TextBlob Initialization: The input `text` is analyzed using `TextBlob`, which calculates its sentiment polarity, a score between -1.0 (most negative) and 1.0 (most positive).
Polarity Classification:
If the polarity score is greater than 0, the text is classified as "positive".
If the polarity score is less than 0, the text is classified as "negative".
If the polarity score is exactly 0, the text is classified as "neutral".
Purpose: This function is used to categorize text (e.g., comments) into sentiment types for further analysis, such as evaluating public opinion.
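The post calls `analyze_comments` later but does not show it. The sketch below is a hypothetical reconstruction, not the author's code. It pages through `youtube.commentThreads().list()` for one video and classifies each top-level comment with a function passed in, such as `get_sentiment`. It stops quietly when the API raises an `HttpError`, which is what happens for videos with comments disabled. Unlike the notebook's one-argument call, it takes the client and classifier as parameters, and `max_comments=500` is an arbitrary cap.

```python
from typing import Callable, List

try:
    from googleapiclient.errors import HttpError
except ImportError:  # lets this sketch run without google-api-python-client installed
    HttpError = Exception


def analyze_comments(youtube, video_id: str, classify: Callable[[str], str],
                     max_comments: int = 500) -> List[str]:
    """Fetch top-level comments for one video and return one sentiment label per comment.

    Hypothetical reconstruction: the post calls analyze_comments but does not show it.
    """
    labels: List[str] = []
    page_token = None
    while len(labels) < max_comments:
        try:
            response = youtube.commentThreads().list(
                part="snippet",
                videoId=video_id,
                maxResults=100,  # the API's per-page maximum for comment threads
                pageToken=page_token,
                textFormat="plainText",
            ).execute()
        except HttpError:
            # e.g. a 403 "commentsDisabled" error when the uploader turned comments off
            break
        for thread in response.get("items", []):
            text = thread["snippet"]["topLevelComment"]["snippet"]["textDisplay"]
            labels.append(classify(text))
        page_token = response.get("nextPageToken")
        if not page_token:
            break
    return labels[:max_comments]

# Usage: labels = analyze_comments(youtube, video_id, get_sentiment)
```

This version also swallows other API errors, such as an exhausted quota. A more careful version would inspect the error before giving up on a video.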
Aggregating Sentiment Results
After sentiment analysis, comments are grouped to determine the candidate with a higher positive score:
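```python
# analyze_comments, update_scores, mahama_scores and bawumia_scores come from
# parts of the notebook not shown in this post.

# Aggregate scores for John Dramani Mahama videos
# (iterate over every collected ID; search_response only holds the last page of results)
for video_id in video_ids:
    comments = analyze_comments(video_id)
    update_scores(mahama_scores, comments)

# Aggregate scores for Mahamudu Bawumia videos
# (bawumia_response only holds the first page, so use the full ID list instead)
for video_id in bawumia_video_ids:
    comments = analyze_comments(video_id)
    update_scores(bawumia_scores, comments)

# Calculate total scores by summing the values in the dictionaries
mahama_total = sum(mahama_scores.values())
bawumia_total = sum(bawumia_scores.values())
```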
Explanation:
This code loops over each candidate's videos. For every video, `analyze_comments` (a helper from the notebook that is not shown in this post) appears to fetch and classify the video's comments. `update_scores` then adds the results to that candidate's running tally in `mahama_scores` or `bawumia_scores`. Finally, `sum(...values())` adds up every entry in each dictionary to produce a single total per candidate, which the next step compares. If the dictionaries store a count for each of the three labels, this total measures how many comments were classified rather than how positive they were. Comparing the "positive" entries directly would match the positive-score criterion stated above.
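`update_scores` is another helper that the post uses without showing. Assume the score dictionaries hold one counter per sentiment label. A hypothetical version of the helper, plus a tiny made-up example, shows what the final `sum(...values())` step actually measures:

```python
from typing import Dict, Iterable


def update_scores(scores: Dict[str, int], labels: Iterable[str]) -> None:
    """Add one to scores[label] for every sentiment label (hypothetical helper)."""
    for label in labels:
        scores[label] = scores.get(label, 0) + 1


# Made-up labels for illustration
example_scores = {"positive": 0, "neutral": 0, "negative": 0}
update_scores(example_scores, ["positive", "neutral", "positive", "negative"])
# example_scores == {"positive": 2, "neutral": 1, "negative": 1}
# sum(example_scores.values()) == 4 -> every classified comment, not just positive ones
```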
Predicting the Winner Based on Sentiment
Finally, the sentiment analysis results are used to predict the winner based on the higher aggregate positive score:
```python
# Determine winner (note: a tie falls through to the else branch)
if mahama_total > bawumia_total:
    print("Based on sentiment analysis, John Dramani Mahama is projected to win.")
else:
    print("Based on sentiment analysis, Mahamudu Bawumia is projected to win.")
```
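With only two candidates, the `if`/`else` above hands any tie to Bawumia. The hypothetical helper `predict_winner` below compares any number of candidates on a chosen label and reports a tie explicitly. The nested-dictionary input shape is an assumption.

```python
from typing import Dict, Optional


def predict_winner(scores_by_candidate: Dict[str, Dict[str, int]],
                   key: str = "positive") -> Optional[str]:
    """Return the candidate with the highest count for `key`, or None if the top count is tied."""
    ranked = sorted(scores_by_candidate.items(),
                    key=lambda kv: kv[1].get(key, 0), reverse=True)
    if not ranked:
        return None
    if len(ranked) > 1 and ranked[0][1].get(key, 0) == ranked[1][1].get(key, 0):
        return None  # a tie is reported instead of defaulting to one candidate
    return ranked[0][0]
```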
Key Findings
Comments for Mahama videos displayed a significant proportion of positive sentiment, reflecting themes of support and hope for his leadership.
Sentiment analysis for Bawumia videos indicated slightly lower positive scores, with neutral sentiments dominating.
Aggregate Sentiment Scores:
Mahama: Higher positive sentiment score, reflecting strong favorability.
Bawumia: A mix of neutral and positive sentiments but with less intensity.
Challenges and Limitations
While this approach provides valuable insights into public perception, it has its constraints:
Comment Bias: Users engaging in online comments may not represent the broader voting population.
Data Availability: Disabled comments on some videos limited the dataset.
Sentiment Complexity: Nuanced or sarcastic comments can be misclassified, potentially affecting results.
Final Takeaways
This project highlights the potential of digital platforms like YouTube to gauge public sentiment during elections. While not a definitive predictor, sentiment analysis offers a creative lens to interpret public opinion. By refining methodologies and expanding datasets, such approaches could play a significant role in understanding voter behavior in the digital age. Could YouTube become the next election prediction tool for global politics? Only time will tell. There’s plenty of room for improvement, and I encourage others to explore this code, refine it, and experiment with larger and more diverse datasets.