Did Your Comments Say It All? Predicting Ghana’s Election Through YouTube Sentiment

My newfound interest is playing with YouTube comments, or more generally, finding patterns in whatever data is available. In this post, I extend the YouTube sentiment analysis approach I used for the U.S. election to Ghana's 2024 presidential election. As it stands, the prediction record is 2 for 2! Check out my earlier post, which focused on the U.S. election. Using a dataset of 100 videos per candidate, John Dramani Mahama (NDC) and Mahamudu Bawumia (NPP), I analyzed public sentiment in YouTube comments to predict the election outcome. In this blog, I again ask whether sentiment from YouTube comments alone can genuinely forecast something as significant as a presidential election outcome.

Disclaimer: This exercise was born out of curiosity. While my prediction turned out to be accurate, this approach has limitations. There may be more efficient methods to analyze public sentiment on election outcomes, and improvements to the code are welcome.

Collecting Data

I collected video data using the YouTube Data API, searching for keywords such as "John Dramani Mahama 2024" and "Mahamudu Bawumia 2024." I then processed the comments and used the TextBlob library to classify each one as positive, neutral, or negative. Key steps included:

Gathering video statistics and metadata.
Extracting comments while filtering out those with limited public engagement.
Applying sentiment analysis to aggregate public sentiment scores for each candidate.
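The comment-extraction step can be sketched as follows. This is a minimal sketch, not the notebook's actual code: it assumes comments come from the YouTube Data API's commentThreads.list endpoint (with part="snippet"), and it reads "limited public engagement" as a like count below a threshold. The function name extract_comments and the min_likes cutoff are hypothetical.

```python
from typing import Any


def extract_comments(response: dict[str, Any], min_likes: int = 1) -> list[str]:
    """Return top-level comment texts from one commentThreads.list response page.

    Hypothetical filter: comments with fewer than `min_likes` likes are treated
    as having limited public engagement and are dropped.
    """
    texts = []
    for item in response.get("items", []):
        # Each comment thread wraps its top-level comment in a nested snippet.
        comment = item["snippet"]["topLevelComment"]["snippet"]
        if comment.get("likeCount", 0) >= min_likes:
            texts.append(comment["textDisplay"])
    return texts
```

In the notebook, `response` would come from a call such as `youtube.commentThreads().list(part="snippet", videoId=video_id, maxResults=100, textFormat="plainText").execute()`, repeated with `pageToken` if you want more than one page of comments per video.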

Analyzing Sentiment

With the TextBlob library, I developed a function to classify each comment as positive, neutral, or negative based on polarity scores. The process was straightforward but had some limitations due to the simplicity of the sentiment scoring. Here’s how each score was interpreted:

Positive Score: Comments expressing support or enthusiasm for the candidate.
Negative Score: Comments with critical or unfavorable views.
Neutral Score: Comments that appeared impartial or unrelated to either candidate.

A key limitation was that several videos had comments disabled, reducing the available data and potentially affecting the representativeness of the sample. Additionally, sentiment analysis can oversimplify complex political expressions, potentially misinterpreting sarcasm or nuanced opinions.
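One way to keep an eye on this limitation is to record how many of the collected videos actually contributed comments. The sketch below is illustrative only and assumes a hypothetical dictionary mapping each video ID to its list of comment texts, with an empty list for videos whose comments were disabled.

```python
def comment_coverage(comments_by_video: dict[str, list[str]]) -> float:
    """Return the fraction of videos that contributed at least one comment.

    Videos with comments disabled are assumed to map to an empty list.
    """
    if not comments_by_video:
        return 0.0
    with_comments = sum(1 for texts in comments_by_video.values() if texts)
    return with_comments / len(comments_by_video)
```

A coverage well below 1.0 for one candidate would mean that candidate's score rests on fewer videos than the nominal 100.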

Aggregating and Predicting

After classifying each comment, I aggregated the positive, neutral, and negative scores for each candidate. Despite the constraints of a small dataset, John Dramani Mahama emerged with a slightly higher positive sentiment score, so the model predicted his victory. Below, I have added the edit timeline, which shows that the prediction was made by December 6th, before the election. The December 11th edit only removed my API key so that I could share the notebook publicly.
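Because the two candidates' videos can attract very different numbers of comments, a raw positive count partly measures comment volume. A minimal sketch of an alternative comparison that normalizes by each candidate's total is below. It is not the method used for the prediction above; the function names and the "positive" label key are assumptions.

```python
from typing import Optional


def positive_share(scores: dict[str, int]) -> float:
    """Fraction of a candidate's classified comments that are positive."""
    total = sum(scores.values())
    return scores.get("positive", 0) / total if total else 0.0


def predict_by_share(scores_by_candidate: dict[str, dict[str, int]]) -> Optional[str]:
    """Return the candidate with the highest positive share, or None on a tie."""
    if not scores_by_candidate:
        return None
    shares = {name: positive_share(s) for name, s in scores_by_candidate.items()}
    best = max(shares.values())
    leaders = [name for name, share in shares.items() if share == best]
    return leaders[0] if len(leaders) == 1 else None
```

With made-up tallies such as 60 positive out of 100 comments versus 100 positive out of 300, the first candidate leads on share (0.6 vs about 0.33) even though the second has more positive comments in absolute terms.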

Election Outcome: Prediction Success!

The official result declaring John Dramani Mahama the winner confirmed the prediction, which was exciting to see. However, this success doesn't necessarily make the model a reliable predictor. Even though I increased the sample to 100 videos per candidate (compared with 10 for the U.S. election), the data is still limited, so this approach might not generalize well. The simplicity of basic sentiment analysis is another limitation. More sophisticated techniques, such as deep learning models, or a broader dataset spanning multiple social media platforms might yield more consistent results.

Want to dive in?

Overview: This project was developed in Google Colab and uses the YouTube Data API, accessed with an API key obtained from the Google Developers Console. Remember to generate one of your own and keep it private.

1. Preparing the Code in Google Colab

I used Google's YouTube Data API to fetch videos and comments for each candidate, and analyzed the comments' sentiment in the same notebook.

Key Code Snippets and Explanations

API Key Setup

In the Google Colab notebook, the API key was set up as follows:
```
from googleapiclient.discovery import build

api_key = "YOUR_API_KEY_HERE"  # Replace with your own key and never publish it
youtube = build("youtube", "v3", developerKey=api_key)
```
Explanation:
- This code imports the build function from googleapiclient.discovery and uses it to create a client object (youtube) for version 3 of the YouTube Data API. The api_key variable should contain your own API key, which authorizes the requests.
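Since the key must stay private, it can help to keep it out of the notebook's code entirely. A minimal sketch, assuming the key is stored in an environment variable; the variable name YOUTUBE_API_KEY and the helper load_api_key are hypothetical choices, not part of the original notebook.

```python
import os
from collections.abc import Mapping


def load_api_key(var_name: str = "YOUTUBE_API_KEY",
                 environ: Mapping[str, str] = os.environ) -> str:
    """Read the API key from the environment instead of hard-coding it.

    `var_name` is a hypothetical variable name; use whatever name you set.
    """
    key = environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable to your YouTube API key.")
    return key
```

The client would then be built with `build("youtube", "v3", developerKey=load_api_key())`, so the notebook can be shared without editing the key out first.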

Searching for Videos by Candidate Name

Here’s a snippet of the search functionality:

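```
# Initialize variables
query = "John Dramani Mahama 2024"  # Search query for the NDC presidential candidate
max_results_per_page = 50  # Maximum allowed results per API request
total_videos = 100  # Desired number of videos
retrieved_videos = 0  # Counter for retrieved videos
video_ids = []  # List to store video IDs
next_page_token = None  # Token for pagination

# Iterate to fetch up to 100 videos
while retrieved_videos < total_videos:
    # Execute a search request for the next page of results
    search_response = youtube.search().list(
        part="snippet",
        q=query,
        type="video",
        maxResults=max_results_per_page,
        pageToken=next_page_token
    ).execute()

    # Store this page's video IDs without exceeding the target count
    for item in search_response["items"][: total_videos - retrieved_videos]:
        video_ids.append(item["id"]["videoId"])
    retrieved_videos = len(video_ids)

    # Stop when there are no more pages
    next_page_token = search_response.get("nextPageToken")
    if not next_page_token:
        break
```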

Explanation:

This code retrieves up to 100 YouTube videos related to "John Dramani Mahama 2024" for analysis. It initializes variables like query for the search term, max_results_per_page (50), and total_videos (100). The while loop sends paginated requests to the YouTube API using youtube.search().list(). Each request fetches video metadata (titles, descriptions, etc.), storing video IDs in video_ids until 100 videos are retrieved or no more pages are available. This prepares the dataset for further analysis, such as sentiment evaluation.

Here's the code to search for videos related to "Mahamudu Bawumia 2024":

```
# Searching for videos related to "Mahamudu Bawumia 2024" (first page)
bawumia_response = youtube.search().list(
    part="snippet",
    q="Mahamudu Bawumia 2024",  # Search query for the NPP presidential candidate
    type="video",
    maxResults=50  # Maximum results per page
).execute()

# Initialize variables for pagination and results, seeded with the first page
total_videos = 100  # Target number of videos
bawumia_video_ids = [item["id"]["videoId"] for item in bawumia_response["items"]]
next_page_token = bawumia_response.get("nextPageToken")  # Token for the next page

# Loop to fetch up to 100 video IDs
while len(bawumia_video_ids) < total_videos and next_page_token:
    # Fetch the next page of results
    response = youtube.search().list(
        part="snippet",
        q="Mahamudu Bawumia 2024",
        type="video",
        maxResults=50,  # Maximum allowed per request
        pageToken=next_page_token  # Token for pagination
    ).execute()
    bawumia_video_ids.extend(item["id"]["videoId"] for item in response["items"])
    next_page_token = response.get("nextPageToken")

bawumia_video_ids = bawumia_video_ids[:total_videos]  # Trim any overshoot
```
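The two search loops differ only in the query string, so they could be folded into one helper. The sketch below is a hypothetical refactor, not the notebook's code: collect_video_ids takes a fetch_page callable (for example, a lambda wrapping youtube.search().list(...).execute()), which also makes the pagination logic testable without network access.

```python
from typing import Any, Callable, Optional

# A page fetcher takes a page token (None for the first page) and returns one
# search.list response dictionary.
PageFetcher = Callable[[Optional[str]], dict[str, Any]]


def collect_video_ids(fetch_page: PageFetcher, total: int = 100) -> list[str]:
    """Follow nextPageToken until `total` unique video IDs are collected or pages run out."""
    video_ids: list[str] = []
    seen: set[str] = set()
    token: Optional[str] = None
    while len(video_ids) < total:
        page = fetch_page(token)
        for item in page.get("items", []):
            vid = item["id"]["videoId"]
            if vid not in seen:  # skip duplicates in case a video appears on more than one page
                seen.add(vid)
                video_ids.append(vid)
                if len(video_ids) == total:
                    break
        token = page.get("nextPageToken")
        if not token:
            break
    return video_ids


# Example usage in the notebook (requires the `youtube` client built earlier):
# mahama_ids = collect_video_ids(
#     lambda token: youtube.search().list(
#         part="snippet", q="John Dramani Mahama 2024", type="video",
#         maxResults=50, pageToken=token,
#     ).execute()
# )
```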

Extracting Comments and Performing Sentiment Analysis

After obtaining the video IDs, the next step is to fetch their comments and analyze their sentiment using the TextBlob library. The classification function looks like this:
```
from textblob import TextBlob

# Function to classify sentiment from TextBlob's polarity score (-1.0 to 1.0)
def get_sentiment(text):
    polarity = TextBlob(text).sentiment.polarity
    if polarity > 0:
        return "positive"
    elif polarity < 0:
        return "negative"
    else:
        return "neutral"
```
Explanation:
- This code defines a function, get_sentiment, that uses the TextBlob library to analyze the sentiment of a given text. Here's how it works:
  1. TextBlob Initialization: The input text is analyzed using TextBlob, which calculates its sentiment polarity.
  2. Polarity Classification:
    - If the polarity score is greater than 0, the text is classified as "positive".
    - If the polarity score is less than 0, the text is classified as "negative".
    - If the polarity score is 0, the text is classified as "neutral".
  3. Purpose: This function is used to categorize text (e.g., comments) into sentiment types for further analysis, such as evaluating public opinion.
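The aggregation step relies on two helpers, analyze_comments and update_scores, whose definitions aren't shown in this post. As an illustration only, here is one hypothetical way update_scores could work if analyze_comments returns a list of labels produced by get_sentiment; the notebook's actual helpers may differ (for example, they might sum polarity values instead of counting labels).

```python
from collections import Counter
from typing import Iterable


def update_scores(scores: dict[str, int], labels: Iterable[str]) -> dict[str, int]:
    """Add one video's sentiment labels to a candidate's running tallies.

    Hypothetical helper: assumes `labels` holds strings such as "positive",
    "neutral", or "negative" (the values get_sentiment returns).
    """
    for label, count in Counter(labels).items():
        scores[label] = scores.get(label, 0) + count
    return scores  # also mutated in place, matching how the snippet calls it
```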

Aggregating Sentiment Results

After sentiment analysis, the results are aggregated per candidate to determine which one has the higher positive score:
```
# Aggregate scores for John Dramani Mahama videos (every collected ID, not just one page)
for video_id in video_ids:
    comments = analyze_comments(video_id)
    update_scores(mahama_scores, comments)

# Aggregate scores for Mahamudu Bawumia videos
for video_id in bawumia_video_ids:
    comments = analyze_comments(video_id)
    update_scores(bawumia_scores, comments)

# Calculate total scores by summing the values in the dictionaries
mahama_total = sum(mahama_scores.values())
bawumia_total = sum(bawumia_scores.values())
```
Explanation:
- This code tallies sentiment across each candidate's videos and reduces the results to one number per candidate. Here's a breakdown:
  1. Per-video analysis: For each of the candidate's videos, analyze_comments fetches the video's comments and classifies their sentiment. This helper, update_scores, and the mahama_scores and bawumia_scores dictionaries are defined elsewhere in the notebook and aren't shown here.
  2. Accumulation: update_scores adds each video's results to the candidate's running dictionary, which holds the positive, neutral, and negative scores.
  3. Totals: sum(mahama_scores.values()) and sum(bawumia_scores.values()) collapse each dictionary into a single total, which the next step compares.
    - If the dictionaries hold comment counts, this sum is the total number of classified comments rather than the positive score alone; comparing mahama_scores["positive"] with bawumia_scores["positive"] would match the "higher positive score" criterion more directly.

Predicting the Winner Based on Sentiment

Finally, the sentiment analysis results are used to predict the winner based on the higher aggregate positive score:

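```
# Determine winner
if mahama_total > bawumia_total:
    print("Based on sentiment analysis, John Dramani Mahama is projected to win.")
elif bawumia_total > mahama_total:
    print("Based on sentiment analysis, Mahamudu Bawumia is projected to win.")
else:
    print("Sentiment scores are tied; no projection can be made.")
```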

Key Findings

Comments for Mahama videos displayed a significant proportion of positive sentiment, reflecting themes of support and hope for his leadership.
Sentiment analysis for Bawumia videos indicated slightly lower positive scores, with neutral sentiments dominating.
Aggregate Sentiment Scores:
- Mahama: Higher positive sentiment score, reflecting strong favorability.
- Bawumia: A mix of neutral and positive sentiments but with less intensity.
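Statements like "neutral sentiments dominating" are easiest to check as proportions rather than raw totals. A minimal, hypothetical helper that converts one candidate's tallies into shares, assuming the tallies are label counts, could look like this:

```python
def sentiment_shares(scores: dict[str, int]) -> dict[str, float]:
    """Convert label counts into proportions that sum to 1 (empty tallies give zeros)."""
    total = sum(scores.values())
    if total == 0:
        return {label: 0.0 for label in scores}
    return {label: count / total for label, count in scores.items()}
```

Reporting these shares next to the raw totals would make it clear whether a candidate's lead comes from more favorable comments or simply from more comments.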

Challenges and Limitations

While this approach provides valuable insights into public perception, it has its constraints:

Comment Bias: Users engaging in online comments may not represent the broader voting population.
Data Availability: Disabled comments on some videos limited the dataset.
Sentiment Complexity: Nuanced or sarcastic comments can be misclassified, potentially affecting results.

Final Takeaways

This project highlights the potential of digital platforms like YouTube to gauge public sentiment during elections. While not a definitive predictor, sentiment analysis offers a creative lens to interpret public opinion. By refining methodologies and expanding datasets, such approaches could play a significant role in understanding voter behavior in the digital age. Could YouTube become the next election prediction tool for global politics? Only time will tell. There’s plenty of room for improvement, and I encourage others to explore this code, refine it, and experiment with larger and more diverse datasets.