Is It Just Online Chatter, or Did We Really Predict the President with YouTube?

Casual online comments, often dismissed as mere “chatter,” may have real predictive power. This post asks whether sentiment from YouTube comments alone could genuinely forecast something as significant as a presidential election outcome.

Disclaimer: This post was part of an educational project. While my prediction turned out to be accurate, this approach has limitations. There may be more efficient methods to analyze public sentiment on election outcomes, and improvements to the code are welcome.

When our machine learning professor assigned us the task of predicting the 2024 US presidential election, I was both excited and skeptical. Using sentiment analysis on YouTube comments to gauge public opinion seemed ambitious, especially given the limited dataset. With only ten videos per candidate, the sample size was relatively small, making it challenging to capture the full spectrum of public sentiment. Still, the experience provided valuable insights into the potential of machine learning for social sentiment analysis.

Collecting Data

The task began with a simple approach: search YouTube for ten videos each for Kamala Harris and Donald Trump. My searches included keywords like “Kamala Harris 2024” and “Donald Trump 2024.” I then used the YouTube API to retrieve these videos, gathering information such as video titles, channels, and video IDs to extract relevant comments.
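
A minimal sketch of that collection step, run once per candidate. It assumes `youtube` is a client created with `googleapiclient.discovery.build`; the function name `collect_videos` and the `QUERIES` mapping are made up for illustration and are not taken from the project notebook.

```python
def collect_videos(youtube, queries, max_results=10):
    """Return {candidate: [{"video_id", "title", "channel"}, ...]}.

    `youtube` is assumed to be a YouTube Data API v3 client; `queries`
    maps a candidate label to the search string used for that candidate.
    """
    videos = {}
    for candidate, query in queries.items():
        response = youtube.search().list(
            part="snippet", q=query, type="video", maxResults=max_results
        ).execute()
        # Keep only the fields mentioned above: ID, title, and channel
        videos[candidate] = [
            {
                "video_id": item["id"]["videoId"],
                "title": item["snippet"]["title"],
                "channel": item["snippet"]["channelTitle"],
            }
            for item in response.get("items", [])
        ]
    return videos


# Hypothetical labels; the search strings are the ones used in this post
QUERIES = {"harris": "Kamala Harris 2024", "trump": "Donald Trump 2024"}
```

Calling `collect_videos(youtube, QUERIES)` would then give one list of video records per candidate, ready for the comment-fetching step.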

Analyzing Sentiment

With the TextBlob library, I developed a function to classify each comment as positive, neutral, or negative based on polarity scores. The process was straightforward but had some limitations due to the simplicity of the sentiment scoring. Here’s how each score was interpreted:

  1. Positive Score: Comments expressing support or enthusiasm for the candidate.

  2. Negative Score: Comments with critical or unfavorable views.

  3. Neutral Score: Comments that appeared impartial or unrelated to either candidate.
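
The three categories above come down to thresholding a single number: TextBlob reports polarity as a float between -1.0 and 1.0. The sketch below shows that mapping in isolation. The `neutral_band` parameter is a hypothetical extension, not something the project used; with its default of 0.0 the function behaves like a strict positive / negative / zero split.

```python
def label_from_polarity(polarity, neutral_band=0.0):
    """Map a polarity score in [-1.0, 1.0] to a sentiment label.

    neutral_band is an illustrative tuning knob: scores within
    +/- neutral_band of zero are treated as neutral.
    """
    if polarity > neutral_band:
        return "positive"
    if polarity < -neutral_band:
        return "negative"
    return "neutral"


print(label_from_polarity(0.6))                        # positive
print(label_from_polarity(-0.3))                       # negative
print(label_from_polarity(0.04, neutral_band=0.05))    # neutral
```

A small neutral band might keep barely-positive comments from being counted as support, though the right width would have to be found by experiment.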

A key limitation was that several videos had comments disabled, reducing the available data and potentially affecting the representativeness of the sample. Additionally, sentiment analysis can oversimplify complex political expressions, potentially misinterpreting sarcasm or nuanced opinions.

Aggregating and Predicting

After classifying each comment, I aggregated the positive, neutral, and negative scores for each candidate. Despite the constraints of a small dataset, Donald Trump emerged with a slightly higher positive sentiment score, leading to a prediction of his victory.
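
Because some videos had comments disabled, the two candidates may not end up with the same number of comments, and raw counts can then favor whoever simply has more data. One possible refinement, sketched here with made-up labels rather than the project's data, is to compare the share of positive comments instead. The function name `positive_share` is an assumption for this example.

```python
from collections import Counter


def positive_share(labels):
    """Fraction of comments labeled "positive"; 0.0 if there are none."""
    counts = Counter(labels)
    total = sum(counts.values())
    return counts["positive"] / total if total else 0.0


# Made-up labels, purely for illustration
candidate_a = ["positive", "neutral", "negative", "positive"]
candidate_b = ["positive", "positive", "positive", "neutral",
               "neutral", "neutral", "negative", "negative"]

print(positive_share(candidate_a))  # 0.5
print(positive_share(candidate_b))  # 0.375
```

In this toy example candidate B has more positive comments in absolute terms (3 versus 2) but a lower positive share, so the two ways of aggregating would point to different winners.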

Election Outcome: Prediction Success!

When the election results were announced and Donald Trump was declared the next President of the United States, the prediction was confirmed, which was exciting to see. However, one correct call doesn’t validate the model as a reliable predictor. Given the limited data and the constraints of basic sentiment analysis, this approach might not generalize well. More sophisticated techniques, such as deep learning models, or a broader dataset spanning multiple social media platforms, might yield more consistent results.

Want to dive in?

Overview: This project was developed in Google Colab and uses the YouTube Data API, accessed with an API key obtained from the Google Developers Console. Remember to generate one of your own and keep it private.

1. Preparing the Code in Google Colab

Since the project was done in Google Colab, I used Google’s YouTube Data API to fetch comments and analyze sentiment for each candidate.

Key Code Snippets and Explanations

  1. API Key Setup

    In the Google Colab notebook, the API key was set up as follows:

     from googleapiclient.discovery import build
     api_key = "YOUR_API_KEY_HERE"
     youtube = build("youtube", "v3", developerKey=api_key)
    

    Explanation:

    • This code imports the build function from googleapiclient, which allows access to YouTube’s Data API. The api_key variable should contain your API key, which grants access to the API.
  2. Searching for Videos by Candidate Name

    Here’s a snippet of the search functionality:

     search_response = youtube.search().list(part="snippet", q="Kamala Harris 2024", type="video", maxResults=10).execute()
     video_ids = [item['id']['videoId'] for item in search_response.get('items', [])]
    

    Explanation:

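    • The youtube.search().list() call builds a search request for the given query and retrieves up to 10 video results once .execute() is called. The snippet shows the "Kamala Harris 2024" query; the same call is repeated with "Donald Trump 2024" for the other candidate.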

    • The video_ids list captures the unique ID for each video, allowing further data extraction like comments and statistics.

  3. Extracting Comments and Performing Sentiment Analysis

    After obtaining video IDs, the next step is to fetch comments and perform sentiment analysis using the TextBlob library:

     from textblob import TextBlob
    
     def get_sentiment(text):
         analysis = TextBlob(text)
         if analysis.sentiment.polarity > 0:
             return "positive"
         elif analysis.sentiment.polarity < 0:
             return "negative"
         else:
             return "neutral"
    
     def analyze_comments(video_id):
         comments_data = []
         comments_response = youtube.commentThreads().list(part="snippet", videoId=video_id, maxResults=20).execute()
         for item in comments_response["items"]:
             comment_text = item["snippet"]["topLevelComment"]["snippet"]["textDisplay"]
             sentiment = get_sentiment(comment_text)
             comments_data.append({"main_text": comment_text, "sentiment": sentiment})
         return comments_data
    

    Explanation:

    • The get_sentiment function uses TextBlob to analyze each comment's polarity. If the polarity is positive, the comment is classified as "positive"; if negative, it’s "negative"; otherwise, it’s "neutral."

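    • The analyze_comments function retrieves up to 20 top-level comments for a video and classifies each one using get_sentiment. Results are appended to a list, allowing further analysis. Note that for videos with comments disabled the API request fails (googleapiclient raises an HttpError), so those videos have to be skipped.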

  4. Aggregating Sentiment Results

    After sentiment analysis, comments are grouped to determine the candidate with a higher positive score:

     kamala_scores = {"positive": 0, "neutral": 0, "negative": 0}
     trump_scores = {"positive": 0, "neutral": 0, "negative": 0}
    
     def update_scores(scores, comments):
         for comment in comments:
             scores[comment["sentiment"]] += 1
    

    Explanation:

    • This code initializes sentiment score dictionaries for each candidate. The update_scores function takes the scores dictionary and a list of comments and increments each sentiment type (positive, neutral, negative) accordingly.
  5. Predicting the Winner Based on Sentiment

    Finally, the sentiment analysis results are used to predict the winner based on the higher aggregate positive score:

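     # Compare positive-comment counts (not the total number of comments)
     kamala_positive = kamala_scores["positive"]
     trump_positive = trump_scores["positive"]
    
     if kamala_positive > trump_positive:
         print("Based on sentiment analysis, Kamala Harris is projected to win.")
     elif trump_positive > kamala_positive:
         print("Based on sentiment analysis, Donald Trump is projected to win.")
     else:
         print("Positive sentiment is tied; no winner projected.")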
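
Step 3 above reads a single page of at most 20 comments per video, which contributes to the small sample size. The YouTube Data API returns a nextPageToken when more results are available, so one way to gather more data is to keep requesting pages. The following is a rough sketch under a few assumptions: `youtube` is the client from step 1, `fetch_all_comments` and its `max_comments` cap are names invented for this example, and videos with comments disabled are not handled here.

```python
def fetch_all_comments(youtube, video_id, max_comments=200):
    """Collect up to max_comments top-level comment texts for one video.

    max_comments is an arbitrary cap chosen for this sketch to keep
    API quota use in check.
    """
    texts = []
    page_token = None
    while len(texts) < max_comments:
        params = {"part": "snippet", "videoId": video_id, "maxResults": 100}
        if page_token:
            params["pageToken"] = page_token
        response = youtube.commentThreads().list(**params).execute()
        for item in response.get("items", []):
            texts.append(
                item["snippet"]["topLevelComment"]["snippet"]["textDisplay"]
            )
        page_token = response.get("nextPageToken")
        if not page_token:
            break  # no further pages for this video
    return texts[:max_comments]
```

Each returned text could then be passed to get_sentiment exactly as in step 3. More comments per video will not fix the other limitations, such as sarcasm, but it should make the counts less sensitive to a handful of comments.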
    

Final Takeaways

This project highlighted both the promise and limitations of sentiment analysis. It provided an opportunity to understand how machine learning can tap into social sentiment, but also revealed the challenges in creating reliable predictions with small datasets and simple methods. There’s plenty of room for improvement, and I encourage others to explore this code, refine it, and experiment with larger and more diverse datasets.

Explore and expand the code on GitHub