Is It Just Online Chatter, or Did We Really Predict the President with YouTube?
Casual online comments, often seen as mere “chatter,” may have real predictive power. This blog poses the question of whether sentiment from YouTube comments alone could genuinely forecast something as significant as a presidential election outcome.
Disclaimer: This post was part of an educational project. While my prediction turned out to be accurate, this approach has limitations. There may be more efficient methods to analyze public sentiment on election outcomes, and improvements to the code are welcome.
When our machine learning professor assigned us the task of predicting the 2024 US presidential election, I was both excited and skeptical. Using sentiment analysis on YouTube comments to gauge public opinion seemed ambitious, especially given the limited dataset. With only ten videos per candidate, the sample size was relatively small, making it challenging to capture the full spectrum of public sentiment. Still, the experience provided valuable insights into the potential of machine learning for social sentiment analysis.
Collecting Data
The task began with a simple approach: search YouTube for ten videos each for Kamala Harris and Donald Trump. My searches included keywords like “Kamala Harris 2024” and “Donald Trump 2024.” I then used the YouTube API to retrieve these videos, gathering information such as video titles, channels, and video IDs to extract relevant comments.
Analyzing Sentiment
With the TextBlob library, I developed a function to classify each comment as positive, neutral, or negative based on polarity scores. The process was straightforward but had some limitations due to the simplicity of the sentiment scoring. Here’s how each score was interpreted:
Positive Score: Comments expressing support or enthusiasm for the candidate.
Negative Score: Comments with critical or unfavorable views.
Neutral Score: Comments that appeared impartial or unrelated to either candidate.
A key limitation was that several videos had comments disabled, reducing the available data and potentially affecting the representativeness of the sample. Additionally, sentiment analysis can oversimplify complex political expressions, potentially misinterpreting sarcasm or nuanced opinions.
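One way to keep a run going when some videos have comments disabled is to skip the failing videos and record which ones were dropped. The sketch below is only an illustration: `collect_comments`, `fetch_comments`, and `skip_on` are hypothetical names, and the fetch function is passed in so the example runs without an API key. With the real client the failure would most likely surface as `googleapiclient.errors.HttpError`, so passing that class as `skip_on` would be narrower than catching every exception.

```python
def collect_comments(video_ids, fetch_comments, skip_on=(Exception,)):
    """Gather comments per video, skipping videos whose fetch fails.

    fetch_comments is a hypothetical callable that takes a video ID and
    returns a list of comments (for example, a wrapper around the API call).
    """
    collected = {}  # video_id -> list of comments
    skipped = []    # video IDs that could not be fetched
    for video_id in video_ids:
        try:
            collected[video_id] = fetch_comments(video_id)
        except skip_on:
            skipped.append(video_id)
    return collected, skipped


def fake_fetch(video_id):
    # Stand-in for the API call; one made-up video simulates disabled comments.
    if video_id == "vid_disabled":
        raise RuntimeError("comments disabled")
    return [f"comment on {video_id}"]


collected, skipped = collect_comments(["vid_a", "vid_disabled", "vid_b"], fake_fetch)
print(len(collected), "videos with comments;", len(skipped), "skipped")
```

Keeping the skipped IDs makes it easy to report how much of the sample was lost for each candidate.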
Aggregating and Predicting
After classifying each comment, I aggregated the positive, neutral, and negative scores for each candidate. Despite the constraints of a small dataset, Donald Trump emerged with a slightly higher positive sentiment score, leading to a prediction of his victory.
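Because some videos had comments disabled, the two candidates may not end up with the same number of comments, and raw positive counts then favor whoever had more comments collected. A minimal sketch of one possible adjustment, comparing the share of positive comments instead, is shown here. The function name `positive_share` and the counts are made up for illustration and are not the project's results.

```python
def positive_share(scores):
    """Return the fraction of comments labeled positive (0.0 if there are none)."""
    total = sum(scores.values())
    return scores["positive"] / total if total else 0.0


# Hypothetical counts: same number of positives, different sample sizes.
example_a = {"positive": 30, "neutral": 50, "negative": 20}
example_b = {"positive": 30, "neutral": 10, "negative": 10}

print(round(positive_share(example_a), 2))
print(round(positive_share(example_b), 2))
```

Both examples have 30 positive comments, yet the shares differ because the totals differ, which is the effect a raw count would hide.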
Election Outcome: Prediction Success!
The official result, with Donald Trump elected as the next President of the United States, confirmed the prediction, which was exciting to see. However, this success doesn’t necessarily validate the model as the most reliable predictor. Given the limited data and the constraints of basic sentiment analysis, this approach might not generalize well. More sophisticated techniques, such as deep learning models, or a broader dataset spanning multiple social media platforms, might yield more consistent results.
Want to dive in?
Overview: This project was developed in Google Colab and uses the YouTube Data API, accessed with an API key obtained from the Google Developers Console. Remember to generate one of your own and keep it private.
1. Preparing the Code in Google Colab
Since the project was done in Google Colab, I used Google’s YouTube Data API to fetch comments and analyze sentiment for each candidate.
Key Code Snippets and Explanations
API Key Setup
In the Google Colab notebook, the API key was set up as follows:
```python
from googleapiclient.discovery import build

api_key = "YOUR_API_KEY_HERE"
youtube = build("youtube", "v3", developerKey=api_key)
```
Explanation:
- This code imports the `build` function from `googleapiclient`, which allows access to YouTube’s Data API. The `api_key` variable should contain your API key, which grants access to the API.
Searching for Videos by Candidate Name
Here’s a snippet of the search functionality:
```python
search_response = youtube.search().list(
    part="snippet", q="Kamala Harris 2024", type="video", maxResults=10
).execute()
video_ids = [item['id']['videoId'] for item in search_response.get('items', [])]
```
Explanation:
- The `youtube.search().list()` function creates a search query to find videos about "Kamala Harris 2024" or "Donald Trump 2024" and retrieves up to 10 results.
- The `video_ids` list captures the unique ID for each video, allowing further data extraction like comments and statistics.
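The same search can be run for both candidates while also keeping the title and channel of each hit. The sketch below assumes a client built with `build("youtube", "v3", ...)` is passed in as `youtube`. The helper names `search_videos` and `search_candidates` and the dictionary labels are illustrative, not part of the original notebook.

```python
def search_videos(youtube, query, max_results=10):
    """Return video ID, title, and channel for each search hit."""
    response = youtube.search().list(
        part="snippet", q=query, type="video", maxResults=max_results
    ).execute()
    return [
        {
            "video_id": item["id"]["videoId"],
            "title": item["snippet"]["title"],
            "channel": item["snippet"]["channelTitle"],
        }
        for item in response.get("items", [])
    ]


def search_candidates(youtube, queries):
    """Map each candidate label to its list of video records."""
    return {label: search_videos(youtube, q) for label, q in queries.items()}


# Queries from the post; the keys are illustrative labels.
QUERIES = {"harris": "Kamala Harris 2024", "trump": "Donald Trump 2024"}
# videos = search_candidates(youtube, QUERIES)  # needs a real client and API key
```

Keeping the metadata alongside each ID makes it easier to check afterwards which channels the sample actually came from.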
Extracting Comments and Performing Sentiment Analysis
After obtaining video IDs, the next step is to fetch comments and perform sentiment analysis using the `TextBlob` library:

```python
from textblob import TextBlob

def get_sentiment(text):
    # Polarity ranges from -1.0 (most negative) to 1.0 (most positive)
    analysis = TextBlob(text)
    if analysis.sentiment.polarity > 0:
        return "positive"
    elif analysis.sentiment.polarity < 0:
        return "negative"
    else:
        return "neutral"

def analyze_comments(video_id):
    comments_data = []
    # Fetch up to 20 top-level comment threads for this video
    comments_response = youtube.commentThreads().list(
        part="snippet", videoId=video_id, maxResults=20
    ).execute()
    for item in comments_response["items"]:
        comment_text = item["snippet"]["topLevelComment"]["snippet"]["textDisplay"]
        sentiment = get_sentiment(comment_text)
        comments_data.append({"main_text": comment_text, "sentiment": sentiment})
    return comments_data
```
Explanation:
- The `get_sentiment` function uses `TextBlob` to analyze each comment's polarity. If the polarity is positive, the comment is classified as "positive"; if negative, it’s "negative"; otherwise, it’s "neutral."
- The `analyze_comments` function retrieves comments for each video and classifies their sentiment using `get_sentiment`. Results are appended to a list, allowing further analysis.
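The `analyze_comments` call requests a single page of at most 20 comment threads per video. If a larger sample is wanted, the API's `pageToken` parameter and the `nextPageToken` field in each response can be used to walk through further pages. The sketch below is one possible way to do that. `fetch_all_comments`, `page_size`, and `max_comments` are hypothetical names, and the cap is an arbitrary choice to keep API quota use bounded.

```python
def fetch_all_comments(youtube, video_id, page_size=100, max_comments=500):
    """Collect top-level comment texts across pages, up to max_comments."""
    texts = []
    page_token = None
    while len(texts) < max_comments:
        params = {"part": "snippet", "videoId": video_id, "maxResults": page_size}
        if page_token:
            params["pageToken"] = page_token
        response = youtube.commentThreads().list(**params).execute()
        for item in response.get("items", []):
            snippet = item["snippet"]["topLevelComment"]["snippet"]
            texts.append(snippet["textDisplay"])
        page_token = response.get("nextPageToken")
        if not page_token:
            break  # no further pages for this video
    return texts[:max_comments]
```

Each extra page is another API request, so a modest cap is a reasonable default for a classroom project.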
Aggregating Sentiment Results
After sentiment analysis, comments are grouped to determine the candidate with a higher positive score:
```python
kamala_scores = {"positive": 0, "neutral": 0, "negative": 0}
trump_scores = {"positive": 0, "neutral": 0, "negative": 0}

def update_scores(scores, comments):
    for comment in comments:
        scores[comment["sentiment"]] += 1
```
Explanation:
- This code initializes sentiment score dictionaries for each candidate. The `update_scores` function takes the scores dictionary and a list of comments and increments each sentiment type (positive, neutral, negative) accordingly.
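The tally has to run once per video for each candidate. As a rough sketch of that loop, the variant below uses `collections.Counter` from the standard library to count labels across several videos at once. `tally_sentiments` is a hypothetical helper, and the sample comments are made up but shaped like the output of `analyze_comments`.

```python
from collections import Counter


def tally_sentiments(videos_comments):
    """Count sentiment labels across several videos' comment lists."""
    # Start from explicit zeros so every label appears in the result.
    counts = Counter({"positive": 0, "neutral": 0, "negative": 0})
    for comments in videos_comments:
        counts.update(comment["sentiment"] for comment in comments)
    return dict(counts)


# Made-up comments for two videos, for illustration only.
sample = [
    [{"main_text": "Great speech", "sentiment": "positive"},
     {"main_text": "Hmm", "sentiment": "neutral"}],
    [{"main_text": "Terrible", "sentiment": "negative"},
     {"main_text": "Love it", "sentiment": "positive"}],
]
print(tally_sentiments(sample))
```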
Predicting the Winner Based on Sentiment
Finally, the sentiment analysis results are used to predict the winner based on the higher aggregate positive score:
```python
# Compare positive counts, matching the "higher aggregate positive score" rule
kamala_positive = kamala_scores["positive"]
trump_positive = trump_scores["positive"]

if kamala_positive > trump_positive:
    print("Based on sentiment analysis, Kamala Harris is projected to win.")
else:
    print("Based on sentiment analysis, Donald Trump is projected to win.")
```
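Note that a two-way `if`/`else` hands an exact tie to whichever candidate sits in the `else` branch. A small sketch of a tie-aware version is given here. `predict_winner` and the counts are hypothetical, and returning `None` on a tie is just one possible convention.

```python
def predict_winner(scores_by_candidate):
    """Return the candidate with the most positive comments, or None on a tie."""
    ranked = sorted(
        scores_by_candidate.items(),
        key=lambda pair: pair[1]["positive"],
        reverse=True,
    )
    if not ranked:
        return None  # nothing to compare
    if len(ranked) > 1 and ranked[0][1]["positive"] == ranked[1][1]["positive"]:
        return None  # top two are tied, so no call is made
    return ranked[0][0]


# Hypothetical counts for illustration, not the project's results.
clear_case = {
    "Candidate A": {"positive": 12, "neutral": 5, "negative": 3},
    "Candidate B": {"positive": 9, "neutral": 8, "negative": 3},
}
tied_case = {
    "Candidate A": {"positive": 10, "neutral": 5, "negative": 5},
    "Candidate B": {"positive": 10, "neutral": 2, "negative": 8},
}
print(predict_winner(clear_case))
print(predict_winner(tied_case))
```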
Final Takeaways
This project highlighted both the promise and limitations of sentiment analysis. It provided an opportunity to understand how machine learning can tap into social sentiment, but also revealed the challenges in creating reliable predictions with small datasets and simple methods. There’s plenty of room for improvement, and I encourage others to explore this code, refine it, and experiment with larger and more diverse datasets.