Election Buzz analyzes candidates' Twitter profiles to reveal what both the candidates and their followers are saying about key political issues.
What it does
Tweets about the presidential elections are overwhelming and often contain little information. Election Buzz aims to help users uncover interesting insights. Which celebrities, for example, endorse unexpected candidates? What do a candidate's followers talk about? Does Donald Trump really talk more about immigration than about other topics?
Election Buzz brings users this insight by digging through the tweets of candidates and their followers to see which candidates talk the most about certain topics, such as education, security, and immigration. It then compares one candidate to others.
In addition, a searchable graph lets users enter topics of their own choice and see how much the candidates’ followers discuss each topic.
Finally, candidate profile pages let users see who a candidate’s top Twitter followers are and which specific terms are most used by followers collectively.
How it works
There are four key features in the app, all of which use a centralized NodeJS server.
The Twitter Streaming API allows Election Buzz to find tweets supporting a certain candidate. By way of example, Donald Trump supporters are presumed to use hashtags like #TrumpTrain or #Trump2016; Hillary Clinton supporters likely use #Hillary2016 and #ImWithHer.
The scraper saves tweet information and user account information and tags the user as a supporter of a specific candidate.
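The tagging step described above can be sketched as follows. This is a minimal illustration, not the project's actual scraper: the hashtag sets contain only the examples named above, and `CANDIDATE_HASHTAGS` and `tag_supporter` are hypothetical names. Leaving tweets that match several candidates untagged is also an assumption.

```python
import re

# Hypothetical mapping; only the example hashtags mentioned in the text are listed.
CANDIDATE_HASHTAGS = {
    "trump": {"#trumptrain", "#trump2016"},
    "clinton": {"#hillary2016", "#imwithher"},
}

HASHTAG_RE = re.compile(r"#\w+")


def tag_supporter(tweet_text):
    """Return the candidate whose hashtags appear in the tweet, or None.

    Tweets that match no candidate, or more than one, are left untagged
    (an assumption; the original scraper may resolve these differently).
    """
    hashtags = {tag.lower() for tag in HASHTAG_RE.findall(tweet_text)}
    matches = [name for name, tags in CANDIDATE_HASHTAGS.items() if hashtags & tags]
    return matches[0] if len(matches) == 1 else None
```

Calling `tag_supporter("All aboard the #TrumpTrain!")` would tag the author as a Trump supporter, while a tweet with no listed hashtag returns `None`.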
After a large dataset of tweets and users has been created, subsets of 1,000 users who support each candidate are selected. The system seeks to find typical individuals and so filters out verified accounts, accounts with more than 10,000 followers, accounts with fewer than 100 followers, accounts with fewer than 100 tweets, accounts that were created in the past year, and accounts that contain a candidate’s name in the username (e.g., NursesForBernie). The subsets of 1,000 are necessary to get around API limitations.
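The exclusion rules above translate directly into a predicate. The sketch below is illustrative only: the `Account` fields, the `is_typical` name, and the `CANDIDATE_NAMES` list are assumptions, not the project's actual schema.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Assumed list of name fragments; the project's real list is not given in the text.
CANDIDATE_NAMES = ("trump", "hillary", "clinton", "bernie", "sanders")


@dataclass
class Account:
    username: str
    verified: bool
    followers: int
    tweets: int
    created_at: datetime


def is_typical(account, now):
    """Return True if the account passes every exclusion rule listed above."""
    if account.verified:
        return False
    if account.followers > 10_000 or account.followers < 100:
        return False
    if account.tweets < 100:
        return False
    # Exclude accounts created within the past year.
    if account.created_at > now - timedelta(days=365):
        return False
    # Exclude usernames that contain a candidate's name.
    name = account.username.lower()
    return not any(candidate in name for candidate in CANDIDATE_NAMES)
```

A subset could then be drawn from the accounts for which `is_typical` returns `True`, stopping once 1,000 have been collected for a candidate.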
A Python script follows these candidates continuously via the Twitter Streaming API. An additional Python script uses the Twitter User Timeline API to obtain as many previous tweets as possible from each of the 3,000 selected user accounts. (Twitter currently limits this to 3,000 tweets per account and 15 requests per 15 minutes, so this takes a few days.)
All tweets are parsed and logged into a MySQL database.
NodeJS API (Election Buzz API)
Almost all of the requests on the client page go through a private API that runs on a NodeJS server. The API executes queries against the MySQL database (where possible, it avoids executing them by looking up a cached version of a previous request in Firebase) and returns data to the client side in JSON format.
Popular followers: To show the top Twitter followers for each candidate, we query the Election Buzz API for the top followers of a given candidate. The top 20 followers are returned in JSON format, and the client retrieves the Twitter profile pictures from the URL in each user’s information.
Topic Charts: To produce the charts that show the number of tweets for different candidates and their followers, we use our API again, together with a Python script that categorizes the data. We previously compiled a list of keywords related to each topic and stored it in JSON format.
A Python script loads this JSON file. It then requests tweet text from the API for either candidates or followers. The candidate tweets are returned in full, whereas only 20,000 of the follower tweets are returned (the alternative would be to use as many as possible, but that number would be in the millions and would slow down performance without a large increase in accuracy).
The script then parses the text of each tweet for each candidate and generates a JSON object that has the following information: topic, candidate, the percent of tweets by that candidate that talked about a given topic, and the percent of those 20,000 follower tweets that talked about that topic. The JSON file is loaded on the client side and presented as a graph using D3. We chose to use different colors to represent different political party affiliations and to keep the same x-axis on both sides, so as to make the two sides of the graph easier to compare. Note that this data is cumulative and does not update in real time.
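The categorization step can be sketched roughly as below. This is not the project's script: the keyword lists, the field names `candidate_percent` and `follower_percent`, and the choice of whole-word, case-insensitive matching are all assumptions made for illustration.

```python
import re

# Hypothetical keyword lists; the project's real lists are not shown in the text.
TOPIC_KEYWORDS = {
    "immigration": ["immigration", "border", "visa"],
    "education": ["education", "school", "college"],
}


def topic_percentages(tweets, topic_keywords):
    """Return {topic: percent of tweets containing at least one keyword}."""
    if not tweets:
        return {topic: 0.0 for topic in topic_keywords}
    result = {}
    for topic, keywords in topic_keywords.items():
        # Whole-word, case-insensitive match on any keyword (an assumption).
        alternatives = "|".join(map(re.escape, keywords))
        pattern = re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)
        hits = sum(1 for text in tweets if pattern.search(text))
        result[topic] = 100.0 * hits / len(tweets)
    return result


def build_rows(candidate, candidate_tweets, follower_tweets,
               topic_keywords=TOPIC_KEYWORDS):
    """Build one record per topic with the four fields described above."""
    by_candidate = topic_percentages(candidate_tweets, topic_keywords)
    by_followers = topic_percentages(follower_tweets, topic_keywords)
    return [
        {
            "topic": topic,
            "candidate": candidate,
            "candidate_percent": by_candidate[topic],
            "follower_percent": by_followers[topic],
        }
        for topic in topic_keywords
    ]
```

The list returned by `build_rows` can be serialized with `json.dumps` and loaded by the client for charting.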
Word Cloud Timelines: Word clouds are created with a Python package, word-cloud, installed with pip. To get the underlying tweets, we use our API to request tweets from supporters of a given candidate over a specific period of time (1 day). We then normalize the text (lowercasing it and removing links and stop words) and feed the result into the word cloud generator.
We experimented with an additional list of words to remove, including candidate-specific words (candidate names, slogan words, state names, the current president), since we wanted to surface words that were not strictly related to politics. The word-cloud generator is highly customizable. The resulting word clouds (one per candidate) are stored on the server, and a link to each is added to a TimelineJS spreadsheet that generates the timeline. We then embed the timeline into the website via an iframe.
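The text clean-up before word cloud generation might look like the sketch below. The stop-word list is a tiny illustrative sample, and `normalize_tweet` and its `extra_words` parameter are hypothetical names standing in for the project's own parsing code.

```python
import re

# Small illustrative stop-word list; a real run would use a fuller one.
STOP_WORDS = {"a", "an", "and", "the", "is", "to", "of", "in", "for", "on", "rt"}

URL_RE = re.compile(r"https?://\S+")
WORD_RE = re.compile(r"[a-z0-9#@']+")


def normalize_tweet(text, extra_words=frozenset()):
    """Lowercase the tweet, drop links, and remove stop words.

    `extra_words` holds additional words to drop, such as candidate names.
    """
    text = URL_RE.sub(" ", text.lower())
    words = WORD_RE.findall(text)
    return [w for w in words if w not in STOP_WORDS and w not in extra_words]
```

The tokens from all of a day's tweets can then be joined into one string and passed to the word cloud generator.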
Keyword graph generator: For the last element of the site, we provide a graph that shows the popularity of a specific keyword over time, split by candidate followers. We send a request to our API and display the data using D3. On the API side, the request uses Full-Text Search to find all tweets that include the requested word or phrase, groups the counts by candidate and by week, and returns the data as a JSON object.
Since the query requires a full-text search over the entire tweet database (about 14 million tweets currently), requests can take a few seconds, with some specific requests taking about a minute.
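The grouping that the keyword query performs can be illustrated in plain Python. In the project this work happens inside the database query; the sketch below only mirrors the logic on an in-memory list, uses simple substring matching rather than true full-text search, and assumes weeks start on Monday. The function names and tuple layout are hypothetical.

```python
from collections import Counter
from datetime import date, timedelta


def week_start(day):
    """Return the Monday of the week containing `day`."""
    return day - timedelta(days=day.weekday())


def keyword_counts(tweets, keyword):
    """Count tweets containing `keyword`, grouped by candidate and week.

    `tweets` is an iterable of (candidate, date, text) tuples.
    """
    needle = keyword.lower()
    counts = Counter(
        (candidate, week_start(day))
        for candidate, day, text in tweets
        if needle in text.lower()
    )
    return [
        {"candidate": candidate, "week": week.isoformat(), "count": n}
        for (candidate, week), n in sorted(counts.items())
    ]
```

Each returned record holds one candidate, the week's starting date, and the number of matching tweets, which is the shape a time-series chart needs.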
To avoid a long delay, we created a Firebase database that acts as a cache. The server checks Firebase before it executes the MySQL query; if the keyword is present, it returns the JSON object stored there. If the keyword is not in Firebase, the database query is executed and the result is cached in Firebase. This caching reduces the wait time for repeated requests to a few seconds. The cache could potentially be stored on the same machine as the server, improving performance even more.
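The check-then-query flow described above is a cache-aside pattern. The sketch below shows it in Python with a plain dictionary standing in for Firebase and an arbitrary function standing in for the MySQL query; the class name, the key normalization, and the absence of any expiry policy are assumptions, and the real server is written in NodeJS.

```python
class KeywordCache:
    """Cache-aside lookup: check the cache first, fall back to the slow query."""

    def __init__(self, run_query, store=None):
        # run_query: the slow function, e.g. the full-text database search.
        self.run_query = run_query
        # store: a dict standing in for the Firebase database.
        self.store = {} if store is None else store

    def get(self, keyword):
        # Normalizing the key is an assumption, so "Jobs" and " jobs " share an entry.
        key = keyword.strip().lower()
        if key in self.store:
            return self.store[key]
        result = self.run_query(key)
        self.store[key] = result
        return result
```

With this structure, only the first request for a keyword pays the cost of the database search; later requests are served from the store.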
When we started this project, we had the idea to make something that “gave insight into the election based on information from Twitter.” This was a much broader task than we initially realized, as there are many tools we could have built to achieve this goal.
Ultimately we were successful, yet one of our biggest challenges was taking our initial goal and refining it into practical applications. In the end, we were able to create a tool with three main sections: a graph organized by pre-chosen topics, candidate profiles, and a searchable database showing how much the candidates’ followers discuss terms of your choice.
If we had more time, we would have given Election Buzz embed properties so that users could embed graphs from our site on their social media accounts. We would also have made Election Buzz easy to convert to other types of elections, so we could create “Election Buzz: Chicago” for the next Chicago mayoral election or “Election Buzz: Brazil” for their next election, for instance. We would also have added more context to the insights so we could explain to users why some candidates were mentioning certain terms more often than others or why candidates' followers were talking about some unexpected topics.
If we had more resources, we would have created a bigger database of our sample of followers to get a more accurate understanding of what the followers are talking about.
Ligia Aguilhar — Medill (2016)
Michael Gofron — McCormick (2016)
Lori Janjigian — Medill (2016)
Bruno Peynetti — McCormick (2016)