Class Projects - Fall 2014

persoNews

Twitter-based news aggregator that customizes a user’s “newsfeed” to provide articles most relevant to them

Screencast: "persoNews-screencast" by Northwestern U. Knight Lab on Vimeo.

What it does

persoNews is a Twitter-based news aggregator that customizes a user’s “newsfeed” to provide the articles most relevant to them. Many other apps also customize news for their readers, but they require more effort than persoNews. Some ask you to go through tens, or even hundreds, of topics and select the ones that interest you before they can provide your news, and if your interests change, you have to deselect topics and choose new ones. With the amount of time and thought this takes, you might as well just scroll through a news site and find your own articles. persoNews, built from scratch in 10 weeks, provides users with relevant articles based simply on their tweeting history.


How it works

The first step in the process is pulling article URLs from RSS feeds (we used The Huffington Post). Each URL we collect is run through Twitter’s REST API to search for users who have tweeted it. We take 10 users who have tweeted that URL and, again using the Twitter API, pull each user's last 100 tweets. The bodies of all those tweets are stored as a JSON document in ElasticSearch, a text-based search engine and document matcher, and each document is indexed by its URL.
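The collection step above can be sketched roughly as follows. This is a minimal illustration and not the project's actual code: the function names, the document layout, and the two callables `find_users` and `fetch_tweets` (stand-ins for the Twitter API calls) are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, List


def extract_article_urls(rss_xml: str) -> List[str]:
    """Return the <link> of every <item> in an RSS 2.0 feed."""
    root = ET.fromstring(rss_xml)
    urls = []
    for item in root.iter("item"):
        link = (item.findtext("link") or "").strip()
        if link:
            urls.append(link)
    return urls


def build_url_documents(
    urls: List[str],
    find_users: Callable[[str], List[str]],    # hypothetical: users who tweeted a URL
    fetch_tweets: Callable[[str], List[str]],  # hypothetical: a user's recent tweets
    users_per_url: int = 10,
    tweets_per_user: int = 100,
) -> Dict[str, dict]:
    """Build one document per article URL from the tweets of users who shared it."""
    documents = {}
    for url in urls:
        tweets: List[str] = []
        for user in find_users(url)[:users_per_url]:
            tweets.extend(fetch_tweets(user)[:tweets_per_user])
        # Assumed layout: the document is keyed by the article URL.
        documents[url] = {"url": url, "tweets": tweets, "text": "\n".join(tweets)}
    return documents
```

Passing the Twitter lookups in as functions keeps the sketch independent of any particular API client; each resulting dictionary could then be serialized to JSON and stored in the search engine.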

When a user comes to persoNews to receive their customized news, there is virtually no work on their part. All the user has to do is provide us with their Twitter handle, sit back, relax, and wait for their results. When they enter their handle on the website, we use the Twitter API to find them and pull their last 100 tweets, which are put into an ElasticSearch document as well. ElasticSearch then compares that document to all the URL-tagged documents already in the database and returns the closest matches, ranked by similarity. The top eight matching URLs are presented to the user as their personalized news. For each one, persoNews shows the title of the article, a corresponding photo, the first few sentences of the article, the source it came from, and a link to the article on its host site.
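To make the matching step concrete, here is a small stand-in for the similarity ranking. It is not how ElasticSearch scores documents internally and not the project's code; it swaps in a plain TF-IDF cosine similarity, and the names `tokenize` and `rank_articles` are hypothetical. The idea is the same, though: compare the user's tweet text against each URL's tweet text and keep the top eight.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def rank_articles(
    user_text: str, url_texts: Dict[str, str], top_k: int = 8
) -> List[Tuple[str, float]]:
    """Rank article URLs by TF-IDF cosine similarity to the user's tweets."""
    doc_counts = {url: Counter(tokenize(text)) for url, text in url_texts.items()}
    n_docs = len(doc_counts)

    # Document frequency: in how many URL documents each term appears.
    df = Counter()
    for counts in doc_counts.values():
        df.update(counts.keys())

    def weigh(counts: Counter) -> Dict[str, float]:
        # Smoothed inverse document frequency, so unseen terms do not divide by zero.
        return {
            term: tf * (math.log((1 + n_docs) / (1 + df[term])) + 1.0)
            for term, tf in counts.items()
        }

    def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
        dot = sum(w * b.get(term, 0.0) for term, w in a.items())
        norm_a = math.sqrt(sum(w * w for w in a.values()))
        norm_b = math.sqrt(sum(w * w for w in b.values()))
        return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

    user_vec = weigh(Counter(tokenize(user_text)))
    scored = [(url, cosine(user_vec, weigh(c))) for url, c in doc_counts.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]
```

A URL whose tweeters share no vocabulary with the user scores zero and sinks to the bottom of the list, while URLs tweeted by people who write about the same things rise to the top.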

Much of the back end was built in Python, and we used Flask to help connect the different pieces. We hosted the entire project on Heroku, although getting ElasticSearch to run on Heroku was the biggest challenge as we wrapped up our project.

Key Technologies:

  • Twitter REST API
  • ElasticSearch
  • Flask

Next Steps

We have a few ideas for what could be added or accomplished in the future. One addition would be to provide news suggestions from more than one source. Currently we return results only from The Huffington Post, but adding new sources would be quite simple, as it just requires drawing from more RSS feeds.
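As a rough illustration of drawing from more RSS feeds, the sketch below merges article links from several feeds and drops duplicates. The function name and the source-to-XML mapping are assumptions for the example; in practice each feed's XML would be downloaded first.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple


def collect_article_urls(feeds: Dict[str, str]) -> List[Tuple[str, str]]:
    """Merge item links from several RSS feeds into (source, url) pairs.

    `feeds` maps a source name to that source's RSS 2.0 XML text.
    A URL seen in more than one feed is kept only for the first source.
    """
    seen = set()
    collected = []
    for source, rss_xml in feeds.items():
        root = ET.fromstring(rss_xml)
        for item in root.iter("item"):
            link = (item.findtext("link") or "").strip()
            if link and link not in seen:
                seen.add(link)
                collected.append((source, link))
    return collected
```

Keeping the source name next to each URL would also let the results page show where every recommended article came from.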

The user’s results could also be organized by category, telling the user that “this is the political news we recommend to you,” or “this is what sports news we think you might be interested in.”
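One possible way to organize results by category is sketched below. It assumes each recommended article is a dictionary carrying a hypothetical `category` field (for instance, taken from the RSS item's category tag); neither the field nor the function name comes from the project itself.

```python
from collections import defaultdict
from typing import Dict, List


def group_by_category(results: List[dict], default: str = "Other") -> Dict[str, List[dict]]:
    """Group recommended articles by category, preserving their ranked order."""
    grouped = defaultdict(list)
    for article in results:
        # Articles without a category fall into the default bucket.
        grouped[article.get("category") or default].append(article)
    return dict(grouped)
```

Because the input is already ranked by similarity, each category's list stays in best-match-first order.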

It might also be interesting to see what results you would get if, instead of gathering the user’s tweets, we gathered the tweets they have favorited and based their news on those. This might provide a different perspective on their interests, since people are sometimes cautious about what they tweet but have no problem favoriting tweets that they like or agree with.

Connect

Student Team:

Caeiro, Shawn Marcus

Kulthum, Sharifah

Rubin, Dara Michelle

Sohoni, Tejaswinee

Faculty Guidance:

Larry Birnbaum

Rich Gordon