Fake News Tracker @LauzHack

Fake News Tracker project proposal.

Intro

Here, we will post additional resources to help you with the technical part of the project. Below, you will find information on data collection, exploration, and visualization for Twitter, YouTube, and Reddit.

Have fun!

Twitter

You could analyze user tweets, explore retweet and follower networks, or perform sentiment analysis on Twitter data.

Data collection

To access the Twitter API, you need a developer account. When you register, you have to fill in a form describing why you want to use the API before you get approval. Make sure you tick either the Research or the Student project option; it should help you get approved faster.

There are several Python modules for collecting tweets from the API. I use Twython.
It has a function that collects the 200 latest tweets of a user. From these tweets, you can obtain the number of likes and retweets and find the most popular ones, which have a high chance of being controversial. You could also extract the retweets and mentions in these tweets, explore the accounts that were retweeted or mentioned, and collect their tweets in turn.
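The steps above could look roughly like the following sketch. It assumes Twython's `get_user_timeline` call and the tweet JSON fields of the v1.1 API (`favorite_count`, `retweet_count`, `retweeted_status`, `entities.user_mentions`); the helper names and the idea of ranking by likes plus retweets are illustrative choices, not part of Twython.

```python
from collections import Counter


def fetch_latest_tweets(screen_name, app_key, app_secret,
                        oauth_token, oauth_token_secret):
    """Download up to 200 of the user's most recent tweets in one call."""
    # Imported here so the helpers below also work without Twython installed.
    from twython import Twython

    twitter = Twython(app_key, app_secret, oauth_token, oauth_token_secret)
    return twitter.get_user_timeline(screen_name=screen_name, count=200)


def most_popular(tweets, top_n=10):
    """Rank the user's own tweets by likes plus retweets, most popular first."""
    # Retweets are skipped so that only the user's own tweets are ranked.
    own = [t for t in tweets if "retweeted_status" not in t]

    def popularity(tweet):
        return tweet.get("favorite_count", 0) + tweet.get("retweet_count", 0)

    return sorted(own, key=popularity, reverse=True)[:top_n]


def referenced_accounts(tweets):
    """Count which accounts were retweeted and which were mentioned."""
    retweeted, mentioned = Counter(), Counter()
    for tweet in tweets:
        original = tweet.get("retweeted_status")
        if original:
            retweeted[original["user"]["screen_name"]] += 1
            continue  # do not also count a retweet as a mention
        for mention in tweet.get("entities", {}).get("user_mentions", []):
            mentioned[mention["screen_name"]] += 1
    return retweeted, mentioned
```

The screen names returned by `referenced_accounts` can be passed back into `fetch_latest_tweets` to collect the tweets of the retweeted and mentioned accounts.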

Visualization

Gephi is a good tool for network visualization and exploration. Below, you will find some related network visualization tutorials:
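Independently of those tutorials, the collected tweets first have to be turned into a file Gephi can open. One option, sketched here with the Python standard library only, is a CSV edge list with `Source`, `Target`, and `Weight` columns, which Gephi's spreadsheet import understands. The function names are hypothetical, and the tweet fields assume the v1.1 tweet JSON.

```python
import csv
from collections import Counter


def retweet_edges(tweets):
    """Map (retweeting account, retweeted account) to the number of retweets."""
    edges = Counter()
    for tweet in tweets:
        original = tweet.get("retweeted_status")
        if original:
            source = tweet["user"]["screen_name"]
            target = original["user"]["screen_name"]
            edges[(source, target)] += 1
    return edges


def write_edge_list(edges, path):
    """Write the edges as a CSV file with Source, Target and Weight columns."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["Source", "Target", "Weight"])
        for (source, target), weight in sorted(edges.items()):
            writer.writerow([source, target, weight])
```

Each row is one directed edge from the account that retweeted to the account that was retweeted, with the number of retweets as the edge weight.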

YouTube

You could analyze comments, comment networks, YouTube recommendation bubbles, and featured channel networks.

Data collection

The cool thing about YouTube is that you don’t necessarily have to deal with API-related pains. To collect the data, we recommend using YouTube Data Tools, provided by the Digital Methods Initiative. The code is also available on GitHub.

This service lets you collect different kinds of networks on YouTube, such as comment, featured channel, and video recommendation networks. You can find an overview of the available features in this video:

YouTube Data Tools

Visualization

YouTube Data Tools produces files that can also be visualized in Gephi. You can use the same visualization tools as for Twitter:

Reddit

There are 3 main subreddits about coronavirus:

Data collection

To access Reddit, there are 2 Python modules: PSAW and PRAW. PSAW lets you get more data through the ‘pushshift’ API, while PRAW uses the official Reddit API, for which you need to register with a developer account, explained here.
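As a minimal sketch of collecting posts with PRAW, the wrapper around the official Reddit API, the code below assumes you have already registered an application and received a client ID and secret. The helper names, the subreddit name passed in, and the selection of fields are illustrative assumptions; only `praw.Reddit`, `subreddit(...)`, `hot(limit=...)`, and the submission attributes come from PRAW itself.

```python
from datetime import datetime, timezone


def connect(client_id, client_secret, user_agent):
    """Create a read-only PRAW client from your app credentials."""
    # Imported here so the helpers below also work without PRAW installed.
    import praw

    return praw.Reddit(client_id=client_id,
                       client_secret=client_secret,
                       user_agent=user_agent)


def submission_to_row(submission):
    """Keep a few fields of a submission as a plain dictionary."""
    author = submission.author  # None when the account was deleted
    created = datetime.fromtimestamp(submission.created_utc, tz=timezone.utc)
    return {
        "id": submission.id,
        "title": submission.title,
        "author": str(author) if author is not None else None,
        "score": submission.score,
        "num_comments": submission.num_comments,
        "created": created.isoformat(),
    }


def collect_hot_posts(reddit, subreddit_name, limit=100):
    """Collect the current 'hot' posts of one subreddit as dictionaries."""
    posts = reddit.subreddit(subreddit_name).hot(limit=limit)
    return [submission_to_row(post) for post in posts]
```

A call such as `collect_hot_posts(connect(my_id, my_secret, "fake-news-tracker"), subreddit_name)` returns rows that can be written to CSV or loaded into a data frame; the credentials and the user agent string are placeholders.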