Fake News Tracker @LauzHack
Fake News Tracker project proposal.
Intro
Here, we will post additional resources to help you with the technical part of the project. Below, you will find information related to data collection, exploration, and visualization for Twitter, YouTube, and Reddit.
Have fun!
Twitter
You could analyze user tweets, explore retweet and follower networks, or perform sentiment analysis using Twitter data.
Data collection
To access the Twitter API, you need a developer account. When you register, you have to fill in a form describing why you want to use the API before you get approval. Make sure you tick either the Research or the Student project option; it should help you get approval faster.
There are several Python modules for collecting tweets from the API. I use Twython.
There is a function that collects the 200 latest tweets of a user. From these tweets, you can obtain the number of likes and retweets and find the most popular ones, which have a high chance of being controversial. You could also extract the retweets and mentions from these tweets, explore the retweeted or mentioned accounts, and collect their tweets in turn.
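The steps above can be sketched roughly as follows. This is a minimal sketch, not a definitive implementation: it assumes an already authenticated Twython client and tweet dictionaries in the Twitter API v1.1 format (fields such as `favorite_count`, `retweet_count`, `retweeted_status`, and `entities.user_mentions`). The helper names `fetch_latest_tweets`, `popularity`, `most_popular`, and `interacting_accounts` are hypothetical, and summing likes and retweets is just one possible popularity score.

```python
from collections import Counter


def fetch_latest_tweets(twitter, screen_name):
    """Fetch up to 200 of a user's latest tweets.

    Assumption: `twitter` is an authenticated twython.Twython client.
    """
    return twitter.get_user_timeline(screen_name=screen_name, count=200)


def popularity(tweet):
    """Score a tweet by likes plus retweets (an illustrative choice)."""
    # Likes are called "favorites" in v1.1-style tweet payloads.
    return tweet.get("favorite_count", 0) + tweet.get("retweet_count", 0)


def most_popular(tweets, n=5):
    """Return the n tweets with the highest popularity score."""
    return sorted(tweets, key=popularity, reverse=True)[:n]


def interacting_accounts(tweets):
    """Count how often each account was retweeted or mentioned."""
    counts = Counter()
    for tweet in tweets:
        names = set()
        retweeted = tweet.get("retweeted_status")
        if retweeted:
            names.add(retweeted["user"]["screen_name"])
        for mention in tweet.get("entities", {}).get("user_mentions", []):
            names.add(mention["screen_name"])
        # A set avoids counting a retweeted account twice when it also
        # appears among the mentions of the same tweet.
        counts.update(names)
    return counts
```

The accounts returned by `interacting_accounts` are natural candidates for the next round of collection: pass each screen name back to `fetch_latest_tweets` to grow the network one step at a time.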
Visualization
Gephi is a good tool for network visualization and exploration. Below, you will find some related network visualization tutorials:
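Before a network can be explored in Gephi, it has to be saved in a format Gephi can import. One simple option is an edge-list CSV with `Source`, `Target`, and `Weight` columns, which Gephi's spreadsheet import recognizes. The sketch below assumes you already have a list of (source, target) pairs, for example "who retweeted whom"; the function names `edge_rows` and `write_gephi_edges` and the sample file name are hypothetical.

```python
import csv
from collections import Counter


def edge_rows(edges):
    """Turn (source, target) pairs into weighted rows with a header."""
    weights = Counter(edges)
    rows = [["Source", "Target", "Weight"]]
    for (source, target), weight in sorted(weights.items()):
        rows.append([source, target, weight])
    return rows


def write_gephi_edges(edges, path):
    """Write the weighted edge list to a CSV file for import into Gephi."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        csv.writer(handle).writerows(edge_rows(edges))


if __name__ == "__main__":
    # Made-up example edges: the first account retweeted the second.
    demo_edges = [("user_a", "user_b"), ("user_a", "user_b"), ("user_c", "user_b")]
    write_gephi_edges(demo_edges, "retweet_edges.csv")
```

Repeated pairs are merged into a single row whose weight is the number of occurrences, so frequently retweeted accounts get heavier edges in the visualization.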
YouTube
You could analyze comments, comment networks, YouTube recommendation bubbles, and featured-channel networks.
Data collection
The cool thing about YouTube is that you don’t necessarily have to deal with API-related pains. To collect the data, we recommend using YouTube Data Tools, provided by the Digital Methods Initiative. The code is also available on GitHub.
This service lets you collect different kinds of networks on YouTube, such as comments, featured channels, video recommendations, and others. You can find an overview of available features in this video:
Visualization
YouTube Data Tools produce files that can also be visualized in Gephi. You can use the same visualization tools as for Twitter:
Reddit
There are three main subreddits about the coronavirus:
- r/Coronavirus, mostly for exchanging information about the virus (the moderators are not really checking sources).
- r/COVID19, for people interested in scientific research on the virus. Posts often link to scientific articles.
- r/China_Flu, whose purpose seems to be the exchange of rumors and apocalyptic photos (empty shops, crowded hospitals) more than real information on the outbreak.
Data collection
To access Reddit, there are two Python modules: PSAW and PRAW. PSAW lets you get more data through the Pushshift API, while PRAW uses the official Reddit API, for which you need to register with a developer account, as explained here.
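As a rough illustration of the PRAW route, the sketch below creates a read-only client and reduces submissions to a few fields that are useful for exploration. It assumes you have registered an app and obtained a client ID and secret; the helper names `make_reddit`, `summarize`, and `top_by_comments` are hypothetical, and the demo at the bottom uses made-up stand-in objects so that it runs without network access or credentials.

```python
from types import SimpleNamespace


def make_reddit(client_id, client_secret, user_agent):
    """Create a read-only PRAW client (requires `pip install praw`)."""
    import praw  # imported lazily so the helpers below work without it

    return praw.Reddit(
        client_id=client_id,
        client_secret=client_secret,
        user_agent=user_agent,
    )


def summarize(submissions):
    """Keep a few attributes of each submission as plain dictionaries."""
    return [
        {
            "title": s.title,
            "score": s.score,
            "num_comments": s.num_comments,
            "url": s.url,
        }
        for s in submissions
    ]


def top_by_comments(rows, n=10):
    """Return the n most discussed posts."""
    return sorted(rows, key=lambda row: row["num_comments"], reverse=True)[:n]


# Offline demo with made-up posts standing in for PRAW submissions.
fake_posts = [
    SimpleNamespace(title="Post A", score=10, num_comments=4, url="https://example.com/a"),
    SimpleNamespace(title="Post B", score=3, num_comments=25, url="https://example.com/b"),
]
print(top_by_comments(summarize(fake_posts), n=1)[0]["title"])
```

With real credentials, the same helpers could be fed with `make_reddit(...).subreddit("COVID19").new(limit=100)`, which yields the newest submissions of that subreddit.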