Twitter Project

Twitter Network Graph

The Twitter Network Graph lets you compare the Twitter connections of two users with an interactive visualization.

Try It Out

Create a Twitter Network Visualization
Add a Twitter user to the data collection queue (deprecated)

Note: Twitter recently removed their free API, rendering most of this project useless. You can still make graphs from the users collected before February 2023.

How It Works

A Python script runs continuously and requests ndwalker.com/twitter/pull. In response, Django queries MariaDB to see whether a User is currently being fetched. Twitter returns at most 1,000 User objects per minute, so a User with more connections than that takes several requests, and it's necessary to keep track of which User is being fetched, what the pagination token is, and whether we're requesting followers or followees. If no User is in progress, the next User is taken from the Request Queue. If the Request Queue is empty, a random User ID is selected.
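The selection order above can be sketched in a few lines of Python. This is only an illustration, not the actual Django view: the `FetchState` class, the `next_fetch` function, starting with followers first, and drawing the random ID from a list of already-known IDs are all assumptions made for the example.

```python
import random
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class FetchState:
    """What has to be remembered between one-minute API calls (hypothetical shape)."""
    user_id: int
    direction: str                          # "followers" or "followees"
    pagination_token: Optional[str] = None  # None means "start at the first page"


def next_fetch(current, request_queue, known_ids, rng=random):
    """Pick what to request next: in-progress User, then queue, then random."""
    if current is not None:
        return current  # resume the same User at the saved page
    if request_queue:
        # Oldest request first; starting with followers is an assumption.
        return FetchState(request_queue.popleft(), "followers")
    # Empty queue: fall back to a random ID (here, one we already know about).
    return FetchState(rng.choice(known_ids), "followers")
```

For example, `next_fetch(None, deque([11, 22]), [1, 2, 3])` takes User 11 off the queue and leaves 22 waiting for the next free slot.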

Once the ID is selected, we connect to the Twitter API, sending our auth token, the ID we want information about, and whether we want the User's followers or followees. Assuming the ID is valid and the account is not set to "private," Twitter sends back a JSON payload with data for 1,000 or fewer Users. Back in Django, we check each returned User to see if it's in MariaDB. If it isn't, we add the User information (ID, handle, display name, location, join date, tweet count). After all the new Users have been added, we add each relationship to the Follower table, which is just two ID columns: Follower and Followee. If a relationship already exists, MariaDB just ignores it.
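Here is a rough sketch of the "add new Users, then add relationships, ignore duplicates" step. It uses Python's built-in sqlite3 as a stand-in so it runs anywhere; SQLite spells the statement `INSERT OR IGNORE`, while MariaDB spells it `INSERT IGNORE`. The table layout is trimmed down to two User columns, and `store_page` is a hypothetical name, so treat this as one possible approach rather than the project's actual code.

```python
import sqlite3


def store_page(conn, target_id, users, direction):
    """Save one page of API results for the User identified by target_id."""
    # Users already in the table are skipped thanks to the primary key on id.
    conn.executemany(
        "INSERT OR IGNORE INTO user (id, handle) VALUES (?, ?)",
        [(u["id"], u["handle"]) for u in users],
    )
    # The direction decides which side of the pair the target User sits on.
    if direction == "followers":
        pairs = [(u["id"], target_id) for u in users]
    else:
        pairs = [(target_id, u["id"]) for u in users]
    # The composite primary key makes an existing relationship a silent no-op.
    conn.executemany(
        "INSERT OR IGNORE INTO follower (follower, followee) VALUES (?, ?)",
        pairs,
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user (id INTEGER PRIMARY KEY, handle TEXT)")
conn.execute(
    "CREATE TABLE follower (follower INTEGER, followee INTEGER, "
    "PRIMARY KEY (follower, followee))"
)
```

Calling `store_page` twice with the same page leaves the row counts unchanged, which is the behavior described above.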

Finally, we return a 200 "OK" message to the Python script. If the Python script receives too many non-200 responses, something is clearly wrong, so it logs the error and quits.
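The polling script could look something like the sketch below. The threshold of 5, the 60-second interval, counting consecutive (rather than total) failures, and the https scheme are all assumptions; the write-up only says "too many non-200 responses."

```python
import logging
import time
import urllib.error
import urllib.request

PULL_URL = "https://ndwalker.com/twitter/pull"  # scheme is an assumption
MAX_FAILURES = 5        # assumed threshold for "too many"
INTERVAL_SECONDS = 60   # assumed, to match the one-request-per-minute limit


def fetch_status(url):
    """Return the HTTP status code, or 0 if the request never got a response."""
    try:
        with urllib.request.urlopen(url, timeout=30) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # the server answered, but with a 4xx/5xx
    except OSError:
        return 0         # DNS failure, refused connection, timeout, etc.


def poll(get_status=fetch_status, sleep=time.sleep, max_failures=MAX_FAILURES):
    """Hit the pull URL until max_failures non-200 responses arrive in a row."""
    failures = 0
    while failures < max_failures:
        status = get_status(PULL_URL)
        if status == 200:
            failures = 0  # a success resets the streak
        else:
            failures += 1
            logging.warning("pull returned %s (%d in a row)", status, failures)
        sleep(INTERVAL_SECONDS)
    logging.error("giving up after %d consecutive failures", failures)
    return failures
```

Passing `get_status` and `sleep` in as parameters is just a convenience so the loop can be exercised without touching the network or waiting a minute per iteration.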

Twitter User Request Queue

Twitter is pretty big (currently ~300 million users) and its rate limiting doesn't allow on-demand querying. If a visitor wants to see Twitter users that aren't in the database yet, they can add those usernames to the Request Queue, and the users will be fetched in the order submitted.

Automatic Database Growth

Every minute, information for a Twitter user from the Request Queue is retrieved from Twitter. If no users exist in the queue, a user is selected at random.

Network Visualization

The user enters two Twitter usernames. The server connects to a database of Twitter users and returns user and connection data to create a network graph of users. The layout of the network shows not just the follower and followee relationships, but also how they compare between the two Twitter usernames.

Visualizations are made with NetworkX and Plotly. Plotting every user would take much longer, look much more cluttered, and in many cases would crash your browser. So what you see is capped at roughly 1,000 users, split across the graph in proportion to how the full set of users is distributed. Because of this, if you search users @kf and @nathandwalker you'll only see about 30 users in the "Only nathandwalker" area, but if you search for just @nathandwalker, you'll see considerably more users.

Users are chosen based on their ranked score by area. There are nine areas in the graph: Follows, Followed By, and Mutual each for User A, User B, and Both Users. Users are placed into one of the nine areas and then each area creates an internal ranked list. User scores are based on a combination of tweet count and whether the account is protected, is verified, has the default profile picture, and has had its follower/followee data fetched from Twitter.

Each area's share of all users determines how many of the 1,000 spots it gets. That is, if User A's Follows contains 3,000 of 10,000 total users, that area represents 30% of the total, so it gets 30% of 1,000: 300 spots. The top-ranked 300 users would be returned and everyone else would be discarded. The exception is that if an area represents less than 2% of the total, 20 users are returned (assuming they exist).

Note: if a requested user has more than 40,000 connections, an arbitrary 40,000 are chosen and ranked -- I don't have the compute power to quickly work with more data than that.
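The spot allocation is easier to see as code. The sketch below follows the numbers given above (1,000 spots, a 2% cutoff, 20 users for small areas); the function names, the area keys, and the precomputed `score` field are made up for the example, and the real scoring weights aren't shown because they aren't described here.

```python
def allocate_spots(area_sizes, total_spots=1000, floor_share=0.02, floor_spots=20):
    """Map each area name to the number of users it may show."""
    total = sum(area_sizes.values())
    spots = {}
    for area, size in area_sizes.items():
        if size == 0:
            spots[area] = 0
            continue
        if size / total < floor_share:
            wanted = floor_spots  # small areas get a fixed minimum
        else:
            wanted = round(size * total_spots / total)  # proportional share
        spots[area] = min(wanted, size)  # can't return users that don't exist
    return spots


def top_ranked(users, n):
    """Keep the n highest-scoring users of one area; the rest are discarded."""
    return sorted(users, key=lambda u: u["score"], reverse=True)[:n]
```

With 3,000 of 10,000 users, an area gets 300 spots, matching the example above. Because small areas are bumped up to 20, the total can land slightly over 1,000, which fits the "about 1,000" wording.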

After the user data is returned from the database, a NetworkX graph is created. Each returned user's data is turned into a node (circle) and their relationship with User A and User B is used to create a colored edge (line). Each node is plotted at a random x, y coordinate inside the image based on the bounding box of the area the user belongs to. Each node has its hover-text added, then all nodes and edges get added to the image. Finally, the now-completed NetworkX graph is given to Plotly, which creates a div (HTML) that gets passed back to the browser and gets displayed.
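As a small illustration of the node placement step, the sketch below picks a random point inside each user's area. The bounding-box coordinates, area names, and `place_nodes` function are invented for the example; the real layout uses nine areas with its own geometry.

```python
import random

# Hypothetical bounding boxes as (x_min, y_min, x_max, y_max); the real
# graph has nine areas, only three are listed here.
AREA_BOXES = {
    "only_a_follows": (0.0, 0.0, 1.0, 1.0),
    "both_mutual": (1.0, 0.0, 2.0, 1.0),
    "only_b_follows": (2.0, 0.0, 3.0, 1.0),
}


def place_nodes(users, boxes, rng):
    """Return {handle: (x, y)} with each point inside the user's area box."""
    positions = {}
    for user in users:
        x_min, y_min, x_max, y_max = boxes[user["area"]]
        positions[user["handle"]] = (
            rng.uniform(x_min, x_max),
            rng.uniform(y_min, y_max),
        )
    return positions


users = [
    {"handle": "alice", "area": "only_a_follows"},
    {"handle": "bob", "area": "both_mutual"},
]
positions = place_nodes(users, AREA_BOXES, random.Random(0))
```

Each position could then be attached to a NetworkX node as an attribute, for example `G.add_node(handle, pos=(x, y))`, before the coordinates are handed to Plotly for drawing.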