Twitter Follow Graph

Rahul Maheshwari
4 min read · Apr 14, 2021


Image: Unsplash

Table of Contents

1. Introduction

2. Dataset Information

3. Pre-Processing

4. Basic Properties analyzed

5. Advanced Analysis

6. Conclusion

7. Contributions

Introduction

Twitter is one of the most popular social media platforms, with 353 million monthly active users at the time of writing, up from 54 million in 2010, and the number is expected to keep growing. Twitter is most popular among young people, who tweet very frequently and thus generate a huge amount of data that can be analyzed to derive useful information.

We therefore decided to analyze how the Twitter network behaves in terms of the properties that Network Science defines for real-world networks, such as degree distribution, average path length, and average clustering coefficient, and to investigate whether Twitter is a social network or an information network.

Along with these basic properties, we analyzed some advanced ones, namely community detection and PageRank, for the nodes of the network. These gave insight into how communities form within a network and which members of each community are the most popular.

We also compared the results for the Twitter dataset with those for other popular networks: Facebook, LinkedIn, Flickr, EU-email, and Twitch.

Dataset Information

We generated a Twitter-ego network from the Twitter dataset to compare its behavior with that of the original dataset.
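The post does not say how the Twitter-ego network was built. As a minimal sketch, assuming an ego network means one user, everyone directly connected to that user, and the follow edges among them, it could be extracted from an edge list like this. The function name `ego_network` and the toy edges are hypothetical.

```python
from collections import defaultdict

def ego_network(edges, ego):
    """Return the directed edges among `ego` and its direct neighbours."""
    neighbours = defaultdict(set)
    for src, dst in edges:
        # Assumption: a follow in either direction links a user to the ego.
        neighbours[src].add(dst)
        neighbours[dst].add(src)
    members = neighbours[ego] | {ego}
    # Keep only edges whose two endpoints both belong to the ego network.
    return [(s, d) for s, d in edges if s in members and d in members]

# Toy follow graph: (follower, followed)
edges = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d"), ("d", "e")]
print(ego_network(edges, "a"))
```

Keeping the edges among neighbours, not only the edges touching the ego, is what lets the clustering coefficient of the ego network be compared with that of the full graph.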

We compare the results for Twitter with those for other popular platforms such as Facebook, LinkedIn, Flickr, and Twitch. The datasets used are:

  1. Twitter Follower Network
  2. EU Email Network
  3. LinkedIn, Flickr
  4. Twitch
  5. Facebook (unlabeled)
  6. Facebook (labeled)

The Twitter dataset is very large (24 GB), so we sampled it to keep the analysis tractable. The LinkedIn dataset was subsampled as well.

Figure: Twitter dataset information

Pre-Processing

  • We sampled enough data points from the large datasets (Twitter and LinkedIn) to compute network-related properties.
  • We mapped and extracted user information through the Twitter and Facebook APIs to present communities and PageRank results in the GUI (Graphical User Interface).
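The sampling strategy is not described, so the following is only one plausible approach: pick a random fraction of the nodes and keep the edges whose endpoints were both picked (an induced subgraph). The name `sample_induced_subgraph`, the `fraction` parameter, and the fixed seed are assumptions made for illustration.

```python
import random

def sample_induced_subgraph(edges, fraction, seed=0):
    """Keep a random `fraction` of nodes and the edges between kept nodes."""
    nodes = sorted({node for edge in edges for node in edge})
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    keep_count = max(1, int(len(nodes) * fraction))
    kept = set(rng.sample(nodes, keep_count))
    return [(s, d) for s, d in edges if s in kept and d in kept]

# Toy directed ring of 100 users: user i follows user i + 1.
ring = [(i, (i + 1) % 100) for i in range(100)]
sample = sample_induced_subgraph(ring, fraction=0.5)
print(len(sample), "of", len(ring), "edges kept")
```

Node sampling tends to thin out edges faster than nodes, so properties measured on the sample (clustering in particular) should be read as estimates for the sample, not for the full 24 GB graph.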

Basic Properties analyzed

We analyzed the average path length, average clustering coefficient, degree assortativity, diameter, degree distribution, and cumulative degree distribution, and plotted the in-degree and out-degree distributions.
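To make two of these properties concrete, here is a small sketch that computes the average clustering coefficient and the average path length on an undirected toy graph using only the standard library. It is not the code used for the project; the function names are hypothetical, and the path length is averaged over reachable pairs only, which is an assumption about how disconnected graphs are handled.

```python
from collections import defaultdict, deque

def build_adjacency(edges):
    """Undirected adjacency sets, ignoring self-loops."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return adj

def average_clustering(adj):
    """Mean over nodes of (links among neighbours) / (possible links)."""
    total = 0.0
    for node, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            continue  # nodes with fewer than two neighbours contribute 0
        # Each link between two neighbours is counted once from each end.
        twice_links = sum(len(adj[u] & nbrs) for u in nbrs)
        total += twice_links / (k * (k - 1))
    return total / len(adj)

def average_path_length(adj):
    """Mean shortest-path length over reachable pairs, via BFS per node."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nbr in adj[node]:
                if nbr not in dist:
                    dist[nbr] = dist[node] + 1
                    queue.append(nbr)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0

# Toy graph: a triangle (a, b, c) with one extra node d attached to c.
adj = build_adjacency([("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")])
print(average_clustering(adj), average_path_length(adj))
```

Running one BFS per node costs O(V * (V + E)), which is one reason very large graphs are usually sampled before computing path lengths.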

Figure: Network properties
Figure: Directed network (Twitter) degree distribution
Figure: Undirected degree distribution
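For readers who want to reproduce plots of this kind, the sketch below shows one way to compute in-degree and out-degree distributions and a cumulative distribution from a directed edge list. The function names and the toy edges are hypothetical, and edges are assumed to be (follower, followed) pairs.

```python
from collections import Counter

def degree_distributions(edges):
    """Return (in_dist, out_dist): fraction of nodes having each degree."""
    out_deg = Counter(src for src, _ in edges)
    in_deg = Counter(dst for _, dst in edges)
    nodes = set(out_deg) | set(in_deg)

    def distribution(deg):
        # Nodes that never appear on this side of an edge have degree 0.
        counts = Counter(deg.get(node, 0) for node in nodes)
        return {k: c / len(nodes) for k, c in sorted(counts.items())}

    return distribution(in_deg), distribution(out_deg)

def cumulative(dist):
    """Complementary cumulative distribution P(K >= k)."""
    result, running = {}, 0.0
    for k in sorted(dist, reverse=True):
        running += dist[k]
        result[k] = running
    return dict(sorted(result.items()))

# Toy follow graph: a and b follow c, c follows d.
toy_edges = [("a", "c"), ("b", "c"), ("c", "d")]
in_dist, out_dist = degree_distributions(toy_edges)
print(in_dist, out_dist, cumulative(in_dist))
```

Plotting the cumulative distribution on log-log axes is a common way to inspect a heavy tail, because it is less noisy than the raw distribution at high degrees.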

Advanced Analysis

Community detection and the PageRank algorithm were applied to the Twitter and Facebook datasets.

The following community detection algorithms were applied:
1. Leiden
2. Surprise communities
3. Walktrap
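These algorithms are normally run through a graph library, and implementing any of them is beyond a short snippet. What can be shown compactly is modularity, a score commonly used to judge how good a detected partition is (Leiden is often run with modularity as its objective). The sketch below is not one of the three algorithms; `modularity` and the toy graph are hypothetical, and the graph is assumed to be undirected with no duplicate edges.

```python
from collections import defaultdict

def modularity(edges, community_of):
    """Newman modularity of a partition of an undirected simple graph."""
    m = len(edges)
    internal = defaultdict(int)    # edges with both ends in the community
    degree_sum = defaultdict(int)  # total degree of nodes in the community
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            internal[cu] += 1
    # Q = sum over communities of (internal / m) - (degree_sum / 2m)^2
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
               for c in degree_sum)

# Toy graph: two triangles joined by a single bridge edge (3, 4).
toy = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
two_groups = {1: 0, 2: 0, 3: 0, 4: 1, 5: 1, 6: 1}
one_group = {node: 0 for node in range(1, 7)}
print(modularity(toy, two_groups), modularity(toy, one_group))
```

A partition that puts each triangle in its own community scores higher than lumping everything together, which matches the intuition of tightly knit groups with few links between them.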

The PageRank algorithm was applied, and a few of the top-ranked results were displayed along with the followers and images of the corresponding nodes.
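As an illustration of what PageRank computes on a follow graph, here is a minimal power-iteration sketch. It is not the project's implementation: the damping factor 0.85, the iteration cap, the tolerance, and the toy edges are all assumptions, and an edge (follower, followed) is taken to pass rank from the follower to the followed account.

```python
def pagerank(edges, damping=0.85, iterations=100, tol=1e-10):
    """Power-iteration PageRank on a directed edge list."""
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for src, dst in edges:
        out_links[src].append(dst)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes with no out-links is spread over all nodes.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {node: base for node in nodes}
        for node in nodes:
            if out_links[node]:
                share = damping * rank[node] / len(out_links[node])
                for dst in out_links[node]:
                    new_rank[dst] += share
        change = sum(abs(new_rank[node] - rank[node]) for node in nodes)
        rank = new_rank
        if change < tol:
            break
    return rank

# Toy follow graph: b, c and d follow a; a follows b.
ranks = pagerank([("b", "a"), ("c", "a"), ("d", "a"), ("a", "b")])
for node, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(node, round(score, 4))
```

In this toy graph the account followed by everyone ranks first, and the one account it follows back ranks second, which is the kind of ordering used to pick the "most popular few".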

Conclusion

  • In contrast to the Twitter network, the Twitter-ego network shows a
    higher clustering coefficient, indicating the presence of tightly knit
    communities between hub nodes of the Twitter follow graph.
  • Social networks such as Facebook and EU-email show a high clustering
    coefficient.
  • The average path length of every network is small, indicating that
    information flows very fast on all of them.
  • The Twitch network is disassortative because the most popular gamers
    take part in many showdowns for the games they stream and end up
    following less popular streamers who are part of those showdowns.
  • The Facebook and Flickr networks predominantly show assortative
    behavior, since people are likely to follow family and colleagues who
    tend to have a similar number of friends.
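The assortative and disassortative labels above come from the degree assortativity coefficient, which is the Pearson correlation between the degrees at the two ends of each edge. The sketch below computes it for an undirected edge list; the function name and the toy graphs are hypothetical, and this is an illustration, not the project's code.

```python
from collections import Counter

def degree_assortativity(edges):
    """Pearson correlation of end-point degrees over undirected edges."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    # Count each undirected edge in both directions so the measure is
    # symmetric in its two end points.
    xs, ys = [], []
    for u, v in edges:
        xs += [degree[u], degree[v]]
        ys += [degree[v], degree[u]]
    n = len(xs)
    mean = sum(xs) / n
    cov = sum((x - mean) * (y - mean) for x, y in zip(xs, ys)) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    return cov / var if var else float("nan")

# A star (one hub, four leaves): hubs connect only to low-degree nodes.
star = [("hub", leaf) for leaf in ("w", "x", "y", "z")]
# A path a-b-c-d.
path = [("a", "b"), ("b", "c"), ("c", "d")]
print(degree_assortativity(star), degree_assortativity(path))
```

Negative values mean high-degree nodes mostly link to low-degree ones (as described for Twitch), and positive values mean nodes link to others of similar degree (as described for Facebook and Flickr).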

From all the plots, we inferred that every network follows a power-law degree distribution, indicating the scale-free property that is observed in real-world networks.
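Reading a power law off a plot is a visual judgment. One way to put a number on it is to estimate the exponent from the degrees directly. The sketch below uses the standard approximate maximum-likelihood estimator for discrete data, alpha ≈ 1 + n / Σ ln(k_i / (k_min − 0.5)); this was not part of the original analysis, and `power_law_exponent`, `k_min`, and the synthetic degrees are assumptions for illustration.

```python
import math
import random

def power_law_exponent(degrees, k_min=1):
    """Approximate MLE of the power-law exponent for degrees >= k_min."""
    tail = [k for k in degrees if k >= k_min]
    if not tail:
        raise ValueError("no degrees at or above k_min")
    log_sum = sum(math.log(k / (k_min - 0.5)) for k in tail)
    return 1.0 + len(tail) / log_sum

# Synthetic degrees drawn from a power law with exponent 2.5 (tail from 10),
# generated by inverse-transform sampling and rounded to integers.
rng = random.Random(42)
true_alpha, k_min = 2.5, 10
synthetic = [
    int((k_min - 0.5) * (1.0 - rng.random()) ** (-1.0 / (true_alpha - 1.0)) + 0.5)
    for _ in range(20000)
]
print(power_law_exponent(synthetic, k_min=k_min))
```

The estimate depends on the choice of `k_min`, and the approximation is known to be less accurate when `k_min` is very small, so it complements the plots and does not replace them.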

Based on the results for the sampled dataset, we conclude that Twitter behaves more like an information network than a social network.

Contributions

Rahul Maheshwari (MT19027): fetching user information from Twitter and Facebook for the advanced properties, GUI development, and blog.

P. Akshay Kumar (MT19094): preprocessing and computation of basic graph properties such as degree distribution, average path length, and clustering coefficient.

Piyush Pradeep Jain (MT19122): preprocessing, community detection and PageRank analysis, and generation of plots and graphs using Gephi.

Thanks a lot for reading our blog! Show appreciation by smashing the clap button.

Project code and all the related data will be uploaded soon on GitHub.

You can check some of my other projects as well :).
