Detecting Communities in Social Networks #
Social networks are well-known for having very community-centric structures. It makes sense when you think about it—like-minded people often group together, forming their own little clusters. This can create some pretty interesting network patterns.
There are plenty of ways to detect these communities, and in this guide, we’re going to introduce a few algorithms that do the job effectively. But before diving in, let’s get a general idea of what these algorithms are actually doing.
How does community detection work? #
Simply put, community detection algorithms group nodes into non-overlapping clusters. To break it down a bit more, let’s use an example. Below is a simple undirected graph with seven nodes.
Just by looking at this graph, you can probably spot two clear communities. Nodes A, B, and C form one group, while nodes E, F, G, and H make up the other. It was pretty easy for us to see these clusters visually, but our method for choosing which nodes belong to which group might vary. Luckily, there are some strategies we can use to make this process easier. Here are a few ideas:
- Remove weak ties: The edge between C and G acts as a natural break point to separate the two groups.
- Preserve strong ties: Nodes A, B, C, and nodes E, F, G, H are tightly connected, suggesting strong ties within each group.
- Distance: How far apart two nodes are might tell us whether they belong in the same group or not.
Algorithms #
Now that we’ve looked at some ways community detection can be done, let’s check out a few popular algorithms, specifically those used by NetworkX. Here are the main ones.
Girvan-Newman #
The Girvan-Newman algorithm is one of the most popular and effective community detection methods out there. It works by removing edges that connect nodes with high betweenness centrality. We covered this concept a bit in the (see Node Profiling).
Using this approach in Python is a bit more involved than some others, but it’s highly effective when done right.
>>> comp = nx.girvan_newman(G)
Instead of specifying how many communities you want, NetworkX just removes edges one by one in each step. As it does, the number of communities automatically grows as you go along.
If you’re wondering what happens during the first five steps, you can check it out with this code:
>>> import itertools
>>> comp = nx.girvan_newman(G)
>>> k = 5
>>> for communities in itertools.islice(comp, k):
... print(tuple(sorted(c) for c in communities))
More examples can be found here.
K-clique #
The K-clique method is a way to find tightly connected groups of nodes in large networks. A clique is basically a subgraph where every node is connected to every other node in that group. You can think of cliques as groups where everyone knows everyone. Above, you can see examples of cliques between nodes A, B and C and with nodes E, F, G and H.
To use k-clique in networkx, all you need to do is specify the size of the clique k.
>>> cliques = nx.k_clique_communities(G, k)
Unlike other community detection methods, this one only finds a community when a group of nodes forms a fully connected clique. In other words, the nodes need to be completely connected before they’re considered part of the same community.
Tree Partitioning #
Tree partitioning works by grouping nodes into clusters based on a mix of node and edge weights. The catch is that this approach only works if the network is set up like a tree — meaning it has a hierarchical structure. One of the most common algorithms for this is Lukes partitioning.
When using networkx, the algorithm needs a cut-off point, max_size
, which means the partition can’t go over a certain size.
>>> tree = nx.lukes_partitioning(G, max_size)
Modularity #
You’ve probably seen Modularity pop up a few times on this blog. It’s one of the simplest and quickest ways to spot communities in networks. While it’s not the most precise method out there, it’s super fast and gives you a good rough idea of whether your network can be divided into communities.
To use it with networkx, you’ll use the label_propagation_communities
function. This function helps to label the communities by estimating which node belongs to which community. After that, you can use Modularity to check how good the result is. Here’s how you put it all together…
>>> import networkx.algorithms.community as nx_comm
>>> comms = nx_comm.label_propagation_communities(G)
>>> nx_comm.modularity(G, comms)
Final thoughts #
So, we’ve taken a quick look at how to spot communities within social network structures. The algorithms we covered here are quite different from each other and serve various purposes. But at the end of the day, they all aim to do the same thing: find those communities.
Prev: Node Profiling Next: Core Analysis