Node Profiling and Centrality Measures #
In an earlier guide (see Simple Metrics), we covered some basic metrics to help describe the overall structure of a network. So far, we’ve mainly been looking at the bigger picture — focusing on the structure and metrics of the whole network. But now, we’re going to shift our perspective a bit.
In this guide, we’ll introduce the concept of centrality and how it helps us understand individual nodes within a network. We’ll explain what centrality is, why it’s useful, and how we can use it in NetworkX. For more details, check out this resource.
What exactly is centrality? #
Remember in the last guide, we talked about a few different ways to analyse a network? One of those was looking at in-degree and out-degree.
Degree is a simple example of centrality because it gives us a sense of how important a node is based on how many connections it has compared to others.
Why do we use centrality? #
We use centrality mainly to rank nodes based on their importance or influence within a network. Here are a few places where centrality comes in handy:
Social Networks #
In social networks (where nodes represent people), centrality can show how popular or influential a user is compared to others. This works both online and offline, with connections representing things like friendships or communication.
The Web / Internet #
The web (or “The World Wide Web” — if people still call it that!) is another kind of network. Here, websites are the nodes, and hyperlinks are the edges. Centrality can help rank websites based on how many external links they receive, using algorithms like the one we’ll mention later.
Transportation Networks #
This one might not seem obvious, but transportation networks are also great examples. In this case, nodes could be bridges, stations, or towns, and edges represent roads or train lines. Centrality can help measure how crucial a piece of infrastructure is based on how much traffic it gets.
How can it be used? #
There are loads of different ways to calculate centrality. In this guide, we’ll explore just a few, and as always, we’ll show you how to do it in NetworkX.
Degree #
>>> nx.degree_centrailty(G)
Degree centrality is probably the simplest measure out there. As we talked about before, it’s all about how many connections a node has — incoming and outgoing if it’s a directed graph.
The formula is pretty straightforward: it’s the ratio of a node’s degree to the maximum degree in the whole network. So, the more connections a node has, the higher it ranks.
While this gives a decent sense of importance, keep in mind that it only looks at a node’s immediate connections (its neighbours) and doesn’t factor in any links beyond that.
Eigenvector #
>>> nx.eigenvector_centrality(G)
Eigenvector centrality can be a bit tricky to explain, but the gist of it is that a node’s value depends on how central the nodes it’s connected to are. This is all calculated using the adjacency matrix (if you’re unsure what that is, check out Reading and Exporting Graphs) where eigenvalues are used in the process.
Eigenvector centrality is great for directed networks where nodes point to other nodes. Here’s how it works: if a bunch of important nodes (ones with high centrality values) are all pointing to the same node, that node will also get a high centrality score. It’s kind of like a ripple effect, where one node influences the others it’s connected to.
Unlike degree centrality, which is more straightforward, this method takes the entire network into account when ranking nodes. The downside? It can be slower to compute, especially if you’re working with a really big network.
Betweenness and Closeness #
Betweenness and closeness are two well-known methods for figuring out how central a node is using different techniques.
Betweenness #
Betweenness centrality looks at how often a node shows up in the shortest paths connecting other nodes in the network. Basically, it tells you how important a node is based on how often it helps link other nodes together. For instance, in a communication network, nodes with high betweenness centrality are key because they carry more info between users – kind of like a hub or exchange point.
Here are a few ways you can calculate it in NetworkX:
nx.betweenness_centrality(G)
: the standard way to calculate betweenness centrality.nx.communicability_betweenness_centrality(G)
: uses triads (groups of 3 nodes) to simulate walks connecting different nodes.
Closeness #
As the name suggests, closeness is all about how near a node is to other nodes in the network. Closeness centrality is based on the total distance between a node and all the other nodes, using the shortest possible paths. If a node has high closeness centrality, it means it’s connected to a lot of nodes, making it more central in the network.
Here are a couple of ways to calculate it:
nx.closeness_centrality(G)
: This is the basic method to compute closeness centrality.nx.current_flow_closeness_centrality(G)
: Also known as information centrality, this is another variation.
Web-based Algorithms #
nx.pagerank(G)
: PageRank is based upon a variation of eigenvector centrality and is widely used on the internet. A little bit of history, PageRank was introduced by the founders of Google as a way of ranking web pages indexed on their servers. It was designed to measure the value of the website based upon how many other websites are linked to it as well as factoring in the probability of a user clicking on it. Check out the Wikipedia article for more.nx.hits(G)
: The HITS (Hyperlink-Induced Topic Search) algorithm is another technique used to rank webpages based on connected hyperlinks in a hub formation. The algorithm calculates two metrics to determine the value of a node: authority and hub. The authority is an estimation of the value of the sites’ content and hub, the estimated quality of its outgoing hyperlinks.
Others Implementations #
The methods mentioned above are pretty popular, but there are a few other lesser-known techniques that are worth a mention. Without getting too technical, here are a few of them:
nx.load_centrality(G)
: A variation of betweenness centrality that factors in weighted edges.nx.subgraph_centrality(G)
: This one looks at subgraphs (smaller networks within the main network) to find how many cycles a node is part of.nx.harmonic_centrality(G)
: Similar to betweenness centrality but uses a different way to calculate shortest paths.
Final Thoughts #
There are loads of ways to use centrality to get insights into nodes in a network. In this guide, we’ve covered some of the most popular methods, along with a few others that are just as useful. We’ve also touched on how centrality can be used to profile individual nodes in a complex network.
Prev: Graph Types Next: Community Detection