Loading and Exporting Data With Pandas

One of the features which makes NetworkX really useful is the ability to import and export data from a variety of sources. The figures below show all the methods used for importing and exporting data with NetworkX and Pandas.

Figure: NetworkX import and export methods.

Import (into NetworkX): from_dict_of_dicts, from_dict_of_lists, from_edgelist, from_numpy_matrix, from_pandas_adjacency, from_pandas_edgelist, from_scipy_sparse_matrix, from_numpy_array, read_adjlist, read_multiline_adjlist, read_edgelist, read_weighted_edgelist, read_gpickle, read_pajek, read_leda, from_sparse6_bytes, read_sparse6, from_graph6_bytes, read_graph6, read_yaml, read_gml, read_graphml, read_gexf, read_shp, from_nested_tuple, from_prufer_sequence

Export (from NetworkX): to_directed, to_undirected, to_networkx_graph, to_dict_of_dicts, to_dict_of_lists, to_edgelist, to_numpy_matrix, to_pandas_adjacency, to_pandas_edgelist, to_numpy_recarray, to_scipy_sparse_matrix, to_numpy_array, write_adjlist, write_multiline_adjlist, write_edgelist, write_weighted_edgelist, write_gpickle, write_pajek, to_sparse6_bytes, write_sparse6, to_graph6_bytes, write_graph6, write_yaml, write_gml, write_graphml, write_graphml_xml, write_graphml_lxml, write_gexf, write_shp, to_nested_tuple, to_prufer_sequence

This is especially useful as data can come in all different shapes and sizes which may not always be consistent. The purpose of this guide is to walk through some of the standard techniques for reading and writing graphs using NetworkX and Pandas.

To allow for more flexibility and control, NetworkX can convert graphs to and from Pandas data frames, which opens up many more options for reading and writing data. Pandas alone offers 19 read functions and 21 export methods (listed in the figure below), giving 19 × 21 = 399 distinct import/export combinations.

Figure: Pandas import and export methods.

Import (into Pandas): read_excel, read_csv, read_fwf, read_table, read_pickle, read_hdf, read_sql, read_sql_query, read_sql_table, read_clipboard, read_parquet, read_orc, read_feather, read_gbq, read_html, read_json, read_stata, read_sas, read_spss

Export (from Pandas): to_clipboard, to_excel, to_hdf, to_latex, to_parquet, to_records, to_string, to_csv, to_feather, to_html, to_markdown, to_period, to_sql, to_timestamp, to_dict, to_gbq, to_json, to_numpy, to_pickle, to_stata, to_xarray

Network Representations #

Before we get into how to import/export data it’s worth going through some of the ways in which graphs can be represented in data. Networks (also known as graphs) are essentially a collection of nodes and edges representing two things – an entity (node) and a relationship (an edge). Below is an example of a directed graph.

Edge list #

An edge list is exactly what it says: a list of edges. It usually comes in the form of a table with two columns, one for the source and one for the target. Depending on the type of graph, it might feature additional columns containing attributes of each edge, such as a timestamp. This is what an edge list looks like using the example above (the trailing {} is the empty set of edge attributes):

1 2 {}
1 3 {}
2 1 {}
2 3 {}
2 5 {}
3 1 {}
3 4 {}
3 5 {}
4 1 {}
4 2 {}
5 2 {}
5 3 {}

Adjacency matrix #

An adjacency matrix is an n-by-n square matrix used to indicate the presence of an edge between nodes. Reading by row then column, a ‘1’ in row i, column j indicates an edge from node i to node j, and a ‘0’ indicates no edge. This is what an adjacency matrix looks like using the same example as before:

  1 2 3 4 5
1 0 1 1 0 0
2 1 0 1 0 1
3 1 0 0 1 1
4 1 1 0 0 0
5 0 1 1 0 0
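
As a quick check that the two representations describe the same graph, the sketch below rebuilds the example from its edge list and asks NetworkX for both views. The variable names (edges, nodes, adjacency) are illustrative choices, not part of the original example.

```python
import networkx as nx

# Edges copied from the edge list shown earlier, as (source, target) pairs.
edges = [
    (1, 2), (1, 3),
    (2, 1), (2, 3), (2, 5),
    (3, 1), (3, 4), (3, 5),
    (4, 1), (4, 2),
    (5, 2), (5, 3),
]

G = nx.DiGraph(edges)

# Edge list view: each entry is (source, target, attribute dict).
edge_list = list(nx.to_edgelist(G))

# Adjacency matrix view as a labelled Pandas data frame.
# nodelist fixes the row/column order; dtype=int gives 0/1 instead of 0.0/1.0.
nodes = [1, 2, 3, 4, 5]
adjacency = nx.to_pandas_adjacency(G, nodelist=nodes, dtype=int)

print(edge_list)
print(adjacency)
```

Passing nodelist explicitly matters here: without it, rows and columns follow the order in which nodes were first seen, which is not necessarily 1 to 5.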

Using NetworkX #

As shown above, graphs can be represented in different ways and NetworkX has a whole range of methods for importing and exporting networks. To keep things simple we will go through some of the most widely used methods which are integrated into NetworkX.

Example 1: Reading and Writing Edge Lists #

By far the easiest and simplest approach is to store data in a simple text file. This can be achieved using the read_edgelist and write_edgelist functions within NetworkX. To save an edge list to file, the write_edgelist function takes a graph as input, and the path of the output file (example.edgelist).

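import networkx as nx

# Build a small directed graph
G = nx.DiGraph()
G.add_edge('A', 'B')
G.add_edge('A', 'C')
G.add_edge('C', 'D')
G.add_edge('D', 'C')

# Write one "source target" pair per line, without edge attributes
nx.write_edgelist(G, 'example.edgelist', data=False)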
Note This function also takes other parameters to control things such as edge attributes. In our case, we set data=False because our edges have no attributes to save. You can also adjust things such as the delimiter. By default, columns are separated by a space.

The output of this graph looks something like this…

A B
A C
C D
D C

Now that the data has been saved, we can read this in using the read_edgelist function. This is as simple as doing the following.

>>> H = nx.read_edgelist('example.edgelist', create_using=nx.DiGraph)
>>> H.edges()
OutEdgeView([('A', 'B'), ('A', 'C'), ('C', 'D'), ('D', 'C')])
Note When reading in a graph it’s important to ensure that you’ve got the right graph type defined (more on this in Graph Types). By default, NetworkX creates a simple undirected graph (nx.Graph), whereas in our case we explicitly state that this is a directed graph by setting create_using=nx.DiGraph.
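
The delimiter and data parameters mentioned in the earlier note can be combined to store edge attributes in a comma-separated file. The following is a minimal sketch under some assumptions: the weight attribute, its values, and the temporary file name are made up for illustration.

```python
import os
import tempfile

import networkx as nx

# A small directed graph with a hypothetical 'weight' attribute on each edge.
G = nx.DiGraph()
G.add_edge('A', 'B', weight=0.5)
G.add_edge('A', 'C', weight=2.0)
G.add_edge('C', 'D', weight=1.5)

# Write to a temporary directory so the example leaves no files behind in the working directory.
path = os.path.join(tempfile.mkdtemp(), 'weighted.edgelist')

# data=['weight'] selects which attributes are written; delimiter=',' gives rows like "A,B,0.5".
nx.write_edgelist(G, path, delimiter=',', data=['weight'])

# When reading, data=[(name, type)] tells NetworkX how to label and parse each extra column.
H = nx.read_edgelist(
    path,
    delimiter=',',
    data=[('weight', float)],
    create_using=nx.DiGraph,
)

print(list(H.edges(data=True)))
```

The same delimiter has to be used for writing and reading, and the data list passed to read_edgelist should name the columns in the order they were written.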

Example 2: GEXF #

In some cases when you’re exporting a graph, you’re doing so with the intention of analysing it with other software. For example, many use Gephi (a popular graph visualisation tool) to visualise their networks as this provides a whole suite of tools to allow them to create presentable graphs quickly and easily. NetworkX allows us to import / export graphs directly to a compatible file format for Gephi using the read_gexf / write_gexf functions. If you want to see what Gephi is capable of, check out a link below to a blog post of mine.
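
A minimal round trip with write_gexf and read_gexf might look like the sketch below. It reuses the small directed graph from Example 1; the temporary file name is an arbitrary choice.

```python
import os
import tempfile

import networkx as nx

G = nx.DiGraph()
G.add_edge('A', 'B')
G.add_edge('A', 'C')
G.add_edge('C', 'D')
G.add_edge('D', 'C')

# Write to a temporary directory; the resulting .gexf file can be opened in Gephi.
path = os.path.join(tempfile.mkdtemp(), 'example.gexf')
nx.write_gexf(G, path)

# The GEXF file records whether edges are directed, so no create_using is needed here.
H = nx.read_gexf(path)

print(H.is_directed())
print(sorted(H.edges()))
```

Note that the graph read back may carry a few extra attributes added by the format (for example node labels and edge ids), so comparing edges is a more reliable check than comparing attribute dictionaries.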

Example 3: JSON #

One of the more complex ways of exporting graphs is to use JSON to serialise a network. This approach is typically used by those who wish to use graphs on the Web, either through an API or an interactive visualisation package such as D3.js. Exporting a graph to JSON in NetworkX can be achieved in one of four ways, described as follows:

Format     Write           Read             Notes
Node-link  node_link_data  node_link_graph  A popular format used by tools such as D3.js
Adjacency  adjacency_data  adjacency_graph  An adjacency-list representation of the graph
Cytoscape  cytoscape_data  cytoscape_graph  The format used by Cytoscape, an open-source bioinformatics platform for visualising molecular interaction networks
Tree       tree_data       tree_graph       Returns data in a nested tree format
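
As an illustration of the node-link format, the sketch below converts a graph to a plain dictionary, serialises it with Python's standard json module, and rebuilds the graph from the JSON text. One caveat: the name of the key holding the edges in the dictionary differs between NetworkX versions, so the example avoids relying on it.

```python
import json

import networkx as nx

G = nx.DiGraph()
G.add_edge('A', 'B')
G.add_edge('A', 'C')
G.add_edge('C', 'D')
G.add_edge('D', 'C')

# node_link_data returns a plain dict that json.dumps can serialise directly.
data = nx.node_link_data(G)
text = json.dumps(data)

# Parse the JSON text and rebuild the graph; the 'directed' flag is stored in the data.
H = nx.node_link_graph(json.loads(text))

print(text)
print(sorted(H.edges()))
```

The other formats in the table follow the same pattern: call the write function to get a dictionary, pass it through json.dumps, and use the matching read function on the parsed result.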

Using Pandas #

As mentioned previously (see Getting Started), Pandas provides multiple ways of importing and exporting data. Pandas is primarily used to provide interactive data frames, which hold tabular data, within a Python environment. Why use Pandas? It is particularly well suited to working with edge lists when you want to do additional processing such as filtering and querying. To export a graph to a Pandas data frame, it’s as simple as using to_pandas_edgelist.

>>> import networkx as nx
>>> G = nx.DiGraph()
>>> G.add_edge('A', 'B')
>>> G.add_edge('A', 'C')
>>> G.add_edge('C', 'D')
>>> G.add_edge('D', 'C')
>>> df = nx.to_pandas_edgelist(G)
>>> df
  source target
0      A      B
1      A      C
2      C      D
3      D      C

Now that we’ve got a Pandas data frame, we can do additional processing such as filtering and querying our edge list. For example, if we wanted to examine edges where ‘A’ is the source…

>>> df[df['source'] == 'A']
  source target
0      A      B
1      A      C

By using Pandas, you can perform more complex operations but, for the purpose of this example, we will keep things simple. To read this edge list back into a NetworkX graph, all we need to do is use from_pandas_edgelist.

Note As mentioned before, it’s important to get the graph type correct, which is why we’re using create_using=nx.DiGraph.
>>> df_new = df[df['source'] == 'A']
>>> G = nx.from_pandas_edgelist(df_new, create_using=nx.DiGraph)
>>> G.edges()
OutEdgeView([('A', 'B'), ('A', 'C')])

As we can see, we now have a new graph which we modified using Pandas. It’s also worth pointing out that by using Pandas, we’ve also opened up our opportunities to export our graphs into many other formats too (see above).
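
To illustrate how Pandas can act as a bridge between file formats and NetworkX, here is a rough sketch that reads CSV data, filters it, builds a graph, and exports the result as CSV again. The CSV content, the weight column, and the 0.5 threshold are all hypothetical; an in-memory string stands in for a file on disk.

```python
import io

import networkx as nx
import pandas as pd

# Hypothetical CSV content standing in for a file on disk.
csv_text = """source,target,weight
A,B,0.5
A,C,2.0
C,D,1.5
D,C,0.1
"""

df = pd.read_csv(io.StringIO(csv_text))

# Keep only the stronger relationships before building the graph.
strong = df[df['weight'] >= 0.5]

# edge_attr='weight' copies that column onto each edge as an attribute.
G = nx.from_pandas_edgelist(strong, edge_attr='weight', create_using=nx.DiGraph)

# Export the filtered graph back out through Pandas as CSV text.
out = nx.to_pandas_edgelist(G).to_csv(index=False)
print(out)
```

Swapping read_csv or to_csv for any of the other Pandas readers and writers gives the remaining import/export combinations without changing the NetworkX calls.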

Final Thoughts and Conclusions #

In this guide, we explored a few ways in which graphs can be imported and exported to different formats. We also covered some of the ways in which graphs can be represented using edge lists and adjacency matrices. This guide also provides a very basic overview of how to manipulate edge list data with the help of Pandas. With this approach, there are many more operations we can perform, as shown in the figures above.

Created by James Ashford