- Charting: to produce the most common DataViz plots;
- Graphs: representing and analyzing graph-based data;
- Maps: for representing data that has geo-features; and
- 3D: creating three-dimensional charts and cool animations.
A graph is a data structure composed of nodes and the edges between them. Each node is a unique entity, and an edge represents a relation between two nodes. In mathematics and computer science, graph theory is the field that studies the properties of graphs. The figure below shows an example of a graph.
Because graphs consist of nodes and edges, the data used to model and represent them needs a dedicated structure. For this reason, several conventions exist for storing graph-based data, ranging from very simple formats to complex ones that can store arbitrary data. These data formats include:
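To make the definition concrete, here is a minimal sketch of how a small undirected graph could be held in memory as an adjacency list. The node names, the example edges, and the helper `build_adjacency` are made up for illustration and do not come from any particular library.

```python
from collections import defaultdict

# Hypothetical example graph: each pair is an undirected edge between two nodes.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]


def build_adjacency(edge_list):
    """Return a dict mapping each node to the set of its neighbours."""
    adjacency = defaultdict(set)
    for source, target in edge_list:
        adjacency[source].add(target)
        adjacency[target].add(source)  # undirected: store both directions
    return dict(adjacency)


adjacency = build_adjacency(edges)

# The degree of a node is the number of edges that touch it.
degrees = {node: len(neighbours) for node, neighbours in adjacency.items()}
print(degrees)  # {'A': 2, 'B': 2, 'C': 3, 'D': 1}
```

An adjacency list is only one of several possible representations; an edge list or an adjacency matrix would carry the same information.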
- CSV: the simplest file format, storing the information as delimited text.
- DOT: simple and easily understandable format supported by GraphViz, an open-source tool for graph visualization. More useful information can be found on Wikipedia.
- GDF: a database-like file format that can be easily converted from CSV. It is officially used by GUESS, an exploratory data analysis and visualization tool for graphs and networks.
- GML: the Graph Modelling Language is a hierarchical file format with key-value lists. Its aim is to support flexibility and portability. It is supported by many tools. More information can be found on Wikipedia.
- GraphML: an XML-based file format. This gives the format a lot of expressive power, so we can define and represent many different graph structures. Many tools have adopted this file format. The full documentation, as well as the specification, can be found on the official GraphML website.
- GEXF: short for Graph Exchange XML Format, this is, as its name implies, also an XML format. It was designed by Gephi, an open-source graph visualization platform, with the goal of describing complex and exchangeable network structures. To find out more, check Gephi's GitHub Wiki page describing the format.
A summary of many more graph file formats can be found on the Supported Graph Formats page of the Gephi website. For convenience, we summarize the formats mentioned above and their properties in the table below:
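To give a feeling for how some of these formats differ, the sketch below writes the same tiny graph as a CSV edge list, as DOT, and as GraphML, using only the Python standard library. The example edges, the function names, and the `source,target` CSV header are assumptions made for illustration; real tools may expect different column names or additional attributes.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical example graph used for all three formats.
edges = [("A", "B"), ("A", "C"), ("C", "D")]


def to_csv(edge_list):
    """Serialize the edges as delimited text, one edge per row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["source", "target"])  # assumed header names
    writer.writerows(edge_list)
    return buffer.getvalue()


def to_dot(edge_list, name="G"):
    """Serialize the edges as an undirected DOT graph."""
    lines = [f"graph {name} {{"]
    lines += [f"    {source} -- {target};" for source, target in edge_list]
    lines.append("}")
    return "\n".join(lines)


def to_graphml(edge_list):
    """Serialize the edges as a minimal GraphML document."""
    root = ET.Element("graphml", xmlns="http://graphml.graphdrawing.org/xmlns")
    graph = ET.SubElement(root, "graph", id="G", edgedefault="undirected")
    for node in sorted({n for edge in edge_list for n in edge}):
        ET.SubElement(graph, "node", id=node)
    for source, target in edge_list:
        ET.SubElement(graph, "edge", source=source, target=target)
    return ET.tostring(root, encoding="unicode")


print(to_csv(edges))
print(to_dot(edges))
print(to_graphml(edges))
```

The CSV output only lists the edges, while the GraphML output declares nodes and edges as separate elements, which is what leaves room for attaching extra attributes to each of them.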
To facilitate network and graph-analysis research, there are plenty of data repositories. These sources contain tons of data from diverse domains including biology, machine learning, social sciences, physics, etc. Some of the most popular repositories are the following:
- SNAP Collection: hosted by Stanford University, this collection is part of the Stanford Network Analysis Platform (SNAP) which is a C++ graph-mining and network analysis library. More than 100 well-documented data sets, nicely clustered in various areas can be found on the official SNAP website.
- Network Repository: one of the largest repositories for network and graph data in more than 30 domains. All the data is nicely documented and organized on the official web page.
- KONECT: short for the Koblenz Network Collection, this is a collaboration that aims to collect various types of large network data sets in order to boost network research. The data repository holds more than 260 well-maintained data sets.
- UCI Network Repository: maintained by the University of California, Irvine, with the goal of facilitating research in this domain. The repo contains many different network data sets as well as links to other useful data sources.
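Many data sets in repositories like these are distributed as plain-text edge lists. The sketch below shows one way such a file could be parsed, assuming comment lines that start with `#` or `%` and one whitespace-separated pair of node IDs per line. The exact layout differs between data sets, so the accompanying documentation should always be checked; the sample text and the function name `parse_edge_list` are made up for illustration.

```python
def parse_edge_list(text, comment_prefixes=("#", "%")):
    """Parse whitespace-separated node pairs, skipping comments and blanks."""
    edges = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith(comment_prefixes):
            continue
        parts = line.split()
        if len(parts) < 2:
            continue  # ignore lines that do not describe an edge
        edges.append((parts[0], parts[1]))
    return edges


# Hypothetical sample, not taken from any real data set.
sample = """# FromNodeId\tToNodeId
0\t1
0\t2
1\t2
"""

edges = parse_edge_list(sample)
nodes = {node for edge in edges for node in edge}
print(len(nodes), len(edges))  # 3 3
```

Node IDs are kept as strings here; converting them to integers is a separate decision that depends on the data set.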
Graph Visualization Libraries
All these libraries are open-source and everyone can contribute to their development. Their implementations are hosted on GitHub. For this reason, in the figure below we summarize their GitHub stats as of the publication date of this post. These stats can be a good indicator of the development activity and popularity of each library.
It is worth mentioning that all of these libraries allow us to explore the data interactively, which is quite important as we have demonstrated in the blog "The Importance of Interactive Data Visualization":
The human-computer interaction is more immersive and the results are more interpretable.
Sigma JS is an open-source tool mainly focused on deploying interactive graph visualizations in web applications. It is based on WebGL (or alternatively on Canvas) to render graphs efficiently in the browser. On the official GitHub repository, we can find many interesting examples as well as proper documentation.
Among the other alternatives is the VivaGraph JS library which provides a nice API for rendering graphs on the web using many different engines and layouts. The official GitHub repo provides introductory guidelines as well as examples of how to use the library.
Springy JS is another open-source tool, designed to be lightweight. Thus, it only provides a force-directed layout with simple graph-manipulation capabilities. As with many other tools, it can be used with a WebGL, Canvas, or SVG rendering engine.
Appendix: GUI-based tools
Aside from the programming interfaces that allow us to explicitly program the graph visualization, there are many out-of-the-box tools. These tools can load data automatically and support many different file formats. On top of this, they provide functionality to manipulate and analyze graphs in real time.
In this appendix, we briefly summarize the following two tools: Gephi and GraphVis. You can find a comprehensive list of many of these tools in the following Medium story.
Gephi is a free, open-source, multi-platform tool. It was designed and developed to enable real-time rendering and analysis of huge graphs. To this end, it contains a special 3D engine that only uses the GPU and leaves the CPU free. Moreover, it implements a fast and efficient constraint-optimization algorithm called Force Atlas in order to draw graphs in an aesthetically pleasing way. Apart from this, Gephi defined and developed the GEXF file format, as described earlier.
GraphVis is a web-based commercial tool that empowers real-time interactive graph mining and relational machine learning (link prediction, finding influential nodes). Its GUI is very appealing and enables users to interactively query, filter and find patterns in one complex network of nodes and edges.