JavaScript: Graph Visualization using Cytoscape JS

7 minute read

In the last blog post, we dove into the graph visualization ecosystem. We covered the common data formats, the popular data repositories, the JavaScript libraries, and some GUI-based tools.

This time we go a step further and demonstrate what an interactive graph visualization looks like. We already discussed the importance of interactive visualization in a previous post, "The Importance of Interactive Data Visualization", which is well aligned with the following quote:

Visualization gives you answers to questions you didn’t know you had.

-- Ben Shneiderman

In a similar way, we show how interactive graph visualization can help in discovering patterns in the data. Stay tuned!

Hands-On

To demonstrate the capabilities of graph visualization libraries, we use the Cytoscape JS library. As a starting point, we use the code provided in the following GitHub repo, which is a common playground for getting started with Cytoscape.

Data

The dataset we use is the class dependency network of the JDK 1.6.0.7 framework, downloaded from the KOBLENZ data repository. Each node represents one class, and an edge between two nodes means there is a dependency between the corresponding classes. The dataset is stored in CSV format and contains 6,434 nodes and 150,985 edges, which is a very large number to load and render in the browser. For this reason, and just for demonstration purposes, we select a subset of 50 nodes and all the edges between them.

The Cytoscape JS library expects the data to be stored in a JSON file with a predefined structure. Thus, we transform the original data set by using this Python Notebook.
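
As a rough illustration of that structure, the sketch below serializes a tiny hand-made graph as one flat JSON array of element objects, each tagged with its `group`. The sample values and the function name `elements_to_json` are made up for illustration, and the actual notebook may lay out the file differently.

```python
import json


def elements_to_json(nodes, edges):
    """Serialize nodes followed by edges as one flat JSON array of elements."""
    return json.dumps(nodes + edges, indent=2)


# Hypothetical two-node sample; the real elements are built later in this post.
sample_nodes = [
    {"data": {"id": "1", "idInt": 1, "name": "cls: A; pkg: p"}, "group": "nodes"},
    {"data": {"id": "2", "idInt": 2, "name": "cls: B; pkg: p"}, "group": "nodes"},
]
sample_edges = [
    {"data": {"id": "e1", "source": "1", "target": "2"}, "group": "edges"},
]

payload = elements_to_json(sample_nodes, sample_edges)
print(payload)
```

Writing the same string to `datasets/jdk_dependency.json` would produce a file in the location the demo code fetches from.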

We start by selecting a subset of 50 nodes. For each node, we extract the name of the class it represents and the package where it belongs. Along with this, we include some attributes that define the rendering of the node:

NUM_NODES = 50  # how many nodes to subselect
nodes = []  # the final subset of nodes
packages = []  # the set of packages for each node

# each line represents one node
with open('raw_data/ent.subelj_jdk_jdk.class.name', 'r') as f:
    for i, line in enumerate(f):
        # stop when the limit is reached
        if i == NUM_NODES:
            break

        full_name = line.strip()  # the full name, without the trailing newline
        class_name = full_name.split('.')[-1]  # only the class name
        package = '.'.join(full_name.split('.')[:-1])  # the package where the class belongs
        packages.append(package)
        node = {
            "data": {
                "id": str(i + 1),  # the string representation of the unique node ID
                "idInt": i + 1,  # the numeric representation of the unique node ID
                "name": 'cls: ' + class_name + "; pkg: " + package,  # the name of the node used for printing
                "query": True,
                "classes": package  # the keyword 'classes' is used to group the nodes in classes
            },
            "group": "nodes",  # it belongs in the group of nodes
            "removed": False,
            "selected": False,  # the node is not selected
            "selectable": True,  # we can select the node
            "locked": False,  # the node position is not immutable
            "grabbable": True  # we can grab and move the node
        }
        nodes.append(node)

# get all the unique package names
packages = set(packages)
print(packages)
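
The name-splitting step above can be checked on its own. The helper below is a hypothetical refactoring (the name `split_class_name` does not come from the notebook) that uses `str.rpartition` to separate the class name from its package in one pass, including the edge case of a class that has no package.

```python
def split_class_name(full_name):
    """Split a fully qualified class name into (class_name, package)."""
    # rpartition splits on the LAST dot; without a dot the package is empty.
    package, _, class_name = full_name.rpartition(".")
    return class_name, package


with_package = split_class_name("java.util.ArrayList")
without_package = split_class_name("Standalone")
print(with_package, without_package)
```

For these inputs the helper gives the same class name and package as the two `split('.')` expressions in the loop above.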

Once we know the nodes, we select the edges between them. For each edge, it is mandatory to specify the source and the target node. We also include additional rendering options. Additionally, for each node, we calculate the in-degree, which is the number of incoming edges. We normalize the in-degree by the total number of selected edges and use it as a score that helps us better visualize the nodes:

edges = []  # the final subset of edges

# each line represents an edge between two nodes. The nodes are represented by their id
with open('raw_data/out.subelj_jdk_jdk', 'r') as f:
    # skip the first two lines, which hold header information
    for i, line in enumerate(f):
        if i < 2:
            continue

        # get the source node and the target node
        node_ids = line.strip().split(' ')
        source, target = node_ids[0], node_ids[1]
        if int(source) <= NUM_NODES and int(target) <= NUM_NODES:
            edge = {
                "data": {
                    "source": str(source),  # the source node id (edge comes from this node)
                    "target": str(target),  # the target node id (edge goes to this node)
                    "directed": True,
                    "intn": True,
                    "rIntnId": i - 1,
                    "id": "e" + str(i - 1)
                },
                "position": {},  # the initial position is not known
                "group": "edges",  # it belongs in the group of edges
                "removed": False,
                "selected": False,  # the edge is not selected
                "selectable": True,  # we can select the edge
                "locked": False,  # the edge is not locked
                "grabbable": True,  # grabbing is not disabled
                "directed": True  # the edge is directed
            }
            edges.append(edge)

# initial dictionary mapping each node id to its normalized in-degree
nodes_indegree = {node_id: 0.0 for node_id in range(1, NUM_NODES + 1)}
N = len(edges)
for e in edges:
    nodes_indegree[int(e["data"]["target"])] += 1.0 / N

# assign the normalized indegree to each node
for node in nodes:
    node["data"]["score"] = nodes_indegree[node["data"]["idInt"]]
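
To make the score concrete, here is a small worked example on a toy graph. It is a sketch of the same normalization (incoming edges divided by the total number of edges); the function name `normalized_indegree` and the toy edge list are illustrative and do not come from the notebook.

```python
def normalized_indegree(node_ids, edge_pairs):
    """Return {node_id: incoming_edges / total_edges} for every node id."""
    scores = {node_id: 0.0 for node_id in node_ids}
    total = len(edge_pairs)
    if total == 0:
        return scores  # no edges: every score stays 0.0
    for _source, target in edge_pairs:
        scores[target] += 1.0 / total
    return scores


# Toy graph: node 3 receives two of the four edges.
toy_scores = normalized_indegree([1, 2, 3], [(1, 3), (2, 3), (3, 1), (1, 2)])
print(toy_scores)
```

Because every edge contributes to exactly one target, the scores sum to 1. With 50 nodes the individual values are therefore small, which is worth keeping in mind when mapping the score to a visual property.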

Demo

There are three basic operations to execute in order to visualize the graph: i) load the data, ii) apply a style, and iii) apply a rendering algorithm.

We already transformed and stored the data as described above. We can load the data using the following code snippet:

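let $dataset = "jdk_dependency.json";
let toJson = res => res.json();  // helper assumed from the playground: parse a fetch Response as JSON
let getDataset = name => fetch(`datasets/${name}`).then( toJson );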
let applyDataset = dataset => {
  cy.zoom(0.001);  // set a zoom level
  cy.pan({ x: -9999999, y: -9999999 });  // set the current panning position.
  cy.elements().remove();  // remove all the elements if any, before loading the data
  cy.add( dataset );  // set the data
}
let applyDatasetFromSelect = () =>
      Promise.resolve( $dataset ).then( getDataset ).then( applyDataset );

The style defines the visual appearance of the graph elements. The style information is usually stored in a JSON file that follows CSS-like conventions. For instance, we can specify the style of the nodes in the following manner:

{
  "selector": "node",
  "style": {
     "width": "20",
     "height": "20",
     "label": "data(name)",
     "font-size": "5px",
     "background-color": "#aa0755",
     "text-outline-width": "1px",
     "color": "#fff"
  }
}
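
Since the style is plain JSON, it can also be generated from the same Python pipeline that produces the data. The sketch below builds one extra style entry that scales the node size with the `score` attribute through Cytoscape's `mapData` mapper. The function name `score_size_rule`, the score range, and the pixel sizes are assumptions chosen for illustration, and the style file used in the demo may not contain such a rule.

```python
import json


def score_size_rule(min_score, max_score, min_px=20, max_px=60):
    """Build a style entry that scales node size linearly with data(score)."""
    mapping = f"mapData(score, {min_score}, {max_score}, {min_px}, {max_px})"
    return {
        "selector": "node[score]",  # only nodes that carry a score attribute
        "style": {"width": mapping, "height": mapping},
    }


rule = score_size_rule(0, 0.2)
print(json.dumps([rule], indent=2))
```

The upper bound of the score range is deliberately small here, because the normalized in-degree values of all nodes sum to 1.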

For full details regarding the style please refer to the following page. After defining the style we can simply load it:
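let $stylesheet = "style.json";
let toText = res => res.text();  // helper assumed from the playground; toJson is the helper used when loading the dataset
let getStylesheet = name => {
  // parse .json stylesheets as JSON, everything else as plain text
  let convert = res => name.match(/[.]json$/) ? toJson(res) : toText(res);
  return fetch(`stylesheets/${name}`).then( convert );
};
let applyStylesheet = stylesheet => {
  cy.style().fromJson( stylesheet ).update();  // replace the current style and re-render
};
let applyStylesheetFromSelect = () =>
  Promise.resolve( $stylesheet ).then( getStylesheet ).then( applyStylesheet );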

Finally, we need to specify the layout, which algorithmically determines the positions of the nodes. One example is the force-directed algorithm, which draws the graph in an aesthetically pleasing way. In this demo, we use the CoLa layout, which is a more sophisticated version of the force-directed layout.

Last but not least, once the graph is fully rendered, we can run many search and analysis algorithms on top of it. For example, in this demo we can run the Breadth-First Search (BFS) or Depth-First Search (DFS) algorithms.
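
To show what such a traversal computes, here is a minimal BFS sketch in plain Python over a list of directed `(source, target)` pairs. It is not Cytoscape's implementation (the library exposes the traversal on a collection as `breadthFirstSearch()`); the function name `bfs_order` and the toy edge list are illustrative only.

```python
from collections import deque


def bfs_order(edge_pairs, root):
    """Return node ids in breadth-first visiting order, following edge direction."""
    neighbors = {}
    for source, target in edge_pairs:
        neighbors.setdefault(source, []).append(target)

    visited = {root}
    order = []
    queue = deque([root])
    while queue:
        node = queue.popleft()
        order.append(node)
        for nxt in neighbors.get(node, []):
            if nxt not in visited:  # enqueue each node at most once
                visited.add(nxt)
                queue.append(nxt)
    return order


# Toy graph with a cycle back to the root.
toy_edges = [("1", "2"), ("1", "3"), ("2", "4"), ("3", "4"), ("4", "1")]
order = bfs_order(toy_edges, "1")
print(order)
```

Node ids are strings here to mirror the `source` and `target` values stored in the edge elements.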

In the animated image below we illustrate the final visualization of the graph. If you want to try it on your own, you can do it by opening the following web page.

Animation: Demo of the Graph Visualization using Cytoscape

The full code and data can be found in the following GitHub repo. For more information, follow me on Twitter.

If you liked what you just saw, it would be really helpful if you subscribed to the mailing list below. You will not get spammed, that's a promise! You will get updates on the newest blog posts and visualizations from time to time.

Summary

In this blog post, we showed a real coding example of how to create an interactive visualization using the Cytoscape JS library. It is a powerful library that gives the developer many options. First of all, we can design the appearance of the graph by using custom stylesheets. The visual appeal of the graph is further enhanced by dozens of layouts, such as the CoLa layout we used in our demo. Last but not least, this JavaScript library offers plenty of graph-analysis algorithms.
