# I wish every data scientist had interactive data explorers like this one!

A comprehensive understanding of the data is a top priority for every data scientist, and plotting and visualizing the data is an important part of it. However, the data is often high-dimensional, so visualizing it is not straightforward. For this reason we use data projection techniques like UMAP, t-SNE or even PCA.

All these data projection techniques are amazing, but they share one drawback: they are not interactive, at least not in their Python implementations. In this short geeky coffee break, we will take a look at an interactive data exploration tool built on the UMAP projections of the MNIST dataset.

# What

The UMAP Explorer is a web app that renders an interactive UMAP visualization of the MNIST dataset. For even greater satisfaction, each data point is rendered as the image of the handwritten digit itself. Besides **UMAP**, we can also load a **t-SNE** projection of the data and observe how these two projection techniques differ.

It is a **React** application whose purpose is to demonstrate how to render tens of thousands of images mapped to data points, but it also serves as an excellent tool for data scientists.

# Why

Interactivity is crucial for understanding the data better. To fully immerse ourselves in data exploration, we need to dive interactively into particular data points. This way we can see their neighborhood and how they compare with the other data points.

In this MNIST use case, there are at least two aspects to observe: different digits that look alike, and the gaps between groups of digits, which are narrow for some pairs and wide for others.

## Different digits similar to each other

In the particular case of the **UMAP Explorer**, we can zoom into the clusters of data points, click on them for a more detailed view, and observe the visual cues of the digits. This enables us to compare a digit with other similar digits and even to conclude that some totally different digits are written in a very similar way. One such case is depicted in the accompanying figure: a digit labeled and written as *2* sits in a cluster of digits written as *7*. No wonder if some classifier takes that *2* for a *7*.
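
The same neighborhood check can be roughed out in code once the 2D coordinates and labels are available. The sketch below is only an illustration of the idea, not something the UMAP Explorer does: the function name `label_outliers`, the parameter `k`, and the brute-force neighbor search are all assumptions made for this example.

```python
from collections import Counter
from math import dist


def label_outliers(points, labels, k=5):
    """Return indices of points whose k nearest neighbors mostly carry another label.

    `points` is a list of (x, y) tuples from a 2D projection and `labels`
    holds the digit label of each point. Hypothetical helper for illustration.
    """
    outliers = []
    for i, point in enumerate(points):
        # Brute-force neighbor search: fine for a sketch,
        # too slow for tens of thousands of points.
        others = (j for j in range(len(points)) if j != i)
        neighbors = sorted(others, key=lambda j: dist(point, points[j]))[:k]
        majority, _ = Counter(labels[j] for j in neighbors).most_common(1)[0]
        if majority != labels[i]:
            outliers.append(i)
    return outliers
```

Points flagged this way are only candidates worth clicking on; whether a digit really looks like its neighbors is still a judgment to make by eye.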

## Blending between the similar digits

Zooming into particular cases is not the only advantage of interactive visualizations. We can also investigate the global properties of the data, especially the boundaries between the different classes. As illustrated below, the gap between the digits *8* and *1* is obvious, whereas the digits *8* and *3* seem to blend. No wonder if some classifier struggles to distinguish *8s* from *3s*.

# How

The **UMAP Explorer** combines multiple technologies. First, it pre-computes the **UMAP** projections of the MNIST digits. This gives the projections of the images in the 2D plane, i.e. their **(x, y)** coordinates.

Very briefly, UMAP (Uniform Manifold Approximation and Projection) is a non-linear dimensionality reduction algorithm. It builds a high-dimensional graph representation of the data, then optimizes a low-dimensional graph to be as structurally similar as possible. Its two most important parameters are:

- `n_neighbors`: the number of approximate nearest neighbors used to construct the initial high-dimensional graph
- `min_dist`: the minimum distance between points in low-dimensional space
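
To make the two parameters concrete, here is a minimal sketch of how such a 2D projection could be pre-computed in Python with the `umap-learn` package. The wrapper name `project_with_umap` and the `random_state` value are assumptions for this example, and the defaults shown are the library's own defaults, not necessarily the settings used by the UMAP Explorer.

```python
def project_with_umap(data, n_neighbors=15, min_dist=0.1, random_state=42):
    """Project the rows of `data` onto the 2D plane with UMAP.

    Hypothetical wrapper for illustration; it needs the `umap-learn`
    package, which is imported lazily so the function can be defined
    without it.
    """
    import umap  # installed as `umap-learn`

    reducer = umap.UMAP(
        n_neighbors=n_neighbors,    # size of the local neighborhood in the graph
        min_dist=min_dist,          # how tightly points may be packed in 2D
        n_components=2,             # we want (x, y) coordinates
        random_state=random_state,  # fixed seed for a repeatable layout
    )
    # One (x, y) row per input row.
    return reducer.fit_transform(data)
```

Calling it on an array with one flattened image per row returns one **(x, y)** pair per image. Roughly speaking, a larger `n_neighbors` favors global structure over local detail, and a smaller `min_dist` produces tighter clusters.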

In the **Resources** section below you can find an amazing set of articles that explain **UMAP**.

Furthermore, to connect the 2D projected data points with the images they represent, it pre-computes a texture atlas (an image containing multiple smaller images). Finally, it uses three.js (a JavaScript 3D rendering library) to put everything together and render it.
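
The idea behind a texture atlas can be sketched in a few lines: every image gets a fixed slot in a grid, so its position in the big image follows from its index. The code below is an assumption-laden illustration, not the explorer's actual layout: the names `AtlasSlot` and `atlas_slot`, the row-major ordering, the grid size, and the top-left origin are all hypothetical. Only the 28x28 tile size comes from MNIST itself.

```python
from typing import NamedTuple

TILE = 28  # MNIST digits are 28x28 pixels


class AtlasSlot(NamedTuple):
    x: int      # left edge of the tile in pixels
    y: int      # top edge of the tile in pixels
    u0: float   # normalized left edge (0..1)
    v0: float   # normalized top edge (0..1)
    u1: float   # normalized right edge (0..1)
    v1: float   # normalized bottom edge (0..1)


def atlas_slot(index: int, columns: int, rows: int, tile: int = TILE) -> AtlasSlot:
    """Locate image number `index` in a row-major atlas of `columns` x `rows` tiles."""
    if not 0 <= index < columns * rows:
        raise IndexError(f"index {index} does not fit in a {columns}x{rows} atlas")
    row, col = divmod(index, columns)
    x, y = col * tile, row * tile
    width, height = columns * tile, rows * tile
    # Normalized coordinates use a top-left origin here; a renderer with a
    # bottom-left origin would need the vertical axis flipped.
    return AtlasSlot(x, y, x / width, y / height, (x + tile) / width, (y + tile) / height)
```

For example, in a 4x4 atlas `atlas_slot(5, 4, 4)` lands in the second row and second column, at pixel offset (28, 28). A renderer can then draw each data point by sampling just that region of the single large texture instead of loading thousands of separate images.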

I wish we had more of these interactive data explorers.

# Resources

- The GitHub repository of the **UMAP Explorer**
- A great tutorial on understanding UMAP
- A deeper dive in UMAP