Focus on what instead of how to plot your data


In many situations we don’t have time to think about how to plot our data. For a quick data visualization we can use the Altair Python library, which focuses on what to visualize instead of how to visualize it.

What

Altair is a Python library built on top of Vega-Lite. Vega-Lite is a lightweight, higher-level version of Vega. It is a visualization grammar: a declarative language for describing visualizations. These declarations are written in JSON.

Why

With high-level visualization grammars we only declare what to show, so we spend less time on plotting details and more time understanding the data. Altair is well aligned with this paradigm.

How

Using Altair in Python is quite simple. The most common pattern is to chain the following calls:

  • Chart(pd.DataFrame): create an instance of the Chart object from a loaded pandas DataFrame
  • mark_*: the mark_* methods (e.g. mark_circle, mark_line, mark_bar) specify how the rows of the DataFrame are drawn. We can pass arguments such as size or opacity to them.
  • encode: map the columns of the DataFrame to visual channels such as x, y, color and tooltip

With just these three calls we can visualize the Iris dataset. The Python code is given below:

```python
import altair as alt
from vega_datasets import data

# Load the Iris dataset as a pandas DataFrame
iris = data.iris()
alt.Chart(iris).mark_circle(size=60).encode(
    x='sepalLength',
    y='sepalWidth',
    color='species',
    tooltip=['sepalLength', 'sepalWidth', 'petalLength', 'petalWidth']
).interactive()  # enable panning and zooming
```


The call to interactive() lets us pan and zoom the plot, while the tooltip encoding shows the four measurements of a point when we hover over it. The resulting interactive plot is shown below:

Figure 1: Interactive Scatter Plot of the Iris dataset


We can also apply aggregations directly in the encodings. For instance, we can split the data into bins and compute the average of each bin. An example in Python is given below:

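```python
import altair as alt
from vega_datasets import data

stocks = data.stocks()
# Drop the GOOG rows, then plot the average price per year for each symbol
alt.Chart(stocks[stocks["symbol"] != "GOOG"]).mark_line().encode(
    x=alt.X("year(date):T", title="Year", bin=True),
    y="average(price)",
    color="symbol",
)
```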

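We make a line chart using mark_line() and exclude the GOOG symbol with a pandas filter. In year(date):T, year(...) extracts the year from the date column and :T declares the result as temporal data; we split the data into bins by this year. For each bin, average(price) computes the mean price, separately for every symbol because color="symbol" draws one line per stock. The resulting plot is depicted below: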

Figure 2: Line plot of the average stock price


Altair also supports numerous data transformations. They are all summarized here.

The source code for this work can be found in this Jupyter Notebook.

