Neural Networks Hyperparameter Search, the Visualized Way


In Machine Learning (ML), off-the-shelf models are not always available; in many instances, we need to train a model for our specific task. But as in every optimization problem, "there ain't no such thing as a free lunch", so we have to search for the model that performs well on our task.

ML models, especially Neural Networks, are characterized by a set of hyperparameters that control the learning process. As a consequence, a model's performance heavily depends on the hyperparameter values: one set of values may result in much better performance than another. The search for good hyperparameter values is known as hyperparameter optimization.

In this blog post we will see how to easily keep track of a model's performance, as a function of the hyperparameter values, in a visual way. First, we will build a simple neural network using Keras and train it on a sentiment analysis task for many combinations of hyperparameters.

Finally, we will see how to use the HiPlot library to build an interactive visualization and search for optimal values. Stay tuned!

Just Another Keras Model

For demonstration purposes, we build a simple model in Keras trained on the IMDB Sentiment Analysis Dataset.

Loading the Data

The Keras Datasets module provides a few preprocessed and vectorized datasets ready to use. The IMDB Sentiment Analysis Dataset contains already tokenized reviews (each word is mapped to a unique integer ID), each coupled with a label: 1 for positive sentiment, 0 for negative. To load it, we use the following Python code:

from keras.datasets import imdb
from keras.preprocessing import sequence

max_features = 20000  # vocabulary size
maxlen = 100  # max length of every input sequence

(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=max_features)
x_train, y_train = x_train[:2500], y_train[:2500]  # use a subset for faster experimentation
x_test, y_test = x_test[:1000], y_test[:1000]
x_train = sequence.pad_sequences(x_train, maxlen=maxlen)
x_test = sequence.pad_sequences(x_test, maxlen=maxlen)
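As a quick sanity check (assuming the snippet above ran as-is), the padded arrays now have a fixed width of maxlen:

print(x_train.shape)  # (2500, 100): 2500 reviews, each padded/truncated to 100 tokens
print(x_test.shape)   # (1000, 100)
print(y_train[:5])    # binary sentiment labels (0 or 1)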

Building the Model

The machine learning model we build is a typical Neural Network architecture used in many text classification tasks. It includes the following layers:

  • Embedding layer with hyperparameter embedding_dim indicating the dimensionality of the resulting embeddings;
  • Dropout layer with hyperparameter dropout indicating the dropout rate;
  • 1D Convolution with hyperparameters filters and kernel_size defining the number of output channels and the width of the 1D kernel respectively;
  • bi-LSTM layer with hyperparameter lstm_output_size for the dimensionality of the LSTM output; and
  • Dense layer with only one output and sigmoid activation.

The following Python snippet demonstrates what we just described above:

from keras.models import Sequential
from keras.layers import Activation, Bidirectional, Conv1D, Dense
from keras.layers import Dropout, Embedding, LSTM, MaxPooling1D


def make_model(
    embedding_dim: int,
    dropout: float,
    filters: int,
    kernel_size: int,
    pool_size: int,
    lstm_output_size: int,
    metrics: list,
    vocab_size: int,
    maxlen: int,
):
    model = Sequential(
        [
            Embedding(vocab_size, embedding_dim, input_length=maxlen),
            Dropout(dropout),
            Conv1D(filters, kernel_size, padding="valid", activation="relu"),
            MaxPooling1D(pool_size=pool_size),
            Bidirectional(LSTM(lstm_output_size), merge_mode="ave"),
            Dense(1),
            Activation("sigmoid"),
        ]
    )

    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=metrics)
    return model
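
As a quick smoke test, we can instantiate the network once with some arbitrary, purely illustrative hyperparameter values and print its architecture:

# arbitrary hyperparameter values, only to verify that the model builds
model = make_model(
    embedding_dim=32,
    dropout=0.1,
    filters=32,
    kernel_size=5,
    pool_size=2,
    lstm_output_size=16,
    metrics=["accuracy"],
    vocab_size=max_features,
    maxlen=maxlen,
)
model.summary()  # prints the layer stack and parameter counts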

Hyperparameter Search

To measure the impact of the hyperparameters, we must define a set of performance metrics. By default, we track the training and validation loss, which in this case is the binary cross-entropy. On top of this, we will also track the accuracy, precision, and recall. In general, it is useful to benchmark the model on multiple metrics: depending on the use case, we might prioritize one over another, and at the same time we can observe the dependencies between them.

from keras.metrics import BinaryAccuracy, Precision, Recall

METRICS = [
    BinaryAccuracy(name='accuracy'),
    Precision(name='precision'),
    Recall(name='recall'),
]  # metrics to track

# hyperparameters to track
embedding_size = [32, 128]
dropout = [0.01, 0.1]
filters = [16, 32, 64]
kernel_size = [3, 5, 7]
pool_size = [2, 4]
lstm_output_size = [16, 64]
batch_size = [8, 16, 32]

Once we have defined the hyperparameters to track, along with the performance metrics, we can start the hyperparameter search by plugging in various combinations of values. In other words, we create a hypergrid from the hyperparameter values (here, 2 × 2 × 3 × 3 × 2 × 2 × 3 = 432 combinations) and train and evaluate the model for each point on it. Each point corresponds to one experiment: a full training run followed by an evaluation.

As we run the experiments, we log the model's performance together with the hyperparameter values that produced it, one row per experiment, in some external database or file. This is illustrated by the following snippet:

import itertools

epochs = 3  # number of training epochs
test_batch_size = 32  # batch size for testing
arrays = [
    embedding_size,
    dropout,
    filters,
    kernel_size,
    pool_size,
    lstm_output_size,
    batch_size,
]  # all hyper-params

for ed, d, flt, ks, ps, ls, bs in itertools.product(*arrays):
    model = make_model(
        embedding_dim=ed,
        dropout=d,
        filters=flt,
        kernel_size=ks,
        pool_size=ps,
        lstm_output_size=ls,
        metrics=METRICS,
        vocab_size=max_features,
        maxlen=maxlen,
    )
    h = model.fit(x_train, y_train, batch_size=bs, epochs=epochs, verbose=2)
    train_loss = h.history["loss"][-1]
    test_metrics = model.evaluate(x=x_test, y=y_test, batch_size=test_batch_size)
    test_loss, test_acc, test_prec, test_rec = test_metrics

    # write everything to external JSON file

Now that we have generated metadata for our experiments, we have to make it actionable.

Visualize the Hyperparameters Impact

Data in raw format is difficult, sometimes impossible, to interpret. This especially holds for multivariate data!

We can easily resolve this by using a parallel coordinates plot. In this type of plot, the data dimensions (a.k.a. features) are represented by parallel axes, one per dimension, and each multivariate point appears as a poly-line connecting its values on those axes. The plot also encodes the correlation between dimensions: many line crossings between two axes indicate an inverse correlation. One example of a parallel coordinates plot is shown below:

Figure 1. An example parallel coordinates plot. Credits: Blocks.

For seamless creation of interactive parallel coordinates plots, we can use HiPlot, an open-source Python library. Given data that follows a consistent and predefined schema, it automatically generates a parallel coordinates plot. The plot can be easily embedded in a Jupyter Notebook, exported as a standalone HTML file, or rendered directly in a Streamlit app.
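
As a sketch of the standalone-HTML option (the output file name is just an example; hiplt_data is the list of per-experiment records we logged above):

import hiplot as hip

exp = hip.Experiment.from_iterable(hiplt_data)
exp.to_html("hyperparameter_search.html")  # open this file in any browser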

In our case, the metadata consists of all the experiments we logged above. Ultimately, we want a summary of the hyperparameters' influence on the model's performance, so we can pick what suits our case.

By running just two lines of code:

import hiplot as hip
hip.Experiment.from_iterable(hiplt_data).display()  # hiplt_data: one dict per experiment

we obtain this nice interactive plot as depicted below. Go on, give it a try and see how it works!


The benefit of using this interactive plot is that we get a clear overview of all experiments, which can additionally be indexed with a unique ID.

In many practical situations, it is advisable for reproducibility to assign traceable IDs to the experiments, so that we can always roll back and reproduce the same results. For example, if the model and the hyperparameter values are tracked with git, we can use as an experiment ID the SHA of the commit that captures the state of the repository just before the experiment was run.
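
As a minimal sketch of this idea (assuming the experiments are launched from inside a git repository; the helper name is ours):

import subprocess

def current_commit_sha() -> str:
    # ask git for the SHA of the current HEAD commit
    return subprocess.check_output(["git", "rev-parse", "HEAD"]).decode().strip()

# e.g., store it alongside each experiment record
record["experiment_id"] = current_commit_sha()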

Obviously, one disadvantage of this technique is the computational cost: it is not always affordable to run a plethora of experiments just to find the best hyperparameter values. However, over a longer time span, the number of experiments we run anyway tends to become significant. Therefore, in order not to lose any knowledge, it is still better to log the experiments and eventually visualize them with HiPlot.

The source code for the implementation can be found on GitHub. If this is something you like and would like to see similar content, you can follow me on LinkedIn or Twitter. Additionally, you can subscribe to the mailing list below.

Summary

In this blog post we learned how to make our machine learning experiments more useful with a visualization technique called the parallel coordinates plot.

We tracked and logged the performance of a simple Keras model as a function of its hyperparameter values. We then made this metadata actionable using HiPlot, an open-source Python library for creating interactive parallel coordinates plots.
