Animated Visualization of Random Walks in Python

7 minute read

The random processes are fundamental in a plethora of engineering and science domains. They are used to model complex systems like weather forecasting, which are highly unpredictable.

In this blog post, we will introduce the simplest random process, i.e. the simple random walk, which is the basis and foundation for many random processes. We will simulate a random walk using the Python numerical libraries like NumPy. Moreover, we will make an animated visualization using the Matplotlib Animation API in order to emphasize the random nature of this phenomenon. This is an important part because we convey our message more easily. So let's begin and stay tuned!

Intuition

We are not in control of everything and we can't predict all outcomes although we are used thinking that way. For example, how could we know the exact number of people in our favorite book store exactly at 12:36? We never know since this number is random and can't be calculated based on all observations we have. It is not possible to fit a magical deterministic formula to give us the answer. Instead, we can think of rough estimates and let the exact number of people be pure mere of luck. Thus, the number of visitors in the book store from minute to minute is a random process.


Fig. 1: Queueing System used to model arrivals and departures

The random processes are used to model plenty of phenomena in the exact sciences and engineering. The best possible way to start thinking of them is to play a game with coin tossing. Every time we flip the coin if the outcome is head we win 1 dollar and of the outcome is tail we lose 1 dollar. Can we know how much dollars we would win or lose after 10 consecutive coin flips? Certainly no, but what we can know is how probable some outcome is after 10 steps. This leads us to the definition of a random walk.

Random Walk Definition

Let \( \{\epsilon_{k}\}_{k \in \mathbb{N}} \) be a sequence of independent and identically distributed (i.i.d) discrete random variables such that \( \mathbb{P}\{ \epsilon_{k} = \pm 1 \} = {1 \over 2} \). We can think of this in terms of the coin-tossing we had before. With a probability of \( 0.5 \), it can happen to be a head, thus winning 1 dollar, and with a probability of \( 0.5 \) it can be a tail thus losing 1 dollar.

Then, a simple random walk \( \{X_{n}\}_{n \in \mathbb{N}} \) is defined as \[ X_{n} = {\sum_{k=1}^{n} \epsilon_{k}} \] We can think of this random variable like the cumulative sum of dollars we would have at time step \( n \). After flipping the coin 10 times, this sum might range from \( -10 \) to \( +10 \), or in general from \( -N \) to \( +N \). Because \( N \) might be a large number we can normalize this range to be most of the time between \( -1 \) and \( +1 \). We can construct such a thing using the Central Limit Theorem: \[ {X_{N} \over \sqrt{N}} \longrightarrow {\mathcal{N}(0, 1)} \] where \( \mathcal{N} \) denotes a Gaussian Random Variable with given mean and variance, in this case, \( 0 \) and \( 1 \). There is a proof showing that the given identity always holds.

Once we know the definition of a simple random walk, we can implement a simulation in Python and make a visualization of the possible outcomes.

Random Walk in Python

By using the NumPy utilities we can easily simulate a simple random walk. Given the number of steps \( N \) as an input argument, we can randomly generate \( N \) samples from the set \( \{ +1, -1\} \) with an equal probability of \( 0.5 \). Then, we will only use the cumsum function, to give us the cumulative sum in every time step. The Python code is given below:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
import numpy as np
np.random.seed(1234)

def random_walk(N):
    """
    Simulates a discrete random walk
    :param int N : the number of steps to take
    """
    # event space: set of possible increments
    increments = np.array([1, -1])
    # the probability to generate 1
    p=0.5

    # the epsilon values
    random_increments = np.random.choice(increments, N, p)
    # calculate the random walk
    random_walk = np.cumsum(random_increments)

    # return the entire walk and the increments
    return random_walk, random_increments

# generate a random walk
N = 500
X, epsilon = random_walk(N)

# normalize the random walk using the Central Limit Theorem
X = X * np.sqrt(1./N)

Animating the Random Walk

Matplotlib library provides an animation API that offers a different perspective of the plotting in Python. Instead of serving static charts, we can use an eye-catching animation to emphasize and more efficiently transfer the message we would like to send. This comes quite handy and the effect is furthermore amplified for time-series data such as the simple random walk

Using this animation API we can animate one realization of a simple random walk as shown in the code snippet below:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
import matplotlib.pyplot as plt
import matplotlib.animation as animation
plt.style.use('seaborn')

#general figure options
fig = plt.figure(figsize=(15, 7))
ax = plt.axes(xlim=(0, N), ylim=(np.min(X) - 0.5, np.max(X) + 0.5))
line, = ax.plot([], [], lw=2)
ax.set_title('2D Random Walk', fontsize=18)
ax.set_xlabel('Steps', fontsize=16)
ax.set_ylabel('Value', fontsize=16)
ax.tick_params(labelsize=12)

# initialization function 
def init():
    # creating an empty plot/frame 
    line.set_data([], [])
    return line,

# lists to store x and y axis points 
xdata, ydata = [], []

# animation function 
def animate(i):
    # x, y values to be plotted 
    y = X[i]

    # appending new points to x, y axes points list 
    xdata.append(i)
    ydata.append(y)
    line.set_data(xdata, ydata)
    return line,

# call the animator	 
anim = animation.FuncAnimation(fig, animate, init_func=init,
                               frames=N, interval=20, blit=True)
# save the animation as mp4 video file 
anim.save('random_walk.gif',writer='imagemagick')

The animation function needs three essential inputs: i) an instance of the figure to show the animation, ii) data initialization function and iii) a function that defines the animation logic. Then we provide the number of frames to draw (how many times to iterate over the data) and the delay in milliseconds between the consecutive frames. One final option we could use is the bit blit, i.e. to redraw only those parts that have changed between two consecutive frames, thus leading to faster animation. The result we get is shown in the animation below:

Random Walk Animation through the steps
Animation: Simple Random Walk

The full source code behind this random walk simulation and animation can be found in this GitHub repository. For more information, please follow me on Twitter.

If you liked what you just saw, it would be really helpful to subscribe to the mailing list below. You will not get spammed that's a promise! You will get updates for the newest blog posts and visualizations from time to time.

Conclusion

In this short tutorial blog post, we simulated and animated a simple random walk using Python's Matplotlib library. This random process, although very simple is quite important and opens the way towards the other applied random processes.

This list includes but is not limited to the Brownian Motion and its variants like the Geometric Brownian Motion, used to model systems with random exponential growth. Moreover, it is a foundation for the Stochastic Differential Equations and Stochastic Integration. We will cover all of these topics in the next blog posts.

Appendix: Central Limit Theorem Numerical Simulation

In this tutorial, we claimed that the normalized random walk follows a Gaussian distribution with mean 0 and variance 1, for which there is a strong mathematical proof. Although this is a universal truth, we can still make a numerical simulation, as shown in the code snippet below:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
num_simulations = 10000 # num of simulation
N = 100000 # number od steps in each simulation
dt = 1./N # the time step
X_norm = [0] * num_simulations # the normalized random variable

# run the simulations
for i in range(num_simulations):
    X, _ = random_walk(N)
    X_norm[i] = X[N - 1] * np.sqrt(dt)

print('The mean is: {0}'.format(np.mean(X_norm)))
print('The var is: {0}'.format(np.var(X_norm)))

plt.figure(figsize=(15, 7))
plt.grid(True, which='major', linestyle='--', color='black', alpha=0.4)
plt.title('Probability distribution of X_n', fontsize=18)
plt.xlabel('Value for one realization of X_n', fontsize=18)
plt.ylabel('Probability Density', fontsize=22)
plt.hist(X_norm, 100, density=True, color='#0492C2')

The resulting plot of this simulation is depicted in the image below:
Gaussian Bell Curve
Fig 2: Simulation of the Central Limit Theorem

Leave a comment