Jekyll2023-04-20T13:56:59+02:00https://isquared.digital/blog.xmliSquared | BlogWebsite to perceive science through interesting visualisations. It explains and implements the visualised concepts in Python and JavascriptVladimir IlievskiBatch vs Layer Normalization in Deep Neural Nets. The Illustrated Way!2023-03-15T10:00:00+01:002023-03-15T10:00:00+01:00https://isquared.digital/blog/illustrated-batch-vs-layer-norm<p>The <a href="https://arxiv.org/pdf/1502.03167.pdf" target="_blank">Batch Normalization (BN)</a> and <a href="https://arxiv.org/pdf/1607.06450.pdf" target="_blank">Layer Normalization (LN)</a> techniques are widely used in deep learning. They ease the optimization process and help very deep networks converge faster.</p> <p>Batch Normalization (BN) has been successfully applied to vision tasks, while Layer Normalization (LN) has been applied to sequential tasks, mainly in NLP.</p> <p>Both are normalization techniques applied to the input of each layer. Therefore, both calculate the same two statistics: <em>mean</em> and <em>variance</em>, only in a different manner.</p> <p>Fully understanding the difference between <em>BN</em> and <em>LN</em> is not straightforward. For this reason, in this blog post we explain batch and layer normalization with intuitive illustrations.</p> <h1 id="batch-normalization">Batch Normalization</h1> <p><a href="https://arxiv.org/pdf/1502.03167.pdf" target="_blank">Batch Normalization (BN)</a> was first introduced to address the <em>internal covariate shift</em>, i.e. the change in the distributions of the hidden layers in the course of training.</p> <p>In general, <em>BN</em> accelerates the training of deep neural nets. It also reduces the dependence of the gradients on the scale of the parameters (or of their initial values), which in turn allows the use of much higher learning rates. 
However, it has one drawback: it requires a sufficiently large batch size.</p> <p>To save us the pain of reading the entire paper, and without going too much into the details, the essential part of how <em>Batch Normalization</em> works is illustrated in the image below:</p> <center> <img data-src="https://isquared.digital/assets/images/illustrated_batch_norm.png" class="lazyload" alt="Illustrated Batch Normalization" /> <br /> <span class="caption text-muted"> Illustrated Batch Normalization </span> </center> <p><br /></p> <p>In <em>Batch Normalization</em> the <em>mean</em> and <em>variance</em> are calculated for each individual channel, across all elements (pixels or tokens) of all samples in the batch.</p> <p>Although it may sound counterintuitive at first, it is called <em>Batch Normalization</em> because it aggregates over the batch dimension.</p> <h1 id="layer-normalization">Layer Normalization</h1> <p>Having a sufficiently large batch size is impractical for sequential tasks, where the length of the sequence can be very large. To mitigate this constraint, the <a href="https://arxiv.org/pdf/1607.06450.pdf" target="_blank">Layer Normalization (LN)</a> technique was introduced.</p> <p>Thus, <em>LN</em> is less dependent on the batch size and can be used with small batch sizes. 
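Before looking at the LN illustration, the batch statistics just described can be made concrete in a few lines of NumPy, alongside the per-sample statistics that LN uses. This is only a sketch; the <code>(batch, channels, length)</code> layout is an assumed example:

```python
import numpy as np

# Toy activations: 8 samples, 4 channels, 16 elements (pixels or tokens)
x = np.random.randn(8, 4, 16)

# Batch Normalization: one mean/variance per channel,
# averaged over all elements of all samples in the batch
bn_mean = x.mean(axis=(0, 2))   # shape (4,): one value per channel
bn_var = x.var(axis=(0, 2))

# Layer Normalization: one mean/variance per sample,
# averaged over all elements in all channels
ln_mean = x.mean(axis=(1, 2))   # shape (8,): one value per sample
ln_var = x.var(axis=(1, 2))
```

The only difference between the two techniques is which axes the averaging runs over.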
It can also help to reduce the vanishing gradient problem in recurrent neural networks.</p> <p>Again, to save us the time of reading the entire paper, the essential part of how <em>Layer Normalization</em> works is illustrated in the image below:</p> <center> <img data-src="https://isquared.digital/assets/images/illustrated_layer_norm.png" class="lazyload" alt="Illustrated Layer Normalization" /> <br /> <span class="caption text-muted"> Illustrated Layer Normalization </span> </center> <p><br /></p> <p>In <em>Layer Normalization</em> the <em>mean</em> and <em>variance</em> are calculated for each individual sample in the batch, across all elements (pixels or tokens) in all channels.</p> <p>At first sight it may be counterintuitive, but it is called <em>Layer Normalization</em> because it aggregates over all channels, i.e. the features of a layer.</p> <h1 id="pytorch-implementation">PyTorch Implementation</h1> <p>The <em>PyTorch</em> implementation is given in the code snippets below. During training, we create two learnable parameters, <code class="language-plaintext highlighter-rouge">gamma</code> and <code class="language-plaintext highlighter-rouge">beta</code>, to scale and shift the normalized input.</p> <p>To have unbiased inference, during training we calculate the <em>moving mean</em> and <em>moving variance</em>. 
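The moving statistics are maintained with an exponential moving average; the 0.9/0.1 momentum split used in the snippets below can be sketched in pure Python:

```python
def ema_update(moving, batch_stat, momentum=0.1):
    """Exponential moving average used for the moving mean/variance."""
    return (1.0 - momentum) * moving + momentum * batch_stat

moving_mean = 0.0
for batch_mean in [1.0, 1.0, 1.0]:   # three identical toy batches
    moving_mean = ema_update(moving_mean, batch_mean)

# After three updates the moving mean has drifted towards the batch statistic
print(round(moving_mean, 3))  # 0.271
```

With more updates the moving average converges to the batch statistic, which is why it serves as an unbiased stand-in at inference time.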
Later on, during inference we use these moving averages as a replacement of the test data <em>mean</em> and <em>variance</em>.</p> <figure class="highlight"><pre><code class="language-python" data-lang="python"><table class="rouge-table"><tbody><tr><td class="gutter gl"><pre class="lineno">1 2 </pre></td><td class="code"><pre><span class="kn">import</span> <span class="nn">torch</span> <span class="kn">import</span> <span class="nn">torch.nn</span> <span class="k">as</span> <span class="n">nn</span> </pre></td></tr></tbody></table></code></pre></figure> <p>Below you can find the <em>Batch Normalization</em> implementation in PyTorch:</p> <figure class="highlight"><pre><code class="language-python" data-lang="python"><table class="rouge-table"><tbody><tr><td class="gutter gl"><pre class="lineno">1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 </pre></td><td class="code"><pre><span class="k">class</span> <span class="nc">BatchNorm</span><span class="p">(</span><span class="n">nn</span><span class="p">.</span><span class="n">Module</span><span class="p">):</span> <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">num_features</span><span class="p">:</span> <span class="nb">int</span><span class="p">,</span> <span class="n">training</span><span class="p">:</span> <span class="nb">bool</span><span class="p">,</span> <span class="n">eps</span><span class="p">:</span> <span class="nb">float</span><span class="o">=</span><span class="mf">1e-6</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="bp">None</span><span class="p">:</span> <span class="nb">super</span><span class="p">().</span><span class="n">__init__</span><span class="p">()</span> <span class="bp">self</span><span class="p">.</span><span class="n">training</span> <span class="o">=</span> <span class="n">training</span> <span class="c1"># learnable parameters 
</span> <span class="bp">self</span><span class="p">.</span><span class="n">gamma</span> <span class="o">=</span> <span class="n">nn</span><span class="p">.</span><span class="n">Parameter</span><span class="p">(</span><span class="n">torch</span><span class="p">.</span><span class="n">ones</span><span class="p">(</span><span class="n">num_features</span><span class="p">))</span> <span class="bp">self</span><span class="p">.</span><span class="n">beta</span> <span class="o">=</span> <span class="n">nn</span><span class="p">.</span><span class="n">Parameter</span><span class="p">(</span><span class="n">torch</span><span class="p">.</span><span class="n">zeros</span><span class="p">(</span><span class="n">num_features</span><span class="p">))</span> <span class="c1"># hyperparams </span> <span class="bp">self</span><span class="p">.</span><span class="n">eps</span> <span class="o">=</span> <span class="n">eps</span> <span class="bp">self</span><span class="p">.</span><span class="n">moving_mean</span> <span class="o">=</span> <span class="n">nn</span><span class="p">.</span><span class="n">Parameter</span><span class="p">(</span><span class="n">torch</span><span class="p">.</span><span class="n">zeros</span><span class="p">(</span><span class="n">num_features</span><span class="p">),</span> <span class="n">requires_grad</span><span class="o">=</span><span class="bp">False</span><span class="p">)</span> <span class="bp">self</span><span class="p">.</span><span class="n">moving_var</span> <span class="o">=</span> <span class="n">nn</span><span class="p">.</span><span class="n">Parameter</span><span class="p">(</span><span class="n">torch</span><span class="p">.</span><span class="n">ones</span><span class="p">(</span><span class="n">num_features</span><span class="p">),</span> <span class="n">requires_grad</span><span class="o">=</span><span class="bp">False</span><span class="p">)</span> <span class="k">def</span> <span class="nf">forward</span><span 
class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">x</span><span class="p">):</span> <span class="k">if</span> <span class="bp">self</span><span class="p">.</span><span class="n">training</span><span class="p">:</span> <span class="n">mean</span> <span class="o">=</span> <span class="n">x</span><span class="p">.</span><span class="n">mean</span><span class="p">(</span><span class="n">dim</span><span class="o">=</span><span class="mi">0</span><span class="p">,</span> <span class="n">keepdim</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span> <span class="n">var</span> <span class="o">=</span> <span class="n">x</span><span class="p">.</span><span class="n">var</span><span class="p">(</span><span class="n">dim</span><span class="o">=</span><span class="mi">0</span><span class="p">,</span> <span class="n">keepdim</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span> <span class="bp">self</span><span class="p">.</span><span class="n">moving_mean</span> <span class="o">=</span> <span class="mf">0.9</span> <span class="o">*</span> <span class="bp">self</span><span class="p">.</span><span class="n">moving_mean</span> <span class="o">+</span> <span class="mf">0.1</span> <span class="o">*</span> <span class="n">mean</span> <span class="bp">self</span><span class="p">.</span><span class="n">moving_var</span> <span class="o">=</span> <span class="mf">0.9</span> <span class="o">*</span> <span class="bp">self</span><span class="p">.</span><span class="n">moving_var</span> <span class="o">+</span> <span class="mf">0.1</span> <span class="o">*</span> <span class="n">var</span> <span class="k">else</span><span class="p">:</span> <span class="n">mean</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">moving_mean</span> <span class="n">var</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span 
class="n">moving_var</span> <span class="n">x</span> <span class="o">=</span> <span class="p">(</span><span class="n">x</span> <span class="o">-</span> <span class="n">mean</span><span class="p">)</span> <span class="o">/</span> <span class="n">torch</span><span class="p">.</span><span class="n">sqrt</span><span class="p">(</span><span class="n">var</span> <span class="o">+</span> <span class="bp">self</span><span class="p">.</span><span class="n">eps</span><span class="p">)</span> <span class="n">x</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">gamma</span> <span class="o">*</span> <span class="n">x</span> <span class="o">+</span> <span class="bp">self</span><span class="p">.</span><span class="n">beta</span> <span class="k">return</span> <span class="n">x</span> </pre></td></tr></tbody></table></code></pre></figure> <p>Below you can find the <em>Layer Normalization</em> implementation in PyTorch:</p> <figure class="highlight"><pre><code class="language-python" data-lang="python"><table class="rouge-table"><tbody><tr><td class="gutter gl"><pre class="lineno">1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 </pre></td><td class="code"><pre><span class="k">class</span> <span class="nc">LayerNorm</span><span class="p">(</span><span class="n">nn</span><span class="p">.</span><span class="n">Module</span><span class="p">):</span> <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">num_features</span><span class="p">:</span> <span class="nb">int</span><span class="p">,</span> <span class="n">training</span><span class="p">:</span> <span class="nb">bool</span><span class="p">,</span> <span class="n">eps</span><span class="p">:</span> <span class="nb">float</span><span class="o">=</span><span class="mf">1e-6</span><span class="p">)</span> <span class="o">-&gt;</span> <span 
class="bp">None</span><span class="p">:</span> <span class="nb">super</span><span class="p">().</span><span class="n">__init__</span><span class="p">()</span> <span class="bp">self</span><span class="p">.</span><span class="n">training</span> <span class="o">=</span> <span class="n">training</span> <span class="c1"># learnable parameters </span> <span class="bp">self</span><span class="p">.</span><span class="n">gamma</span> <span class="o">=</span> <span class="n">nn</span><span class="p">.</span><span class="n">Parameter</span><span class="p">(</span><span class="n">torch</span><span class="p">.</span><span class="n">ones</span><span class="p">(</span><span class="n">num_features</span><span class="p">))</span> <span class="bp">self</span><span class="p">.</span><span class="n">beta</span> <span class="o">=</span> <span class="n">nn</span><span class="p">.</span><span class="n">Parameter</span><span class="p">(</span><span class="n">torch</span><span class="p">.</span><span class="n">zeros</span><span class="p">(</span><span class="n">num_features</span><span class="p">))</span> <span class="c1"># hyperparams </span> <span class="bp">self</span><span class="p">.</span><span class="n">eps</span> <span class="o">=</span> <span class="n">eps</span> <span class="bp">self</span><span class="p">.</span><span class="n">moving_mean</span> <span class="o">=</span> <span class="n">nn</span><span class="p">.</span><span class="n">Parameter</span><span class="p">(</span><span class="n">torch</span><span class="p">.</span><span class="n">zeros</span><span class="p">(</span><span class="n">num_features</span><span class="p">),</span> <span class="n">requires_grad</span><span class="o">=</span><span class="bp">False</span><span class="p">)</span> <span class="bp">self</span><span class="p">.</span><span class="n">moving_var</span> <span class="o">=</span> <span class="n">nn</span><span class="p">.</span><span class="n">Parameter</span><span class="p">(</span><span 
class="n">torch</span><span class="p">.</span><span class="n">ones</span><span class="p">(</span><span class="n">num_features</span><span class="p">),</span> <span class="n">requires_grad</span><span class="o">=</span><span class="bp">False</span><span class="p">)</span> <span class="k">def</span> <span class="nf">forward</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">x</span><span class="p">):</span> <span class="k">if</span> <span class="bp">self</span><span class="p">.</span><span class="n">training</span><span class="p">:</span> <span class="n">mean</span> <span class="o">=</span> <span class="n">x</span><span class="p">.</span><span class="n">mean</span><span class="p">(</span><span class="n">dim</span><span class="o">=-</span><span class="mi">1</span><span class="p">,</span> <span class="n">keepdim</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span> <span class="n">var</span> <span class="o">=</span> <span class="n">x</span><span class="p">.</span><span class="n">var</span><span class="p">(</span><span class="n">dim</span><span class="o">=-</span><span class="mi">1</span><span class="p">,</span> <span class="n">keepdim</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span> <span class="bp">self</span><span class="p">.</span><span class="n">moving_mean</span> <span class="o">=</span> <span class="mf">0.9</span> <span class="o">*</span> <span class="bp">self</span><span class="p">.</span><span class="n">moving_mean</span> <span class="o">+</span> <span class="mf">0.1</span> <span class="o">*</span> <span class="n">mean</span> <span class="bp">self</span><span class="p">.</span><span class="n">moving_var</span> <span class="o">=</span> <span class="mf">0.9</span> <span class="o">*</span> <span class="bp">self</span><span class="p">.</span><span class="n">moving_var</span> <span class="o">+</span> <span class="mf">0.1</span> <span class="o">*</span> 
<span class="n">var</span> <span class="k">else</span><span class="p">:</span> <span class="n">mean</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">moving_mean</span> <span class="n">var</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">moving_var</span> <span class="n">x</span> <span class="o">=</span> <span class="p">(</span><span class="n">x</span> <span class="o">-</span> <span class="n">mean</span><span class="p">)</span> <span class="o">/</span> <span class="n">torch</span><span class="p">.</span><span class="n">sqrt</span><span class="p">(</span><span class="n">var</span> <span class="o">+</span> <span class="bp">self</span><span class="p">.</span><span class="n">eps</span><span class="p">)</span> <span class="n">x</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">gamma</span> <span class="o">*</span> <span class="n">x</span> <span class="o">+</span> <span class="bp">self</span><span class="p">.</span><span class="n">beta</span> <span class="k">return</span> <span class="n">x</span> </pre></td></tr></tbody></table></code></pre></figure> <p>Take a look and downlaod the PDF document containing the illustrations above by clicking on the button below:</p> <p><a href="https://isquared.digital/assets/pdfs/illustrated_batch_vs_layer_norm.pdf" target="_blank" class="btn btn--primary .btn--small">Downlaod Illustrations</a></p> <p>For more information, please follow me on <a href="https://www.linkedin.com/in/vilievski/" target="_blank" rel="noopener"><b>LinkedIn</b></a> or <a href="https://twitter.com/VladOsaurus" target="_blank" rel="noopener"><b>Twitter</b></a>. 
If you like this content you can subscribe to the mailing list below to get similar updates from time to time.</p> <link href="//cdn-images.mailchimp.com/embedcode/horizontal-slim-10_7.css" rel="stylesheet" type="text/css" /> <link href="/assets/css/mailchimp.css" /> <div id="mc_embed_signup"> <form action="https://digital.us19.list-manage.com/subscribe/post?u=cb9dbe40387c27177a25de80f&amp;id=08bda6f8e0" method="post" id="mc-embedded-subscribe-form" name="mc-embedded-subscribe-form" class="validate" target="_blank" novalidate=""> <div id="mc_embed_signup_scroll"> <label for="mce-EMAIL">Join the iSquared mailing list</label> <input type="email" value="" name="EMAIL" class="email" id="mce-EMAIL" placeholder="email address" required="" /> <!-- real people should not fill this in and expect good things - do not remove this or risk form bot signups--> <div style="position: absolute; left: -5000px;" aria-hidden="true"><input type="text" name="b_cb9dbe40387c27177a25de80f_08bda6f8e0" tabindex="-1" value="" /></div> <div class="clear"><input type="submit" value="Subscribe" name="subscribe" id="mc-embedded-subscribe" class="button" /></div> </div> </form> </div> <p><br /></p>Vladimir IlievskiIntuitive illustration of the batch and layer normalization techniques in neural networks with PyTorch implementationThe 100-page ChatGPT Generated Python Tutorial For Absolute Beginners2023-01-29T10:00:00+01:002023-01-29T10:00:00+01:00https://isquared.digital/blog/chat-gpt-generated-python-tutorial<p><a href="https://openai.com/blog/chatgpt/" target="_blank">ChatGPT</a> is a revolutionary large language model. It is capable of generating text on literally any subject, and it has outstanding capabilities for generating code explanations. This can serve as an excellent tool to teach programming, as <strong>“programming is learned by programming”.</strong></p> <p>Following this premise, I used <strong>ChatGPT</strong> to compile explanations for 100 Python exercises for complete beginners. 
One exercise per page. By following the exercise explanations, a novice in Python can learn how to code. You can find the download link below.</p> <h1 id="how-i-created-the-tutorial">How I created the tutorial</h1> <p>First, we need a set of Python exercises well suited for beginners. The <a href="https://github.com/darkprinx/break-the-ice-with-python" target="_blank"><strong>Break The Ice With Python</strong></a> GitHub repository contains 100 simple Python questions with solutions.</p> <p>Then, using the code snippets, we ask <strong>ChatGPT</strong> to explain them line by line. To obtain good explanations suited for beginners we use the following prompt:</p> <pre><code class="language-plain">Explain me the following code snippet written in Python as explaining it to someone who doesn't know programming in Python: &lt;code_snippet&gt; </code></pre> <p>after which the Python snippet follows. The output is then taken for further processing.</p> <p>After this we stitch all explanations into a final <em>PDF</em> document, one exercise per page. Every page follows the same structure, with the following sections:</p> <ul> <li>Exercise N: the description of exercise number <em>N</em></li> <li>Code: the solution of the exercise</li> <li>ChatGPT Explanations: the ChatGPT explanation</li> </ul> <p>Take a look and download the document by clicking on the button below:</p> <p><a href="https://isquared.digital/assets/pdfs/100_page_chat_gpt_generated_python_tutorial.pdf" target="_blank" class="btn btn--primary .btn--small">Download Document</a></p> <p>All the resources can be found in this <a href="https://github.com/IlievskiV/the-100-page-chat-gpt-generated-python-tutorial" target="_blank">GitHub Repository</a>. 
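Assembling the prompt for each exercise is plain string templating. A minimal sketch, where <code>build_prompt</code> is a hypothetical helper and not part of the original workflow:

```python
# Template taken from the prompt quoted above; {code_snippet} is the placeholder
PROMPT_TEMPLATE = (
    "Explain me the following code snippet written in Python as explaining it "
    "to someone who doesn't know programming in Python:\n\n{code_snippet}"
)

def build_prompt(code_snippet: str) -> str:
    """Fill the template with one exercise solution (hypothetical helper)."""
    return PROMPT_TEMPLATE.format(code_snippet=code_snippet)

prompt = build_prompt("print('Hello, world!')")
```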
If this is something you like and would like to see similar content you could follow me on <a href="https://www.linkedin.com/in/vilievski/" target="_blank" rel="noopener">LinkedIn</a> or <a href="https://twitter.com/VladOsaurus" target="_blank" rel="noopener">Twitter</a>. Additionally, you can subscribe to the mailing list below to get similar updates from time to time.</p> <link href="//cdn-images.mailchimp.com/embedcode/horizontal-slim-10_7.css" rel="stylesheet" type="text/css" /> <link href="/assets/css/mailchimp.css" /> <div id="mc_embed_signup"> <form action="https://digital.us19.list-manage.com/subscribe/post?u=cb9dbe40387c27177a25de80f&amp;id=08bda6f8e0" method="post" id="mc-embedded-subscribe-form" name="mc-embedded-subscribe-form" class="validate" target="_blank" novalidate=""> <div id="mc_embed_signup_scroll"> <label for="mce-EMAIL">Join the iSquared mailing list</label> <input type="email" value="" name="EMAIL" class="email" id="mce-EMAIL" placeholder="email address" required="" /> <!-- real people should not fill this in and expect good things - do not remove this or risk form bot signups--> <div style="position: absolute; left: -5000px;" aria-hidden="true"><input type="text" name="b_cb9dbe40387c27177a25de80f_08bda6f8e0" tabindex="-1" value="" /></div> <div class="clear"><input type="submit" value="Subscribe" name="subscribe" id="mc-embedded-subscribe" class="button" /></div> </div> </form> </div> <p><br /></p>Vladimir IlievskiHow I used ChatGPT to generate a tutorial for Python beginnersTrack the CO2 emissions of your Python code the same way you time it. 
Here is how!2023-01-17T10:00:00+01:002023-01-17T10:00:00+01:00https://isquared.digital/blog/code-carbon<blockquote> <p>Nothing exists until it is measured</p> <p>– <cite>Niels Bohr</cite></p> </blockquote> <p>Just think of the following absurdity: we live in a digital era where almost everything can be measured and tracked, yet we struggle to reliably measure the carbon footprint of AI computing. Needless to say, this becomes <a href="https://hai.stanford.edu/news/ais-carbon-footprint-problem" target="_blank" rel="nofollow noopener">especially important</a> with the rising amount of computation.</p> <p>In the AI community there is already an ongoing effort to encourage responsible research and to start measuring the environmental impact. For instance, one of the most eminent conferences, <strong>NeurIPS</strong>, is <a href="https://neurips.cc/public/guides/PaperChecklist" target="_blank">encouraging researchers</a> to report the CO2 emissions of their research. The <a href="https://arxiv.org/pdf/1907.10597.pdf" target="_blank">Green AI</a> initiative is calling for measures of efficiency in order to boost innovation in AI without skyrocketing computational costs.</p> <p>To make this possible, there are at least a couple of existing open-source solutions for tracking the CO2 emissions of AI computing, even though they are not on par with the developments in AI. One of these initiatives is <a href="https://codecarbon.io/" target="_blank" rel="noopener">Code Carbon</a>. It is built on the same premise as the quote from Niels Bohr above: the CO2 emissions of AI computing will remain hidden until we discover them by measuring.</p> <p>In this blog post we will take a look at the <strong>CodeCarbon</strong> Python library and its importance in the mission to track the AI carbon footprint. 
Finally, we will experiment a bit and demonstrate how to track the CO2 emissions of training a toy neural network in Keras on the IMDb sentiment analysis dataset.</p> <h1 id="what-is-codecarbon">What is CodeCarbon?</h1> <p><a href="https://codecarbon.io/" target="_blank" rel="noopener">Code Carbon</a> is an initiative with the aim to finally start tracking and reporting the CO2 emissions of AI computing. It is a lightweight open-source Python library that lets you track the CO2 emissions produced by running your code.</p> <p>To achieve this, it executes the following two tasks:</p> <ol> <li>Tracks the electricity consumption of the machine on which the code is executed. This is measured in kilowatt-hours (<code class="language-plaintext highlighter-rouge">kWh</code>).</li> <li>Estimates the CO2 emissions per <code class="language-plaintext highlighter-rouge">kWh</code> of the electricity in the geolocation where the machine resides.</li> </ol> <p>The first task is less prone to errors, as the environment is predictable. <strong>CodeCarbon</strong> measures the energy consumption of the <em>CPU</em>, the <em>GPU</em> (if available) and the <em>RAM</em> memory, taking samples every 15 seconds by default.</p> <p>There is a multitude of tools to precisely measure the energy consumption of <em>CPUs</em>. This is an <a href="https://luiscruz.github.io/2021/07/20/measuring-energy.html" target="_blank">excellent blog post</a> that goes over many of them. Currently, <strong>CodeCarbon</strong> uses either <a href="https://www.intel.com/content/www/us/en/developer/articles/tool/power-gadget.html" target="_blank" rel="nofollow noopener">Intel Power Gadget</a> or <a href="https://01.org/blogs/2014/running-average-power-limit-%E2%80%93-rapl" target="_blank" rel="nofollow noopener">Intel RAPL</a>. 
If none of these energy profilers is available, it falls back to a handcrafted technique: using the <em>CPU</em> load to estimate the <em>CPU</em> power.</p> <p>For the <em>GPUs</em> it uses the well-established <a href="https://github.com/gpuopenanalytics/pynvml" target="_blank" rel="nofollow noopener">PyNvml</a> Python library. To track the <em>RAM</em> memory energy consumption it uses only handcrafted rules.</p> <p>The second task – estimating the CO2 emissions of the electricity – is far trickier. To estimate the CO2 emissions, <strong>CodeCarbon</strong> calculates the <em>carbon intensity</em> of the electricity: a weighted average of the emissions of the energy sources in the current grid. Ideally this should be computed dynamically, but even the static estimate is already a good approximation.</p> <p>To compute the <em>carbon intensity</em>, the library relies on the <a href="https://www.co2signal.com/" target="_blank" rel="nofollow noopener">CO2 Signal API</a>. This API gives the sources of energy in the region where the computation is taking place. For cloud-based computing this is even more relevant, because there is a precise geolocation and a known mix of energy sources that the computing center is using.</p> <p>Finally, using the hardware energy consumption and the CO2 emissions of the electricity, we calculate the total carbon footprint by simply multiplying the two.</p> <h1 id="how-we-can-use-codecarbon">How can we use CodeCarbon?</h1> <p>Now we are ready to demonstrate how to track the CO2 emissions of a toy neural network training process. Let’s dive in.</p> <p>As a first step we load the <a href="https://keras.io/api/datasets/imdb/" target="_blank" rel="nofollow noopener">IMDb sentiment analysis dataset</a>. It contains two classes, positive and negative sentiment, meaning this is a straightforward binary classification task. 
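As a quick aside before the demo, the footprint calculation described above reduces to a single product of energy consumed and grid carbon intensity. A tiny sketch; the intensity value below is an illustrative assumption, not a real grid figure:

```python
def carbon_footprint_kg(energy_kwh: float, carbon_intensity_kg_per_kwh: float) -> float:
    """Total CO2 footprint = energy consumed (kWh) x carbon intensity (kgCO2/kWh)."""
    return energy_kwh * carbon_intensity_kg_per_kwh

# e.g. a 2.5 kWh training run on a grid emitting an assumed 0.4 kgCO2 per kWh
print(carbon_footprint_kg(2.5, 0.4))  # 1.0
```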
To load this dataset is fairly easy, as it is part of the Keras built-in datasets:</p> <figure class="highlight"><pre><code class="language-python" data-lang="python"><table class="rouge-table"><tbody><tr><td class="gutter gl"><pre class="lineno">1 2 3 4 5 6 7 8 9 </pre></td><td class="code"><pre><span class="kn">from</span> <span class="nn">keras.datasets</span> <span class="kn">import</span> <span class="n">imdb</span> <span class="kn">from</span> <span class="nn">keras.utils</span> <span class="kn">import</span> <span class="n">pad_sequences</span> <span class="n">max_features</span> <span class="o">=</span> <span class="mi">50000</span> <span class="c1"># vocabulary size </span><span class="n">maxlen</span> <span class="o">=</span> <span class="mi">512</span> <span class="c1"># The length of every input sequence </span> <span class="p">(</span><span class="n">x_train</span><span class="p">,</span> <span class="n">y_train</span><span class="p">),</span> <span class="p">(</span><span class="n">x_test</span><span class="p">,</span> <span class="n">y_test</span><span class="p">)</span> <span class="o">=</span> <span class="n">imdb</span><span class="p">.</span><span class="n">load_data</span><span class="p">(</span><span class="n">num_words</span><span class="o">=</span><span class="n">max_features</span><span class="p">)</span> <span class="n">x_train</span> <span class="o">=</span> <span class="n">pad_sequences</span><span class="p">(</span><span class="n">x_train</span><span class="p">,</span> <span class="n">maxlen</span><span class="o">=</span><span class="n">maxlen</span><span class="p">)</span> <span class="n">x_test</span> <span class="o">=</span> <span class="n">pad_sequences</span><span class="p">(</span><span class="n">x_test</span><span class="p">,</span> <span class="n">maxlen</span><span class="o">=</span><span class="n">maxlen</span><span class="p">)</span> </pre></td></tr></tbody></table></code></pre></figure> <p>As a second step we build a simple 
neural network typical for a text classification task. It includes an embedding layer followed by 1D convolution and bi-direcional LSTM layer in order to finish with a single neuron predicting the sentiment.</p> <figure class="highlight"><pre><code class="language-python" data-lang="python"><table class="rouge-table"><tbody><tr><td class="gutter gl"><pre class="lineno">1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 </pre></td><td class="code"><pre><span class="kn">from</span> <span class="nn">keras.layers</span> <span class="kn">import</span> <span class="n">Dense</span><span class="p">,</span> <span class="n">Dropout</span><span class="p">,</span> <span class="n">Activation</span> <span class="kn">from</span> <span class="nn">keras.layers</span> <span class="kn">import</span> <span class="n">Embedding</span> <span class="kn">from</span> <span class="nn">keras.layers</span> <span class="kn">import</span> <span class="n">LSTM</span><span class="p">,</span> <span class="n">Bidirectional</span> <span class="kn">from</span> <span class="nn">keras.layers</span> <span class="kn">import</span> <span class="n">Conv1D</span><span class="p">,</span> <span class="n">MaxPooling1D</span> <span class="kn">from</span> <span class="nn">keras.metrics</span> <span class="kn">import</span> <span class="n">BinaryAccuracy</span><span class="p">,</span> <span class="n">Precision</span><span class="p">,</span> <span class="n">Recall</span> <span class="kn">from</span> <span class="nn">keras.models</span> <span class="kn">import</span> <span class="n">Sequential</span> <span class="k">def</span> <span class="nf">make_model</span><span class="p">():</span> <span class="n">model</span> <span class="o">=</span> <span class="n">Sequential</span><span class="p">([</span> <span class="n">Embedding</span><span class="p">(</span><span class="mi">50000</span><span class="p">,</span> <span class="mi">256</span><span class="p">,</span> <span 
class="n">input_length</span><span class="o">=</span><span class="mi">512</span><span class="p">),</span> <span class="n">Dropout</span><span class="p">(</span><span class="mf">0.1</span><span class="p">),</span> <span class="n">Conv1D</span><span class="p">(</span><span class="mi">128</span><span class="p">,</span> <span class="mi">5</span><span class="p">,</span> <span class="n">padding</span><span class="o">=</span><span class="s">'valid'</span><span class="p">,</span> <span class="n">activation</span><span class="o">=</span><span class="s">'relu'</span><span class="p">),</span> <span class="n">MaxPooling1D</span><span class="p">(</span><span class="n">pool_size</span><span class="o">=</span><span class="mi">4</span><span class="p">),</span> <span class="n">Bidirectional</span><span class="p">(</span><span class="n">LSTM</span><span class="p">(</span><span class="mi">64</span><span class="p">),</span> <span class="n">merge_mode</span><span class="o">=</span><span class="s">'ave'</span><span class="p">),</span> <span class="n">Dense</span><span class="p">(</span><span class="mi">1</span><span class="p">),</span> <span class="n">Activation</span><span class="p">(</span><span class="s">'sigmoid'</span><span class="p">),</span> <span class="p">])</span> <span class="n">model</span><span class="p">.</span><span class="nb">compile</span><span class="p">(</span> <span class="n">optimizer</span><span class="o">=</span><span class="s">'adam'</span><span class="p">,</span> <span class="n">loss</span><span class="o">=</span><span class="s">'binary_crossentropy'</span><span class="p">,</span> <span class="n">metrics</span><span class="o">=</span><span class="p">[</span> <span class="n">Precision</span><span class="p">(</span><span class="n">name</span><span class="o">=</span><span class="s">'precision'</span><span class="p">),</span> <span class="n">Recall</span><span class="p">(</span><span class="n">name</span><span class="o">=</span><span class="s">'recall'</span><span 
class="p">),</span> <span class="p">]</span> <span class="p">)</span> <span class="k">return</span> <span class="n">model</span> </pre></td></tr></tbody></table></code></pre></figure> <p>Finally we define the training procedure specifying the batch size and the number of epochs.</p> <figure class="highlight"><pre><code class="language-python" data-lang="python"><table class="rouge-table"><tbody><tr><td class="gutter gl"><pre class="lineno">1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 </pre></td><td class="code"><pre><span class="k">def</span> <span class="nf">train_model</span><span class="p">(</span><span class="n">model</span><span class="p">):</span> <span class="n">h</span> <span class="o">=</span> <span class="n">model</span><span class="p">.</span><span class="n">fit</span><span class="p">(</span> <span class="n">x_train</span><span class="p">,</span> <span class="n">y_train</span><span class="p">,</span> <span class="n">batch_size</span><span class="o">=</span><span class="mi">32</span><span class="p">,</span> <span class="n">epochs</span><span class="o">=</span><span class="mi">50</span><span class="p">,</span> <span class="n">verbose</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span> <span class="p">)</span> <span class="n">test_metrics</span> <span class="o">=</span> <span class="n">model</span><span class="p">.</span><span class="n">evaluate</span><span class="p">(</span> <span class="n">x</span><span class="o">=</span><span class="n">x_test</span><span class="p">,</span> <span class="n">y</span><span class="o">=</span><span class="n">y_test</span><span class="p">,</span> <span class="n">batch_size</span><span class="o">=</span><span class="mi">32</span><span class="p">,</span> <span class="p">)</span> <span class="k">print</span><span class="p">(</span><span class="sa">f</span><span class="s">"test loss: </span><span class="si">{</span><span class="n">test_metrics</span><span class="p">[</span><span class="mi">0</span><span 
class="p">]</span><span class="si">}</span><span class="s">"</span><span class="p">)</span> <span class="k">print</span><span class="p">(</span><span class="sa">f</span><span class="s">"test precision: </span><span class="si">{</span><span class="n">test_metrics</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span><span class="si">}</span><span class="s">"</span><span class="p">)</span> <span class="k">print</span><span class="p">(</span><span class="sa">f</span><span class="s">"test recall: </span><span class="si">{</span><span class="n">test_metrics</span><span class="p">[</span><span class="mi">2</span><span class="p">]</span><span class="si">}</span><span class="s">"</span><span class="p">)</span> </pre></td></tr></tbody></table></code></pre></figure> <p>And now it is time to train this toy neural network and track its CO2 emissions. Using <strong>CodeCarbon</strong> this is a simple task, no harder than measuring the elapsed training time. All we have to do is instantiate an <code class="language-plaintext highlighter-rouge">EmissionsTracker</code> object and squeeze the training procedure between the <code class="language-plaintext highlighter-rouge">start</code> and <code class="language-plaintext highlighter-rouge">stop</code> methods.
<strong>CodeCarbon</strong> will take care of the rest as shown below:</p> <figure class="highlight"><pre><code class="language-python" data-lang="python"><table class="rouge-table"><tbody><tr><td class="gutter gl"><pre class="lineno">1 2 3 4 5 6 </pre></td><td class="code"><pre><span class="kn">from</span> <span class="nn">codecarbon</span> <span class="kn">import</span> <span class="n">EmissionsTracker</span> <span class="n">tracker</span> <span class="o">=</span> <span class="n">EmissionsTracker</span><span class="p">(</span><span class="n">project_name</span><span class="o">=</span><span class="s">"imdb_sentiment_classification"</span><span class="p">)</span> <span class="n">tracker</span><span class="p">.</span><span class="n">start</span><span class="p">()</span> <span class="n">train_model</span><span class="p">(</span><span class="n">make_model</span><span class="p">())</span> <span class="n">tracker</span><span class="p">.</span><span class="n">stop</span><span class="p">()</span> </pre></td></tr></tbody></table></code></pre></figure> <p>Indeed, <strong>CodeCarbon</strong> tracked and logged many aspects of the training process. The summary of every run is saved as one row in a file named <code class="language-plaintext highlighter-rouge">emissions.csv</code> by default. My one criticism is that it could use more accurate techniques for tracking CPU consumption.</p> <p>The library also comes with a command line tool named <code class="language-plaintext highlighter-rouge">carbonboard</code> that produces a dashboard showing equivalents of the carbon emissions produced by the experiment. An example for the experiment we ran above is shown below:</p> <center> <img data-src="https://isquared.digital/assets/images/co2_equivalents_dashboard.png" class="lazyload" alt="Dashboard showing the CO2 equivalents" /> <br /> <span class="caption text-muted"> <i>Fig.
1:</i> Dashboard showing CO2 equivalents </span> </center> <p><br /></p> <p>The source code for the implementation can be found on <a href="https://github.com/IlievskiV/Amusive-Blogging-N-Coding/blob/master/Carbon%20Footprint/codecarbon_experiments.ipynb" target="_blank">GitHub</a>. If this is something you like and would like to see similar content you could follow me on <a href="https://www.linkedin.com/in/vilievski/" target="_blank" rel="noopener">LinkedIn</a> or <a href="https://twitter.com/VladOsaurus" target="_blank" rel="noopener">Twitter</a>. Additionally, you can subscribe to the mailing list below to get similar updates from time to time.</p> <link href="//cdn-images.mailchimp.com/embedcode/horizontal-slim-10_7.css" rel="stylesheet" type="text/css" /> <link href="/assets/css/mailchimp.css" /> <div id="mc_embed_signup"> <form action="https://digital.us19.list-manage.com/subscribe/post?u=cb9dbe40387c27177a25de80f&amp;id=08bda6f8e0" method="post" id="mc-embedded-subscribe-form" name="mc-embedded-subscribe-form" class="validate" target="_blank" novalidate=""> <div id="mc_embed_signup_scroll"> <label for="mce-EMAIL">Join the iSquared mailing list</label> <input type="email" value="" name="EMAIL" class="email" id="mce-EMAIL" placeholder="email address" required="" /> <!-- real people should not fill this in and expect good things - do not remove this or risk form bot signups--> <div style="position: absolute; left: -5000px;" aria-hidden="true"><input type="text" name="b_cb9dbe40387c27177a25de80f_08bda6f8e0" tabindex="-1" value="" /></div> <div class="clear"><input type="submit" value="Subscribe" name="subscribe" id="mc-embedded-subscribe" class="button" /></div> </div> </form> </div> <p><br /></p> <h1 id="appendix-other-initiatives-to-track-the-co2-emissions">Appendix: other initiatives to track the CO2 emissions</h1> <p><strong>CodeCarbon</strong> is not the only movement to help tracking the CO2 emissions. 
There are at least two others with the same goal:</p> <ul> <li> <p><a href="https://github.com/Breakend/experiment-impact-tracker" target="_blank">Experiment Impact Tracker</a>: similar to <strong>CodeCarbon</strong>, a simple drop-in method to track energy usage, carbon emissions, and compute utilization of the underlying system.</p> </li> <li> <p><a href="https://mlco2.github.io/impact/" target="_blank">ML CO2 Impact Calculator</a>: a simple web interface that lets you calculate the CO2 emissions yourself.</p> </li> </ul>Vladimir IlievskiEstimating your Python code CO2 emissions was never easier using tools like CodeCarbonNeural Networks Hyperparameter Search, the Visualized Way2021-12-19T10:00:00+01:002021-12-19T10:00:00+01:00https://isquared.digital/blog/hyperparam-search<p> In <b>Machine Learning (ML)</b> off-the-shelf models are not always available. In many instances, we need to train a model on a specific task. But as in every optimization problem, <a href="https://en.wikipedia.org/wiki/There_ain%27t_no_such_thing_as_a_free_lunchs" target="_blank" rel="noopener">"there ain't no such thing as a free lunch"</a>. Thus, we have to find the model that performs well on our task. </p> <p> The ML models, especially Neural Networks, are characterized by their set of <b>hyperparameters</b> that control the learning process. For this reason, the <b>performance</b> of an ML model heavily depends on the hyperparameter values. One set of values may result in better performance than another. This search for hyperparameter values is known as hyperparameter optimization. </p> <p> In this blog we will see how to easily keep track of the model's performance depending on the hyperparameter values, in a visualized way. First, we will build a simple neural network using <a href="https://keras.io/" target="_blank" rel="noopener nofollow">Keras</a>. We will train this network on a sentiment analysis task for many combinations of the hyperparameters.
</p> <p> Finally, we will see how to use the <a href="https://facebookresearch.github.io/hiplot/index.html" target="_blank" rel="noopener nofollow">HiPlot</a> library to build an interactive visualization and search for optimal values. Stay tuned! </p> <h1>Just Another Keras Model</h1> <p> For demonstration purposes, we build a simple model in <a href="https://keras.io/" target="_blank" rel="noopener nofollow">Keras</a> trained on the <a href="https://keras.io/api/datasets/imdb/" target="_blank" rel="noopener nofollow">IMDB Sentiment Analysis Dataset</a>. </p> <h2>Loading the Data</h2> <p> The <a href="https://keras.io/api/datasets/" target="_blank" rel="noopener nofollow">Keras Dataset</a> module provides a few preprocessed and vectorized datasets ready to use. The <i>IMDB Sentiment Analysis Dataset</i> contains already processed and tokenized sentences (each word has a unique ID) coupled with a label, either 1 indicating a positive sentiment or 0 for negative sentiment. To load it, we use the following Python code: </p> <div style="background: #ffffff; overflow:auto;width:auto;border:solid gray;border-width:.1em .1em .1em .8em;padding:.2em .6em;"><table><tr><td style="border-bottom: none;"><pre style="margin: 0; line-height: 125%">1 2 3 4 5 6 7 8 9 10 11</pre></td><td style="border-bottom: none;"><pre style="margin: 0; line-height: 125%"><span style="color: #008800; font-weight: bold">from</span> <span style="color: #bb0066; font-weight: bold">keras.datasets</span> <span style="color: #008800; font-weight: bold">import</span> imdb <span style="color: #008800; font-weight: bold">from</span> <span style="color: #bb0066; font-weight: bold">keras.preprocessing</span> <span style="color: #008800; font-weight: bold">import</span> sequence max_features = <span style="color: #0000DD; font-weight: bold">20000</span> <span style="color: #888888"># vocabulary size</span> maxlen = <span style="color: #0000DD; font-weight: bold">100</span> <span style="color: #888888"># max 
length of every input sequence</span> (x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=max_features) x_train, y_train = x_train[:<span style="color: #0000DD; font-weight: bold">2500</span>], y_train[:<span style="color: #0000DD; font-weight: bold">2500</span>] x_test, y_test = x_test[:<span style="color: #0000DD; font-weight: bold">1000</span>], y_test[:<span style="color: #0000DD; font-weight: bold">1000</span>] x_train = sequence.pad_sequences(x_train, maxlen=maxlen) x_test = sequence.pad_sequences(x_test, maxlen=maxlen) </pre></td></tr></table></div> <h2>Building the Model</h2> <p> The machine learning model we build is a typical Neural Network architecture used in many text classification tasks. It includes the following layers: </p> <ul> <li><a href="https://keras.io/api/layers/core_layers/embedding/" target="_blank" rel="noopener nofollow">Embedding layer</a> with hyperparameter <b><i>embedding_dim</i></b> indicating the dimensionality of the resulting embeddings;</li> <li><a href="https://keras.io/api/layers/regularization_layers/dropout/" target="_blank" rel="noopener nofollow">Dropout layer</a> with hyperparameter <b><i>dropout</i></b> indicating the dropout rate;</li> <li><a href="https://keras.io/api/layers/convolution_layers/convolution1d/" target="_blank" rel="noopener nofollow">1D Convolution</a> with hyperparameters <b><i>filters</i></b> and <b><i>kernel_size</i></b> defining the number of output channels and the width of the 1D kernel respectively;</li> <li><a href="https://keras.io/api/layers/recurrent_layers/lstm/" target="_blank" rel="noopener nofollow">bi-LSTM</a> layer with hyperparameter <b><i>lstm_output_size</i></b> for the dimensionality of the output and</li> <li><a href="https://keras.io/api/layers/core_layers/dense/" target="_blank" rel="noopener nofollow">Dense</a> layer with only one output and sigmoid activation.</li> </ul> <p> The following Python snippet demonstrates what we just described above: </p> <div 
style="background: #ffffff; overflow:auto;width:auto;border:solid gray;border-width:.1em .1em .1em .8em;padding:.2em .6em;"><table><tr><td style="border-bottom: none;"><pre style="margin: 0; line-height: 125%">1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30</pre></td><td style="border-bottom: none;"><pre style="margin: 0; line-height: 125%"><span style="color: #008800; font-weight: bold">from</span> <span style="color: #bb0066; font-weight: bold">keras.models</span> <span style="color: #008800; font-weight: bold">import</span> Sequential <span style="color: #008800; font-weight: bold">from</span> <span style="color: #bb0066; font-weight: bold">keras.layers</span> <span style="color: #008800; font-weight: bold">import</span> Activation, Bidirectional, Conv1D, Dense <span style="color: #008800; font-weight: bold">from</span> <span style="color: #bb0066; font-weight: bold">keras.layers</span> <span style="color: #008800; font-weight: bold">import</span> Dropout, Embedding, LSTM, MaxPooling1D <span style="color: #008800; font-weight: bold">def</span> <span style="color: #0066bb; font-weight: bold">make_model</span>( embedding_dim: <span style="color: #003388">int</span>, dropout: <span style="color: #003388">float</span>, filters: <span style="color: #003388">int</span>, kernel_size: <span style="color: #003388">int</span>, pool_size: <span style="color: #003388">int</span>, lstm_output_size: <span style="color: #003388">int</span>, metrics: <span style="color: #003388">list</span>, vocab_size: <span style="color: #003388">int</span>, maxlen: <span style="color: #003388">int</span>, ): model = Sequential( [ Embedding(vocab_size, embedding_dim, input_length=maxlen), Dropout(dropout), Conv1D(filters, kernel_size, padding=<span style="color: #dd2200; background-color: #fff0f0">&quot;valid&quot;</span>, activation=<span style="color: #dd2200; background-color: #fff0f0">&quot;relu&quot;</span>), MaxPooling1D(pool_size=pool_size), 
Bidirectional(LSTM(lstm_output_size), merge_mode=<span style="color: #dd2200; background-color: #fff0f0">&quot;ave&quot;</span>), Dense(<span style="color: #0000DD; font-weight: bold">1</span>), Activation(<span style="color: #dd2200; background-color: #fff0f0">&quot;sigmoid&quot;</span>), ] ) model.compile(optimizer=<span style="color: #dd2200; background-color: #fff0f0">&quot;adam&quot;</span>, loss=<span style="color: #dd2200; background-color: #fff0f0">&quot;binary_crossentropy&quot;</span>, metrics=metrics) <span style="color: #008800; font-weight: bold">return</span> model </pre></td></tr></table></div> <h2>Hyperparameter Search</h2> <p> To measure the impact of the hyperparameters we must define a set of <b>performance metrics</b>. By default, we track the training and validation loss, which in this case is the binary <a href="https://en.wikipedia.org/wiki/Cross_entropy" target="_blank" rel="noopener nofollow">cross-entropy</a>. On top of this, we will trace the <i>accuracy</i>, <i>precision</i>, and <i>recall</i>. In general, it is useful to benchmark the model on multiple metrics. Depending on the use case we might prioritize one over another and at the same time observe the dependency between them. 
</p> <div style="background: #ffffff; overflow:auto;width:auto;border:solid gray;border-width:.1em .1em .1em .8em;padding:.2em .6em;"><table><tr><td style="border-bottom: none;"><pre style="margin: 0; line-height: 125%">1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16</pre></td><td style="border-bottom: none;"><pre style="margin: 0; line-height: 125%"><span style="color: #008800; font-weight: bold">from</span> <span style="color: #bb0066; font-weight: bold">keras.metrics</span> <span style="color: #008800; font-weight: bold">import</span> BinaryAccuracy, Precision, Recall METRICS = [ BinaryAccuracy(name=<span style="color: #dd2200; background-color: #fff0f0">&#39;accuracy&#39;</span>), Precision(name=<span style="color: #dd2200; background-color: #fff0f0">&#39;precision&#39;</span>), Recall(name=<span style="color: #dd2200; background-color: #fff0f0">&#39;recall&#39;</span>), ] <span style="color: #888888"># metrics to track</span> <span style="color: #888888"># hyperparameters to track</span> embedding_size = [<span style="color: #0000DD; font-weight: bold">32</span>, <span style="color: #0000DD; font-weight: bold">128</span>] dropout = [<span style="color: #0000DD; font-weight: bold">0.01</span>, <span style="color: #0000DD; font-weight: bold">0.1</span>] filters = [<span style="color: #0000DD; font-weight: bold">16</span>, <span style="color: #0000DD; font-weight: bold">32</span>, <span style="color: #0000DD; font-weight: bold">64</span>] kernel_size = [<span style="color: #0000DD; font-weight: bold">3</span>, <span style="color: #0000DD; font-weight: bold">5</span>, <span style="color: #0000DD; font-weight: bold">7</span>] pool_size = [<span style="color: #0000DD; font-weight: bold">2</span>, <span style="color: #0000DD; font-weight: bold">4</span>] lstm_output_size = [<span style="color: #0000DD; font-weight: bold">16</span>, <span style="color: #0000DD; font-weight: bold">64</span>] batch_size = [<span style="color: #0000DD; font-weight: bold">8</span>, <span 
style="color: #0000DD; font-weight: bold">16</span>, <span style="color: #0000DD; font-weight: bold">32</span>] </pre></td></tr></table></div> <br/> <p> Once we have defined the hyperparameters to track coupled with the performance metrics, we can start the hyperparameter search by plugging-in various combinations of values. In this sense, we create a hypergrid from the hyperparameter values. For each point on this hypergrid, we train and evaluate the model. We can think of this as one <strong>experiment</strong>, which is usually the case. </p> <p> As we run the experiments, we log the model performance as a function of the hyperparameter values as one row in some external database or file. This is illustrated with the following snippet: </p> <div style="background: #ffffff; overflow:auto;width:auto;border:solid gray;border-width:.1em .1em .1em .8em;padding:.2em .6em;"><table><tr><td style="border-bottom: none;"><pre style="margin: 0; line-height: 125%">1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32</pre></td><td style="border-bottom: none;"><pre style="margin: 0; line-height: 125%"><span style="color: #008800; font-weight: bold">import</span> <span style="color: #bb0066; font-weight: bold">itertools</span> epochs = <span style="color: #0000DD; font-weight: bold">3</span> <span style="color: #888888"># number of training epochs</span> test_batch_size = <span style="color: #0000DD; font-weight: bold">32</span> <span style="color: #888888"># batch size for testing</span> arrays = [ embedding_size, dropout, filters, kernel_size, pool_size, lstm_output_size, batch_size, ] <span style="color: #888888"># all hyper-params</span> <span style="color: #008800; font-weight: bold">for</span> ed, d, flt, ks, ps, ls, bs <span style="color: #008800">in</span> itertools.product(*arrays): model = make_model( embedding_dim=ed, dropout=d, filters=flt, kernel_size=ks, pool_size=ps, lstm_output_size=ls, metrics=METRICS, vocab_size=max_features, 
maxlen=maxlen, ) h = model.fit(x_train, y_train, batch_size=bs, epochs=epochs, verbose=<span style="color: #0000DD; font-weight: bold">2</span>) train_loss = h.history[<span style="color: #dd2200; background-color: #fff0f0">&quot;loss&quot;</span>][-<span style="color: #0000DD; font-weight: bold">1</span>] test_metrics = model.evaluate(x=x_test, y=y_test, batch_size=test_batch_size) test_loss, test_acc, test_prec, test_rec = test_metrics <span style="color: #888888"># write everything to external JSON file</span> </pre></td></tr></table></div> <br/> <p> Now that we have generated metadata for our experiments, we have to make it actionable. </p> <h1>Visualize the Hyperparameters Impact</h1> <p> Data in raw format is difficult, sometimes impossible to interpret. This especially holds for multivariate data! </p> <p> We can easily resolve this by using the <a href="/blog/2020-02-08-interactive-dataviz/" target="_blank" rel="dofollow">parallel coordinates plot</a>. With this type of plot, the data dimensions (a.k.a. features) are represented by parallel axes, one per dimension. Thus, each multivariate point is manifested as a poly-line connecting the corresponding dimensions. At the same time, this plot encodes the correlation between the data dimensions: line crossings indicate inverse correlation. 
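</p>

<p> To build intuition for how such a plot is constructed, here is a minimal, self-contained sketch (independent of our experiments, with made-up column names and values) using the basic static <code>parallel_coordinates</code> helper that ships with pandas: </p>

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; no display required
import matplotlib.pyplot as plt
import pandas as pd
from pandas.plotting import parallel_coordinates

# toy multivariate data: every row becomes one poly-line
runs = pd.DataFrame({
    "embedding_dim": [32, 128, 32, 128],
    "dropout": [0.01, 0.01, 0.1, 0.1],
    "test_precision": [0.71, 0.78, 0.69, 0.80],
    "run": ["a", "b", "c", "d"],  # class column, used only for coloring
})

# each numeric column becomes one parallel axis
ax = parallel_coordinates(runs, "run")
plt.savefig("parallel_coordinates.png")
```

<p>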
One example of a parallel coordinates plot is shown below: </p> <center> <picture> <source data-srcset="https://isquared.digital/assets/images/regular/parallel_coordinates_example_regular.webp" media="(min-width: 1281px)" type="image/webp"/> <source data-srcset="https://isquared.digital/assets/images/regular/parallel_coordinates_example_regular.png" media="(min-width: 1281px)" type="image/png"/> <source data-srcset="https://isquared.digital/assets/images/tablet/parallel_coordinates_example_tablet.webp" media="(min-width: 641px) and (max-width: 1280px) and (orientation: landscape)" type="image/webp"/> <source data-srcset="https://isquared.digital/assets/images/tablet/parallel_coordinates_example_tablet.png" media="(min-width: 641px) and (max-width: 1280px) and (orientation: landscape)" type="image/png"/> <source data-srcset="https://isquared.digital/assets/images/tablet/parallel_coordinates_example_tablet.webp" media="(min-width: 641px) and (max-width: 1280px) and (orientation: portrait)" type="image/webp"/> <source data-srcset="https://isquared.digital/assets/images/tablet/parallel_coordinates_example_tablet.png" media="(min-width: 641px) and (max-width: 1280px) and (orientation: portrait)" type="image/png"/> <source data-srcset="https://isquared.digital/assets/images/mobile/parallel_coordinates_example_mobile.webp" media="(max-width: 640px)" type="image/webp"/> <source data-srcset="https://isquared.digital/assets/images/mobile/parallel_coordinates_example_mobile.png" media="(max-width: 640px)" type="image/png"/> <img data-src="https://isquared.digital/assets/images/regular/parallel_coordinates_example_regular.png" class="lazyload" alt="Parallel Coordinates Example plot"> </picture> <span class="caption text-muted"><i>Figure 1. Credits: A <a href="https://bl.ocks.org/jasondavies/raw/1341281/" target="_blank" rel=”noopener”>Parallel Coordinates</a> plot from <a href="https://bl.ocks.org/" target="_blank" rel=”noopener”>Blocks</a></i>. 
</span> </center> <br/> <p> For seamless creation of interactive <i>parallel coordinates plots</i>, we can use <a href="https://facebookresearch.github.io/hiplot/index.html" target="_blank" rel="noopener nofollow">HiPlot</a>, an open-source Python library. Given data that follows a consistent and predefined schema, it automatically generates a <i>parallel coordinates plot</i>. The plot can be easily integrated into a <a href="https://jupyter.org/" target="_blank" rel="noopener nofollow">Jupyter Notebook</a>, as a standalone HTML file, or directly in a <a href="https://streamlit.io/" target="_blank" rel="noopener nofollow">Streamlit</a> app. </p> <p> In our case, we have generated metadata for all experiments and we have to make it actionable. Ultimately, we want to see a summary of the hyperparameters' influence on the model performance and look at what suits our case. </p> <p> By just running two lines of code such as: </p> <div style="background: #ffffff; overflow:auto;width:auto;border:solid gray;border-width:.1em .1em .1em .8em;padding:.2em .6em;"><table><tr><td style="border-bottom: none;"><pre style="margin: 0; line-height: 125%">1 2</pre></td><td style="border-bottom: none;"><pre style="margin: 0; line-height: 125%"><span style="color: #008800; font-weight: bold">import</span> <span style="color: #bb0066; font-weight: bold">hiplot</span> <span style="color: #008800; font-weight: bold">as</span> <span style="color: #bb0066; font-weight: bold">hip</span> hip.Experiment.from_iterable(hiplt_data).display() </pre></td></tr></table></div> <br/> <p> we obtain this nice interactive plot as depicted below. Go on, give it a try and see how it works!
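</p>

<p> For reference, the <code>hiplt_data</code> object passed to <code>from_iterable</code> is nothing more than an iterable of flat dictionaries, one per experiment, whose keys become the parallel axes. A minimal sketch (the key names and values are illustrative, mirroring the hyperparameters and metrics logged in the experiment loop): </p>

```python
# one flat dict per experiment; every key becomes an axis in HiPlot
hiplt_data = [
    {"uid": 0, "embedding_dim": 32, "dropout": 0.01, "batch_size": 8,
     "train_loss": 0.41, "test_prec": 0.71},
    {"uid": 1, "embedding_dim": 128, "dropout": 0.1, "batch_size": 32,
     "train_loss": 0.35, "test_prec": 0.80},
]

# sanity check: every record exposes the same set of axes
axes = set(hiplt_data[0])
assert all(set(row) == axes for row in hiplt_data)
```

<p>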
</p> <iframe src="https://isquared.digital/assets/html/hiplot.html?hip.color_by=%22test_prec%22&amp;hip.PARALLEL_PLOT.height=350" height="860px" style="width:100%;"> </iframe> <br/> <p> The benefit of using this interactive plot is that we have a clear overview of all experiments which can be additionally indexed with a unique ID. </p> <p> In many practical situations, for model reproducibility reasons it is advisable to assign traceable IDs to the experiments, meaning we can always roll back and reproduce the same results. For example, if the model and the hyperparameter values are tracked with <i><a href="https://www.git-scm.com/book/en/v2/Getting-Started-What-is-Git%3F" target="_blank" rel="noopener nofollow">git</a></i>, as an experiment ID we can use the SHA code of the commit encapsulating the latest repository modifications before running the experiment. </p> <p> Obviously one disadvantage of this technique is the computational cost. Sometimes it is not affordable to run a plethora of experiments just to find the best hyperparameter values. However, over a longer time range, it is possible that the number of experiments will become significant. Therefore, in order not to lose any knowledge, it is still better to log the experiments and eventually visualize them with <i>HiPlot</i>. </p> <p> The source code for the implementation can be found on <a href="https://github.com/IlievskiV/Amusive-Blogging-N-Coding/blob/master/Hyperparameters%20Search/HiPlot_Tutorial.ipynb" target="_blank">GitHub</a>. If this is something you like and would like to see similar content you could follow me on <a href="https://www.linkedin.com/in/vilievski/" target="_blank" rel="noopener">LinkedIn</a> or <a href="https://twitter.com/VladOsaurus" target="_blank" rel="noopener">Twitter</a>. Additionally, you can subscribe to the mailing list below. 
</p> <link href="//cdn-images.mailchimp.com/embedcode/horizontal-slim-10_7.css" rel="stylesheet" type="text/css"> <link href="/assets/css/mailchimp.css"> <div id="mc_embed_signup"> <form action="https://digital.us19.list-manage.com/subscribe/post?u=cb9dbe40387c27177a25de80f&amp;id=08bda6f8e0" method="post" id="mc-embedded-subscribe-form" name="mc-embedded-subscribe-form" class="validate" target="_blank" novalidate> <div id="mc_embed_signup_scroll"> <label for="mce-EMAIL">Join the iSquared mailing list</label> <input type="email" value="" name="EMAIL" class="email" id="mce-EMAIL" placeholder="email address" required> <!-- real people should not fill this in and expect good things - do not remove this or risk form bot signups--> <div style="position: absolute; left: -5000px;" aria-hidden="true"><input type="text" name="b_cb9dbe40387c27177a25de80f_08bda6f8e0" tabindex="-1" value=""></div> <div class="clear"><input type="submit" value="Subscribe" name="subscribe" id="mc-embedded-subscribe" class="button"></div> </div> </form> </div> <h1>Summary</h1> <p> In this blog we learned how to make our machine learning experiments more useful with a visualization technique called <i>parallel coordinates plot.</i> </p> <p> We tracked and logged the performance of one simple Keras model depending on the hyperparameter values. 
Later we made this metadata actionable using <a href="https://facebookresearch.github.io/hiplot/index.html" target="_blank" rel="noopener nofollow">HiPlot</a>, an open-source Python library for creating interactive <i>parallel coordinates plots.</i> </p>Vladimir IlievskiTrack and visualize Machine Learning experiments using HiPlot Parallel Coordinates Plot in PythonSimple but Stunning: Animated Cellular Automata in Python2021-05-02T11:00:00+02:002021-05-02T11:00:00+02:00https://isquared.digital/blog/cellular-automata<script src="https://polyfill.io/v3/polyfill.min.js?features=es6"></script> <script id="MathJax-script" async src="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-mml-chtml.js"></script> <p> The automation epitomizes the last few decades of rapid technological development where many processes take place without human intervention. But what exactly does it mean? </p> <p> <b>Automaton</b> (noun): </p> <ul style="list-style-type: none;"> <li>: a mechanism that is relatively self-operating</li> <li>: a machine or control mechanism designed to follow a predetermined sequence of operations</li> </ul> <p> These are the two most common definitions of the word <b>automaton</b>, related to the word <b>automation</b>, <b>automatic</b> or <b>automatically</b>. </p> <p> In the very same sense, as you might already know, one <a href="https://en.wikipedia.org/wiki/Automaton" target="_blank" rel="noopener nofollow">automaton</a> is always in some precise and unequivocal state. It transitions to another precisely defined state given some instruction. We can visually represent any automata using a directed graph, where states are represented with nodes, while the transitions with edges. One very simple automaton is shown in the figure below: </p> <center> <img data-src="https://isquared.digital/assets/images/dfa_example.png" class="lazyload" alt="Graphical representation of an automaton"/> <br/> <span class="caption text-muted"> <i>Fig. 
1:</i> Simple Automaton with two states. Taken from Wikipedia. </span> </center> <br/> <p> This is the fundamental principle of the entire computing machinery we use today, from the simple coffee machine at your home to the very complex device on which you read this story. All these machines follow a set of instructions and transition from one state to another. </p> <p> Contrary to this deterministic definition of automata, what happens if we have some arbitrary program? What if we have a program that does not rely on receiving some precise set of instructions, but rather arbitrary ones depending on its current state? Something more closely related to the evolution of the living organisms in nature. </p> <p> I am not intending to reinvent the wheel here; many scientists asked the same questions a long time ago. This is only to give the intuition and motivation behind the conception and definition of the <b>cellular automata</b>. </p> <p> The <a href="https://en.wikipedia.org/wiki/Cellular_automaton" target="_blank" rel="nofollow noopener">cellular automaton</a> relies on and is motivated by the principles of the living organisms in nature. It consists of a grid of cells, each having some state. Every cell advances in time according to some mathematical function of the neighboring cells. Thus, the entire automaton behaves like a living organism. </p> <p> By the end of this blog post, you will be able to understand and see the potential of cellular automata. First, we'll delve into the formal definition and elaborate on some particular cases. Then, to complete the puzzle, we will see how to implement some interesting cellular automata in Python, complemented with animated visualizations using Matplotlib. Stay tuned!
</p> <h2>Definition of Cellular Automaton</h2> <p> A cellular automaton is fully defined by three key elements: </p> <ol> <li>A collection of <i>“colored”</i> cells, each having a state;</li> <li>The cells are distributed on a grid with a predefined shape and</li> <li>The cells update their state according to a rule based on the states of the neighboring cells.</li> </ol> <p> For instance, we can take the simplest case, in which the state of a cell is either on or off, or in other words either 1 or 0. </p> <p> The grid in general is represented as a rectangle with a shape of $$M \times N$$, meaning it has M rows and N columns. Then, each cell is represented as a small rectangle on the grid. </p> <p> Knowing all of this, we can define which cells are considered neighbors of a given cell. For a square grid, the two most common neighborhoods are: </p> <ol> <li> <a href="https://en.wikipedia.org/wiki/Moore_neighborhood" rel="nofollow noopener" target="_blank">Moore neighborhood</a> (square-shaped) as shown on the left of Fig. 2; </li> <li> <a href="https://en.wikipedia.org/wiki/Von_Neumann_neighborhood" rel="nofollow noopener" target="_blank">von Neumann neighborhood</a> (diamond-shaped) as shown on the right of Fig. 2. </li> </ol> <center> <img data-src="https://isquared.digital/assets/images/ca_neighborhoods.png" class="lazyload" alt="Types of neighborhoods for cellular automata"/> <br/> <span class="caption text-muted"> <i>Fig. 2:</i> Moore vs Von Neumann neighborhoods </span> </center> <br/> <p> Now, the update rule is simply a function of the current cell state and the states of the cells in its pre-defined neighborhood. The only missing piece of the puzzle is the initial state of the automaton, which might be deterministic or random. </p> <p> The initial condition, the 2-dimensional grid, the type of neighborhood, and the update rule give rise to infinitely many scenarios that are difficult to analyze.
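</p>

<p> The two neighborhoods can be made concrete with a short sketch in plain Python (the helper names are made up for illustration, and cells falling outside the grid are simply clipped at the border): </p>

```python
def moore_neighbors(r, c, rows, cols):
    """All cells at Chebyshev distance 1: the square-shaped neighborhood."""
    return [(i, j)
            for i in range(r - 1, r + 2)
            for j in range(c - 1, c + 2)
            if (i, j) != (r, c) and 0 <= i < rows and 0 <= j < cols]

def von_neumann_neighbors(r, c, rows, cols):
    """All cells at Manhattan distance 1: the diamond-shaped neighborhood."""
    return [(i, j)
            for (i, j) in [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
            if 0 <= i < rows and 0 <= j < cols]

# an interior cell on a 3x3 grid has 8 Moore and 4 von Neumann neighbors
print(len(moore_neighbors(1, 1, 3, 3)))        # 8
print(len(von_neumann_neighbors(1, 1, 3, 3)))  # 4
```

<p>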
For this reason, we’ll focus only on the simplest case of cellular automata, namely the <b>elementary cellular automata</b>. </p> <h2>Elementary Cellular Automata</h2> <p> The elementary cellular automata are the simplest non-trivial automata. They are one-dimensional, meaning there is only one row of cells. Each cell has two possible states (0 or 1), and its neighbors are the adjacent cells on either side of it, as shown in the figure below: </p> <center> <img data-src="https://isquared.digital/assets/images/elementary_ca.png" class="lazyload" alt="Schema for the elementary cellular automata"/> <br/> <span class="caption text-muted"> <i>Fig. 3:</i> Elementary cellular automaton </span> </center> <br/> <p> Every cell with its two neighboring cells forms a patch of 3 cells, each of which can have 2 states: either 0 or 1. A simple variation with repetition gives us $$2^{3} = 8$$ possible options: 000, 001, 010, 011, 100, 101, 110, 111 (the numbers from 0 to 7 in binary format). </p> <p> Using these 8 options, we can decide whether the next state of the central cell will take a value of 0 or 1. This is equivalent to asking the question: in how many possible ways can one arrange 8 bits? Same as before, this gives us $$2^{8} = 256$$ options for an update rule. Following the same analogy, these are the numbers from 0 to 255 in binary format. For this reason, the rules are referred to by their ordering number. </p> <p> For example, since $$150_{10} = 10010110_{2}$$, the update rule number 150 is illustrated in the figure below: </p> <center> <img data-src="https://isquared.digital/assets/images/example_rule_150.png" class="lazyload" alt="Sketch for the update rule 150 in elementary cellular automata"/> <br/> <span class="caption text-muted"> <i>Fig. 4:</i> Update rule 150 </span> </center> <br/> <p> Next, we’ll see how to turn this into a Python implementation and a visualization using Matplotlib.
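</p>

<p> The correspondence between a rule number and its 8 output bits can be checked with a small NumPy sketch (rule 150 is used here because it appears in the figure above): </p>

```python
import numpy as np

rule_number = 150
# the 8-bit pattern of the rule: one output bit per 3-cell neighborhood
rule_binary = np.binary_repr(rule_number, width=8)
print(rule_binary)  # '10010110'

# pair every neighborhood (111 down to 000) with its output bit
for i, bit in zip(range(7, -1, -1), rule_binary):
    print(np.binary_repr(i, width=3), '->', bit)
```

<p>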
</p> <h2>Python Implementation</h2> <p> The first thing to implement is the update rule. Given the state of each cell in the row at some time step <b>T</b> (denoted as <b>x</b>) and the update rule number, we need to derive the state of each cell in the next time step. To do so, we use the following code: </p> <div style="background: #ffffff; overflow:auto;width:auto;border:solid gray;border-width:.1em .1em .1em .8em;padding:.2em .6em;"><table><tr><td style="border-bottom: none;"><pre style="margin: 0; line-height: 125%"> 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20</pre></td><td style="border-bottom: none;"><pre style="margin: 0; line-height: 125%"><span style="color: #008800; font-weight: bold">import</span> <span style="color: #0e84b5; font-weight: bold">numpy</span> <span style="color: #008800; font-weight: bold">as</span> <span style="color: #0e84b5; font-weight: bold">np</span> powers_of_two <span style="color: #333333">=</span> np<span style="color: #333333">.</span>array([[<span style="color: #0000DD; font-weight: bold">4</span>], [<span style="color: #0000DD; font-weight: bold">2</span>], [<span style="color: #0000DD; font-weight: bold">1</span>]]) <span style="color: #888888"># shape (3, 1)</span> <span style="color: #008800; font-weight: bold">def</span> <span style="color: #0066BB; font-weight: bold">step</span>(x, rule_binary): <span style="color: #DD4422">&quot;&quot;&quot;Makes one step in the cellular automaton.</span> <span style="color: #DD4422"> Args:</span> <span style="color: #DD4422"> x (np.array): current state of the automaton</span> <span style="color: #DD4422"> rule_binary (np.array): the update rule</span> <span style="color: #DD4422"> Returns:</span> <span style="color: #DD4422"> np.array: updated state of the automaton</span> <span style="color: #DD4422"> &quot;&quot;&quot;</span> x_shift_right <span style="color: #333333">=</span> np<span style="color: #333333">.</span>roll(x, <span style="color: #0000DD; font-weight: 
bold">1</span>) <span style="color: #888888"># circular shift to right</span> x_shift_left <span style="color: #333333">=</span> np<span style="color: #333333">.</span>roll(x, <span style="color: #333333">-</span><span style="color: #0000DD; font-weight: bold">1</span>) <span style="color: #888888"># circular shift to left</span> y <span style="color: #333333">=</span> np<span style="color: #333333">.</span>vstack((x_shift_right, x, x_shift_left))<span style="color: #333333">.</span>astype(np<span style="color: #333333">.</span>int8) <span style="color: #888888"># stack row-wise, shape (3, cols)</span> z <span style="color: #333333">=</span> np<span style="color: #333333">.</span>sum(powers_of_two <span style="color: #333333">*</span> y, axis<span style="color: #333333">=</span><span style="color: #0000DD; font-weight: bold">0</span>)<span style="color: #333333">.</span>astype(np<span style="color: #333333">.</span>int8) <span style="color: #888888"># LCR pattern as number</span> <span style="color: #008800; font-weight: bold">return</span> rule_binary[<span style="color: #0000DD; font-weight: bold">7</span> <span style="color: #333333">-</span> z] </pre></td></tr></table></div> <br/> <p> First, we shift the state of each cell to the right, then to the left, in a circular fashion. Next, we stack one upon the other the right shift, the current state, and the left shift of the cells’ states. This gives us a structure where each column has three elements, each either 0 or 1: a cell’s left neighbor, the cell itself, and its right neighbor. Thus, one column represents one number from 0 to 7 in binary format. We use this value as an index into the update rule, which determines the next state of the central cell. The entire procedure is sketched in the figure below: </p> <center> <img data-src="https://isquared.digital/assets/images/rule_150_code_explanation.png" class="lazyload" alt="Sketch to explain the code"/> <br/> <span class="caption text-muted"> <i>Fig.
5:</i> Explanation of the function <i>step</i> </span> </center> <br/> <p> Having the update rule implemented, the rest of the implementation is quite straightforward. We have to initialize the cellular automaton and then run it for a pre-defined number of time steps. The Python implementation is given below: </p> <div style="background: #ffffff; overflow:auto;width:auto;border:solid gray;border-width:.1em .1em .1em .8em;padding:.2em .6em;"><table><tr><td style="border-bottom: none;"><pre style="margin: 0; line-height: 125%"> 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43</pre></td><td style="border-bottom: none;"><pre style="margin: 0; line-height: 125%"><span style="color: #008800; font-weight: bold">import</span> <span style="color: #0e84b5; font-weight: bold">numpy</span> <span style="color: #008800; font-weight: bold">as</span> <span style="color: #0e84b5; font-weight: bold">np</span> <span style="color: #008800; font-weight: bold">def</span> <span style="color: #0066BB; font-weight: bold">cellular_automaton</span>(rule_number, size, steps, init_cond<span style="color: #333333">=</span><span style="background-color: #fff0f0">&#39;random&#39;</span>, impulse_pos<span style="color: #333333">=</span><span style="background-color: #fff0f0">&#39;center&#39;</span>): <span style="color: #DD4422">&quot;&quot;&quot;Generate the state of an elementary cellular automaton after a pre-determined</span> <span style="color: #DD4422"> number of steps starting from some random state.</span> <span style="color: #DD4422"> Args:</span> <span style="color: #DD4422"> rule_number (int): the number of the update rule to use</span> <span style="color: #DD4422"> size (int): number of cells in the row</span> <span style="color: #DD4422"> steps (int): number of steps to evolve the automaton</span> <span style="color: #DD4422"> init_cond (str): either random or impulse. 
If random every cell</span> <span style="color: #DD4422"> in the row is activated with prob. 0.5. If impulse only one cell</span> <span style="color: #DD4422"> is activated.</span> <span style="color: #DD4422"> impulse_pos (str): if init_cond is impulse, activate the</span> <span style="color: #DD4422"> left-most, central or right-most cell.</span> <span style="color: #DD4422"> Returns:</span> <span style="color: #DD4422"> np.array: the final state of the automaton</span> <span style="color: #DD4422"> &quot;&quot;&quot;</span> <span style="color: #008800; font-weight: bold">assert</span> <span style="color: #0000DD; font-weight: bold">0</span> <span style="color: #333333">&lt;=</span> rule_number <span style="color: #333333">&lt;=</span> <span style="color: #0000DD; font-weight: bold">255</span> <span style="color: #008800; font-weight: bold">assert</span> init_cond <span style="color: #000000; font-weight: bold">in</span> [<span style="background-color: #fff0f0">&#39;random&#39;</span>, <span style="background-color: #fff0f0">&#39;impulse&#39;</span>] <span style="color: #008800; font-weight: bold">assert</span> impulse_pos <span style="color: #000000; font-weight: bold">in</span> [<span style="background-color: #fff0f0">&#39;left&#39;</span>, <span style="background-color: #fff0f0">&#39;center&#39;</span>, <span style="background-color: #fff0f0">&#39;right&#39;</span>] rule_binary_str <span style="color: #333333">=</span> np<span style="color: #333333">.</span>binary_repr(rule_number, width<span style="color: #333333">=</span><span style="color: #0000DD; font-weight: bold">8</span>) rule_binary <span style="color: #333333">=</span> np<span style="color: #333333">.</span>array([<span style="color: #007020">int</span>(ch) <span style="color: #008800; font-weight: bold">for</span> ch <span style="color: #000000; font-weight: bold">in</span> rule_binary_str], dtype<span style="color: #333333">=</span>np<span style="color: #333333">.</span>int8) x <span style="color: 
#333333">=</span> np<span style="color: #333333">.</span>zeros((steps, size), dtype<span style="color: #333333">=</span>np<span style="color: #333333">.</span>int8) <span style="color: #008800; font-weight: bold">if</span> init_cond <span style="color: #333333">==</span> <span style="background-color: #fff0f0">&#39;random&#39;</span>: <span style="color: #888888"># random init of the first step</span> x[<span style="color: #0000DD; font-weight: bold">0</span>, :] <span style="color: #333333">=</span> np<span style="color: #333333">.</span>array(np<span style="color: #333333">.</span>random<span style="color: #333333">.</span>rand(size) <span style="color: #333333">&lt;</span> <span style="color: #6600EE; font-weight: bold">0.5</span>, dtype<span style="color: #333333">=</span>np<span style="color: #333333">.</span>int8) <span style="color: #008800; font-weight: bold">if</span> init_cond <span style="color: #333333">==</span> <span style="background-color: #fff0f0">&#39;impulse&#39;</span>: <span style="color: #888888"># starting with an initial impulse</span> <span style="color: #008800; font-weight: bold">if</span> impulse_pos <span style="color: #333333">==</span> <span style="background-color: #fff0f0">&#39;left&#39;</span>: x[<span style="color: #0000DD; font-weight: bold">0</span>, <span style="color: #0000DD; font-weight: bold">0</span>] <span style="color: #333333">=</span> <span style="color: #0000DD; font-weight: bold">1</span> <span style="color: #008800; font-weight: bold">elif</span> impulse_pos <span style="color: #333333">==</span> <span style="background-color: #fff0f0">&#39;right&#39;</span>: x[<span style="color: #0000DD; font-weight: bold">0</span>, size <span style="color: #333333">-</span> <span style="color: #0000DD; font-weight: bold">1</span>] <span style="color: #333333">=</span> <span style="color: #0000DD; font-weight: bold">1</span> <span style="color: #008800; font-weight: bold">else</span>: x[<span style="color: #0000DD; font-weight: 
bold">0</span>, size <span style="color: #333333">//</span> <span style="color: #0000DD; font-weight: bold">2</span>] <span style="color: #333333">=</span> <span style="color: #0000DD; font-weight: bold">1</span> <span style="color: #008800; font-weight: bold">for</span> i <span style="color: #000000; font-weight: bold">in</span> <span style="color: #007020">range</span>(steps <span style="color: #333333">-</span> <span style="color: #0000DD; font-weight: bold">1</span>): x[i <span style="color: #333333">+</span> <span style="color: #0000DD; font-weight: bold">1</span>, :] <span style="color: #333333">=</span> step(x[i, :], rule_binary) <span style="color: #008800; font-weight: bold">return</span> x </pre></td></tr></table></div> <br/> <p> Now, using this code we can easily plot the evolution of one cellular automaton over time. For a cellular automaton that follows rule number 60, with only the left-most cell active at the beginning, the evolution of the automaton in the first 60 steps is depicted in the figure below: </p> <center> <img data-src="https://isquared.digital/assets/images/elementary_ca_rule_60.png" class="lazyload" alt="Plot of the evolution of one cellular automaton"/> <br/> <span class="caption text-muted"> <i>Fig. 6:</i> Evolution of an elementary cellular automaton </span> </center> <br/> <h2>Animated Visualization</h2> <p> Let’s put one cellular automaton in action and see what it looks like as it evolves in time. In order to do this, we use the Matplotlib Animation API to create an animation of the evolution process. We will use rule number 90, starting with one active cell at the center of the row.
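</p>

<p> A minimal sketch of such an animation is given below (it re-implements the <i>step</i> function from above so the snippet is self-contained; the grid size, number of steps, window length, colormap, and frame interval are arbitrary choices, not the exact settings used for the GIF): </p>

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation

# compact re-implementation of the step function from above
powers_of_two = np.array([[4], [2], [1]])

def step(x, rule_binary):
    # columns of y hold (left neighbor, cell, right neighbor)
    y = np.vstack((np.roll(x, 1), x, np.roll(x, -1))).astype(np.int8)
    z = np.sum(powers_of_two * y, axis=0)
    return rule_binary[7 - z]

size, steps, window = 101, 200, 60
rule_binary = np.array([int(ch) for ch in np.binary_repr(90, width=8)], dtype=np.int8)

# evolve rule 90 from a single active cell in the center
x = np.zeros((steps, size), dtype=np.int8)
x[0, size // 2] = 1
for i in range(steps - 1):
    x[i + 1] = step(x[i], rule_binary)

fig, ax = plt.subplots(figsize=(6, 4))
img = ax.imshow(x[:window], cmap='Greys', interpolation='nearest')
ax.axis('off')

def update(frame):
    # slide the observation window one row per frame
    img.set_data(x[frame:frame + window])
    return (img,)

anim = FuncAnimation(fig, update, frames=steps - window, interval=50, blit=True)
plt.show()
```

<p>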
</p> <p> If we observe the evolution of the cellular automaton through a sliding window, we get the following animation: </p> <center> <img data-src="https://isquared.digital/assets/images/elementary_ca_animation.gif" class="lazyload" alt="Animation of the evolution of one cellular automaton"/> <br/> <span class="caption text-muted"> <i>Fig. 7:</i> Animation of the evolution of an elementary cellular automaton </span> </center> <br/> <p> The entire source code for the implementation of the elementary cellular automata can be found on <a href="https://github.com/IlievskiV/Amusive-Blogging-N-Coding/blob/master/Cellular%20Automata/cellular_automata.ipynb" target="_blank">GitHub</a>. </p> <p> If this is something you like and would like to receive similar posts, please subscribe to the mailing list below. For more information, please follow me on <a href="https://www.linkedin.com/in/vilievski/" target="_blank" rel="noopener">LinkedIn</a> or <a href="https://twitter.com/VladOsaurus" target="_blank" rel="noopener">Twitter</a>.
</p> <h2>Summary: Elementary but not simple</h2> <p> It is interesting to see the repeating pattern of triangles over the evolution of the cellular automaton above. However, this is not always the case. For example, it is proven that <a href="https://en.wikipedia.org/wiki/Rule_110" target="_blank" rel="nofollow noopener">rule number 110</a> yields a <a href="https://en.wikipedia.org/wiki/Turing_completeness" target="_blank" rel="nofollow noopener">Turing-complete machine</a> capable of universal computation, which is pretty stunning for such a simple system. </p> <p> To conclude, the elementary cellular automata, and cellular automata in general, are a very interesting phenomenon in the world of computing. With a simple set of rules motivated by the evolution of living organisms, it is possible to construct universal machines capable of computing anything that is computable.
</p> <h2>References</h2> <p>  Stephen Wolfram, <a href="https://www.wolframscience.com/nks/" target="_blank" rel="noopener">"A New Kind of Science"</a> (2002), Wolfram Media<br/> </p>Vladimir IlievskiIllustration, implementation and animation of the elementary cellular automata in Python with MatplotlibThe new normal that changes the way we do AI. Here is how, with illustrated examples2021-01-14T10:00:00+01:002021-01-14T10:00:00+01:00https://isquared.digital/blog/ai-new-normal<p> After two days of intense debate, the United Methodist Church has agreed to a historic split - one that is expected to end in the creation of a new denomination, one that will be "theologically and socially conservative," according to The Washington Post. </p> <p> You might be wondering what this bizarre and trifling text has to do with AI; in fact, it has quite a lot. It is one of the news articles generated by the biggest and most sophisticated neural network to date, namely <a href="https://arxiv.org/pdf/2005.14165.pdf" target="_blank" rel="noopener nofollow">GPT-3</a>. If the text befuddles you, don’t worry, you’re not the only one. Only 12% of the pundits got it right. That’s not a typo. GPT-3 mastered many tasks that were considered human-only, setting the bar ever higher. </p> <p> In the industry, there is a burgeoning need for intelligent systems that work out-of-the-box on delicately specified tasks. Be it analyzing different text sources to predict the stock market tendency, detecting fake news, or simply answering questions like what color is Kardashian's hair (go for it, ask <a href="https://www.google.com/search?q=what+color+is+Kardashian%27s+hair&oq=what+color+is+Kardashian%27s+hair&aqs=chrome..69i57j0i22i30i457j0i22i30l6.417j0j7&sourceid=chrome&ie=UTF-8" target="_blank" rel="noopener nofollow">Google</a> or Bing).
We can <a href="https://openai.com/blog/openai-api/" target="_blank" rel="noopener">use GPT-3 as a pre-trained model</a> and make it learn any task we want: generating poems, code, spreadsheets, even some simple mobile and web applications. </p> <p> Building and training such a pervasive and intelligent NLP system from scratch is, in principle, no problem at all: we only need a few million dollars to train the final version, not counting all the trials and errors along the way. The estimated <a href="https://lambdalabs.com/blog/demystifying-gpt-3/" target="_blank" rel="noopener nofollow">sum for training GPT-3 is \$4.6 million</a>. Now, that is a problem. Not everyone can afford such a luxury, especially the young start-ups aiming to build disruptive NLP-based applications. </p> <p> With such an elusive task, it doesn’t make sense to start from scratch and fail over and over again. Instead, we need to adopt a new paradigm and take advantage of pre-trained models, a strategy that is still not fully adopted in the industry. </p> <p> In the pursuit of this idea, through illustrated examples, I’m going to show you what lies at the heart of what we as machine learning engineers want to achieve: how to build better AI applications. </p> <h2>Transformers: the abolishing of recurrence</h2> <p> Generally in machine learning, there are two camps of practitioners: those who admire convolutional neural networks (CNNs) and those who admire recurrent neural networks (RNNs). Of course, there is an interplay between them in some scenarios. </p> <p> Certainly, the RNNs are the hallmark of NLP, and for quite a good reason. NLP is a sequence task, where words have a strict order and provide cues for the ones that follow. Due to this temporal nature, the most obvious choice is to use RNNs. However, there are numerous impediments to their successful employment.
First of all, there is the everlasting problem of <a href="https://www.dlology.com/blog/how-to-deal-with-vanishingexploding-gradients-in-keras/" target="_blank" rel="noopener nofollow">vanishing and exploding gradients</a>; RNNs are also extremely slow to train and, on top of everything, hard to parallelize. </p> <p> Starting from the hypothesis that words appearing in similar contexts have a similar meaning, the <a href="https://en.wikipedia.org/wiki/Word2vec" target="_blank" rel="noopener nofollow">word2vec</a> word embeddings were coined. This was a major step toward adopting pre-trained models at scale in NLP, besides the ability to mathematically conclude that the words “cat” and “dog” are closely related. </p> <p> This shift culminated with the invention of the fully attention-based <a href="https://ai.googleblog.com/2017/08/transformer-novel-neural-network.html" target="_blank" rel="noopener">Transformer architecture</a> pre-trained in a self-supervised manner. The model is summarized in the figure below. The scope of this text is too limited for an exhaustive treatment of the Transformer, so I refer readers to the <a href="https://nlp.seas.harvard.edu/2018/04/03/attention.html" target="_blank" rel="noopener">excellent blog “The Annotated Transformer”</a>. </p> <center> <img data-src="https://isquared.digital/assets/images/transformer_model.png" class="lazyload" alt="Sketch of the Transformer model"/> <br/> <span class="caption text-muted"> <i>Fig. 1:</i> Transformer Model </span> </center> <br/> <p> The main message here is: this combination of concepts propelled <strong>the paradigm shift of re-using pre-trained models and slowly abolishing the use of RNNs in NLP</strong>. As never before, with a decent amount of data and processing power it is now possible to craft a custom, top-notch NLP module.
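</p>

<p> The claim that we can “mathematically conclude” that “cat” and “dog” are related boils down to comparing their embedding vectors, typically with cosine similarity. A toy sketch (the 4-dimensional vectors are invented purely for illustration; real word2vec embeddings have hundreds of dimensions learned from data): </p>

```python
import numpy as np

# made-up toy embeddings, for illustration only
embeddings = {
    'cat':   np.array([0.9, 0.8, 0.1, 0.0]),
    'dog':   np.array([0.8, 0.9, 0.2, 0.1]),
    'stock': np.array([0.1, 0.0, 0.9, 0.8]),
}

def cosine_similarity(u, v):
    # cosine of the angle between two vectors: 1 = same direction, 0 = orthogonal
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

print(cosine_similarity(embeddings['cat'], embeddings['dog']))    # close to 1
print(cosine_similarity(embeddings['cat'], embeddings['stock']))  # close to 0
```

<p>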
</p> <p> This zoo of ready-to-use pre-trained Transformer models is becoming the <strong>“new normal”</strong>, creating an exquisite platform, an ecosystem in which many ideas can grow and blossom. </p> <p> Let’s get to the nuts and bolts and see how to take advantage of this impetus and start creating better NLP apps. </p> <h2>Enter BERT</h2> <p> No, not the one from <i>Sesame Street</i>, but the Transformer-based model called “Bidirectional Encoder Representations from Transformers”, or <strong>BERT</strong> for short. </p> <p> The <a href="https://arxiv.org/abs/1810.04805" target="_blank" rel="noopener nofollow">original paper</a> was published in 2018 along with the <a href="https://github.com/google-research/bert" target="_blank" rel="noopener nofollow">open-source implementation</a> distributed with a few already pre-trained models. Along with the <a href="https://openai.com/blog/language-unsupervised/" target="_blank" rel="noopener">first version of GPT</a>, it was one of the first models of this kind. After this, an entire ecosystem of pre-trained models accessible through a programming interface has sprung up. </p> <p> Conceptually, the BERT model is quite simple: it is a stack of Transformer Encoders, as depicted in the figure below: </p> <center> <img data-src="https://isquared.digital/assets/images/bert_model.png" class="lazyload" alt="Sketch of the BERT model"/> <br/> <span class="caption text-muted"> <i>Fig. 2:</i> BERT Model </span> </center> <br/> <p> This <i>puppetware</i>, as it is popularly referred to, was created with one goal in mind: <strong>to be a general task-agnostic pre-trained model that can be fine-tuned on many downstream tasks</strong>. To achieve this, the input/output structure and the pre-training procedure are designed to be complementary and flexible enough for a wide variety of downstream tasks.
One thing is for sure, and that is: </p> <blockquote style="font-size: 24px;"> “In order to use a pre-trained BERT model properly, the most important task is to understand the expected input and output.” </blockquote> <h3>Input/Output</h3> <p> Both the input and output are on the level of individual tokens: for each token there is a corresponding vector of size <strong>H</strong>. </p> <p> The input consists of two textual segments <strong>A</strong> and <strong>B</strong> separated by a special token designated as <strong>[SEP]</strong>. Additionally, there is always one special token at the beginning denoted as <strong>[CLS]</strong>. Having this input structure, the final input token representation is expressed as a sum of three embeddings: <a href="https://arxiv.org/pdf/1609.08144.pdf" target="_blank" rel="noopener nofollow">WordPiece embeddings</a>, segment embeddings (token belongs to segment <strong>A</strong> or <strong>B</strong>), and positional embeddings (global position encoding of the token). There is a particular reason behind this mixture of three different embeddings; however, the most important thing to remember is that each token is now projected to a dimension of size <strong>H</strong>. This dimension persists through the entire BERT model. All of this is illustrated in the figure below: </p> <center> <img data-src="https://isquared.digital/assets/images/bert_input_details.png" class="lazyload" alt="Sketch of the BERT model inputs and outputs"/> <br/> <span class="caption text-muted"> <i>Fig. 3:</i> BERT Input and Output </span> </center> <br/> <p> The output is a learned representation of size <strong>H</strong> for each input token. <strong>All of the outputs are left for the engineers to use, depending on the use case</strong>. First, the output representation for the special token <strong>[CLS]</strong> can be used for <i>any</i> text classification task.
Second, the output representations of the actual word tokens can be used in <i>any</i> language understanding assignment. </p> <h3>Pre-training</h3> <p> The pre-training is based on two techniques: 1. Masked Language Modeling (MLM) and 2. Next Sentence Prediction (NSP). </p> <p> The <strong>MLM</strong> task uses the same assumption as in <i>word2vec</i>: words that appear in the same context have a similar meaning. Thus, it selects 15% of the input words at random for possible masking. Then, 80% of them are masked, 10% are replaced with a random word and the last 10% are left unchanged. Finally, the BERT outputs for the randomly selected words are passed to a <i>softmax</i> over the entire vocabulary. </p> <p> The <strong>NSP</strong> task extends the scope of the <strong>MLM</strong> task by capturing dependencies between sentences. To accomplish this, 50% of the time segment B is the actual next segment after A, and the remaining 50% of the time it is a randomly chosen segment. Both tasks are shown in the figure below: </p> <center> <img data-src="https://isquared.digital/assets/images/bert_mlm_nsp.png" class="lazyload" alt="Sketch of the BERT pre-training"/> <br/> <span class="caption text-muted"> <i>Fig. 4:</i> BERT pre-training </span> </center> <br/> <p> In the original paper, there are 2 pre-trained models: 1. BERT-Base, containing 110 million parameters and 2. BERT-Large, containing 340 million parameters. It is no wonder that this model knows almost everything. Both models are pre-trained using the <a href="https://github.com/soskek/bookcorpus" target="_blank" rel="noopener nofollow">Book Corpus dataset</a> (800 million words) and the entire English Wikipedia (2500 million words). </p> <h2>Basic usage of BERT</h2> <p> As noted earlier, there is a trove of open-source and pre-trained BERT models ready to be used by almost everyone.
One such amazing repository is the one offered by <a href="https://huggingface.co/transformers/model_doc/bert.html" target="_blank" rel="noopener nofollow">Huggingface</a> 🤗, and trust me, it is quite straightforward to take advantage of this mighty machinery. The code below demonstrates this: </p> <div style="background: #ffffff; overflow:auto;width:auto;border:solid gray;border-width:.1em .1em .1em .8em;padding:.2em .6em;"> <table> <tr> <td style="border-bottom: none;"><pre style="margin: 0; line-height: 125%"> 1 2 3 4 5 6 7 8 9 10 11</pre> </td> <td style="border-bottom: none;"><pre style="margin: 0; line-height: 125%"><span style="color: #008800; font-weight: bold">from</span> <span style="color: #bb0066; font-weight: bold">transformers</span> <span style="color: #008800; font-weight: bold">import</span> BertTokenizer, BertModel <span style="color: #008800; font-weight: bold">import</span> <span style="color: #bb0066; font-weight: bold">torch</span> tokenizer = BertTokenizer.from_pretrained(<span style="color: #dd2200; background-color: #fff0f0">&#39;bert-base-uncased&#39;</span>) model = BertModel.from_pretrained(<span style="color: #dd2200; background-color: #fff0f0">&#39;bert-base-uncased&#39;</span>) <span style="color: #888888"># the tokenizer adds the special [CLS] and [SEP] tokens automatically</span> inputs = tokenizer(<span style="color: #dd2200; background-color: #fff0f0">&quot;This is very awesome!&quot;</span>, return_tensors=<span style="color: #dd2200; background-color: #fff0f0">&quot;pt&quot;</span>) outputs = model(**inputs) <span style="color: #888888"># the learned representation for the [CLS] token</span> cls = outputs.last_hidden_state[<span style="color: #0000DD; font-weight: bold">0</span>, <span style="color: #0000DD; font-weight: bold">0</span>, :] </pre> </td> </tr> </table> </div> <br/> <p> Now, let’s see what kind of NLP applications we can develop with BERT. </p> <h2>BERT for Text Classification</h2> <p> Sentiment analysis epitomizes text classification, but its span is much wider.
Text classification covers a wide range of problems, from classifying single sentences and reviews to categorizing entire documents. </p> <p> The task of text classification is to take some text corpus and automatically assign a label to it. It might be handy in many different situations, as summarized in the following section. </p> <h3>Use Cases</h3> <p> Text classification can be used to automate the following tasks, among others: <ul> <li> <strong>Sentiment Analysis:</strong> detect subjectivity and polarity in a text. It is beneficial to understand our customers, whether or not they are satisfied with our service. </li> <li> <strong>Intent classification:</strong> understand the topic of the user’s utterance. This can be helpful in our chatbot or to automatically route the request to the right agents. </li> <li> <strong>Document categorization:</strong> automatically assign labels to textual documents. This can improve document retrieval in our product or organization, taking into consideration the rule of thumb that nearly 80% of corporate information exists in textual format. </li> <li> <strong>Language detection:</strong> as trivial as it may sound, sometimes it is crucial to first detect the language in which a given sentence is written. </li> <li> <strong>Customer categorization:</strong> group social media users into cohorts. This is important for the marketing teams for segmenting the different classes of potential customers. </li> </ul> </p> <h3>How to do it with BERT</h3> <p> If you need any kind of text classification, look no further. With the help of BERT or BERT-like models already at our disposal, we can craft reliable and functional text classification systems. 
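<p> As a toy sketch of what such a system boils down to (our own illustration, not Huggingface code; the weights below are random stand-ins for a fine-tuned classification head, and 768 is the hidden size of BERT-Base): </p>

```python
import numpy as np

def softmax(z):
    z = z - z.max()  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def classify_cls(cls_embedding, W, b):
    """Toy classification head: a single linear layer plus softmax
    on top of the [CLS] embedding produced by BERT."""
    return softmax(W @ cls_embedding + b)

rng = np.random.default_rng(0)
hidden, n_classes = 768, 3             # BERT-Base hidden size, 3 labels
cls = rng.normal(size=hidden)          # stand-in for the real [CLS] output
W = rng.normal(scale=0.02, size=(n_classes, hidden))
b = np.zeros(n_classes)

probs = classify_cls(cls, W, b)        # probability over the 3 classes
```

<p> In practice, <code>W</code> and <code>b</code> are learned during fine-tuning, together with the rest of the BERT parameters. </p>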
The figure below demonstrates how we can easily adapt a pre-trained BERT model for any text classification task: </p> <center> <img data-src="https://isquared.digital/assets/images/bert_text_classification.png" class="lazyload" alt="Sketch of the BERT for text classification"/> <br/> <span class="caption text-muted"> <i>Fig. 5:</i> BERT for text classification </span> </center> <br/> <p> We just need to take a sentence and prepend the special <strong>[CLS]</strong> token to it. After feeding the sentence into the BERT model, we use only the first output corresponding to the <strong>[CLS]</strong> token and discard the rest of the output. </p> <h3>One excellent resource</h3> <p> The paper <a href="https://arxiv.org/abs/1905.05583" target="_blank" rel="noopener nofollow">How to Fine-Tune BERT for Text Classification</a>, along with the underlying <a href="https://github.com/xuyige/BERT4doc-Classification" target="_blank" rel="noopener nofollow">GitHub repository</a>, represents an excellent starting point for text classification. </p> <h2>BERT for Named-Entity Recognition (NER)</h2> <p> “Floyd revolutionized rock with the Wall” - in this sentence we all know that the word <strong>“rock”</strong> refers to the rock genre of music instead of the geological object, otherwise the sentence would not make any sense. </p> <p> This is exactly what the <a href="https://en.wikipedia.org/wiki/Entity_linking" target="_blank" rel="noopener nofollow">Named-Entity Recognition</a> (NER) task does: link a word or group of words to a unique entity or class, depending on the context. Once we know all the entities, we can link them to a knowledge base or a database where we can find more info about them. In other words, we are extracting data about the data. 
</p> <h3>Use Cases</h3> <p> The scope of <strong>NER</strong> is broad; these are the main fields where it is applied: <ul> <li> <strong>Information retrieval:</strong> by discovering the entities behind the words, we can understand semantic search queries much better. This can help us to find more relevant search results. </li> <li> <strong>Building better chatbots:</strong> understand the users better. In fact, chatbots rely on NER: based on the extracted entities, they can search knowledge bases, databases, and return relevant answers driving the conversation in the right direction. </li> <li> <strong>Knowledge extraction:</strong> make unstructured data usable. As most of the information exists in textual format, extracting value from it becomes an essential task. </li> </ul> </p> <h3>How to do it with BERT</h3> <p> There are many different techniques for tackling the NER task, including highly specialized neural network architectures. With the invention of BERT and BERT-like systems, crafting NER systems is quite straightforward, as illustrated below: </p> <center> <img data-src="https://isquared.digital/assets/images/bert_ner.png" class="lazyload" alt="Sketch of the BERT for NER"/> <br/> <span class="caption text-muted"> <i>Fig. 6:</i> BERT for Named-Entity Recognition </span> </center> <br/> <p> The tokenized sentence at the input is fed into the pre-trained BERT model. To determine the entities to which the words belong, we use the BERT learned representation for the words and feed them into a classifier. </p> <h3>One excellent resource</h3> <p> NER is one of the benchmarking tasks in the original paper, thus it is possible to use the original repository. On top of this, there is a plethora of repositories solving NER with BERT in a similar way, out of which the most comprehensive is <a href="https://github.com/kamalkraj/BERT-NER" target="_blank" rel="noopener nofollow">this GitHub repository</a>, giving a way to put it in production immediately. 
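<p> The per-token classification just described can be sketched as a toy example (ours, not production code; the weights are random stand-ins for a fine-tuned head and the tag set is made up for illustration): </p>

```python
import numpy as np

LABELS = ["O", "B-PER", "B-ORG", "B-LOC"]  # made-up tag set for illustration

def tag_tokens(token_embeddings, W, b):
    """Toy NER head: classify every token representation independently
    with a shared linear layer, then pick the highest-scoring tag."""
    logits = token_embeddings @ W.T + b        # shape: (seq_len, n_labels)
    return [LABELS[i] for i in logits.argmax(axis=1)]

rng = np.random.default_rng(1)
seq_len, hidden = 6, 768                       # 6 tokens, BERT-Base hidden size
token_reprs = rng.normal(size=(seq_len, hidden))  # stand-in for BERT outputs
W = rng.normal(scale=0.02, size=(len(LABELS), hidden))
b = np.zeros(len(LABELS))

tags = tag_tokens(token_reprs, W, b)           # one tag per input token
```

<p> In a real system, the classifier weights are fine-tuned on an annotated NER corpus such as CoNLL-2003. </p>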
</p> <h2>BERT for Extractive Text Summarization</h2> <p> Imagine having a long text without an abstract, or a news article without a headline. The first thing you would do in this case is skim through the text to understand what it is about. This mundane task could easily be bypassed if there were an automatic summary-extraction system. </p> <p> The task of extractive text summarization is to automatically select the most salient and informative sentences from a given text. This is quite convenient in many different scenarios, as described further down. </p> <h3>Use Cases</h3> <p> Automatic text summarization can be applied for everything related to long documents: <ul> <li> <strong>News summarization:</strong> summarizing the overwhelming amount of everyday news articles. </li> <li> <strong>Legal contract analysis:</strong> summarizing the excruciatingly confusing and long legal documents in order to understand them in simple words. </li> <li> <strong>Marketing and SEO:</strong> crawl and summarize the content of the competitors to understand it better. </li> <li> <strong>Financial reports analysis:</strong> extract meaningful information from the financial news and reports for the sake of making better decisions. </li> </ul> </p> <h3>How to do it with BERT</h3> <p> Performing extractive text summarization with BERT might be tricky since it is not one of the tasks for which BERT was designed to be a pre-trained model. Despite this, the BERT model is flexible enough to be customized to work in this scenario. We show how to do this in the illustration below: </p> <center> <img data-src="https://isquared.digital/assets/images/bert_summarization.png" class="lazyload" alt="Sketch of the BERT for Summarization"/> <br/> <span class="caption text-muted"> <i>Fig. 7:</i> BERT for Extractive Text Summarization </span> </center> <br/> <p> In this case, we operate on the sentence level, although each sentence is still treated as a list of tokens. 
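<p> Anticipating the input construction described next, the composite input can be sketched in a few lines (our own simplification of the BertSum-style preprocessing, not the paper's actual code): </p>

```python
def build_summarization_input(sentences):
    """Wrap each tokenized sentence with [CLS]/[SEP] and assign
    alternating segment ids (0 for segment A, 1 for segment B)."""
    tokens, segments, cls_positions = [], [], []
    for i, sent in enumerate(sentences):
        cls_positions.append(len(tokens))        # where this sentence's [CLS] lands
        wrapped = ["[CLS]"] + sent + ["[SEP]"]
        tokens.extend(wrapped)
        segments.extend([i % 2] * len(wrapped))  # alternate A/B per sentence
    return tokens, segments, cls_positions

sents = [["an", "example"], ["another", "one"], ["third", "sentence"]]
tokens, segments, cls_pos = build_summarization_input(sents)
```

<p> The positions in <code>cls_pos</code> mark the outputs that are later scored to decide which sentences make it into the summary. </p>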
To select what sentences are important, we enclose each sentence with a <strong>[CLS]</strong> token on the left and a <strong>[SEP]</strong> token on the right. To make the neighboring sentences depend on each other, we explicitly provide segmentation tokens. For each sentence, we alternately switch between <strong>segment A</strong> and <strong>segment B</strong>. </p> <p> After feeding this composite input into the pre-trained BERT model, we only use the output representation for each of the <strong>[CLS]</strong> tokens to select the best sentences that summarize the text. </p> <h3>One excellent resource</h3> <p> The paper entitled <a href="https://arxiv.org/abs/1908.08345" target="_blank" rel="noopener nofollow">Text Summarization with Pre-Trained Encoders</a>, along with its associated <a href="https://github.com/nlpyang/BertSum" target="_blank" rel="noopener nofollow">GitHub repository</a>, presents the system called <i>BertSum</i>. This system can serve as a very good foundation for developing more specialized text summarizers based on BERT. </p> <p> If this is something you like and would like to receive similar posts, please subscribe to the mailing list below. For more information, please follow me on <a href="https://twitter.com/VladOsaurus" target="_blank" rel="noopener">Twitter</a> or <a href="https://www.linkedin.com/in/vilievski/" target="_blank" rel="noopener">LinkedIn</a>. 
</p> <link href="//cdn-images.mailchimp.com/embedcode/horizontal-slim-10_7.css" rel="stylesheet" type="text/css"> <link href="/assets/css/mailchimp.css"> <div id="mc_embed_signup"> <form action="https://digital.us19.list-manage.com/subscribe/post?u=cb9dbe40387c27177a25de80f&amp;id=08bda6f8e0" method="post" id="mc-embedded-subscribe-form" name="mc-embedded-subscribe-form" class="validate" target="_blank" novalidate> <div id="mc_embed_signup_scroll"> <label for="mce-EMAIL">Join the iSquared mailing list</label> <input type="email" value="" name="EMAIL" class="email" id="mce-EMAIL" placeholder="email address" required> <!-- real people should not fill this in and expect good things - do not remove this or risk form bot signups--> <div style="position: absolute; left: -5000px;" aria-hidden="true"><input type="text" name="b_cb9dbe40387c27177a25de80f_08bda6f8e0" tabindex="-1" value=""></div> <div class="clear"><input type="submit" value="Subscribe" name="subscribe" id="mc-embedded-subscribe" class="button"></div> </div> </form> </div> <h2>Conclusion</h2> <p> The purpose of this text is to encourage machine learning practitioners, as well as decision makers in the business domain, to start adopting this new paradigm. </p> <p> With illustrated examples, we see how we can easily use and adapt a pre-trained BERT model on a variety of tasks, including text classification, named-entity recognition and extractive text summarization. </p> <p> In the future, this zoo of highly performant pre-trained models like BERT will become even more important and relevant. We see this from the recent developments, for instance, the creation of the GPT-3 model. This might be the push we all need in achieving our goals. Thus, let’s grab this chance and use the momentum. 
</p>Vladimir IlievskiIllustrated examples on how and why to use and adopt the BERT-like modelsIntegrals are Fun: Illustrated Riemann-Stieltjes Integral2020-10-01T11:00:00+02:002020-10-01T11:00:00+02:00https://isquared.digital/blog/riemann-stieltjes-integration<script src="https://polyfill.io/v3/polyfill.min.js?features=es6"></script> <script id="MathJax-script" async src="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-mml-chtml.js"></script> <p> <i>Integrals</i> are more than just the sum of their parts! Well, let's not exaggerate. In their most fundamental definition, they are <i>only a sum</i> of infinitely many rectangles under the curve of some function $$f$$ over some interval $$[a, b]$$ for $$a, b \in \mathbb{R}$$. However, solving and numerically approximating them is lots of fun and, besides that, they are quite useful in applied sciences. </p> <p> In the <a href="/blog/2020-05-27-riemann-integration/" target="_blank" rel="dofollow">previous blog</a> dedicated to the integrals, we went through the most basic form of integration, namely the <b>Riemann Integral</b>. This time we take a step forward and learn about its more general extension, the <a href="https://en.wikipedia.org/wiki/Riemann%E2%80%93Stieltjes_integral" target="_blank" rel="noopener nofollow">Riemann-Stieltjes Integral</a>, the precursor of the great <b>Lebesgue Integral</b>. </p> <p> In the following few minutes of reading, first, we give intuitive reasoning about why the Riemann Integral is limited and how to escape those restrictions. This paves the way to the formal definition of the <b>Riemann-Stieltjes Integration</b>. Taking the theoretical foundations for granted, we provide a straightforward <i>Python</i> implementation. Finally, we make an illustration using Matplotlib to facilitate the understanding of this subject. <b>Stay tuned!</b> </p> <h2>Intuition: How to extend the Riemann Integral?</h2> <p> The <i>Riemann Integral</i> is quite straightforward. 
It allows us to take any <i>real-valued function</i> $$f$$ and calculate the area of the surface bounded by the intersection of the same function $$f$$ with the two vertical lines $$x = a$$ and $$x = b$$, for $$a, b \in \mathbb{R}$$. We have to fit infinitely many <i>rectangles</i> (or <i>right trapezoids</i>) inside this body and sum their areas as depicted on the left in Fig. 1, something we already explained in the <a href="/blog/2020-05-27-riemann-integration/" target="_blank" rel="dofollow">previous blog on Riemann Integration</a>. </p> <center> <img data-src="https://isquared.digital/assets/images/riemann_vs_stieltjes.png" class="lazyload" alt=""/> <br/> <span class="caption text-muted"> <i>Fig. 1:</i> Extending the Riemann Integral </span> </center> <br/> <p> But why be constrained to the interval $$[a, b]$$, which gives all rectangles more or less the same width? Can we scale and shift this interval as shown on the right in Fig. 1, such that some rectangles would be less important and others more important? In fact, this is the main limitation of the Riemann Integral: we have no option but to use only the interval $$[a, b]$$, without the possibility of assigning different weights to different parts. </p> <p> The point here is not to reinvent the wheel; Bernhard Riemann and Thomas Stieltjes probably had the same questions more than a century ago, for which they came up with an excellent solution. The goal is to gain a common sense of how we can easily twist the problems we already solved to get new ones, thus opening new horizons. </p> <p> All of this brings us to the basic principle of the <i>Riemann-Stieltjes Integration</i>, which we define formally in the next section. </p> <h2>Theoretical Foundations of the Riemann-Stieltjes Integration</h2> <p> Time to be more serious and formal now. Let $$f: [a, b] \rightarrow \mathbb{R}$$ be a <i>non-negative</i> and <i>continuous</i> function over the interval $$[a, b]$$. 
In order to fit infinitely many <i>rectangles</i>, we need to divide the input space into $$N$$ sub-intervals for some $$N \in \mathbb{N}$$. In other words, we need a <i>partition</i> of the interval $$[a, b]$$, which is a sequence of numbers in the form: </p> <center> <img data-src="https://isquared.digital/assets/images/riemann_partition.png" class="lazyload" alt="Math expression for a partition of an interval"/> </center> <br/> <p> Without loss of generality, we can assume that all sub-intervals are equidistant. As we already saw in the previous section, the surface under the curve $$f$$ is not always bounded on the interval $$[a, b]$$, i.e. we can shift it or scale it. This shift is formally defined as applying a <i>real-to-real</i> and <a href="https://en.wikipedia.org/wiki/Monotonic_function" target="_blank" rel="noopener nofollow">monotone</a> (it preserves the order) function $$g$$ on the interval $$[a, b]$$. The interval is now transformed into $$[g(a), g(b)]$$, and so are all members of the partition. </p> <center> <img data-src="https://isquared.digital/assets/images/trapezoids_riemann_stieltjes.png" class="lazyload" alt="Approximating area under the curve with trapezoids"/> <br/> <span class="caption text-muted"> <i>Fig. 2:</i> Trapezoidal Rule to approximate the area under the curve </span> </center> <br/> <p> Once the partition is set and transformed, it divides the space into many small parts whose areas we need to sum. 
Thus, by using the <a href="https://en.wikipedia.org/wiki/Trapezoidal_rule" target="_blank" rel="nofollow noopener">Trapezoidal Rule</a>, the <i>Riemann-Stieltjes Integral</i> is defined as: </p> <center> <img data-src="https://isquared.digital/assets/images/riemann_stieltjes_sum_trapezoids.png" class="lazyload" alt="Math equation of the Riemann-Stieltjes Integral with Trapezoids"/> </center> <br/> <p> Typically, the functions $$f$$ and $$g$$ are called the <i>integrand</i> and the <i>integrator</i>, because the integral of $$f$$ is calculated with respect to $$g$$. </p> <h2>Python Implementation</h2> <p> Once we have a neat equation, we can easily transcribe it to a <i>Python</i> implementation, as given in the code snippet below: </p> <div style="background: #ffffff; overflow:auto;width:auto;border:solid gray;border-width:.1em .1em .1em .8em;padding:.2em .6em;"><table><tr><td style="border-bottom: none;"><pre style="margin: 0; line-height: 125%"> 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28</pre></td><td style="border-bottom: none;"><pre style="margin: 0; line-height: 125%"><span style="color: #008800; font-weight: bold">def</span> <span style="color: #0066BB; font-weight: bold">derivative</span>(f, a, h<span style="color: #333333">=</span><span style="color: #6600EE; font-weight: bold">0.01</span>): <span style="color: #DD4422">&#39;&#39;&#39;Approximates the derivative of the function f in a</span> <span style="color: #DD4422"> </span> <span style="color: #DD4422"> :param function f: function to differentiate</span> <span style="color: #DD4422"> :param float a: the point of differentiation</span> <span style="color: #DD4422"> :param float h: step size</span> <span style="color: #DD4422"> :return float: the derivative of f in a</span> <span style="color: #DD4422"> &#39;&#39;&#39;</span> <span style="color: #008800; font-weight: bold">return</span> (f(a <span style="color: #333333">+</span> h) <span style="color: #333333">-</span> 
f(a <span style="color: #333333">-</span> h))<span style="color: #333333">/</span>(<span style="color: #0000DD; font-weight: bold">2</span><span style="color: #333333">*</span>h) <span style="color: #008800; font-weight: bold">def</span> <span style="color: #0066BB; font-weight: bold">stieltjes_integral</span>(f, g, a, b, n): <span style="color: #DD4422">&#39;&#39;&#39;Calculates the Riemann-Stieltjes integral based on the composite trapezoidal rule</span> <span style="color: #DD4422"> relying on the Riemann Sums.</span> <span style="color: #DD4422"> </span> <span style="color: #DD4422"> :param function f: integrand function</span> <span style="color: #DD4422"> :param function g: integrator function </span> <span style="color: #DD4422"> :param float a: lower bound of the integral</span> <span style="color: #DD4422"> :param float b: upper bound of the integral</span> <span style="color: #DD4422"> :param int n: number of trapezoids of equal width</span> <span style="color: #DD4422"> :return float: the integral of the function f between a and b</span> <span style="color: #DD4422"> &#39;&#39;&#39;</span> eps <span style="color: #333333">=</span> <span style="color: #6600EE; font-weight: bold">1e-9</span> h <span style="color: #333333">=</span> (b <span style="color: #333333">-</span> a)<span style="color: #333333">/</span>(n <span style="color: #333333">+</span> eps) <span style="color: #888888"># width of each trapezoid</span> dg <span style="color: #333333">=</span> <span style="color: #008800; font-weight: bold">lambda</span> x: derivative(g, x, h<span style="color: #333333">=</span><span style="color: #6600EE; font-weight: bold">1e-8</span>) <span style="color: #888888"># derivative of the integrator function</span> result <span style="color: #333333">=</span> <span style="color: #6600EE; font-weight: bold">0.5</span><span style="color: #333333">*</span>f(a)<span style="color: #333333">*</span>dg(a) <span style="color: #333333">+</span> <span style="color: 
#007020">sum</span>([f(a <span style="color: #333333">+</span> i<span style="color: #333333">*</span>h)<span style="color: #333333">*</span>dg(a <span style="color: #333333">+</span> i<span style="color: #333333">*</span>h) <span style="color: #008800; font-weight: bold">for</span> i <span style="color: #000000; font-weight: bold">in</span> <span style="color: #007020">range</span>(<span style="color: #0000DD; font-weight: bold">1</span>, n)]) <span style="color: #333333">+</span> <span style="color: #6600EE; font-weight: bold">0.5</span><span style="color: #333333">*</span>f(b)<span style="color: #333333">*</span>dg(b) result <span style="color: #333333">*=</span> h <span style="color: #008800; font-weight: bold">return</span> result </pre></td></tr></table></div> <br/> <p> In order to fully understand the process of <i>Riemann-Stieltjes Integration</i>, we make an illustration using <i>Matplotlib</i>. For this purpose let's take some linear <i>integrand</i> function $$f$$ and let the <i>integrator</i> function be $$g(x) = 3x$$. </p> <p> In the image below, the standard <i>Riemann Integration</i> is depicted with the <i>blueish</i> rectangles on the left. In this case, the actual width of the underlying rectangles is kept. To transform this to a <i>Riemann-Stieltjes Integral</i> we must plot the graph of the curve $$(x, y) = (g(x), f(x))$$, which is depicted with the <i>greenish</i> rectangles on the right. As we can see their widths are 3 times bigger since $$g(x) = 3x$$, while still preserving the same height $$f(x)$$. 
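<p> To sanity-check the implementation, a condensed version of the snippet above (our own re-statement, without the <code>eps</code> guard) can be compared against an analytic value; with the integrand $$f(x) = x$$ and the integrator $$g(x) = 3x$$ on $$[0, 1]$$, the integral equals $$3 \int_0^1 x \, dx = 1.5$$: </p>

```python
def derivative(f, a, h=1e-8):
    # central difference approximation of f'(a)
    return (f(a + h) - f(a - h)) / (2 * h)

def stieltjes_integral(f, g, a, b, n):
    # composite trapezoidal rule for the Riemann-Stieltjes integral of f w.r.t. g
    h = (b - a) / n
    dg = lambda x: derivative(g, x)
    s = 0.5 * f(a) * dg(a) + 0.5 * f(b) * dg(b)
    s += sum(f(a + i * h) * dg(a + i * h) for i in range(1, n))
    return s * h

# f(x) = x, g(x) = 3x on [0, 1]: analytic value is 1.5
approx = stieltjes_integral(lambda x: x, lambda x: 3 * x, 0.0, 1.0, 1000)
```

<p> The agreement is very close here because the trapezoidal rule is exact for linear integrands; for curved $$f$$ or $$g$$ the error shrinks as $$n$$ grows. </p>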
</p> <center> <img data-src="https://isquared.digital/assets/images/riemann_stieltjes_illustrated.png" class="lazyload" alt="Illustration of the Riemann-Stieltjes Sum"> <br/> <span class="caption text-muted"> <i>Illustration: Transforming Riemann Integral to Riemann-Stieltjes Integral</i> </span> </center> <br/> <p> The full source code related to all we have discussed during this blog can be found on <a href="https://github.com/IlievskiV/Amusive-Blogging-N-Coding/blob/master/Integration/rieman_stieltjes_sums.ipynb" target="_blank" rel="dofollow noopener">GitHub</a>. If you have any suggestions or remarks, please let me know by commenting below. I would be happy to discuss this. </p> <h2>Applications of the Riemann-Stieltjes Integral</h2> <p> One might ask what is so special about this integral? Well, this mechanism is indispensable and lays down the foundations for many fields. </p> <p> First of all, without it, the so-called <a href="https://en.wikipedia.org/wiki/Law_of_the_unconscious_statistician" target="_blank" rel="nofollow noopener">law of the unconscious statistician</a> would not be possible. This theorem is used to calculate the <i>expected value</i> of the <i>random variable</i> obtained by applying some arbitrary function $$f$$ on some other <i>random variable</i> $$X$$, for which we only know its <a href="https://en.wikipedia.org/wiki/Cumulative_distribution_function" target="_blank" rel="nofollow noopener">cumulative distribution function</a> $$g(x)$$. That means we don't need to know the distribution of $$f(X)$$ at all; its expected value is given by the Riemann-Stieltjes Integral: </p> <center> <img data-src="https://isquared.digital/assets/images/lotus_theorem.png" class="lazyload" alt="Math equation for the expected value of a random variable"/> </center> <br/> <p> More importantly, the Riemann-Stieltjes Integral is one of the cornerstones of the <i>Stochastic Calculus</i>. 
Specifically, the <i>Itô Integral for Elementary Random Process</i> $$X$$ is defined as: </p> <center> <img data-src="https://isquared.digital/assets/images/ito_integral.png" class="lazyload" alt="Math equation for the Ito Integral in Stochastic Calculus"/> </center> <br/> <p> where $$W$$ is a <a href="/blog/2020-04-16-brownian-motion/" target="_blank" rel="dofollow">Brownian Motion</a>. The details about this integral are intentionally left because it is a complex topic that could be covered in several blog posts. </p> <p> If this is something you like and would like to receive similar posts, please subscribe to the mailing list below. For more information, please follow me on <a href="https://twitter.com/VladOsaurus" target="_blank" rel="noopener">Twitter</a> or <a href="https://www.linkedin.com/in/vilievski/" target="_blank" rel="noopener">LinkedIn</a>. </p> <link href="//cdn-images.mailchimp.com/embedcode/horizontal-slim-10_7.css" rel="stylesheet" type="text/css"> <link href="/assets/css/mailchimp.css"> <div id="mc_embed_signup"> <form action="https://digital.us19.list-manage.com/subscribe/post?u=cb9dbe40387c27177a25de80f&amp;id=08bda6f8e0" method="post" id="mc-embedded-subscribe-form" name="mc-embedded-subscribe-form" class="validate" target="_blank" novalidate> <div id="mc_embed_signup_scroll"> <label for="mce-EMAIL">Join the iSquared mailing list</label> <input type="email" value="" name="EMAIL" class="email" id="mce-EMAIL" placeholder="email address" required> <!-- real people should not fill this in and expect good things - do not remove this or risk form bot signups--> <div style="position: absolute; left: -5000px;" aria-hidden="true"><input type="text" name="b_cb9dbe40387c27177a25de80f_08bda6f8e0" tabindex="-1" value=""></div> <div class="clear"><input type="submit" value="Subscribe" name="subscribe" id="mc-embedded-subscribe" class="button"></div> </div> </form> </div> <h2>Conclusion</h2> <p> Finally, we reached the end of this blog that covered in 
short the <i>Riemann-Stieltjes Integral</i>. It is a generalization of the <i>Riemann Integral</i> such that it provides the means to transform the input space. We started with a general intuition and gradually continued towards a theoretical definition and illustrated examples for a complete overview. In the end, we saw the most important applications of this integral. </p> <h2>References</h2> <p> [1] Svein Linge, Hans Petter Langtangen, <a href="https://link.springer.com/content/pdf/10.1007%2F978-3-319-32428-9.pdf" target="_blank">"Programming for Computations - Python"</a> (2016), Springer Open <br/> [2] Arturo Fernandez, <a href="https://www.stat.berkeley.edu/~arturof/Teaching/STAT150/Notes/II_Brownian_Motion.pdf" target="_blank">“Brownian Motion and An Introduction to Stochastic Integration”</a> (2011), Statistics 157: Topics In Stochastic Processes Seminar <br/> </p>Vladimir IlievskiIllustrated examples of the Riemann-Stieltjes Integration in Python with MatplotlibAnimate Your Own Fractals in Python with Matplotlib2020-07-04T11:00:00+02:002020-07-04T11:00:00+02:00https://isquared.digital/blog/animated-fractals<p><strong>Disclaimer</strong>: This post was originally published on the <a href="https://matplotlib.org/matplotblog/posts/animated-fractals/" target="_blank">Matplotlib’s Blog</a>.</p> <p>Imagine zooming an image over and over and never running out of finer details. It may sound bizarre but the mathematical concept of <a href="https://en.wikipedia.org/wiki/Fractal" target="_blank">fractals</a> opens the realm towards this intriguing infinity. This strange geometry exhibits the same or similar patterns irrespective of the scale. We can see one fractal example in the image above.</p> <p>The <em>fractals</em> may seem difficult to understand due to their peculiarity, but that’s not the case. 
As Benoit Mandelbrot, one of the founding fathers of the fractal geometry said in his legendary <a href="https://www.ted.com/talks/benoit_mandelbrot_fractals_and_the_art_of_roughness?language=en" target="_blank">TED Talk</a>:</p> <blockquote> <p>A surprising aspect is that the rules of this geometry are extremely short. You crank the formulas several times and at the end, you get things like this (pointing to a stunning plot)</p> <p>– <cite>Benoit Mandelbrot</cite></p> </blockquote> <p>In this tutorial blog post, we will see how to construct fractals in Python and animate them using the amazing <em>Matplotlib’s</em> Animation API. First, we will demonstrate the convergence of the <em>Mandelbrot Set</em> with an enticing animation. In the second part, we will analyze one interesting property of the <em>Julia Set</em>. Stay tuned!</p> <h1 id="intuition">Intuition</h1> <p>We all have a common sense of the concept of similarity. We say two objects are similar to each other if they share some common patterns.</p> <p>This notion is not only limited to a comparison of two different objects. We can also compare different parts of the same object. For instance, a leaf. We know very well that the left side matches exactly the right side, i.e. the leaf is symmetrical.</p> <p>In mathematics, this phenomenon is known as <a href="https://en.wikipedia.org/wiki/Self-similarity" target="_blank">self-similarity</a>. It means a given object is similar (completely or to some extent) to some smaller part of itself. One remarkable example is the <a href="https://isquared.digital/visualizations/2020-06-15-koch-curve/" target="_blank">Koch Snowflake</a> as shown in the image below:</p> <center> <img data-src="https://isquared.digital/assets/images/snowflake.png" class="lazyload" alt="Sketch showing a snowflake constructed using fractal geometry" /> <br /> <span class="caption text-muted"> <i>Fig. 
1:</i> Koch Snowflake </span> </center> <p><br /></p> <p>We can infinitely magnify some part of it and the same pattern will repeat over and over again. This is how fractal geometry is defined.</p> <h1 id="animated-mandelbrot-set">Animated Mandelbrot Set</h1> <p>The <a href="https://en.wikipedia.org/wiki/Mandelbrot_set" target="_blank">Mandelbrot Set</a> is defined over the set of <em>complex numbers</em>. It consists of all complex numbers <strong>c</strong>, such that the sequence <strong>zᵢ₊₁ = zᵢ² + c, z₀ = 0</strong> is bounded. It means that the absolute value of the sequence must never exceed a given limit, no matter how many iterations we run. At first sight, it might seem odd and simple, but in fact, it has some mind-blowing properties.</p> <p>The <em>Python</em> implementation is quite straightforward, as given in the code snippet below:</p> <figure class="highlight"><pre><code class="language-python" data-lang="python"><table class="rouge-table"><tbody><tr><td class="gutter gl"><pre class="lineno">1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 </pre></td><td class="code"><pre><span class="k">def</span> <span class="nf">mandelbrot</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">,</span> <span class="n">threshold</span><span class="p">):</span> <span class="s">"""Calculates whether the number c = x + i*y belongs to the Mandelbrot set. In order to belong, the sequence z[i + 1] = z[i]**2 + c must not diverge after 'threshold' number of steps. The sequence diverges if the absolute value of z[i+1] is greater than 4. 
:param float x: the x component of the initial complex number :param float y: the y component of the initial complex number :param int threshold: the number of iterations to considered it converged """</span> <span class="c1"># initial conditions </span> <span class="n">c</span> <span class="o">=</span> <span class="nb">complex</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">)</span> <span class="n">z</span> <span class="o">=</span> <span class="nb">complex</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">)</span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">threshold</span><span class="p">):</span> <span class="n">z</span> <span class="o">=</span> <span class="n">z</span><span class="o">**</span><span class="mi">2</span> <span class="o">+</span> <span class="n">c</span> <span class="k">if</span> <span class="nb">abs</span><span class="p">(</span><span class="n">z</span><span class="p">)</span> <span class="o">&gt;</span> <span class="mf">4.</span><span class="p">:</span> <span class="c1"># it diverged </span> <span class="k">return</span> <span class="n">i</span> <span class="k">return</span> <span class="n">threshold</span> <span class="o">-</span> <span class="mi">1</span> <span class="c1"># it didn't diverge</span> </pre></td></tr></tbody></table></code></pre></figure> <p>As we can see, we set the maximum number of iterations encoded in the variable <code class="language-plaintext highlighter-rouge">threshold</code>. If the magnitude of the sequence at some iteration exceeds <strong>4</strong>, we consider it as diverged (<strong>c</strong> does not belong to the set) and return the iteration number at which this occurred. 
If this never happens (<strong>c</strong> belongs to the set), we return the maximum number of iterations.</p> <p>We can use the information about the number of iterations before the sequence diverges. All we have to do is associate this number with a color, relative to the maximum number of iterations. Thus, for all complex numbers <strong>c</strong> in some lattice of the complex plane, we can make a nice animation of the convergence process as a function of the maximum allowed iterations.</p> <p>One particular and interesting area is the <em>3x3</em> lattice starting at position -2 and -1.5 for the <em>real</em> and <em>imaginary</em> axis respectively. We can observe the process of convergence as the number of allowed iterations increases. This is easily achieved using <em>Matplotlib’s</em> Animation API, as shown with the following code:</p> <figure class="highlight"><pre><code class="language-python" data-lang="python"><table class="rouge-table"><tbody><tr><td class="gutter gl"><pre class="lineno">1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 </pre></td><td class="code"><pre><span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="n">np</span> <span class="kn">import</span> <span class="nn">matplotlib.pyplot</span> <span class="k">as</span> <span class="n">plt</span> <span class="kn">import</span> <span class="nn">matplotlib.animation</span> <span class="k">as</span> <span class="n">animation</span> <span class="n">x_start</span><span class="p">,</span> <span class="n">y_start</span> <span class="o">=</span> <span class="o">-</span><span class="mi">2</span><span class="p">,</span> <span class="o">-</span><span class="mf">1.5</span> <span class="c1"># an interesting region starts here </span><span class="n">width</span><span class="p">,</span> <span class="n">height</span> <span class="o">=</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">3</span>
<span class="c1"># for 3 units up and right </span><span class="n">density_per_unit</span> <span class="o">=</span> <span class="mi">250</span> <span class="c1"># how many pixels per unit </span> <span class="c1"># real and imaginary axis </span><span class="n">re</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">linspace</span><span class="p">(</span><span class="n">x_start</span><span class="p">,</span> <span class="n">x_start</span> <span class="o">+</span> <span class="n">width</span><span class="p">,</span> <span class="n">width</span> <span class="o">*</span> <span class="n">density_per_unit</span> <span class="p">)</span> <span class="n">im</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">linspace</span><span class="p">(</span><span class="n">y_start</span><span class="p">,</span> <span class="n">y_start</span> <span class="o">+</span> <span class="n">height</span><span class="p">,</span> <span class="n">height</span> <span class="o">*</span> <span class="n">density_per_unit</span><span class="p">)</span> <span class="n">fig</span> <span class="o">=</span> <span class="n">plt</span><span class="p">.</span><span class="n">figure</span><span class="p">(</span><span class="n">figsize</span><span class="o">=</span><span class="p">(</span><span class="mi">10</span><span class="p">,</span> <span class="mi">10</span><span class="p">))</span> <span class="c1"># instantiate a figure to draw </span><span class="n">ax</span> <span class="o">=</span> <span class="n">plt</span><span class="p">.</span><span class="n">axes</span><span class="p">()</span> <span class="c1"># create an axes object </span> <span class="k">def</span> <span class="nf">animate</span><span class="p">(</span><span class="n">i</span><span class="p">):</span> <span class="n">ax</span><span class="p">.</span><span class="n">clear</span><span class="p">()</span> <span class="c1"># clear axes object </span>
<span class="n">ax</span><span class="p">.</span><span class="n">set_xticks</span><span class="p">([],</span> <span class="p">[])</span> <span class="c1"># clear x-axis ticks </span> <span class="n">ax</span><span class="p">.</span><span class="n">set_yticks</span><span class="p">([],</span> <span class="p">[])</span> <span class="c1"># clear y-axis ticks </span> <span class="n">X</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">empty</span><span class="p">((</span><span class="nb">len</span><span class="p">(</span><span class="n">re</span><span class="p">),</span> <span class="nb">len</span><span class="p">(</span><span class="n">im</span><span class="p">)))</span> <span class="c1"># re-initialize the array-like image </span> <span class="n">threshold</span> <span class="o">=</span> <span class="nb">round</span><span class="p">(</span><span class="mf">1.15</span><span class="o">**</span><span class="p">(</span><span class="n">i</span> <span class="o">+</span> <span class="mi">1</span><span class="p">))</span> <span class="c1"># calculate the current threshold </span> <span class="c1"># iterations for the current threshold </span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">re</span><span class="p">)):</span> <span class="k">for</span> <span class="n">j</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">im</span><span class="p">)):</span> <span class="n">X</span><span class="p">[</span><span class="n">i</span><span class="p">,</span> <span class="n">j</span><span class="p">]</span> <span class="o">=</span> <span class="n">mandelbrot</span><span class="p">(</span><span class="n">re</span><span class="p">[</span><span class="n">i</span><span class="p">],</span> <span 
class="n">im</span><span class="p">[</span><span class="n">j</span><span class="p">],</span> <span class="n">threshold</span><span class="p">)</span> <span class="c1"># associate colors to the iterations with an interpolation </span> <span class="n">img</span> <span class="o">=</span> <span class="n">ax</span><span class="p">.</span><span class="n">imshow</span><span class="p">(</span><span class="n">X</span><span class="p">.</span><span class="n">T</span><span class="p">,</span> <span class="n">interpolation</span><span class="o">=</span><span class="s">"bicubic"</span><span class="p">,</span> <span class="n">cmap</span><span class="o">=</span><span class="s">'magma'</span><span class="p">)</span> <span class="k">return</span> <span class="p">[</span><span class="n">img</span><span class="p">]</span> <span class="n">anim</span> <span class="o">=</span> <span class="n">animation</span><span class="p">.</span><span class="n">FuncAnimation</span><span class="p">(</span><span class="n">fig</span><span class="p">,</span> <span class="n">animate</span><span class="p">,</span> <span class="n">frames</span><span class="o">=</span><span class="mi">45</span><span class="p">,</span> <span class="n">interval</span><span class="o">=</span><span class="mi">120</span><span class="p">,</span> <span class="n">blit</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span> <span class="n">anim</span><span class="p">.</span><span class="n">save</span><span class="p">(</span><span class="s">'mandelbrot.gif'</span><span class="p">,</span><span class="n">writer</span><span class="o">=</span><span class="s">'imagemagick'</span><span class="p">)</span> </pre></td></tr></tbody></table></code></pre></figure> <p>We make animations in <em>Matplotlib</em> using the <code class="language-plaintext highlighter-rouge">FuncAnimation</code> function from the <em>Animation</em> API.
We need to specify the <code class="language-plaintext highlighter-rouge">figure</code> on which we draw a predefined number of consecutive <code class="language-plaintext highlighter-rouge">frames</code>. A predetermined <code class="language-plaintext highlighter-rouge">interval</code> expressed in milliseconds defines the delay between the frames.</p> <p>In this context, the <code class="language-plaintext highlighter-rouge">animate</code> function plays a central role, where the input argument is the frame number, starting from 0. This means that, in order to animate, we always have to think in terms of frames. Hence, we use the frame number to calculate the variable <code class="language-plaintext highlighter-rouge">threshold</code>, which is the maximum number of allowed iterations.</p> <p>To represent our lattice we instantiate two arrays <code class="language-plaintext highlighter-rouge">re</code> and <code class="language-plaintext highlighter-rouge">im</code>: the former for the values on the <em>real</em> axis and the latter for the values on the <em>imaginary</em> axis. The number of elements in these two arrays is determined by the variable <code class="language-plaintext highlighter-rouge">density_per_unit</code>, which defines the number of samples per unit step. The higher it is, the better quality we get, but at the cost of heavier computation.</p> <p>Now, depending on the current <code class="language-plaintext highlighter-rouge">threshold</code>, for every complex number <strong>c</strong> in our lattice, we calculate the number of iterations before the sequence <strong>zᵢ₊₁ = zᵢ² + c, z₀ = 0</strong> diverges. We save them in an initially empty matrix called <code class="language-plaintext highlighter-rouge">X</code>.
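Incidentally, since the code above computes the threshold as <code>round(1.15**(i + 1))</code>, the number of allowed iterations grows exponentially with the frame number. A small sketch (assuming the 45 frames used in the animation) makes this schedule visible:

```python
# exponential threshold schedule used by the animation:
# frame i gets round(1.15**(i + 1)) allowed iterations
frames = 45
thresholds = [round(1.15**(i + 1)) for i in range(frames)]

print(thresholds[:5])   # the first frames allow only a handful of iterations
print(thresholds[-1])   # the last frame allows a few hundred
```

This way the early frames stay cheap to compute, while the later frames reveal finer and finer detail of the set.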
In the end, we <em>interpolate</em> the values in <code class="language-plaintext highlighter-rouge">X</code> and assign them a color drawn from a prearranged <em>colormap</em>.</p> <p>After running the <code class="language-plaintext highlighter-rouge">animate</code> function over all frames, we get a stunning animation, as depicted below:</p> <center> <img data-src="https://isquared.digital/assets/images/mandelbrot_animation.gif" class="lazyload" alt="Animation showing the convergence of the Mandelbrot set as the number of allowed iterations increases" /> <br /> <span class="caption text-muted"> <i>Fig. 2:</i> Mandelbrot Set Convergence </span> </center> <p><br /></p> <h1 id="animated-julia-set">Animated Julia Set</h1> <p>The <a href="https://en.wikipedia.org/wiki/Julia_set" target="_blank">Julia Set</a> is quite similar to the <em>Mandelbrot Set</em>. Instead of setting <strong>z₀ = 0</strong> and testing whether for some complex number <strong>c = x + i*y</strong> the sequence <strong>zᵢ₊₁ = zᵢ² + c</strong> is bounded, we switch the roles a bit. We fix the value for <strong>c</strong>, we set an arbitrary initial condition <strong>z₀ = x + i*y</strong>, and we observe the convergence of the sequence. The <em>Python</em> implementation is given below:</p> <figure class="highlight"><pre><code class="language-python" data-lang="python"><table class="rouge-table"><tbody><tr><td class="gutter gl"><pre class="lineno">1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 </pre></td><td class="code"><pre><span class="k">def</span> <span class="nf">julia_quadratic</span><span class="p">(</span><span class="n">zx</span><span class="p">,</span> <span class="n">zy</span><span class="p">,</span> <span class="n">cx</span><span class="p">,</span> <span class="n">cy</span><span class="p">,</span> <span class="n">threshold</span><span class="p">):</span> <span class="s">"""Calculates whether the number z = zx + i*zy with a constant c = cx + i*cy belongs to the Julia set.
In order to belong, the sequence z[i + 1] = z[i]**2 + c must not diverge after 'threshold' number of steps. The sequence diverges if the absolute value of z[i+1] is greater than 4. :param float zx: the x component of z :param float zy: the y component of z :param float cx: the x component of the constant c :param float cy: the y component of the constant c :param int threshold: the number of iterations to consider it converged """</span> <span class="c1"># initial conditions </span> <span class="n">z</span> <span class="o">=</span> <span class="nb">complex</span><span class="p">(</span><span class="n">zx</span><span class="p">,</span> <span class="n">zy</span><span class="p">)</span> <span class="n">c</span> <span class="o">=</span> <span class="nb">complex</span><span class="p">(</span><span class="n">cx</span><span class="p">,</span> <span class="n">cy</span><span class="p">)</span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">threshold</span><span class="p">):</span> <span class="n">z</span> <span class="o">=</span> <span class="n">z</span><span class="o">**</span><span class="mi">2</span> <span class="o">+</span> <span class="n">c</span> <span class="k">if</span> <span class="nb">abs</span><span class="p">(</span><span class="n">z</span><span class="p">)</span> <span class="o">&gt;</span> <span class="mf">4.</span><span class="p">:</span> <span class="c1"># it diverged </span> <span class="k">return</span> <span class="n">i</span> <span class="k">return</span> <span class="n">threshold</span> <span class="o">-</span> <span class="mi">1</span> <span class="c1"># it didn't diverge</span> </pre></td></tr></tbody></table></code></pre></figure> <p>Obviously, the setup is quite similar to the <em>Mandelbrot Set</em> implementation. The maximum number of iterations is denoted as <code class="language-plaintext highlighter-rouge">threshold</code>.
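As before, a quick sanity check (a hypothetical snippet, not part of the original post, re-declaring the function above) illustrates the two possible outcomes:

```python
def julia_quadratic(zx, zy, cx, cy, threshold):
    """Return the iteration at which z[i+1] = z[i]**2 + c diverges,
    or threshold - 1 if it stays bounded (z0 = zx + i*zy is in the set)."""
    z = complex(zx, zy)
    c = complex(cx, cy)
    for i in range(threshold):
        z = z**2 + c
        if abs(z) > 4.:  # it diverged
            return i
    return threshold - 1  # it didn't diverge

# z0 = 0 with c = 0 stays at 0 forever: bounded
print(julia_quadratic(0, 0, 0, 0, 20))   # 19, i.e. threshold - 1
# z0 = 3 squares to 9 immediately, and |9| > 4 on the very first step
print(julia_quadratic(3, 0, 0, 0, 20))   # 0
```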
If the magnitude of the sequence is never greater than <strong>4</strong>, the number <strong>z₀</strong> belongs to the <em>Julia Set</em> and vice versa.</p> <p>The number <strong>c</strong> gives us the freedom to analyze its impact on the convergence of the sequence, given that the maximum number of iterations is fixed. One interesting range of values for <strong>c</strong> is <strong>c = r cos α + i × r sin α</strong> such that <strong>r=0.7885</strong> and <strong>α ∈ [0, 2π]</strong>.</p> <p>The best way to carry out this analysis is to create an animated visualization as the number <strong>c</strong> changes. This <a href="https://isquared.digital/blog/2020-02-08-interactive-dataviz/" target="_blank">ameliorates our visual perception</a> and understanding of such abstract phenomena in a captivating manner. To do so, we use Matplotlib’s <em>Animation API</em>, as demonstrated in the code below:</p> <figure class="highlight"><pre><code class="language-python" data-lang="python"><table class="rouge-table"><tbody><tr><td class="gutter gl"><pre class="lineno">1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 </pre></td><td class="code"><pre><span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="n">np</span> <span class="kn">import</span> <span class="nn">matplotlib.pyplot</span> <span class="k">as</span> <span class="n">plt</span> <span class="kn">import</span> <span class="nn">matplotlib.animation</span> <span class="k">as</span> <span class="n">animation</span> <span class="n">x_start</span><span class="p">,</span> <span class="n">y_start</span> <span class="o">=</span> <span class="o">-</span><span class="mi">2</span><span class="p">,</span> <span class="o">-</span><span class="mi">2</span> <span class="c1"># an interesting region starts here </span><span class="n">width</span><span class="p">,</span> <span class="n">height</span> <span
class="o">=</span> <span class="mi">4</span><span class="p">,</span> <span class="mi">4</span> <span class="c1"># for 4 units up and right </span><span class="n">density_per_unit</span> <span class="o">=</span> <span class="mi">200</span> <span class="c1"># how many pixels per unit </span> <span class="c1"># real and imaginary axis </span><span class="n">re</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">linspace</span><span class="p">(</span><span class="n">x_start</span><span class="p">,</span> <span class="n">x_start</span> <span class="o">+</span> <span class="n">width</span><span class="p">,</span> <span class="n">width</span> <span class="o">*</span> <span class="n">density_per_unit</span> <span class="p">)</span> <span class="n">im</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">linspace</span><span class="p">(</span><span class="n">y_start</span><span class="p">,</span> <span class="n">y_start</span> <span class="o">+</span> <span class="n">height</span><span class="p">,</span> <span class="n">height</span> <span class="o">*</span> <span class="n">density_per_unit</span><span class="p">)</span> <span class="n">threshold</span> <span class="o">=</span> <span class="mi">20</span> <span class="c1"># max allowed iterations </span><span class="n">frames</span> <span class="o">=</span> <span class="mi">100</span> <span class="c1"># number of frames in the animation </span> <span class="c1"># we represent c as c = r*cos(a) + i*r*sin(a) = r*e^{i*a} </span><span class="n">r</span> <span class="o">=</span> <span class="mf">0.7885</span> <span class="n">a</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">linspace</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">2</span><span class="o">*</span><span class="n">np</span><span class="p">.</span><span class="n">pi</span><span
class="p">,</span> <span class="n">frames</span><span class="p">)</span> <span class="n">fig</span> <span class="o">=</span> <span class="n">plt</span><span class="p">.</span><span class="n">figure</span><span class="p">(</span><span class="n">figsize</span><span class="o">=</span><span class="p">(</span><span class="mi">10</span><span class="p">,</span> <span class="mi">10</span><span class="p">))</span> <span class="c1"># instantiate a figure to draw </span><span class="n">ax</span> <span class="o">=</span> <span class="n">plt</span><span class="p">.</span><span class="n">axes</span><span class="p">()</span> <span class="c1"># create an axes object </span> <span class="k">def</span> <span class="nf">animate</span><span class="p">(</span><span class="n">i</span><span class="p">):</span> <span class="n">ax</span><span class="p">.</span><span class="n">clear</span><span class="p">()</span> <span class="c1"># clear axes object </span> <span class="n">ax</span><span class="p">.</span><span class="n">set_xticks</span><span class="p">([],</span> <span class="p">[])</span> <span class="c1"># clear x-axis ticks </span> <span class="n">ax</span><span class="p">.</span><span class="n">set_yticks</span><span class="p">([],</span> <span class="p">[])</span> <span class="c1"># clear y-axis ticks </span> <span class="n">X</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">empty</span><span class="p">((</span><span class="nb">len</span><span class="p">(</span><span class="n">re</span><span class="p">),</span> <span class="nb">len</span><span class="p">(</span><span class="n">im</span><span class="p">)))</span> <span class="c1"># the initial array-like image </span> <span class="n">cx</span><span class="p">,</span> <span class="n">cy</span> <span class="o">=</span> <span class="n">r</span> <span class="o">*</span> <span class="n">np</span><span class="p">.</span><span class="n">cos</span><span class="p">(</span><span 
class="n">a</span><span class="p">[</span><span class="n">i</span><span class="p">]),</span> <span class="n">r</span> <span class="o">*</span> <span class="n">np</span><span class="p">.</span><span class="n">sin</span><span class="p">(</span><span class="n">a</span><span class="p">[</span><span class="n">i</span><span class="p">])</span> <span class="c1"># the initial c number </span> <span class="c1"># iterations for the given threshold </span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">re</span><span class="p">)):</span> <span class="k">for</span> <span class="n">j</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">im</span><span class="p">)):</span> <span class="n">X</span><span class="p">[</span><span class="n">i</span><span class="p">,</span> <span class="n">j</span><span class="p">]</span> <span class="o">=</span> <span class="n">julia_quadratic</span><span class="p">(</span><span class="n">re</span><span class="p">[</span><span class="n">i</span><span class="p">],</span> <span class="n">im</span><span class="p">[</span><span class="n">j</span><span class="p">],</span> <span class="n">cx</span><span class="p">,</span> <span class="n">cy</span><span class="p">,</span> <span class="n">threshold</span><span class="p">)</span> <span class="n">img</span> <span class="o">=</span> <span class="n">ax</span><span class="p">.</span><span class="n">imshow</span><span class="p">(</span><span class="n">X</span><span class="p">.</span><span class="n">T</span><span class="p">,</span> <span class="n">interpolation</span><span class="o">=</span><span class="s">"bicubic"</span><span class="p">,</span> <span class="n">cmap</span><span class="o">=</span><span class="s">'magma'</span><span class="p">)</span> <span 
class="k">return</span> <span class="p">[</span><span class="n">img</span><span class="p">]</span> <span class="n">anim</span> <span class="o">=</span> <span class="n">animation</span><span class="p">.</span><span class="n">FuncAnimation</span><span class="p">(</span><span class="n">fig</span><span class="p">,</span> <span class="n">animate</span><span class="p">,</span> <span class="n">frames</span><span class="o">=</span><span class="n">frames</span><span class="p">,</span> <span class="n">interval</span><span class="o">=</span><span class="mi">50</span><span class="p">,</span> <span class="n">blit</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span> <span class="n">anim</span><span class="p">.</span><span class="n">save</span><span class="p">(</span><span class="s">'julia_set.gif'</span><span class="p">,</span> <span class="n">writer</span><span class="o">=</span><span class="s">'imagemagick'</span><span class="p">)</span> </pre></td></tr></tbody></table></code></pre></figure> <p>The logic in the <code class="language-plaintext highlighter-rouge">animate</code> function is very similar to the previous example. We update the number <strong>c</strong> as a function of the frame number. Based on that, we estimate the convergence of all complex numbers in the defined lattice, given the fixed <code class="language-plaintext highlighter-rouge">threshold</code> of allowed iterations. Same as before, we save the results in an initially empty matrix <code class="language-plaintext highlighter-rouge">X</code> and associate them with a color relative to the maximum number of iterations. The resulting animation is illustrated below:</p> <center> <img data-src="https://isquared.digital/assets/images/julia_set.gif" class="lazyload" alt="Animation of the Julia Set as the number c changes" /> <br /> <span class="caption text-muted"> <i>Fig.
3:</i> Julia Set </span> </center> <p><br /></p> <h1 id="summary">Summary</h1> <p>Fractals are really mind-boggling structures, as we saw in this blog post. First, we gave a general intuition of fractal geometry. Then, we observed two types of fractals: the <em>Mandelbrot</em> and <em>Julia</em> sets. We implemented them in Python and made interesting animated visualizations of their properties.</p> <p>If you liked this tutorial, feel free to share it on social media. Also, it would be helpful if you subscribe to the mailing list below. You will get updates from time to time.</p> <link href="//cdn-images.mailchimp.com/embedcode/horizontal-slim-10_7.css" rel="stylesheet" type="text/css" /> <link href="/assets/css/mailchimp.css" /> <div id="mc_embed_signup"> <form action="https://digital.us19.list-manage.com/subscribe/post?u=cb9dbe40387c27177a25de80f&amp;id=08bda6f8e0" method="post" id="mc-embedded-subscribe-form" name="mc-embedded-subscribe-form" class="validate" target="_blank" novalidate=""> <div id="mc_embed_signup_scroll"> <label for="mce-EMAIL">Join the iSquared mailing list</label> <input type="email" value="" name="EMAIL" class="email" id="mce-EMAIL" placeholder="email address" required="" /> <!-- real people should not fill this in and expect good things - do not remove this or risk form bot signups--> <div style="position: absolute; left: -5000px;" aria-hidden="true"><input type="text" name="b_cb9dbe40387c27177a25de80f_08bda6f8e0" tabindex="-1" value="" /></div> <div class="clear"><input type="submit" value="Subscribe" name="subscribe" id="mc-embedded-subscribe" class="button" /></div> </div> </form> </div>Vladimir IlievskiWhat are Fractals and How to Make Your OwnIntegrals are Easy: Visualized Riemann Integration in Python2020-05-27T11:00:00+02:002020-05-27T11:00:00+02:00https://isquared.digital/blog/riemann-integration<script src="https://polyfill.io/v3/polyfill.min.js?features=es6"></script> <script id="MathJax-script" async
src="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-mml-chtml.js"></script> <p> The <i>integral</i> is not as complicated as it seems. It is one of the fundamental and universal tools in mathematics, allowing us to calculate the area or the volume of any arbitrary body. As a cornerstone of mathematics, it has a multitude of applications in many disciplines. </p> <p> The <a href="https://en.wikipedia.org/wiki/Riemann_integral" target="_blank" rel="noopener nofollow">Riemann Integral</a> is the simplest form of integration, yet it lays down the foundation of all other types of integrals. It offers a rigorous method for approximating the area under the curve of some function $$f$$ over some interval $$[a, b]$$. This fact gives it an intuitive geometrical interpretation. </p> <p> In this blog post we will introduce and elaborate more on <i>Riemann Integration</i>. We will start with intuitive reasoning on the process of integration in order to have a smooth transition towards the mathematical foundations. Then, we will see how to transform the theory into an easy <i>Python</i> implementation. Finally, we <a href="/blog/2020-02-08-interactive-dataviz/" target="_blank" rel="dofollow">visually enhance</a> and complement our understanding with an animated visualization of the Riemann Sums using <i>Matplotlib's Animation API</i>. </p> <h2>Intuition</h2> <p> We can easily calculate the area of any regular-shaped body, like a rectangle, because it consists of only straight lines. Thus, for a rectangle with width $$m$$ and height $$n$$, the area is calculated as simply as $$m \times n$$. It is straightforward to deduce this because for every small unit on the side $$m$$ there is still a regular rectangle with height $$n$$. </p> <p> However, if we only change the top side from a straight line to some <i>arbitrary</i> line that can be described with some <i>non-linear</i> function, the circumstances get complicated.
We can still divide the surface into small rectangles, but now they have varying heights and, on top of this, they do not entirely fit inside the body. This is illustrated in the image below: </p> <center> <img data-src="https://isquared.digital/assets/images/calculate_area.png" class="lazyload" alt="Calculate area of regular and irregular shape bodies"/> <br/> <span class="caption text-muted"> <i>Fig. 1:</i> Area of regular and irregular bodies </span> </center> <br/> <p> Now, to reduce the calculation error we would need to fit <b>infinitely many</b> rectangles inside the irregular body. This leads us to the definition of the <i>Riemann Integral</i>, which has exactly the same geometrical motivation and interpretation. We only need to give a more rigorous definition of this procedure, which we do in the next section. </p> <h2>Riemann Integration Definition</h2> <p> To formally define the <i>Riemann Integral</i>, we start with some real function $$f: [a, b] \rightarrow \mathbb{R}$$ which is non-negative (zero values are allowed) and <a href="https://en.wikipedia.org/wiki/Continuous_function" target="_blank" rel="nofollow noopener">continuous</a> over the interval $$[a, b]$$. This is our <b>arbitrary top line</b> in the example above depicted in Figure 1. To complete the missing parts, we need to define the <b>widths</b> and <b>heights</b> of the mini rectangles. </p> <p> To define the rectangles' <b>widths</b>, we make a partition of the interval $$[a, b]$$. That means we divide the interval $$[a, b]$$ into $$N$$ sub-intervals for some $$N \in \mathbb{N}$$, i.e. </p> <center> <img data-src="https://isquared.digital/assets/images/riemann_partition.png" class="lazyload" alt="Math expression for a partition of an interval"/> </center> <br/> <p> For simplicity, we can consider all sub-intervals equidistant, although in the general case they can take any length.
Thus, the width of any rectangle would be $$w = (b - a) \div N$$, such that for any $$i \in [1, N]$$ it holds that $$x_{i} - x_{i-1} = w = (b - a) \div N$$. </p> <p> To define the rectangles' <b>heights</b>, we simply choose an arbitrary point from each sub-interval, i.e. $$x_{i}^{*} \in [x_{i - 1}, x_{i}]$$ for any $$i \in [1, N]$$. To make things easier, we select this point to be the left-most one, i.e. $$x_{i}^{*} = x_{i-1}$$. Thus, the height of each mini rectangle is $$f(x_{i - 1})$$ for $$i \in [1, N]$$. </p> <p> Having said all of this, the formal and <b>simplified</b> definition of the <i>Riemann Integral</i> is as follows: </p> <center> <img data-src="https://isquared.digital/assets/images/riemann_sum.png" class="lazyload" alt="Math equation of the Riemann Integral with Rectangles"/> </center> <br/> <p> However, this simple definition is too lossy and it would need a large $$N$$ to converge properly. To escape this limitation, we use a simple trick: transforming the <i>mini rectangles</i> into <a href="https://mathworld.wolfram.com/RightTrapezoid.html" target="_blank" rel="nofollow noopener"><i>mini right trapezoids</i></a>. The right trapezoids fit better under the curve, accounting for the loss, as depicted in the figure below: </p> <center> <img data-src="https://isquared.digital/assets/images/rectangles_to_trapezoids.png" class="lazyload" alt="Approximating area under the curve with trapezoids instead of rectangles"/> <br/> <span class="caption text-muted"> <i>Fig. 2:</i> The difference between rectangles and right trapezoids </span> </center> <br/> <p> With this in mind, we only need to slightly modify our formal <i>Riemann Integral</i> definition. We switch from the area of a rectangle to the area of a right trapezoid. In the literature, this is referred to as the <a href="https://en.wikipedia.org/wiki/Trapezoidal_rule" target="_blank" rel="nofollow noopener">Trapezoidal Rule</a>.
In this case, the sum would be: </p> <center> <img data-src="https://isquared.digital/assets/images/riemann_sum_trapezoids.png" class="lazyload" alt="Math equation of the Riemann Integral with Trapezoids"/> </center> <br/> <h2>Numerical Integration in Python</h2> <p> We only need to translate the last equation into a set of <i>Python</i> instructions. Thus, the Python implementation is a piece of cake, as given below: </p> <div style="background: #ffffff; overflow:auto;width:auto;border:solid gray;border-width:.1em .1em .1em .8em;padding:.2em .6em;"><table><tr><td style="border-bottom: none;"><pre style="margin: 0; line-height: 125%"> 1 2 3 4 5 6 7 8 9 10 11 12 13 14</pre></td><td style="border-bottom: none;"><pre style="margin: 0; line-height: 125%"><span style="color: #008800; font-weight: bold">def</span> <span style="color: #0066bb; font-weight: bold">calculate_integral</span>(f, a, b, n): <span style="color: #dd2200; background-color: #fff0f0">&#39;&#39;&#39;Calculates the integral based on the composite trapezoidal rule</span> <span style="color: #dd2200; background-color: #fff0f0"> relying on the Riemann Sums.</span> <span style="color: #dd2200; background-color: #fff0f0"> :param function f: the integrand function</span> <span style="color: #dd2200; background-color: #fff0f0"> :param int a: lower bound of the integral</span> <span style="color: #dd2200; background-color: #fff0f0"> :param int b: upper bound of the integral</span> <span style="color: #dd2200; background-color: #fff0f0"> :param int n: number of trapezoids of equal width</span> <span style="color: #dd2200; background-color: #fff0f0"> :return float: the integral of the function f between a and b</span> <span style="color: #dd2200; background-color: #fff0f0"> &#39;&#39;&#39;</span> w = (b - a)/n result = <span style="color: #0000DD; font-weight: bold">0.5</span>*f(a) + <span style="color: #003388">sum</span>([f(a + i*w) <span style="color: #008800; font-weight: bold">for</span> i <span style="color: 
#008800">in</span> <span style="color: #003388">range</span>(<span style="color: #0000DD; font-weight: bold">1</span>, n)]) + <span style="color: #0000DD; font-weight: bold">0.5</span>*f(b) result *= w <span style="color: #008800; font-weight: bold">return</span> result </pre></td></tr></table></div> <br/> <p> Once we have the implementation, it is necessary to test it against some <i>universal mathematical truth</i>. For instance, it is well known, and we can show analytically, that: </p> <center> <img data-src="https://isquared.digital/assets/images/approximating_pi_integral.png" class="lazyload" alt="Integral approximating Pi"/> </center> <br/> <p> To test the convergence of our numerical integration implementation, we calculate the <i>absolute difference</i> between the exact and approximated value of $$\pi$$. In this way, we simultaneously approximate $$\pi$$ and test our implementation. </p> <p> Moreover, to enhance the <b>perception</b> of this approximation, it helps to show a geometrical and <b>visual interpretation</b> of the process. For this reason, we make an animated visualization using <i>Matplotlib's</i> Animation API. We make the following observation: as the number of trapezoids $$N$$ increases, the approximation error decreases. On top of that, we see how the number of trapezoids is geometrically reflected in the calculation of the integral. The animation is shown below: </p> <center> <img data-src="https://isquared.digital/assets/images/riemann_sum_animation.gif" class="lazyload" alt="Animation of the Riemann Sum"/> <br/> <span class="caption text-muted"> <i>Animation: </i> Approximating Pi using the Riemann Sums </span> </center> <br/> <p> We can notice that for a fairly small number of trapezoids, i.e. 200 in total, the approximation error is already on the order of $$10^{-5}$$. This indicates that our implementation is correct, although we could apply an additional error-bound analysis. 
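</p> <p> To make the convergence claim concrete, here is a plain-text restatement of the function above together with a small check. The integrand $$4 / (1 + x^2)$$ integrates to exactly $$\pi$$ over $$[0, 1]$$, so the absolute difference to <code>math.pi</code> measures the approximation error (the trapezoid counts below are illustrative choices, not taken from the animation): </p>

```python
import math

def calculate_integral(f, a, b, n):
    '''Composite trapezoidal rule: approximate the integral of f
    over [a, b] using n trapezoids of equal width.'''
    w = (b - a) / n
    result = 0.5 * f(a) + sum(f(a + i * w) for i in range(1, n)) + 0.5 * f(b)
    return result * w

def integrand(x):
    # The integral of 4 / (1 + x^2) over [0, 1] equals pi exactly
    return 4 / (1 + x * x)

for n in (10, 50, 200):
    error = abs(math.pi - calculate_integral(integrand, 0, 1, n))
    print(f"n = {n:4d}  |pi - approximation| = {error:.2e}")
```

<p> The error shrinks roughly quadratically as the number of trapezoids grows, which is the expected behavior of the composite trapezoidal rule. </p> <p>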
</p> <p> The full source code related to all we have discussed during this blog can be found on <a href="https://github.com/IlievskiV/Amusive-Blogging-N-Coding/blob/master/Integration/riemann_sums.ipynb" target="_blank" rel="dofollow noopener">GitHub</a>. For more information, please follow me on <a href="https://twitter.com/VladOsaurus" target="_blank" rel="noopener">Twitter</a>. </p> <h2>Conclusion</h2> <p> In this short blog post, we extended our general geometry knowledge to calculate the area of some irregular-shaped bodies. We achieved this with the simplest form of integration, the <b>Riemann Sums</b>, for which we gave a formal definition. 
Later on, we provided a straightforward Python implementation and an animated visualization of the integration process using Matplotlib's Animation API. </p> <p> The <b>Riemann Integral</b> is a simple yet powerful tool for calculating the area under a curve. However, the fact that we fit small rectangles or trapezoids inside the area is quite limiting. More generally, the body can have an irregular shape for which we need other methods, such as the <i>Stieltjes</i> or <i>Lebesgue</i> integrals. </p> <h2>References</h2> <p> Svein Linge, Hans Petter Langtangen, <a href="https://link.springer.com/content/pdf/10.1007%2F978-3-319-32428-9.pdf" target="_blank">"Programming for Computations - Python"</a> (2016), Springer Open <br/> </p>Vladimir IlievskiAnimated Visualization of the Riemann Numerical Integration in Python with MatplotlibForget Determinism: Random Walks Crash Course2020-05-22T11:00:00+02:002020-05-22T11:00:00+02:00https://isquared.digital/blog/random-walks-crash-course<script src="https://polyfill.io/v3/polyfill.min.js?features=es6"></script> <script id="MathJax-script" async src="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-mml-chtml.js"></script>Vladimir Ilievski