Guide: Finding Matrix Diagonals


As part of my “Get Ready for Senior Tech Interviews” preparation I was looking at HackerRank questions, and parsing through the diagonals of a matrix came up.

Obviously the “I am at work, not in an interview” way is to use numpy.diag. However, this is about understanding the iterations instead.
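For reference, that “at work” version really is a one-liner each way — a minimal sketch assuming NumPy is available (np.fliplr reverses the columns, so np.diag of the flipped matrix gives the anti-diagonal):

```python
import numpy as np

matrix = np.array([[10, 20, 30],
                   [40, 50, 60],
                   [70, 80, 90]])

main_diagonal = np.diag(matrix)             # top-left to bottom-right
anti_diagonal = np.diag(np.fliplr(matrix))  # top-right to bottom-left

print(main_diagonal)  # [10 50 90]
print(anti_diagonal)  # [30 50 70]
```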


Finding Matrix Diagonals

An Illustrated Guide to the Python Techniques

1. The Main Diagonal (Top-Left to Bottom-Right)

The Core Rule

For the main diagonal, the rule is the simplest. An element is on the main diagonal if its row index is equal to its column index.

row_index == column_index

The Python Code

# For the main diagonal, we use enumerate to get the index 'i'
main_diagonal = [row[i] for i, row in enumerate(matrix)]

# Or, more simply with a range
main_diagonal = [matrix[i][i] for i in range(n)]

# For matrix = [[10,20,30],[40,50,60],[70,80,90]], result is [10, 50, 90]

Interactive Example: Main Diagonal

[On the original page this is an interactive widget: with n = 3, clicking “Next Step” steps through i = 0…2, highlighting each element where row_index == col_index and appending it to the resulting list.]

2. The Anti-Diagonal (Top-Right to Bottom-Left)

The Core Rule

For the anti-diagonal, the sum of an element’s row index and column index is always equal to n - 1.

row_index + column_index == n - 1

The Python Code

# For the anti-diagonal, we use the formula: col = n - 1 - row
anti_diagonal = [matrix[i][n - 1 - i] for i in range(n)]

# For matrix = [[10,20,30],[40,50,60],[70,80,90]], result is [30, 50, 70]
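The two index rules combine neatly for the classic HackerRank-style “diagonal difference” exercise — a sketch under that assumption (the function name is mine):

```python
def diagonal_difference(matrix):
    """Absolute difference between the main diagonal and anti-diagonal sums."""
    n = len(matrix)
    main = sum(matrix[i][i] for i in range(n))          # row_index == col_index
    anti = sum(matrix[i][n - 1 - i] for i in range(n))  # row_index + col_index == n - 1
    return abs(main - anti)

matrix = [[10, 20, 30], [40, 50, 60], [70, 80, 90]]
print(diagonal_difference(matrix))  # |(10+50+90) - (30+50+70)| = 0
```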

Interactive Example: Anti-Diagonal

[On the original page this is an interactive widget: with n = 3 the target index sum is n - 1 = 2; clicking “Next Step” computes j = n - 1 - i for each row, highlights matrix[i][j], and appends the value to the resulting list.]

CALCULATE PI USING MONTECARLO SIMULATION – PHASE IV – Excel VBA & Python via PyXLL


Introduction

Everywhere I have ever worked (mainly the standard sell-side wholesale / investment banks) uses Excel and VBA for prototypes and any desk-based development. While lots of time, money and effort is being spent on moving FO and MO staff onto Python, there will still be quite a lot of legacy applications, coupled with traders who don’t want to skill up to Python, and a myriad of other reasons to keep VBA around.

Excel VBA Prototype

So let’s see what we can do in native VBA to simulate a Front Office / Strat-developed legacy solution that we are looking to replace with something strategic.

Public Function CalcPiMonteCarloVBA(ByVal Simulations As Long, _
                                    Optional ByRef TimeTaken As Double) As Double

          Dim Inside As Long
          Dim i As Long
          Dim x As Double
          Dim y As Double
          Dim StartTime As Double
          
10        On Error GoTo ErrHandler

20        StartTime = Timer

30        Inside = 0

40        For i = 1 To Simulations
50            x = Rnd: y = Rnd
              
60            If x ^ 2 + y ^ 2 <= 1 Then
70                Inside = Inside + 1
80            End If
90        Next

100       CalcPiMonteCarloVBA = 4 * (Inside / Simulations)

110       TimeTaken = Timer - StartTime

120       Exit Function
ErrHandler:
130       Err.Raise 4480, Erl, "Error in CalcPiMonteCarloVBA: " & Err.Description
End Function

Excel Front End

We can take advantage of an Excel VBA based solution and create a little GUI to test out some basic assumptions.

Event-Handling Code

Sub CalcPiMonteCarlo_Click()

          Dim TimeTaken As Double

10        On Error GoTo ErrHandler

20        Me.Range("CalculatedPi").Value2 = CalcPiMonteCarloVBA(Me.Range("Simulations").Value2, TimeTaken)
30        Me.Range("TimeTaken").Value2 = TimeTaken

40        Exit Sub
ErrHandler:
50        MsgBox Err.Description & " occurred on line " & Erl, vbCritical, "Error in CalcPiMonteCarlo_Click"
End Sub

As can be seen in the screenshot, the performance compared to Python itself is none too shabby: about twice the speed of native Python without any tweaks. However, we know from Phase III that Numba will massively speed things up.

What would be interesting is to re-use this Python code, in its existing form and try to call this from VBA and see if we still get the performance benefit.

Calling Python from VBA

I have been spoilt by the banks I have worked for doing all of this within their existing quant library, mainly routing through the COM layer into C++, with the calling of Python tacked onto the side of that. For homebrewed solutions, however, there are two main contenders I have found that I would like to try out.

  1. pyxll
  2. xlwings

PyXLL

Setup & Install

Microsoft Office issues

Well, I don’t use it much at home, and it turned out I had 32-bit Office installed on my machine. I use 64-bit Python, so I had to uninstall and re-install Office to avoid mixing bit-ness between them.

I am glad I waited to finish off this blog piece, as I found the initial setup a nightmare. However, this really wasn’t PyXLL’s fault. I had many, many Python installs across different versions, both pip and Anaconda, 32-bit 3.7 and 64-bit 2.7, etc. It was awful. Reminds me of this:

Yes, my environments were a total mess too.

So I decided that, as I had elected to use Anaconda for Jupyter Notebooks, I would wipe ALL of my Python installs off the map and start afresh. I am super glad I did, as after that all the installs and setup were miles easier, and I now have a conda pyxll venv set up to play in.

Initial Python file changes

The documentation for PyXLL is very good, and I was able to adapt my calc_pi code pretty easily.

from random import random
from numba import njit, jit
from pyxll import xl_menu, xl_app, xlcAlert, xl_func


@xl_func
@jit
def calc_pi_numba(num_attempts: int) -> float:
    inside = 0

    for _ in range(num_attempts):
        x = random()
        y = random()

        if x ** 2 + y ** 2 <= 1:
            inside += 1

    return 4 * inside / num_attempts

@xl_func
def calc_pi_quick(num_attempts: int) -> float:
    from random import random

    inside = 0

    for _ in range(num_attempts):
        x = random()
        y = random()

        if x ** 2 + y ** 2 <= 1:
            inside += 1

    return 4 * inside / num_attempts

@xl_func
def multi_calc_pi(num_attempts: int, verbose=False) -> float:
    from multiprocessing import Pool
    from random import random
    import os
    from statistics import mean

    # lets leave one spare for OS related things so everything doesn't freeze up
    num_cpus = os.cpu_count() - 1
    # print('Num of CPUS: {}'.format(num_cpus))

    attempts_to_try_in_process = int(num_attempts / num_cpus)
    pool = Pool(num_cpus)  # create the pool before the try so 'pool' always exists in finally
    try:
        # what I am trying to do here is get the calc happening over n-1 CPUs, each with an even
        # split of num_attempts. i.e. 150 attempts over 15 CPUs will lead to 10 each, then the mean
        # of the returned list is returned.
        data_outputs = pool.map(calc_pi_quick, [attempts_to_try_in_process] * num_cpus)
        return mean(data_outputs)
    finally:  # to make sure processes are closed in the end, even if errors happen
        pool.close()
        pool.join()

I did have to do some debugging which I hacked up from the examples bundled with PyXLL. Some of it was quite cool and worth noting here.

from pyxll import xl_menu, xl_app, xlcAlert, xl_func
from win32com.client import constants
import win32clipboard
import math
import time
import pydevd_pycharm


@xl_menu("Connect to PyCharm", menu="Profiling Tools")
def connect_to_pycharm():
    pydevd_pycharm.settrace('localhost',
                            port=5000,
                            suspend=False,
                            stdoutToServer=True,
                            stderrToServer=True)


@xl_menu("Time to Calculate", menu="Profiling Tools")
def time_calculation():
    """Recalculates the selected range and times how long it takes"""
    xl = xl_app()

    # switch Excel to manual calculation and disable screen updating
    orig_calc_mode = xl.Calculation
    try:
        xl.Calculation = constants.xlManual
        xl.ScreenUpdating = False

        # get the current selection
        selection = xl.Selection

        # Start the timer and calculate the range
        start_time = time.perf_counter()  # time.clock() was removed in Python 3.8
        selection.Calculate()
        end_time = time.perf_counter()
        duration = end_time - start_time

    finally:
        # restore the original calculation mode and enable screen updating
        xl.ScreenUpdating = True
        xl.Calculation = orig_calc_mode

    win32clipboard.OpenClipboard()
    win32clipboard.EmptyClipboard()
    win32clipboard.SetClipboardText(str(duration))
    win32clipboard.CloseClipboard()

    # report the results
    xlcAlert('time taken : {}'.format(duration))

    return duration

VBA-Based Performance Measuring

I tried to log the performance by passing a cell into a Python macro function in PyXLL, but as PyXLL support informed me, if that is called from another cell then the calculation tree is already running and it won’t work. Hence I had to use a menu item.

Then I realised that if I could get a good enough timer in VBA by hooking into some of the underlying Windows functionality, I could get some decent timing by wrapping each of the calls in a VBA call.

modTimer VBA

Type UINT64
    LowPart As Long
    HighPart As Long
End Type

Private Const BSHIFT_32 = 4294967296# ' 2 ^ 32
Private Declare PtrSafe Function QueryPerformanceCounter Lib "kernel32" (lpPerformanceCount As UINT64) As Long
Private Declare PtrSafe Function QueryPerformanceFrequency Lib "kernel32" (lpFrequency As UINT64) As Long

Global u64Start As UINT64
Global u64End As UINT64
Global u64Fqy As UINT64
 
Public Function U64Dbl(U64 As UINT64) As Double
    Dim lDbl As Double, hDbl As Double
    lDbl = U64.LowPart
    hDbl = U64.HighPart
    If lDbl < 0 Then lDbl = lDbl + BSHIFT_32
    If hDbl < 0 Then hDbl = hDbl + BSHIFT_32
    U64Dbl = lDbl + BSHIFT_32 * hDbl
End Function


Public Sub StartTimer(ByRef u64Fqy As UINT64, ByRef u64Start As UINT64)
    ' Get the counter frequency
    QueryPerformanceFrequency u64Fqy
    ' Get the counter before
    QueryPerformanceCounter u64Start
End Sub


Public Function EndTimer() As UINT64

    ' Get the counter before
    QueryPerformanceCounter u64End
    EndTimer = u64End
    
End Function


Public Function CalcTimeTaken(ByRef u64Fqy As UINT64, ByRef u64Start As UINT64) As Double
    ' User Defined Types cannot be passed ByVal; also, not copying them should reduce the error seen
    ' when timing very small increments, as we expect when comparing something like Numba loops vs VBA
    ' for small iterations (< 1000)

    CalcTimeTaken = (U64Dbl(EndTimer) - U64Dbl(u64Start)) / U64Dbl(u64Fqy)

End Function

Wrapper VBA Function

Returns a two-cell variant array containing the estimated Pi value and the time taken.

Public Function CalcPi(ByVal NumAttempts As Long, _
                       Optional ByVal CalcMethod As String = "CALCPIMONTECARLOVBA") As Variant

    Dim x(1) As Variant
    Dim StartTime As Double
    Dim EndTime As Double
    Dim TimeTaken As Double
    Dim Pi As Double
    
    
    On Error GoTo ErrHandler
    
    Pi = 0
        
    modTimer.StartTimer modTimer.u64Fqy, modTimer.u64Start
    Pi = Run(CalcMethod, NumAttempts)
    TimeTaken = CalcTimeTaken(modTimer.u64Fqy, modTimer.u64Start)
    
    x(0) = Pi
    x(1) = TimeTaken
    
    CalcPi = Application.Transpose(x)
    
    Exit Function
ErrHandler:
    Err.Raise 4444, Erl, "Error in CalcPi: " & Err.Description
End Function

Results

Performance of existing VBA – considerations

So it is interesting to note that if you have lots of small number crunching tasks to complete, moving all these to native Python might not actually save your bacon if looking for performance improvements.

CalcPiMonteCarloVBA takes almost half the time (24s) vs calc_pi_quick taking 46 seconds to complete 50,000,000 iterations.

Comparison to Jupyter Notebook

All timings in seconds, per iteration count:

Iterations                     50         500        5,000      50,000     500,000    5,000,000
calc_pi_numba                  0.0002274  0.000245   0.000437   0.002      0.0164807  0.1567803
calc_pi_numba Jupyter          0.000013   0.00002    0.000161   0.001588   0.015883   0.152588
CalcPiMonteCarloVBA            0.0001433  0.000325   0.002627   0.025018   0.2445212  2.4704114
CalcPiMonteCarloVBA Jupyter    N/A        N/A        N/A        N/A        N/A        N/A
multi_calc_pi                  2.1511951  2.254102   2.263384   2.254821   2.4732245  2.9926874
multi_calc_pi Jupyter          2.821177   2.811943   2.714579   2.710318   2.916816   3.315011
calc_pi_quick                  0.0004733  0.000743   0.00573    0.04888    0.4622882  5.3454645
calc_pi_quick Jupyter          0.000071   0.000504   0.005117   0.051297   0.467226   4.660022

The above table is a little data-dense, but it seems that calling from PyXLL is perhaps even slightly faster than from a Notebook, which is awesome news!

So, computationally, with some Python diligence and some CompSci thought, we can move swathes of calculation-intensive code out of VBA and into Python and get the panacea of:

  • greater code control – we can assume the python code can live in a repository and have CI/CD “DevOps” practices around its delivery to the user.
  • a great library of open source functionality (pandas etc)
  • increased speed from tools like Numba
  • code re-use (harder to do outside of XLAs)

CALCULATE PI USING MONTECARLO SIMULATION – PHASE III – aka "Multicarlo"


Introduction

After Phase II, we discovered that NumPy wasn’t going to help. Also, getting a halfway decent approximation of Pi was going to take millions of paths / attempts. So, on a slimmed-down calculation where the [x, y] position of each random try was not kept, what could be done to make this as quick and accurate as possible?

Multi-Processing?

Can I use the 16 cores on my machine to go 16x faster? Am I being naive here? If I am, how, and what is the most that can be done? I would like to see if the infamous GIL will Get me Into Logjam or not.
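Before reaching for multiprocessing it is worth seeing why plain threads will not help here — a minimal sketch (the names are mine) showing that for a CPU-bound loop like ours the GIL lets only one thread execute Python bytecode at a time, so four threads take roughly as long as one:

```python
import time
from concurrent.futures import ThreadPoolExecutor
from random import random

def count_inside(num_attempts):
    # the same CPU-bound loop as calc_pi_quick, returning just the hit count
    inside = 0
    for _ in range(num_attempts):
        if random() ** 2 + random() ** 2 <= 1:
            inside += 1
    return inside

n = 2_000_000

start = time.perf_counter()
single = count_inside(n)
serial_time = time.perf_counter() - start

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as executor:
    chunks = list(executor.map(count_inside, [n // 4] * 4))
threaded_time = time.perf_counter() - start

# Expect the two times to be roughly equal: threads give no speedup on CPU-bound work
print(f"serial: {serial_time:.2f}s  threaded (4 workers): {threaded_time:.2f}s")
print(f"pi estimates: {4 * single / n:.4f} vs {4 * sum(chunks) / n:.4f}")
```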

First Attempt [Disaster]

from multiprocessing import Pool
import os

num_cpus = os.cpu_count()
print('Num of CPUS: {}'.format(num_cpus))

try:
    pool = Pool(num_cpus)  # one worker process per CPU
    data_outputs = pool.map(calc_pi_quick, [10])
    print (data_outputs)
finally: # To make sure processes are closed in the end, even if errors happen
    pool.close()
    pool.join()

Disappointingly, not only is it not any quicker; it doesn’t finish running AT ALL.

Awkward. Let’s look at why that is happening first:

  • I quite like Medium Articles and this one seems to hit the spot.
  • Also this from StackOverflow

Second Attempt

from random import random

def calc_pi_quick(num_attempts: int) -> float:
    from random import random

    inside = 0

    for _ in range(num_attempts):
        x = random()
        y = random()

        if x ** 2 + y ** 2 <= 1:
            inside += 1

    return 4 * inside / num_attempts


def multi_calc_pi(num_attempts: int, verbose=False) -> float:
    from multiprocessing import Pool
    from random import random
    import os
    from statistics import mean

    # lets leave one spare for OS related things so everything doesn't freeze up
    num_cpus = os.cpu_count() - 1
    # print('Num of CPUS: {}'.format(num_cpus))

    attempts_to_try_in_process = int(num_attempts / num_cpus)
    pool = Pool(num_cpus)  # create the pool before the try so 'pool' always exists in finally
    try:
        # what I am trying to do here is get the calc happening over n-1 CPUs, each with an even
        # split of num_attempts. i.e. 150 attempts over 15 CPUs will lead to 10 each, then the mean
        # of the returned list is returned.
        data_outputs = pool.map(calc_pi_quick, [attempts_to_try_in_process] * num_cpus)
        return mean(data_outputs)
    finally:  # to make sure processes are closed in the end, even if errors happen
        pool.close()
        pool.join()

To get multiprocessing to work in Jupyter Notebooks on Windows 10, I had to save this code to a separate file to be called from the Notebook and executed within __main__. However, once I did, it led to a significant performance increase.
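Concretely, the layout I mean is something like the sketch below (the file and function names are mine; on Windows, multiprocessing spawns fresh interpreters that re-import the calling module, so the Pool creation must sit behind the guard):

```python
# multicarlo_demo.py -- everything in one importable file for the sketch
from multiprocessing import Pool
from random import random
from statistics import mean

def calc_pi_chunk(num_attempts):
    # worker function: lives at module level so spawned children can import it
    inside = sum(1 for _ in range(num_attempts)
                 if random() ** 2 + random() ** 2 <= 1)
    return 4 * inside / num_attempts

if __name__ == '__main__':
    # Without this guard each spawned child would re-run the module top level
    # and try to create its own Pool -- which is why the first attempt hung.
    with Pool(4) as pool:
        estimates = pool.map(calc_pi_chunk, [250_000] * 4)
    print(mean(estimates))
```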

Collect the Stats

import calcpi
from timeit import default_timer as timer
from datetime import timedelta
import matplotlib.pyplot as plt

if __name__ == '__main__':
    
    num_attempts = [50, 500, 5_000, 50_000, 500_000, 5_000_000, 50_000_000]
    multi_timings = []
    single_timings = []
    
    for x in num_attempts:
    
        start = timer()
        multi_pi = calcpi.multi_calc_pi(x)
        end = timer()  
        multi_timings.append((end-start))
        print("MultiCarlo Pi: {}: {} took {}".format(multi_pi, x, timedelta(seconds=end-start)))

        start = timer()    
        pi = calcpi.calc_pi_quick(x)
        end = timer()
        single_timings.append((end-start))
        print("MonteCarlo Pi: {}: {} took {}".format(pi, x, timedelta(seconds=end-start)))

    plt.plot(num_attempts, multi_timings)
    plt.plot(num_attempts, single_timings)

    plt.show()

Display the stats in a more meaningful way

import plotly.graph_objects as go

import numpy as np

# lets pretty this up so the axis label is more readable
x_axis = ['{:,}'.format(x) for x in num_attempts]

fig = go.Figure(data=[
    go.Bar(name='Multi Process', x=x_axis, y=multi_timings),
    go.Bar(name='Single Process', x=x_axis, y=single_timings)
])
# Change the bar mode
fig.update_layout(xaxis_type='category',barmode='group')
fig.show()
[Chart: Single vs Multiprocessing for increasing simulation size]

Early on, at the small end of the exponentially increasing simulation sizes, the overhead of creating the Pool, asking the OS for the CPU count and then splitting the attempts across the pool seemed to result in a floor of 0.4s. However, at that level the Pi value returned was nonsense anyway. So I think the setup cost is worth it for any reasonable approximation of Pi.

Baby, can I get your Numba?

Can we go even further? I think we could give Numba a try.

After some research and reading around the internet for a day or so (while avoiding Covid-19), I found out it is incredibly simple.

4 little chars that change everything: @jit

from random import random
from numba import njit, jit

@jit
def calc_pi_numba(num_attempts: int) -> float:
    inside = 0

    for _ in range(num_attempts):
        x = random()
        y = random()

        if x ** 2 + y ** 2 <= 1:
            inside += 1

    return 4 * inside / num_attempts

That is it. Just one import and one teensy little decorator and we are away. How does this stack up against my mighty multi Monte Carlo from earlier, though?

Let’s get some timings

import calcpi
from timeit import default_timer as timer
from datetime import timedelta
import matplotlib.pyplot as plt
import plotly.graph_objects as go

if __name__ == '__main__':
    
    num_attempts = [50, 500, 5_000, 50_000, 500_000, 5_000_000, 50_000_000]
    function_list = [calcpi.calc_pi_quick, calcpi.calc_pi_numba, calcpi.multi_calc_pi]
    results_dict = {}

    for func in function_list:

        timings = []
        
        for x in num_attempts:
        
            start = timer()
            pi = func(x)
            end = timer()  
            timings.append((end-start))
            # print("{} => Pi: {} Attempts: {} Time: {}".format(func.__name__, pi, x, timedelta(seconds=end-start)))

        results_dict[func.__name__] = timings
    
    go_data = []
    # lets pretty this up so the axis label is more readable
    x_axis = ['{:,}'.format(x) for x in num_attempts]
    
    for func in function_list:
    
        plt.plot(num_attempts, results_dict[func.__name__])
        
        go_data.append(go.Bar(name=func.__name__, x=x_axis, y=results_dict[func.__name__]))

    plt.show()

    fig = go.Figure(data=go_data)
    # Change the bar mode
    fig.update_layout(xaxis_type='category',barmode='group')
    fig.show()
        

Results visualisation

[Charts: matplotlib line plot and Plotly grouped bar chart; X axis = attempts, Y axis = seconds to run.]

Oh… it absolutely smashes it.

Out of the park for

  • Speed
  • Readability
  • Implementation simplicity

CALCULATE PI USING MONTECARLO SIMULATION – PHASE II


Context

Carrying on from Phase I, I wanted to see how I could take the initial answer and make it more efficient. The first port of call then is: how expensive is it at the moment? Will any changes I make have a significant impact on that baseline?

timeit

I don’t think using the %%timeit IPython cell magic will cut the mustard, but we shall see. I would like to be able to single out code sections for timing too.
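For singling out sections, the timeit module works fine outside of cell magics too — a quick sketch with a stand-in workload:

```python
from timeit import default_timer as timer, repeat

def work():
    # stand-in for calc_pi(...); any callable whose runtime we want to measure
    return sum(i * i for i in range(100_000))

# One-off section timing -- the pattern used throughout below
start = timer()
work()
print('one run took {:.4f}s'.format(timer() - start))

# Or let timeit run it a few times and report the best (least noisy) run
best = min(repeat(work, number=1, repeat=5))
print('best of 5: {:.4f}s'.format(best))
```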

Calc Pi Unoptimised

def calc_pi(num_attempts, display_plot=False):

    import matplotlib
    import matplotlib.pyplot as plt
    from random import random
    
    inside = 0

    x_inside = []
    y_inside = []
    x_outside = []
    y_outside = []

    for _ in range(num_attempts):
        x = random()
        y = random()
        if x**2+y**2 <= 1:
            inside += 1
            x_inside.append(x)
            y_inside.append(y)
        else:
            x_outside.append(x)
            y_outside.append(y)

    pi = 4*inside/num_attempts
    
    if display_plot:
        fig, ax = plt.subplots()
        ax.set_aspect('equal')
        ax.scatter(x_inside, y_inside, color='g')
        ax.scatter(x_outside, y_outside, color='r')

    return pi

Calc Pi Quick (Version 1)

Quick Python-only changes: remove the array recording and, since we never plan to plot, remove the if statement.

def calc_pi_quick(num_attempts: int):

    from random import random

    inside = 0

    for _ in range(num_attempts):
        x = random()
        y = random()
        if x**2+y**2 <= 1:
            inside += 1

    return 4*inside/num_attempts

Initial comparison

from timeit import default_timer as timer
from datetime import timedelta

start = timer()
calc_pi(50000000, True)
end = timer()
print(timedelta(seconds=end-start))

start = timer()
calc_pi(50000000, False)
end = timer()
print(timedelta(seconds=end-start))

start = timer()
calc_pi_quick(50000000)
end = timer()
print(timedelta(seconds=end-start))

Output

calc_pi with plot display time: 0:00:08.318734
calc_pi no display time: 0:00:00.625193 
calc_pi_quick time: 0:00:00.470770

We start to see some decent performance benefits. Let’s check whether these are linear or not. However, to see some really decent improvements I think we can start to think about threading, perhaps? Also, if we still wanted to record the hit and miss positions, would NumPy be quick enough that we could keep this data without a significant performance hit?

Numpy Optimisation Attempts

Spoiler alert… my first attempt was absolutely terrible. However, in a way that’s great news: I now know what not to do with NumPy on a non-trivial problem, and now so do you.

def calc_pi_numpi_bad(num_attempts):
    
    import numpy as np
    import matplotlib
    import matplotlib.pyplot as plt
    from random import random
    
    inside = 0

    x_inside = np.array([])
    y_inside = np.array([])
    x_outside = np.array([])
    y_outside = np.array([])

    for _ in range(num_attempts):
        x = random()
        y = random()
        if x**2+y**2 <= 1:
            inside += 1
            x_inside = np.append(x_inside, x)
            y_inside = np.append(y_inside, y)
        else:
            x_outside = np.append(x_outside, x)
            y_outside = np.append(y_outside, y)

    pi = 4*inside/num_attempts

    return pi

How bad? Well, let’s see…

def measure_performance_verbose(num_attempts: int) -> None:

    from timeit import default_timer as timer
    from datetime import timedelta

    start = timer()
    calc_pi(num_attempts, False)
    end = timer()
    time_taken = end-start
    print('calc_pi time: {}'.format(timedelta(seconds=time_taken)))

    start = timer()
    calc_pi_quick(num_attempts)
    end = timer()
    quick_time_taken = end-start
    print('calc_pi_quick time: {}'.format(timedelta(seconds=quick_time_taken)))

    percentage_improvement = quick_time_taken / time_taken * 100
    
    print('Percentage improvement: {}'.format(percentage_improvement))

    start = timer()
    calc_pi_numpi(num_attempts)
    end = timer()
    numpi_time_taken = end-start
    print('calc_pi_numpi time: {}'.format(timedelta(seconds=numpi_time_taken)))
    
    start = timer()
    calc_pi_numpi_bad(num_attempts)
    end = timer()
    numpi_bad_time_taken = end-start
    print('calc_pi_numpi_bad time: {}'.format(timedelta(seconds=numpi_bad_time_taken)))
    
    
measure_performance_verbose(500_000)

Output

calc_pi time: 0:00:00.636658
calc_pi_quick time: 0:00:00.462309
Percentage improvement: 72.6149660351135
calc_pi_numpi time: 0:00:00.616958
calc_pi_numpi_bad time: 0:12:52.784511

Whoa, what just happened there? It takes TWELVE MINUTES to run for 500,000 attempts. That is atrocious.

Lets look at the documentation for this function.

Ahh, that explains it. Every append copies the existing array into a new one. I did not know that; shame on me. I have always used NumPy arrays as data converted from pandas or existing Python lists, not created and appended to on the fly.
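The copying behaviour is easy to demonstrate — a tiny sketch showing np.append hands back a brand-new array every call:

```python
import numpy as np

a = np.array([1.0, 2.0])
b = np.append(a, 3.0)

print(b)       # [1. 2. 3.]
print(a)       # [1. 2.] -- the original is untouched; append allocated a new array
print(a is b)  # False

# Appending n items one at a time therefore copies 1 + 2 + ... + n elements: O(n^2) overall
```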

However, we can fix it by pre-allocating the arrays to the maximum size required and then masking out the zero entries when plotting.

def calc_pi_numpi(num_attempts, display_plot=False, display_masked_plot=False):
    
    import numpy as np
    import matplotlib
    import matplotlib.pyplot as plt
    from random import random
    
    inside = 0

    x_inside = np.zeros(num_attempts)
    y_inside = np.zeros(num_attempts)
    x_outside = np.zeros(num_attempts)
    y_outside = np.zeros(num_attempts)

    for i in range(num_attempts):
        x = random()
        y = random()
        if x**2+y**2 <= 1:
            inside += 1
            x_inside[i] = x
            y_inside[i] = y
        else:
            x_outside[i] = x
            y_outside[i] = y

    if display_masked_plot:
        
        fig, ax = plt.subplots()
        ax.set_aspect('equal')
        ax.scatter(np.ma.masked_equal(x_inside,0), np.ma.masked_equal(y_inside,0), color='g')
        ax.scatter(np.ma.masked_equal(x_outside,0), np.ma.masked_equal(y_outside,0), color='r')
        
    if display_plot:
            
        x_inside2 = x_inside[x_inside !=0] 
        y_inside2 = y_inside[y_inside !=0]
        x_outside2 = x_outside[x_outside !=0]
        y_outside2 = y_outside[y_outside !=0]

        fig, ax = plt.subplots()
        ax.set_aspect('equal')
        ax.scatter(x_inside2, y_inside2, color='g')
        ax.scatter(x_outside2, y_outside2, color='r')

    pi = 4*inside/num_attempts

    return pi

There are two booleans so that we can test that both plotting approaches work.

calc_pi_numpi(256, True, True)

[Plots: “masked” and “numpy zero removal”]
start = timer()
calc_pi_numpi(500_000, True, False)
end = timer()
time_taken = end-start
print('calc_pi normal display time: {}'.format(timedelta(seconds=time_taken)))

start = timer()
calc_pi_numpi(500_000, False, True)
end = timer()
time_taken = end-start
print('calc_pi masked display time: {}'.format(timedelta(seconds=time_taken)))
Output

calc_pi normal display time: 0:00:02.356013
calc_pi masked display time: 0:00:04.012019

Much much better.

start = timer()
calc_pi_numpi(500_000, False, False)
end = timer()
time_taken = end-start
print('calc_pi_numpi no display time: {}'.format(timedelta(seconds=time_taken)))

start = timer()
calc_pi(500_000)
end = timer()
time_taken = end-start
print('calc_pi no display time: {}'.format(timedelta(seconds=time_taken)))
Output

calc_pi_numpi no display time: 0:00:00.622075
calc_pi no display time: 0:00:00.655379

So… NumPy doesn’t make things better, but, used correctly, it doesn’t make things any worse either. This is a dead end. Also, we don’t want to plot for numbers higher than a million, as to the human eye the result looks like a complete quarter circle anyway, so we don’t need to store large amounts of array data.

While I am here I wanted to make use of the “functions are first-class objects” feature of Python. I hinted at it by naming the initial function _verbose, but it is much cleaner to run the same timing code in a loop over the functions themselves, as below.

def measure_performance(num_attempts: int) -> None:

    from timeit import default_timer as timer
    from datetime import timedelta
    
    funcs_to_call = [calc_pi, calc_pi_quick, calc_pi_numpi, calc_pi_numpi_bad]
    
    for func in funcs_to_call:
        start = timer()
        func(num_attempts, False)  # call each function in turn, not calc_pi every time
        end = timer()
        time_taken = end - start
        print('{} time: {}'.format(func.__name__, timedelta(seconds=time_taken)))
    
measure_performance(50000)

In Phase III, aka PiMultiCarlo, I want to look at what we can do locally on a decent multi-core machine to get some real speed improvements when calculating the number only, without storing the guesses.

Follow-up

For Part III : CALCULATE PI USING MONTECARLO SIMULATION – PHASE III – aka “Multicarlo”

Calculate Pi using MonteCarlo Simulation – Phase I


Context

I was interviewing for a Python developer role at an investment bank, and a nice enough but overzealous quant started asking me silly questions. You know the sort. In this case, however, I didn’t really know the answer. So I determined to produce a solution that I am happy I understand and that, at the same time, gets me into new areas of Python development.

Question:

How would you calculate the value of Pi from scratch / first principles?

Answer I gave:

“Uhhh, pardon?”

Answer I should have given:

The equation for a circle is (x − h)² + (y − k)² = r². With the centre at the origin and a radius of 1, the points inside the circle satisfy: x² + y² ≤ 1

We then look at a quarter of the circle to take away a lot of the maths needed, and randomly insert points into the resulting square. The random, co-ords are known to be either inside or outside of the desired area. We then can get the estimation of Pi as :

We know that (Points in the circle)/(Total points) = πr²/(2r)²

Which for a radius of 1 becomes (Points in the circle)/(Total points) = π/2² = π/4

Then π = 4 × (Points in the circle)/(Total points)
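As a quick worked example of that final formula (the counts here are made up purely for illustration):

```python
# hypothetical counts, chosen only to illustrate the formula
points_inside = 7854
total_points = 10_000

pi_estimate = 4 * points_inside / total_points
print(pi_estimate)  # 3.1416
```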

Python Implementation

Err… OK. In hindsight this feels like quite a spurious interview question, but let’s look at the Python and see what we can do.

def calc_pi(num_attempts, display_plot=False):

    import matplotlib.pyplot as plt
    from random import random
    
    inside = 0

    x_inside = []
    y_inside = []
    x_outside = []
    y_outside = []

    for _ in range(num_attempts):
        x = random()
        y = random()
        if x**2+y**2 <= 1:
            inside += 1
            x_inside.append(x)
            y_inside.append(y)
        else:
            x_outside.append(x)
            y_outside.append(y)

    pi = 4*inside/num_attempts
    
    if display_plot:
        fig, ax = plt.subplots()
        ax.set_aspect('equal')
        ax.scatter(x_inside, y_inside, color='g')
        ax.scatter(x_outside, y_outside, color='r')

    return pi
Attempting Monte Carlo Simulation with 50 attempts. 
Pi: 3.28 

Attempting Monte Carlo Simulation with 500 attempts. 
Pi: 2.992 

Attempting Monte Carlo Simulation with 5000 attempts. 
Pi: 3.176 

Attempting Monte Carlo Simulation with 50000 attempts. 
Pi: 3.14688 

Real PI: 3.141592653589793
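The output above came from a small driver loop. The original wrapper isn’t shown, but it would have looked something like this (with a counting-only copy of calc_pi so the snippet stands alone):

```python
import math
from random import random

def calc_pi(num_attempts):
    # same logic as the function above, minus the plotting lists
    inside = 0
    for _ in range(num_attempts):
        x, y = random(), random()
        if x ** 2 + y ** 2 <= 1:
            inside += 1
    return 4 * inside / num_attempts

for n in (50, 500, 5000, 50000):
    print('Attempting Monte Carlo Simulation with {} attempts.'.format(n))
    print('Pi: {}\n'.format(calc_pi(n)))

print('Real PI: {}'.format(math.pi))
```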

Plotly Visualisation

Continuing with this “interview answer” level of solution, I thought it might make for a cool Jupyter Notebook. Let’s get a graphical representation.

Lo and behold, it does! Trying with ever increasing attempts shows the quarter circle becoming more and more clear.

Monte Carlo Simulation with 50 attempts. Pi: 3.28
Monte Carlo Simulation with 500 attempts. Pi: 2.992
Monte Carlo Simulation with 5000 attempts. Pi: 3.176
Monte Carlo Simulation with 50000 attempts. Pi: 3.14688

Run multiple times and take the mean?

I suspected that with low attempt counts there would be quite a large variance, but that the mean over multiple runs would still be decent.

def plot_attempt_pi_MC_n_times(num_times, mc_attempts):

    # OK, let's use this function to see what the drift from the real Pi is.
    import math  # needed for math.pi below
    import matplotlib.pyplot as plt
    import statistics

    pi_vals = []

    for i in range(num_times):
        pi = calc_pi(mc_attempts)
        pi_vals.append(pi)

    attempt = list(range(1, num_times + 1))
    actual_pi = [math.pi]*num_times

    plt.plot(attempt, pi_vals)
    plt.plot(attempt, actual_pi)
    # a really simple arithmetic way of getting pretty close
    plt.plot(attempt, [22/7]*num_times)
    avg = statistics.mean(pi_vals)
    
    plt.plot(attempt, [avg]*num_times)
    print('Avg: {}. Diff to actual Pi: {}'.format(avg, abs(math.pi - avg)))
    plt.show()
    return pi_vals

So we can run the simulation multiple times, plot each result, and then compare the mean to a single Monte Carlo evaluation with the same total number of attempts.

Code to try both approaches with the same total number of sample points

mc_pi = calc_pi(5000)
print('calc_pi(5000): {}. Diff to actual Pi: {}'.format(mc_pi, abs(math.pi - mc_pi)))
w = plot_attempt_pi_MC_n_times(100, 50)

mc_pi = calc_pi(50_000)
print('calc_pi(50_000): {}. Diff to actual Pi: {}'.format(mc_pi, abs(math.pi - mc_pi)))
x = plot_attempt_pi_MC_n_times(100, 500)

mc_pi = calc_pi(500_000)
print('calc_pi(500_000): {}. Diff to actual Pi: {}'.format(mc_pi, abs(math.pi - mc_pi)))
y = plot_attempt_pi_MC_n_times(100, 5_000)

mc_pi = calc_pi(5_000_000)
print('calc_pi(5_000_000): {}. Diff to actual Pi: {}'.format(mc_pi, abs(math.pi - mc_pi)))
z = plot_attempt_pi_MC_n_times(100, 50_000)

5,000 Attempts

calc_pi(5000): 3.1376. Diff to actual Pi: 0.003992653589793171
Avg: 3.136. Diff to actual Pi: 0.005592653589792995

50,000 Attempts

calc_pi(50_000): 3.14408. Diff to actual Pi: 0.002487346410207092
Avg: 3.14256. Diff to actual Pi: 0.000967346410206904

500,000 Attempts

calc_pi(500_000): 3.144032. Diff to actual Pi: 0.002439346410207044
Avg: 3.14292. Diff to actual Pi: 0.001327346410207042

5,000,000 Attempts

calc_pi(5_000_000): 3.1420616.
Diff to actual Pi: 0.00046894641020678307
Avg: 3.1408424.
Diff to actual Pi: 0.0007502535897931928

Conclusion

We can say without a doubt that this method of calculating Pi is both expensive AND inaccurate compared to even the simple fraction 22/7. That approximation is only beaten after millions of simulation steps.

It was, however, an interesting foray into a number of areas of maths and Python. While it is a bit mad, I would like to keep going with this. How can we make this faster locally, and what if we turned to alternative solutions, like the power of the cloud, to get to, say, a reliable 10 decimal places?

Follow-up

CALCULATE PI USING MONTECARLO SIMULATION – PHASE II

Google Music Takeout – Track Data


I always find, working through a textbook, that it gets quite dry and I struggle to maintain interest, as the project or subject matter is too esoteric.

So, armed with a little python, StackOverflow and a search engine, I decided to see what I could do with my own data.

Photo by Clem Onojeghuo on Unsplash

What have I been listening to for the last three or four years?

Google allows users access to all their past data, and as a long-time user of Google Music I wondered what the oversight on all this would be. Unlike Spotify, there doesn’t seem to be any scope for introspection into musical history or taste, which strikes me as odd. I guess Google is more about harvesting and selling the data, whereas Spotify realises that showing insight into their users’ activity and taste helps drive traffic and engagement.

Google Takeout Track dataset structure

unzipped takeout tracks, highlighting the weirder filenames

Seriously? A gazillion CSV files? Well… 14,646. Is that the best they can do? But then I thought: hang on… combining multiple CSV files and cleaning up rubbish data is exactly the sort of basic data-science 101 task that every project has. Bring it on.

Track csv file combination

Python to combine the .csv files, lifted from my Jupyter Notebook

import os

# change as required, obviously
directory = '/takeout_data/Google Play Music/Tracks'

csv_files = []

for root, dirs, files in os.walk(directory):
    for file in files:
        if file.endswith(".csv"):
            # join against the directory actually being walked,
            # using the platform's path separator
            csv_files.append(os.path.join(root, file))

print(len(csv_files))
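For what it’s worth, the same recursive search can be written more compactly with pathlib. This is an alternative I’d reach for today, not the code I originally ran:

```python
from pathlib import Path

def find_csv_files(directory):
    # recursive equivalent of the os.walk loop above
    return sorted(str(p) for p in Path(directory).rglob('*.csv'))

# usage, with the same (placeholder) Takeout path as above:
# csv_files = find_csv_files('/takeout_data/Google Play Music/Tracks')
```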

Now let’s iterate over the files found in the selected folder, add them to a list, and then concatenate them into a pandas DataFrame. I’m painfully aware that the snippet below isn’t fully “pythonic” and wouldn’t win any Stack Overflow plaudits. However, I am backing myself to improve this at the end, and I stated this would be a warts-and-all development blog. As such, this is “get it working to make sure I am going in the right direction and this is a good idea overall” mode. If it’s working I can move on and make enough progress to motivate myself to keep coding on the train and before bed each night. To be fair, this code will only be run once per archive, so it doesn’t need to be hugely optimised, considering the other demands on project time.

import pandas as pd

combined_csv = []
total = len(csv_files)

for i, csv_file in enumerate(csv_files):

    x = pd.read_csv(csv_file)

    # keep only tracks with a known artist and at least one play
    if not pd.isna(x['Artist'].iloc[0]) and x['Play Count'].iloc[0] > 0:
        if i % 100 == 0:
            print('100th Artist found: {}. Processing {}/{}'.format(x['Artist'].iloc[0], i, total))

        combined_csv.append(x)

combined_df = pd.concat(combined_csv)

Which leads to the Jupyter Notebook output:

printing the output solved my “is this working?” worries

After saving this combined .csv file I thought I would have a quick peek into the data, to check what we were actually given.

Track combination sanity check

Ok, so now I know what I’ve listened to is all in one place. I now face some issues if I want to start answering the questions I had in mind.

  • Is this data reliable? How much cleaning and transformation will be needed?
  • When did these tracks get played?
  • Where can I find the metadata for this basic track info?

I created a new Jupyter notebook in order to evaluate the data. As a cursory check, I thought the following would make a good test:

  • See what the song length distribution was like
  • Top Artists
  • Total Number of Songs
  • DataFrame.head() displays
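Those checks can be sketched as a small helper. The column names here (‘Artist’, ‘Duration (ms)’) are what I’m assuming the Takeout CSVs use; adjust to the actual headers in your archive:

```python
import pandas as pd

def sanity_check(df):
    # quick summary of the combined track data; column names assumed
    return {
        'total_songs': len(df),
        'top_artists': df['Artist'].value_counts().head(10),
        'song_length_summary': df['Duration (ms)'].describe(),
        'head': df.head(),
    }
```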

Jupyter Notebook : TakeoutTrackAnalysisBlog


Further questions (for another blog post)

  • What are my most popular artists, albums and genres?
  • Do I have any dirty / unknown track data
  • What are my top 10 shortest and longest tracks
