In last years, Machine Learning made important breakthrough. In this domain, the convolution is highly used to detect features on image. But convolutions are not used only in ML but also a lot in post processing for images. This step is often slow and requires important ressources. I recently discovered that there is ways to significantly increase the perfomance with Filter's decomposition from this video. In this notebook we will do a benchmark with Gaussian filters then apply this method on a filter well known of photographs called Bokeh Filter.

The code is based on this repository but I tried to simplify it first to understand it and make it understandable.

What is convolution

I already used several time the convolution in previous projects but for this one, there is a need to explain it briefly.

Given an image of $ H $ x $ W $ x $ L $ with $ L = 1 $ for a Black & White image or $ L = 3 $ for a color image and a filter of $ N $ x $ M $ x $ K $ with often $ K = 1 $, the convolution step consist of computing for each pixel of the image the sum of the value of pixel around this given pixel by the value of the filter. This is more clear on the 2 images below :

So as you can imagine, the complexity of this process is $ O(n^2) $ based on the size of the image and $ O(n^2) $ also based on the filter size. The objective of the filter decomposition is to reduce the complexity of this process for the filter from $ O(n^2) $ to $ O(n) $


If we consider a square filter ($ F $) of size $ N $, the process should multiply for each pixel $ N^2 $ elements. if we are able to decompose the filter as a product of vector ($ V1 $ and $ V2 $) :

$$ F_{NxN} = V1_{Nx1} \times V2_{1xN} $$

or even better if V1 = V2 (named $ V $)

$$ F_{NxN} = V_{Nx1} \times V_{Nx1}.T $$

In that case, we can apply the first convolution $ f $ with 1 vector (complexity of O(n) as our filter of size N requires N multiplication) then apply a second convolution $ g $ with the second filter on the result of the first convolution. This is possible because the convolution is an application having a commutative product so :

$$ filtered\_image = g(f(image)) = f(g(image)) $$

Now let's apply if on an easy decomposable filter which is Gaussian filter and then we will do it for a more complex one

Common function / imports

In [1]:
import time

import numpy as np
import scipy.signal
from scipy.stats import entropy
from PIL import Image

import matplotlib.pyplot as plt
In [2]:
def read_image(path, resize_factor=1.0):
    ''' read an image from disk to numpy float32 array '''
    img =
    h, w = img.size
    img = img.resize((int(h * resize_factor), int(w * resize_factor)), resample=Image.LANCZOS)
    arr = np.array(img)
    return arr / 255.0

# Reverse helper function to write an image from a numpy array
def write_image(img, path):
    ''' write a numpy float32 array to an image from disk '''
    img *= 255
    img = img.astype(np.uint8)
def print_image(img, title = None, figsize = (20,12)):
    ''' plot a numpy float32 array as an image '''
    plt.figure(figsize = figsize)
    if title is not None:

Gaussian Filters

Creation of filter

The gaussian filter is a filter often used to filter the noise on an image. This filter is quite simple to test is 1D or 2D due to his equation:

$$ G(x, \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{(x-\mu)^2}{2\sigma^2}} $$

if we do a convolution product between 2 gaussian, the result is also a gaussian:

$$ G(x, \mu1, \sigma1) \times G(x, \mu2, \sigma2) = G(x, \mu1 + \mu2, \sqrt{\sigma1^2 + \sigma2^2)}$$

As a result, if our filter has a $ \mu = 0 $ and the same $ \sigma $

$$ G(x, 0, \sigma) \times G(x, 0, \sigma) = G(x, 0, \sigma\sqrt{2})$$

Now, let's create both filters

In [3]:
def gaussian_kernel_2d(sigma):
    '''Produces a 2D gaussian kernel of standard deviation sigma and size 2*sigma+1'''
    kernel_radius = np.ceil(sigma) * 3
    kernel_size = kernel_radius * 2 + 1
    ax = np.arange(-kernel_radius, kernel_radius + 1., dtype=np.float32)
    xx, yy = np.meshgrid(ax, ax)
    kernel = np.exp(-(xx**2 + yy**2) / (2. * sigma**2))
    return kernel / np.sum(kernel)

def gaussian_kernel_1d(sigma):
    '''Produces a 1D gaussian kernel of standard deviation sigma and size 2*sigma+1'''
    kernel_radius = np.ceil(sigma) * 3
    kernel_size = kernel_radius * 2 + 1
    ax = np.arange(-kernel_radius, kernel_radius + 1., dtype=np.float32).reshape(1, -1)
    kernel = np.exp(-(ax**2) / (2. * sigma**2))
    return kernel / np.sum(kernel)

To compare the result, we will benchmark the time spent with the 2D filter and the 1D filter. Both resulting images will be stored in variation $ output $ and $ output2 $ for comparison.

In [4]:
img = read_image("M35.jpg")
output = np.zeros(img.shape, dtype=np.float32)
output2 = np.zeros(img.shape, dtype=np.float32)
sigma = 3
radius = 50

kernel_2d = gaussian_kernel_2d(sigma)
kernel_1d = gaussian_kernel_1d(sigma)

Comparison results

First let's visualize the 2D filter (row1) and the 1D (with his product) on the row 2. The difference between both filter is printed in the last plot of the row 1 (=0 as expected)

In [5]:
cbar_max = kernel_2d.max()
f, arr = plt.subplots(2,3, figsize=(20,12))
arr[0, 0].imshow(kernel_2d, cmap='gray', interpolation='nearest', vmin=0, vmax=cbar_max)
arr[1, 0].imshow(kernel_1d, cmap='gray', interpolation='nearest', vmin=0, vmax=cbar_max**0.5)  # square root as we do a multiplaction afterward
arr[1, 1].imshow(kernel_1d.T, cmap='gray', interpolation='nearest', vmin=0, vmax=cbar_max**0.5)# square root as we do a multiplaction afterward
arr[1, 2].imshow(, kernel_1d), cmap='gray', interpolation='nearest', vmin=0, vmax=cbar_max)
arr[0, 2].imshow(kernel_2d -, kernel_1d), cmap='gray', interpolation='nearest', vmin=0, vmax=cbar_max)

Now, let's compute both convolution and time them. The result is shown below

In [6]:
t0 = time.perf_counter()

# NxN convolution
for i in range(3):
    output[:, :, i] = scipy.signal.convolve2d(img[:, :, i], kernel_2d, mode="same")

t1 = time.perf_counter()
print ("Elapsed: {0:.3f}s".format(t1-t0))

# Nx1 -> 1xN convolution
for i in range(3):
    output2[:, :, i] = scipy.signal.convolve2d(img[:, :, i], kernel_1d, mode="same")

for i in range(3):
    output2[:, :, i] = scipy.signal.convolve2d(output2[:, :, i], kernel_1d.T, mode="same")

t2 = time.perf_counter()
print ("Elapsed: {0:.3f}s".format(t2-t1))

Elapsed: 6.573s
Elapsed: 1.308s