Introduction

The seam carving is an algorithm introduced by Avidan and Shimar in 2007. It is used to resize an image by removing/adding seams that have low energy. In this exercice, we will explore the resut of this algorithm on a given image. When published, this algorithm was done to upscale / downscale image but here, we will use it only to downscale an image. It could be very usefull in Data Science to reduce the dimension of an image without losing important part of the image (for example to rescale the complete dataset to a given size.

How it works

On a given image, the energy is a derivative of the original image. Multiple filters can be used but most of the time, it uses the filter :

$$ F = \begin{bmatrix} 1 & 0 & -1 \\ 2 & 0 & -2 \\ 1 & 0 & -1 \\ \end{bmatrix} $$

to do :

$$ Energy = \mid{Img \circledast F}\mid + \mid{Img \circledast F.T}\mid $$

For the image the energy image looks like

After, the objective is to find the path (a serie of connected pixels) with the lowest sum of energy. The path must be :

  • vertical if you want to reduce the image horizontally
  • horizontal if you want to reduce the image vertically

This path can be found by using dynamic programming. The result is a path like shown below:

Removing this line with shorten the image and will remove the less information possible.

The effect of this animation is impressive and this is what we will implement now

Implementation

In [81]:
import sys
import numba
import numpy as np
import glob
from imageio import imread, imwrite, mimsave
from scipy.ndimage.filters import convolve
import matplotlib.pyplot as plt

The algorithm find below is a slightly changed version of this nice explanation

In [68]:
def calc_energy(img):
    filter_du = np.array([
        [1.0, 2.0, 1.0],
        [0.0, 0.0, 0.0],
        [-1.0, -2.0, -1.0],
    ])
    # This converts it from a 2D filter to a 3D filter, replicating the same
    # filter for each channel: R, G, B
    filter_du = np.stack([filter_du] * 3, axis=2)

    filter_dv = np.array([
        [1.0, 0.0, -1.0],
        [2.0, 0.0, -2.0],
        [1.0, 0.0, -1.0],
    ])
    # This converts it from a 2D filter to a 3D filter, replicating the same
    # filter for each channel: R, G, B
    filter_dv = np.stack([filter_dv] * 3, axis=2)

    img = img.astype('float32')
    convolved = np.absolute(convolve(img, filter_du)) + np.absolute(convolve(img, filter_dv))

    # We sum the energies in the red, green, and blue channels
    energy_map = convolved.sum(axis=2)
    
    return energy_map

@numba.jit
def minimum_seam(img):
    r, c, _ = img.shape
    energy_map = calc_energy(img)
    
    M = energy_map.copy()
    backtrack = np.zeros_like(M, dtype=np.int)

    for i in range(1, r):
        for j in range(0, c):
            # Handle the left edge of the image, to ensure we don't index -1
            if j == 0:
                idx = np.argmin(M[i - 1, j:j + 2])
                backtrack[i, j] = idx + j
                min_energy = M[i - 1, idx + j]
            else:
                idx = np.argmin(M[i - 1, j - 1:j + 2])
                backtrack[i, j] = idx + j - 1
                min_energy = M[i - 1, idx + j - 1]

            M[i, j] += min_energy
    
    return M, backtrack, energy_map

@numba.jit
def carve_column(img):
    r, c, _ = img.shape

    M, backtrack, energy_map = minimum_seam(img)

    # Create a (r, c) matrix filled with the value True
    # We'll be removing all pixels from the image which
    # have False later
    mask = np.ones((r, c), dtype=np.bool)

    # Find the position of the smallest element in the
    # last row of M
    j = np.argmin(M[-1])

    for i in reversed(range(r)):
        # Mark the pixels for deletion
        mask[i, j] = 0
        j = backtrack[i, j]

    seam = img.copy()
    seam[~mask] = [255, 0, 0]
        
    # Since the image has 3 channels, we convert our
    # mask to 3D    
    mask = np.stack([mask] * 3, axis=2)
    
    # Delete all the pixels marked False in the mask,
    # and resize it to the new image dimensions
    img = img[mask].reshape((r, c-1, 3))
    
    return img, seam, energy_map


def crop_c(img, scale_c):
    r, c, _ = img.shape
    if isinstance(scale_c, float):
        new_c = int(scale_c * c)
    else:
        new_c = scale_c

    for i in range(c - new_c):
        img, backtrack, energy_map = carve_column(img)

    return img, backtrack, energy_map

def crop_r(img, scale_r):
    img = np.rot90(img, 1, (0, 1))
    img, backtrack, energy_map = crop_c(img, scale_r)
    img = np.rot90(img, 3, (0, 1))
    backtrack = np.rot90(backtrack, 3, (0, 1))
    energy_map = np.rot90(energy_map, 3, (0, 1))
    return img, backtrack, energy_map

Now, let's render an image for every shift of 1 row. This will be used at the end to render gifs.

In [79]:
which_axis = "r"  

img = imread("test.jpg")
r_init, c_init, _ = img.shape

for i in range(1, 320):
    scale = r_init - i
    
    if which_axis == 'r':
        out, backtrack, energy_map = crop_r(img, scale)
    elif which_axis == 'c':
        out, backtrack, energy_map = crop_c(img, scale)

    bg = np.zeros((r_init, c_init), dtype=energy_map.dtype)
    bg2 = np.zeros((r_init, c_init, 3), dtype=backtrack.dtype)
    bg3 = np.zeros((r_init, c_init, 3), dtype=out.dtype)

    r, c, _ = out.shape
    bg[:r+1, :c] = energy_map
    bg2[:r+1, :c] = backtrack
    bg3[:r, :c] = out
    
    imwrite("img/energy/{:04d}.jpg".format(i), bg)
    imwrite("img/backtrack/{:04d}.jpg".format(i), bg2)
    imwrite("img/out/{:04d}.jpg".format(i), bg3)

# fig, axes = plt.subplots(2, 2, figsize=(20, 12))
# axes[0, 0].imshow(bg)
# axes[0, 1].imshow(bg2)
# axes[1, 0].imshow(bg3)
# axes[1, 1].imshow(out)
# plt.show()
#imwrite(out_filename, out)
C:\python36\envs\machine_learning\lib\site-packages\imageio\core\util.py:104: UserWarning: Conversion from float32 to uint8, range [0.0, 11652.0]
  'range [{2}, {3}]'.format(dtype_str, out_type.__name__, mi, ma))

Now let's render those gifs

In [82]:
images = []
for filename in glob.glob("img/energy/*.jpg"):
    images.append(imread(filename))
mimsave('energy_anim.gif', images)
In [83]:
images = []
for filename in glob.glob("img/backtrack/*.jpg"):
    images.append(imread(filename))
mimsave('backtrack_anim.gif', images)
In [85]:
images = []
for filename in glob.glob("img/out/*.jpg"):
    images.append(imread(filename))
mimsave('out_anim.gif', images, duration=0.05)

Conclusion

In this notebook, I wanted to play with the Seam Carving Algorithm and create a animation of the process. The result can be shown below :

This was a fun discovery and the usage in a futur work is more than probable due to the advantage versus a simple resize.