# Introduction¶

The seam carving is an algorithm introduced by Avidan and Shimar in 2007. It is used to resize an image by removing/adding seams that have low energy. In this exercice, we will explore the resut of this algorithm on a given image. When published, this algorithm was done to upscale / downscale image but here, we will use it only to downscale an image. It could be very usefull in Data Science to reduce the dimension of an image without losing important part of the image (for example to rescale the complete dataset to a given size.

# How it works¶

On a given image, the energy is a derivative of the original image. Multiple filters can be used but most of the time, it uses the filter :

$$F = \begin{bmatrix} 1 & 0 & -1 \\ 2 & 0 & -2 \\ 1 & 0 & -1 \\ \end{bmatrix}$$

to do :

$$Energy = \mid{Img \circledast F}\mid + \mid{Img \circledast F.T}\mid$$

For the image the energy image looks like

After, the objective is to find the path (a serie of connected pixels) with the lowest sum of energy. The path must be :

• vertical if you want to reduce the image horizontally
• horizontal if you want to reduce the image vertically

This path can be found by using dynamic programming. The result is a path like shown below:

Removing this line with shorten the image and will remove the less information possible.

The effect of this animation is impressive and this is what we will implement now

# Implementation¶

In [81]:
import sys
import numba
import numpy as np
import glob
from imageio import imread, imwrite, mimsave
from scipy.ndimage.filters import convolve
import matplotlib.pyplot as plt


The algorithm find below is a slightly changed version of this nice explanation

In [68]:
def calc_energy(img):
filter_du = np.array([
[1.0, 2.0, 1.0],
[0.0, 0.0, 0.0],
[-1.0, -2.0, -1.0],
])
# This converts it from a 2D filter to a 3D filter, replicating the same
# filter for each channel: R, G, B
filter_du = np.stack([filter_du] * 3, axis=2)

filter_dv = np.array([
[1.0, 0.0, -1.0],
[2.0, 0.0, -2.0],
[1.0, 0.0, -1.0],
])
# This converts it from a 2D filter to a 3D filter, replicating the same
# filter for each channel: R, G, B
filter_dv = np.stack([filter_dv] * 3, axis=2)

img = img.astype('float32')
convolved = np.absolute(convolve(img, filter_du)) + np.absolute(convolve(img, filter_dv))

# We sum the energies in the red, green, and blue channels
energy_map = convolved.sum(axis=2)

return energy_map

@numba.jit
def minimum_seam(img):
r, c, _ = img.shape
energy_map = calc_energy(img)

M = energy_map.copy()
backtrack = np.zeros_like(M, dtype=np.int)

for i in range(1, r):
for j in range(0, c):
# Handle the left edge of the image, to ensure we don't index -1
if j == 0:
idx = np.argmin(M[i - 1, j:j + 2])
backtrack[i, j] = idx + j
min_energy = M[i - 1, idx + j]
else:
idx = np.argmin(M[i - 1, j - 1:j + 2])
backtrack[i, j] = idx + j - 1
min_energy = M[i - 1, idx + j - 1]

M[i, j] += min_energy

return M, backtrack, energy_map

@numba.jit
def carve_column(img):
r, c, _ = img.shape

M, backtrack, energy_map = minimum_seam(img)

# Create a (r, c) matrix filled with the value True
# We'll be removing all pixels from the image which
# have False later

# Find the position of the smallest element in the
# last row of M
j = np.argmin(M[-1])

for i in reversed(range(r)):
# Mark the pixels for deletion
j = backtrack[i, j]

seam = img.copy()

# Since the image has 3 channels, we convert our

# Delete all the pixels marked False in the mask,
# and resize it to the new image dimensions

return img, seam, energy_map

def crop_c(img, scale_c):
r, c, _ = img.shape
if isinstance(scale_c, float):
new_c = int(scale_c * c)
else:
new_c = scale_c

for i in range(c - new_c):
img, backtrack, energy_map = carve_column(img)

return img, backtrack, energy_map

def crop_r(img, scale_r):
img = np.rot90(img, 1, (0, 1))
img, backtrack, energy_map = crop_c(img, scale_r)
img = np.rot90(img, 3, (0, 1))
backtrack = np.rot90(backtrack, 3, (0, 1))
energy_map = np.rot90(energy_map, 3, (0, 1))
return img, backtrack, energy_map


Now, let's render an image for every shift of 1 row. This will be used at the end to render gifs.

In [79]:
which_axis = "r"

r_init, c_init, _ = img.shape

for i in range(1, 320):
scale = r_init - i

if which_axis == 'r':
out, backtrack, energy_map = crop_r(img, scale)
elif which_axis == 'c':
out, backtrack, energy_map = crop_c(img, scale)

bg = np.zeros((r_init, c_init), dtype=energy_map.dtype)
bg2 = np.zeros((r_init, c_init, 3), dtype=backtrack.dtype)
bg3 = np.zeros((r_init, c_init, 3), dtype=out.dtype)

r, c, _ = out.shape
bg[:r+1, :c] = energy_map
bg2[:r+1, :c] = backtrack
bg3[:r, :c] = out

imwrite("img/energy/{:04d}.jpg".format(i), bg)
imwrite("img/backtrack/{:04d}.jpg".format(i), bg2)
imwrite("img/out/{:04d}.jpg".format(i), bg3)

# fig, axes = plt.subplots(2, 2, figsize=(20, 12))
# axes[0, 0].imshow(bg)
# axes[0, 1].imshow(bg2)
# axes[1, 0].imshow(bg3)
# axes[1, 1].imshow(out)
# plt.show()
#imwrite(out_filename, out)

C:\python36\envs\machine_learning\lib\site-packages\imageio\core\util.py:104: UserWarning: Conversion from float32 to uint8, range [0.0, 11652.0]
'range [{2}, {3}]'.format(dtype_str, out_type.__name__, mi, ma))


Now let's render those gifs

In [82]:
images = []
for filename in glob.glob("img/energy/*.jpg"):
mimsave('energy_anim.gif', images)

In [83]:
images = []
for filename in glob.glob("img/backtrack/*.jpg"):
mimsave('backtrack_anim.gif', images)

In [85]:
images = []
for filename in glob.glob("img/out/*.jpg"):