If you have any questions or feedback, please fill out this form
This post is translated by ChatGPT and originally written in Mandarin, so there may be some inaccuracies or mistakes.
Seam Carving is an algorithm I learned from Computational Thinking, which resizes images based on their content. When rendering an image on a webpage whose size does not match the dimensions of its container, we typically resort to one of two methods:
- Scaling the image proportionally
- Cropping part of the image content
The Seam Carving algorithm lets us change an image's aspect ratio while preserving its content. The principle is to identify and remove the least important parts of the image at each resizing step, then "sew" the remaining parts back together. Today, let's delve into this fascinating algorithm!
Let’s take a look at this image:
If we change its aspect ratio to this:
We can see that the content of the image becomes distorted and unattractive.
The Seam Carving algorithm can roughly maintain the content of the image while allowing for non-proportional scaling. So, how does it achieve this? Let's explore the mechanics of this algorithm today.
Seam Carving
The core principle of Seam Carving is to identify the least important parts of an image, remove them, and then "sew" the remaining parts together. The key question is: how do we determine which parts of an image are "unimportant"?
Edge Detection
First, let’s examine this image:
Now, let’s look at this one:
The most critical area in this image is the boundary between the black and white regions. As long as this boundary is preserved, even if the aspect ratio is slightly altered, the difference will not be too noticeable. This is easily understood by the human eye, but how can we detect these points using computer vision?
Sobel Edge Detection
The Sobel operator consists of two 3×3 kernels:

Gx = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
Gy = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]
If we treat the image as a matrix, sliding each kernel across the image and summing the element-wise products of the kernel with the pixels beneath it is exactly a convolution. Through this discrete differentiation, Sobel approximates the gradient of the image's brightness.
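Before looking at the full implementation, the convolution can be sketched on a single pixel neighborhood. A minimal example (the 3×3 patch values are hypothetical) applying both kernels to one grayscale neighborhood:

```javascript
// The two Sobel kernels: Gx responds to horizontal changes, Gy to vertical ones.
const kernelX = [
  [-1, 0, 1],
  [-2, 0, 2],
  [-1, 0, 1],
];
const kernelY = [
  [-1, -2, -1],
  [0, 0, 0],
  [1, 2, 1],
];

// Convolve one 3x3 grayscale patch and return the gradient magnitude.
function gradientAt(patch) {
  let gx = 0;
  let gy = 0;
  for (let r = 0; r < 3; r++) {
    for (let c = 0; c < 3; c++) {
      gx += kernelX[r][c] * patch[r][c];
      gy += kernelY[r][c] * patch[r][c];
    }
  }
  return Math.sqrt(gx * gx + gy * gy);
}

// A vertical black-to-white edge yields a strong gradient...
const edge = [
  [0, 0, 255],
  [0, 0, 255],
  [0, 0, 255],
];
// ...while a flat region yields none.
const flat = [
  [128, 128, 128],
  [128, 128, 128],
  [128, 128, 128],
];
console.log(gradientAt(edge)); // 1020
console.log(gradientAt(flat)); // 0
```

The full Sobel pass simply repeats this computation at every pixel of the grayscale image.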
It might be hard to grasp from just the matrices, but representing it in code makes it much clearer:
// Code reference https://github.com/miguelmota/sobel/blob/master/sobel.js
function Sobel(imageData) {
var width = imageData.width;
var height = imageData.height;
var kernelX = [
[-1, 0, 1],
[-2, 0, 2],
[-1, 0, 1],
];
var kernelY = [
[-1, -2, -1],
[0, 0, 0],
[1, 2, 1],
];
var sobelData = [];
var grayscaleData = [];
function bindPixelAt(data) {
return function (x, y, i) {
i = i || 0;
// Clamp coordinates so border pixels fall back to their nearest neighbor
// (otherwise out-of-range reads return undefined and produce NaN)
x = Math.max(0, Math.min(x, width - 1));
y = Math.max(0, Math.min(y, height - 1));
return data[(width * y + x) * 4 + i];
};
}
var data = imageData.data;
var pixelAt = bindPixelAt(data);
var x, y;
for (y = 0; y < height; y++) {
for (x = 0; x < width; x++) {
var r = pixelAt(x, y, 0);
var g = pixelAt(x, y, 1);
var b = pixelAt(x, y, 2);
var avg = (r + g + b) / 3;
grayscaleData.push(avg, avg, avg, 255);
}
}
pixelAt = bindPixelAt(grayscaleData);
for (y = 0; y < height; y++) {
for (x = 0; x < width; x++) {
var pixelX =
kernelX[0][0] * pixelAt(x - 1, y - 1) +
kernelX[0][1] * pixelAt(x, y - 1) +
kernelX[0][2] * pixelAt(x + 1, y - 1) +
kernelX[1][0] * pixelAt(x - 1, y) +
kernelX[1][1] * pixelAt(x, y) +
kernelX[1][2] * pixelAt(x + 1, y) +
kernelX[2][0] * pixelAt(x - 1, y + 1) +
kernelX[2][1] * pixelAt(x, y + 1) +
kernelX[2][2] * pixelAt(x + 1, y + 1);
var pixelY =
kernelY[0][0] * pixelAt(x - 1, y - 1) +
kernelY[0][1] * pixelAt(x, y - 1) +
kernelY[0][2] * pixelAt(x + 1, y - 1) +
kernelY[1][0] * pixelAt(x - 1, y) +
kernelY[1][1] * pixelAt(x, y) +
kernelY[1][2] * pixelAt(x + 1, y) +
kernelY[2][0] * pixelAt(x - 1, y + 1) +
kernelY[2][1] * pixelAt(x, y + 1) +
kernelY[2][2] * pixelAt(x + 1, y + 1);
var magnitude = Math.min(255, Math.sqrt(pixelX * pixelX + pixelY * pixelY));
sobelData.push(magnitude, magnitude, magnitude, 255);
}
}
return sobelData;
}
From the code, we can see that Sobel edge detection first converts the image to grayscale and then calculates the gradient values of each pixel relative to its neighbors.
Next, we place the Sobel-processed image onto a canvas. Here is the original image:
After applying Sobel edge detection:
From the processed image, we can observe that at the boundary (where brightness changes significantly), the values approach white (close to 255). In contrast, areas with minimal brightness variation appear black (close to 0).
Calculating the Unimportant Components in an Image
Since edges are the important parts of an image (the whiter a pixel, the more important it is), we want to find a connected path of pixels from top to bottom whose total value is minimal, and remove it.
However, simply picking the smallest neighbor from the top down (a greedy approach) might not yield the global minimum. For example, if we start from 0.7, we get 0.7 + 0.05 + 1.5 = 2.25; but if we start from 0.8, we get 0.8 + 0.1 + 0.1 = 1.0. The matrix produced by Sobel is referred to as the energy, and we can preprocess this energy to avoid the problem.
Instead, we can use dynamic programming, processing the energy from the bottom up so that each cell ends up holding the minimal total energy of any path below it. First, we copy the original energy values of the bottom row; then, moving upward row by row, we add to each cell the minimum of its three adjacent cells in the row below:
This way, we can begin our search from 1.0, move on to 0.2 and then 0.1, and be certain that this path has the least total energy.
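The bottom-up accumulation can be sketched on a tiny energy matrix (the numbers here are made up, not the ones from the figure above):

```javascript
// Hypothetical 3x3 energy matrix: greedy from the top picks
// 1 -> 1 -> 9 = 11, but the cheapest seam is 2 -> 1 -> 1 = 4.
const energy = [
  [1, 4, 2],
  [1, 9, 1],
  [9, 9, 1],
];
const height = energy.length;
const width = energy[0].length;

// Copy the bottom row, then accumulate the minimum child value upward.
const acc = energy.map((row) => row.slice());
for (let y = height - 2; y >= 0; y--) {
  for (let x = 0; x < width; x++) {
    const left = Math.max(0, x - 1);
    const right = Math.min(x + 1, width - 1);
    acc[y][x] += Math.min(acc[y + 1][left], acc[y + 1][x], acc[y + 1][right]);
  }
}

// Each top cell now holds the total energy of the best path below it.
console.log(acc[0]); // [ 11, 6, 4 ]
console.log(Math.min(...acc[0])); // 4
```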
Next, we record the corresponding coordinates. From the above example, we can obtain:
[[1, 0], [0, 1], [0, 2]]
Finally, we simply remove the corresponding pixels. Here’s how it can be implemented in JavaScript:
function buildEnergyMap(data, width, height) {
const energy = new Float32Array(width * height);
const getIndex = (x, y) => (y * width + x) * 4;
const getXY = (x, y) => y * width + x;
// Initialize the bottom row with normalized raw energy values
for (let x = 0; x < width; x++) {
const y = height - 1;
energy[getXY(x, y)] = data[getIndex(x, y)] / 255;
}
for (let y = height - 2; y >= 0; y--) {
for (let x = 0; x < width; x++) {
const left = Math.max(0, x - 1);
const right = Math.min(x + 1, width - 1);
const minEnergy = Math.min(
energy[getXY(left, y + 1)],
energy[getXY(x, y + 1)],
energy[getXY(right, y + 1)]
);
energy[getXY(x, y)] += minEnergy;
energy[getXY(x, y)] += data[getIndex(x, y)] / 255;
}
}
return energy;
}
function seamCarving(canvas) {
const ctx = canvas.getContext("2d");
const imgData = ctx.getImageData(0, 0, canvas.width, canvas.height);
const { width, height, data } = imgData;
// Find the seam with minimum energy using dynamic programming
const energyMap = buildEnergyMap(data, canvas.width, canvas.height);
const seam = findSeam(energyMap, canvas.width, canvas.height);
for (let y = 0; y < height; y++) {
const seamIndex = seam[y][0];
// Shift every pixel to the right of the seam one step to the left
for (let x = seamIndex; x < width - 1; x++) {
const offset = (y * width + x) * 4;
const nextOffset = offset + 4;
data[offset] = data[nextOffset];
data[offset + 1] = data[nextOffset + 1];
data[offset + 2] = data[nextOffset + 2];
data[offset + 3] = data[nextOffset + 3];
}
// Fill the now-vacant last column with white
const last = (y * width + (width - 1)) * 4;
data[last] = data[last + 1] = data[last + 2] = data[last + 3] = 255;
}
ctx.clearRect(0, 0, canvas.width, canvas.height);
ctx.putImageData(imgData, 0, 0);
}
function findSeam(energyMap, width, height, startPoint) {
const getIndex = (x, y) => y * width + x;
// Pick the top-row position with the minimum accumulated energy
let min = Number.MAX_VALUE;
let idx = 0;
for (let x = 0; x < width; x++) {
if (min > energyMap[getIndex(x, 0)]) {
min = energyMap[getIndex(x, 0)];
idx = x;
}
}
const seam = [];
// An explicit startPoint overrides the minimum found above
let x = startPoint !== undefined ? startPoint : idx;
seam.push([x, 0]);
for (let y = 0; y < height - 1; y++) {
let min = Number.MAX_VALUE;
const left = Math.max(0, x - 1);
const right = Math.min(x + 1, width - 1);
const leftValue = energyMap[getIndex(left, y + 1)];
const centerValue = energyMap[getIndex(x, y + 1)];
const rightValue = energyMap[getIndex(right, y + 1)];
const pairs = [
[left, leftValue],
[x, centerValue],
[right, rightValue],
];
for (let i = 0; i < pairs.length; i++) {
min = Math.min(pairs[i][1], min);
}
const target = pairs.find((pair) => pair[1] === min);
seam.push([target[0], y + 1]);
x = target[0];
}
return seam;
}
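Putting the pieces together, here is a simplified, standalone sketch that carves a single vertical seam out of a plain 2D grayscale array (the canvas version above does the same thing on RGBA data; `carveOneSeam` is a name of my own):

```javascript
// Carve one minimal-energy vertical seam from a 2D grayscale array.
function carveOneSeam(gray) {
  const height = gray.length;
  const width = gray[0].length;

  // Accumulate minimal path energies from the bottom up.
  const acc = gray.map((row) => row.slice());
  for (let y = height - 2; y >= 0; y--) {
    for (let x = 0; x < width; x++) {
      const l = Math.max(0, x - 1);
      const r = Math.min(x + 1, width - 1);
      acc[y][x] += Math.min(acc[y + 1][l], acc[y + 1][x], acc[y + 1][r]);
    }
  }

  // Walk down from the cheapest top cell, removing one pixel per row.
  let x = acc[0].indexOf(Math.min(...acc[0]));
  const out = [];
  for (let y = 0; y < height; y++) {
    if (y > 0) {
      const l = Math.max(0, x - 1);
      const r = Math.min(x + 1, width - 1);
      const candidates = [l, x, r];
      x = candidates.reduce(
        (best, c) => (acc[y][c] < acc[y][best] ? c : best),
        candidates[0]
      );
    }
    out.push(gray[y].filter((_, col) => col !== x));
  }
  return out;
}

// A bright vertical stripe in the middle is "important"; the seam
// removes a pixel from the dark columns instead.
const img = [
  [0, 255, 0],
  [0, 255, 0],
  [0, 255, 0],
];
const carved = carveOneSeam(img);
console.log(carved); // each row keeps the bright 255 pixel
```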
Results
You can find the demo here.
While the algorithm lets us resize images without preserving their proportions, the result can still look odd depending on the image's content, especially when important information is removed. Even so, I find this algorithm quite fascinating.
Removing unimportant parts is a central concept that appears in many areas: in audio compression, frequencies inaudible to the human ear can be removed to save space; in matrix approximation, SVD (Singular Value Decomposition) represents a complex matrix using its most significant singular values and vectors. Although these techniques may seem unrelated, they share a common philosophy.
Conclusion
Manipulating each pixel of an image in a loop like this can cause significant delays, especially for larger images. In such cases, consider optimizations such as offloading the processing to the GPU via WebGPU or WebGL, or moving the computation into a Web Worker.
In practical applications, while the results are impressive, the outcome heavily depends on the content of the image. If the details in the image are crucial and exhibit significant brightness variations, even using Seam Carving may lead to noticeable distortions.
Related Resources
Many concepts in this article were derived from the video Computational Thinking – Seam Carving, presented by Grant Sanderson from 3Blue1Brown! It features extensive visual explanations that are easy to understand.
If you found this article helpful, please consider buying me a coffee ☕ It'll make my ordinary day shine ✨
☕Buy me a coffee