Michael Wooley's Homepage

(Really) Tight Bounding Boxes

I’ve created a simple method to determine precise bounding boxes for characters and text elements within an SVG. Why would this be necessary? Can’t we just get the bounding box for an element by calling node.getBBox()? It depends on how much precision you need. For text characters and strings, getBBox() returns a box sized from the font rather than from the shape of the characters actually drawn. So, for example, the bounding boxes for “a”, “g”, and “f” will all have the same vertical extent, even though the three characters occupy different parts of the line. For a lot of applications this will do. However, if you need to know exactly where a character is located it would be preferable to have a bit more precision.

Here is a sample application that demonstrates the tight bounding box:

The purple area is the bounding box from calling node.getBBox() while the blue area is the tight bounding box.

The basic strategy will be to “copy” the SVG <text> element onto an HTML <canvas> element. We’ll then back out the bounding box by analyzing the canvas RGBA array.

I’ve implemented this procedure as a method on d3.selection.prototype. This means that we can find the bounding box by calling, e.g., d3.select('text').getTightBBox(). This call attaches an object property tBB to the node of the element (e.g. d3.select('text').node().tBB). This ensures that the method only has to be called once; afterwards the result can be read back from the node.

I’m going to start out by giving a detailed qualitative overview of what needs to happen and point out some problem areas. Then I’ll run through the actual code, which–due to the problem areas–isn’t very pretty at the moment.

Outline and Preview of Hiccups

Suppose that we have a white image with the character “1” (in black) in the middle. We want to draw a tight bounding box around the character. If we have a “usual” image format (e.g. JPEG, PNG) we could extract the pixels from the image as a 3D array of RGBA values (rows by columns by four channels). We could then loop through each of these pixels and search for the locations where the color changes. Once we know the location of all of the border pixels we can set the minimal x (horizontal) coordinate of the box by finding the least column in which a border pixel occurs. We can do something similar to find the maximal x, minimal y (vertical), and maximal y coordinates of the box.
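
To make the idea concrete, here is a minimal sketch on a toy image. It assumes the image has already been reduced to a 2D grid in which 0 is background and 1 is ink; the function name gridBoundingBox and the grid itself are made up for illustration. It checks every ink pixel rather than only the border pixels, which gives the same extremes.

```javascript
// Hypothetical helper: rows is an array of equal-length arrays,
// where 0 means background and anything else means "ink".
function gridBoundingBox(rows) {
  var xmin = Infinity, xmax = -Infinity, ymin = Infinity, ymax = -Infinity;
  for (var y = 0; y < rows.length; y++) {
    for (var x = 0; x < rows[y].length; x++) {
      if (rows[y][x] !== 0) {
        if (x < xmin) { xmin = x; }
        if (x > xmax) { xmax = x; }
        if (y < ymin) { ymin = y; }
        if (y > ymax) { ymax = y; }
      }
    }
  }
  // No ink at all: there is no box to report.
  if (xmax < 0) { return null; }
  return { xmin: xmin, xmax: xmax, ymin: ymin, ymax: ymax };
}

// A crude "1", five pixels wide and four pixels tall.
var one = [
  [0, 0, 1, 0, 0],
  [0, 1, 1, 0, 0],
  [0, 0, 1, 0, 0],
  [0, 1, 1, 1, 0]
];
var box = gridBoundingBox(one);
console.log(box); // { xmin: 1, xmax: 3, ymin: 0, ymax: 3 }
```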

What makes this strategy viable? First, the input image was super-simple: a black character on a white background. There was nothing else on the image that could have been confused for the character. Obvious example: there wasn’t a picture of a smiley face to the side. Maybe less-obvious: it wouldn’t work if the image was scanned in and there were some weird “scan marks” on the image. Second, we could get a pixel array that represented the image. While this might seem trivial, it will be fairly important in future developments.

“Pixelated” SVGs

In our application the bounding box will be made for text in an SVG image. The problem with SVGs is that they don’t have pixel representations. Now would be a good time to state that I’m not an expert on SVGs. Apologies if some of my terminology and/or statements are off. So are we dead in the water? Not exactly. One thing that we can try to do is “draw” the SVG element on an HTML canvas, which does have a pixel representation.

How do we convert an SVG element to a canvas? The method that I’m going to use here only works for SVG <text> elements. However, it works for a wider variety of cases than other methods that I’ve attempted. One general method that looked promising involved serializing the SVG element to a string then drawing it on a canvas. I found that this method wasn’t working well on non-standard fonts. It is also somewhat messy because it relies on callbacks. See this answer from Stack Overflow for more info. In short, we’re going to manually “re-make” the SVG element on the canvas by drawing text on the canvas that has the same styling and position properties as the text on the SVG.

Issues With Canvas

Two issues arise when we try to duplicate the text using <canvas>.

The first is that–relative to the SVG–the canvas will look pretty grainy (because it is grainy). This will be problematic because, if we find the bounding box of a grainy image, the bounding box will be off by a tad when we try to apply the bounding box to the smooth SVG text. To get around this we’ll scale up the drawing of the text element on the canvas. With this enlarged image we’ll find a bounding box, which will be too large for the original image. However, we can simply scale the bounding box coordinates back down to the original image, which will mean that the coordinates will be floats rather than integers. The bounding box will then fit onto the SVG text much more tightly.

The second problem has to do with fonts. Suppose that you want to use a wacky font from Google Fonts. You can draw text in a canvas using one of these fonts. However, these fonts need to be loaded and that can take some time. If your application involves finding tight bounding boxes at the time the page is loading then the bounding box code may execute before the font is loaded. This is problematic because the machine will revert to the default font if the requested font isn’t found. Since the bounding box is fit on a font that is different from the visible SVG text, the bounding box will be off. Since the font of SVG elements is set via the style attribute or CSS, they will adjust their fonts once the specified font is loaded, but the canvas drawing will not. When this is an issue it will be necessary to ensure that the method is called only after the font is loaded.

The Code

The main method is as follows:

d3.selection.prototype.getTightBBox = function () {

  var self = this;

  // Check to ensure that it is a text element
  if (self.node().tagName.toLowerCase() != 'text') {
    console.error('d3.selection.getTightBBox can only accommodate SVG <text> elements.');
    return;
  }

  // Scaling factor: the canvas is drawn k times larger than the SVG
  var k = 10;
  // Get parent SVG (how to throw good error?)
  var svgText = self.node().ownerSVGElement;
  // Check in on fonts to ensure all is okay.
  // Do this before setting the loose bounding box because may change
  // the font.
  var fontSize = k * parseFloat(self.style('font-size')) + 'px ';
  var fontFamily = self.style('font-family');
  var targetFont = fontSize + fontFamily;
  if (!document.fonts.check(targetFont)) {
    // Report the family that was requested, then fall back
    console.warn('d3.selection.getTightBBox: Font family ' + fontFamily + ' not found. Setting to "sans" font and proceeding.');
    targetFont = fontSize + 'sans';
    self.style('font-family', 'sans');
  }
  // Loose Bounding Box (declared with var so it does not leak as a global)
  var lBB = self.node().getBBox();
  // Make a canvas to search for element
  var canvas = d3.select('body')
    .append('canvas')
    .attr('height', k * svgText.clientHeight)
    .attr('width', k * svgText.clientWidth)
    .style('display', 'none')
    .node();
  // Get and set the context based on others
  var ctx = canvas.getContext('2d');
  ctx.font = targetFont;
  ctx.fillText(self.text(), k * parseFloat(self.attr('x')), k * parseFloat(self.attr('y')));

  // Break the image into pixels (only the area inside the loose box)
  var imgData = ctx.getImageData(k * lBB.x, k * lBB.y, k * lBB.width, k * lBB.height);
  // Use the image data to get a tight bounding box
  var bb = tightBBox(imgData);
  // Undo scaling to get back to svg
  bb.x = (bb.x / k) + lBB.x;
  bb.y = (bb.y / k) + lBB.y;
  bb.width = bb.width / k;
  bb.height = bb.height / k;

  // Store the result on the node so it can be read back later
  self.node().tBB = bb;
  canvas.remove();

};

It begins by checking to ensure that the passed element is of the correct type (it can only deal with <text> elements). The “scaling factor” k is hardcoded into the function. k = 10 means that the HTML canvas will be made to be ten times the size of the original SVG element. Recall that it is necessary to scale up the canvas to avoid imprecise bounding boxes resulting from grainy images. We then retrieve the SVG element that contains the text element.

The next step is ugly but necessary. We want to set the font styling of the canvas to be the same as the passed element. However, there may be cases where the desired font is not loaded. We check this with the if conditional, which calls document.fonts.check. Presently, the method deals with a missing font by setting both the SVG text and the canvas to a standard font (‘sans’). Note that this changes the font of the visible SVG text as well. This is one way of avoiding incorrect bounding boxes.

In the next few lines we create and draw on a hidden canvas. If we viewed the canvas we would see large, black character(s) (in the correct font) on an otherwise empty (transparent) background. The position of the characters is the same as on the original SVG, scaled up by k.

Once we’ve drawn the canvas we can begin to extract information from it. The line var imgData = ... returns an ImageData object whose data property is a 1-D array of RGBA values for the specified area of the canvas. Here, we limit ourselves to the area of the canvas that contains the “loose” bounding box. The main benefit of doing this is that we know that the text is in that area of the canvas so we can restrict our search for minimal and maximal coordinates to this area. The next line actually retrieves the tight bounding box by calling the function tightBBox (discussed below) with the image data as an argument.

The process finishes by scaling and shifting the bounding box coordinates to fit the SVG.
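
Here is a small worked example of that last step, with made-up numbers. The tight box is measured inside the crop that starts at the loose box's corner on the scaled canvas, so we divide by k first and then add the loose box's offset. The helper name toSvgUnits and the two sample boxes are hypothetical.

```javascript
// Hypothetical helper mirroring the last step of getTightBBox.
// tight: box found inside the scaled-up canvas crop (canvas pixels)
// loose: box from node.getBBox() (SVG user units)
// k:     scaling factor used when drawing on the canvas
function toSvgUnits(tight, loose, k) {
  return {
    x: tight.x / k + loose.x,
    y: tight.y / k + loose.y,
    width: tight.width / k,
    height: tight.height / k
  };
}

// Made-up numbers: a 12 x 30 loose box becomes a 120 x 300 crop at k = 10.
var loose = { x: 20, y: 5, width: 12, height: 30 };
var tight = { x: 15, y: 40, width: 80, height: 210 };
var svgBox = toSvgUnits(tight, loose, 10);
console.log(svgBox); // { x: 21.5, y: 9, width: 8, height: 21 }
```

Notice that x comes out as 21.5: with k = 10 the box edges can land on tenths of an SVG unit rather than only on whole units.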

Finding the Tight Bounding Box

Once we have the array of RGBA values in hand we can find the bounding box by searching for pixels that are not fully transparent. This task is carried out by two functions:

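var arrayIndex1d3d = function (ii, w, h, c) {
  // Given ImageData.data index, get tuple location of pixel
  // as [x (column), y (row), channel]. h is accepted but not needed.
  var out = [-99, -99, -99];

  // What row (y)? Each row holds c * w entries.
  out[1] = Math.floor(ii / (c * w));
  // What column (x)? Offset within the row, in units of c entries.
  out[0] = Math.floor((ii - (out[1] * c * w)) / c);
  // What Channel?
  out[2] = ii % c;

  return out;
}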

var tightBBox = function (data) {
  // Get a tight bounding box for the image data.
  var xyz;
  var xmin = data.width + 1, xmax = -1, ymin = data.height + 1, ymax = -1;

  // Step through the alpha entries only (every 4th entry, starting at 3)
  for (var ii = 3; ii < data.data.length; ii += 4) {
    if (data.data[ii] > 0) {
      // Get coordinate in terms of (x, y, z)
      xyz = arrayIndex1d3d(ii, data.width, data.height, 4);
      // Update bounds
      if (xyz[0] < xmin) { xmin = xyz[0]; }
      if (xyz[0] > xmax) { xmax = xyz[0]; }
      if (xyz[1] < ymin) { ymin = xyz[1]; }
      if (xyz[1] > ymax) { ymax = xyz[1]; }
    }
  }

  // The + 1 counts both the first and the last pixel in each direction
  return {x: xmin, y: ymin, width: xmax - xmin + 1, height: ymax - ymin + 1};
}

The second function, tightBBox, is called from getTightBBox. In short, this function is going to loop through each “pixel” and determine if there is color in the pixel. If there is, then we know that we’re in a pixel that is part of the character(s). The next step is to see if this pixel is at a boundary of the character. We do this by comparing the “x” and “y” coordinates of the pixel to the current minimal and maximal values and updating them if needed.
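
To try the scan without a browser, here is a minimal sketch that runs it on a hand-built stand-in for the image data. The names makeImage and scanAlpha are made up, and the object is only an imitation of a real ImageData (it has the same width, height, and data fields). This variant reports the extreme pixel coordinates and returns null when nothing was drawn.

```javascript
// Hypothetical stand-in for ImageData: every pixel starts transparent and
// the listed [x, y] pixels are made fully opaque.
function makeImage(width, height, opaquePixels) {
  var data = new Uint8ClampedArray(4 * width * height); // all zeros
  opaquePixels.forEach(function (p) {
    data[4 * (p[1] * width + p[0]) + 3] = 255; // alpha entry of pixel (x, y)
  });
  return { width: width, height: height, data: data };
}

// Compact variant of the alpha scan (a sketch, not the method above).
function scanAlpha(img) {
  var xmin = Infinity, xmax = -Infinity, ymin = Infinity, ymax = -Infinity;
  for (var ii = 3; ii < img.data.length; ii += 4) {
    if (img.data[ii] > 0) {
      var pixel = (ii - 3) / 4;              // pixel number, row by row
      var x = pixel % img.width;             // column
      var y = Math.floor(pixel / img.width); // row
      xmin = Math.min(xmin, x); xmax = Math.max(xmax, x);
      ymin = Math.min(ymin, y); ymax = Math.max(ymax, y);
    }
  }
  if (xmax < 0) { return null; } // nothing was drawn
  return { xmin: xmin, xmax: xmax, ymin: ymin, ymax: ymax };
}

var img = makeImage(6, 4, [[2, 1], [4, 1], [3, 2]]);
console.log(scanAlpha(img)); // { xmin: 2, xmax: 4, ymin: 1, ymax: 2 }
console.log(scanAlpha(makeImage(6, 4, []))); // null
```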

Dealing With Image Data

The pixel data that we have in data.data is a 1D array. Each pixel is represented by four consecutive entries which specify the “R”, “G”, “B”, and “A” values, respectively. Here is a short example of the data structure for a monochrome image:

  R    G    B    A  | R    G    B    A  | R  , G  , B  , A  | ...
[ 0  , 0  , 0  , 0  , 0  , 0  , 0  , 255, 0  , 0  , 0  , 0  , ... ]

Notice that all pixels are technically “black” because the RGB entries are all 0. Only the “A” (alpha, i.e. opacity) entry changes. A pixel with an “A” value of 0 is fully transparent, so it shows up as white on a white page; a value of 255 is fully opaque and shows up as black. We save on computation time by only testing whether these “A” entries are non-zero.

Once we know that a pixel’s “A” entry is non-zero we want to figure out the “x” and “y” coordinates of the pixel. This task is done by the function arrayIndex1d3d.
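
The index arithmetic is easy to get backwards, so here is a small sketch of the mapping in both directions. The helper names indexToPixel and pixelToIndex are made up for illustration; the only assumption is the layout described above, with four entries per pixel stored row by row.

```javascript
// Hypothetical helpers for a flat RGBA array of an image w pixels wide.
var CHANNELS = 4;

function indexToPixel(ii, w) {
  var pixel = Math.floor(ii / CHANNELS); // which pixel, counting row by row
  return {
    x: pixel % w,             // column
    y: Math.floor(pixel / w), // row
    channel: ii % CHANNELS    // 0 = R, 1 = G, 2 = B, 3 = A
  };
}

function pixelToIndex(x, y, channel, w) {
  return CHANNELS * (y * w + x) + channel;
}

// In a 3-pixel-wide image, entry 19 is the alpha value of the pixel in
// column 1, row 1 (counting from zero).
var loc = indexToPixel(19, 3);
console.log(loc);                      // { x: 1, y: 1, channel: 3 }
console.log(pixelToIndex(1, 1, 3, 3)); // 19
```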

Usage: Fonts

How do we make sure that any non-standard fonts are loaded before making the tight bounding box? The one method that will definitely work involves using the Web Font Loader (webfont.js).

Here’s one example, which is taken from the application at the top of the page:

var fonts = ["Roboto", "Permanent Marker", "Condiment", "Reenie Beanie", "Monoton"];

WebFontConfig = {
  google: {
    families: fonts,
  },
  active: start,
};

(function (d) {
  var wf = d.createElement('script'),
    s = d.scripts[0];
  wf.src = 'https://ajax.googleapis.com/ajax/libs/webfont/1.6.26/webfont.js';
  wf.async = true;
  s.parentNode.insertBefore(wf, s);
})(document);

// All other code....

function start() {
  // Once the fonts are loaded, draw the example and enable the button
  drawChar('1', fc.node().value, fs.node().value, g, h, w);
  submit.attr('disabled', null);
}

At the top we specified what fonts ought to be loaded. As part of this process we specified a function (start) that will be called once all of the fonts are active. In this case I disabled the “Make Example” button until the fonts were loaded.
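
I have not tested this as thoroughly, but the same browser interface that getTightBBox already uses for document.fonts.check also has a load method that returns a Promise. Here is a minimal sketch of waiting on it. The helper name whenFontsLoaded is made up, and the sketch assumes the fonts have already been declared on the page (e.g. via a stylesheet link or @font-face), since load can only fetch fonts the page knows about.

```javascript
// Hypothetical helper: ask a FontFaceSet (document.fonts in a browser) to
// load each font spec, then run the callback. The font set is passed in as
// an argument so the function can also be exercised outside a browser.
function whenFontsLoaded(fontSpecs, fontSet, callback) {
  var pending = fontSpecs.map(function (spec) {
    return fontSet.load(spec); // FontFaceSet.load returns a Promise
  });
  return Promise.all(pending).then(callback);
}

// Browser usage, assuming start is the function from the example above:
// whenFontsLoaded(['16px "Permanent Marker"', '16px Monoton'], document.fonts, start);
```

The font specs use the same "size family" format as the ctx.font string, so the size is required even though only the family matters here.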

Conclusion

Okay, that’s all I have. Why did I make this? Depending on how things develop, I may use this to make synthetic data for my table reading application. Time will tell…