Michael Wooley's Homepage

Drawing With D3.js Part 4: Data Extraction

In the fourth installment of this series I’m going to focus on extracting data from our drawing. What are we trying to do? Suppose that we have a picture of a table on our canvas. We’ll draw a bunch of bounding boxes/rectangles around different elements of the canvas (e.g. a “row”-type bounding box around a row). When we’re done annotating the table we’ll hit “submit” and the program will extract information about the bounding boxes and put them in a data structure that we can use later. This is much easier to understand with an example.

Here’s what we’re going to create:

See the full set of controls and code at the gist. Believe it or not, that actually was the first good example table that I came across!
Image Source: Jones, Hugh R. “The Perils and Protection of Infant Life.” Journal of the Royal Statistical Society 57, no. 1 (1894): 1-103. (via HathiTrust)

As you can see, we’ve made a good deal of progress towards creating an image annotation program. With the data that we’re extracting we could begin to train a neural network to extract the data from these old tables. I’m overdue for a post that gives a big picture idea about what this project is all about.

What do we need to add relative to our end-product from Part 3?

  1. Add functionality to handle image input.
  2. Add a “data extraction” method for pulling bounding box data from the canvas.
  3. Introduce some additional formatting to add buttons and controls.

I will warn you that the code has become somewhat… (what’s the word?) convoluted. Before extending this code in future posts, I will probably spend some time rewriting it to be more modular. Besides making the code easier to understand, that will make it easier to add and extend features. For this reason, in this post I will mostly focus on the new methods themselves and spend less time on how they currently fit into the overall code structure.

The Base Image

Ultimately, we want to extract data about the bounding boxes of elements in a “base” image. For example, the base image in the above example is the table. To do this correctly we’ll need to take some care in handling this base image. In particular, we’ll need to make sure that the coordinates from the SVG that we draw on are correctly transformed to the coordinates of the underlying image. The main “trick” will be to store the original size information about the image as data that can be called on later.

Here’s the current code for loading an image given an argument arg, which is equal to the image’s url:

SVGCanvas.prototype.loadImage = function (arg) {
  // Load an image to the canvas.
  var self = this;

  //// Add zoom and pan group
  self.zoomG = self.svg.append('g')
    .attr('class', 'zoom-group');
  //// Add the image itself
  self.img = self.zoomG.append('image')
    .attr('href', arg)
    .attr('width', '98%')
    .attr('height', '98%')
    .attr('x', '1%')
    .attr('y', '1%')
    .call(function () {
      // Call this function to get size attributes for the
      // displayed and actual image.
      var image = new Image();      // Create a new image
      image.onload = function () {  // Runs once the image has loaded
        var imgBB = self.img.node().getBBox();
        var d = {};
        d.height = image.naturalHeight;
        d.width = image.naturalWidth;
        // Get x/y coordinates and scaling:
        if (d.height > d.width) {
          d.scale = (imgBB.height / d.height);
          d.x = (self.options.w - d.scale * d.width) / 2;
          d.y = imgBB.y;
        } else {
          d.scale = (imgBB.width / d.width);
          d.x = imgBB.x;
          d.y = (self.options.h - d.scale * d.height) / 2;
        }
        // Reformat image attributes from percentages to numbers
        self.img
          .attr('width', d.scale * d.width)
          .attr('height', d.scale * d.height)
          .attr('x', d.x)
          .attr('y', d.y);
        // Assign as data
        self.img.data = [d];
      };
      image.src = arg;        // Load the image by specifying an image source
    });
};

The basic idea is to load the image as an <image> within the <svg>. Since the image is contained within the zoom group, it will expand and pan with the rest of the canvas. In this version of the code the zoom group (i.e. self.zoomG) is created within this method. This is a first pass at making it possible to add and toggle between multiple images at once.

It is easy to specify that the image be resized to fit the canvas area (via, e.g., .attr('width', '98%')). While this approach creates an image that looks good, it gives us no indication of the actual size of the original image. We can recover that size via a .call(). If we create a plain <img> element (i.e. not inside the <svg>), that image will carry naturalHeight and naturalWidth properties. Once we know these values, we can back out the scale (the size of the on-screen image relative to the original) as well as the x and y coordinates of the image on screen relative to the SVG element. In doing so we need to check whether the image is vertically- (i.e. height > width) or horizontally-oriented via if (d.height > d.width). This matters because imgBB.height and imgBB.width will appear to be equal. This seemingly-odd behavior comes from the fact that we initially set both the height and width attributes to 98%.

We finish the image.onload function by converting the image’s size attributes from percentages to absolute numbers (a safety measure) and by assigning the d object to be part of the image’s data.
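The height/width comparison and the centering math can be pulled out as a standalone function to see what it does. This is just a sketch for illustration: the function name and all of the numbers below are made up, and in the real code this logic runs inside image.onload.

```javascript
// Compute the display scale and offsets for a base image, given its
// natural size, the on-screen bounding box of the SVG <image>, and the
// canvas options (w/h). Hypothetical helper; mirrors the onload logic.
function fitImage(natural, imgBB, options) {
  var d = { height: natural.height, width: natural.width };
  if (d.height > d.width) {
    // Portrait: the height constraint binds, so scale off the height
    // and center the image horizontally.
    d.scale = imgBB.height / d.height;
    d.x = (options.w - d.scale * d.width) / 2;
    d.y = imgBB.y;
  } else {
    // Landscape: the width constraint binds; center vertically.
    d.scale = imgBB.width / d.width;
    d.x = imgBB.x;
    d.y = (options.h - d.scale * d.height) / 2;
  }
  return d;
}

// E.g. a 1000x2000 (portrait) scan on a 500x500 canvas with a 490x490 box:
var d = fitImage({ width: 1000, height: 2000 },
                 { x: 5, y: 5, width: 490, height: 490 },
                 { w: 500, h: 500 });
// d.scale === 0.245, so the drawn image is 245px wide, centered at x = 127.5
```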

Data Extraction and Handling

This section is going to come in two parts. First, we’re going to talk about extracting the data and organizing it. Then, we’ll talk about the methods used to preview the data selections. We’ll call each of these new methods when the user hits the “Submit” button. The new buttons are discussed in the next section.

Data Extraction and Structure

We’re going to start out by creating a JSON-like data object that contains information about the image and the bounding boxes. The code is fairly straightforward:

SVGCanvas.prototype.dataCompile = function () {
  // A function for compiling all of the data on the canvas into a
  //  json data structure.
  // FUTURE:
  //  More metadata
  //  Accommodate more than one image.
  var self = this;
  var out = [];

  // One object for each image (will have more in future).
  //// Initialize object
  var out_i = {
    meta: {},
    bb: {}
  };
  //// Get image bounding box.
  var imgBB = self.img.node().getBBox();
  // Get Metadata - Add more later.
  //// File name w/ path
  out_i.meta.href = self.img.attr('href');
  //// File size
  out_i.meta.height = self.img.data[0].height;
  out_i.meta.width = self.img.data[0].width;
  //// Common Name/Identifier
  // Get bounding boxes (rectangle groups carry the 'rect' class)
  self.zoomG.selectAll('.rect')
    .each(function (d, i) {
      // Retrieve rectangle bounding boxes _relative to image_.
      // Follows convention from VOC2008:
      // http://host.robots.ox.ac.uk/pascal/VOC/voc2008/HTMLdoc/voc.HTML#SECTION00092000000000000000
      var d2 = {};
      d2.xmin = Math.max(d.x - imgBB.x, 0) / self.img.data[0].scale;
      d2.ymin = Math.max(d.y - imgBB.y, 0) / self.img.data[0].scale;
      d2.xmax = Math.min(d.x - imgBB.x + d.w, imgBB.width) / self.img.data[0].scale;
      d2.ymax = Math.min(d.y - imgBB.y + d.h, imgBB.height) / self.img.data[0].scale;
      d2.type = d.type;
      out_i.bb[d.id] = d2;
    });
  // Push onto full dataset.
  out.push(out_i);

  return out;
};

We’re going to have an array (out) that contains an object for each image (there is only one at the moment). The basic structure of the data is:

  • out_i: A data object for an image.
    • .meta: Metadata about the image:
    • .href: Image Source
    • .height: Image natural height.
    • .width: Image natural width.
    • .bb: Bounding box data objects:
    • .xmin, .ymin, .xmax, .ymax: Bounding coordinates.
    • .type: Type of bounding box.
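To make the structure concrete, here is what a compiled out array might look like for a single image with two annotations. The file name, sizes, and coordinates here are all invented for illustration:

```javascript
// A hypothetical compiled dataset for one image with two bounding boxes.
var out = [{
  meta: {
    href: 'img/jones-1894-table.png',  // invented file name
    height: 1200,   // natural height of the scan, in pixels
    width: 800      // natural width
  },
  bb: {
    'Table-0': { xmin: 40, ymin: 55,  xmax: 760, ymax: 1100, type: 'Table' },
    'Row-0':   { xmin: 40, ymin: 210, xmax: 760, ymax: 245,  type: 'Row' }
  }
}];
```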

The type of the bounding box will be something like “table”, “row”, “column”, “number”, etc. It is controlled by SVGCanvas.state; more on that below.

Notice how we needed to use the natural image dimensions, location, and scaling parameter (stored in the img data) to ensure that our bounding boxes correspond to the original image and not just what we saw on the screen. The Math.max and Math.min calls handle cases where a rectangle was drawn within the canvas but extends off the base image.
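The clamping logic can be isolated as a pure function to see the two steps: clamp the rectangle to the displayed image, then divide by the scale to land in the original image’s pixel coordinates. The function name and every number below are invented for illustration:

```javascript
// Convert an on-screen rectangle to original-image pixel coordinates,
// clamping to the displayed image's bounds first (illustrative sketch).
function toImageCoords(rect, imgBB, scale) {
  return {
    xmin: Math.max(rect.x - imgBB.x, 0) / scale,
    ymin: Math.max(rect.y - imgBB.y, 0) / scale,
    xmax: Math.min(rect.x - imgBB.x + rect.w, imgBB.width) / scale,
    ymax: Math.min(rect.y - imgBB.y + rect.h, imgBB.height) / scale
  };
}

// A rectangle that starts 10px to the left of the image is clamped to xmin = 0:
var bb = toImageCoords(
  { x: 90, y: 10, w: 60, h: 40 },             // drawn rectangle (screen coords)
  { x: 100, y: 0, width: 200, height: 300 },  // displayed image bounding box
  0.5                                         // display scale
);
// bb => { xmin: 0, ymin: 20, xmax: 100, ymax: 100 }
```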

Check: Displaying The Data

We can check whether the bounding box data is correct by creating a function that displays the bounding boxes. The SVGCanvas.previewSelections method does just that. Here is a snippet of the code:

SVGCanvas.prototype.previewSelections = function (d) {
  // Make a table to preview the selections made above.
  var self = this;

  // More....
  // - Load into a new window
  // - Create Table and headers

  // Cycle through output.
  for (var ii = 0; ii < d.length; ii++) {
    for (var bb in d[ii].bb) {

      // Get the data
      var d_i = d[ii].bb[bb];
      // Set the height and width
      var w = (d_i.xmax - d_i.xmin) * self.img.data[0].scale;
      var h = (d_i.ymax - d_i.ymin) * self.img.data[0].scale;
      // Make the row
      var row = tbody.append('tr');
      // Append the image
      var canvas = row.append('td').append('canvas')
        .attr('width', w)
        .attr('height', h);
      var ctx = canvas.node().getContext("2d");
      // ~~!!! ONLY WORKING ON CHROME !!!~~
      ctx.drawImage(self.img.node(),
                    d_i.xmin, d_i.ymin,
                    d_i.xmax - d_i.xmin, d_i.ymax - d_i.ymin,
                    0, 0, w, h);
      // Append other info
      // ...
    }
  }
};

The initial steps (which aren’t shown) create a new page and start an HTML table. I decided to create a new page to make it easy to compare the displayed bounding boxes with the selections on the canvas.

The for loop then cycles through each of the bounding boxes in the data that was just generated in SVGCanvas.dataCompile. The main thing to observe is the creation of the preview boxes. To create the cropped selection we draw a new HTML canvas object in one of the table cells. We begin by appending a <canvas> element that is the same height and width as the cell and getting its context. The important line is ctx.drawImage. The arguments for that method are:

ctx.drawImage(image, sx, sy, sWidth, sHeight, dx, dy, dWidth, dHeight);


  • image is the image to draw.
  • sx/sy are the upper-left/starting x/y coordinate of the cropped image.
  • sWidth/sHeight are the width/height of the cropped image.
  • dx/dy are the starting x/y coordinates in the destination canvas.
  • dWidth/dHeight are the width/height of the image in the destination canvas.

Basically, this call is going out to the original image and drawing the pixels specified in the sx, sy, sWidth, and sHeight parameters. That is, these parameters refer to coordinates in the original base image, not the SVG that we created previously. Therefore, this constitutes a valid test for whether our bounding box coordinates are correct with respect to the original image.
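The mapping from a bounding-box record to the drawImage argument list can be written out explicitly. This is a sketch with invented numbers: the source crop comes straight from the bounding-box data (original-image coordinates), while the destination size is scaled back to on-screen size using the stored display scale.

```javascript
// Build the argument list for ctx.drawImage from a bounding-box record.
// Hypothetical helper; d_i follows the {xmin, ymin, xmax, ymax} convention.
function drawImageArgs(d_i, scale) {
  return [
    d_i.xmin, d_i.ymin,                        // sx, sy (source crop origin)
    d_i.xmax - d_i.xmin, d_i.ymax - d_i.ymin,  // sWidth, sHeight (crop size)
    0, 0,                                      // dx, dy (canvas origin)
    (d_i.xmax - d_i.xmin) * scale,             // dWidth (back to screen size)
    (d_i.ymax - d_i.ymin) * scale              // dHeight
  ];
}

var args = drawImageArgs({ xmin: 0, ymin: 20, xmax: 100, ymax: 100 }, 0.5);
// args => [0, 20, 100, 80, 0, 0, 50, 40]
```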

Note: Browser Compatibility

All of the above works well in Chrome. When I tested the code in Firefox I found that there was a problem related to the ctx.drawImage() call. In short, it appears that Firefox cannot take an SVG <image> element as the image argument whereas Chrome can. Otherwise the code should work well on other browsers.

Buttons and Visual Organization

For the first time we need to actually add buttons to control what is happening on the canvas. To do this, I created a simple button utility for toggling between multiple choices. I mainly did this to get more experience creating elements and to avoid having to load Bootstrap’s JavaScript every time the page loads.

Button Toggling

The button toggling utility is a standalone file. The full code can be found at this gist. Its dependencies are Bootstrap CSS and D3.js.

The usage is designed to be simple. The user inputs an object containing:

  • Information about each choice in the form of an Array.
  • A CSS selector for the DOM element that the button should be appended to
  • A callback function to be called each time the button is pressed. The callback function takes one argument, which is the name of the button that was selected. (E.g. if “Table” is pressed, then the argument “Table” is passed to the function).

Here, I used this code to create the dropdown list. The options consist of the types of annotation that we can add to the canvas. The callback function switches the canvas state.

Here’s the basic idea:

// Initialize the set of possible states
self.stateData = [{
      name: 'Table',
      color: '#d32f2f',
      count: 0,
      class: 'rect',
      id: 'Table-0',
    },
    // more....
    {
      name: 'Word',
      color: '#0288d1',
      count: 0,
      class: 'rect',
      id: 'Word-0',
    }];
self.state = self.stateData[0];

function callbackStateToggle(arg) {
  // What to do when the state toggle buttons are pushed:
  // 1. Save the old state to self.stateData
  // 2. Make self.state the clicked state.
}
//// Set the Options
var stateToggleOpt = {
  type: 'dropdown', // Alt. is a button bank
  addTo: '#controls', // Where to add the dropdown (example selector)
  clickCall: callbackStateToggle,
};
//// Create the new object
self.stateTogglers = new ButtonToggle(self.stateData, stateToggleOpt);

By switching the canvas state, we switch the color of the rectangles that are drawn and, more substantially, we add classes to the rectangle groups that allow us to identify the annotation type when we go to compile the data.

The “Submit” Button

The “Submit” button is simply an HTML <button> with Bootstrap styling. We then attach an event listener to it that calls the data compile and preview methods discussed above.

The code is just:

function addDataSubmitButton(addTo) {
  // Make a button that will submit the necessary data to the system.

  // Add the button
  d3.select(addTo).append('button')
    .attr('class', 'btn btn-dark submit-button')
    .style('font', 'caption')
    .text('Submit')
    .on('click', onclickCallback);

  // Define the 'onclick' callback
  function onclickCallback() {
    var out = self.dataCompile(); // <= Data compile code discussed above
    console.log(out);             // <= See raw data in console.
    self.previewSelections(out);  // <= Preview in browser
  }
}
The addTo parameter is a CSS selector stating where we want to add the button.


In this post I’ve demonstrated that we have a viable image annotation tool. In particular, we showed that we can extract bounding boxes from each image and get them into a data structure that can be used to train a model.

The main gaps in the current code have to do with image handling. We need to add the ability to choose the input image and handle multiple images at once. This is where further modularization of the code will come in handy. The basic idea will be to create <g> elements for each image that will contain both the image and the annotation rectangles. We can then add controls for switching between each image. Ultimately, we’ll want to be able to download this data.