Daniel D. Beck

How I helped Mozilla turn documentation into data

My job is to make software understandable to humans. Typically, this means taking some information about some software and turning it into written documentation for people. But Mozilla recently invited me to work on an interesting project to do just the opposite. They asked me to take a bunch of content written by people about web browser compatibility and turn it into structured data, as part of the browser-compat-data project. It was an unusual project for me, and I got a good reminder that sometimes data is documentation.

In addition to its work on the Firefox web browser, Mozilla hosts MDN Web Docs, a wiki that documents the web as an open platform. MDN is home to thousands of tables to help web developers answer a question they’ve got all the time: is a given standard supported by web browsers? The tables contain lots of information like:

Until recently, all of these tables were manually formatted, populated, and updated by people, usually volunteer contributors. To make one of these tables—to tell the story of a web standard’s adoption and availability—an author needs to synthesize and format a lot of different kinds of information: yes or no declarations, numbers, and loads of different notes. Here’s one of these manually authored tables (linear-gradient):

a screenshot of a table containing version numbers for different browsers, rows for sub-features of a CSS property, and some footnotes

It’s a tough task and even one table can be complex, never mind trying to achieve consistency across all these tables. Together with volunteers and MDN staff, I helped make sense of the data in these tables and turned many of them into JSON files with a well-defined structure, like this excerpt for the grid-row CSS property:

{
  "css": {
    "properties": {
      "grid-row": {
        "__compat": {
          "mdn_url": "https://developer.mozilla.org/docs/Web/CSS/grid-row",
          "support": {
            "webview_android": {
              "version_added": "57"
            },
            "chrome": [
              {
                "version_added": "57"
              },
              {
                "version_added": "29",
                "flags": [
                  {
                    "type": "preference",
                    "name": "Enable experimental Web Platform features"
                  }
                ]
              }
            ],
            "firefox": [
              {
                "version_added": "52"
              },
              {
                "version_added": "40",
                "flags": [
                  {
                    "type": "preference",
                    "name": "layout.css.grid.enabled",
                    "value_to_set": "true"
                  }
                ]
              }
            ],
            "ie": {
              "version_added": false
            }
          },
          "status": {
            "experimental": false,
            "standard_track": true,
            "deprecated": false
          }
        }
      }
    }
  }
}

The translation here is direct, but oft-repeated notes (like which preference to set to turn on a feature) got translated into objects with recognizable names and values rather than free-form sentences, while actual notes got attached directly to the versions they applied to. With this data, the team at MDN was able to generate much prettier and more succinct tables, like this one for the grid-row CSS property:

a screenshot of a multi-colored table with version numbers
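
To make the jump from data to table a little more concrete, here is a minimal Python sketch of how a single table cell might be derived from the support data. It is not MDN's actual rendering code. The helper names (as_list, describe) and the wording of each cell are made up for illustration, and the data is a trimmed copy of the grid-row excerpt above.

```python
import json

# Trimmed copy of the grid-row excerpt shown earlier (Chrome, Firefox, IE only).
GRID_ROW_JSON = """
{
  "support": {
    "chrome": [
      {"version_added": "57"},
      {"version_added": "29",
       "flags": [{"type": "preference",
                  "name": "Enable experimental Web Platform features"}]}
    ],
    "firefox": [
      {"version_added": "52"},
      {"version_added": "40",
       "flags": [{"type": "preference",
                  "name": "layout.css.grid.enabled",
                  "value_to_set": "true"}]}
    ],
    "ie": {"version_added": false}
  }
}
"""


def as_list(statement):
    """A browser's support entry is either one object or a list of them."""
    return statement if isinstance(statement, list) else [statement]


def describe(statement):
    """Hypothetical helper: turn one browser's entry into a table cell string.

    Only the cases that appear in the excerpt are handled: a version string,
    or false for "not supported".
    """
    parts = []
    for item in as_list(statement):
        version = item.get("version_added")
        if not version:
            parts.append("No support")
            continue
        text = str(version)
        flags = item.get("flags", [])
        if flags:
            names = ", ".join(flag["name"] for flag in flags)
            text += f" (behind flag: {names})"
        parts.append(text)
    return "; ".join(parts)


support = json.loads(GRID_ROW_JSON)["support"]
cells = {browser: describe(entry) for browser, entry in support.items()}
for browser, cell in cells.items():
    print(f"{browser:8} {cell}")
```

Because each flag is an object with a name rather than a sentence, the same data could just as easily be rendered as an icon, a tooltip, or a translated string.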

Prettier tables are nice and all, but I don’t think they alone justify doing all this work. The real reason to slice up these tables goes beyond mere appearances.

Good technical communication helps people do the tasks they need to do, in the time and place where it makes sense for them. Admittedly, the MDN Web Docs wiki pages are where web developers are spending a lot of their time. After all, MDN has 7.5 million unique visitors per month! But in practice, web developers probably don’t want to be spending their time researching on MDN; they’d rather see that story about a particular API’s compatibility where they really need it, like in their text editor or their browser’s developer tools. And if that information is left in a static table in a wiki, it just can’t be everywhere it’s needed.

But as structured data in a portable format, it can be. For example, check out this Firefox add-on by Eduardo Bouças that adds a CSS compatibility report for any web page, in the browser:

a screenshot of a CSS compat report for this page
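
As a rough illustration of what a tool like this can do with the data, here is a small Python sketch that reports which browsers lack default support for the properties a page uses. This is not how the add-on itself is implemented. The unsupported function and the COMPAT dictionary are hypothetical, and the only data is the grid-row support from the excerpt above.

```python
# Trimmed from the grid-row excerpt above. Flagged entries are left out,
# because a flag means the feature is off by default.
COMPAT = {
    "grid-row": {"chrome": "57", "firefox": "52", "ie": False},
}


def unsupported(properties, browsers, compat=COMPAT):
    """Return {property: [browsers without default support]}.

    Assumption for this sketch: a browser with no entry, or with false,
    counts as unsupported. Properties with no data at all are skipped.
    """
    report = {}
    for prop in properties:
        support = compat.get(prop)
        if support is None:
            continue  # no data for this property, so say nothing about it
        missing = [browser for browser in browsers if not support.get(browser)]
        if missing:
            report[prop] = missing
    return report


# "color" has no data in this tiny dataset, so it is left out of the report.
report = unsupported(["grid-row", "color"], ["chrome", "firefox", "ie"])
print(report)
```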

There’s still a lot of data to migrate, but browser-compat-data has the potential to be a powerful resource for getting information that web developers need to the places where they need it. For me, working on the project was really interesting. I learned a lot about how CSS works (both theoretically and in practice), how it’s specified (I’m now a person who reads W3C specifications), and how people talk about CSS and browsers. As someone who spends a lot of time writing documentation, I find it easy to get into the habit of thinking that the most important thing I can do is write to be understood by people. But this project was a good reminder that making words understandable to a machine isn’t a bad idea either.