Mapping the Metro in Washington, D.C.

June 10, 2017

How I Made This Map

I was inspired by this map online to create a similar map of the Washington, D.C. metro system. The map of the subway system plots stations with uniform spacing, often placing them in a straight line, improving map readability but distorting the size of surrounding regions. I wanted to compare the positions of stations on the D.C. map with their geographic locations.

The process had three steps: obtain the name and geographic location of each metro station, extract each station’s coordinates from the Metro-issued map, and combine the two data sets to illustrate the difference between them.

Obtaining Metro Station Data from the Metro API

The D.C. transit system has a surprisingly useful API for requesting information about metro stations and bus stops, as well as real-time information about the locations and times of trains and shuttles. I wrote a script that gathered the name and geographic location of each station. Metro even provided standard geographic coordinates! I also downloaded the rail information for each line (essentially the order of the stops). I exported both data sets as JSON, which I could easily import for the model.
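A minimal sketch of the extraction step, assuming the station response has already been downloaded and decoded into a Python dictionary. The key names used here (`Stations`, `Name`, `Lat`, `Lon`) and both function names are assumptions made for illustration, not details taken from the original script, so check them against the actual API response.

```python
import json


def extract_stations(payload):
    """Return {station name: {"lat": ..., "lon": ...}} from a decoded response."""
    stations = {}
    for entry in payload["Stations"]:  # assumed top-level key
        stations[entry["Name"]] = {  # assumed field names
            "lat": float(entry["Lat"]),
            "lon": float(entry["Lon"]),
        }
    return stations


def export_stations(stations, path):
    """Write the station dictionary to a JSON file for later import."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(stations, f, indent=2, sort_keys=True)
```

Keeping the parsing separate from the network request makes it easy to rerun the export against a saved response without calling the API again.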

Retrieving Coordinates of Stations from the Metro Map

Once I had the geographic location of each metro station, I wanted to compare it to the station’s location on the map that Metro provides. This proved slightly more difficult than I had expected because Metro doesn’t publish the map in any form other than a flat image: there is no format that I could decompose by layer or by shape to obtain relative coordinates for each station. This meant that I had to analyze the pixels of the image to find what looked like “stations”, find a name for each station, and output that list of station names and coordinates.

The easiest way I knew of to solve a problem like this was an image analysis technique known as template matching. This is a way to look for similar occurrences of one image within another. I accomplished this fairly simply using OpenCV from Python.

Here are the basic steps I used to obtain the coordinates of each station icon within the metro map:

1. Decide on a template image that matches the icon of each Metro station. Call its width \(w\) and its height \(h\).
2. Decide on some match threshold \(T\). After a bit of guessing, I decided on \(T = 0.7\).
3. Process the image using OpenCV to find a score representing how well each \(w \times h\) block of pixels matches the template. Choose each block that scores higher than the threshold.
4. Filter this list of rectangles so that the distance between any two remaining rectangles is greater than or equal to \(\sqrt{(w/2)^2 + (h/2)^2}\). This prevents a single station from matching the template multiple times.
5. Enter each station name manually and export.

I had to tone down some of the brighter colors in the map so that the stations lying on those colors matched the template better. I also removed a couple of icons that looked like stations but weren’t, just to be sure. I could have made more use of OpenCV to detect station name labels within the image and map them to the closest station, but for the sake of time I just labeled each station manually. (There are 91 stations in the Metro system; this only took a couple of minutes.)

Figure: Washington, D.C. Metro map with template matches illustrated in red

This technique worked pretty well. I then wrote some JavaScript to combine the two arrays into one object per station, holding its position in each of the two states (map coordinates vs. actual geography), and to create the animation between states. I also produced this GIF with stations labeled and this GIF without labels. Check out my code on GitHub for more detail on the actual implementation of these techniques.
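The idea behind the combination step can be sketched as follows. The original was written in JavaScript; this version uses Python and is only an illustration. The function names, the dictionary layout, and the decision to treat latitude and longitude as flat coordinates are all assumptions. Both data sets are scaled into a shared unit square so that a station can be moved linearly from its map position (`t = 0`) to its geographic position (`t = 1`).

```python
def normalize(points):
    """Scale {name: (x, y)} into the unit square, preserving aspect ratio."""
    xs = [p[0] for p in points.values()]
    ys = [p[1] for p in points.values()]
    min_x, min_y = min(xs), min(ys)
    span = max(max(xs) - min_x, max(ys) - min_y) or 1.0
    return {
        name: ((x - min_x) / span, (y - min_y) / span)
        for name, (x, y) in points.items()
    }


def combine(map_px, geo):
    """Join map pixels and (lat, lon) pairs by station name."""
    # Assumption: treat lon/lat as planar x/y; negate lat so y grows downward
    # like image pixel rows.
    geo_xy = {name: (lon, -lat) for name, (lat, lon) in geo.items()}
    m, g = normalize(map_px), normalize(geo_xy)
    return {name: {"map": m[name], "geo": g[name]} for name in m if name in g}


def interpolate(station, t):
    """Position of one station at animation progress t in [0, 1]."""
    (mx, my), (gx, gy) = station["map"], station["geo"]
    return (mx + (gx - mx) * t, my + (gy - my) * t)
```

Calling `interpolate` for every station at a series of `t` values yields the frames of the transition. Stations whose names do not appear in both data sets are dropped by `combine`, so the names entered by hand need to match the API names exactly.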