Two Vizzies that I Like and Why

I’m a very big fan of railway transportation systems. I interned at a manufacturer of Shinkansen bullet trains, completed my undergraduate final project for a railway company, and joined a railfan community. I looked into two subway visualizations that I found very intriguing. The first is a work by Mike Barry and Brian Card on MBTA data visualization, an interactive exploration of Boston’s subway system. It was suggested to me by a friend who works as a UI/UX designer at Asana, a startup founded by a Facebook co-founder. The second also explores subway data, this time for the MTA subways in NYC.

Visualizing MBTA Data (Boston)

The first visualization project describes how busy the subway system in Boston is. It covers train operation on a typical day (approximately 1150 trips from 5AM to 1AM the next morning), a heatmap of station entrances and exits (425,999 people on a typical weekday), and the congestion and delays that affect commuters. The publicly available subway data is visualized very well using D3.js.

Below is the figure of MBTA trips on Monday, February 3, 2014, made using the parallel coordinates technique in D3.

The figure is interactive, so it is quite difficult to explain clearly in text. Still, Mike and Brian did a very good job of visualizing subway trips on the red, blue and orange lines, from Alewife station to Forest Hills station, minute by minute. The data is displayed in multi-dimensional space, with 2-dimensional scattergrams as the basic interface representation: the x-axis represents the train schedule from 5AM to 1AM, and the y-axis represents the station names. Notice that not every station name is displayed on the screen, which keeps the figure simple. A station name appears when the pointer hovers over one of the thin grey horizontal lines between the start and end stations. This matches two of the tasks described by Shneiderman: an overview of the entire collection, and details-on-demand, in which an item is selected to get details when needed. The red, blue and orange plots show when trains on the respective lines arrive at each station. On the right side of the figure, detailed descriptions of important train events are given, which helps readers understand the figure more deeply.
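To make the axis mapping described above concrete, here is a minimal Python sketch of how one trip could be turned into points on a time-versus-station plot. It is not the authors' D3 code. The station list, the sample trip, the canvas size and the helper names (`minutes_into_service`, `to_point`) are all assumptions made up for illustration.

```python
from datetime import datetime

# Hypothetical station order along one line (top to bottom on the y-axis).
STATIONS = ["Alewife", "Davis", "Porter", "Harvard"]

# The x-axis covers the service day: 5AM to 1AM the next morning (20 hours).
SERVICE_START_MIN = 5 * 60
SERVICE_LENGTH_MIN = 20 * 60


def minutes_into_service(clock: str) -> int:
    """Convert 'HH:MM' to minutes since 5AM, wrapping times after midnight."""
    t = datetime.strptime(clock, "%H:%M")
    minutes = t.hour * 60 + t.minute
    return (minutes - SERVICE_START_MIN) % (24 * 60)


def to_point(clock: str, station: str, width: float = 1000.0, height: float = 300.0):
    """Map an (arrival time, station) pair to (x, y) on an assumed canvas."""
    x = minutes_into_service(clock) / SERVICE_LENGTH_MIN * width
    y = STATIONS.index(station) / (len(STATIONS) - 1) * height
    return (x, y)


# One made-up trip: a list of (arrival time, station) stops.
trip = [("05:00", "Alewife"), ("05:03", "Davis"), ("05:05", "Porter"), ("05:09", "Harvard")]

# Connecting these points in order draws the trip as a single line.
polyline = [to_point(clock, station) for clock, station in trip]
```

Each trip becomes one polyline, so a steeper line means a faster train and a flat stretch means a train waiting at a station.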

Another interesting visualization is the entrances and exits per station figure below, which uses the hierarchical bar and heatmap techniques.


Similar to the previous one, this figure also visualizes the data in multi-dimensional space with a 2-dimensional interface representation. The figure consists of a hierarchical bar chart, which shows Harvard as the busiest station with 19,400 turnstile entries per day, and heatmaps that show the busy hours on weekdays and weekends. The small circles on the left indicate the train lines that stop at each station.
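A heatmap like the one described above needs the raw counts binned into a grid first. The following Python sketch shows one way that aggregation could look. The record layout `(day_type, hour, entries)` and the sample numbers are assumptions for illustration and are not taken from the MBTA data.

```python
from collections import defaultdict


def hourly_heatmap(records):
    """Sum entries into a {(day_type, hour): total} grid.

    Each record is an assumed (day_type, hour, entries) tuple.
    """
    grid = defaultdict(int)
    for day_type, hour, entries in records:
        grid[(day_type, hour)] += entries
    return dict(grid)


# Made-up turnstile counts for a single station.
records = [
    ("weekday", 8, 120),
    ("weekday", 8, 80),
    ("weekday", 17, 150),
    ("weekend", 14, 50),
]
grid = hourly_heatmap(records)
```

Each cell of `grid` would then be mapped to a color, with darker cells marking the busier hours.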

As stated in the summary, this project mainly used publicly available data from the MBTA API. The data collection process is not described, but I suspect they requested specific sensitive data from the MBTA, such as per-minute entry and exit counts at each station (from the turnstiles used for payment).

What makes this visualization work well?

  • It cleverly integrates several items of data into comprehensive figures, while keeping the view simple and offering details-on-demand.
  • It truly tells a story through data from the beginning of the article to the end, helping people understand the trains in Boston.
  • As mentioned by Lev Manovich, it takes away everything that is not essential in the figures and connects visual attributes with semantics. It does not simply introduce a pretty visualization design; the choices of attributes are connected to each other.
  • Very high data density. As mentioned by Tufte, this is one of the principles of good visualization.
  • I enjoyed reading the article and interacting with the figures.

What makes this visualization work less well?

  • It doesn’t provide the zoom and filter tasks. Users are not given control over the zoom focus of the figures or over filtering things out, even though there are sometimes uninteresting items that users just want to hide. (Actually, the last figure on the web page, below “Your Commute”, demonstrates the filter task very well, but I didn’t include it here.)


Visualizing MTA Data (NYC)

The second visualization describes the NYC subways, which are ridden 1.6 billion times a year. It tells a similar story to the first one but with a different approach, moving from Manhattan having the biggest share of riders (more than 50%), to student riders, to the most visited stations on weekdays vs weekends. As far as I could tell from checking the page’s scripts, the figures are not made with D3.js.

Below are the figures that show Manhattan as the borough with the busiest train stations, although Brooklyn has the largest number of stations.





The figures are very simple, but they clearly illustrate that riders are mostly found in Manhattan, regardless of how big the population of Brooklyn is or how big the land area of Queens is. There is also no clear correlation between the number of stations and annual ridership. Although the small blue circles may have no specific meaning, they help readers grasp the illustration faster for simple 1-dimensional data like this. The figures let us choose (filter) which data is presented: “Number of stations”, “Annual ridership”, “Boroughs’ population” or “Boroughs’ land area”.

Another interesting visualization is the most visited stations figure, shown below:


The figure visualizes data in multidimensional space. The circles illustrate the number of riders at particular stations: the more riders, the bigger the circle. When the pointer hovers over a circle, it shows the detailed number of riders. The figure also lets readers choose which data to look at, by borough and by weekday or weekend.
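The "more riders, bigger circle" encoding can be sketched as follows. I don't know whether the MTA figure scales circles by area or by radius, so this is only an illustration of the common area-proportional approach. The function name `radius_for` and the 40-pixel maximum radius are assumptions.

```python
import math


def radius_for(riders: float, max_riders: float, max_radius: float = 40.0) -> float:
    """Return a radius so that circle AREA is proportional to riders.

    Area grows with the square of the radius, so the radius is scaled
    by the square root of the rider share.
    """
    if max_riders <= 0:
        raise ValueError("max_riders must be positive")
    return max_radius * math.sqrt(riders / max_riders)


# A station with a quarter of the busiest station's riders gets half
# the radius, and therefore a quarter of the area.
busiest = radius_for(100_000, 100_000)
quarter = radius_for(25_000, 100_000)
```

Scaling the radius directly would make a station with twice the riders look four times as large, which is why the square root is usually preferred.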

Like the first project, this project also utilizes publicly available data from the MTA website, as stated in the about section. The data collection and cleaning processes are also not described.

What makes this visualization work well?

  • It supports two of Shneiderman’s tasks: filtering out uninteresting items, and selecting a group to get details when needed (details-on-demand).
  • It provides great insight through a simple visualization.
  • It uses the small multiples technique to visualize large quantities of data. In Tufte’s terms, the figures are design-simple graphics.

What makes this visualization work less well?

  • Unnecessary details are included (excessive use of attributes), like the small blue circles used to illustrate the data. I’m actually a bit torn on this: the circles carry no meaning, yet they help in understanding the data.
  • I’m confused by the sorting numbers and small arrows on the right side of the most visited stations figure.


Comparison between MBTA and MTA visualizations

| No | Attribute | MBTA | MTA |
|----|-----------|------|-----|
| 1 | Usability | Requires more time to learn and understand the graphics. | Easier to understand due to its simplicity. |
| 2 | User experience | The graphic interactions are impressive and help present the visualization in an engaging but meaningful way. | Easy to control, but some interactions are hard to understand. |
| 3 | Data type taxonomy | 2-dimensional, multi-dimensional and temporal | 1-dimensional and multi-dimensional |
| 4 | Tasks | Overview, details-on-demand, relate | Filter, details-on-demand |

