Oh data! Oh my!

September 13, 2012

Katalina, API Researcher and one of our Choreographers, discusses the team’s work in making today’s omni-prevalence of data useful and digestible:

We at Temboo have been keeping a gimlet eye out for open government data sets that might have wide-ranging appeal and make for interesting apps. As a result of Obama’s Open Government Initiative, new data sets are being released all the time.

We almost fell off our chairs thinking about all the strange and wonderful apps that could be built for the public good when we came across the likes of the NASA data sets. How about a lightning app that tracks where in the world lightning is striking in real time? Or an app that calculates your chances of being struck by a piece of falling space debris given your coordinates? The brutal truth is that NASA has a lot of data, and it will be some time yet before all those zettabytes of information are easily available to the public.

One of the issues with big data releases from the government, of course, is that they can be complex. The agencies often dump files upon the public, providing no viable interface or documentation, and we are expected to wade through the results. Case in point: the Department of Education data, built upon the Socrata system. We felt strongly that we wanted to add this data to our Library, but the information returned by the Socrata-based methods was a no-man’s land of metadata fields and useless JSON junk. The data itself was hidden in a labyrinthine structure of unintelligible field names with equally bewildering numerical values contained therein.

What does it all mean? We rolled up our sleeves and dug through the voluminous documentation to see if we could wrap our heads around it all and come up with more meaningful kinds of outputs. We then painstakingly mapped the raw government data field names to ones that are concise and, above all, descriptive of the information they actually contain.

Source to Target (DoE raw data to Choreo-categorized) mapping

To make the Department of Education data more intelligible to an average user, we performed some legerdemain using the Map function in Twyla, the software we use to build Choreos. We turned the hairy source XML into well-groomed target XML files by mapping each field one by one. For the sake of clarity and usability, we occasionally omitted data fields we felt were too obscure or cumbersome. With this combing done, building the Choreos that make up our Department of Education Bundle was a simple process.
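For readers who want a feel for the idea, here is a minimal sketch of that kind of field-by-field mapping in Python. It is not Twyla's Map function, and it works on a plain dictionary rather than XML; the raw field names, the descriptive names, and the sample record are all invented for illustration and are not the real Department of Education fields.

```python
# Hypothetical mapping from raw field names to descriptive ones.
# None of these names come from the actual Department of Education data.
FIELD_MAP = {
    "fld_0012": "school_name",
    "fld_0047": "enrollment_total",
    "fld_0103": "graduation_rate",
}


def map_record(raw_record, field_map=FIELD_MAP):
    """Rename mapped fields and drop any field that has no mapping."""
    return {
        field_map[name]: value
        for name, value in raw_record.items()
        if name in field_map
    }


# A made-up raw record, including one metadata field we choose to omit.
raw = {
    "fld_0012": "Example High School",
    "fld_0047": 1200,
    "fld_0103": 0.87,
    "meta_rowid": "a1b2c3",  # no entry in FIELD_MAP, so it is left out
}

clean = map_record(raw)
print(clean)
```

Keeping the mapping in a single table means that omitting an obscure field is just a matter of leaving it out of the table, which mirrors the choice described above to drop fields that were too cumbersome.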

The Department of Education bundle will be released with version 1.73 later this month.
