Taking A Look At What's Next For The Environmental Protection Agency (EPA) Envirofacts Data Service API

I was asked by folks at the Environmental Protection Agency (EPA) to provide some feedback on the Envirofacts Data Service API as they prepare to work on the next iteration. I took a quick glance at the landing page for their service, saw a simple URL layout showing how to make API calls, and estimated that it would take me an hour or two at most to profile the API.

As I dug into the process of profiling the Envirofacts Data Service API one evening in May, I realized I was wrong about the scope of the API, and became unsure how long it would actually take me. Then this work got lost in the shuffle of my summer, and I only recently picked it back up. I'm not happy if I can't provide an agency with some direction on where to go next, and after about 12 hours of work, I think I have some valuable feedback that they can run with.

The Envirofacts Data Service API program consists of a single landing page, with an overview of how to use the API, and a myriad of pages below it that explain the underlying data model. The API is what I consider a very resource-driven design, meaning it reflects the database it came from, with not much emphasis on how the resources will actually be used.

While the API does use the URL, it uses few of the other HTTP components that make an API RESTful. I can see how the design would make sense to a database engineer, but it will be a little confusing for API developers.

After looking beyond this portal I have since found other possible APIs, but honestly they are often even more incoherent than the Envirofacts Data Service API. I'm not trying to review the EPA's entire API effort, and will focus specifically on the resources available in the Envirofacts Data Service API for this round.

Environmental Protection Agency
  EPA Air Facility System (AFS) API  
  EPA Biennial Report API  
  EPA Comprehensive Environmental Response, Compensation, and Liability Information System API    
  EPA Facility Registry System API  
  EPA Greenhouse Gas API  
  EPA Integrated Grants Management System API  
  EPA Locational information API  
  EPA Permit Compliance System API  
  EPA Radiation Ambient Monitoring API  
  EPA Radiation Information Database API  
  EPA Resource Conservation and Recovery Act Information API  
  EPA Safe Drinking Water Information System API  
  EPA Toxics Release Inventory API  

After I discovered the 411 tables across these 13 groups, and learned the common URL pattern for querying, I decided to define each table as its own endpoint. Rather than relying on each table being passed in via a {table} path parameter, I opted to hard code it. Even though most of the table names are incoherent, some still articulate a little bit more about what the resource might do, and once you make a request, you get an even better idea. All of this can go a long way towards helping people understand what is going on.
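To make that choice concrete, here is a minimal Python sketch of the difference between one generic {table} endpoint and one hard-coded path per table. The base path and the second table name are placeholders for illustration, not the real Envirofacts values, and the summary text is only a stand-in for a proper description.

```python
# Hypothetical values for illustration only; the real base path and
# table names come from the Envirofacts documentation.
BASE_PATH = "/efservice"
TABLES = ["FRS_PROGRAM_FACILITY", "EXAMPLE_TABLE"]


def generic_path() -> str:
    """One endpoint, with the table supplied as a path parameter."""
    return BASE_PATH + "/{table}"


def hard_coded_paths(tables: list[str]) -> dict[str, dict]:
    """One endpoint per table, each with room for its own summary."""
    paths = {}
    for table in tables:
        paths[f"{BASE_PATH}/{table}"] = {
            "get": {
                # Stand-in text; a human-written summary belongs here
                "summary": f"Query rows from {table}",
                "operationId": f"get_{table.lower()}",
            }
        }
    return paths


for path, item in hard_coded_paths(TABLES).items():
    print(path, "->", item["get"]["summary"])
```

The generic form is shorter to maintain, but the hard-coded form gives every table its own place for a summary and its own operation ID, which is the trade-off I was weighing.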

It wouldn't take much to apply a coherent summary to each endpoint that describes what is stored in the table. Once I had a list of all tables, I went ahead and made a call to each of the 411 endpoints in the 13 areas, and generated a Swagger API definition for each. Using Charles Proxy I was able to generate the underlying data model for each, which is necessary for generating SDKs, and can be used as a central truth throughout other aspects of API integration. The current API design also allows you to pass in a field and apply an operator against it when searching. I opted to leave this out of this iteration, until I had a clear definition of the endpoints and the underlying data model for each. The API is perfectly usable without this.
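The idea of deriving a data model from a live response can be sketched in a few lines of Python. This is not how Charles Proxy does it, just an illustration of the general approach: take one sample record and map each value to a Swagger 2.0 type. The field names and values in the sample are made up.

```python
import json


def swagger_type(value) -> dict:
    """Map a decoded JSON value to a Swagger 2.0 schema fragment."""
    # bool is checked before int because bool is a subclass of int in Python
    if isinstance(value, bool):
        return {"type": "boolean"}
    if isinstance(value, int):
        return {"type": "integer"}
    if isinstance(value, float):
        return {"type": "number"}
    if isinstance(value, list):
        # Guess the item type from the first element; default to string
        items = swagger_type(value[0]) if value else {"type": "string"}
        return {"type": "array", "items": items}
    if isinstance(value, dict):
        return infer_definition(value)
    # Strings and nulls both fall back to string; a null tells us nothing
    return {"type": "string"}


def infer_definition(record: dict) -> dict:
    """Build an object definition from a single sample record."""
    return {
        "type": "object",
        "properties": {key: swagger_type(val) for key, val in record.items()},
    }


# Hypothetical sample record; real field names come from the API response
sample = json.loads(
    '{"FACILITY_ID": "A100", "FACILITY_NAME": "Example Plant", "LATITUDE": 38.9}'
)
definition = infer_definition(sample)
print(json.dumps({"definitions": {"FRS_PROGRAM_FACILITY": definition}}, indent=2))
```

A single record is a weak basis for a schema, since nulls and optional fields will be missed, so anything generated this way still needs a human pass.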

Keeping Things Simple
My recommendation for any future API release from the EPA team is to focus on simplifying things. When you land on the home page, you get the idea there is an API present, but you do not grasp the depth of the resource. A simple list of the various API groups is important, which is why I expanded the acronyms into full names in my list, to better demonstrate what lies beneath. Calling things by their actual names just makes things more intuitive. You need to reach out of your government silos. I had to work really hard to make sense of the data model at play. I was sure there would be a meta API or download allowing me to quickly understand things, but I couldn't find one. By creating Swagger definitions for all API endpoints, complete with associated definitions for the data model, I can now easily build querying, filtering, and other mechanisms into my clients.

Speaking In Plain English
While FRS_PROGRAM_FACILITY may have made sense to the database administrator when naming the original table, it does not adequately describe the resource it is serving up. A big part of the next version of these APIs needs to focus on replacing the cryptic table names with more meaningful endpoint names, and on more descriptive fields for each of the underlying data definitions. After crafting the Swagger definitions for these APIs I am blown away by the amount of information in here, obfuscated by the cryptic database naming conventions.
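As a rough illustration of what that renaming could look like, here is a small Python sketch that turns a table name into a more readable path. It assumes the first token of the table name is the group acronym, which I have not verified across all the tables, and the resulting path style is only one possible convention.

```python
# Acronym expansions taken from the group names listed earlier; only one
# is filled in here, the rest would need to be added by hand.
GROUPS = {"FRS": "facility-registry-system"}


def friendly_path(table_name: str) -> str:
    """Turn a table name like FRS_PROGRAM_FACILITY into a readable path."""
    # Assumption: the text before the first underscore is the group acronym
    prefix, _, rest = table_name.partition("_")
    group = GROUPS.get(prefix, prefix.lower())
    resource = rest.lower().replace("_", "-")
    return f"/{group}/{resource}" if resource else f"/{group}"


print(friendly_path("FRS_PROGRAM_FACILITY"))
```

A mechanical mapping like this only gets you part of the way. Someone who knows the data still has to decide what each resource should really be called.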

Wrap In A Clean Portal
The current landing page for the Envirofacts API is fairly cluttered, and ultimately doesn't say much. It made me work too hard to get what I need. My goal was to distill down the 13 APIs I found buried in the Envirofacts API page, and expose exactly what you need to understand and get to work using any of the 13 APIs and the over 400 endpoints, nothing more. I started with a simple GitHub Pages hosted template, with a single APIs.json home page, and interactive documentation for each of the APIs (which you can fork).
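For anyone who hasn't seen one, an APIs.json index for a portal like this can be generated with a few lines of Python. The field names below follow my reading of the APIs.json format and should be checked against the current specification. Every URL is a placeholder on example.com, and only two of the 13 groups are included to keep it short.

```python
import json

# Two of the 13 group names from the list above
GROUPS = ["EPA Facility Registry System API", "EPA Greenhouse Gas API"]


def slug(name: str) -> str:
    """Lowercase, hyphen-separated form of a name for use in URLs."""
    return name.lower().replace(" ", "-")


def build_index(groups: list[str], site: str = "https://example.com/epa") -> dict:
    """Assemble an APIs.json-style index; all URLs here are placeholders."""
    return {
        "name": "Environmental Protection Agency",
        "description": "Index of the Envirofacts Data Service API groups.",
        "url": f"{site}/apis.json",
        "specificationVersion": "0.14",
        "apis": [
            {
                "name": group,
                "humanURL": f"{site}/{slug(group)}/",
                # Placeholder; the real base URL belongs here
                "baseURL": "https://example.com/placeholder",
                "properties": [
                    {"type": "Swagger", "url": f"{site}/{slug(group)}/swagger.json"}
                ],
            }
            for group in groups
        ],
    }


print(json.dumps(build_index(GROUPS), indent=2))
```

The point of the index is that a person or a machine can start at one file and find the documentation and Swagger definition for every API group.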

Environmental Protection Agency (apis.json)
The United States Environmental Protection Agency (EPA or sometimes USEPA) is an agency of the U.S. federal government which was created for the purpose of protecting human health and the environment by writing and enforcing regulations based on laws passed by Congress. The EPA was proposed by President Richard Nixon and began operation on December 2, 1970, after Nixon signed an executive order. The order establishing the EPA was ratified by committee hearings in the House and Senate. The agency is led by its Administrator, who is appointed by the president and approved by Congress. The current administrator is Gina McCarthy. The EPA is not a Cabinet department, but the administrator is normally given cabinet rank.
APIs