API Definitions For 28,835 Data Sets Across 207 City, County, State, and Federal Data Portals

My friend Taylor Barnett (@taylor_atx) over at Transposit asked the other day if I had an OpenAPI for Socrata, which triggered my memory about a whole bunch of old work I had done around Socrata that I never got to a fully complete state, and therefore never published. If you aren’t familiar with Socrata, they provide a public data platform for city, county, state, federal, and other entities, making them one of the go-to places for understanding open data in the United States at scale. While you can go to each of the almost 200 entities that Socrata publishes data for, you can also easily discover and work with data from across all the providers via the Socrata discovery and metadata APIs. This makes the platform a rich place to go for mining interesting public data sets, and for getting a handle on how government of all shapes and sizes is thinking about open data and APIs.

While I had done a lot of work earlier around learning how Socrata publishes data, most of my work needed to be refreshed. I started by going to Socrata’s main developer portal, where I navigated to their Discovery API to get a list of the 198 domains in which they have data published. Then I got to work pulling all the published datasets and metadata for all of those domains using their Metadata API, which gave me all the details about each data set, including columns, tags, and other relevant information. Socrata is a multi-level API: their discovery and metadata APIs just give me access to all the entities that are available, and then you can use the individual APIs for each entity to actually pull specific datasets. Before I moved on, I wanted to make sure I created an OpenAPI and Postman collection for the Socrata APIs themselves, before publishing OpenAPIs and Postman collections for each individual data provider on their platform.
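To make the discovery step concrete, here is a minimal sketch of paging through the catalog for a single domain. It is not the script I actually ran. The catalog URL, the `domains`, `limit`, and `offset` parameters, and the `results` key in the response are my assumptions about the Discovery API and should be checked against Socrata's current documentation. The `fetch_json` argument is a hypothetical callable that takes a URL and returns the parsed JSON response.

```python
from typing import Any, Callable, Dict, Iterator
from urllib.parse import urlencode

# Assumed base URL for the Socrata Discovery API catalog; verify before use.
CATALOG_URL = "https://api.us.socrata.com/api/catalog/v1"


def catalog_page_url(domain: str, limit: int, offset: int) -> str:
    """Build the URL for one page of catalog results for a domain."""
    query = urlencode({"domains": domain, "limit": limit, "offset": offset})
    return f"{CATALOG_URL}?{query}"


def iter_datasets(
    domain: str,
    fetch_json: Callable[[str], Dict[str, Any]],
    page_size: int = 100,
) -> Iterator[Dict[str, Any]]:
    """Yield every catalog entry for a domain, one page at a time.

    fetch_json is supplied by the caller (hypothetical helper), which keeps
    the paging logic separate from whatever HTTP client is used.
    """
    offset = 0
    while True:
        page = fetch_json(catalog_page_url(domain, page_size, offset))
        results = page.get("results", [])
        if not results:
            break
        yield from results
        # A short page means we have reached the end of the catalog.
        if len(results) < page_size:
            break
        offset += page_size
```

In practice `fetch_json` could be a few lines wrapping `urllib.request.urlopen` and `json.load`. Passing it in as an argument also makes it easy to test the paging against canned responses.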

After creating OpenAPI and Postman collections for both the Socrata discovery and metadata APIs, I wanted to do the same for every single data set published within each individual domain. So I wrote a script that used the Socrata APIs to generate API definitions for every data set published across the almost 200 domains. To make things as robust as I could, I created the following API definitions for each individual data set:

  • JSON Schema - An individual JSON Schema file generated from the data set metadata.
  • JSON Example - A snapshot (5 rows) of each actual data set to show as an example.
  • OpenAPI - An individual OpenAPI for each individual data set that is published within a domain.
  • Postman Collection - An individual Postman collection, allowing each data set to be executed.
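As a rough illustration of how the first and third items in that list can be derived from data set metadata, here is a minimal sketch. It is not my actual generator. It assumes each column in the metadata carries `fieldName`, `name`, and `dataTypeName` keys, that rows are served from `/resource/{id}.json` on the domain, and that `$limit` is accepted as a query parameter. The type mapping is a simplification of my own, and anything it does not recognize falls back to a string.

```python
from typing import Any, Dict, List

# Assumed, simplified mapping from Socrata column types to JSON Schema types.
TYPE_MAP = {
    "text": {"type": "string"},
    "number": {"type": "number"},
    "checkbox": {"type": "boolean"},
}


def build_json_schema(title: str, columns: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Turn a list of column metadata into a JSON Schema for one row."""
    properties: Dict[str, Any] = {}
    for col in columns:
        # Unknown column types fall back to a plain string.
        prop = dict(TYPE_MAP.get(col.get("dataTypeName"), {"type": "string"}))
        if col.get("name"):
            prop["description"] = col["name"]
        properties[col["fieldName"]] = prop
    return {
        "$schema": "http://json-schema.org/draft-07/schema#",
        "title": title,
        "type": "object",
        "properties": properties,
    }


def build_openapi(
    domain: str, dataset_id: str, title: str, schema: Dict[str, Any]
) -> Dict[str, Any]:
    """Wrap a row schema in a single-path OpenAPI 3.0 definition."""
    path = f"/resource/{dataset_id}.json"  # assumed resource path
    item_schema = {k: v for k, v in schema.items() if k != "$schema"}
    return {
        "openapi": "3.0.0",
        "info": {"title": title, "version": "1.0.0"},
        "servers": [{"url": f"https://{domain}"}],
        "paths": {
            path: {
                "get": {
                    "summary": f"Get rows from {title}",
                    "parameters": [
                        {
                            "name": "$limit",
                            "in": "query",
                            "required": False,
                            "schema": {"type": "integer"},
                        }
                    ],
                    "responses": {
                        "200": {
                            "description": "Rows from the data set",
                            "content": {
                                "application/json": {
                                    "schema": {"type": "array", "items": item_schema}
                                }
                            },
                        }
                    },
                }
            }
        },
    }
```

Keeping the schema and the OpenAPI as separate outputs means the same schema file can be published on its own and also embedded in the API definition.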

I wanted to be able to get at each individual data collection as an atomic unit of value--individually quantifying the metadata and data value that each one possesses. To help realize this vision I published this list of 207 individual domains, where you can access any of their data sets published as part of a set of API definitions which can be used in a variety of ways.

City of Dallas, Texas View Data
Permitting Dashboard for Federal Infrastructure Projects View Data
City of Auburn, Washington View Data
State of Connecticut View Data
City of Dubuque, Iowa View Data
City of Naperville, Illinois View Data
Lehman College Community Connect View Data
City of Gainesville, Florida View Data
City of Nashville, Tennessee View Data
Federal Communications Commission (FCC) View Data
City of Kansas City, Missouri View Data
Johns Creek, Georgia Police Department View Data
City of Saint Paul, Minnesota View Data
Prince Georges County, Maryland View Data
Australian Capital Territory View Data
City of Menlo Park, California View Data
City of Chicago, Illinois View Data
State of Colorado View Data
Province of Nova Scotia View Data
State of Massachusetts - Cannabis Control Commission View Data
County of Riverside, California View Data
City of Dallas, Texas View Data
County of San Diego, California View Data
City of Chattanooga, Tennessee View Data
State of Utah View Data
City of Cincinnati, Ohio View Data
State of New Jersey Health Data View Data
City of Oakland, California View Data
City of Greensboro, North Carolina View Data
County of Ramsey, Minnesota View Data
City of Richmond, California View Data
City of Mesa, AZ View Data
City of Rancho Cucamonga, California View Data
City of Grande Prairie, Alberta View Data
City of Gainesville, Florida View Data
Province of Edmonton, Canada View Data
City of Providence, Rhode Island View Data
City of Providence, Rhode Island View Data
Energy Star View Data
City of Topeka, Kansas View Data
City of Norfolk, Virginia View Data
County of Santa Clara View Data
City of Tacoma, Washington View Data
State of Maryland View Data
State of Oregon View Data
City of Mount Pleasant, South Carolina View Data
County of Cook County, Illinois View Data
State of Missouri View Data
Global Island Partnership (GLISPA) View Data
City and County of San Francisco, California View Data
City of Fort Worth, Texas View Data
Montgomery County, Maryland View Data
City of Plano, Texas View Data
County of Sonoma, California View Data
Macoupin County, Illinois View Data
Montgomery Schools, Maryland View Data
Inter-American Development Bank View Data
California State Treasurer View Data
State of Washington View Data
Santa Monica, California Sustainable City Plan View Data
County of Marin, California View Data
Queen Anne's County, Maryland View Data
State of Michigan View Data
City of Hampton, Virginia View Data
Carson City, Nevada View Data
City of Henderson, Nevada View Data
City of Baton Rouge, Louisiana View Data
UDOT - Open Data Portal View Data
United States Agency for International Development View Data
City of Little Rock, Arkansas View Data
City of Memphis, Tennessee View Data
Data.Medicare.gov View Data
City of Oxnard, California View Data
Howard County, Maryland View Data
City of Greensboro, North Carolina View Data
City of Seattle, Washington View Data
OpenDataNetwork View Data
City of Mesquite, Texas View Data
City of Chattanooga, Tennessee View Data
Culver City, California View Data
City of Montgomery, Alabama View Data
City of Richmond, Virginia View Data
Montgomery County, Maryland View Data
County of San Mateo, California View Data
City of Virginia Beach, Virginia View Data
Fulton County, Georgia View Data
League of Oregon Cities View Data
National Library of Medicine View Data
Department of Transportation View Data
City of Tuscaloosa, Alabama View Data
City of Evanston, Illinois View Data
City of Corona, California View Data
Cook County, Illinois View Data
State of Delaware View Data
City of Reading, Pennsylvania View Data
City of Anchorage, Alaska View Data
City of Pittsburgh, Pennsylvania View Data
Province of Winnipeg, Manitoba View Data
Bay Area Metro View Data
Pierce County, Washington View Data
Strathcona County, Alberta View Data
City of Austin, Texas View Data
The Corporation for National and Community Service View Data
New Zealand Internet Data Portal View Data
Consumer Finance Bureau (CFB) View Data
City of College Station, Texas View Data
US Patent and Trade Office View Data
City of St. Petersburg, Florida View Data
State of Pennsylvania View Data
State of Iowa View Data
City of Colorado Springs, Colorado View Data
City of Los Angeles, California Controller View Data
County of San Diego View Data
Metropolitan Airports Commission View Data
Data.Healthcare.gov View Data
Office of the Electoral Comptroller Puerto Rico View Data
City of Framingham, Massachusetts View Data
Nassau County, New York View Data
City of Detroit, Michigan View Data
King County, Washington View Data
State of New Jersey View Data
San Bernardino County, California Health Data View Data
State of Vermont View Data
City of New York View Data
Nashville Public Investment Plans View Data
Douglas County, Colorado View Data
City of Los Angeles, California View Data
State of Hawaii View Data
City of Buffalo, New York View Data
City of Novi, Michigan View Data
Cook County, Illinois View Data
State of Hawaii View Data
Open Payments Data - CMS View Data
Open Data Portal View Data
Data.Medicaid.gov View Data
City of Albany, New York View Data
Los Angeles County, California View Data
City of Janesville, Wisconsin View Data
The City of West Hollywood, California View Data
City of Roseville, California View Data
San Mateo County, California View Data
Employment Development Department for California View Data
Southern Nevada Health District View Data
City of Albany, New York View Data
City of Redmond, Washington View Data
City of Calgary, Alberta View Data
City of Davenport, Iowa View Data
City of Everett, Washington View Data
City of Miami, Florida View Data
City of Baltimore, Maryland View Data
City of Seattle, Washington View Data
Datos Abiertos Colombia View Data
State of Texas View Data
Prince George County, Maryland View Data
City of New Orleans, Louisiana View Data
Grand Rapids, Michigan View Data
City of Franklin, Tennessee View Data
City of Cambridge, Massachusetts View Data
Fulton County, Georgia View Data
City of Glendale, California View Data
City of Edmonton, Alberta View Data
Universal Service Administration View Data
City of Fort Collins, Colorado View Data
State of Michigan View Data
City of Topeka, Kansas View Data
Puerto Rico Government View Data
U.S. Department of Commerce View Data
City of Melbourne, Australia View Data
State of New York Health Data View Data
Centers for Disease Control and Prevention View Data
Water Point Data Exchange View Data
City of Somerville, Massachusetts View Data
City of Hartford, Connecticut View Data
City of Kansas City, Missouri View Data
State of Michigan View Data
US Department of Transportation View Data
NASA View Data
City of San Bernardino, California View Data
Franchise Tax Board of California View Data
City of Urbana, Illinois View Data
Sunshine Coast Council View Data
City of Berkeley, California View Data
Center for Disease Control and Prevention View Data
Government Financial Reports - California State Controller's Office View Data
Top 10 Live Well Indicators View Data
City of Honolulu, Hawaii View Data
County of Alameda, California View Data
Washington State Liquor and Cannabis Board View Data
State of New York View Data
City of Virginia Beach, Virginia View Data
Centers for Medicare and Medicaid Services View Data
datazONE View Data
City of Orlando, Florida View Data
Province of Prince Edward Island View Data
City of Mesa, Arizona View Data
City of Santa Monica, California View Data
St. Louis County, Missouri View Data
The Data Center View Data
World Bank View Data
Province of Ontario View Data
San Mateo Performance Dashboards View Data
USDA AMS - Data & Resources View Data
Datos Abiertos Colombia View Data
City of Edmonton, Alberta View Data
City of New York View Data
City of New York View Data
City of New York View Data

There is still a lot of work to be done on this project. The listing is in alphabetical order by domain right now, but with a couple more hours of work I can have it sortable by name, type of project, and some other data points--I just ran out of time. Next, I’d like to create a Postman collection for each domain, organized by tags. However, not all the individual data sets are well tagged, and I would need some more time to automate the tagging of listings based upon their title, description, and other elements. It is a good start though, and it felt good to move forward an old project that was just sitting around.

This is just one slice of the pie. There are other platforms to layer onto this work, adding in open source solutions like CKAN and DKAN, as well as the other commercial providers in the space. I see that Socrata is now part of Tyler Tech…I will have to further evaluate what this means for API access across this layer of the government data and API landscape. Regardless, there is still a lot of work to do in this area before it adequately represents the public data sector.
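The automated tagging I have in mind could start out as simple keyword matching against each title and description. This is only a sketch of one possible approach, not something I have built yet. The rule table and tag names are hypothetical placeholders, and a real version would need a much richer vocabulary.

```python
import re
from typing import Dict, List, Set

# Hypothetical keyword rules: tag name -> words that suggest the tag.
TAG_RULES: Dict[str, Set[str]] = {
    "crime": {"crime", "police", "arrest"},
    "budget": {"budget", "expenditure", "revenue"},
    "transportation": {"transit", "traffic", "bus"},
}


def suggest_tags(
    title: str, description: str, rules: Dict[str, Set[str]] = TAG_RULES
) -> List[str]:
    """Suggest tags whose keywords appear as whole words in the text."""
    words = set(re.findall(r"[a-z]+", f"{title} {description}".lower()))
    # A tag applies if any of its keywords is present; sort for stable output.
    return sorted(tag for tag, keywords in rules.items() if words & keywords)
```

Matching on whole words keeps the rules predictable, at the cost of missing plurals and other variants unless they are listed explicitly.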

I think that doing what I did here, but for CKAN and DKAN, is the next step in this journey. I’ll also think more about how I can run all this on a schedule to make sure I keep it all up to date. I’ll need some more monitors and other resources to ensure I can reliably keep everything operating smoothly, and to identify when one of the domains goes radio silent. Each of the data sets has modified date information, allowing me to tune in to changes at the data set level, which I can also aggregate up to the domain level to identify when things are changing, or more importantly, when things have stopped changing. Beyond publishing these API definitions, I want to begin mining this public data landscape for other relevant signals: evaluating the tags people are using (or not), and getting a better handle on the types of data that are being shared across city, county, state, federal, and other public data repositories. First, I’ll get to work on aggregating CKAN cities, representing the open source side of the equation, then I’ll spend more time thinking about how I can mine relevant information from across all the public data domains I have indexed at that time.
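Rolling the modified dates up to the domain level could look something like the sketch below. The record shape, with `domain` and `updated_at` keys holding an ISO 8601 timestamp with an explicit UTC offset, and the 90-day threshold are my own assumptions for illustration, not what the Socrata metadata actually returns.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, Iterable, List


def latest_update_by_domain(datasets: Iterable[Dict[str, str]]) -> Dict[str, datetime]:
    """Find the most recent data set modification time for each domain."""
    latest: Dict[str, datetime] = {}
    for ds in datasets:
        # Assumes timestamps such as "2019-06-01T00:00:00+00:00".
        updated = datetime.fromisoformat(ds["updated_at"])
        domain = ds["domain"]
        if domain not in latest or updated > latest[domain]:
            latest[domain] = updated
    return latest


def stale_domains(
    datasets: Iterable[Dict[str, str]],
    now: datetime,
    max_age: timedelta = timedelta(days=90),
) -> List[str]:
    """List domains whose newest data set is older than max_age."""
    latest = latest_update_by_domain(datasets)
    return sorted(d for d, ts in latest.items() if now - ts > max_age)
```

Run on a schedule with `now=datetime.now(timezone.utc)`, the second function gives a short list of domains that may have gone quiet and deserve a closer look.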