Posted on 09-13-2013
In government there is a fear of exposing public data via APIs--rightfully so. This is not just a government concern, it exists in all industries within each an every business and organization. We all possess private data, and when opening up API driven resources, we need to make sure none of this is exposed in un-desired ways.
I find it hard to believe, that after almost 10 years of public APIs, there isn't a reasonable solution to masking, scrubbing and anonymizing data that is made available via APIs. I wrote about research into finding a solution at UC Berkeley a while back, but to date I have not seen any real solutions to this problem yet.
I was talking with another Presidential Innovation Fellow (PIF) the other day about possible solutions for making sure Personally Identifiable Information (PII) doesn't get exposed via government APIs. Afterwards, I got to thinking about possible API options, and I don't think it would be that difficult to get started with a basic solution.
My thoughts are, that you could provide a simple API proxy, that would terminate requests from any Swagger defined APIs and easily iterate through each value and apply a series of regular expressions against it to look for common PII or other data that shouldn't be exposed. The proxy could automatically replace with template values like John or Jane Doe for names, 1234 Street for addresses, etc.
API providers could set a list of areas they are concerned about exposing with the API proxy configuration, and it would enforce all filtering required. The proxy could also look for other common patterns, and make recommendations of other areas that could be masked, scrubbed or anonymized that the API provider didn't consider.
Technically it sounds like a pretty simple solution, that could get smarter and faster over time at identifying sensitive information, to better serve API providers. This type of proxy could be default in healthcare, education and in other sensitive environments and be default in development environments, or in production environments that are accessible to non-trusted consumers.
Of course this is something I'd love to explore, but I just don't have the time to build it. This is something that wouldn't be too hard to build and evolve, and could have potentially huge impacts across many important industries, and go far to protect all of our sensitive data from potential privacy violations.
As with all of my ideas, I just want to share it publicly, in hopes someone will build it.
comments powered by Disqus
Winning in the API Economy
|Download as PDF|
Latest Blog Posts
- Thank You For Your API Evangelist Blog(s)
- Video From The Hypermedia Panel At API-Craft In Detroit Last Month
- Please Open Source Your API Before Shutting It Down
- Explaining My Work Around APIs In Higher Education To Institutions
- You Can Have An API Just By Choosing Products And Services That Have APIs
- Using Excel As An API Datasource And An API Client For The Masses
- Brewing Up Something Awesome With The Jive Software API
- Relationship Between APIs And Containers
- Real-time and Visualizations Will Be Key in Financial API Deployments
- Notification Focused Startups Within Leading API Ecosystems
- APIs That Do One Thing And Do It Well Like ZipLocate
- Which API Do I Need?
- The Expanding API Conference Landscape
- Ocotoparts Open Source Google Spreadsheet
- Andrew Nacin Of WordPress @APIStrat Chicago
- Push Button API Deployment With The Heroku Button
- WordPress Style API Modules For Government
- The Heroku HTTP API Design Guide
- What I Have Been Calling API Trends, Are Slowly Being Baked Into API Operations
- FDA Finding Their API Mojo With A New Drug Label API
- Adding PokitDok To Healthcare Research And The API Stack (Well They Did)
- Why I Am Continuing To Integrate Zapier In My Business Workflow
- Who Is Going To Build The Uber API Platform For The Sharing Economy?
- The API Focused Dev Shop
- Route SMS Messages To Google Spreadsheets Via Twilio API With TwilioSheet