Secrets and Personally Identifiable Information (PII) Across Our API Definitions
27 Jan 2020
As API providers and consumers, we tend to have access to a significant number of credentials, keys, and tokens, as well as personally identifiable information (PII). We use this sensitive information throughout the API integration and delivery life cycles. We depend on credentials, keys, and tokens to authorize each of our API requests, and we potentially capture PII as part of the request and response for each of the individual API requests we execute regularly. Most developers, teams, and organizations I’ve spoken with do not have a strategy for addressing how secrets and PII are applied across the internal and external API landscape. API management over the last decade has helped us as API providers better define and manage authentication for the APIs we are providing, but no solution has emerged that helps us manage the tokens we use across many internal and external APIs.
With this reality, there are a lot of developers who are self-managing how they authenticate with APIs and how they work with the PII that gets returned from APIs. I am working on several talks with enterprise organizations about this challenge, and to prepare I want to work through my thoughts on the problem, as well as some possible solutions. I wanted to map out how we integrate with the APIs we are developing and consuming, and think about the common building blocks for how we can better define, educate, execute, audit, and govern the secrets and PII that are applied throughout the API life cycle across all of the APIs we depend on. This will allow me to have a more informed conversation about how we can get better at managing the more sensitive parts of our operations.
What Are The Types of Sensitive Information?
First I wanted to understand the types of common information being applied by API developers, helping me establish and evolve a list of the types of data we are looking for when securing the API development life cycle. This list will grow as I flesh out this work more, but here are the types of sensitive information I’m looking to identify and manage across API operations.
- API Keys - Static keys and secrets generated by API providers.
- API Tokens - Dynamic OAuth, JWT, and other tokens being issued.
- Username / Passwords - Account usernames and passwords.
- Personally Identifiable Information (PII) - Names, ages, addresses, phone numbers, SSNs, and other PII.
I am sure there are other pieces of data we should be looking for, but this provides a nice list to get us going. I just want to begin making a meaningful impact on the most critical aspects of this conversation, and we can expand into other areas in the future. It is a problem we need to begin investing in now, otherwise it will only grow as the number of APIs we depend on increases.
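To make the list above a little more concrete, here is a minimal Python sketch of how each type might be recognized in a line of text. The pattern names and the regular expressions are my own assumptions for illustration; real keys and tokens vary by provider, so treat these as rough starting points that will produce both false positives and misses.

```python
import re

# Hypothetical patterns, one per category in the list above.
# These are illustrative guesses, not a standard or a complete rule set.
PATTERNS = {
    # A name like api_key / apikey / secret followed by a long literal value.
    "api_key": re.compile(
        r"(?i)\b(?:api[_-]?key|secret)\b['\"]?\s*[:=]\s*['\"]?([A-Za-z0-9_\-]{16,})"
    ),
    # JWTs are three base64url segments, and the header usually starts with "eyJ".
    "token": re.compile(r"eyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+"),
    # A password field followed by any non-empty value.
    "password": re.compile(r"(?i)\bpassword\b['\"]?\s*[:=]\s*['\"]?(\S+)"),
    # US Social Security numbers in the common 3-2-4 digit layout.
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def find_sensitive(text):
    """Return a sorted list of (category, line_number) pairs found in text."""
    hits = set()
    for lineno, line in enumerate(text.splitlines(), start=1):
        for category, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.add((category, lineno))
    return sorted(hits)


if __name__ == "__main__":
    sample = 'api_key = "abcd1234efgh5678ijkl"\nname: Jane\nssn: 123-45-6789\n'
    for category, lineno in find_sensitive(sample):
        print(f"line {lineno}: possible {category}")
```

The function reports only the category and line number, never the matched value, so the output of a scan does not become one more place where secrets are stored.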
Where Is Sensitive Information Used?
There are going to be many ways in which these sensitive items get used across API operations. I don’t think I’ll be able to identify all of them because each organization is going to operate differently, but it helps to have a list to consider when solving this problem. Here are some of the common places where sensitive information gets used, specifically as part of the API development process.
- Definitions - Swagger, OpenAPI, Postman collections and environments all have the potential to be storing secrets and PII.
- Code - Developers are baking secrets and PII into the code they are producing and pushing out on a regular basis.
- Applications - The desktop and server applications we are using throughout the API life cycle will possess secrets and PII.
- Services - The cloud services that we are putting to work throughout the API life cycle will possess secrets and PII.
We are applying secrets, and in some cases PII, across all of these areas, rarely with any way of defining, guiding, auditing, and governing how they are applied across teams. We end up peppering a variety of secrets and PII across many different locations, with no way of understanding the scope, or of properly securing, scrubbing, and getting things in order. This is something folks should be concerned with across all levels of the enterprise.
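As a rough illustration of what looking inside definitions could involve, the Python sketch below walks any parsed JSON document, such as an exported Postman environment or collection, and reports sensitive-looking names that hold a literal value instead of a {{variable}} placeholder. The function names and the list of sensitive names are assumptions of mine, and the walker is deliberately generic; it is not tied to any one schema version.

```python
import re

# Hypothetical list of names worth flagging; extend to suit your organization.
SENSITIVE_NAME = re.compile(r"(?i)(api[_-]?key|token|secret|password|authorization)")
# Postman-style {{variable}} reference anywhere in the value.
PLACEHOLDER = re.compile(r"\{\{[^{}]+\}\}")


def _is_literal_secret(name, value):
    """True when a sensitive-looking name holds a non-empty, non-templated value."""
    return (
        bool(SENSITIVE_NAME.search(name))
        and value != ""
        and not PLACEHOLDER.search(value)
    )


def find_hardcoded(node, path="$"):
    """Yield (path, name) for each sensitive-looking name with a literal value."""
    if isinstance(node, dict):
        # Headers and environment values are often {"key": ..., "value": ...} pairs.
        key, value = node.get("key"), node.get("value")
        if isinstance(key, str) and isinstance(value, str):
            if _is_literal_secret(key, value):
                yield (path, key)
        for name, child in node.items():
            if isinstance(child, str):
                if name not in ("key", "value") and _is_literal_secret(name, child):
                    yield (f"{path}.{name}", name)
            else:
                yield from find_hardcoded(child, f"{path}.{name}")
    elif isinstance(node, list):
        for index, child in enumerate(node):
            yield from find_hardcoded(child, f"{path}[{index}]")


if __name__ == "__main__":
    environment = {
        "name": "dev",
        "values": [
            {"key": "api_key", "value": "abc123", "enabled": True},
            {"key": "token", "value": "{{vault_token}}", "enabled": True},
        ],
    }
    for path, name in find_hardcoded(environment):
        print(f"{path}: literal value stored for '{name}'")
```

The same walker can be pointed at an OpenAPI document loaded as JSON, although example values and default parameters there would likely need extra rules of their own.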
Where Is Sensitive Information Stored?
Building upon where sensitive information is used, I wanted to think about where it actually resides for use across operations. It will be used in different ways by different individuals, teams, and organizations. However, I think there are some common places we can look to see how sensitive information gets stored as it is being applied throughout the API life cycle. These are just a handful of the locations I am thinking about currently, providing me with targets to think about as I look to get a handle on sensitive information across API operations.
- Local - Each individual user is storing definitions, code, and has applications installed that all contribute to the storing and applying of secrets and PII.
- Cloud - Each individual user is potentially storing, syncing, and backing up data to the cloud, extending the reach of the secrets and PII in use.
- Postman - It is common for developers to be using Postman, leaning on the application to store secrets in collections and environments.
- Repository - Definitions and code will often be managed using Git, making repositories a good place to look for secrets and PII in use across an organization.
- Workspaces - Taking advantage of the workspaces offered by different applications used to engage with the API life cycle, establishing common spaces where API definitions and environments are used.
I am sure there are more locations I can target, but this gives me a place to start. By my rough estimate, this represents about 75% of the API definitions, code, and other artifacts where we are going to find secrets and PII. I want to examine each of these locations and see what our options are for improving how we manage secrets and PII across the API life cycle.
How Do We Improve Our Situation?
One of the first things we can do to help improve our situation is to have a conversation about it. Discuss the type of information we are concerned about, how it is being applied, and where we can find it stored. I want to help stimulate the conversation by thinking about some tangible things we can be focusing on to help improve how secrets and PII are used across teams. I will be fleshing out the details of each of these areas in future posts, but these are the areas I would like to focus on when it comes to how we manage sensitive information in use across our APIs.
- Definitions - Using OpenAPI and Postman collections consistently across all API operations helps isolate how secrets and PII are applied and stored.
- Environments - Using machine readable environments helps isolate and standardize how secrets and PII are applied and stored across API operations.
- Pipelines - Having a standardized approach to how secrets are used as part of the pipeline operation, consistently using definitions and environments.
- Documentation - Establish consistent practices for how documentation is published using tools where definitions and environments can be applied.
- Automation - Using APIs and the CLI to automate the API life cycle, using definitions and environments to drive all of the automation, and orchestration.
- Guidance - Investing in automated and self-service guidance for individuals and teams when it comes to how secrets and PII should be produced, applied, stored, and refreshed as part of
- Education - Make sure there is investment in educating individuals and teams, getting everyone up to speed on how the management of secrets and PII works across teams.
- Storytelling - Tell stories regularly about the types of secrets and PII that are of concern, and how they are applied and stored across API operations, helping keep the conversation going across teams.
This is how we are going to improve things: by standardizing the usage of OpenAPI and Postman collections across the API life cycle, and then leveraging machine readable environments for manual and automated execution of API integrations. This allows developers to more consistently define, store, apply, and refresh secrets and PII manually when working with APIs, while also automating via monitors and pipelines, as well as publishing documentation and finding other ways of making our APIs more usable by other developers, systems, and applications. Then we should be able to automate the auditing, governance, and health monitoring of how secrets and PII are used across the API delivery life cycle. That helps shine a light on the problems associated with having secrets and PII lying all over the place, and gets people doing things in a more organized fashion.
I feel like how we manage our secrets is one of the biggest challenges we face in the coming years. The number of APIs is only growing, but there hasn’t been enough discussion of, or enough solutions for, how we manage secrets across the API life cycle. I am also pretty convinced that the executable dimension of using Postman collections and environments provides us with a blueprint, and tooling, for how we can improve how we manage secrets and PII across the API life cycle. We just need to get more strategic about how we use Postman as not just an API client, but also for making documentation available, and applying API collections as part of CI/CD pipelines that are driving the deployment of our API infrastructure. I am going to flesh out some secrets governance strategies using Postman for some of the conversations I’m having here in January and February, and then revisit this topic to see what other refinements I can make to how I approach this subject on my blog, and in my work.
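To sketch the idea of environments keeping secrets out of definitions: a definition stores only a {{NAME}} placeholder, and the value is supplied from an environment at execution time. The Python below is a simplified stand-in I wrote for illustration, not how Postman itself resolves variables; the resolve function and the API_TOKEN name are hypothetical.

```python
import os
import re

# Matches {{NAME}} placeholders, allowing optional spaces inside the braces.
PLACEHOLDER = re.compile(r"\{\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*\}\}")


def resolve(template, environment=None):
    """Replace each {{NAME}} in template with its value from the environment.

    Falls back to the process environment when no mapping is given, and raises
    KeyError for a missing name so a request never goes out half-configured.
    """
    env = os.environ if environment is None else environment

    def substitute(match):
        name = match.group(1)
        if name not in env:
            raise KeyError(f"missing environment value: {name}")
        return env[name]

    return PLACEHOLDER.sub(substitute, template)


if __name__ == "__main__":
    # The definition only ever contains the placeholder, never the secret itself.
    header_template = "Bearer {{API_TOKEN}}"
    print(resolve(header_template, {"API_TOKEN": "example-value"}))
```

Because the definition carries only the placeholder, it can be shared, published as documentation, or committed to a repository, while the environment that holds the real value is managed and audited separately.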