Automated Mapping Of The API Universe With Charles Proxy, Dropbox, OpenAPI Spec, And Some Custom APIs

I have been working hard for about a year now trying to craft machine readable API definitions for the leading APIs out there. I've written before about my use of Charles Proxy to generate OpenAPI Spec files, something I'm evolving over the last couple days, making it more automated, and hopefully making my mapping of the API universe much more efficient.

Hand crafting even the base API definition for any API is time consuming, which is something that swells quickly to being hours when you consider the finish work that required, so I was desperately looking how I could automate this aspect of my operations more. I have two modes when looking at an API, review mode where I'm documenting the API and its surrounding operations, with the second being about actually using the API. While I will still be reviewing APIs, my goal is to immediately begin actually using an API, where I feel most of the value is at, while also kicking off the documentation process in the same motion.

Logging All Of My Traffic Using Charles Proxy On Machine
Using Charles Proxy, I route all of my network traffic on my Macbook Pro through a single proxy which I am in control of, allowing me to log every Internet location my computer visits throughout the day. It is something I cannot leave running 100% of the time, as it breaks certificates, sets of security warnings from a number of destinations, but is something I can run about 75% of my world through--establishing a pretty interesting map of the resources I consume, and produce on each day.

Auto Saving Charles Proxy Session Files Every 30 Minutes
While running running Charles Proxy, I have it setup to auto save a session XML every 30 minutes, giving me bite size snapshots of transaction throughout my day. I turn Charles Proxy on or off, depending on what I am doing. I selected to save as a session XML file because after looking at each format, I felt it had the information I needed, while also easily imported into my database back end.

Leverage Dropbox Sync And API To Process Session Files
The session XML files generated by Charles Proxy get saved into my local Dropbox folder on my Macbook Pro. Dropbox does the rest, it syncs all of my session XML files to the cloud, securely stored in a single application folder. This allows me to easily generate profiles of websites and APIs, and something that passively occurs in the background while I work on specific research. The only time Dropbox will connect and sync my files, is when I have Charles Proxy off, otherwise it can't establish a secure connection.

Custom API To Process Session Files Available In Dropbox
With network traffic logged, and stored in the cloud using Dropbox, I can then access them via the Dropbox API. To handle this work, I setup an API that will check the specified Dropbox app folder, associated with its Dropbox API application access, and import any new files that it finds. Once a file has been processed, I delete it from Dropbox, dumping any personally identifiable information that may have been present--however, I am not doing banking, or other vital things with Charles Proxy on.

Custom API To Organize Transactions By Host & Media Type
I now have all my logged transactions stored in a database, and I can begin to organize them by host, and media type--something I'm sure I will evolve with time. To facilitate this process I have created a custom API that allows me to see each unique domain or sub-domain that I visit during my logging with Charles Proxy. I am mostly interested in API traffic, so I'm looking for JSON, XML, and other API related media types. I do not process any image, and many other common media types, but do log traffic to HTML sites, routing into a separate bucket which I describe below.

Custom API To Generate OpenAPI Spec For Each Transaction
In addition to storing the primary details for each transaction I log, for each transaction with a application/json response, I auto-generate an OpenAPI Spec file, mapping out the surface area of the API endpoint. The goal is to provide a basic, machine readable definition of the transaction, so that I can group by host, and other primary details I'm tracking on. This is the portion of the process that generates the map I need for the API universe.

Custom API To Generate JSON Schema For Each Transaction
In addition to generating an OpenAPI Spec for each transaction that I track on with a application/json response, I generate a JSON Schema for the JSON returned. This allows me to map out what data is being returned, without it containing any of the actual data itself. I will do the same for any request body as well, providing a JSON Schema definition for what data is being sent as well as received within any transaction that occurs during my Charles Proxy monitoring.

Automation Of Process Using The EasyCRON Layer Of Platform
I now have four separate APIs that help me automate the logging of my network traffic, storing, processing of all transactions I record, then automatically generate an OpenAPI Spec, and JSON Schema for each API call. This provides me with a more efficient way to kick off the API documentation process, automatically generating machine readable API definitions and data schema, from the exhaust of my daily work, which includes numerous API calls, for a wide variety different reasons.

Helping Me Map Out The World Of Web APIs As The API Evangelist
The primary goal of this work is to help me map out the world of APIs, as part of my work as the API Evangelist. Using this process, all I have to do is turn on Charles Proxy, fire up my Postman, visit an API I want to map out, and start using the API. Usually within an hour, I will then have an Open API Spec for each transaction, as well as aggregated by host, along with a supporting JSON Schema for the underlying request or response data model--everything I need to map out more APIs, more efficient scaling what I do.

Helping Me Understand The Media Types In Use Out There Today
One thing I noticed right away, was the variety of media types I was coming across. At first I locked things down to application/json, but then I realized I wanted XML, and others. So I reversed my approach and let through all media-types, and started building a blacklist for which ones I did not want to let through. Leaving this part of the process open, and requiring manual evaluation of media types is really pushing forward my awarnesss of alternative media types, and is something that was an unexpected aspect to this owrk.

Helping Me Understand The Companies I Use Daily In My Business
It is really interesting to see the list of hosts that I have generated as part of this work. Some of these companies I depend on for applications that I depend on like Tweetdeck, Github, and Dropbox, while others are companies I'm looking to learn more about as part of API Evangelist research, and storytelling. I'm guessing this understanding of the companies that I'm using daily in my work will continue to evolve significantly as I continue looking at the world through this lens.

Helping Me Understand The Data I Exchange Daily With Companies
The host of each transaction gives me a look at the companies I transact with daily, but the JSON Schema derived from request and responses that are JSON, also giving me an interesting look at the information I'm exchanging in my daily operations, either directly with platforms I depend on, or casually with websites I visit, and the web applications I'm testing out. I have a lot of work ahead of me to actually catalog, organized and derive meaning from the schema I am generating, but at least I have them in buckets for further evaluation in the near future.

Routing Websites That I Visit Into A Separate Bucket For Tracking On
At first I was going to just ditch all GET requests that returned HTML, but instead I decided to log these transactions, keeping the host, path, and parameters in a separate media type bucket. While I won't be evaluating these domains like I do the APIs that return JSON, XML, etc, I will be keeping an eye on them. I'm feeding these URLs into my core monitoring system, and for some companies I will pull their blog RSS, Twitter handles, and Github accounts, in addition to looking for other artifacts like OpenAPI Specs, API Blueprints, Postman Collections, APIs.json, and other machine readable goodies.

Targeting Of Specific Web, Mobile, Device, And API Driven Platforms
Now that I have this new, more automated API mapping system setup, it will encourage me to target specific web mobile, devices, and API platforms. I will be routing my iPhone, and iPad through the proxy, allowing me to map out mobile applications. If I can just get to work using an API in my Postman client, or use the website or mobile app, and auto-generate a map of the APIs in use in OpenAPI Spec, and data models using JSON Schema, you are going to find me mapping a number of new platform targets in 2016.

Ok, So What Now? What Am I Going To Do With This Mapping Info Next?
You know, I'm not sure what is next. I learned a lot from this 25 hour sprint, to better automate this process. I think I will just sit back and let it run for a week or two, and do what I regularly do. Visit the websites and developer areas of platforms that I'm keeping an eye on. I will keep using APIs to run my own operations, as well as play with as many APIs as I possibly can fit into my days. Periodically I will check it to see how my new API mapping system is working, and see if I can't answer some pressing questions I have:

How much do I create vs. consume? ie. POST, PUT & PATCH over GET?
How often do I use my own resources vs the API resources of others?
Do I have any plan B or C for all resources I am using?
Do I agree with the terms of service for these platforms I am using?
Do I pay any of the services that are a regular part of my daily operations?
Am I in control of my account and data for these platforms & companies?

For the moment, I am just looking at establish a map of the digital surface area I touch on each day, and further scale my ability to map out unknown areas of the API wildnerness. I am curious to see how many OpenAPISpecs and JSON Schemas I can generate in a week or month now. I have no idea how I'm going to store or organize all of these maps of the API sector, but it is something I'm sure I can find a solution for using my APIs.json format.

This is the type of work I really enjoy. It involves scaling what I do, better understanding what already exists out there, something that will fuel my storytelling, and is something that pushes me to code, and craft custom APIs, while also employing other open tooling, formats, and services along the way--this is API Evangelist.