Using Github As An API Index And Data Store
13 Feb 2017
I am spending a lot of time studying how companies are using Github as part of their software and API development life cycle, and how the social coding platform is used. More companies, like Netflix, are using it as part of their continuous integration workflow, something that API service providers like APIMATIC are looking to take advantage of with a new wave of services and tooling. This usage of Github goes well beyond just managing code, and is making the platform more of an engine in any continuous integration and API life cycle workflow.
I run all my API research project sites on Github. I do this because it is secure and static, and because it gives me a very potent way to manage not just a single website, but over 200 individual open data and API projects. Each one of my API research areas leverages a Github Jekyll core, providing a machine readable index of the companies, news, tools, and other building blocks I'm aggregating throughout my research.
Recently, this approach has moved beyond the core areas of my API research and is something I'm applying to my API discovery work, profiling the resources available with popular API platforms like Amazon Web Services, and across my government work, like my GSA index. Each of these projects is managed using Github, providing a machine readable index of the disparate APIs in a single APIs.json index, which includes OpenAPI Specs for each of the APIs included. When complete, these indexes can provide a runtime discovery engine of APIs used as part of integrations, providing an index of single APIs, as well as potentially across many distributed APIs brought together into a single meaningful collection.
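To make the shape of such an index concrete, here is a minimal sketch in Python of assembling an APIs.json-style document that points at an OpenAPI definition for each API. It is an illustration, not the actual index behind any of these projects: the project name, every URL, the helper names build_index and openapi_urls, and the "x-openapi-spec" property type are assumptions made for the example.

```python
import json

# Assumption: "x-openapi-spec" is an illustrative property type for linking an
# OpenAPI definition; use whatever type your APIs.json version defines.
OPENAPI_TYPE = "x-openapi-spec"


def build_index(name, description, index_url, apis):
    """Assemble an APIs.json-style index from a list of API descriptions."""
    return {
        "name": name,
        "description": description,
        "url": index_url,
        "apis": [
            {
                "name": api["name"],
                "humanURL": api["human_url"],
                "baseURL": api["base_url"],
                "properties": [{"type": OPENAPI_TYPE, "url": api["openapi_url"]}],
            }
            for api in apis
        ],
    }


def openapi_urls(index):
    """List every OpenAPI definition URL referenced by the index."""
    return [
        prop["url"]
        for api in index["apis"]
        for prop in api.get("properties", [])
        if prop["type"] == OPENAPI_TYPE
    ]


# Hypothetical project with one API; every URL here is a placeholder.
index = build_index(
    name="Example Agency",
    description="APIs published by an example agency.",
    index_url="https://example.github.io/example-agency/apis.json",
    apis=[
        {
            "name": "Facilities",
            "human_url": "https://example.github.io/example-agency/",
            "base_url": "https://api.example.com/facilities",
            "openapi_url": "https://example.github.io/example-agency/openapi/facilities.json",
        }
    ],
)
print(json.dumps(index, indent=2))
```

A discovery client only needs to walk the apis list and follow each property URL, which is what openapi_urls does, so the same code works whether the index describes one API or a collection of many.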
I've started pushing this approach even further with my Knight Foundation funded Adopta.Agency work, making the Github repository more than just a machine-readable index of many APIs. I'm also using the _data folder as a JSON or YAML data store, which can then be indexed as part of the APIs.json and OpenAPI Spec for each project. I've been playing with different ways of storing and working with JSON and YAML in Jekyll on Github for a while now, but now I'm trying to develop projects that are a seamless open data store, as well as an API index, providing the best of both worlds.
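As a rough illustration of treating the _data folder as a data store, the sketch below reads every JSON file in such a folder into a dictionary keyed by file name, much like Jekyll exposes _data/facilities.json as site.data.facilities. The function name load_data_store, the facilities.json file, and its contents are made up for the example, and YAML files are left out so the code needs nothing beyond the standard library.

```python
import json
import tempfile
from pathlib import Path


def load_data_store(data_dir):
    """Read every *.json file in a _data folder into a dict keyed by file name."""
    store = {}
    for path in sorted(Path(data_dir).glob("*.json")):
        with path.open(encoding="utf-8") as handle:
            store[path.stem] = json.load(handle)
    return store


# Demo with a throwaway folder standing in for a repository's _data directory.
with tempfile.TemporaryDirectory() as repo:
    data_dir = Path(repo) / "_data"
    data_dir.mkdir()
    (data_dir / "facilities.json").write_text(
        json.dumps([{"id": 1, "name": "Main Library"}]), encoding="utf-8"
    )
    store = load_data_store(data_dir)

print(sorted(store))
print(store["facilities"][0]["name"])
```

Pointing load_data_store at the _data folder of a cloned or forked repository gives the same view of the data that the Jekyll templates see.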
This is not a model for delivering high performance and availability APIs. This is a model for publishing and sharing open data so that it is highly available, workable, and hosted on Github for FREE. Most of the data I work with is publicly available. It is part of what I believe in, and how I work on a regular basis. Making it available in a Github repo allows it to be forked, or even consumed directly, while offloading bandwidth and storage costs to Github. The GET layer for all my open data projects is static, and dead simple to work with. Next, I'm working on a truly RESTful augmented layer providing the POST, PUT, and DELETE, as well as more advanced search solutions.
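For a sense of how simple the static GET layer can be, here is a hedged sketch of reading a data file straight from a public repository over raw.githubusercontent.com. The organization, repository, and file names are placeholders, the helper names are my own, and the "master" default branch is an assumption you may need to change for your repository.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen


def raw_data_url(owner, repo, path, branch="master"):
    """Build the raw.githubusercontent.com URL for a file in a public repository."""
    return "https://raw.githubusercontent.com/{}/{}/{}/{}".format(
        quote(owner), quote(repo), quote(branch), quote(path)
    )


def fetch_json(url, timeout=10):
    """GET a static JSON file and decode it (performs a network request)."""
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Hypothetical repository and file; nothing is fetched until fetch_json is called.
url = raw_data_url("example-org", "example-agency", "_data/facilities.json")
print(url)
```

Calling fetch_json(url) against a real repository returns the decoded records, with Github serving the bytes rather than a server you have to run.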
I am using the Github API for this augmented layer. I am just playing with different ways to proxy it and deliver the best search results possible. The POST, PUT, PATCH, and DELETE layer for each Github repository data store in the _data folder is pretty straightforward. My goal is to offload as much of the payload to Github as possible, but then augment what it can't do when it comes to more advanced usage. I want each API index and data store to act as a forkable engine for a variety of stops along the API life cycle, as well as throughout the delivery of the web, mobile, and device-based applications we are building on top of them.
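One way the write side might look, sketched under assumptions: the Github repository contents endpoint accepts a PUT with a commit message and base64-encoded content to create a file, and the same PUT with the file's current blob SHA to update it. The code below only prepares such a request and does not send it. The helper name build_put_request, the repository, the file path, and the token are all placeholders, and how a proxy would map POST or PATCH calls onto this endpoint is left open.

```python
import base64
import json
from urllib.request import Request

API_ROOT = "https://api.github.com"


def build_put_request(owner, repo, path, records, message, token, sha=None):
    """Prepare (but do not send) a request that writes a JSON file to a repository."""
    body = {
        "message": message,
        # The contents endpoint expects the file body base64-encoded.
        "content": base64.b64encode(
            json.dumps(records, indent=2).encode("utf-8")
        ).decode("ascii"),
    }
    if sha is not None:
        # Updating an existing file requires the blob SHA of the current version.
        body["sha"] = sha
    return Request(
        "{}/repos/{}/{}/contents/{}".format(API_ROOT, owner, repo, path),
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
        headers={
            "Authorization": "token " + token,
            "Accept": "application/vnd.github+json",
        },
    )


# Hypothetical values throughout; pass the result to urllib.request.urlopen to send it.
request = build_put_request(
    "example-org", "example-agency", "_data/facilities.json",
    records=[{"id": 1, "name": "Main Library"}],
    message="Add Main Library",
    token="YOUR_TOKEN",
)
print(request.get_method(), request.full_url)
```

Because every write becomes a commit, each change to the data store carries its own history, and a fork of the repository gets the data, the index, and that history together.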