The Caltech API Landscape

I regularly take a look at what different universities are up to when it comes to their APIs. I spent two days talking with different universities at the University API summit in Utah a couple of weeks back, and I wanted to continue working my way through the list of schools I am speaking with, profiling their approach to APIs, while also providing some constructive feedback on what each school might consider doing next to optimize API delivery and consumption across campus.

Next up on my list is Caltech, who we have been having conversations with at Postman, and I wanted to conduct an assessment of the current state of APIs across the school. The university reflects what I see at most universities: there are plenty of APIs in operation, but no visible, organized effort to bring the existing APIs together into a single developer portal, or to centralize API knowledge and best practices around the interesting API work going on across campus, and with other partners and stakeholders.

APIs in the Caltech Library

When it comes to APIs at the university level, the first place to start is always the library, and things are no different at Caltech. While there is no official landing page specifically for APIs, the Caltech library has a GitHub page dedicated to a variety of programmatic solutions, and you can find many signals of API activity behind the scenes, like this announcement that the library's write API is available. You can also find an interesting case study on how the library is using APIs from a provider called Clarivate, which I will be looking to understand further. As with every other university, there is a huge opportunity for Caltech to be more organized and public about the API resources offered as part of the library--even if they aren't widely available to the public, making everything API easy for faculty and students to find online helps a lot.

Rich Datasets as APIs

Another common pattern you see with API availability across higher education is rich datasets made available from the research that is occurring, and Caltech is no different. You can find several hundred rich datasets out of Caltech, provided in a simple, browsable catalog powered by Tind, which, like Clarivate, is another service provider I'd like to dive into and learn more about. Most of the datasets Caltech publishes are available in spreadsheet or CSV format, which also makes them a rich starting point for simple API development. It wouldn't take much for Tind, the platform behind the Caltech datasets, to make APIs available for all of them--or it could even be a class project for students to work with Caltech data stewards to make each dataset available as a simple, easy-to-use web API. That would reduce friction when it comes to putting the data to work, make the data more usable, and potentially provide a learning opportunity for Caltech students when it comes to publishing APIs.
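
To make that class-project idea concrete, here is a minimal, stdlib-only Python sketch of the core of such a dataset API: parsing a CSV file's contents into row records and serializing them as JSON, which is the payload a simple web API over one of these datasets would return. The column names and values here are hypothetical, not taken from any actual Caltech dataset.

```python
import csv
import io
import json

def csv_to_json(csv_text: str) -> str:
    """Parse CSV text into a list of row dictionaries and serialize it as
    JSON -- the response body a simple dataset API endpoint would return."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return json.dumps(rows, indent=2)

# Hypothetical sample rows standing in for a published research dataset.
sample = "sample_id,instrument,value\nA1,spectrometer,0.42\nA2,spectrometer,0.57\n"
print(csv_to_json(sample))
```

Wrapping a function like this in any web framework's request handler is all it takes to turn a static CSV download into a queryable endpoint.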

Cataloging Existing Caltech APIs

Beyond the library, and the low-hanging fruit involving Caltech datasets, there are a number of really interesting APIs available at Caltech--the problem is that they just aren't available as part of any organized API effort. When you spend time looking around Caltech's online presence for APIs, two stand out.

  • NASA Exoplanet Archive - The NASA Exoplanet Archive is an online astronomical exoplanet and stellar catalog and data service that collates and cross-correlates astronomical data and information on exoplanets and their host stars, and provides tools to work with these data. The archive is dedicated to collecting and serving important public data sets involved in the search for and characterization of extrasolar planets and their host stars. These data include stellar parameters (such as positions, magnitudes, and temperatures), exoplanet parameters (such as masses and orbital parameters) and discovery/characterization data (such as published radial velocity curves, photometric light curves, images, and spectra).
    • Website - The website for the Exoplanet archive.
    • Docs - Documentation I published using Postman.
    • Collection - A Postman collection for the API.
  • MIST - MiST 3.0 provides signal transduction profiles of more than 125,000 bacterial and archaeal genomes. This release of the database is a result of a substantial scaling to accommodate constantly growing microbial genomic data.
    • Website - The website for the MiST database.
    • Developer - The developer area for the MiST database.
    • Docs - Documentation I published using Postman.
    • Collection - A Postman collection for the API.
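
To give a feel for what working with one of these APIs looks like in practice, here is a small Python sketch that builds a query URL for the NASA Exoplanet Archive's synchronous TAP (Table Access Protocol) service. The endpoint, the `ps` (Planetary Systems) table, and the column names reflect my reading of the archive's public documentation--treat them as assumptions to verify against the archive's own docs before relying on them.

```python
from urllib.parse import urlencode

# Synchronous TAP endpoint for the NASA Exoplanet Archive -- confirm against
# the archive's API documentation before depending on it.
TAP_SYNC = "https://exoplanetarchive.ipac.caltech.edu/TAP/sync"

def exoplanet_query_url(adql: str, fmt: str = "json") -> str:
    """Build a one-off TAP query URL for the given ADQL statement."""
    return TAP_SYNC + "?" + urlencode({"query": adql, "format": fmt})

# Ask for a handful of planet names and their host stars.
url = exoplanet_query_url("select top 5 pl_name, hostname from ps")
print(url)
```

Fetching that URL with any HTTP client would return the query results as JSON, which is exactly the kind of request a Postman collection can capture and document for others.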

It is very clear that there is some super valuable research coming out of Caltech, and out of the work it does with its partners like NASA. With a little help these APIs could be made more discoverable and usable by the community. I spent some time crafting Postman collections for both of these APIs, and published documentation from them to show a little of what is possible. I'll be making more time to better organize them, and polish the documentation available for each, but ideally it is something that can be tackled on campus by administrators, faculty, or students.

The NASA Exoplanet Archive and MiST are two shining examples of valuable APIs out of Caltech, but with just a little more searching you can find many others that should also be available as Postman collections with consistent documentation. While by no means an exhaustive search, here are a few others that I came across as I was making my way through what is available from Caltech.

  • Caltech Electron Tomography Database - A public repository featuring 11293 electron tomography datasets of intact bacterial and archaeal cells, representing 85 species.
  • NASA/IPAC Infrared Science Archive - IRSA offers program-friendly interfaces to all of its holdings. Through an Application Program Interface (API), users can access IRSA data directly (within a script or on the command line) without the need to go through interactive web-based user interfaces. These APIs allow users to write software that can talk to IRSA's software to carry out queries and download data, with no user intervention.
  • Finder Chart - Finder Chart is a visualization tool that allows cross-comparison of images from various surveys of different wavelengths and different epochs.
  • AstroPix - Images from telescopes around the world and in space are now at your fingertips. AstroPix is a new way to explore and share the universe.

After looking through the NASA/IPAC Infrared Science Archive I began finding a bunch more single-use APIs, and I'm sure there are many more buried across the rich research coming out of the institution. All of this should be available via a single Caltech-owned landing page--something they could accomplish using GitHub, similar to what the Caltech library has already done for its other technical resources. Caltech has a rich set of APIs, you just wouldn't know it at first glance, but with a little investment the work going on across campus could be made more discoverable and usable across different web, mobile, device, and network applications.

Further Investment in APIs at Caltech

Caltech is doing APIs. That is clear. The only thing that is missing is a more deliberate and organized effort to publish and provide APIs across campus. This is something that every higher educational institution I am talking to struggles with, and it is something that can be corrected with just a little bit of investment by staff and students. I recommend beginning with just documenting what is already in motion, and investing in a handful of the common building blocks you need to be successful when it comes to making APIs available.

  • Portal - Publishing a single landing page and portal for accessing all APIs across campus. It is common to place this at developer.[university-domain].edu, providing a common known location where anyone can go to discover and learn about APIs at Caltech. Helping centralize APIs from across departments and external stakeholders, as well as knowledge, information, and communication around how to provide and consume APIs.
  • Administrative APIs - Publish a section or page dedicated to campus IT and other administrative APIs that faculty can put to use. These APIs already exist and are used by staff, they just aren't easy to find. They should all be defined using common machine-readable formats like OpenAPI and Postman collections, and then published as documentation using Postman or other open source API documentation formats. Making APIs more accessible, even if you need approval and credentials to actually put them to work as part of any integration or application.
  • Research APIs - Next, gather up all of the research-related APIs like the ones I have listed above, and beyond, and do the same as I just suggested for administrative APIs. Define them using OpenAPI and Postman collections, and then publish API documentation for all of them, making them easily available via the centralized API portal. Making the distributed research occurring across campus more easily accessible to researchers and the general public.
  • Student APIs - Once administrative and research APIs are well documented, I recommend beginning to define the APIs that might benefit students. Common APIs at other universities include access to the course catalog, building locations, the events calendar, and other resources that are relevant to students. Check out some of the other API programs available as part of my wider university API research for examples. Ideally, all web and mobile applications that students engage with should also be available via simple web APIs, allowing students to hack on the systems they depend on as part of class work, or just out of a desire to improve the overall student experience on their own time.
  • 3rd Party APIs - Lastly, I recommend dedicating a section or page to the 3rd party APIs that are already in use, or would be available across campus. Some of the APIs I've seen showcased are Twitter, Dropbox, Google Drive, JSTOR, World Bank data, WordPress, and other common services in use today. Highlighting these 3rd party APIs alongside existing campus APIs helps faculty and students understand what is possible when it comes to developing applications and streamlining integrations.
  • Resources - Provide a list of other resources that faculty and students can take advantage of when it comes to putting APIs to work. Ideally all APIs are well documented, and have Postman collections available for them, but it can also be helpful to provide code snippets, sandboxes, videos, tutorials, and other resources that will help people understand what is possible, and how they can put APIs to work.
  • Showcase - Make sure to showcase how APIs are being put to use on campus and off. Try to highlight existing uses of APIs and the benefits they deliver via web, mobile, and device applications. Having the APIs is not enough. It helps stimulate the imagination to show what developers and non-developers are doing with API resources.
  • Events - Provide a calendar of events regarding API and other relevant meetings, conferences, or external events that API providers and consumers will find interesting. If there are no events, start organizing them, and make sure there is always active discussion around how APIs are put to use on campus.
  • Support - Publish the names and contact information for API advocates, champions, owners, and practitioners. Make forums, chats, and other communication channels easily accessible so that API providers and consumers can ask questions and get the help they need. The spread of information, and supporting existing API efforts is critical to growing and expanding the reach of APIs across campus.
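
As a concrete starting point for the "define everything with OpenAPI" recommendation above, here is a short Python sketch that assembles a minimal OpenAPI 3.0 description for a hypothetical campus course catalog API and prints it as JSON. Everything in it--the title, path, and parameter--is illustrative, not an existing Caltech service.

```python
import json

# A minimal OpenAPI 3.0 document describing a hypothetical course catalog API.
# Every path, parameter, and field here is illustrative, not a real service.
spec = {
    "openapi": "3.0.3",
    "info": {"title": "Campus Course Catalog API", "version": "0.1.0"},
    "paths": {
        "/courses": {
            "get": {
                "summary": "List courses, optionally filtered by term",
                "parameters": [
                    {"name": "term", "in": "query", "schema": {"type": "string"}}
                ],
                "responses": {"200": {"description": "A list of courses"}},
            }
        }
    },
}

print(json.dumps(spec, indent=2))
```

A definition this small is enough to generate documentation, import into Postman as a collection, and grow path by path as more campus APIs are cataloged.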

Start there. There will be more work down the road. But publishing a single portal with documentation and definitions for administrative, research, student, and 3rd party APIs, as well as a handful of resources, a showcase, events, and support for the API conversation will provide the base necessary for the API conversation to take root at Caltech. Now I am sure some folks are asking why. Why should there be more investment in APIs at Caltech? Let's take a crack at answering that in the context of the four types of APIs that commonly exist across campus.

  • Administrative - Caltech already operates on digital infrastructure that possesses APIs, and by making these existing APIs more widely known, and leveraging them as part of campus operations, you have an opportunity to reduce friction and workload for staff, making their lives easier.
  • Research - As demonstrated by the existing APIs coming out of Caltech, there is valuable data and research coming out of the institution. In a digital age you want this information easily available for use in new web, mobile, and device applications--APIs are how you do that.
  • Students - APIs are how Internet-connected web, mobile, and device applications work. Students are already using APIs when they browse the course catalog, sign up for classes, and pay tuition. By exposing them to what is going on behind the scenes of the applications they already use, and letting them tinker with and optimize their own experience, you are better preparing them for the digital world they'll be working in within just a couple of years.
  • 3rd Party - Software as a service (SaaS) solutions are ubiquitous across our personal and professional lives. Services like Dropbox, Google Apps, WordPress, and others have APIs that allow us to more seamlessly integrate these solutions into our lives, and enable us to take advantage of low-code or no-code solutions to orchestrate with these services. This introduces an opportunity to further reduce friction, allowing faculty to optimize how they do their jobs, and helping students be more successful in their studies.

APIs are everywhere once you begin looking for them--at Caltech, and across the web we depend on each day. It isn't a question of whether or not Caltech should do APIs--they already are. It is a matter of whether the campus wants to be more organized about how they are doing APIs. Hopefully this research provides a look at some of the next steps Caltech could consider when it comes to doing APIs in a more organized fashion. This look at Caltech APIs isn't just about using Postman, it is about how Caltech can step up its API game using Postman. I'm guessing Postman is already in use across campus IT and development staff. I'm also guessing there are students who are using Postman. The problem is that it is most likely just being used as an API client, allowing users to make calls to existing APIs and debug the responses. Once users realize they can also use it to mock, document, test, and make APIs more accessible and collaborative by publishing documentation for common APIs, I predict the conversation will shift and pick up momentum--it always does.

I am looking forward to further documenting the APIs coming out of Caltech, and talking with folks from different groups across campus. Like the other schools I am talking to, they are up to some really interesting things--they just need to get better at documenting, showcasing, and telling stories about what is happening. Something Postman can help facilitate. Next I am going to publish the two collections I have made to GitHub, and begin documenting some of the other APIs using Postman. Once I have a better view of the Caltech API landscape I will spend more time thinking about how we can further jumpstart the API conversation with some administrative, student, and 3rd party API work. Ideally the school would set up a GitHub repository and landing page for its API efforts, and then folks like me, or others on campus, could submit pull requests to add APIs and other resources to the central Caltech API portal. With some centralization of existing API activity, and stimulation of the API conversation using GitHub, you never know what you can set into motion when it comes to the API journey at Caltech.