GraphQL is designed to handle numerous data use cases and is particularly well suited to adjusting to potentially changing data structures and content. But there are situations where the challenge is quite the opposite: we want to be able to expose harmonized and stable schemas and querying mechanisms across disparate data sources or even metadata standards. While I’m still learning about GraphQL, I believe the specification also has the potential to meet such requirements.
Metadata Standards need APIs
One particular use case I have in mind is around metadata standards, which often do not come equipped with an API. Since my focus is on making high-quality data available as a service in machine-actionable formats and on supporting the FAIR data initiative, I’m referring to specifications such as the Data Catalog Vocabulary (DCAT), the Data Documentation Initiative (DDI), schema.org, and the like. As their models can be seen as hierarchical structures of elements and attributes, they seem to be a natural fit for GraphQL.
As an initial experiment, I decided to start with the Data Catalog Vocabulary (DCAT) as the standard is fairly light compared to others (only a handful of elements), and also comes with a JSON serialization. I found a nice collection of data catalogs on the U.S. Project Open Data Dashboard which I downloaded for development and testing. Now these are actually based on the DCAT-US flavour of DCAT, which includes a few US specific elements and instructions, but this makes no difference for this proof of concept.
This all came together quite nicely. After following the Yoga tutorial, I was able to adjust the schema and code to mirror the DCAT-US specification, and access my test catalogs. As a matter of fact, I added the ability to query across multiple U.S. statistical agencies by wrapping the catalog into a higher level repository entity, which is a feature not supported by DCAT.
The prototype GraphQL schema definition for DCAT-US looks like this:
My current resolver implementation does not use a database: I simply load the JSON catalogs into memory and parse them. This naturally limits the querying capabilities and scalability, but these are technicalities that can be addressed and were not requirements for this initial investigation.
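To make the in-memory approach more concrete, here is a minimal TypeScript sketch. It is a simplified illustration rather than the prototype's actual code: the Repository wrapper, the function names loadRepository and resolveDatasets, the agency keys, and the sample entries are all assumptions made for the example, and only a small subset of DCAT-US dataset fields is modeled.

```typescript
// Small subset of DCAT-US fields; the specification defines many more.
interface Dataset {
  identifier: string;
  title: string;
  description: string;
  keyword: string[];
  modified: string;
}

interface Catalog {
  conformsTo: string;
  dataset: Dataset[];
}

// Hypothetical wrapper (not part of DCAT): one catalog per agency key.
interface Repository {
  catalogs: { [agency: string]: Catalog };
}

// Parse each catalog's JSON text once and keep everything in memory.
// In a real service the text would come from downloaded data.json files.
function loadRepository(sources: { [agency: string]: string }): Repository {
  const catalogs: { [agency: string]: Catalog } = {};
  for (const agency of Object.keys(sources)) {
    catalogs[agency] = JSON.parse(sources[agency]) as Catalog;
  }
  return { catalogs };
}

// Resolver-style function: datasets across all catalogs, optionally
// restricted to one agency and/or one keyword (case-insensitive match).
function resolveDatasets(
  repo: Repository,
  args: { agency?: string; keyword?: string }
): Dataset[] {
  const wanted = args.keyword?.toLowerCase();
  const results: Dataset[] = [];
  for (const agency of Object.keys(repo.catalogs)) {
    if (args.agency !== undefined && args.agency !== agency) continue;
    for (const ds of repo.catalogs[agency].dataset) {
      const matches =
        wanted === undefined ||
        ds.keyword.some((k) => k.toLowerCase() === wanted);
      if (matches) results.push(ds);
    }
  }
  return results;
}

// Made-up sample entries, for illustration only.
const sampleRepo = loadRepository({
  agencyA: JSON.stringify({
    conformsTo: "https://project-open-data.cio.gov/v1.1/schema",
    dataset: [{ identifier: "a-1", title: "Example employment series",
      description: "", keyword: ["employment"], modified: "2020-01-01" }],
  }),
  agencyB: JSON.stringify({
    conformsTo: "https://project-open-data.cio.gov/v1.1/schema",
    dataset: [{ identifier: "b-1", title: "Example health survey",
      description: "", keyword: ["Health", "survey"], modified: "2020-02-01" }],
  }),
});

console.log(resolveDatasets(sampleRepo, { keyword: "health" }).map((d) => d.title));
```

Because every query scans every dataset, this only works for catalogs that fit comfortably in memory, which is the scalability limit mentioned above.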
Here is an example of a basic GraphQL query on DCAT-US repositories and results:
Moving forward, one technical challenge in translating metadata specifications into GraphQL relates to the inability to namespace or use special characters in field names. For example, we find metadata elements such as @id, which cannot be transcribed as is into GraphQL field names (the @ character cannot be used). Likewise, some standards mix elements from multiple namespaces, so you may find properties like skos:Concept, where the abbreviation before the colon is a short prefix for the namespace. I could stay away from these with DCAT, but it is something to think about down the road. I’m confident, though, that these can be addressed through naming conventions or other mechanisms.
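As an illustration of what such a naming convention could look like, here is a small TypeScript sketch. The convention itself (replace unsupported characters with an underscore) and the helper names toGraphQLName and renameKeys are assumptions made for this example, not something defined by GraphQL or by any of the metadata standards. The only rule taken from the GraphQL specification is that names must match the pattern in the first line, and that names starting with two underscores are reserved for introspection.

```typescript
// GraphQL names must match this pattern (per the GraphQL specification).
const GRAPHQL_NAME = /^[_A-Za-z][_0-9A-Za-z]*$/;

// Hypothetical convention: map a metadata key to a valid GraphQL name.
function toGraphQLName(key: string): string {
  // Replace every run of unsupported characters with a single underscore.
  let name = key.replace(/[^_0-9A-Za-z]+/g, "_");
  // A name cannot start with a digit.
  if (/^[0-9]/.test(name)) name = "_" + name;
  // Names starting with "__" are reserved for introspection.
  name = name.replace(/^__+/, "_");
  if (!GRAPHQL_NAME.test(name)) {
    throw new Error(`Cannot derive a GraphQL name from "${key}"`);
  }
  return name;
}

type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

// Recursively rename the keys of a parsed JSON document so that the
// result can be served directly by resolvers using the converted names.
function renameKeys(value: Json): Json {
  if (Array.isArray(value)) return value.map((item) => renameKeys(item));
  if (value !== null && typeof value === "object") {
    const out: { [key: string]: Json } = {};
    for (const key of Object.keys(value)) {
      out[toGraphQLName(key)] = renameKeys(value[key]);
    }
    return out;
  }
  return value;
}

console.log(toGraphQLName("@id"));          // _id
console.log(toGraphQLName("skos:Concept")); // skos_Concept
```

One caveat with this kind of mapping is that it is lossy: two different source keys (for example @id and _id) would collide on the same GraphQL name, so a real implementation would need to detect or rule out such cases.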
While I plan to keep exploring and enhancing this initial DCAT implementation, I’m also very interested in replicating this next with DDI-Codebook (DDI-C), the light version of DDI. One challenge here is the absence of a JSON serialization for DDI-C (it is natively in XML), so some extra translation work will be needed.
This is overall very exciting, promising, and encouraging. To be successful, metadata standards need to be more than just a model and documentation: they must come with tools and APIs. The latter are often absent, and GraphQL has the potential to offer a generic solution.