I can’t overstate the importance of enterprise organizations getting their schema house in order as a starting point for API governance. There is an enormous amount of instability and friction introduced across the enterprise because teams are not properly using JSON Schema. To augment my services for helping enterprise organizations manage their JSON Schema via GitHub, I am providing customers with a base set of Spectral rules to begin defining the maturity of the schema they are publishing to their Git repository.
Identifier
Every schema added to the central GitHub repository must have a unique identifier, giving each object a source of truth no matter where it might find itself across the enterprise.
json-schema-2020-12-id-error:
  description: Schema MUST have a unique identifier for each object.
  message: Schema MUST Have a $id.
  severity: error
  given: $
  then:
    field: "$id"
    function: truthy
json-schema-2020-12-id-info:
  description: Schema MUST have a unique identifier for each object.
  message: Schema Has an $id.
  severity: info
  given: $
  then:
    field: "$id"
    function: falsy
json-schema-2020-12-id-source-url-error:
  description: The $id for a schema MUST have a valid URL pointing to the central registry.
  message: The $id for schema MUST reference the central registry.
  severity: error
  given: $
  then:
    field: "$id"
    function: pattern
    functionOptions:
      match: \b(example\.com)\b
json-schema-2020-12-id-source-url-info:
  description: The $id for a schema MUST have a valid URL pointing to the central registry.
  message: The $id for schema references the central registry.
  severity: info
  given: $
  then:
    field: "$id"
    function: pattern
    functionOptions:
      notMatch: \b(example\.com)\b
I will add more rules and regexes to get more explicit with the naming conventions applied as part of the unique identifier, but this is a damn good place to start when it comes to governing the source of truth for your schema.
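The rules in this post are listed as bare entries. To actually run them, Spectral expects them under a top-level rules key in a ruleset file. Here is a minimal sketch using the first rule from above, assuming the ruleset lives in a file named .spectral.yaml at the root of the repository; where you keep it is up to you.

```yaml
# .spectral.yaml -- sketch of a ruleset file wrapping one of the rules above.
# Every other rule in this post would sit alongside it under the same `rules` key.
rules:
  json-schema-2020-12-id-error:
    description: Schema MUST have a unique identifier for each object.
    message: Schema MUST Have a $id.
    severity: error
    given: $
    then:
      field: "$id"
      function: truthy
```

With the Spectral CLI installed, something like spectral lint my-schema.json --ruleset .spectral.yaml should then report against a single schema file, where my-schema.json is just a placeholder name.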
Schema
Next up, every schema should have a $schema property present, and be using the latest draft of the JSON Schema specification—so very important. This work augments my existing policy and rule work across OpenAPI and other artifacts in use.
json-schema-2020-12-schema-error:
  description: Schema MUST have a $schema property.
  message: Schema MUST Have a $schema.
  severity: error
  given: $
  then:
    field: "$schema"
    function: truthy
json-schema-2020-12-schema-info:
  description: Schema MUST have a $schema property.
  message: Schema Has a $schema.
  severity: info
  given: $
  then:
    field: "$schema"
    function: falsy
json-schema-2020-12-schema-draft-error:
  description: The $schema for a schema MUST use the latest draft.
  message: The $schema for schema MUST use the latest draft.
  severity: error
  given: $
  then:
    field: "$schema"
    function: pattern
    functionOptions:
      match: 'https://json-schema.org/draft/2020-12/schema'
json-schema-2020-12-schema-draft-info:
  description: The $schema for a schema MUST use the latest draft.
  message: The $schema for schema uses the latest draft.
  severity: info
  given: $
  then:
    field: "$schema"
    function: pattern
    functionOptions:
      notMatch: 'https://json-schema.org/draft/2020-12/schema'
This set of rules will have one of the biggest impacts across the enterprise in helping stabilize things, but is one that will take a shit ton of work to implement because you are going to have to update a lot of libraries and tooling across the enterprise.
Title
Now I want to make sure every object has a name, but also that it is applied as the title of the object. This is something we’ll have to align with the unique identifier naming patterns, but having the title property present and meeting minimum governance is critical.
json-schema-2020-12-title-error:
  description: Schema MUST have a title for the entire object, describing an object in plain language.
  message: Schema MUST Have a Title.
  severity: error
  given: $
  then:
    field: title
    function: truthy
json-schema-2020-12-title-info:
  description: Schema MUST have a title for the entire object, describing an object in plain language.
  message: Schema Has a Title.
  severity: info
  given: $
  then:
    field: title
    function: falsy
json-schema-2020-12-title-pascal-case-error:
  description: Schema names should always be PascalCase, and be used in title for a schema to help ensure readability and consistency.
  message: Schema Title MUST Be PascalCase.
  severity: error
  given: $
  then:
    - field: title
      function: pattern
      functionOptions:
        match: ^[A-Z](([a-z]+[A-Z]?)*)$
    - field: title
      function: pattern
      functionOptions:
        match: ^[A-Z](([a-z0-9]+[A-Z]?)*)$
json-schema-2020-12-title-pascal-case-info:
  description: Schema names should always be PascalCase, and be used in title for a schema to help ensure readability and consistency.
  message: Schema Title Is PascalCase.
  severity: info
  given: $
  then:
    - field: title
      function: pattern
      functionOptions:
        notMatch: ^[A-Z](([a-z]+[A-Z]?)*)$
    - field: title
      function: pattern
      functionOptions:
        notMatch: ^[A-Z](([a-z0-9]+[A-Z]?)*)$
json-schema-2020-12-title-length-max-error:
  description: Schema names and resulting title should be kept to less than 25 characters.
  message: Schema Names MUST Be Less Than 25 Characters
  severity: error
  given: $
  then:
    field: title
    function: length
    functionOptions:
      max: 25
json-schema-2020-12-title-length-min-error:
  description: Schema names and resulting title should be kept to greater than 3 characters.
  message: Schema Names MUST Be Greater Than 3 Characters
  severity: error
  given: $
  then:
    field: title
    function: length
    functionOptions:
      min: 3
You will want to adjust the min and max values to match the needs of your organization, and it is a fun one to bikeshed with teams producing schema, but these rules provide a great place to begin when doing this important work.
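If maintaining regexes for casing gets tedious, Spectral also ships a core casing function that understands pascal as a type. The sketch below is an alternative way to express the PascalCase check, not a replacement I have rolled out; the rule name is made up for illustration.

```yaml
# Hypothetical rule name; uses Spectral's core `casing` function instead of a regex.
json-schema-2020-12-title-casing-error:
  description: Schema names should always be PascalCase, and be used in title for a schema.
  message: Schema Title MUST Be PascalCase.
  severity: error
  given: $
  then:
    field: title
    function: casing
    functionOptions:
      type: pascal
```

The casing function has its own opinions about digits and separators, so it is worth running it against a handful of real titles before swapping it in for the patterns above.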
Description
Now let’s apply similar logic to the description for each object, which is absent on almost every schema I stumble across, so just making sure there is a description, along with a minimum bar for length and other things, is a good place to start.
json-schema-2020-12-description-error:
  description: Schema MUST have a description for the entire object, explaining in plain language what the object is for.
  message: Schema MUST Have a Description.
  severity: error
  given: $
  then:
    field: description
    function: truthy
json-schema-2020-12-description-info:
  description: Schema MUST have a description for the entire object, explaining in plain language what the object is for.
  message: Schema Has a Description.
  severity: info
  given: $
  then:
    field: description
    function: falsy
json-schema-2020-12-description-length-error:
  description: The description for a schema should not be too long, helping keep it as readable and consumable as possible by users.
  message: Schema Description MUST be Less Than 250 Characters
  severity: error
  given: $
  then:
    field: description
    function: length
    functionOptions:
      max: 250
Once I get the base in place I will also move towards applying some sort of vocabulary that will trigger rules when certain forbidden words are used, and identify things that are better expressed as JSON Schema properties, allowing us to get more precise with governance.
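As a rough idea of what a forbidden-word rule could look like, here is a sketch built on the same pattern function used earlier. The rule name and the word list (TODO, TBD, FIXME) are placeholders I made up; a real vocabulary would come from your own teams.

```yaml
# Hypothetical rule: the name and the word list are illustrative only.
json-schema-2020-12-description-forbidden-words-error:
  description: Schema descriptions should not contain placeholder wording.
  message: Schema Description MUST NOT Contain Placeholder Words.
  severity: error
  given: $
  then:
    field: description
    function: pattern
    functionOptions:
      # Fails when any listed word appears as a whole word in the description.
      notMatch: \b(TODO|TBD|FIXME)\b
```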
Types
Next, let’s establish a baseline for object types by requiring there is a type property at the top level, making sure we are always explicit with any JSON Schema we’ve published, ensuring the fundamentals are always present.
json-schema-2020-12-type-error:
  description: All schema must have a type property.
  message: Schema MUST Have Type Property
  severity: error
  given: $
  then:
    field: type
    function: truthy
json-schema-2020-12-type-info:
  description: All schema must have a type property.
  message: Schema Has Type Property
  severity: info
  given: $
  then:
    field: type
    function: falsy
I don’t want to be opinionated at the type level yet, and am looking to get feedback for what else should be present here when it comes to requiring an object or array, focusing on just the fundamentals right now.
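If the feedback does land on requiring an object or array at the top level, one possible shape for that rule is Spectral's core enumeration function. This is only a sketch with a made-up rule name, and it assumes type is a single string; a schema that uses an array of types would fail it.

```yaml
# Hypothetical rule: restricts the top-level type to a fixed list of values.
json-schema-2020-12-type-allowed-error:
  description: Top-level schema type should be object or array.
  message: Schema Type MUST Be object or array.
  severity: error
  given: $
  then:
    field: type
    function: enumeration
    functionOptions:
      values:
        - object
        - array
```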
Property Names
Moving on to the properties of each object, we always want to make sure the casing of object properties is consistent across the enterprise, making sure objects use consistent naming and have at least one property present.
json-schema-2020-12-properties-names-camel-case-error:
  description: All schema properties should be camel case for consistency.
  message: Schema Property Names MUST Be camelCase.
  severity: error
  given: $.properties
  then:
    - field: "@key"
      function: pattern
      functionOptions:
        match: ^[a-z][a-zA-Z0-9]*$
json-schema-2020-12-properties-names-camel-case-info:
  description: All schema properties should be camel case for consistency.
  message: Schema Property Names Are camelCase.
  severity: info
  given: $.properties
  then:
    - field: "@key"
      function: pattern
      functionOptions:
        notMatch: ^[a-z][a-zA-Z0-9]*$
json-schema-2020-12-properties-names-max-error:
  description: It makes sense to keep schema property names a consistent length.
  message: Schema Properties Name Length MUST Be Less Than 25
  severity: error
  given: $.properties
  then:
    field: "@key"
    function: length
    functionOptions:
      max: 25
json-schema-2020-12-properties-names-min-error:
  description: It makes sense to keep schema property names a consistent length.
  message: Schema Properties Name Length MUST Be More Than 3
  severity: error
  given: $.properties
  then:
    field: "@key"
    function: length
    functionOptions:
      min: 3
There is a lot to think about when it comes to the property object depending on how you use more advanced JSON Schema elements, but this is meant to be a starting place for anyone looking to govern their schema and covers the examples available on the JSON Schema site.
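For the goal of having at least one property present, a rule along these lines might work. It is a sketch with a made-up rule name: truthy catches a missing properties object, and length with a min of 1 catches an empty one, since Spectral's length function counts the keys of an object.

```yaml
# Hypothetical rule: requires a non-empty properties object at the top level.
json-schema-2020-12-properties-present-error:
  description: Object schema should define at least one property.
  message: Schema MUST Have At Least One Property.
  severity: error
  given: $
  then:
    - field: properties
      function: truthy
    - field: properties
      function: length
      functionOptions:
        min: 1
```

This assumes every schema in the registry is an object with inline properties, which will not hold once you lean on arrays or $ref, so treat it as a starting point.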
Property Descriptions
All properties should have descriptions that are neither too short nor too long, and, similar to object descriptions, you will want to get more precise with a vocabulary and other ways of fine-tuning how teams are describing properties.
json-schema-2020-12-properties-descriptions-error:
  description: Schema property descriptions should be complete and useful to users.
  message: Schema Properties MUST Have Description
  severity: error
  given: $.properties.*
  then:
    field: description
    function: truthy
json-schema-2020-12-properties-descriptions-info:
  description: Schema property descriptions should be complete and useful to users.
  message: Schema Properties Have Description
  severity: info
  given: $.properties.*
  then:
    field: description
    function: falsy
json-schema-2020-12-properties-descriptions-length-max-error:
  description: Schema property descriptions should not be too long, helping ensure they aren't too verbose and complex.
  message: Schema Properties Description MUST Be Less Than 250 Characters
  severity: error
  given: $.properties.*
  then:
    field: description
    function: length
    functionOptions:
      max: 250
json-schema-2020-12-properties-descriptions-length-min-error:
  description: Schema property descriptions should not be too short, helping ensure they are actually helpful.
  message: Schema Properties Description MUST Be More Than 10 Characters.
  severity: error
  given: $.properties.*
  then:
    field: description
    function: length
    functionOptions:
      min: 10
Ideally this work is not being done by developers and is being handled by a central set of architects and content writers, because developers are going to always write the shortest and least helpful descriptions for properties.
Property Types
Similar to object type, we want to establish the baseline for property types, but I am also refraining from going deep down this rabbit hole yet, and will get more advanced here with other types, regexes, and approaches to standardizing vocabulary at this layer.
json-schema-2020-12-properties-type-error:
  description: All schema properties must have a type property.
  message: Schema Properties MUST Have Type Property
  severity: error
  given: $.properties.*
  then:
    field: type
    function: truthy
json-schema-2020-12-properties-type-info:
  description: All schema properties must have a type property.
  message: Schema Properties Have Type Property
  severity: info
  given: $.properties.*
  then:
    field: type
    function: falsy
It will take some research harvesting common JSON Schema in use out there to understand how people are using property types — I know we can go a long way based upon the spec, but the real value of governance comes in aligning it to the enterprise.
Properties Required
I threw required in here as well, but I am skeptical that this is a fundamental, given how more advanced JSON Schema elements are used, but what the hell. I am open to pushback on whether this should be part of the basics, as well as properties, based upon $defs and other ways of using JSON Schema.
json-schema-2020-12-required-error:
  description: All schema should have a required property.
  message: Schema MUST Have a Required Property for Objects.
  severity: error
  given: $
  then:
    field: required
    function: truthy
json-schema-2020-12-required-info:
  description: All schema should have a required property.
  message: Schema Has a Required Property for Objects.
  severity: info
  given: $
  then:
    field: required
    function: falsy
json-schema-2020-12-required-length-error:
  description: All schema should have at least one required property.
  message: Schema MUST Have At Least One Required Property.
  severity: error
  given: $
  then:
    field: required
    function: length
    functionOptions:
      min: 1
What is required can get pretty freaky, so I am skeptical of the governance around this, but will harvest the JSON Schema from common API providers to see how they are expressing things, then come up with some basics based upon what I see.
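One direction this could go, once there is real-world data to back it up, is checking the shape of required rather than just its presence. The sketch below uses Spectral's core schema function to say required should be a non-empty list of unique strings. The rule name is invented, and whether this belongs in the basics is exactly the open question.

```yaml
# Hypothetical rule: validates the shape of `required` with an inline JSON Schema.
json-schema-2020-12-required-shape-error:
  description: The required property should be a non-empty list of unique property names.
  message: Schema Required MUST Be a Non-Empty List of Unique Strings.
  severity: error
  given: $
  then:
    field: required
    function: schema
    functionOptions:
      schema:
        type: array
        minItems: 1
        uniqueItems: true
        items:
          type: string
```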
Just the Fundamentals
It was hard for me to stop there, but beginning with the fundamentals while keeping it simple is so important. I was talking to Ben from JSON Schema this morning and told him I’d share my base set, and I want to do some more marketing and storytelling around my base JSON Schema management services via GitHub repositories, so this starter Spectral ruleset satisfies these needs. While there are plenty of folks governing JSON Schema with Spectral via OpenAPI, there aren’t many folks who are governing the maturity of their JSON Schema on its own using Spectral rules.
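To make the fundamentals concrete, here is a made-up schema that I would expect to clear the error rules in this post. It is written as YAML for readability, and the same structure works as JSON. The CustomerAccount object, its properties, and the path under example.com are all invented for illustration.

```yaml
# Illustrative schema only: names, descriptions, and the $id path are assumptions.
$schema: https://json-schema.org/draft/2020-12/schema
$id: https://example.com/schemas/customer-account.json
title: CustomerAccount
description: A customer account held in the billing system.
type: object
properties:
  accountId:
    type: string
    description: Unique identifier assigned to the account.
  displayName:
    type: string
    description: Human readable name shown for the account.
required:
  - accountId
```

The info rules would still report on a schema like this, because they are the positive side of each pattern and fire when the thing being governed is present.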
I will bundle these rules up with the CI/CD pipelines and other automation I am applying to the schema I am managing via GitHub repositories. All schema published to a central schema registry should pass all these rules. Another thing to note with these rules is that I strive to always have a positive and a negative rule for each pattern I am governing, which is not something others are doing, but I feel is critical to the behavioral aspect of API governance across teams. You can get at the rules I’ve published to a GitHub Gist, and I will also include them in the API Commons rules gallery when I have more time.
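As a sketch of what that pipeline piece might look like, here is a GitHub Actions workflow that lints every schema on pull requests and on pushes to main. The workflow file name, the schemas/ folder, and the .spectral.yaml ruleset location are assumptions about repository layout, not a description of my actual setup.

```yaml
# .github/workflows/schema-governance.yml -- hypothetical file name and paths.
name: Schema governance
on:
  pull_request:
    paths:
      - "schemas/**"
  push:
    branches:
      - main
jobs:
  lint-schemas:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      # Runs the Spectral CLI against every JSON file under schemas/.
      - name: Lint JSON Schema with Spectral
        run: npx --yes @stoplight/spectral-cli lint "schemas/**/*.json" --ruleset .spectral.yaml
```

By default the Spectral CLI only fails the run on error severity results, so the info rules show up in the output as positive reinforcement without blocking a merge.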