Breaking Changes Bad! API Versioning Good!
As anyone who has built or regularly uses an API realizes sooner or later, breaking changes are very bad and can be a serious blemish on an otherwise useful API. A breaking change is a change to the behavior of an API that can break a user’s integration, causing frustration and a loss of trust between the API provider and the user. Unlike a delightful new feature, which can simply show up, a breaking change requires that users be notified in advance (with accompanying mea culpas). The way to avoid that frustration is to version an API, with assurances from the API owner that no surprising changes will be introduced within any single version.
So how hard can it be to version an API? The truth is that it’s not hard. What is hard is maintaining some sanity by not needlessly devolving into a dizzying number of versions and subversions, applied across dozens of API endpoints with unclear compatibilities.
We introduced v1 of the API three years ago and did not realize that it would be going strong to this day. So how have we continued to provide the best email delivery API for over two years while still maintaining the same API version? While there are many different opinions on how to version REST APIs, I hope that the story of our humble yet powerful v1 might guide you on your way to API versioning enlightenment.
REST Is Best
The SparkPost API originates from when we were Message Systems, before our adventures in the cloud. At the time we were busy making final preparations for the beta launch of Momentum 4. This was a major upgrade to version 3.x of our market-leading on-premises MTA. Momentum 4 included an entirely new UI, real-time analytics, and most importantly a new web API for message injection and generation, managing templates, and getting email metrics. Our vision was of an API-first architecture, where even the UI would interact with API endpoints.
One of the earliest and best decisions we made was to adopt a RESTful style. Since the late 2000s, web APIs based on representational state transfer (REST) have been the de facto standard for cloud APIs. Using HTTP and JSON makes it easy for developers, regardless of which programming language they use (PHP, Ruby, Java, or another), to integrate with our API without knowing or caring about our underlying technology.
Choosing to use the RESTful architecture was easy. Choosing a versioning convention was not so easy. Initially we punted on the question of versioning by not versioning the beta at all. However, within a couple of months the beta was in the hands of a few customers and we began building out our cloud service. Time to version. We evaluated two versioning conventions. The first was to put the version directly in the URI and the second was to use an Accept header. The first option is more explicit and less complicated, which makes it easier for developers. Since we love developers, it was the logical choice.
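To make the two conventions concrete, here is a minimal sketch in Python that builds (but does not send) the same request both ways. The host name, the /api/ path prefix, and the vendor media type are assumptions made for illustration, not the actual SparkPost URL scheme.

```python
from urllib.request import Request

# Hypothetical host used only for illustration.
BASE_URL = "https://api.example.com"


def uri_versioned(resource: str, version: int = 1) -> Request:
    """Convention 1: the version is part of the path, e.g. /api/v1/templates."""
    return Request(
        f"{BASE_URL}/api/v{version}/{resource}",
        headers={"Accept": "application/json"},
    )


def header_versioned(resource: str, version: int = 1) -> Request:
    """Convention 2: the path is unversioned and the version travels in a
    vendor media type in the Accept header (media type name is made up)."""
    return Request(
        f"{BASE_URL}/api/{resource}",
        headers={"Accept": f"application/vnd.example.v{version}+json"},
    )


if __name__ == "__main__":
    for req in (uri_versioned("templates"), header_versioned("templates")):
        print(req.full_url, "|", req.get_header("Accept"))
```

With the URI convention the version is visible in every log line, browser address bar, and copy-pasted example, which is the sense in which it is more explicit.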
With a versioning convention selected we had more questions. When would we bump the version? What is a breaking change? Would we re-version the whole API or just certain endpoints? At SparkPost, we have multiple teams working on different parts of our API. Within those teams, people work on different endpoints at different times. Therefore, it’s very important that our API is consistent in its use of conventions. This was bigger than versioning.
We established a governance group including engineers representing each team, a member of the Product Management team, and our CTO. This group is responsible for establishing, documenting, and enforcing our API conventions across all teams. An API governance Slack channel also comes in handy for lively debates on the topic.
The governance group identified a number of ways changes can be introduced to the API that are beneficial to the user and do not constitute a breaking change. These include:
A new resource or API endpoint
A new optional parameter
A change to a non-public API endpoint
A new optional key in the JSON POST body
A new key returned in the JSON response body
Conversely, a breaking change includes anything that could break a user’s integration, such as:
A new required parameter
A new required key in POST bodies
Removal of an existing endpoint
Removal of an existing endpoint request method
A materially different behavior of an API call, such as a change to a default
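The two lists above can be encoded as a small lookup that a review checklist or release script could consult. This is only a sketch: the enum member names are invented here, and treating anything not explicitly listed as safe as breaking is an assumption, not a documented SparkPost rule.

```python
from enum import Enum, auto
from typing import Iterable


class Change(Enum):
    # Changes the lists above treat as non-breaking
    NEW_ENDPOINT = auto()
    NEW_OPTIONAL_PARAMETER = auto()
    NON_PUBLIC_ENDPOINT_CHANGE = auto()
    NEW_OPTIONAL_REQUEST_KEY = auto()
    NEW_RESPONSE_KEY = auto()
    # Changes the lists above treat as breaking
    NEW_REQUIRED_PARAMETER = auto()
    NEW_REQUIRED_REQUEST_KEY = auto()
    ENDPOINT_REMOVED = auto()
    REQUEST_METHOD_REMOVED = auto()
    BEHAVIOR_CHANGED = auto()


NON_BREAKING = frozenset({
    Change.NEW_ENDPOINT,
    Change.NEW_OPTIONAL_PARAMETER,
    Change.NON_PUBLIC_ENDPOINT_CHANGE,
    Change.NEW_OPTIONAL_REQUEST_KEY,
    Change.NEW_RESPONSE_KEY,
})


def is_breaking(change: Change) -> bool:
    """Anything not explicitly listed as safe is treated as breaking."""
    return change not in NON_BREAKING


def requires_new_version(changes: Iterable[Change]) -> bool:
    """A release needs a version bump if any single change is breaking."""
    return any(is_breaking(c) for c in changes)
```

For example, requires_new_version([Change.NEW_ENDPOINT, Change.NEW_RESPONSE_KEY]) is False, while adding Change.NEW_REQUIRED_PARAMETER to that list makes it True.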
The Big 1.0
As we documented and discussed these conventions, we also came to the conclusion that it was in everyone’s best interest (including ours!) to avoid making breaking changes to the API, since managing multiple versions adds quite a bit of overhead. We decided that there were a few things we should fix with our API before committing to “v1”.
First, sending a simple email required way too much effort. To “keep the simple things simple” we updated the POST body to ensure that both simple and complex use cases are accommodated. The new format was more future-proof as well. Second, we addressed a problem with the Metrics endpoint. This endpoint used a “group_by” parameter that would change the format of the GET response body such that the first key would be the value of the group_by parameter. That did not seem very RESTful, so we split each grouping into its own endpoint. Finally, we audited each endpoint and made minor changes here and there to ensure they conformed with the standards.
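The group_by problem is easier to see with a toy example. The sketch below contrasts a single handler whose top-level response key depends on the query parameter with one handler per grouping that always returns the same envelope. The sample rows, field names, and the "results" envelope are all hypothetical and are not the real Metrics API.

```python
from typing import Dict, List

# Made-up sample data standing in for stored metrics.
ROWS = [
    {"domain": "example.com", "campaign": "welcome", "count_sent": 10},
    {"domain": "example.org", "campaign": "welcome", "count_sent": 5},
]


def _totals(dimension: str) -> Dict[str, int]:
    """Sum count_sent per distinct value of the given dimension."""
    totals: Dict[str, int] = {}
    for row in ROWS:
        totals[row[dimension]] = totals.get(row[dimension], 0) + row["count_sent"]
    return totals


def metrics_with_group_by(group_by: str) -> Dict[str, List[dict]]:
    """Old style: the response's first key is whatever group_by was passed,
    so clients must parse a differently shaped body for each grouping."""
    rows = [{"name": k, "count_sent": v} for k, v in sorted(_totals(group_by).items())]
    return {group_by: rows}


def _metrics_by(dimension: str) -> Dict[str, List[dict]]:
    """New style helper: always the same top-level key."""
    rows = [{dimension: k, "count_sent": v} for k, v in sorted(_totals(dimension).items())]
    return {"results": rows}


def metrics_by_domain() -> Dict[str, List[dict]]:
    """Stands in for a dedicated per-domain endpoint."""
    return _metrics_by("domain")


def metrics_by_campaign() -> Dict[str, List[dict]]:
    """Stands in for a dedicated per-campaign endpoint."""
    return _metrics_by("campaign")
```

With separate endpoints, each URL has one documented response shape, so a client never has to branch on a request parameter to know which key to read.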
It is important to have accurate and usable API documentation to avoid breaking changes, whether deliberate or unintentional. We decided to use a simple API documentation approach based on API Blueprint, a Markdown-based description language, and to manage our docs in GitHub. Our community contributes to and improves upon these open source docs. We also maintain a nonpublic set of docs in GitHub for internal APIs and endpoints.
Initially, we published our docs to Apiary, a great tool for prototyping and publishing API docs. However, embedding Apiary into our website doesn’t work on mobile devices, so we now use Jekyll to generate static docs instead. Our latest SparkPost API docs now load quickly and work well on mobile devices, which is important for developers who are not always sitting at their computer.
Separating Deployment from Release
We learned early on the valuable trick of separating a deployment from a release. Through continuous delivery and deployment we can ship changes as soon as they are ready, without always publicly announcing or documenting them at the same time. It’s not uncommon for us to deploy a new API endpoint, or an enhancement to an existing one, and use it from within the UI or with internal tools before we publicly document and support it. That way we can tweak it for usability or conformance to standards without worrying about making a dreaded breaking change. Once we are happy with the change, we add it to our public documentation.
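One way to picture this separation is a registry in which every deployed endpoint is callable, but only released endpoints appear in the public documentation and carry the no-breaking-changes promise. The following is a hedged sketch of that idea; the class, method names, and paths are invented and do not describe SparkPost's actual tooling.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Endpoint:
    path: str
    released: bool = False  # deployed, but not yet publicly documented or supported


class EndpointRegistry:
    def __init__(self) -> None:
        self._endpoints: Dict[str, Endpoint] = {}

    def deploy(self, path: str) -> None:
        """Make an endpoint live. It can be used by the UI and internal tools."""
        self._endpoints.setdefault(path, Endpoint(path))

    def release(self, path: str) -> None:
        """Publicly document and support an endpoint that is already deployed."""
        self._endpoints[path].released = True

    def is_deployed(self, path: str) -> bool:
        return path in self._endpoints

    def may_change_freely(self, path: str) -> bool:
        """Unreleased endpoints can still be reshaped without a breaking change."""
        return not self._endpoints[path].released

    def public_docs(self) -> List[str]:
        """Only released endpoints show up in the public documentation."""
        return sorted(p for p, e in self._endpoints.items() if e.released)
```

A typical sequence would be deploy("/api/v1/widgets"), a period of internal use and tweaking while may_change_freely returns True, and finally release("/api/v1/widgets"), after which the endpoint is listed by public_docs and is covered by the compatibility rules.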
It is only fair to admit that there have been times when we have not lived up to our “no breaking changes” ideals, and these are worth learning from. On one occasion we decided it would be better for users if a certain property defaulted to true instead of false. After we deployed the change we received several complaints from users, since the behavior had changed unexpectedly. We reverted the change and added an account-level setting, a much more user-friendly approach for sure.
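The account-level setting approach can be sketched as a simple precedence rule: an explicit value in the request wins, then the account's own default, and only then the original global default that existing integrations already rely on. The property and setting names below are placeholders, since the post does not say which property was involved.

```python
from typing import Mapping, Optional

# The original API-wide default stays unchanged so existing integrations
# keep seeing the behavior they were built against.
GLOBAL_DEFAULT = False


def effective_value(
    request_value: Optional[bool],
    account_settings: Mapping[str, bool],
) -> bool:
    """Resolve a boolean property for one API call.

    'some_property_default' is a hypothetical account-level setting name.
    """
    if request_value is not None:
        return request_value  # the caller was explicit
    return account_settings.get("some_property_default", GLOBAL_DEFAULT)
```

Accounts that want the new behavior opt in by setting their own default to true, and everyone else sees no change at all.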
Occasionally we are tempted to fix bugs or inconsistencies in ways that would amount to breaking changes. However, we decided to leave these idiosyncrasies alone rather than risk breaking customers’ integrations for the sake of consistency.
There are rare cases where we made the serious decision to make a breaking change, such as deprecating an API resource or method, in the interest of the greater user community and only after confirming that there is little to no impact on users. For example, we deliberately made the choice to alter the response behavior of the Suppression API, but only after carefully weighing the benefits and impacts to the community and carefully communicating the change to our users. However, we would never introduce a change that has even a remote possibility of directly impacting the sending of a user’s production email.