Sunday 8 March 2015

RESTful services: to XSD or not to XSD? That is the question


A question I've been considering lately is “should we be using XSD to describe and validate our RESTful payloads?” A lot of REST APIs are described quite loosely (e.g. with example payloads rather than strict definitions). Obviously XSD can define any XML structure, but in an RPC world the XSD is tied to the operation, not to a stateful entity. Does this make a difference? Is there any reason XSD is less applicable to REST services than to a service defined in WSDL?
 
There’s surprisingly little I can find in the blogosphere about the pros and cons of this. Most of the blogs around XSD and REST are about how to implement it using different frameworks and technologies (so obviously it can be done). One blog explicitly said that XSD was a “bad idea for REST”, but not why, so that’s hardly a strong argument. I started thinking it through from first principles and, after some conversations with colleagues at the coffee machine, we agreed that XSD validation is entirely possible, and probably desirable. That said (as with a lot of REST use) there are some rules to follow, and this post gives my humble opinion as to what they are.

Foundation Concepts

As a starting point, it’s worth outlining some things I think are true of all services (feel free to disagree in the comments section).

F1 - Service contracts are good: 
It’s vital for service consumers to know what a service does and how to interact with it. This applies to all component software where Encapsulation and Information Hiding are foundational concepts.

F2 - Machine readable contracts are better: 
It’s better to have a contract that code can read, rather than one in Word, on a wiki, or on paper. It’s also great if your contract and validation rules come from the same source; that way there’s no chance of disagreement between what the service claims to do and what the validation allows it to do. This is one reason I’m such a big fan of XSD and so many of the other open standards.

F3 - Open standards are a good thing
REST is built upon the HTTP standards for transport; by the same logic, XML payloads should be validated using the open XSD standard. Of course there are other standards (JSON, to name but one), but if you’re using XML then XSD seems a sensible way to both document and validate your messages: everyone understands it and there is a mass of tooling (both free and commercial) to help.

F4 - Service contracts are not unrestful
Knowing how to represent your state is key to being able to transfer your state. Of course, to be RESTful (regardless of how you describe it) the contract needs to be entity-focused and not RPC-focused, e.g. an XSD “Customer” object, not an XSD “AddCustomer” object.

The Rules

OK, hopefully we’re in agreement so far. In that case, how do we do XSD validation for REST? It’s actually quite simple, but as we discussed different scenarios we found that some basic rules had to be followed, or else we got into a bit of a mess. These rules make a REST XSD quite different to an RPC XSD:

Rule 1: Entities must be described with one schema, regardless of ACTION
Perhaps quite intuitive: if something is mandatory (or optional) when you create (POST) an object, it follows that it is still mandatory (or optional) when you GET it later, or PUT it back. It also follows that the format of an object remains fixed throughout the life of that object. If not, how is it the same object?

This obviously doesn't mean the state of the object won’t change (that would be silly). Values can change and optional fields can be filled in which were previously missing. What it does mean is:
  • No fundamental changes to structure (e.g. the root element won’t change name)
  • No change to element cardinality (e.g. minOccurs=1 changing to 0, or maxOccurs=1 changing to unbounded)
  • No change to data types (e.g. integers changing their min/max values or becoming strings)
The key here is that data validation must be about structure, and the core of what makes an object valid, not about a particular use case. For example, in a given organisation a “Customer” might always be invalid without a Surname. These are essentially the same sort of rules we apply in database CREATE TABLE statements.
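
To make Rule 1 a little more concrete, here is a minimal Python sketch. The Customer schema, its element names and the `load_rules`, `validate` and `handle` helpers are all assumptions made up for illustration. The helper only understands a tiny subset of XSD (named child elements with minOccurs/maxOccurs, no ordering or data type checks), so treat it as a stand-in for a real XSD validator rather than a replacement for one. The point is simply that POST, PUT and GET all go through the same schema.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical schema: one definition of Customer, shared by every HTTP method.
CUSTOMER_XSD = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Customer">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="Forename" type="xs:string" minOccurs="0"/>
        <xs:element name="Surname" type="xs:string"/>
        <xs:element name="Phone" type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
"""


def load_rules(xsd_text):
    """Read the root element name and each child's (min, max) occurrence bounds."""
    root_decl = ET.fromstring(xsd_text.strip()).find(XS + "element")
    rules = {}
    for child in root_decl.iter(XS + "element"):
        if child is root_decl:
            continue  # iter() also yields the root declaration itself
        lo = int(child.get("minOccurs", "1"))
        hi = child.get("maxOccurs", "1")
        rules[child.get("name")] = (lo, None if hi == "unbounded" else int(hi))
    return root_decl.get("name"), rules


def validate(xml_text, xsd_text=CUSTOMER_XSD):
    """Return a list of problems; an empty list means the payload passed."""
    root_name, rules = load_rules(xsd_text)
    doc = ET.fromstring(xml_text)
    if doc.tag != root_name:
        return [f"root element must be {root_name}, got {doc.tag}"]
    counts = {}
    for child in doc:
        counts[child.tag] = counts.get(child.tag, 0) + 1
    problems = [f"unexpected element {name}" for name in counts if name not in rules]
    for name, (lo, hi) in rules.items():
        n = counts.get(name, 0)
        if n < lo or (hi is not None and n > hi):
            problems.append(f"{name} occurs {n} times")
    return problems


def handle(method, xml_text):
    """The HTTP method is deliberately ignored when choosing the schema."""
    return validate(xml_text)


print(handle("POST", "<Customer><Surname>Smith</Surname></Customer>"))
print(handle("PUT", "<Customer><Forename>Ann</Forename></Customer>"))
```

The first call passes and the second reports the missing Surname, and it would do so whichever method name was passed in.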

Rule 2: No partial updates - Entities are created/updated/retrieved as a whole
In order for XSD validation to occur cleanly we can't allow partial updates by POST (or, shock horror, by PUT). If we try to write an XSD which allows partial updates, it becomes so lax it can’t actually achieve validation. Either everything becomes optional, or there are lots of “choices”. In either case invalid XML documents can slip through the gaps and the XSD will report them as valid.
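
A small Python sketch of the "everything becomes optional" problem. The `STRICT` and `LAX` tables are hypothetical stand-ins for the minOccurs/maxOccurs settings of two versions of a Customer schema, and `is_valid` is a toy check written for this example, not a real XSD validator.

```python
import xml.etree.ElementTree as ET

# Hypothetical cardinality rules, standing in for minOccurs/maxOccurs in an XSD.
STRICT = {"Surname": (1, 1), "Email": (1, 1), "Forename": (0, 1)}
# The same schema loosened to accept partial updates: every element is optional.
LAX = {name: (0, hi) for name, (_lo, hi) in STRICT.items()}


def is_valid(xml_text, rules):
    """True if each child of <Customer> occurs within its allowed bounds."""
    doc = ET.fromstring(xml_text)
    if doc.tag != "Customer":
        return False
    tags = [child.tag for child in doc]
    if any(tag not in rules for tag in tags):
        return False
    return all(lo <= tags.count(name) <= hi for name, (lo, hi) in rules.items())


partial = "<Customer><Email>a@example.com</Email></Customer>"
empty = "<Customer/>"

print(is_valid(partial, STRICT))  # False: Surname is missing
print(is_valid(partial, LAX))     # True: the partial update is accepted...
print(is_valid(empty, LAX))       # True: ...but so is a customer with no data at all
```

Once the schema is loose enough to accept the partial update, it can no longer tell a partial update from a broken one.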

We did explore the idea of having different XSDs (or an XSD with a choice in it) for different scenarios, e.g. one which describes everything needed for a full update, and another which is used for partial updates. This is possible, but very quickly comes unstuck because:
  • REST has no procedures - At a conceptual level, attempting to impose “scenarios” on REST is fundamentally troubling, and has the potential to get everyone out of the RESTful mindset.
  • More concretely, there isn't a way to tie different scenarios to different XSDs – the client wouldn't know which XSD to use for a given scenario other than by convention (so you break Foundation Concept F2). Even if you could somehow specify that we use object_get.xsd for the GET action, and object_put.xsd for the PUT action, Rule 1 explains why you shouldn't want to.
OK, but what about the PATCH operation? The pros and cons of PATCH are beyond the scope of this blog, but in terms of XSD, PATCH doesn't really help:
  • The patch object (as described very well in William Durand's blog) isn't the same format as our object, so it can't use our XSDs.
  • These patch operations have similar issues of being so loose that things slip through the gaps, and they don't solve the F2 or Rule 1 concerns in the previous bullets.
OK, so if we follow the rules laid down above is everything easy? Well, almost. There are a few awkward scenarios we identified so it’s worth highlighting them.

Scenarios & Examples

Scenario 1: Elements null on initial POST, then mandatory afterwards.

Example 1: The primary key is missing in POST but needed in PUT

This is OK and breaks no rules; in fact it is normal REST practice. On POST the server assigns the key, and on PUT the key is in the URL and not in the payload, so the payload looks the same in both cases and we’re OK.

Example 2: Some fields are “generated” by the system we POST to, and from then on they are mandatory e.g. creating a “user” object sets their home folder which we want to be mandatory.

This obviously breaks Rule 1 – we’re asking for the validation rules of POST to be different to those of PUT. What we do about it depends on the scenario in question: 
  • Is this actually mandatory from a data validity perspective, or only from a business process perspective? Namely, is the attribute mandatory at this point in an object’s lifecycle but possibly not later? If this is business process logic, then XSD is the wrong tool for the job and the check should be in application logic.
  • Is this a separate entity? If so, should it be created by its own POST and then passed by reference to this entity’s POST? In DB terms, do we need to create a foreign key object before we create this object?
  • If neither, then we're into the horrible territory of either having a lax schema, or making the field mandatory but having the POST method ignore the value sent in (and explaining to the client that they have to send nonsense). Neither of these is ideal, obviously.

Scenario 2: The consumer has the rights to GET the whole object, but only has the authority to update a subset of the fields

Example: A consumer can GET an Order, but only has rights to update the “comments” section.

This doesn't break a rule. It is about whether they’re authorised to change a field, not whether the change is valid (authorisation is not validation).

The consumer can PUT the object as normal, and the XSD will validate that the submitted data is VALID. If the data is valid then the API needs to see if they've tried to do something they shouldn't (i.e. change a field they’re not allowed to). That’s not an XSD problem:
  • If an unauthorised change has been made, then return 403 Forbidden.
  • If this is an authorised change, then the PUT is OK and returns a standard response in the 2xx range.
This might be inefficient (comparing the whole object to ensure no unauthorised changes have been made), but it’s not unrestful and it keeps to the rules. To improve efficiency it might be better to spin up an OrderComments API with a PUT that can only change comments.
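
The check described above could be sketched in Python roughly as follows. The element names, the `EDITABLE` permission set and the `put_order` function are hypothetical, the Order is assumed to be a flat list of non-repeating elements, and the payload is assumed to have already passed XSD validation before this code runs.

```python
import xml.etree.ElementTree as ET

# Hypothetical permission set: this consumer may only change <Comments>.
EDITABLE = {"Comments"}


def as_fields(xml_text):
    """Flatten a one-level <Order> document into {element name: text}."""
    return {child.tag: (child.text or "") for child in ET.fromstring(xml_text)}


def put_order(stored_xml, submitted_xml, editable=EDITABLE):
    """Return (status, body). Assumes submitted_xml already passed XSD validation."""
    stored = as_fields(stored_xml)
    submitted = as_fields(submitted_xml)
    # Every element whose value differs, including ones added or removed.
    changed = {
        name
        for name in stored.keys() | submitted.keys()
        if stored.get(name) != submitted.get(name)
    }
    if changed - editable:
        return 403, stored_xml   # unauthorised change: keep the stored state
    return 200, submitted_xml    # authorised change: accept the whole object


stored = "<Order><Item>Widget</Item><Quantity>2</Quantity><Comments>none</Comments></Order>"
ok = stored.replace("none", "Leave by the door")
bad = stored.replace("<Quantity>2<", "<Quantity>20<")

print(put_order(stored, ok)[0])   # 200
print(put_order(stored, bad)[0])  # 403
```

The field-by-field comparison is exactly the inefficiency mentioned above; a dedicated comments resource would avoid it.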

Scenario 3: Consumer doesn't have access to view every part of an object (some are obscured/removed based on their profile), but wishes to PUT.

Example 1: Depending on profile some consumers see a customer’s phone number, others see 07**********12

This doesn't break a rule, and is just another form of Scenario 2 – the consumer doesn't have rights to change this field, so they just pass back the masked phone number.

Example 2: Depending on profile some consumers see an Order’s payment information, for other consumers it is omitted entirely (e.g. what happens with Mashery Response Filters).

In order for the message to validate against the XSD, the fields which were removed cannot be mandatory – otherwise the GET would fail its own XSD (which would obviously be bad).

As long as the body of the PUT contains what was in the GET, the consumer isn't breaking Rule 1, and from their point of view they're passing back the whole object, so they're not breaking Rule 2 either. As with Scenario 2, the responsibility is with the API to manage. This could also be inefficient and is possibly a bad idea, but the issue is with API design and not with the validation.
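
One way the API might "manage" this is sketched below in Python. This is only one possible design, not something the rules above dictate: the `HIDDEN` set, the element names and both functions are hypothetical, and rejecting a PUT that contains a hidden element with a 403 is my assumption. Note that the hidden elements are simply appended, so a real implementation would need to put them back in the order the schema's sequence requires.

```python
import xml.etree.ElementTree as ET

# Hypothetical response filter: elements this consumer's profile never sees.
HIDDEN = {"Payment"}


def get_order(stored_xml, hidden=HIDDEN):
    """Return the representation with the hidden elements filtered out."""
    root = ET.fromstring(stored_xml)
    for child in list(root):
        if child.tag in hidden:
            root.remove(child)
    return ET.tostring(root, encoding="unicode")


def put_order(stored_xml, submitted_xml, hidden=HIDDEN):
    """Return (status, new stored XML); hidden elements are carried over from storage."""
    stored = ET.fromstring(stored_xml)
    submitted = ET.fromstring(submitted_xml)
    if any(child.tag in hidden for child in submitted):
        return 403, stored_xml  # consumer sent something it cannot even see
    for child in stored:
        if child.tag in hidden:
            submitted.append(child)  # re-attach what the filter removed
    return 200, ET.tostring(submitted, encoding="unicode")


stored = "<Order><Item>Widget</Item><Comments>none</Comments><Payment>VISA</Payment></Order>"
view = get_order(stored)                     # what this consumer receives
edited = view.replace("none", "Gift wrap")   # consumer PUTs back "the whole object"
status, saved = put_order(stored, edited)
print(status, saved)
```

From the consumer's side this is still a whole-object PUT of exactly the representation they were given.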

Conclusion

As far as I can tell, XSD is perfectly doable with REST so long as certain rules are followed. Within these constraints, and with good API design, we can both document and validate our payloads using XSD. There are some use cases where we push these rules, but I've not yet found a scenario under which the rules break.

If anyone would like to feed back any comments I'd be glad to hear them.