Saturday, February 8, 2014

XML Validation in Client-Side JavaScript

XML and JavaScript
This week Alan, a participant in an advanced JavaScript course, posed a question to me: "How can I validate the XML returned from an AJAX call against a DTD or Schema?" I have to admit that I hadn't really considered it before. In retrospect I've always been in the camp that figures the server is part of the greater application, the server is the source for the data, and we can safely rely on it. But what if we can't? Or we shouldn't? The good news is that at least one JavaScript library exists that can validate a document against a schema; the bad news is that it cannot validate a document against a DTD.


The first problem we immediately encounter is that the XMLHttpRequest object parses the AJAX result before we get our hands on it. You may be more comfortable using a library such as jQuery or Prototype to manage AJAX, but all of them rely on XMLHttpRequest behind the scenes and are affected by this. By the way, all of the libraries use the internal XML parser of whichever browser your client may have. So, since we can't validate during the request, we'll have to bite the bullet and validate the XML against a schema after it has already been parsed. It doesn't hurt anything, it's just inefficient.

You probably don't want to bother with validation if the XML didn't parse right in the first place. In XML terms the document was not well-formed. If you are using XMLHttpRequest directly you'll have to jump through some hoops to find out if that happened. Some browsers like Chrome will leave you with a null responseXML property, but others like Firefox will actually put an XML tree in responseXML that contains nodes with error messages. You'll just have to work around that. The best solution is to forgo the XMLHttpRequest for something that abstracts it for you on all the browsers, like jQuery. If jQuery fails to parse the XML correctly the error callback will be used.
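If you do stay with XMLHttpRequest, the workaround can be sketched as a small check on responseXML. This is a minimal sketch rather than a complete cross-browser solution: the helper name isWellFormed is hypothetical, and it assumes the browser either leaves responseXML null or hands back a tree containing a parsererror element, which is how Firefox reports the problem.

```javascript
// Hypothetical helper: decide whether an XMLHttpRequest produced a usable XML tree.
// `doc` is expected to be the value of xhr.responseXML.
function isWellFormed(doc) {
    // Some browsers leave responseXML null when parsing fails.
    if (!doc || !doc.documentElement) {
        return false;
    }

    // Other browsers return a tree that describes the error in a <parsererror> element.
    // Assumption: your own documents never use an element with that name.
    return doc.getElementsByTagName('parsererror').length === 0;
}
```

In an onload handler you would call isWellFormed(xhr.responseXML) and only go on to schema validation when it returns true.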

At the time of this post the only browser that can validate an XML document natively is Microsoft's Internet Explorer, version 5 and later. But don't give up hope, we have a library that can help us. To perform the validation we need the original XML text and the schema to validate it against. With the XMLHttpRequest object the original text is found in the responseText property. In jQuery it will be in the jqXHR.responseText property, and somewhere just as appropriate in any other library.

So where can we get the schema to do the validation? Well, the location of the schema in the document itself is only useful to us if the .xsd file is on the same server the JavaScript came from (it's the same-origin problem). So we could embed the schema text directly into the HTML document where it will be used. Or we could make an AJAX request and load the schema document from the server. In that case both the JavaScript program and the schema are managed as documents on the server, so make sure that they are kept in line with each other. We want the schema to validate against the structure that the program expects to use!
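To make the same-origin condition concrete, here is a minimal sketch that compares the origin of a schema location with the origin of the page. The function name isSameOrigin and the example URLs are made up for illustration, and the sketch assumes an environment that provides the standard URL constructor, which older browsers do not.

```javascript
// Hypothetical helper: true when schemaLocation points at the same origin
// (scheme, host, and port) as the page that loaded the script.
function isSameOrigin(schemaLocation, pageUrl) {
    // A relative location is resolved against the page URL first.
    var schemaUrl = new URL(schemaLocation, pageUrl);
    return schemaUrl.origin === new URL(pageUrl).origin;
}
```

In a page you might call isSameOrigin('participants.xsd', window.location.href) before deciding whether an AJAX request for the schema is worth attempting.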

You may not have seen this technique, so let's embed a schema in the document using a script tag. Because the type attribute is set to "text/xml" the browser does not execute the script, but because we gave it an id we can find the contents later. The XML declaration must be at the very start of the schema text, so we could put it immediately after the opening script tag on the same line. Putting it on the following line, as in the example, leaves a line break in front of it when the text is retrieved. This more readable alternative requires us to use JavaScript to "trim" the contents and get rid of the leading line break (the call to trim() in the jQuery example further down):

<script id="participants-schema" type="text/xml">
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
    <xs:complexType name="participant">
        <xs:sequence>
            <xs:element name="firstName" type="xs:string" />
            <xs:element name="lastName" type="xs:string" />
        </xs:sequence>
    </xs:complexType>
    <xs:element name="participants">
        <xs:complexType>
            <xs:sequence>
                <xs:element name="participant" type="participant" minOccurs="0" maxOccurs="unbounded" />
            </xs:sequence>
        </xs:complexType>
    </xs:element>
</xs:schema>
</script>
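To see why the trim matters, here is a small sketch with a shortened schema written as a string the way the browser would hand it back from the script element, leading line break included. The variable names are illustrative only.

```javascript
// Text as it comes back from the script element: it starts with the line break
// that follows the opening <script> tag.
var raw = '\n<?xml version="1.0" encoding="UTF-8"?>\n' +
    '<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"></xs:schema>\n';

// Before trimming, the XML declaration is not at the very start of the text.
var declarationFirstBefore = raw.indexOf('<?xml') === 0;

// String.prototype.trim() removes the leading and trailing whitespace.
var schema = raw.trim();
var declarationFirstAfter = schema.indexOf('<?xml') === 0;
```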

The library that can handle the job for us is in a public GitHub project at https://github.com/kripken/xml.js. The key file you need is xmllint.js. Watch out, you want the one in the Git repository that I linked to. There is another link on the demo page at https://github.com/syssgx/xml.js that references a script with the same name, but it is an adaptation that works differently. This is a community licensed library, so you can use it anywhere as long as you give credit to the author in your program.

So given the document, the schema, and the xmllint.js library we can validate the document. All you have to do is pass the library function validateXML the document text and the schema text. The function will return a string containing the word "validates" if it is successful, and a detailed error message if it isn't. This example uses jQuery to retrieve the XML document in an AJAX call and then validate it using our schema:

var schema = $('#participants-schema').text().trim();

$.get('participants.xml', function(data, textStatus, jqXHR) {
    // Validate the raw response text, not the parsed document.
    var result = validateXML(jqXHR.responseText, schema);

    // test() returns false when nothing matches; match() would return null.
    if (/validates/i.test(result)) {
        alert('The document is well-formed and valid.');
    } else {
        alert('The document is well-formed but not valid.');
    }
}).fail(function(jqXHR, textStatus, errorThrown) {
    alert(errorThrown);
});
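Since validateXML reports its outcome as text, it can help to turn that text into a small result object before acting on it. The sketch below is one way to do that. The name interpretResult is hypothetical, and the only thing assumed about the library is what was stated above: a successful result contains the word "validates". Note that String.prototype.match returns null when nothing matches, so a test() call is the safer check.

```javascript
// Hypothetical helper: turn the text returned by validateXML into an object.
function interpretResult(result) {
    // Guard against null or undefined so the check never throws.
    var text = String(result || '');

    return {
        valid: /validates/i.test(text),
        // On failure this carries the detailed error message from the library.
        message: text
    };
}
```

The calling code can then branch on interpretResult(result).valid and show the message when validation fails.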

If we want to read both the schema and the data from the server then we need to chain two AJAX calls together, with the second AJAX call taking place in the success callback of the first. We need to do that to ensure that we have both documents before attempting validation:

$.get('participants.xsd', function(xsdData, xsdStatus, xsdXHR) {
    // Use the schema text that the server just returned.
    var schema = xsdXHR.responseText;

    $.get('participants.xml', function(data, textStatus, jqXHR) {
        var result = validateXML(jqXHR.responseText, schema);

        // test() returns false when nothing matches; match() would return null.
        if (/validates/i.test(result)) {
            alert('The document is well-formed and valid.');
        } else {
            alert('The document is well-formed but not valid.');
        }
    }).fail(function(jqXHR, textStatus, errorThrown) {
        alert(errorThrown);
    });
}).fail(function(jqXHR, textStatus, errorThrown) {
    alert(errorThrown);
});
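The two requests do not depend on each other, so an alternative to nesting is to start both and wait for the pair with jQuery's $.when. The sketch below takes jQuery and the validation function as parameters so that it stays self-contained. The name loadAndValidate and its callback parameters are hypothetical, and the URLs are whatever your server exposes.

```javascript
// Hypothetical helper: fetch the schema and the document side by side,
// then validate once both have arrived.
function loadAndValidate(jq, validate, xsdUrl, xmlUrl, onResult, onError) {
    // dataType 'text' asks jQuery for the raw text of each response.
    var schemaRequest = jq.ajax({ url: xsdUrl, dataType: 'text' });
    var documentRequest = jq.ajax({ url: xmlUrl, dataType: 'text' });

    jq.when(schemaRequest, documentRequest)
        .done(function (schemaArgs, documentArgs) {
            // With several requests, each argument is [data, textStatus, jqXHR].
            var result = validate(documentArgs[0], schemaArgs[0]);
            onResult(/validates/i.test(result), result);
        })
        .fail(function (jqXHR, textStatus, errorThrown) {
            // Called as soon as either request fails.
            onError(errorThrown);
        });
}
```

A call would look like loadAndValidate($, validateXML, 'participants.xsd', 'participants.xml', showResult, showError), where showResult and showError are your own functions.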


So that's everything you need to validate an XML document! I wrapped up all of these examples in an Eclipse dynamic web project named JsXmlValidation. XMLHttpRequest only works if the request is made to a web server, which is why it's a dynamic project. You can easily copy the static files into a project in another IDE such as IntelliJ or Visual Studio. And I'll update this post if and when I find any other libraries that will solve this problem :)
