Combining Schematron with other
XML Schema languages
By Eddie Robertsson
June 10, 2002
Updated to ISO Schematron, Rick Jelliffe, 2010
Abstract
This article shows how Schematron can be combined with other XML Schema
languages to create powerful validation possibilities for business applications.
Table of contents
Introduction
Introduction to Schematron
Schematron hierarchy
Assertions
Rules
Patterns
Schematron processing
Embedded Schematron Rules in W3C XML Schema
Dependent attributes
Interleaving of elements
Co-occurrence constraints
Dependency between XML documents
Embedded Schematron Rules in RELAX-NG
Co-occurrence constraints
Dependency between XML documents
Processing
Summary
Acknowledgements
Introduction
Since the W3C ratified W3C XML Schema as a full Recommendation on May 2, 2001, it has become clear that it is the most widely used XML schema language in the development community. Many believed that W3C XML Schema would solve most of the problems encountered when validating XML documents, but this is not the case, and in fact it was never the goal of W3C XML Schema. The purpose section of the specification states clearly:
"However, the language defined by this specification does not
attempt to provide all the facilities that might be needed by any application.
Some applications may require constraint capabilities not expressible in
this language, and so may need to perform their own additional validations."
When W3C XML Schema is not powerful enough, developers have other options. One option is to find a different XML schema language that can express all the needed constraints, and RELAX-NG has become increasingly popular due to its simplicity and expressive power. In many areas RELAX-NG is more powerful than W3C XML Schema, but there are still areas where both of these languages fall short. One such area is the ability to express constraints between components in an XML document, which are known as co-occurrence constraints.
The best XML schema language for expressing co-occurrence constraints is ISO Schematron. Schematron is a rule-based schema language, and although you can define structure using Schematron it can often be a bit cumbersome. However, since defining structure in both W3C XML Schema and RELAX-NG is easy, the perfect solution would be to combine the schema languages. This way we can use each language for what it is best at: define structure with W3C XML Schema or RELAX-NG, and define co-occurrence constraints with Schematron. (Note: This paper was written before the ISO standard for Schematron was created, and so originally referred to an earlier version that is now obsolete; see Schematron.)
This article will provide an explanation and several examples of how
Schematron rules can easily be embedded within W3C XML Schemas and RELAX-NG to perform
validation tasks not possible in W3C XML Schema or RELAX-NG alone.
The following four areas, which W3C XML Schema does not fully address, will be covered in the section
Embedded Schematron Rules in W3C XML Schema:
Dependent attributes
The W3C XML Schema allows attributes to be declared on elements and
the occurrence of the attributes can be controlled to be either optional
or required. However, in some cases this is not enough and what is really
needed is to define that the attributes have some form of dependency between
them, for example that one of two attributes must appear but not both.
Interleaving of elements
The introduction of the all group was a feature that many were waiting
for. The idea is that it allows the child elements in the
group to appear in any order. Unfortunately the all group is not as useful
as many had hoped because of some restrictions put on the declaration to
simplify validation.
Co-occurrence constraints
A co-occurrence constraint is a constraint between components in an XML instance document. W3C XML Schema has limited support for this through the identity constraint functionality, which can specify that one element or attribute's value should refer to another element or attribute. In many cases this is not enough, and it would be useful to express constraints such as: if an element State has the value NSW, then the element Country must be Australia. Another example: if attribute currentTime="3am" on element Calendar, then attribute currentState on element Person should be 'Sleeping', unless element Calendar's attribute currentDay="Friday", in which case attribute currentState should be 'At party'.
Dependency between XML documents
Most XML schema languages lack functions for applying constraints between XML instance documents. Such constraints are often useful: for example, one document may contain a database of specific items while other documents refer to these items. In this case it would be very useful to validate that each referenced item actually exists in the database document.
The first two of the above areas (Dependent attributes and Interleaving of elements) are handled by RELAX-NG without having to rely on embedded Schematron rules. However, when it comes to defining advanced co-occurrence constraints and dependencies between XML documents, RELAX-NG also falls short; some examples of how these can be achieved will be shown in the section Embedded Schematron Rules in RELAX-NG.
Introduction to Schematron
The Schematron schema language differs from most other
XML schema languages in that it is a rule-based language that uses path-expressions
instead of grammars. This means that instead of creating a grammar for an
XML document a Schematron schema will make assertions applied to a specific
context within the document. If the assertion fails, a diagnostic message
that is supplied by the author of the schema can be displayed.
One advantage of taking this rule-based approach is that in many cases the Schematron rules can easily be created by rewording the desired constraint as written in plain English. For example, a simple content model can be written in plain English like this: "The Person element should in the XML instance document have an attribute Title and contain the elements Name and Sex in that order. If the value of the Title attribute is 'Mr' then the value of the Sex element must be 'Male'".
In this sentence the context in which the assertions should be applied is clearly stated as the Person element, and there are four different assertions:
- The context element (Person) should have an attribute Title
- The context element should contain two child elements, Name and Sex
- The child element Name should appear before the child element Sex
- If attribute Title has the value 'Mr' then the element Sex must have the value 'Male'
In order to implement the path-expressions used in the rules in Schematron,
the W3C XPath language (XPath) is used with various extensions provided by XSLT
(Extensible Stylesheet Language Transformations). Since the path-expressions
are built on top of XPath and XSLT it is also trivial to implement Schematron
using XSLT, which is shown in the section Schematron processing
below.
It has already been mentioned that Schematron makes various assertions
based on a specific context in a document. Both the assertions and the context
make up two of the four layers in Schematron's fixed four-layer hierarchy
that consists of phases (top-level), patterns, rules (defines the context)
and assertions.
Schematron hierarchy
In this introduction only three of these layers (patterns, rules and assertions) will be covered, since these are the most important for using embedded Schematron rules in W3C XML Schemas and RELAX-NG. For a full description of the Schematron schema language see the Schematron International Standard.
In short the three layers covered in this section are constructed so
that each assertion is grouped into rules and each rule defines a context.
Each rule is then grouped into patterns, which are given a name that is displayed
together with the error message (there is really more to patterns than just
a grouping mechanism but for this introduction this is sufficient).
The example in the introduction specified a very simple content model (see below)
that will be used to explain the three layers in the hierarchy.
<Person Title="Mr">
<Name>Eddie</Name>
<Sex>Male</Sex>
</Person>
Assertions
The bottom layer in the hierarchy is the assertions, which are used to specify the constraints that should be checked within a specific context of the XML instance document. In a Schematron schema the typical element used to define assertions is assert. The assert element has a test attribute, which is a modified XPath expression. In the above example four assertions were made on the document in order to specify the content model, namely:
- The context element (Person) should have an attribute Title
- The context element should contain two child elements, Name and Sex
- The child element Name should appear before the child element Sex
- If attribute Title has the value 'Mr' then the element Sex must have the value 'Male'
Written using Schematron assertions this would be:
<assert test="@Title">The element Person must have a Title attribute.</assert>
<assert test="count(*) = 2 and count(Name) = 1 and count(Sex)= 1">The element
Person should have the child elements Name and Sex.</assert>
<assert test="*[1] = Name">The element Name must appear before element Sex.</assert>
<assert test="(@Title = 'Mr' and Sex = 'Male') or @Title != 'Mr'">If the Title
is "Mr" then the sex of the person must be "Male".</assert>
For people familiar with XPath these assertions are easy to understand, and even for people with limited XPath experience they are rather straightforward. The first assertion simply tests for the occurrence of an attribute Title. The second assertion tests that the total number of children is equal to two and that there is one Name element and one Sex element. The third assertion tests that the first child element is Name, and the last assertion tests that if the Title is 'Mr' then the sex of the person must be 'Male'.
If the condition in the test attribute is not fulfilled, the content of the assertion element will be displayed to the user. So, for example, if the third condition was broken (*[1] = Name) then the following message would be displayed:
The element Name must appear before element Sex.
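To make the failure case concrete, here is a minimal sketch of an instance document that would trigger this message. It is a hypothetical variation of the Person example above, with the two child elements swapped:

```xml
<!-- Hypothetical instance: Sex appears before Name, so *[1] selects Sex -->
<Person Title="Mr">
  <Sex>Male</Sex>
  <Name>Eddie</Name>
</Person>
```

Note that the = operator compares string values rather than element names: here the test fails because 'Male' differs from 'Eddie'. If the two elements could ever hold the same text, a test on the element itself, such as *[1][self::Name], would be a stricter way to express the same constraint.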
Each of the above assertions has a condition that is evaluated, but the assertion does not define where in the XML instance document this condition should be checked. For example, the first assertion tests for the occurrence of the attribute Title, but it does not specify on which element in the XML instance document this assertion should be applied. The next layer in the hierarchy, the rules, specifies this location (the context of the assertion).
Rules
The rules in Schematron are declared by using the rule element, and the rule element has a context attribute. The value of the context attribute is the same kind of modified XPath expression as for the test attribute on the assertions. As the name suggests, the context attribute is used to specify the context in the XML instance document where the assertions should be applied. In the above example the context was specified to be the Person element, and a Schematron rule with the Person element as context would simply be:
<rule context="Person"></rule>
Since the rules are used to group together all the assertions that share the same context, the rules are designed so that the assertions are declared as children of the rule element. For the above example this means that the complete Schematron rule would be:
<rule context="Person">
<assert test="@Title">The element Person must have
a Title attribute.</assert>
<assert test="count(*) = 2 and count(Name) = 1 and count(Sex) =
1">The element Person should have the child elements Name and Sex.</assert>
<assert test="*[1] = Name">The element Name must appear
before element Sex.</assert>
<assert test="(@Title = 'Mr' and Sex = 'Male') or @Title != 'Mr'">If
the Title is "Mr" then the sex of the person must be "Male".</assert>
</rule>
This means that all the assertions in the rule will be tested on every Person element in the XML instance document. If the context should not be all the Person elements, it is easy to change the XPath to define a more restricted context. The value Database/Person would for example set the context to be all the Person elements that have the element Database as their parent.
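As a sketch, a rule restricted in this way could look as follows. The Database element is only the illustrative parent named in the text, and the assertion is reused from the rule above:

```xml
<!-- Applies only to Person elements whose parent is a Database element -->
<rule context="Database/Person">
  <assert test="@Title">The element Person must have a Title attribute.</assert>
</rule>
```

A Person element that appears under any other parent would not be selected as context, and so would not be checked by this rule.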
Patterns
The third layer in the hierarchy is the pattern, declared using the pattern element, which is used to group together different rules. The pattern element also has a name attribute that will be displayed in the output when the pattern is checked. For the above assertions you could for example have two patterns, one for checking the structure and one for checking the co-occurrence constraint. Since patterns group together different rules, Schematron is designed so that rules are declared as children of the pattern element. This means that the above example, using the two patterns, would look like this:
<pattern name="Check structure">
<rule context="Person">
<assert test="@Title">The element Person
must have a Title attribute.</assert>
<assert test="count(*) = 2 and count(Name) =
1 and count(Sex) = 1">The element Person should have the child elements Name
and Sex.</assert>
<assert test="*[1] = Name">The element
Name must appear before element Sex.</assert>
</rule>
</pattern>
<pattern name="Check co-occurrence constraints">
<rule context="Person">
<assert test="(@Title = 'Mr' and Sex = 'Male')
or @Title != 'Mr'">If the Title is "Mr" then the sex of the person must be
"Male".</assert>
</rule>
</pattern>
The name of the pattern will always be displayed in the output regardless of whether the assertions fail or succeed, and if an assertion fails the output will also contain the content of the assertion element. However, additional information is also displayed together with the assertion text to help the user locate the source of the failed assertion. For example, if the co-occurrence constraint above was violated by having Title='Mr' and Sex='Female' then the following diagnostic would be generated by Schematron:
From pattern "Check structure":
From pattern "Check co-occurrence constraints":
Assertion fails: "If the Title is "Mr" then the sex of
the person must be "Male"." at
/Person[1]
<Person Title="Mr">...</>
So, the pattern names are always displayed while the assertion text is only displayed when the assertion fails. The additional information starts with an XPath that shows the location of the context element in the instance document (in this case the first Person element), and then on a new line the start tag of the context element is displayed.
The assertion to test the co-occurrence constraint is not trivial, and in fact this rule could be written in a simpler way by using an XPath predicate when selecting the context. Instead of having the context set to all Person elements, the co-occurrence constraint can be simplified by specifying the context to be only the Person elements that have the attribute Title='Mr'. If the rule was specified using this technique, the co-occurrence constraint could be described like this:
<rule context="Person[@Title='Mr']">
<assert test="Sex = 'Male'">If the Title is "Mr"
then the sex of the person must be "Male".</assert>
</rule>
So, by moving some of the logic from the assertion to the specification of the context, the complexity of the rule has been decreased. This technique is often very useful when writing Schematron schemas.
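The same technique can be applied to the State and Country constraint mentioned in the introduction. The sketch below assumes, purely for illustration, that both elements are children of a hypothetical Address element; the original text does not say where they appear.

```xml
<!-- Hypothetical structure assumed here:
     <Address><State>NSW</State><Country>Australia</Country></Address> -->
<rule context="Address[State = 'NSW']">
  <assert test="Country = 'Australia'">If the State is "NSW"
    then the Country must be "Australia".</assert>
</rule>
```

Address elements with any other State value are never selected as context, so the test does not need a separate branch for them.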
This concludes the introduction to patterns. All that is left to do is to wrap the patterns of the Schematron schema in a schema element and to specify that all the Schematron elements used are defined in the Schematron namespace, http://purl.oclc.org/dsdl/schematron. This means that the complete Schematron schema for the example would be:
<?xml version="1.0" encoding="UTF-8"?>
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron">
<sch:pattern name="Check structure">
<sch:rule context="Person">
<sch:assert test="@Title">The
element Person must have a Title attribute</sch:assert>
<sch:assert test="count(*)
= 2 and count(Name) = 1 and count(Sex) = 1">The element Person should have
the child elements Name and Sex.</sch:assert>
<sch:assert test="*[1] = Name">The
element Name must appear before element Sex.</sch:assert>
</sch:rule>
</sch:pattern>
<sch:pattern name="Check co-occurrence constraints">
<sch:rule context="Person">
<sch:assert test="(@Title
= 'Mr' and Sex = 'Male') or @Title != 'Mr'">If the Title is "Mr" then the
sex of the person must be "Male".</sch:assert>
</sch:rule>
</sch:pattern>
</sch:schema>
Schematron can also be used to validate XML instance documents that use namespaces. Each namespace used in the XML instance document should be declared in the Schematron schema. The element used to declare namespaces is the ns element, which should appear as a child of the schema element. The ns element has two attributes, uri and prefix, which are used to define the namespace URI and the namespace prefix. So, if the XML instance document in the example had been defined in the namespace www.topologi.com/example then the Schematron schema would look like this:
<?xml version="1.0" encoding="UTF-8"?>
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron">
<sch:ns uri="www.topologi.com/example" prefix="ex"/>
<sch:pattern name="Check structure">
<sch:rule context="ex:Person">
<sch:assert test="@Title">The
element Person must have a Title attribute</sch:assert>
<sch:assert test="count(ex:*)
= 2 and count(ex:Name) = 1 and count(ex:Sex) = 1">The element Person should have
the child elements Name and Sex.</sch:assert>
<sch:assert test="ex:*[1] = ex:Name">The
element Name must appear before element Sex.</sch:assert>
</sch:rule>
</sch:pattern>
<sch:pattern name="Check co-occurrence constraints">
<sch:rule context="ex:Person">
<sch:assert test="(@Title
= 'Mr' and ex:Sex = 'Male') or @Title != 'Mr'">If the Title is "Mr" then the
sex of the person must be "Male".</sch:assert>
</sch:rule>
</sch:pattern>
</sch:schema>
Note that all XPath expressions that test element values now include the namespace prefix ex.
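For reference, an instance document matching this schema might look like the sketch below. It assumes the namespace is declared as the default namespace on Person, which is only one of several equivalent ways to write it.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- The default namespace applies to Person, Name and Sex, but not to the Title attribute -->
<Person xmlns="www.topologi.com/example" Title="Mr">
  <Name>Eddie</Name>
  <Sex>Male</Sex>
</Person>
```

Unprefixed attributes are never placed in a default namespace, which is why the tests above still refer to @Title without the ex prefix. The prefix used in the Schematron schema does not have to match any prefix used in the instance document; only the namespace URI has to match.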
Schematron processing
One of the major advantages of Schematron is that you do not need a specially written Schematron processor in order to validate XML instance documents. Since Schematron is built on XPath and XSLT functions, all you need is an XSLT processor. The Schematron processing then works in two steps (see Figure 1):
- The Schematron schema is first turned into a validating XSLT stylesheet by transforming it with an XSLT stylesheet initially provided by Academia Sinica Computing Centre. These stylesheets (schematron-basic.xsl, schematron-message.xsl and schematron-report.xsl) can be found at the Schematron.com website, and the different stylesheets generate different output. For example, schematron-basic.xsl is used to generate simple text output like in the example above.
- This validating stylesheet is then used on the XML instance document, and the result will be a report that is based on the rules and assertions in the original Schematron schema.
This means that it is very easy to set up a Schematron processor, because the only thing needed is an XSLT processor together with one of the Schematron stylesheets. Here is an example of how to validate the example used above, where the XML instance document is called Person.xml and the Schematron schema is called Person.sch. The example uses Saxon as the XSLT processor:
C:\>saxon -o validate_person.xsl Person.sch schematron-basic.xsl
C:\>saxon Person.xml validate_person.xsl
From pattern "Check structure":
From pattern "Check co-occurrence constraints":
Assertion fails: "If the Title is "Mr" then the sex of
the person must be "Male"." at
/Person[1]
<Person Title="Mr">...</>
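To give a feel for what the first step produces, the following is a heavily simplified, hand-written sketch of a validating stylesheet for the rule with context Person[@Title='Mr']. It is not the actual output of schematron-basic.xsl, which is more elaborate (as the report above shows, it also prints pattern names and the location of the context element); it only illustrates how a rule can map to a template and an assertion to a test.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Corresponds to <rule context="Person[@Title='Mr']"> -->
  <xsl:template match="Person[@Title='Mr']">
    <!-- Corresponds to <assert test="Sex = 'Male'">: report when the test is false -->
    <xsl:if test="not(Sex = 'Male')">
      <xsl:text>Assertion fails: If the Title is "Mr" then the sex of the person must be "Male".&#10;</xsl:text>
    </xsl:if>
    <xsl:apply-templates/>
  </xsl:template>

  <!-- Suppress the default copying of text nodes to the output -->
  <xsl:template match="text()"/>
</xsl:stylesheet>
```

Applied to an instance document with an XSLT 1.0 processor, a stylesheet of this shape would emit the message only for Person elements with Title 'Mr' whose Sex is not 'Male', and nothing otherwise.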
Embedded Schematron Rules in W3C XML Schema
One really good thing about W3C XML Schema is that it is very easy to extend, and one way to do so is to use the annotation mechanism. The annotation element can have two child elements, namely documentation and appinfo. The documentation element is mainly intended to provide humans with information about the schema, while the appinfo element is intended for applications. The appinfo element is defined so that it can have any well-formed XML content from any namespace. Since a Schematron rule uses XML syntax, this is the perfect place to embed rules from Schematron.
Almost all elements defined by the W3C XML Schema specification can have the annotation child element, and the most logical place to put a Schematron rule is on the element declaration to which the rule applies. This means that the W3C XML Schema element declaration and the Schematron rule that applies to the element are declared in the same place. However, since the Schematron rules add more code to the already verbose W3C XML Schema, you can just as easily include all the Schematron rules in, for example, the annotation element of the schema element itself. This may improve the readability of the schema by concentrating the Schematron rules at the beginning of the W3C XML Schema.
Here is a very simple W3C XML Schema that only defines one element:
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name="Root" type="xs:string">
</xs:element>
</xs:schema>
Now, if a Schematron rule should have the Root element as its context, this rule could be added as an embedded Schematron rule within the appinfo element of the declaration like this:
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name="Root" type="xs:string">
<xs:annotation>
<xs:appinfo>
<sch:pattern
name="Test constraints on the Root element" xmlns:sch="http://purl.oclc.org/dsdl/schematron">
<sch:rule
context="Root">
<sch:assert
test="test-condition">Error message when the assertion condition is broken...</sch:assert>
</sch:rule>
</sch:pattern>
</xs:appinfo>
</xs:annotation>
</xs:element>
</xs:schema>
As can be seen from the example, all embedded Schematron rules must be added at the pattern level, and all Schematron elements must be declared in the Schematron namespace, http://purl.oclc.org/dsdl/schematron. The rules are embedded at the pattern level because this way the pattern name will be included in the output, which helps identify which rule was broken if there is a validation problem in the XML instance document.
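The alternative placement mentioned earlier, with the Schematron rules collected in the annotation of the schema element itself, could be sketched as follows. The pattern is the same placeholder pattern as in the example above; only its position changes.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- All embedded Schematron patterns are gathered at the top of the schema -->
  <xs:annotation>
    <xs:appinfo>
      <sch:pattern name="Test constraints on the Root element"
                   xmlns:sch="http://purl.oclc.org/dsdl/schematron">
        <sch:rule context="Root">
          <sch:assert test="test-condition">Error message when the assertion condition is broken...</sch:assert>
        </sch:rule>
      </sch:pattern>
    </xs:appinfo>
  </xs:annotation>
  <xs:element name="Root" type="xs:string"/>
</xs:schema>
```

Because each rule carries its own context attribute, moving the pattern away from the element declaration does not change which elements it applies to.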
Now that we know how to write Schematron schemas and have seen an example of an embedded Schematron rule in a W3C XML Schema, we can have a look at how to solve the different problems stated in the introduction.
Dependent attributes
To illustrate we will use an example where we have a socket element with two attributes, hostName and hostAddress. The requirement is that these two attributes are mutually exclusive, so that if one is present the other cannot be present and vice versa. It is also required that at least one of the attributes must appear.
W3C XML Schema will be used to declare the socket element and to declare that the socket element can have two attributes, hostName and hostAddress. The closest we can get to the above constraint in W3C XML Schema is to declare both attributes as optional, since neither hostName nor hostAddress is required on its own. This schema could look like the following:
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name="socket">
<xs:complexType>
<xs:attribute name="hostName"
type="xs:string" use="optional"/>
<xs:attribute name="hostAddress"
type="xs:string" use="optional"/>
</xs:complexType>
</xs:element>
</xs:schema>
A Schematron rule can now be embedded on the socket element to add the extra constraints. In this case the constraint is divided into two assertions, which allows for a separate error message for each assertion:
- Both hostName and hostAddress cannot be present at the same time
- At least one of hostName and hostAddress must be present
The W3C XML Schema with an embedded Schematron rule for this example would look like this:
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name="socket">
<xs:annotation>
<xs:appinfo>
<sch:pattern
name="Mutually exclusive attributes on the socket element" xmlns:sch="http://purl.oclc.org/dsdl/schematron">
<sch:rule
context="socket">
<sch:report
test="@hostName and @hostAddress">On a socket element only one of the attributes
hostName and hostAddress are allowed, not both.</sch:report>
<sch:assert
test="@hostName | @hostAddress">One of the attributes hostName or hostAddress
must be present on the socket element</sch:assert>
</sch:rule>
</sch:pattern>
</xs:appinfo>
</xs:annotation>
<xs:complexType>
<xs:attribute name="hostName"
type="xs:string" use="optional"/>
<xs:attribute name="hostAddress"
type="xs:string" use="optional"/>
</xs:complexType>
</xs:element>
</xs:schema>
Against this schema the following two instance documents are valid:
<?xml version="1.0" encoding="UTF-8"?>
<socket hostAddress="192.168.200.76"/>
<?xml version="1.0" encoding="UTF-8"?>
<socket hostName="pc100"/>
while the following two instance documents are invalid:
<?xml version="1.0" encoding="UTF-8"?>
<socket hostAddress="192.168.200.76" hostName="pc100"/>
<?xml version="1.0" encoding="UTF-8"?>
<socket/>
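If separate error messages are not needed, the two checks could be folded into a single assertion. The following is a sketch of an alternative formulation of the pattern, not the approach used in the schema above:

```xml
<sch:pattern name="Exactly one host attribute on the socket element"
             xmlns:sch="http://purl.oclc.org/dsdl/schematron">
  <sch:rule context="socket">
    <!-- The union selects whichever of the two attributes are present -->
    <sch:assert test="count(@hostName | @hostAddress) = 1">Exactly one of the
      attributes hostName and hostAddress must be present on the socket element.</sch:assert>
  </sch:rule>
</sch:pattern>
```

The trade-off is a less specific message: the user is not told whether an attribute is missing or whether both are present.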
Interleaving of elements
The constraint put on the all group, that each element declared in its content must have its maxOccurs attribute fixed to 1, simplifies processing but limits the usefulness of the group. For example, the following content model is not allowed:
<xs:element name="Root">
<xs:complexType>
<xs:all>
<xs:element name="child1"
type="xs:string" minOccurs="5" maxOccurs="5"/>
<xs:element name="child2"
type="xs:string" minOccurs="2" maxOccurs="2"/>
<xs:element name="child3"
type="xs:string" minOccurs="0"/>
<xs:element name="child4"
type="xs:string" minOccurs="3" maxOccurs="7"/>
</xs:all>
</xs:complexType>
</xs:element>
By changing the all group to a choice group, and by making the choice group itself optional and repeatable, a content model is created in which the different child elements can appear in any order. If, for example, all the child1 elements should be grouped together in the instance document, the minOccurs constraint can be kept as it is. If the child elements do not have to be grouped together, the minOccurs constraint can be set to 1 to allow for a full mixture of the elements:
<xs:element name="Root">
<xs:complexType>
<xs:choice minOccurs="0" maxOccurs="unbounded">
<xs:element name="child1"
type="xs:string" minOccurs="1" maxOccurs="5"/>
<xs:element name="child2"
type="xs:string" minOccurs="1" maxOccurs="2"/>
<xs:element name="child3"
type="xs:string" minOccurs="0"/>
<xs:element name="child4"
type="xs:string" minOccurs="1" maxOccurs="7"/>
</xs:choice>
</xs:complexType>
</xs:element>
Unfortunately this also removes the occurrence constraints on the children. Sometimes this is not a very important requirement, and if that is the case the above will probably be sufficient. If, however, the occurrence constraints on the child elements are important, it is trivial to add a Schematron rule to check them. The following schema will illustrate:
<xs:element name="Root">
<xs:annotation>
<xs:appinfo>
<sch:pattern name="Extended_all"
xmlns:sch="http://purl.oclc.org/dsdl/schematron">
<sch:rule context="Root">
<sch:assert
test="count(child1) = 5">You must have exactly 5 child1 elements.</sch:assert>
<sch:assert
test="count(child2) = 2">You must have exactly 2 child2 elements.</sch:assert>
<sch:assert
test="count(child3) &lt;= 1">You can only have one child3 element.</sch:assert>
<sch:assert
test="count(child4) >= 3 and count(child4) &lt;= 7">You must have at
least 3 child4 elements but you can’t have more than 7.</sch:assert>
</sch:rule>
</sch:pattern>
</xs:appinfo>
</xs:annotation>
<xs:complexType>
<xs:choice minOccurs="0" maxOccurs="unbounded">
<xs:element name="child1"
type="xs:string" minOccurs="1" maxOccurs="5"/>
<xs:element name="child2"
type="xs:string" minOccurs="1" maxOccurs="2"/>
<xs:element name="child3"
type="xs:string" minOccurs="0"/>
<xs:element name="child4"
type="xs:string" minOccurs="1" maxOccurs="7"/>
</xs:choice>
</xs:complexType>
</xs:element>
This schema would validate a true mixture of all the child elements. If the child elements should be grouped together, the only change would be to preserve the minOccurs constraint on each child element (5 for child1, 2 for child2 and 3 for child4). However, since child4's occurrence is a range, a new Schematron rule is needed to assert that all child4 elements are grouped together. The new schema would look like this:
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name="Root">
<xs:annotation>
<xs:appinfo>
<sch:pattern
name="Extended_all" xmlns:sch="http://purl.oclc.org/dsdl/schematron">
<sch:rule
context="Root">
<sch:assert
test="count(child1) = 5">You must have exactly 5 child1 elements.</sch:assert>
<sch:assert
test="count(child2) = 2">You must have exactly 2 child2 elements.</sch:assert>
<sch:assert
test="count(child3) &lt;= 1">You can only have one child3 element.</sch:assert>
<sch:assert
test="count(child4) >= 3 and count(child4) &lt;= 7">You must have at
least 3 child4 elements but you can’t have more than 7.</sch:assert>
</sch:rule>
<sch:rule
context="Root/*">
<sch:assert
test="not(preceding-sibling::*[1][name() != name(current())][preceding-sibling::*[name()
= name(current())]])">All <sch:name/> elements must be grouped with the other
<sch:name/> elements.</sch:assert>
</sch:rule>
</sch:pattern>
</xs:appinfo>
</xs:annotation>
<xs:complexType>
<xs:choice minOccurs="0"
maxOccurs="unbounded">
<xs:element
name="child1" type="xs:string" minOccurs="5" maxOccurs="5"/>
<xs:element
name="child2" type="xs:string" minOccurs="2" maxOccurs="2"/>
<xs:element
name="child3" type="xs:string" minOccurs="0"/>
<xs:element
name="child4" type="xs:string" minOccurs="3" maxOccurs="7"/>
</xs:choice>
</xs:complexType>
</xs:element>
</xs:schema>
The new rule has each of the child elements of Root as its context, so this rule will apply to all the children of Root. The assertion in the rule uses the preceding-sibling axis to assert that all the child elements must be grouped together. In this case it would have been enough to apply this rule to child4 (since it is the only element with an occurrence range), but it is just as easy to apply the same rule to all the children.
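As an illustration, consider the following hypothetical instance. It satisfies all four occurrence assertions (five child1, two child2 and six child4 elements), but the child4 elements are split into two runs:

```xml
<Root>
  <child4>a</child4>
  <child4>b</child4>
  <child4>c</child4>
  <child2>d</child2>
  <child2>e</child2>
  <!-- This child4 follows a child2 that itself has earlier child4 siblings -->
  <child4>f</child4>
  <child4>g</child4>
  <child4>h</child4>
  <child1>i</child1>
  <child1>j</child1>
  <child1>k</child1>
  <child1>l</child1>
  <child1>m</child1>
</Root>
```

For the fourth child4 element, preceding-sibling::*[1] selects a child2 element, whose name differs from child4 and which has earlier child4 siblings, so the grouping assertion fails for that element and the message would refer to child4. Every other element either directly follows an element of the same name or starts the first run of its name, and therefore passes.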
Co-occurrence constraints
The number of examples for co-occurrence constraints is more or less unlimited, and one example was used in the Introduction to Schematron section above. In that example the co-occurrence constraint was that if the Title attribute on element Person had the value 'Mr' then the value of the Sex sub-element must be 'Male'.
Instead of defining everything using a Schematron schema, this example will show how to define the structure in W3C XML Schema and the co-occurrence constraint with a Schematron rule. The W3C XML Schema for this simple example is straightforward:
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name="Person">
<xs:complexType>
<xs:sequence>
<xs:element
name="Name" type="xs:string"/>
<xs:element
name="Sex">
<xs:simpleType>
<xs:restriction
base="xs:string">
<xs:enumeration
value="Male"/>
<xs:enumeration
value="Female"/>
</xs:restriction>
</xs:simpleType>
</xs:element>
</xs:sequence>
<xs:attribute name="Title"
type="xs:string" use="required"/>
</xs:complexType>
</xs:element>
</xs:schema>
This schema defines the structure of the XML instance document, and the only thing the Schematron rule needs to define is the co-occurrence constraint. The complete schema with an embedded Schematron rule would look like this:
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name="Person">
<xs:annotation>
<xs:appinfo>
<sch:pattern
name="Co-occurrence constraint on attribute Title" xmlns:sch="http://purl.oclc.org/dsdl/schematron">
<sch:rule
context="Person[@Title='Mr']">
<sch:assert
test="Sex = 'Male'">If the Title is "Mr" then the sex of the person
must be "Male".</sch:assert>
</sch:rule>
</sch:pattern>
</xs:appinfo>
</xs:annotation>
<xs:complexType>
<xs:sequence>
<xs:element
name="Name" type="xs:string"/>
<xs:element
name="Sex">
<xs:simpleType>
<xs:restriction
base="xs:string">
<xs:enumeration
value="Male"/>
<xs:enumeration
value="Female"/>
</xs:restriction>
</xs:simpleType>
</xs:element>
</xs:sequence>
<xs:attribute name="Title"
type="xs:string" use="required"/>
</xs:complexType>
</xs:element>
</xs:schema>
|
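To make the constraint concrete, here is a minimal Python sketch of what the embedded rule checks on an instance document. It is not how a Schematron processor works internally; it simply restates the rule context and the assertion by hand. The function name and the two sample instances are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

MESSAGE = 'If the Title is "Mr" then the sex of the person must be "Male".'


def check_title_sex(person):
    """Return the list of failed assertions for one Person element."""
    errors = []
    # Rule context: Person[@Title='Mr']
    if person.tag == "Person" and person.get("Title") == "Mr":
        # Assertion: Sex = 'Male' (true if any Sex child has that value)
        if not any(sex.text == "Male" for sex in person.findall("Sex")):
            errors.append(MESSAGE)
    return errors


# Hypothetical instances; both are structurally valid against the schema.
good = ET.fromstring('<Person Title="Mr"><Name>Eddie</Name><Sex>Male</Sex></Person>')
bad = ET.fromstring('<Person Title="Mr"><Name>Eddie</Name><Sex>Female</Sex></Person>')

print(check_title_sex(good))  # []
print(check_title_sex(bad))   # one message
```

Both instances pass W3C XML Schema validation; only the Schematron rule distinguishes them.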
Dependency between XML documents
By using the document() function in XSLT it is also possible to apply constraints between
XML instance documents
and not just within a single document. To illustrate this we use two simple XML instance documents:
one document
that contains a single Person element with a Name
sub-element, and one document that contains a single Car element
with an
Owner attribute. The W3C XML Schemas for these documents would
be:
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name="Person">
<xs:complexType>
<xs:sequence>
<xs:element
name="Name" type="xs:string"/>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>
|
and
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name="Car">
<xs:complexType>
<xs:attribute name="Owner"
type="xs:string" use="required"/>
</xs:complexType>
</xs:element>
</xs:schema>
|
The instance documents would be:
<?xml version="1.0" encoding="UTF-8"?>
<Person>
<Name>Eddie</Name>
</Person>
|
and
<?xml version="1.0" encoding="UTF-8"?>
<Car Owner="Eddie"/>
|
Now we want to make sure that the value of the Owner attribute
in Car.xml matches the value of Person/Name in
Person.xml. This can be done by inserting a Schematron rule in the W3C XML Schema that defines
the Car document:
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name="Car">
<xs:annotation>
<xs:appinfo>
<sch:pattern
name="Car owner must link to a person" xmlns:sch="http://purl.oclc.org/dsdl/schematron">
<sch:rule
context="Car">
<sch:assert
test="document('Person.xml')/Person/Name = @Owner">The owner of the
car must match the name of the person in Person.xml.</sch:assert>
</sch:rule>
</sch:pattern>
</xs:appinfo>
</xs:annotation>
<xs:complexType>
<xs:attribute name="Owner"
type="xs:string" use="required"/>
</xs:complexType>
</xs:element>
</xs:schema>
|
The document() function will bring in the elements from the Person.xml file and the assertion
will make sure that
the value of the Owner attribute matches the value of the Person/Name element.
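The cross-document comparison can be sketched in Python as follows. This is only an illustration of the logic of the assertion, under the assumption that both documents are already available as strings; the function name and the second Car instance (with the owner "Anna") are hypothetical.

```python
import xml.etree.ElementTree as ET

# Content of Person.xml as given in the example.
PERSON_XML = "<Person><Name>Eddie</Name></Person>"


def owner_is_known(car, person_root):
    """Mimic: document('Person.xml')/Person/Name = @Owner."""
    owner = car.get("Owner")
    # The XPath comparison is true if any Name element equals the owner.
    return person_root.tag == "Person" and any(
        name.text == owner for name in person_root.findall("Name")
    )


person = ET.fromstring(PERSON_XML)
print(owner_is_known(ET.fromstring('<Car Owner="Eddie"/>'), person))  # True
print(owner_is_known(ET.fromstring('<Car Owner="Anna"/>'), person))   # False
```

One practical point when running the real rule: in XSLT 1.0 a relative URI passed as a string to document() is resolved against the stylesheet that contains the call, not against the instance document, so Person.xml must be reachable from the location of the generated validation stylesheet.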
Embedded Schematron Rules in RELAX-NG
Unlike in W3C XML Schema, the embedded Schematron rules in a RELAX-NG schema do
not have to be declared within a specific element. Since a RELAX-NG validator will ignore
all elements not in the RELAX-NG namespace (http://relaxng.org/ns/structure/1.0),
the Schematron rules can be declared between any RELAX-NG elements.
Here is a very simple RELAX-NG schema:
<?xml version="1.0" encoding="UTF-8"?>
<element name="Root" xmlns="http://relaxng.org/ns/structure/1.0">
<text/>
</element>
|
Now, if a Schematron rule should have the Root element as its context this rule could be
added as an embedded Schematron rule like this:
<?xml version="1.0" encoding="UTF-8"?>
<element name="Root" xmlns="http://relaxng.org/ns/structure/1.0">
<sch:pattern name="Test constraints on the Root element" xmlns:sch="http://purl.oclc.org/dsdl/schematron">
<sch:rule context="Root">
<sch:assert test="test-condition">Error message when the assertion condition is
broken...</sch:assert>
</sch:rule>
</sch:pattern>
<text/>
</element>
|
The Schematron rules embedded in a RELAX-NG schema are inserted on the pattern
level and need to be declared in the Schematron namespace (http://purl.oclc.org/dsdl/schematron),
just like for W3C XML
Schemas.
Co-occurrence constraints
Although RELAX-NG has better support for co-occurrence constraints than W3C
XML Schema, there are still many types of co-occurrence constraints that cannot be
expressed in RELAX-NG. One such example is identity constraints, which have been left out of the current
version of RELAX-NG.
As an example we are going to use a schema that defines a sports tournament.
The tournament has a name, a number of teams that each have a unique id, and a number of matches
that define which teams will meet in each match. Typically such a schema should validate that every
team in a match is also one of the teams registered in the
tournament. Although some basic identity constraints can be expressed using the DTD types
ID and
IDREF, more complex identity constraints will have to be checked with embedded
Schematron rules.
A RELAX-NG schema for the tournament described above could look like this:
<?xml version="1.0" encoding="UTF-8"?>
<grammar xmlns="http://relaxng.org/ns/structure/1.0">
<start>
<ref name="Tournament"/>
</start>
<define name="Tournament">
<element name="Tournament">
<element name="Name"><text/></element>
<element name="Teams">
<!-- We must have at
least two teams -->
<ref name="Team"/>
<oneOrMore>
<ref name="Team"/>
</oneOrMore>
</element>
<element name="Matches">
<oneOrMore>
<element name="Match">
<element name="Team"><text/></element>
<element name="Team"><text/></element>
<attribute name="id"/>
</element>
</oneOrMore>
</element>
</element>
</define>
<define name="Team">
<element name="Team">
<attribute name="id"/>
<optional>
<attribute name="Name"/>
</optional>
</element>
</define>
</grammar>
|
An XML instance document that would be valid against this schema is:
<?xml version="1.0" encoding="UTF-8"?>
<Tournament>
<Name>FIFA World Cup</Name>
<Teams>
<Team Name="Sweden" id="t1"/>
<Team Name="Argentina" id="t2"/>
<Team Name="Nigeria" id="t3"/>
<Team Name="England" id="t4"/>
</Teams>
<Matches>
<Match id="m1">
<Team>t1</Team>
<Team>t4</Team>
</Match>
<Match id="m2">
<Team>t2</Team>
<Team>t3</Team>
</Match>
</Matches>
</Tournament>
|
Unfortunately the RELAX-NG schema will also accept the XML instance document even if
the id for one of the teams playing in a match doesn't match the id of a team that
has been registered in the tournament (that is, a team that appears as a child of the Teams element).
It is very easy to add a Schematron rule to check this extra constraint. It could,
for example, be done by adding an embedded rule to the definition of the pattern that
matches the Match element:
<element name="Matches">
<oneOrMore>
<element name="Match">
<sch:pattern name="Check that each team is registered in
the tournament" xmlns:sch="http://purl.oclc.org/dsdl/schematron">
<sch:rule context="Matches/Match/Team">
<sch:assert test="text() = ../../../Teams/Team/@id"
>Each Team in a Match must be a registered Team in the tournament.</sch:assert>
</sch:rule>
</sch:pattern>
<element name="Team"><text/></element>
<element name="Team"><text/></element>
<attribute name="id"/>
</element>
</oneOrMore>
</element>
|
With this new definition of the pattern, the following XML instance document would be invalid
<?xml version="1.0" encoding="UTF-8"?>
<Tournament>
<Name>FIFA World Cup</Name>
<Teams>
<Team Name="Sweden" id="t1"/>
<Team Name="Argentina" id="t2"/>
</Teams>
<Matches>
<Match id="m1">
<Team>t1</Team>
<Team>t4</Team>
</Match>
</Matches>
</Tournament>
|
since a team with id="t4" is not registered in the tournament.
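The identity check performed by the rule can be restated in a few lines of Python. This is a sketch of the logic only, not of the Schematron processing itself; the function name is an assumption, and the instance is the invalid document shown above.

```python
import xml.etree.ElementTree as ET


def unregistered_teams(tournament):
    """Return ids used in Matches/Match/Team that no Teams/Team declares."""
    registered = {team.get("id") for team in tournament.findall("Teams/Team")}
    return [
        team.text
        for team in tournament.findall("Matches/Match/Team")
        if team.text not in registered
    ]


doc = ET.fromstring("""
<Tournament>
  <Name>FIFA World Cup</Name>
  <Teams>
    <Team Name="Sweden" id="t1"/>
    <Team Name="Argentina" id="t2"/>
  </Teams>
  <Matches>
    <Match id="m1">
      <Team>t1</Team>
      <Team>t4</Team>
    </Match>
  </Matches>
</Tournament>
""")

print(unregistered_teams(doc))  # ['t4']
```

An empty list corresponds to an instance in which every assertion of the embedded rule holds.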
Dependency between XML documents
Neither RELAX-NG nor W3C XML Schema was designed to handle dependencies between
XML instance documents, but sometimes this is a necessary requirement. For example, if the teams in the previous example were
put in a separate XML instance document,
we would still need to validate that each team in a match is registered as a child
of the Teams element.
With this new design the RELAX-NG schema for the tournament would be:
<?xml version="1.0" encoding="UTF-8"?>
<grammar xmlns="http://relaxng.org/ns/structure/1.0">
<start>
<ref name="Tournament"/>
</start>
<define name="Tournament">
<element name="Tournament">
<element name="Name"><text/></element>
<element name="Matches">
<oneOrMore>
<element name="Match">
<element name="Team"><text/></element>
<element name="Team"><text/></element>
<attribute name="id"/>
</element>
</oneOrMore>
</element>
</element>
</define>
</grammar>
|
with the corresponding instance:
<?xml version="1.0" encoding="UTF-8"?>
<Tournament>
<Name>FIFA World Cup</Name>
<Matches>
<Match id="m1">
<Team>t1</Team>
<Team>t4</Team>
</Match>
<Match id="m2">
<Team>t2</Team>
<Team>t3</Team>
</Match>
</Matches>
</Tournament>
|
The schema that defines the teams would be:
<?xml version="1.0" encoding="UTF-8"?>
<grammar xmlns="http://relaxng.org/ns/structure/1.0">
<start>
<ref name="Teams"/>
</start>
<define name="Teams">
<element name="Teams">
<!-- We must have at least two teams -->
<ref name="Team"/>
<oneOrMore>
<ref name="Team"/>
</oneOrMore>
</element>
</define>
<define name="Team">
<element name="Team">
<attribute name="id"/>
<optional>
<attribute name="Name"/>
</optional>
</element>
</define>
</grammar>
|
with the instance:
<?xml version="1.0" encoding="UTF-8"?>
<Teams>
<Team Name="Sweden" id="t1"/>
<Team Name="Argentina" id="t2"/>
<Team Name="Nigeria" id="t3"/>
<Team Name="England" id="t4"/>
</Teams>
|
Now, when the XML instance document with the tournament
information is validated, we still want to make sure that each team in a match is declared in the XML
instance document that
contains the teams. As in the previous example, the embedded Schematron rule can
be defined on the pattern for the Match element. The
only difference is that this time the document() function is used to access
the instance where the teams are defined:
<element name="Matches">
<oneOrMore>
<element name="Match">
<sch:pattern name="Check that each team is registered in
the tournament" xmlns:sch="http://purl.oclc.org/dsdl/schematron">
<sch:rule context="Matches/Match/Team">
<sch:assert test="text() = document('Teams.xml')/Teams/Team/@id"
>Each Team in a Match must be a registered Team in the tournament.</sch:assert>
</sch:rule>
</sch:pattern>
<element name="Team"><text/></element>
<element name="Team"><text/></element>
<attribute name="id"/>
</element>
</oneOrMore>
</element>
|
Processing
Neither a W3C XML Schema processor nor a RELAX-NG processor will recognize and perform the
validation constraints expressed by the embedded Schematron rules. In fact,
the embedded Schematron rules will be completely ignored by both processors,
since for W3C XML Schema they are declared within the appinfo element and
for RELAX-NG they are declared in the Schematron namespace. This means that in order to use the Schematron rules for validation they need
to be extracted from the host schema and concatenated into a Schematron schema. Since all
three schema languages use XML syntax, XSLT is a well-suited tool for this.
The ExtractSchFromXSD.xsl stylesheet (formerly XSD2Schtrn.xsl)
will extract embedded Schematron rules from a W3C XML Schema document and merge them
into a complete Schematron schema. It will also extract Schematron rules that have
been declared in W3C XML Schema modules that are imported, included or redefined
in the base schema. (There is a version for Schematron schemas that use XSLT 2, called
ExtractSchFromXSD-2.xsl.)
Similarly, the ExtractSchFromRNG.xsl stylesheet (formerly RNG2Schtrn.xsl)
will extract embedded Schematron rules from a RELAX-NG schema document. It will also extract Schematron rules that have
been declared in RELAX-NG modules that are included in or referenced from the base schema. (There is a version for Schematron schemas that use XSLT 2, called
ExtractSchFromRNG-2.xsl.)
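The extraction step itself is conceptually simple. The following Python sketch shows the core idea under simplifying assumptions: it collects every element named pattern in the Schematron namespace, wherever it occurs in the host schema, and wraps the result in a single schema element. It is not a reimplementation of the stylesheets above; in particular it does not follow imported, included or redefined modules, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

SCH_NS = "http://purl.oclc.org/dsdl/schematron"
# Make the serializer use the conventional sch: prefix.
ET.register_namespace("sch", SCH_NS)


def extract_schematron(host_schema_text):
    """Collect every embedded sch:pattern into one sch:schema element."""
    host = ET.fromstring(host_schema_text)
    schema = ET.Element(f"{{{SCH_NS}}}schema")
    # iter() walks the whole tree, so the host language does not matter:
    # patterns inside xs:appinfo and patterns between RELAX-NG elements
    # are found in the same way.
    for pattern in host.iter(f"{{{SCH_NS}}}pattern"):
        schema.append(pattern)
    return schema


# The simple RELAX-NG schema with an embedded rule from the section above.
RNG = """<element name="Root" xmlns="http://relaxng.org/ns/structure/1.0">
  <sch:pattern name="Test constraints on the Root element"
               xmlns:sch="http://purl.oclc.org/dsdl/schematron">
    <sch:rule context="Root">
      <sch:assert test="test-condition">Error message when the assertion condition is broken...</sch:assert>
    </sch:rule>
  </sch:pattern>
  <text/>
</element>"""

schema = extract_schematron(RNG)
print(ET.tostring(schema, encoding="unicode"))
```

Because the sketch never inspects the host vocabulary, the same function accepts a W3C XML Schema document with rules in appinfo elements.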
The result from the stylesheets is a complete Schematron schema that can be used for validation
with the two-step XSLT process described in the Introduction to Schematron
section above. This means that validation results are available from both
Schematron validation and W3C XML Schema or RELAX-NG validation and, if needed,
the results can be merged into one report. The whole process is described
in the following picture:
As can be seen in the picture, there are two distinct paths in the processing,
which means that if timing is important the two paths could be implemented
as separate processes and executed in parallel.
A batch file that would (using XSV and Saxon) validate an
XML instance document against both W3C XML Schema and its embedded Schematron rules can look like this:
echo Running XSV validation on Person_bad.xml...
xsv Person_bad.xml
echo Creating Schematron schema from appinfo in Person.xsd...
saxon -o Person.sch Person.xsd XSD2Schtrn.xsl
echo Running Basic Schematron validation on file Person_bad.xml...
saxon -o validate.xsl Person.sch schematron-basic.xsl
saxon Person_bad.xml validate.xsl
|
So, first the XML instance document is validated against the W3C XML Schema using XSV, and then it is validated against
the embedded Schematron rules using Saxon. An output example could look like this:
Running XSV validation on Person.xml...
<?xml version='1.0'?> <xsv docElt='{None}Person' instanceAssessed='true'
instanceErrors='0' rootType='[Anonymous]' schemaErrors='0' schemaLocs='None
-> Person.xsd' target='file:/E:/Work/XMLSchema/XML-DEV/Schtrn+W3C/Person.xml'
validation='strict' version='XSV 1.203.2.16/1.106.2.8 of 2001/
10/28 17:39:15' xmlns='http://www.w3.org/2000/05/xsv'>
<schemaDocAttempt URI='file://C:/Person.xsd' outcome='success' source='schemaLoc'/>
</xsv>
Done.
Creating Schematron schema from appinfo in Person.xsd...
Running Basic Schematron validation on file Person.xml...
From pattern "Check structure":
From pattern "Check co-occurrence constraints":
Assertion fails: "If the Title is "Mr" then the sex of the person must be "Male"." at
/Person[1]
<Person Title="Mr">...</>
|
Similarly, a batch file that would (using the Win32 executable of Jing and Saxon) validate
an XML instance document against a RELAX-NG schema and its embedded Schematron rules can look like this:
echo Running Jing validation on Tournament_bad.xml...
jing Tournament.rng Tournament_bad.xml
echo Creating Schematron schema from Tournament.rng...
saxon -o Tournament.sch Tournament.rng RNG2Schtrn.xsl
echo Running Basic Schematron validation on file Tournament_bad.xml...
saxon -o validate.xsl Tournament.sch schematron-basic.xsl
saxon Tournament_bad.xml validate.xsl
|
An output example could look like this:
Running Jing validation on Tournament_bad.xml...
Error at URL "file:/D:/Work/XMLSchema/XML-DEV/Schtrn+W3C/Article/Emb_Schtrn/Tournament_bad.xml", line number 7: unknown
element "BugusTeam"
Creating Schematron schema from Tournament.rng...
Running Basic Schematron validation on file Tournament_bad.xml...
From pattern "Check that each team is registered in the tournament":
Assertion fails: "Each Team in a Match must be a registered Team in the tournament." at
/Tournament[1]/Matches[1]/Match[1]/Team[2]
<Team>...</>
Done.
|
The Topologi Schematron Validator is a graphical validator that can validate an
XML instance document using both W3C XML Schemas and RELAX-NG schemas with embedded Schematron rules.
Summary
Schematron is a very good complement to both W3C XML Schema and RELAX-NG, and there seems
to be little that cannot be validated by the combination.
This article has shown how to extract the embedded Schematron rules and validate
with the resulting Schematron schema using a three-step XSLT process. The examples shown can be downloaded
in a zip file that also contains Saxon,
XSV and Jing so you can try them out yourself (only Windows is supported and Jing
needs Microsoft Java VM).
It is up to each project and use case to evaluate whether this is a suitable technique
to achieve more powerful validation. Some of the advantages and disadvantages
to take into account are:
+ By combining the power of W3C XML Schema and Schematron the limit for
what can be done in terms of validation is raised to a new level.
+ Many of the constraints that previously had to be checked in the application
can now be moved out of the application and into the schema.
+ Since Schematron lets you provide your own error messages (the content
of the assertion elements) you can assure that each message is as explanatory
as it needs to be.
- In time-critical applications the overhead of processing the embedded Schematron rules may be too high.
- Since the extraction of Schematron rules from a RELAX-NG schema is performed with
XSLT, embedded Schematron rules are only supported in RELAX-NG schema that use the
full XML syntax.
For W3C XML Schema it should also be noted that, at this stage, Schematron rules
can only be applied to specific elements in the XML instance document. It is
not yet possible to apply a Schematron rule to a type definition in W3C XML
Schema, which would make this technique even more powerful. Depending on how
much of the PSVI (post-schema-validation infoset) will be
available in the next version of XPath, this may become possible in the future.
If you do not mind adding two more XSLT processes to the processing chain, this is in fact already possible with the
help of Francis Norton's typeTagger.
The basic idea is that it annotates the XML instance document with extra
attributes containing, among other things, the element type information from
the W3C XML Schema.
Instead of using the RNG2Schtrn.xsl stylesheet, there is an alternative way
to validate embedded Schematron rules in a RELAX-NG schema. One version of Sun's
MSV has an
add-on that will validate XML documents against RELAX-NG schemas annotated with Schematron rules.
The ability to embed Schematron rules is not unique to W3C XML Schema and
RELAX-NG; in fact it should be possible in any XML Schema language that uses XML syntax and has an extensibility mechanism.
The only thing needed is to modify the XSLT extractor stylesheet to accommodate the
extension mechanism of the XML Schema language.
Acknowledgements
I would like to thank Rick Jelliffe for taking the time to review this paper.
Copyright © 2002, Eddie Robertsson
This is a draft paper that can be used privately but do not repost publicly.