ISO Document Schema Definition Language (DSDL)
Rick Jelliffe, Topologi, 2002-04-06
ISO DSDL, ISO/IEC 19757, is a proposed ISO standard to bring choice and power to users of XML and SGML in validation and schema-based post-processing.
The steady innovation of the last few years in XML has resulted in a group of technologies from several different sources which each provide powerful and useful contributions. But without a common framework, the pieces of the puzzle cannot be combined into a satisfactory whole.
CONTRIBUTING TECHNOLOGIES
ISO DSDL is made from several parts:
- A framework for supporting different schema modules, perhaps influenced by RELAX Namespaces (details only in Japanese, as yet), XPipe, XML Pipeline Definition Language, and my Connect. I suppose that Namespaces in XML would be referenced as part of the framework. Note that some major software is already being designed along lines which will help DSDL implementation: notably the Apache Xerces project's Xerces Native Interface, which is a framework for communicating a "streaming" document information set and constructing generic parser configurations. It is quite likely that there will be some discussions at W3C on this question: for their family of technologies the framework question arises because of issues such as "should validation occur before or after XInclude inclusion?" which, unanswered, leave potential users up in the air.
- Namespace-aware processing with DTD syntax, for a discussion of which see these XML-DEV threads and CL-XML's approach;
- RELAX NG (called Grammar-oriented schema languages);
- Schematron (called Path-based integrity constraints);
- W3C XML Schemas Datatypes (called Primitive data type semantics);
- W3C XML Schemas Structures (called Object-oriented schema languages);
- Information item manipulation, which would presumably provide as modules many of the features of SGML and SGML Extended Facilities unbundled; the details have not emerged yet, but candidate features to reconstruct could include attribute defaulting and SGML's LINK (perhaps using NIST's ATTS), architectural forms (perhaps based on Dave Megginson's XAF, NIST's APEX and John Cowan's AFNG proposal), entity inclusions (perhaps based on W3C XBase, XLink and XInclude), a SHORT REFERENCE-like facility (perhaps something like Regular Fragmentations) and entity and other declarations (perhaps something like Topologi's Named Information Item Declaration Language).
Presumably, the technologies that are already completed (RELAX NG, Schematron, XML Schemas) could be ISO Standardized very quickly. The other parts would involve more community interaction and discussion, and so take longer.
There are some potentially important use-cases which may have to wait. XML Schemas Key and Uniqueness constraints are limited; better constraints can be expressed in my Schematron, but not with the same declarative usefulness; progress on this would probably be deferred until a W3C or OASIS technology comes to the fore. Similarly, Schematron's phases mechanism is very useful and powerful in reconstructing DTDs' conditional marked sections, but it is probably ahead of its time in public perception, so I would not expect it to make standards-makers' 80/20 point (it will still be available in Schematron, however).
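To make the idea of path-based constraints grouped into phases more concrete, here is a minimal sketch in Python. It is not Schematron syntax and not how any Schematron implementation works; the element names (item, ref), the attribute names (id, target), the rule functions and the phase names "draft" and "final" are all hypothetical, chosen only for illustration. It shows a uniqueness rule and a cross-reference rule, with a phase selecting which rules are active for a given validation run.

```python
import xml.etree.ElementTree as ET

def unique_ids(root):
    """Hypothetical rule: every <item> must have a distinct id."""
    seen, problems = set(), []
    for item in root.iter("item"):
        item_id = item.get("id")
        if item_id in seen:
            problems.append(f"duplicate id '{item_id}'")
        seen.add(item_id)
    return problems

def refs_resolve(root):
    """Hypothetical rule: every <ref target="..."> must point at an existing item id."""
    ids = {item.get("id") for item in root.iter("item")}
    return [
        f"dangling ref '{ref.get('target')}'"
        for ref in root.iter("ref")
        if ref.get("target") not in ids
    ]

# A phase is just a named selection of rules (assumed names).
PHASES = {
    "draft": [unique_ids],                # work in progress: links may still dangle
    "final": [unique_ids, refs_resolve],  # ready for publication: everything must hold
}

def validate(xml_text, phase):
    """Run only the rules that belong to the chosen phase and collect messages."""
    root = ET.fromstring(xml_text)
    problems = []
    for rule in PHASES[phase]:
        problems.extend(rule(root))
    return problems

DOC = "<doc><item id='a'/><item id='b'/><ref target='c'/></doc>"
draft_report = validate(DOC, "draft")
final_report = validate(DOC, "final")
```

The same document is acceptable in the "draft" phase but is reported in the "final" phase, which is roughly the effect that conditional marked sections give in a DTD.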
Merely factoring out functionality from DTDs is not "layering". Layering happens when all the functions have real specifications and the order of their application can be specified or defaulted. Layering certainly does not happen by either factoring out existing functionality to be taken care of by potential specifications (= vapor-layers) or by treating processing order as irrelevant.
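Why processing order cannot be treated as irrelevant can be shown with a small sketch in Python. It is only an illustration under stated assumptions: the two layers (attribute defaulting and a required-attribute check), the element name item, the attribute status and the default value "draft" are hypothetical, and the "framework" is nothing more than a list of functions applied in order.

```python
import xml.etree.ElementTree as ET
from typing import Callable, List

Stage = Callable[[ET.Element], ET.Element]

def apply_defaults(root: ET.Element) -> ET.Element:
    """Hypothetical defaulting layer: give each <item> status="draft" if absent."""
    for item in root.iter("item"):
        item.attrib.setdefault("status", "draft")
    return root

def require_status(root: ET.Element) -> ET.Element:
    """Hypothetical validation layer: every <item> must carry a status attribute."""
    for item in root.iter("item"):
        if "status" not in item.attrib:
            raise ValueError("item without status attribute")
    return root

def run_pipeline(xml_text: str, stages: List[Stage]) -> ET.Element:
    """Parse the document, then apply each layer in the order given."""
    root = ET.fromstring(xml_text)
    for stage in stages:
        root = stage(root)
    return root

def outcome(xml_text: str, stages: List[Stage]) -> str:
    """Report whether the document survives the given ordering of layers."""
    try:
        run_pipeline(xml_text, stages)
        return "valid"
    except ValueError as err:
        return f"invalid: {err}"

DOC = "<list><item status='final'/><item/></list>"
defaults_first = outcome(DOC, [apply_defaults, require_status])
validate_first = outcome(DOC, [require_status, apply_defaults])
```

With defaulting applied first the document is valid; with validation applied first the same document is rejected. A framework that leaves the order unspecified therefore leaves the validity of the document unspecified too.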
WHAT IS WRONG WITH XML SCHEMAS?
Well, sometimes...nothing! XML Schemas started with the aim of being a universal schema language. However, in the absence of a modular architecture, there are simply too many domains-of-use of XML for any single language to be universal.
Developers of new schema languages obviously create them because of some perceived deficiency in the available ones (Schematron and the precursors of RELAX NG, RELAX and TREX, were notably created in response to XML Schemas drafts, for example). ISO DSDL, however, is about supporting plurality and allowing competition, not about promoting one technology above another.
Different stakeholders, in particular the W3C, will naturally be promoting the particular technologies they have created; however, these technologies are designed with particular use-cases in mind (notably the ubiquity of the WWW, an emphasis on XML for messages and data, and XHTML as their supported language for prose) which are not universal. So it is appropriate for ISO to provide a framework to support a plurality of schema modules, allowing rigorous description of the particular processes a document goes through to be validated, and allowing smaller and more reliable schema systems.
An interesting sidenote is that because modern schema languages are specified in XML instance syntax, it is often a matter of simple transformations for one schema language to implement or simulate another. Thus Sun's Multi-Schema Validator (MSV) uses a RELAX-like abstract grammar internally, and converts DTDs and XML Schemas into that internal form. Already there is a trend for tools to support multiple schemas, and this may continue. (Anyway, if you use each language conservatively, you won't lock yourself in to a particular schema technology; you could change schema languages while keeping the same document structures.)
TOPOLOGI & DSDL
Of the technologies mentioned above, the Topologi Collaborative Markup Editor (in beta at time of writing) supports
- XAR
- XML DTDs
- Schematron
- RELAX NG
- XML Schemas
- Named Information Item Declaration Language
Topologi's free Schematron Validator is a Windows application that supports
- XML DTDs
- Schematron
- XML Schemas
- Schematron embedded in XML Schemas
Topologi's approach is that users benefit from products which do not lock them in to particular technologies: SGML versus XML, Mac versus Linux versus Windows, database versus documents, file systems versus repositories, DTDs versus RELAX versus XML Schema versus Schematron.
If one piece of the jigsaw puzzle does not meet expectations, you should not have to throw away the whole puzzle! And, similarly, if one interface between components is not optimal, it is best if the existing components can switch (e.g. from XML Schemas to RELAX) without requiring new components, training, and so on.
When looking at XML products, it is useful to consider "Am I really getting the risk-minimization of loose coupling, or does it force me to completely buy in to schema languages and technologies which have not proven themselves yet?" By supporting plurality, we ensure that our customers are not forced to adopt technology they do not need, and that they have the agility to change when needed.
RELATED MATERIAL
For an excellent discussion on why XML processing should be more modular, see Simon St. Laurent's Toward a Layered Model for XML and XPDL. For a concrete proposal on combining this approach with UML to make more readable and tight standards, see my DISARM (Document Information Set Articulated Reference Model). For a list of the 16 thin layers of XML see my Goldilocks and SML. For an off-the-cuff discussion on whether software layers should correspond to implementation layers, see this XML-DEV thread.
For a good introduction to document architectures, see Josh Lubell's Architectures in an XML World.
For one approach to supporting multiple schemas in a distributable package, see my proposal for an XAR (XML Application Archive) format DZIP.
For an example of an approach which, if it gets a user base and multiple implementations, I hope would also be considered for inclusion in DSDL, see Examplotron.
Another schema language paradigm, based on ordering, is that of my Hook.
There is another major strand of schema languages available, which already has ISO standardization: ASN.1. Users who have requirements for ultra-compact data transmission should consider ASN.1, at least for that part of their data lifecycle.