Document Complexity Metric

Rick Jelliffe's Document Complexity Metric was first presented in a paper at XML 2004, at the University of Cambridge, U.K. A version of that paper is being prepared for publication at the XML.COM website in early 2005.

Briefly, the Document Complexity Metric is a measure of the structural complexity of a document, a set of documents or a DTD. Unlike most complexity metrics, which are aimed at understanding the compressibility of a document for transmission, the Document Complexity Metric is designed for use in project estimation and software development.

The Document Complexity Metric has been tested on more than 300 large technical documents from different locations and organizations, and it measures complexity better than, for example, a raw count of unique elements.
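For readers unfamiliar with the baseline mentioned above, the following is a minimal sketch of a raw count of unique elements, alongside a count of unique parent/child pairs as one possible reading of "contexts". It is not the Document Complexity Metric, whose formula is not given here. The function names, the parent/child interpretation, and the sample document are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def count_unique_elements(xml_text: str) -> int:
    """Baseline measure: number of distinct element names in one document."""
    root = ET.fromstring(xml_text)
    return len({el.tag for el in root.iter()})


def count_unique_contexts(xml_text: str) -> int:
    """Illustrative assumption: a 'context' is a distinct (parent, child) name pair."""
    root = ET.fromstring(xml_text)
    return len({(parent.tag, child.tag) for parent in root.iter() for child in parent})


# Hypothetical sample document, used only to exercise the two functions.
SAMPLE = (
    "<doc><title>T</title>"
    "<sec><title>A</title><para>x</para></sec>"
    "<sec><title>B</title><note><para>y</para></note></sec>"
    "</doc>"
)

if __name__ == "__main__":
    print(count_unique_elements(SAMPLE))  # distinct element names
    print(count_unique_contexts(SAMPLE))  # distinct parent/child pairs
```

In the sample, the title element appears under both doc and sec, so the context count is higher than the element count. This shows one way a raw element count can understate the number of cases a stylesheet must handle.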

It can be used in a variety of situations:

  • During pre-sales discussions, the Document Complexity Metric score of the prospective job can be compared with the scores of previous jobs to arrive at a ballpark figure for costing.
  • As a project progresses and more examples become available, the Document Complexity Metric score can be used as an objective basis for billing clients. As the client adds elements or contexts that must be dealt with, the score will increase.
  • A DTD with a very large Document Complexity Metric score is very likely an overly broad DTD that does not reflect the more limited set of structures actually found in documents. Project managers can use the score as a warning that the DTD is probably unreliable as a definitive guide to structure and workload. Coding for structures and contexts that will never occur can easily double the time (and cost) of stylesheet development.
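The pre-sales comparison described in the first item can be sketched as a nearest-score lookup. This is only an illustration under stated assumptions: the function name closest_jobs, the job labels, and the score values are hypothetical placeholders, not data from real projects, and the scores are assumed to have been computed elsewhere.

```python
def closest_jobs(new_score, previous_jobs, k=2):
    """Return the k previous jobs whose scores are nearest to new_score.

    previous_jobs is a list of (job_name, score) tuples.
    """
    return sorted(previous_jobs, key=lambda job: abs(job[1] - new_score))[:k]


# Hypothetical placeholder records: (job name, previously computed score).
PREVIOUS_JOBS = [
    ("job-a", 120.0),
    ("job-b", 310.0),
    ("job-c", 95.0),
    ("job-d", 500.0),
]

if __name__ == "__main__":
    for name, score in closest_jobs(110.0, PREVIOUS_JOBS):
        print(name, score)
```

The actual costs of the jobs returned would then give the probable range for the new quote.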