Cecilia Magnusson Sjöberg
March 19, 1997


DTD development for the legal domain

The so-called Corpus Legis Project at Stockholm University was started in response to a well-established need for a tool to improve legal document management. The project's main DTD, Legis.dtd, focuses on documents reflecting the system of lawmaking, e.g. government bills and laws. It enables markup at several levels and includes elements for references both within a document and to external documents. With regard to its logical design in terms of connectors and occurrence indicators, Legis.dtd may be characterised as flexible. More precisely, it allows for alternative markup, with the purpose of extracting present and future information in an optimal way.

In the process of developing DTDs for the legal domain it has proved meaningful to distinguish between the following three markup levels: (a) layout, (b) structure and (c) contents. Layout markup is not of primary interest in a legally oriented DTD, but it cannot be completely disregarded in the development of a legal SGML system. For example, italics in a government bill are the way in which changes in a law are indicated.
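The point about italics can be illustrated with a minimal sketch. It assumes that italicised wording in a bill marks a proposed change, as described above. XML stands in for SGML because the Python standard library has no SGML parser, and the element names (bill, section, p, i) are hypothetical and not taken from Legis.dtd.

```python
import xml.etree.ElementTree as ET

# Hypothetical bill fragment. XML stands in for SGML, and the element
# names are illustrative assumptions, not taken from Legis.dtd.
BILL = """
<bill>
  <section n="3">
    <p>A permit is valid for <i>five</i> years from the date of issue.</p>
  </section>
  <section n="4">
    <p>The fee is set by the Government.</p>
  </section>
</bill>
"""


def proposed_changes(xml_text):
    """Return (section number, italicised text) pairs.

    Assumes that italics mark amended wording in the bill.
    """
    root = ET.fromstring(xml_text)
    changes = []
    for section in root.iter("section"):
        for italic in section.iter("i"):
            text = "".join(italic.itertext()).strip()
            changes.append((section.get("n"), text))
    return changes


print(proposed_changes(BILL))
```

In such a setting layout markup carries legal meaning, so an application that discards it would lose the information about which wording is being amended.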

The legal implications of structural markup are mostly document-type-dependent. For example, the document structure of a given law has, in principle, been defined beforehand. This means that although the markup does not necessarily add any new information, it represents important information. Structural markup that reflects legal convention will help to improve information retrieval.
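How structural markup may support retrieval can be sketched as follows. The example assumes a law marked up with chapters, headings and numbered sections. XML is again used as a stand-in for SGML, and the element and attribute names (law, chapter, heading, section, n) are hypothetical rather than those of Legis.dtd.

```python
import xml.etree.ElementTree as ET

# Hypothetical law fragment; the structure and names are assumptions
# made for illustration only.
LAW = """
<law title="Example Act">
  <chapter n="1">
    <heading>General provisions</heading>
    <section n="1"><p>This Act applies to permits.</p></section>
    <section n="2"><p>Permits are issued by the agency.</p></section>
  </chapter>
  <chapter n="2">
    <heading>Fees</heading>
    <section n="1"><p>A fee is charged for each permit.</p></section>
  </chapter>
</law>
"""


def find_section(root, chapter, section):
    """Return the text of one section, addressed by chapter and section number."""
    node = root.find(f"./chapter[@n='{chapter}']/section[@n='{section}']")
    if node is None:
        return None
    # Collapse whitespace so the result reads as running text.
    return " ".join("".join(node.itertext()).split())


def chapter_headings(root):
    """Return all chapter headings in document order."""
    return [heading.text for heading in root.iter("heading")]


root = ET.fromstring(LAW)
print(find_section(root, "2", "1"))
print(chapter_headings(root))
```

A reference such as "Chapter 2, Section 1" can thus be resolved directly against the markup instead of by searching the running text.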

The choice of a method for implementing contents markup is highly application-dependent. Several different approaches have been tried in the Corpus Legis Project. The two major alternatives are to create a set of specific legal elements, each mirroring a particular legal aspect, or to create one general legal element covering all these legal components, which are then further classified by means of attribute values.
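The two alternatives can be compared in a small sketch. The same sentence is marked up once with specific elements and once with a general element classified by an attribute. The names used here (authority, concept, legal, type) are hypothetical and serve only as illustration, and XML once more stands in for SGML.

```python
import xml.etree.ElementTree as ET

# Alternative 1: one specific element per legal aspect (hypothetical names).
SPECIFIC = (
    "<p>The <authority>agency</authority> may revoke a "
    "<concept>permit</concept>.</p>"
)

# Alternative 2: one general element, classified by an attribute value
# (hypothetical element and attribute names).
GENERIC = (
    '<p>The <legal type="authority">agency</legal> may revoke a '
    '<legal type="concept">permit</legal>.</p>'
)


def from_specific(xml_text, names=("authority", "concept")):
    """Collect (category, text) pairs; the element name is the category."""
    root = ET.fromstring(xml_text)
    return [(el.tag, el.text) for el in root.iter() if el.tag in names]


def from_generic(xml_text):
    """Collect (category, text) pairs; the 'type' attribute is the category."""
    root = ET.fromstring(xml_text)
    return [(el.get("type"), el.text) for el in root.iter("legal")]


# Both encodings yield the same information.
assert from_specific(SPECIFIC) == from_generic(GENERIC)
print(from_generic(GENERIC))
```

Under these assumptions the two encodings are equivalent for extraction purposes. The practical difference lies in maintenance: a new legal category requires a new element declaration in the first alternative, but only a new attribute value in the second.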

It should be noted that trying to create a complete list of common components for the legal domain in a markup system is not a particularly rewarding task. Instead, the task should be application-oriented, since it depends on the constant development of legal systems. Some text elements have, however, proved to be of particular relevance. These may be categorised as: headings, paragraphs, articles, legal concepts (general or topic-specific), references and others (e.g. quotations and personal data identifiers, such as personal names). In Legis.dtd these are handled by using structural elements, legal elements, the nameloc function (HyTime) or attribute values.

Concluding remarks

With regard to the introduction of SGML into the legal domain, the following characteristics have emerged as central. First, SGML is an official international standard for document markup. Secondly, it has a high level of expressiveness. Thirdly, its inherent idea of media independence is valuable. Last but not least, the SGML standard's text orientation may in many ways be more suitable for legal material than, for example, the logic-based approaches used in the area of AI-based expert systems.

Is SGML the solution to the problem of managing the rapid growth of legal information with increasing transborder data flows? Without sophisticated means for automatic markup and for updating, for example, hypertext links, the majority of SGML implementations in the legal domain will remain trivial, especially as regards markup levels. In a broader perspective, however, uncomplicated but efficient large-scale implementations may be what is actually needed most.

In this context SGML will serve as an incentive for keeping documents in order, as well as a tool for achieving more uniform document type structures. Finally, seemingly insignificant markup of, for example, headings has proved to form a basis for a deeper understanding of legal documents.


