CORPUS LEGIS

Brief Project Presentation

The Corpus Legis project was commenced in response to established needs for improved legal information retrieval methods and enhanced support for legal investigations of different subject matters. Other significant factors include the rapid growth of legal information, its internationalisation and the general need for European harmonisation as a result of European Community Law.

Questions of legal document management have been investigated in this project by means of the international document representation standard SGML – Standard Generalized Markup Language. The main part of the analysed text corpus focuses on documents reflecting the system for lawmaking, e.g. government bills and laws.

The Corpus Legis project has been carried out in co-operation between the Swedish Law & Informatics Research Institute and the Department of Computational Linguistics at the University of Stockholm. It has been supported financially by two national research councils: the Swedish Council for Planning and Co-ordination of Research (FRN) and the Swedish Council for Research in the Humanities and Social Sciences (HSFR). The project began in July 1994 and ran through 1998.

The project was governed by the following three hypotheses:
(a) SGML is a tool for expressing more than just a document’s structure.
(b) It is possible to design general document type definitions (DTDs) for legal documents.
(c) SGML is a means for improving methods for legal information retrieval (IR).

Technical Tools
Near & Far (Microstar Software) was used to create document type definitions within a graphical environment. The Rules Builder software (by SoftQuad) converted (compiled) the resulting DTD file to a format that can be read by the SGML-editor Author/Editor (also by SoftQuad).

PanoramaPro (SoftQuad) was used to browse the completed SGML-documents. The PRISE (Prototype Indexing and Search Engines) system, developed by NIST, was used to explore different information retrieval features. Dataware II, an Electronic Publishing Management System from Dataware Technologies (see further Sunstone Systems AB), was used for an overall evaluation of the SGML-application.


Updated November 2000.