Report on File Formats for Handwritten Text Recognition (HTR) Material

Today, the National Archives of Finland officially presents its study of file formats for handwritten text recognition, carried out within the EU-funded project “co:op – community as opportunity”!

The primary purpose of this study is to review and analyze the file formats available for storing either automatically recognized text or manually entered text (transcriptions). Automatic recognition can be either OCR-based (i.e. recognition of printed text) or HTR-based (i.e. recognition of handwritten text).

The existing file formats are described in terms of their structure and special characteristics, and links are given to schema files or to more detailed descriptions of the formats. The study also attempts to list some of the projects, organizations and software that use each format. Finally, it provides a summary and comparison of the reviewed file formats.

Another purpose of the study is to analyze how applicable these file formats are in the environment of the National Archives of Finland. This requires a state-of-the-art analysis that identifies the current systems related to, e.g., long-term preservation of documents, metadata handling and information search, and that describes the changes foreseen in this environment in the near future. In addition, the study lists requirements for the types of usage that OCR:ed / HTR:ed document text could potentially enable. Finally, it analyzes the potential implications of fulfilling these requirements for processes, other systems and processing.

