I presented our ongoing work at the Higher Seminar in Economic History at Stockholm University (“Högre seminariet i ekonomisk historia på Stockholms universitet”) on 4 June 2020.
The title was “Fast and easy transcription of handwritten documents”.
Abstract
Printed books can be converted into searchable, machine-encoded text using Optical Character Recognition (OCR). Handwritten text is much harder to convert: handwriting style varies greatly between writers, and even a single writer never writes the same word in exactly the same way twice, with small variations in size, for example. Handwritten Text Recognition (HTR) has therefore emerged as an active research field that addresses automatic word recognition and text conversion.
Transcription is a tedious, time-consuming task. Several applications exist that facilitate the process, but they usually require a rather large amount of training data that must first be transcribed and annotated by hand. We are therefore developing a framework for fast, semi-automatic collection of words, which even allows a group of users to transcribe a text in arbitrary word order. This will help in finding linked words much faster for subsequent learning, and it also makes it possible to search in document collections that have not yet been transcribed. Examples from ongoing research projects will be presented.