Making Large Collections of Handwritten Material Easily Accessible and Searchable
Libraries and cultural organisations hold a wealth of digitised historical handwritten material, the vast majority of which has not yet been transcribed. Making these collections available to the public is challenging, especially when it comes to supporting even a simple text search across a collection. Machine-learning-based methods for handwritten text …
In Search of the Scribe: Letter Spotting as a Tool for Identifying Scribes in Large Handwritten Text Corpora
In this article, letter spotting was used on a large set of handwritten documents to identify those containing similar script. The main scribe of the Codex Holmiensis D 3 manuscript has not yet been identified in other documents. In the letter spotting process, a set of ‘g’s, ‘h’s and ‘k’s has been selected …
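At its core, letter spotting searches page images for regions that resemble a chosen letter exemplar. The excerpt does not describe the article's matching method, so the sketch below swaps in a generic technique: brute-force normalized cross-correlation (NCC) between a letter template and a grayscale page array. The function names, the 0.8 default threshold and the idea of ranking documents by the mean best response per letter (`document_score`) are illustrative assumptions, not the article's method.

```python
import numpy as np


def ncc_map(page: np.ndarray, template: np.ndarray) -> np.ndarray:
    """Normalized cross-correlation of `template` at every valid position in `page`."""
    th, tw = template.shape
    t = template - template.mean()
    t_norm = np.sqrt((t * t).sum())
    out_h, out_w = page.shape[0] - th + 1, page.shape[1] - tw + 1
    scores = np.zeros((out_h, out_w))
    for y in range(out_h):
        for x in range(out_w):
            w = page[y:y + th, x:x + tw]
            w = w - w.mean()  # zero-mean window, so scores ignore brightness offsets
            denom = np.sqrt((w * w).sum()) * t_norm
            scores[y, x] = (w * t).sum() / denom if denom > 0 else 0.0
    return scores


def spot_letters(page, template, threshold=0.8):
    """Hypothetical helper: (row, col, score) of every position with NCC >= threshold, best first."""
    scores = ncc_map(page, template)
    ys, xs = np.nonzero(scores >= threshold)
    hits = [(int(y), int(x), float(scores[y, x])) for y, x in zip(ys, xs)]
    return sorted(hits, key=lambda hit: hit[2], reverse=True)


def document_score(page, templates):
    """Hypothetical ranking score: mean of the best NCC response for each letter template."""
    return float(np.mean([ncc_map(page, t).max() for t in templates]))


# Usage on synthetic data: a "letter" cut out of a random page is found at its origin.
rng = np.random.default_rng(0)
page = rng.random((40, 60))
letter = page[10:18, 20:26].copy()
hits = spot_letters(page, letter, threshold=0.99)
print(hits[0][:2])  # (10, 20)
```

In practice one would crop several exemplars of each letter (for example ‘g’, ‘h’ and ‘k’) and compare scores across pages. OpenCV's `matchTemplate` with `TM_CCOEFF_NORMED` computes the same kind of mean-subtracted NCC map much faster than these Python loops.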
OptimalRANSAC
Because of its random nature, standard RANSAC does not always find the optimal set (all inliers), even for moderately contaminated sets, and it is known to perform poorly when fewer than 50% of the points are inliers. OptimalRANSAC, in contrast, is capable of finding the optimal set (hence it is "almost" deterministic) even …
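The weakness of standard RANSAC can be made concrete with the usual estimate of how many random samples it needs. To draw at least one all-inlier minimal sample of size s with confidence p, when a fraction w of the points are inliers, it takes N = log(1 - p) / log(1 - w^s) trials. The sketch below computes N and runs a plain RANSAC line fit. It illustrates standard RANSAC only, not OptimalRANSAC, and the function names and default parameters are assumptions chosen for illustration.

```python
import math

import numpy as np


def ransac_iterations(inlier_ratio: float, sample_size: int = 2, confidence: float = 0.99) -> int:
    """Standard estimate N = log(1 - p) / log(1 - w**s) of the trials RANSAC needs.

    Requires 0 < inlier_ratio < 1.
    """
    return math.ceil(math.log(1 - confidence) / math.log(1 - inlier_ratio ** sample_size))


def ransac_line(points: np.ndarray, threshold: float = 0.05, iterations: int = 200, seed=None) -> np.ndarray:
    """Plain RANSAC for a 2D line: returns the boolean inlier mask of the best model found."""
    rng = np.random.default_rng(seed)
    best_mask = np.zeros(len(points), dtype=bool)
    for _ in range(iterations):
        # Minimal sample: two distinct points, drawn at random.
        i, j = rng.choice(len(points), size=2, replace=False)
        p, q = points[i], points[j]
        dx, dy = q - p
        length = math.hypot(dx, dy)
        if length == 0:
            continue
        # Perpendicular distance from every point to the line through p and q.
        dist = np.abs(dx * (points[:, 1] - p[1]) - dy * (points[:, 0] - p[0])) / length
        mask = dist < threshold
        if mask.sum() > best_mask.sum():
            best_mask = mask
    return best_mask


# Usage: 30 points on y = 2x + 0.1 mixed with 70 uniformly scattered outliers.
rng = np.random.default_rng(1)
x = rng.random(30)
pts = np.vstack([np.column_stack([x, 2 * x + 0.1]), rng.random((70, 2)) * [1.0, 2.5]])
n_trials = ransac_iterations(0.3)
mask = ransac_line(pts, iterations=n_trials, seed=0)
print(n_trials, mask[:30].all())
```

For example, `ransac_iterations(0.5)` gives 17 trials for a two-point line model, and the count grows quickly as w falls or s rises: a four-point model at w = 0.2 needs 2,876. Even then, the result is only correct with probability p, which is the non-determinism the paragraph refers to.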
Clustering as an Alternative to RANSAC
These two papers propose clustering as a deterministic alternative to RANSAC. Each pair of corresponding points can be rewritten as a single point in a 2D space. Inliers then lie close to each other, while outliers lie farther away from the cluster, the distance depending on how poorly the points correspond. The first paper describes …
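The excerpt does not say how a correspondence becomes a 2D point. One simple choice, assumed here purely for illustration, is the displacement vector q - p: under a mostly translational transform, inlier displacements pile up in one tight cluster. The sketch picks the displacement with the most neighbours within a radius and keeps everything in that neighbourhood. No random sampling is involved, so the same input always gives the same answer. The names `cluster_inliers` and `radius` are hypothetical, and this is not necessarily the mapping or clustering method used in the two papers.

```python
import numpy as np


def cluster_inliers(pts1, pts2, radius: float = 2.0) -> np.ndarray:
    """Deterministically select inliers by clustering correspondences as 2D points.

    Assumption for illustration: correspondence (p, q) is mapped to its displacement
    q - p. Returns a boolean mask of matches inside the densest cluster.
    """
    disp = np.asarray(pts2, dtype=float) - np.asarray(pts1, dtype=float)
    # All pairwise distances between displacement vectors (O(n^2) time and memory).
    dist = np.linalg.norm(disp[:, None, :] - disp[None, :, :], axis=2)
    neighbours = (dist <= radius).sum(axis=1)
    centre = int(np.argmax(neighbours))  # argmax breaks ties by lowest index, so no randomness
    return dist[centre] <= radius


# Usage: 15 matches share a shift of (5, -3); the last 5 are corrupted with large offsets.
rng = np.random.default_rng(0)
pts1 = rng.random((20, 2)) * 100
pts2 = pts1 + [5.0, -3.0] + rng.normal(scale=0.1, size=(20, 2))
pts2[15:] += [[40, 40], [-30, 10], [0, 50], [25, -45], [-50, -50]]
mask = cluster_inliers(pts1, pts2)
print(int(mask.sum()), mask[:15].all())  # 15 True
```

This displacement mapping is not rotation or scale invariant; it only shows why a deterministic clustering step can replace random sampling.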
Putative Match Analysis (PUMA)
This paper proposes a deterministic alternative to RANSAC that is invariant to rotation and scale and, to a lesser extent, to large perspective differences. We used it in our first word spotter to remove word outliers. The idea is to compare each point in one image with all the points in the other image. The relative distance …
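Rotation and scale invariance can be illustrated with pairwise distances. A similarity transform leaves distances unchanged up to one global scale factor, so for two correct matches i and j the ratio d2(i, j) / d1(i, j) should be the same for every inlier pair. The sketch below scores each putative match by how many other matches agree with the median ratio. It is a hypothetical illustration of that invariance, not the PUMA algorithm itself; the function name, the 10% tolerance and the use of a global median (which assumes most pairs are inlier pairs) are all assumptions.

```python
import numpy as np


def distance_ratio_scores(pts1, pts2, tol: float = 0.1):
    """Hypothetical sketch: score putative matches by consistency of pairwise distance ratios.

    Returns (scores, scale), where scores[i] is the fraction of other matches j whose
    ratio d2(i, j) / d1(i, j) lies within `tol` (relative) of the median ratio.
    """
    p1 = np.asarray(pts1, dtype=float)
    p2 = np.asarray(pts2, dtype=float)
    n = len(p1)
    d1 = np.linalg.norm(p1[:, None, :] - p1[None, :, :], axis=2)
    d2 = np.linalg.norm(p2[:, None, :] - p2[None, :, :], axis=2)
    valid = ~np.eye(n, dtype=bool) & (d1 > 0)
    ratio = np.ones((n, n))
    ratio[valid] = d2[valid] / d1[valid]
    # Global scale estimate; only meaningful if most pairs are inlier-inlier pairs.
    scale = float(np.median(ratio[valid]))
    agree = valid & (np.abs(ratio - scale) <= tol * scale)
    return agree.sum(axis=1) / (n - 1), scale


# Usage: 9 matches follow a similarity transform (30 degree rotation, scale 1.5, shift);
# the last 3 are displaced by large, different offsets and act as outliers.
rng = np.random.default_rng(0)
pts1 = rng.random((12, 2)) * 100
theta = np.deg2rad(30)
rot = np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]])
pts2 = 1.5 * pts1 @ rot.T + [10.0, -20.0]
pts2[9:] += [[1000, 1000], [-1000, 800], [900, -1100]]
scores, scale = distance_ratio_scores(pts1, pts2)
print(round(scale, 3), scores.round(2))
```

Matches whose score falls well below the others can be discarded as outliers. A real implementation would need something more robust than a global median when inliers are in the minority.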
