Since documents are often somewhat degraded, it is important to be able to efficiently remove the disturbing background from the text. The usual next step would be to binarise the segmented text, but in our word spotter we prefer to work on the background-removed text. We have published two papers dealing with these problems.

  • Automatic Document Image Binarization using Bayesian Optimization. Vats, E., Hast, A., Singh, P. In Proceedings of the 4th International Workshop on Historical Document Imaging and Processing (HIP), ACM Press, pp. 89–94, 2017.
    @inproceedings{Vats:2017:ADI:3151509.3151520,  author = {Vats, Ekta and Hast, Anders and Singh, Prashant},  title = {Automatic Document Image Binarization Using Bayesian Optimization},  booktitle = {Proceedings of the 4th International Workshop on Historical Document Imaging and Processing},  series = {HIP2017},  year = {2017},  isbn = {978-1-4503-5390-8},  location = {Kyoto, Japan},  pages = {89--94},  numpages = {6},  url = {http://doi.acm.org/10.1145/3151509.3151520},  doi = {10.1145/3151509.3151520},  acmid = {3151520},  publisher = {ACM},  address = {New York, NY, USA}}

     

The following images show the original image to the left, a threshold binarisation in the middle, and the result of background removal using the proposed algorithm to the right.

Background Removal and Binarisation

I strongly suggest that HTR (handwritten text recognition) should be done on background-removed images rather than binarised images. But since many do not agree, we also added binarisation on top of it, using Bayesian optimisation.
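To make the difference between the two kinds of output concrete, here is a minimal sketch in Python with NumPy. It is not the algorithm from the papers: it uses a plain global Otsu threshold as a stand-in, and the function names (`otsu_threshold`, `binarise`, `remove_background`) are made up for illustration. It assumes an 8-bit greyscale image with dark ink on a lighter background.

```python
import numpy as np


def otsu_threshold(gray):
    """Return a global Otsu threshold for a uint8 greyscale image."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    prob = hist / hist.sum()
    omega = np.cumsum(prob)                  # weight of the dark class up to each level
    mu = np.cumsum(prob * np.arange(256))    # cumulative mean up to each level
    mu_total = mu[-1]
    denom = omega * (1.0 - omega)
    # Between-class variance; levels where one class is empty are set to 0.
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_total * omega - mu) ** 2 / denom
    sigma_b[denom == 0] = 0.0
    return int(np.argmax(sigma_b))


def binarise(gray, t):
    """Threshold binarisation: ink becomes 0, everything else 255."""
    return np.where(gray <= t, 0, 255).astype(np.uint8)


def remove_background(gray, t):
    """Crude background removal: ink keeps its grey level, the rest turns white."""
    return np.where(gray <= t, gray, 255).astype(np.uint8)


if __name__ == "__main__":
    # Tiny synthetic "page": light background with two ink intensities.
    page = np.full((4, 4), 200, dtype=np.uint8)
    page[0, :2] = 30
    page[1, :2] = 60
    t = otsu_threshold(page)
    print(binarise(page, t))
    print(remove_background(page, t))
```

In the binarised output the two ink intensities collapse into a single black level, while the background-removed output still carries the original grey values of the strokes. That retained stroke information is what a recogniser loses when it is fed binarised images.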

The following paper further improves binarisation.

  • Learning Surrogate Models of Document Image Quality Metrics for Automated Document Image Processing. Singh, P., Vats, E., Hast, A. Proceedings of the 13th IAPR International Workshop on Document Analysis Systems (DAS), 2018.
    @INPROCEEDINGS{8395173, author={P. Singh and E. Vats and A. Hast}, booktitle={2018 13th IAPR International Workshop on Document Analysis Systems (DAS)}, title={Learning Surrogate Models of Document Image Quality Metrics for Automated Document Image Processing}, year={2018}, pages={67-72}, doi={10.1109/DAS.2018.14}, month={April}}
