3  Pre-processing

There are two main ML components in HTR/OCR transcription workflows (Figure 1.1): a segmentation component and a text transcription component. To understand the software (frameworks) behind HTR/OCR solutions, a brief introduction to ML and computer vision pre-processing methods is required. This will help you better understand potential pitfalls.

3.1 Computer vision

Although computer vision, broadly, includes ML methods, the classical approaches differ significantly from them. Classical computer vision methods operate as pixel- (region-) or image-based transformations. These methods are often used in the pre-processing of images before a machine learning algorithm is applied (Figure 1.1). In particular, removing noise, boosting text contrast and creating evenly lit documents are common pre-processing steps.

These algorithms also play an important role in the creation of additional (synthetic) data from a single reference dataset through data augmentation (Figure 5.1), in order to increase the robustness of a machine learning model.
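As a minimal sketch of what such augmentation can look like (file names, angle range and noise level are illustrative assumptions, not the augmentation pipeline used elsewhere in this document), a single reference image can be turned into several synthetic variants with OpenCV and NumPy:

import cv2
import numpy as np

def augment(image, max_angle=2.0, noise_sigma=8.0, rng=None):
    """Create one synthetic variant of a line/page image by applying a
    small random rotation and additive Gaussian noise."""
    rng = rng or np.random.default_rng()
    h, w = image.shape[:2]

    # Small random rotation around the image centre.
    angle = rng.uniform(-max_angle, max_angle)
    matrix = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
    rotated = cv2.warpAffine(image, matrix, (w, h),
                             borderMode=cv2.BORDER_REPLICATE)

    # Additive Gaussian noise to mimic scanner/sensor noise.
    noise = rng.normal(0, noise_sigma, rotated.shape)
    noisy = np.clip(rotated.astype(np.float32) + noise, 0, 255)
    return noisy.astype(np.uint8)

# Example: generate five synthetic variants of one scan (path is a placeholder).
page = cv2.imread("line_image.png", cv2.IMREAD_GRAYSCALE)
variants = [augment(page) for _ in range(5)]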

3.2 Key pre-processing concepts

Classical examples are the removal of uneven lighting across an image using (contrast-limited) adaptive histogram equalization (CLAHE), the detection and removal of structural elements such as linear features using a Hough transform, or the adaptive thresholding of an image from colour to black and white.
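A minimal sketch of these steps using OpenCV is given below; the file name and all parameter values (clip limit, tile size, block size, Canny and Hough thresholds) are assumptions for illustration and would need tuning per collection:

import cv2
import numpy as np

# Load a scan as a single-channel greyscale image (path is a placeholder).
gray = cv2.imread("scan.png", cv2.IMREAD_GRAYSCALE)

# CLAHE evens out lighting by equalizing contrast in local tiles
# rather than over the whole image at once.
clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
equalized = clahe.apply(gray)

# Adaptive thresholding converts the image to black and white using a
# local threshold per neighbourhood, so uneven illumination is tolerated.
binary = cv2.adaptiveThreshold(equalized, 255,
                               cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                               cv2.THRESH_BINARY, 31, 15)

# Hough transform: detect long linear features (e.g. ruled lines or table
# borders) that could then be masked out before transcription.
edges = cv2.Canny(equalized, 50, 150)
lines = cv2.HoughLinesP(edges, 1, np.pi / 180, threshold=100,
                        minLineLength=200, maxLineGap=10)

cv2.imwrite("scan_binarized.png", binary)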

Figure 3.1: Example of various thresholding methods as implemented in the OpenCV computer vision library (https://opencv.org)

Other common methods include Non-Local Means de-noising to remove stochastic noise, and Fast Fourier Transform (FFT) based methods, which can remove periodic noise by manipulating the frequency domain of the image.
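As a hedged illustration of both ideas (the file name, filter strength and peak positions are placeholder assumptions; in practice the periodic-noise peaks are located by inspecting the magnitude spectrum), both approaches are available through OpenCV and NumPy:

import cv2
import numpy as np

gray = cv2.imread("noisy_scan.png", cv2.IMREAD_GRAYSCALE)  # placeholder path

# Non-Local Means de-noising averages similar patches across the image,
# suppressing stochastic (grain-like) noise while preserving edges.
denoised = cv2.fastNlMeansDenoising(gray, None, h=10,
                                    templateWindowSize=7,
                                    searchWindowSize=21)

# FFT-based removal of periodic noise: transform to the frequency domain,
# suppress the offending frequency peaks with a mask, and transform back.
spectrum = np.fft.fftshift(np.fft.fft2(denoised.astype(np.float32)))

rows, cols = denoised.shape
crow, ccol = rows // 2, cols // 2
mask = np.ones((rows, cols), dtype=np.float32)

# Zero out two symmetric peaks at an assumed horizontal offset from the
# centre of the spectrum (the offset is purely illustrative).
offset = 40
mask[crow - 2:crow + 3, ccol + offset - 2:ccol + offset + 3] = 0
mask[crow - 2:crow + 3, ccol - offset - 2:ccol - offset + 3] = 0

cleaned = np.fft.ifft2(np.fft.ifftshift(spectrum * mask)).real
cleaned = np.clip(cleaned, 0, 255).astype(np.uint8)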