An Efficient Phase-Based Binarization Method for Degraded Historical Documents

— Document image binarization is the first essential step in digitizing document images and a core technique in both document image analysis applications and optical character recognition. The binarization process obtains a binary image from the original image; the binary image is the proper representation for segmentation, recognition, and restoration, and several studies confirm that the subsequent steps of document image analysis depend on the binarization result. However, old and historical document images suffer from several types of degradation, such as bleed-through, blurring, and uneven illumination, which make the binarization process a difficult task. Extracting the foreground from a degraded background therefore depends on the degradation, as well as on the type of paper used and the age of the document. Improved binarization methods are necessary to decrease the impact of background degradation. To address this difficulty, this paper proposes an effective, enhanced binarization technique for degraded historical document images. The proposed method enhances an existing binarization method by modifying its parameters and adding a post-processing stage, thus improving the resulting binary images. The technique is also robust, as it requires no parameter tuning. Evaluated on the Document Image Binarization Contest (DIBCO) datasets, the proposed method is promising, producing better results than those obtained by some of the DIBCO winners.


I. INTRODUCTION
Document image binarization (DIB) is considered a critical stage in segmenting text from highly degraded document images, and binarization is the first step in most document image analysis and recognition pipelines [1]. The purpose of this technique is to segment the objects of interest, such as text, from the background and to remove the noise present in the image. Document image binarization is of paramount importance because the performance of subsequent steps in analysis and vision applications, such as optical character recognition, image enhancement, text detection, and writer identification [2], depends on its results [1].
As illustrated in Fig. 1, document images typically suffer various degradations over time, and severely degraded documents commonly exhibit abnormal properties with respect to stroke brightness, stroke width, stroke connection, and document background [3]. Common types of degradation include contrast variation, uneven illumination, blurring, faded ink or faint characters, ink bleed-through, smears, and thin or weak text [4]. These degradations make document image binarization a daunting task. Nevertheless, various DIB methods have been developed to address these challenges. Image binarization methods are typically categorized into local and global thresholding methods [5]. A global thresholding method applies a single threshold value to the whole image to separate the foreground from the background.
A local thresholding method, on the other hand, identifies more than one threshold value: the image is divided into windows of fixed pixel width and height, and a threshold value is computed for each window rather than a single global value for the whole image [6]. Global binarization methods, however, are unsuitable for intensively degraded images, so local thresholding techniques are preferable for segmenting text from the background, being more adaptable and accurate. The Niblack method [7], one of the earliest local binarization methods, can produce reasonable results by correctly segmenting the text from the background. It nonetheless produces extensive noise around the text, and its parameters must be tuned manually. Hence, apart from the fact that the binarization result depends on the window size used, the parameters must be changed for certain kinds of degradation [5].
Another thresholding method is Sauvola's [8], which solves the problem of noise around the text. However, this method is not foolproof: it is sensitive to contrast variation between the background and the foreground. Its results also depend on the window size and, like most local methods, its parameters and window size must be tuned manually depending on the image. Since no binarization method is yet effective for all types of document degradation, the binarization of heavily degraded document images remains an active research topic [9].
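The two local thresholding rules discussed above can be sketched as follows. This is a minimal illustration using the standard Niblack (T = m + k * s) and Sauvola (T = m * (1 + k * (s / R - 1))) formulas with typical default constants, not the exact settings of the cited implementations:

```python
import numpy as np

def local_stats(img, w):
    """Per-pixel mean and standard deviation over a w x w window
    (edge-padded so the output matches the input shape)."""
    pad = w // 2
    padded = np.pad(img.astype(float), pad, mode="edge")
    windows = np.lib.stride_tricks.sliding_window_view(padded, (w, w))
    flat = windows.reshape(*img.shape, -1)
    return flat.mean(axis=-1), flat.std(axis=-1)

def niblack(img, w=25, k=-0.2):
    """Niblack: T = m + k * s; negative k pulls the threshold below
    the local mean. Returns True where a pixel is classified as ink."""
    m, s = local_stats(img, w)
    return img < m + k * s

def sauvola(img, w=25, k=0.5, R=128):
    """Sauvola: T = m * (1 + k * (s / R - 1)); the s / R term damps the
    threshold in low-contrast regions, reducing background noise."""
    m, s = local_stats(img, w)
    return img < m * (1 + k * (s / R - 1))
```

Note how both thresholds are per-window functions of the local mean and standard deviation; this is what makes them adapt to uneven illumination where a single global threshold fails.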
Some researchers have proposed binarization methods composed of multiple stages [10]. These methods use a pre-processing stage before binarization, applying filters to eliminate noise in the images. A binarization method, either newly proposed or built on existing methods, is then applied to extract the foreground from the noisy background. Finally, the quality of the resulting binary image is improved by post-processing [11].
These approaches yield better results than simple thresholding methods [5], [12], as confirmed by the 2014 and 2016 DIBCO winners, whose methods consist of numerous stages. In addition to these research efforts, contests such as the Document Image Binarization Competition (DIBCO) and the Handwritten Document Image Binarization Competition (H-DIBCO) have been held since 2009 to address this ongoing problem, introducing challenging benchmarking datasets for evaluating recent advances in DIB [13]. However, competition results thus far indicate that more research effort is needed to improve binarized image quality [6].

II. MATERIALS AND METHODS
Howe's binarization method comprises three salient stages [14].
First, regarding the Markov Random Field model, Howe's method formulates binarization as labeling pixels so as to minimize a global energy function. Second, the method builds the data-fidelity term of the energy from the Laplacian of the image intensity, which distinguishes the ink clearly from the background [15] and provides invariance to differences in contrast and overall intensity. Third, the method incorporates edge discontinuities into the smoothness term of the global energy, biasing ink boundaries to align with detected edges while maintaining a strong smoothness incentive elsewhere in the image. Although Howe's method has six important parameters, he identifies two whose impact on the binarization result is most significant: the high threshold (thi) of the Canny edge-detection algorithm and the value c used to penalize labeling discontinuities. Owing to their significance, Howe provides an algorithm that tunes these two parameters automatically [15]. To tune c, the energy function is minimized for a sequence of varied c values, and each pair of sequential images is compared using an instability measure, defined as the normalized change between two sequential images. The final result is the image whose instability value is the lowest. The other automatically tuned parameter, according to [14], is thi. Howe contends that choosing between two high threshold values, τ1 and τ2, is sufficient and speeds up the parameter tuning. Tuning thi, however, requires adjusting c as described above for τ1, τ2, and their average value τ0.
This process produces three binarized images: B0, B1, and B2. The instability measure described above is applied to the pairs (B0, B1) and (B0, B2), comparing each high threshold with the result at the mean threshold. The value of thi is chosen as the threshold whose instability value is the lowest. This automatic tuning procedure produces excellent binarization results; however, Howe's method in some cases fails to detect the edges of the text when the image is degraded by severe ink bleed-through [16].
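The threshold-selection step above can be sketched as follows. This is a minimal illustration assuming a simple pixel-disagreement instability measure; Howe's exact normalization differs in detail:

```python
import numpy as np

def instability(b_a, b_b):
    """Normalized disagreement between two binarizations: the number of
    differing pixels divided by the combined foreground count (a sketch,
    not Howe's exact stability criterion)."""
    diff = np.count_nonzero(b_a != b_b)
    fg = np.count_nonzero(b_a) + np.count_nonzero(b_b)
    return diff / max(fg, 1)

def select_thi(b0, b1, b2, t1, t2):
    """Pick the high threshold whose binarization is most stable, i.e.
    whose result changes least relative to the mean-threshold result B0."""
    s1 = instability(b0, b1)
    s2 = instability(b0, b2)
    return (t1, b1) if s1 <= s2 else (t2, b2)
```

The intuition is that a well-chosen threshold sits on a plateau: small perturbations of the parameter barely change the output, so the candidate with the lowest instability is kept.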

A. Proposed Binarization Method
The proposed binarization method consists of two main stages. Fig. 2 presents the framework of the proposed method.

1) Stage 1: Modified Howe Binarization
The proposed binarization method uses modified parameters of Howe's binarization method [13], which tunes the c value and computes the binarization for two thi values, τ1 and τ2, and their mean τ0 = (τ1 + τ2)/2. The proposed method modifies the thi values of the original Howe method, which were [0.25, 0.5]. In experiments with the proposed post-processing method, our technique produced the best results with thi values of [0.20, 0.6]. We also found experimentally that setting sigma to 0.62, instead of 0.6 as in the original Howe method, gives better results. These parameters were tuned using the DIBCO 2016 dataset [11].

2) Stage 2: Post-Processing Step
Howe's binarization method fails in some cases to detect the edges of the text when the image is degraded by severe bleed-through [16]. The purpose of the post-processing stage is to identify the needed pixels around the text stroke edges and include them in the binarized image, reducing the loss of pixels around the text edges and refining the pixel positions around the edges based on the grayscale values of the original image. This step follows the method proposed by B. Su et al. [17]. The modification in the proposed method is to use the binary image computed in the previous stage, i.e., the modified Howe binarization, for E, rather than the high-contrast image used in the original method. The input grayscale image is used for I in (1). Because the post-processing can generate hollow characters, an OR operation is applied between the binary image resulting from the post-processing step and the binary image from the modified Howe method in Stage 1.
In (2), Estd represents the local intensity standard deviation and, in (3), Emean the local mean; both are computed within the region window from the grayscale image and the modified Howe binary image. I denotes the input grayscale image, (x, y) the location of the pixel under study, and E the binary result of Stage 1. Ne is the number of ones of E within the local neighboring window. If Ne is greater than Nmin and I(x, y) is less than Emean + Estd/2, then R(x, y) is set to 1; otherwise, R(x, y) is set to 0. Following the reference paper, the window size is set to 3 x 3 and the minimum number of foreground pixels Nmin within the region window is set to 4 [16].
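Under one plausible reading of the rule above, the refinement can be sketched as follows. This is a minimal illustration, not the authors' implementation; in particular, computing Emean and Estd from the grayscale values of the foreground pixels of E inside the window is our assumption:

```python
import numpy as np

def postprocess(I, E, Nmin=4, w=3):
    """Refine pixels around stroke edges: a pixel becomes foreground when
    its w x w neighborhood in the stage-1 binary image E holds more than
    Nmin foreground pixels and its gray level lies below the local mean
    plus half the local standard deviation of those foreground pixels."""
    pad = w // 2
    H, W = I.shape
    R = np.zeros((H, W), dtype=bool)
    for y in range(H):
        for x in range(W):
            y0, y1 = max(y - pad, 0), min(y + pad + 1, H)
            x0, x1 = max(x - pad, 0), min(x + pad + 1, W)
            win_e = E[y0:y1, x0:x1].astype(bool)
            if np.count_nonzero(win_e) <= Nmin:
                continue
            # Local statistics from the grayscale values of the window's
            # foreground pixels (our reading of Emean and Estd).
            vals = I[y0:y1, x0:x1].astype(float)[win_e]
            if I[y, x] <= vals.mean() + vals.std() / 2:
                R[y, x] = True
    # OR with the stage-1 result to fill hollow characters.
    return R | E.astype(bool)
```

The final OR guarantees that the refinement can only add foreground pixels, which is what repairs holes inside strokes without erasing any stage-1 detections.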

B. Setup of the Experiment
The proposed method was evaluated using all versions of the DIBCO dataset from 2009 to 2017, comprising 103 images in total. It was also assessed on the PHIBC 2012 dataset [13], which includes images written in Persian. The images in these datasets exhibit different kinds of degradation, such as smearing, bleed-through, contrast variation, uneven illumination, faint ink, and thin pen strokes, all of which make binarization challenging. The datasets include both handwritten and printed documents.
Furthermore, the proposed method is compared with the first three winners of each version of the DIBCO competition, as well as with some standard binarization techniques, namely Otsu's [24], Sauvola's [8], and Howe's [14] methods. The evaluation measures were adopted from the DIBCO report [5], including pseudo-precision (Pp) and pseudo-recall (Rp). Both use distance-based weights with respect to the contours of the ground-truth (GT) characters. For Rp, the GT foreground weights are normalized based on the local stroke width, with weights defined in the range [0, 1]. For Pp, the weights are defined over a region that expands into the GT background up to the width of the nearest GT stroke part.
PSNR indicates the closeness of one image to another; its value is directly proportional to the similarity of the images, so a higher PSNR means the two images are more similar. The distance reciprocal distortion (DRD) metric measures the visual distortions visible in binary document images; it is computed using (9).
NUBN is the total number of non-uniform 8x8 blocks in the GT image. DRD_k represents the distortion of the k-th flipped pixel, computed as the weighted sum of the differences between the flipped pixel at (x, y) and the 5x5 block of the ground-truth image centered on it, as shown in (10).
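These two metrics can be sketched as follows, in a minimal illustration consistent with the standard DIBCO definitions; binary images are assumed to be in {0, 1}, and C (the foreground/background difference) is taken as 1:

```python
import numpy as np

def psnr(B, GT, C=1.0):
    """PSNR between a binary result B and ground truth GT, both in {0, 1};
    higher values mean the two images are more similar."""
    mse = np.mean((B.astype(float) - GT.astype(float)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(C ** 2 / mse)

def drd(B, GT):
    """Distance reciprocal distortion: the distortion of each flipped
    pixel is a weighted sum over its 5x5 GT neighborhood, and the total
    is normalized by NUBN, the number of non-uniform 8x8 blocks in GT."""
    H, W = GT.shape
    # Normalized 5x5 reciprocal-distance weight matrix (center weight 0).
    ii, jj = np.mgrid[-2:3, -2:3]
    Wm = np.zeros((5, 5))
    off = (ii != 0) | (jj != 0)
    Wm[off] = 1.0 / np.sqrt(ii[off] ** 2 + jj[off] ** 2)
    Wm /= Wm.sum()
    # NUBN: 8x8 GT blocks that are neither all background nor all text.
    nubn = 0
    for by in range(0, H - 7, 8):
        for bx in range(0, W - 7, 8):
            blk = GT[by:by + 8, bx:bx + 8]
            if 0 < blk.sum() < blk.size:
                nubn += 1
    # Sum DRD_k over all flipped pixels; GT is edge-padded at the borders.
    GTp = np.pad(GT.astype(float), 2, mode="edge")
    ys, xs = np.nonzero(B.astype(int) != GT.astype(int))
    total = 0.0
    for y, x in zip(ys, xs):
        diff = np.abs(GTp[y:y + 5, x:x + 5] - float(B[y, x]))
        total += (diff * Wm).sum()
    return total / max(nubn, 1)
```

Dividing by NUBN rather than by the image size keeps the metric from being diluted by large blank regions, which is why DRD correlates better with perceived quality on document images than plain pixel error.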

III. RESULTS AND DISCUSSION
The experimental results are presented in Table I, where the proposed method is compared with several well-known binarization methods on the 2009-2016 competition datasets [5], [18]-[23]. Table II shows the evaluation results on the most recent DIBCO 2017 dataset, comparing the top three ranked methods from the competition and some well-known methods. Table III presents the evaluation results on the PHIBC 2012 dataset [13], which includes 15 images written in Persian and Arabic; these images exhibit several types of degradation, including smear, bleed-through, and uneven illumination. Table I also summarizes the binarization results from DIBCO 2009 to H-DIBCO 2016 for the first three winners of each dataset, together with Otsu's [24] and Sauvola's [8] methods. In Table I the top scores are marked with (*) so that they can be identified easily. Across the six datasets, the proposed method attains higher scores in terms of F-Measure, pseudo-FM, DRD, and PSNR than the other approaches, although in some cases the compared methods surpass the proposed method by a small margin. Considering all the results from the datasets used in the experiments, however, the proposed method was found to be the most consistent and stable technique, with a very high F-Measure, high pseudo-FM, slightly high PSNR, and low DRD. Table II reports the FM, P-FM, PSNR, and DRD values of the first three winners of the DIBCO 2017 dataset, Otsu's, Sauvola's, and Howe's methods, and the proposed method. Compared with the other methods, the proposed method produced the best results in terms of FM and PSNR, and its DRD value is low compared with Sauvola's and Otsu's methods.
Examining the two tables, the FM of the proposed method is higher than 87.4% for all images except image number 13, shown in Figure 3, whose FM of 58.92% is very low compared with the other 19 document images in the dataset. Because this image contains several types of degradation at once, in addition to blurring, the proposed method fails to achieve results as high as for the other images. This single image lowers the average FM of the machine-printed document results to 90.02%, whereas the average for handwritten document images is 92.46%. Figure 4 presents the average F-Measure (left) and PSNR (right) of Otsu's and Sauvola's methods compared with the proposed method for all images of the 2009-2017 DIBCO datasets. Furthermore, Table III, which reports the results on the PHIBC 2012 dataset [13] and evaluates the binarization methods on Iranian historical degraded documents written in Arabic letters, shows that the proposed method achieved the best results in terms of F-Measure and P-FM. Additionally, Figure 5 shows the binarization results for a sample test image from the DIBCO 2017 dataset, comparing the competition winner's algorithm with some well-known methods. The figure shows a printed degraded document image: in Otsu's result the image bleeds and the text is difficult to read.
Similarly, Sauvola's method fails because the input image has very low contrast. Both the Niblack and NICK methods fail to produce a good binary image. As for Howe's method, many pixels are lost around the text stroke edges. The resultant image from the winner of the DIBCO 2017 competition also has much faded text, as shown in Figure 5. Nevertheless, the proposed method preserves most of the text strokes, and its result is the closest to the ground-truth image.

IV. CONCLUSION
This paper proposes an improved document image binarization method that is effective for common types of image degradation, including uneven illumination, contrast variation, ink bleed-through, smears, faded ink or faint characters, blurring, and thin or weak text. The proposed technique is simple and robust, and it does not require any parameter tuning. It consists of two steps: the first uses an existing binarization method with some modified parameters, and the second adds an efficient post-processing method that refines the pixels around the edges and enhances the performance of the binarization. The experimental results show that these two steps provide high accuracy when applied to highly degraded historical documents, and that the proposed method outperforms several well-established binarization methods in terms of F-Measure, P-FM, PSNR, and DRD.
ACKNOWLEDGMENT
This research was funded by University Grant PP-FTSM-2019.