Article

Robust Combined Binarization Method of Non-Uniformly Illuminated Document Images for Alphanumerical Character Recognition

Faculty of Electrical Engineering, West Pomeranian University of Technology in Szczecin, 70-313 Szczecin, Poland
* Author to whom correspondence should be addressed.
Sensors 2020, 20(10), 2914; https://doi.org/10.3390/s20102914
Submission received: 30 March 2020 / Revised: 13 May 2020 / Accepted: 19 May 2020 / Published: 21 May 2020
(This article belongs to the Special Issue Document-Image Related Visual Sensors and Machine Learning Techniques)

Abstract

Image binarization is one of the key operations that reduce the amount of information used in further analysis of image data, and it significantly influences the final results. In some applications, where well-illuminated, high-contrast images can be captured easily, even simple global thresholding may be sufficient. There are, however, more challenging applications, e.g., those based on the analysis of natural images or those that must cope with quality degradations, such as historical document images. Considering the variety of image binarization methods, as well as their different applications and types of images, one cannot expect a single universal thresholding method that would be the best solution for all images. Nevertheless, one of the most common operations preceded by binarization is Optical Character Recognition (OCR), which may also be applied to non-uniformly illuminated images captured by camera sensors in mobile phones. Binarization methods that further maximize OCR accuracy are therefore still needed. Therefore, this paper presents the idea of using robust combinations of binarization methods, which bring together the advantages of various approaches, including some recently proposed methods based on entropy filtering and a multi-layered stack of regions. The experimental results, obtained for a dataset of 176 non-uniformly illuminated document images, referred to as the WEZUT OCR Dataset, confirm the validity and usefulness of the proposed approach, which leads to a significant increase in recognition accuracy.

1. Introduction

The increasing interest in machine and computer vision methods, recently observed in many areas of industry, is partially caused by the growing availability of relatively inexpensive high-quality cameras and the rapid growth of the computational power of affordable devices for everyday use, such as mobile phones, tablets, or notebooks. Their popularity makes it possible to apply image processing algorithms in many new areas related to automation, robotics, intelligent transportation systems, non-destructive testing and diagnostics, biomedical image analysis, and even agriculture. Some methods previously applied, e.g., for visual navigation in mobile robotics, may be successfully adapted to new areas, such as automotive solutions, e.g., Advanced Driver-Assistance Systems (ADAS). Nevertheless, such extensions of previously developed methods are not always straightforward, since the analysis of natural images may be much more challenging than that of images acquired in fully controlled lighting conditions.
One of the dynamically growing application areas of video technologies based on camera sensors is related to Optical Character Recognition (OCR) systems. Such applications include document image analysis, recognition of QR codes in natural images [1,2], and automatic scanning and digitization of books [3], where additional infrared cameras may also be applied, e.g., to support the straightening of the scanned pages. Binary image analysis offers wide possibilities for shape recognition, including in embedded systems with limited computational power and a relatively small amount of memory. Its use in mobile devices is therefore a natural direction. Since modern smartphones are usually equipped with multi-core processors, parallel image processing methods may be of great interest as well.
As images acquired by camera sensors are usually full-color photographs, which may be easily converted into grayscale images (if they are not acquired directly by monochrome sensors), the next relevant pre-processing step is their conversion into binary images, which significantly decreases the amount of data used in further shape analysis and character recognition. Nevertheless, for images captured in uncontrolled lighting conditions, the presence of shadows, local light reflections, illumination gradients, and other background distortions may lead to an irreversible loss of information during image thresholding, causing many errors in character recognition. Hence, the appropriate binarization of such non-uniformly illuminated images is still a challenging task, similar to that of degraded historical document images containing many specific distortions.
To face this challenge, many algorithms have been proposed in recent years, e.g., those presented at the Document Image Binarization Competitions (DIBCO) organized during the two most relevant conferences in this field: the International Conference on Document Analysis and Recognition (ICDAR) [4] and the International Conference on Frontiers in Handwriting Recognition (ICFHR) [5]. All competitions have been held with the use of dedicated DIBCO datasets (available at: https://vc.ee.duth.gr/dibco2019/). These datasets contain degraded handwritten and machine-printed historical document images together with their binary “ground-truth” (GT) equivalents, which are used to verify the obtained binarization results.
Since no single binarization method would be perfect for all document image applications, some initial attempts at combining widely known approaches have been made [6], although they were verified on a relatively small number of test images from earlier DIBCO datasets. Another interesting recent idea is the development of methods that balance processing time and obtained accuracy, presented during the ICDAR 2019 Time-Quality Document Binarization Competition [7]. Some approaches presented during this competition were also based on the combination of multiple methods. Examples include supervised machine learning using texture features with the XGBoost classifier and additional morphological post-processing, as well as a combination of the Niblack [8] and Wolf [9] methods. Nonetheless, such approaches typically do not focus on document images and OCR applications, considering image binarization as a more general task.
Some attempts at the combination of various methods, also using quite sophisticated approaches, have been made for images captured by portable cameras as well [10,11,12]. Some of these algorithms have been implemented in PhotoDoc [13], a software toolbox integrated with the Tesseract OCR engine and designed to process document images acquired with portable digital cameras. A more comprehensive overview of methods for the analysis of text documents acquired by cameras may be found in the survey paper [14].
Nevertheless, in view of the potential parallelization of processing, an appropriate combination of some recently proposed binarization methods, also with some previously known algorithms, may lead to relatively fast processing and high OCR accuracy.
Although the most common approaches to the assessment of image binarization are based on the comparison of individual pixels [15,16], it should be noted that not all improperly classified pixels have the same influence on the final recognition results. Obviously, incorrectly classified background pixels located in the neighborhood of characters may be more troublesome than single isolated points in the background. Some pixel-based measures, such as the pseudo-F-measure or Distance Reciprocal Distortion (DRD) [17], do take into account the distance of individual pixels from character strokes. However, their direct application would require not only the availability of GT images, but also their precise matching with the acquired photos. Hence, considering the final results of character recognition, the thresholding methods considered in the paper are assessed by counting correctly and incorrectly recognized alphanumerical characters instead of single pixels.
One of the main goals of the conducted experiments is the verification of possible combinations of the recently proposed methods [18,19,20] with some other algorithms that do not require a priori training. This excludes some recently proposed deep learning approaches due to their memory and hardware requirements. To minimize the direct impact of camera parameters and properties on the characteristics of the obtained images and further processing steps, a Digital Single Lens Reflex (DSLR) camera Nikon N70 is used to acquire the images. The main contributions of the paper are twofold. The first is the idea of combining, by pixel voting, some recently proposed image binarization methods, particularly those utilizing image entropy filtering and a multi-layered stack of regions, with additional tuning of some parameters of the selected algorithms. The second is the verification on the developed image dataset containing 176 non-uniformly illuminated document images.
The rest of the paper contains an overview of the most popular image thresholding algorithms, including recently proposed ideas of image pre-processing with entropy filtering [18], background modeling with image resampling [19], and the use of a multi-layered stack of image regions [20], as well as the discussion of the proposed approach, followed by the presentation and analysis of the experimental results and final conclusions.

2. Overview of Image Binarization Algorithms

Image binarization has a relatively long history due to the constant need to decrease the amount of image data, caused earlier by the limitations of displays, available memory, and processing speed. The simplest methods of global binarization of grayscale images are based on the choice of a single threshold for all pixels of the image. Instead of the simplest choice of 50% of the dynamic range, the Balanced Histogram Thresholding (BHT) method may be applied [21], where the threshold should be chosen in the lowest part of the histogram’s valley. However, this fast and simple method, initially developed for biomedical images, should be applied only to images with bi-modal histograms due to problems with large tails in the histogram, which makes it useless for unevenly illuminated document images. Kittler and Illingworth proposed an algorithm [22] that minimizes the Bayes misclassification error, expressed as the solution of a quadratic equation. It assumes normal distributions of the brightness levels for objects and background, and it was further improved by Cho et al. [23] using model distributions with corrected variance values.
Another global method, regarded as the most popular one for images with bi-modal histograms, was proposed by Nobuyuki Otsu [24]. Its idea utilizes the maximization of the between-class (inter-class) variance. This is equivalent to the minimization of the weighted sum of the two intra-class variances calculated for the two groups of pixels representing the foreground and background, respectively. A similar approach, although replacing the variance with the histogram’s entropy, was proposed by Kapur et al. [25]. Since both methods work properly only for uniformly illuminated images, their modifications utilizing the division of images into regions and combining the obtained local and global thresholds were also considered a few years ago [26].
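To illustrate the criterion, a minimal pure-Python sketch of Otsu’s method is given below. It assumes 8-bit grayscale intensities supplied as a flat list of integers, and the function names are hypothetical rather than taken from the cited works. It scans all candidate thresholds and keeps the one that maximizes the between-class variance.

```python
def otsu_threshold(pixels, levels=256):
    """Return t maximizing the between-class variance; class 0 is p <= t, class 1 is p > t."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_score = 0, -1.0
    w0 = 0    # number of pixels with intensity <= t
    sum0 = 0  # sum of their intensities
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, so this split is meaningless
        mu0, mu1 = sum0 / w0, (sum_all - sum0) / w1
        # w0 * w1 * (mu0 - mu1)^2 equals total^2 times the between-class variance
        score = w0 * w1 * (mu0 - mu1) ** 2
        if score > best_score:
            best_t, best_score = t, score
    return best_t


def apply_threshold(image, t):
    """Binarize a list of rows: 1 (background) above t, 0 (text) otherwise."""
    return [[1 if p > t else 0 for p in row] for row in image]


image = [[30, 32, 200], [210, 31, 205]]
t = otsu_threshold([p for row in image for p in row])  # -> 32
binary = apply_threshold(image, t)                       # -> [[0, 0, 1], [1, 0, 1]]
```

Because only the histogram is scanned, the cost is independent of the image content once the histogram has been built. This is one reason the global method is fast, but it is also why it cannot follow local illumination changes.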
A more formal analysis of the similarities and differences between some global thresholding methods for images with bi-modal histograms, including the iterative selection method proposed by Ridler and Calvard [27], may be found in the paper [28]. Nevertheless, these methods do not perform well for natural images, where the bi-modality of the histogram cannot be ensured. A similar problem arises when applying some other methods developed for the binarization of images with unimodal histograms [29,30], which are not typical for document images either.
An obvious solution to these problems is the use of adaptive binarization methods, where the threshold values are determined locally for each pixel, depending on local parameters such as average brightness or local variance. In some cases, semi-adaptive versions of global thresholding may be applied as region-based approaches, where different thresholds may be set for various image fragments. One exemplary extension of the classical Otsu’s method, referred to as AdOtsu, was proposed by Moghaddam and Cheriet [31], who postulated the additional detection of line heights and stroke widths, as well as multi-scale background estimation and removal.
The region based thresholding using Otsu’s method with Support Vector Machines (SVM) was proposed by Chou et al. [32], whereas another application of SVMs with local features was recently analyzed by Xiong et al. [33]. Some relatively fast region based approaches were proposed recently as well [34,35], leading finally to the idea of the multi-layered stack of regions [20].
Apart from the above-mentioned method proposed by Kapur et al. [25], some entropy based binarization methods may be distinguished as well. Some of them, although less popular than histogram based algorithms, utilize the histogram’s entropy [36,37], whereas some other approaches are based on the Tsallis entropy [38] or Shannon entropy with the classification of pixels into text, near-text, and non-text regions [39]. Some earlier algorithms, e.g., developed by Fan et al. [40], were based on the maximization of the 2D temporal entropy or minimization of the two-dimensional entropy [41]. Some more sophisticated ideas employ genetic methods [42] and cross-entropy for color image thresholding, as presented in a recent paper [43]. Another recent idea is the application of image entropy filtering for pre-processing of unevenly illuminated document images [18], which may be applied in conjunction with some other thresholding methods, leading to significant improvement, particularly for some simple methods, such as, e.g., Meanthresh, which is based just on the calculation of the mean intensity of the local neighborhood and setting it as the local threshold value.
Another simple local thresholding method using the midgray value, defined as the average of the minimum and the maximum intensity within the local window, was proposed by Bernsen [44]. Although this method may be considered as relatively old, its modification for blurred and unevenly lit QR codes has been proposed recently [45], based on its combination with the global Otsu’s method. A popular adaptive binarization method, available in the MATLAB environment as the adaptthresh function, was proposed by Bradley and Roth [46], who applied the integral image for the calculation of the local mean intensity of the neighborhood, as well as the local median and Gaussian weighted mean in its modified versions. A description of some other applications of integral images for adaptive thresholding may be found in the paper [47].
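The integral image idea mentioned above can be sketched as follows. This is a minimal pure-Python illustration of Bradley and Roth’s local-mean rule, not MATLAB’s adaptthresh. The window size of 15 pixels, the 15% darkness margin t = 0.15, and the function names are assumptions chosen for illustration. The output uses 1 for background and 0 for text.

```python
def integral_image(image):
    """ii[y][x] holds the sum of image[0:y][0:x]; row 0 and column 0 are zeros."""
    h, w = len(image), len(image[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += image[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def bradley_roth(image, window=15, t=0.15):
    """1 (background) unless a pixel is more than t*100 % darker than its local mean."""
    h, w = len(image), len(image[0])
    ii = integral_image(image)
    r = window // 2
    out = [[1] * w for _ in range(h)]
    for y in range(h):
        y0, y1 = max(0, y - r), min(h, y + r + 1)
        for x in range(w):
            x0, x1 = max(0, x - r), min(w, x + r + 1)
            count = (y1 - y0) * (x1 - x0)  # window is clipped at the image borders
            # window sum from four lookups, whatever the window size
            s = ii[y1][x1] - ii[y0][x1] - ii[y1][x0] + ii[y0][x0]
            if image[y][x] * count <= s * (1 - t):
                out[y][x] = 0
    return out
```

The design choice that matters here is the constant-time window sum. After one pass to build the integral image, enlarging the window does not increase the per-pixel cost, which is what makes this family of methods attractive for large windows.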
One of the most widely known extensions of the above simple methods, such as Meanthresh or Bernsen’s thresholding, was proposed by Niblack [8]. He used as the local threshold the mean local intensity plus the local standard deviation multiplied by the constant parameter k = −0.2, i.e., the mean lowered by 0.2 times the standard deviation. The default size of the local sliding window was 3 × 3 pixels, and therefore, the method was very sensitive to local distortions. A simple but efficient modification of this algorithm, known as the NICK method, was proposed by Khurshid et al. [48] for brighter images. It adds a correction by the average local intensity and uses the changed parameter k = −0.1. One of the most popular extensions of this approach was proposed by Sauvola and Pietikäinen [49], who additionally used the dynamic range of the standard deviation. Further modifications of this approach were proposed by Wolf and Jolion [9], who used the normalization of contrast and average intensity. Feng and Tan [50] also modified it, using a second, larger local window to compute the local dynamic range of the standard deviation. The latter approach was relatively slow because of the application of additional median filtering with bilinear interpolation. A multi-scale extension of Sauvola’s method was proposed by Lazzara and Géraud [51], whereas additional pre-processing with the use of the Wiener filter and background estimation was used by Gatos et al. [52], together with noise removal and additional post-processing operations.
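The paragraph above describes Sauvola’s extension only qualitatively. The commonly cited form of Sauvola’s threshold is T = m·[1 + k·(s/R − 1)], where R is the dynamic range of the standard deviation (R = 128 for 8-bit images) and k = 0.5 is a commonly used default. The pure-Python sketch below assumes a grayscale image given as a list of rows of 0–255 integers and a 15 × 15 window clipped at the borders (an assumption). It outputs 1 for background and 0 for text.

```python
import math


def local_mean_std(image, y, x, r):
    """Mean and standard deviation of the window of radius r, clipped at the borders."""
    h, w = len(image), len(image[0])
    vals = [image[j][i]
            for j in range(max(0, y - r), min(h, y + r + 1))
            for i in range(max(0, x - r), min(w, x + r + 1))]
    m = sum(vals) / len(vals)
    return m, math.sqrt(sum((v - m) ** 2 for v in vals) / len(vals))


def sauvola(image, window=15, k=0.5, R=128):
    """Local threshold T = m * (1 + k * (s / R - 1)); 1 = background, 0 = text."""
    r = window // 2
    out = []
    for y in range(len(image)):
        row = []
        for x in range(len(image[0])):
            m, s = local_mean_std(image, y, x, r)
            row.append(1 if image[y][x] > m * (1 + k * (s / R - 1)) else 0)
        out.append(row)
    return out
```

In flat regions (s ≈ 0), the threshold drops to about m·(1 − k), so uniform background stays white. This explains why Sauvola’s method produces far less background noise than Niblack’s.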
Another algorithm, known as the Singh method [53], utilizes integral images for local mean and local mean deviation calculations to increase the speed of computations. One of the most recent methods based on Sauvola’s algorithm, referred to as ISauvola, was proposed in the paper [54], where the local image contrast was applied to adjust the method’s parameters automatically. Another modification of Sauvola’s method, applied to QR codes, was recently presented by He et al. [55], who used an adaptive window size based on lighting conditions, partially inspired by Bernsen’s approach. Another recently proposed algorithm inspired by Sauvola’s method, named WAN after the first name of one of its authors [56], focuses on low-contrast document images. In this method, the local mean values are replaced by the so-called “maximum mean”, which is in fact the average of the mean and maximum intensity values. Nevertheless, this approach was verified only on the H-DIBCO 2016 dataset, containing 14 handwritten images; hence, it might be less suitable for machine-printed document images and OCR applications.
Some other methods inspired by Niblack’s algorithm were also proposed by Kulyukin et al. [57] and by Samorodova and Samorodov [58]. The application of dynamic windows for Niblack’s and Sauvola’s methods was presented by Bataineh et al. [59], whereas Mysore et al. [60] developed a method useful for binarization of color document images based on the multi-scale mean-shift algorithm. A more detailed overview of adaptive binarization methods based on Niblack’s approach, as well as some others, may be found in some recent survey papers [61,62,63,64,65,66].
Many less popular binarization methods have also been developed. They are usually relatively slow, and their universality is limited by assumptions related to the necessary additional operations. For example, the algorithm described by Su et al. [67] utilized a combination of Canny edge filtering and an adaptive image contrast map, whereas Bag and Bhowmick [68] presented a multi-scale adaptive–interpolative method dedicated to documents with faint characters. Another method based on Canny edge detection was presented by Howe [69], who combined it with the Laplacian operator and the graph cut method, leading to an energy minimization approach. An interesting method based on background suppression, although appropriate mainly for uniformly lit document images, was developed by Lu et al. [70]. Erol et al. [71], in turn, used a generalized approach to background estimation and text localization based on morphological operations for documents acquired by camera sensors of mobile phones. Mathematical morphology was also used in the method presented by Okamoto et al. [72].
An algorithm utilizing median filtering for background estimation was recently proposed by Khitas et al. [73], whereas Otsu’s thresholding preceded by the use of curvelet transform was described by Wen et al. [74]. Alternatively, Mitianoudis and Papamarkos [75] presented the idea of using local features with Gaussian mixtures. The use of the non-local means method before the adaptive thresholding was examined by Chen and Wang [76], and the method known as Fast Algorithm for document Image Restoration (FAIR) utilizing rough text localization and likelihood estimation was presented by Lelore and Bouchara [77], who used the obtained super-resolution likelihood image as the input for a simple thresholding. The gradient based method for binarization of medical and document images proposed by Yazid and Arof [78] utilized edge detection with the Prewitt filter for the separation of weak and strong boundary points. However, the presented results were obtained using only the document images from the H-DIBCO 2012 dataset.
Some other recent ideas are the use of variational models [79], fast background estimation based on image resampling [19], and independent thresholding of the RGB channels of historical document images [80] with Otsu’s method. Nevertheless, the latter method requires additional training of the decision-making block with the use of synthetic images. Due to recent advances in deep learning, some deep learning based attempts have also been made [81,82]. However, such approaches need relatively large training image datasets, and therefore, their application may be troublesome, especially for mobile devices working in uncontrolled lighting conditions. Another issue is related to their high memory requirements, as well as the need for modern GPUs, which may be impractical, e.g., in embedded systems, as well as in some industrial applications.
Recently, some applications of the fuzzy approach to image thresholding were also investigated by Bogatzis and Papadopoulos [83,84], as was the use of Structural Symmetric Pixels (SSP), proposed by Jia et al. [85,86] (the original implementation of the method is available at: https://github.com/FuxiJia/DocumentBinarizationSSP). The idea of this method is based on the assumption that the local threshold should be estimated using only the pixels around strokes whose gradient magnitudes are relatively large and whose gradient directions are opposite, instead of the whole region.

3. Proposed Method

Apart from the approaches presented during the recent ICDAR [87], some initial attempts at the use of multiple binarization methods were made by Chaki et al. [6], as well as by Yoon et al. [88]. However, their results were obtained for a limited number of test images taken from earlier DIBCO datasets or for captured images of vehicle license plates. The idea presented in this paper combines various image binarization methods by pixel voting. It was verified using 176 non-uniformly illuminated document images containing various kinds of illumination gradients, as well as five common font families, also with additional style modifications (bold, italics, and both). It utilized a combination of recently proposed methods with some earlier adaptive binarization algorithms based on different assumptions. The obtained results were verified with three different OCR engines by calculating the F-measure and OCR accuracy for characters. The Levenshtein distance between two strings was also calculated, defined as the minimum number of single-character operations needed to convert one string into another. All the images were photographs of printed documents containing the well-known Lorem ipsum text, acquired in various lighting conditions.
Assuming the parallel execution of three, five, or seven different image binarization algorithms, some differences in the resulting images may be observed, particularly in background areas. Nevertheless, the most significant fragments of document images were located near the characters subjected to further text recognition. The main idea of the proposed method is voting on the pixels obtained by applying the individual algorithms to the same image. This was in fact equivalent to choosing the median of the binary results (ones and zeros) obtained for the same pixel by the three, five, or seven applied methods. Obviously, one might not expect satisfactory results when combining three similar methods, e.g., Niblack’s, Sauvola’s, and Wolf’s algorithms. For approaches based on different assumptions, however, some of the results may differ significantly and be complementary to each other.
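The voting step itself is very simple. A minimal pure-Python sketch is shown below, assuming that each method has already produced a same-sized binary image as a list of rows of 0/1 values. For 0/1 values, a strict majority over an odd number of inputs is exactly the median, so no sorting is needed. The function name is hypothetical.

```python
def vote(binary_images):
    """Pixel-wise majority vote over an odd number of same-sized 0/1 images.

    For 0/1 values the majority equals the median of the votes for each pixel.
    """
    n = len(binary_images)
    if n % 2 == 0:
        raise ValueError("an odd number of methods (3, 5, 7, ...) is required")
    h, w = len(binary_images[0]), len(binary_images[0][0])
    return [[1 if sum(img[y][x] for img in binary_images) > n // 2 else 0
             for x in range(w)]
            for y in range(h)]


a = [[1, 0, 1]]
b = [[1, 1, 0]]
c = [[0, 0, 1]]
combined = vote([a, b, c])  # -> [[1, 0, 1]]
```

Since each input image can be computed independently, the individual binarizations map naturally onto parallel threads or cores. Only this cheap final vote needs all of them.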
The preliminary choice of binarization methods for the combination was made by analyzing the performance of individual methods for the Bickley Diary, Nabuco (dataset available at: https://dib.cin.ufpe.br/), and individual DIBCO datasets. This analysis used the typical measures based on the comparison of pixels (accuracy, F-measure, DRD, MPM, etc.) reported in some earlier papers. Since these datasets, typically used for general-purpose evaluation of document image binarization, do not contain ground-truth text data, the OCR accuracy results calculated for our dataset were additionally used for this purpose. Once the most appropriate combinations of three methods had been found, two additional methods were added in the second stage only to the best combinations of three methods. Finally, the next two methods were added only to the best resulting combinations of five methods. The candidate algorithms for the combination were chosen essentially among those that individually led to relatively high OCR accuracy.
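The staged search described above can be sketched as a greedy procedure. This is an illustration under stated assumptions, not the authors’ code. The `score` callable is hypothetical and stands for any figure of merit for a tuple of method names, e.g., the mean OCR accuracy of the voted images. The number of best combinations carried into the next stage (`keep`) is also an assumption, since the text does not specify it.

```python
from itertools import combinations


def staged_selection(methods, score, keep=3):
    """Greedy 3 -> 5 -> 7 search over combinations of binarization methods.

    `score` is a hypothetical callable rating a tuple of method names (higher
    is better); `keep` is how many of the best combinations survive a stage.
    """
    if len(methods) < 7:
        raise ValueError("at least seven candidate methods are needed")
    # Stage 1: rate every combination of three methods
    ranked = sorted(combinations(sorted(methods), 3), key=score, reverse=True)[:keep]
    # Stages 2 and 3: extend only the best combinations by two more methods
    for _ in range(2):
        candidates = set()
        for combo in ranked:
            remaining = [m for m in methods if m not in combo]
            for extra in combinations(remaining, 2):
                candidates.add(tuple(sorted(combo + extra)))
        # sorting the candidates first makes tie-breaking deterministic
        ranked = sorted(sorted(candidates), key=score, reverse=True)[:keep]
    return ranked[0]
```

Compared with an exhaustive search over all 7-element subsets, this staged scheme evaluates only the extensions of a few promising combinations. This keeps the number of full OCR runs manageable, at the cost of possibly missing a combination whose three-method core scored poorly.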
Considering this, as well as the complexity of many candidate methods, the combination of two recently proposed algorithms, namely image entropy filtering followed by Otsu’s global thresholding described in the paper [18] and the multi-layered stack of regions using 16 layers [20], with NICK adaptive thresholding [48], was proposed. Each of these methods may be considered as relatively fast, in particular assuming potential parallel processing, and based on different operations, as shown in earlier papers.
The application of the stack of regions [20] was based on the calculation of thresholds for image fragments, where the image was divided into partially overlapping blocks. Hence, each pixel belonged to different regions, shifted with respect to each other according to the specified layer. The final threshold was selected as the average of the threshold values obtained for all regions to which the pixel belonged in the different layers. The local threshold for each region was calculated in a simplified form as $T = a \cdot \mathrm{mean}(X) - b$, where $\mathrm{mean}(X)$ is the local average, and the optimized parameter values were $a = 0.95$ and $b = 7$, as presented in the paper [20].
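A minimal pure-Python sketch of the multi-layered stack of regions is given below. The per-region rule T = a·mean(X) − b with a = 0.95 and b = 7, and the use of 16 layers, follow the text. The block size of 32 pixels and the shift rule (layer l moves the block grid by l·block/layers pixels in both directions) are assumptions of this sketch, not details taken from [20]. The output uses 1 for background and 0 for text.

```python
def stack_of_regions(image, block=32, layers=16, a=0.95, b=7):
    """Average of per-region thresholds T = a*mean(X) - b over shifted block grids.

    The block size and the shift rule (layer l moves the grid by l*block/layers
    pixels in both directions) are assumptions of this sketch. 1 = background.
    """
    h, w = len(image), len(image[0])
    acc = [[0.0] * w for _ in range(h)]
    for layer in range(layers):
        off = layer * block // layers
        sums, counts = {}, {}
        # first pass: mean intensity of every region of this layer
        for y in range(h):
            for x in range(w):
                key = ((y + off) // block, (x + off) // block)
                sums[key] = sums.get(key, 0) + image[y][x]
                counts[key] = counts.get(key, 0) + 1
        # second pass: accumulate the region threshold for each pixel
        for y in range(h):
            for x in range(w):
                key = ((y + off) // block, (x + off) // block)
                acc[y][x] += a * sums[key] / counts[key] - b
    return [[1 if image[y][x] > acc[y][x] / layers else 0 for x in range(w)]
            for y in range(h)]
```

Averaging over shifted grids smooths the blocky threshold surface that a single grid would produce. The per-layer work is independent, so the layers can also be computed in parallel.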
The image entropy filtering based method [18] was applied in a few main steps. The initial operation was the calculation of the local entropy, which could be performed using MATLAB’s entropyfilt function with a 17 × 17 pixel neighborhood (chosen in optimization experiments), followed by its negation for better readability. The obtained entropy map was normalized and initially thresholded using Otsu’s method to partially remove the background information. The resulting image with segmented text regions was used as a mask for the background and subjected to morphological dilation to fill the gaps containing individual characters. The minimum appropriate size of the structuring element depended on the font size; for the images in the test dataset, a 20 × 20 pixel element was sufficient. The resulting background estimate was subtracted from the original image, and the negative of the result was subjected to contrast increase and final binarization. Since the above steps equalize the image illumination and increase its contrast, various thresholding algorithms may be applied in the last step. Nevertheless, the best OCR results in combination with the other methods were obtained for Otsu’s global thresholding applied as the last step of this algorithm.
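The first two steps, the local entropy map and its normalized negation, can be sketched in pure Python as follows. This is only an approximation of MATLAB’s entropyfilt. Entropy is computed in bits from the gray-level counts in each window, and border pixels are replicated, which is an assumption of this sketch rather than entropyfilt’s documented padding. The masking, dilation, and contrast steps are not reproduced here because their exact parameters beyond the structuring element size are not given.

```python
import math


def local_entropy(image, window=17):
    """Shannon entropy (bits) of the gray levels in a window x window neighbourhood.

    Borders are handled by replicating edge pixels (an assumption of this sketch).
    """
    h, w = len(image), len(image[0])
    r = window // 2
    n = window * window
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            counts = {}
            for j in range(y - r, y + r + 1):
                for i in range(x - r, x + r + 1):
                    v = image[min(max(j, 0), h - 1)][min(max(i, 0), w - 1)]
                    counts[v] = counts.get(v, 0) + 1
            out[y][x] = -sum(c / n * math.log2(c / n) for c in counts.values())
    return out


def negate_and_normalize(emap):
    """Map entropy to [0, 1] and invert it, so high-entropy (text) areas become dark."""
    flat = [v for row in emap for v in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0
    return [[1.0 - (v - lo) / span for v in row] for row in emap]
```

Text strokes produce many distinct gray levels inside a window and therefore high entropy, whereas smooth background, even under an illumination gradient, yields low entropy. This is why the entropy map separates text from uneven background better than raw intensity does.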
The algorithm described in the paper [19], used in some of the tested variants, was based on the following assumption. A significant decrease of the image size, e.g., using MATLAB’s imresize function, causes the loss of text information while mainly preserving the background information, similar to (usually much slower) low-pass filtering. Hence, the combination of downsampling and upsampling using the same kernel may be applied for fast background estimation. In this paper, the best results were obtained using a scale factor of 8 and bilinear interpolation. The resulting image was subtracted from the original. The further steps were similar to those used in the previous method: contrast increase (using the coefficient 0.4), negation, and final global thresholding, again with Otsu’s method. Although both methods were based on similar fundamentals, the results of background estimation using entropy filtering and image resampling differed significantly; hence, both methods could be considered as complementary to each other.
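The background estimation by resampling can be sketched as follows. This is a pure-Python illustration with a scale factor of 8, as in the text. Block averaging stands in for imresize’s antialiased downsampling, and a pixel-center bilinear interpolation is used for upsampling. Both are assumptions, so the numbers will not match MATLAB exactly. The later contrast increase, negation, and Otsu steps are omitted because the exact contrast operation is not specified.

```python
def downsample_mean(image, f):
    """Shrink by factor f by averaging each f x f block (partial blocks at the edges)."""
    h, w = len(image), len(image[0])
    H, W = (h + f - 1) // f, (w + f - 1) // f
    small = [[0.0] * W for _ in range(H)]
    for Y in range(H):
        for X in range(W):
            vals = [image[y][x] for y in range(Y * f, min(h, Y * f + f))
                    for x in range(X * f, min(w, X * f + f))]
            small[Y][X] = sum(vals) / len(vals)
    return small


def upsample_bilinear(small, h, w):
    """Bilinear interpolation back to h x w using the pixel-centre convention."""
    H, W = len(small), len(small[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        sy = min(max((y + 0.5) * H / h - 0.5, 0.0), H - 1)
        y0 = int(sy); y1 = min(y0 + 1, H - 1); fy = sy - y0
        for x in range(w):
            sx = min(max((x + 0.5) * W / w - 0.5, 0.0), W - 1)
            x0 = int(sx); x1 = min(x0 + 1, W - 1); fx = sx - x0
            top = small[y0][x0] * (1 - fx) + small[y0][x1] * fx
            bottom = small[y1][x0] * (1 - fx) + small[y1][x1] * fx
            out[y][x] = top * (1 - fy) + bottom * fy
    return out


def estimate_background(image, factor=8):
    """Down- then upsample: thin text strokes vanish, slow illumination changes remain."""
    return upsample_bilinear(downsample_mean(image, factor), len(image), len(image[0]))


def flatten_illumination(image, factor=8):
    """Original minus background: about 0 on background, strongly negative on dark text."""
    bg = estimate_background(image, factor)
    return [[p - b for p, b in zip(row, brow)] for row, brow in zip(image, bg)]
```

A dark character occupies only a small part of each 8 × 8 block, so it barely shifts the block mean. After subtraction, it stands out against a nearly zero background regardless of the local illumination level.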
The last of the methods applied in the proposed approach, known as NICK [48] (named after the first letters of its authors’ names), is one of the modifications of Niblack’s thresholding. In Niblack’s method, the local threshold is determined as:
$$T = m + k \cdot s = m + k \cdot \sqrt{B},$$
where $m$ is the local average value, $k = -0.2$ is a fixed parameter, $s$ stands for the local standard deviation, and hence, $B = s^2$ is the local variance.
The modifications behind the NICK method lead to the formula:
$$T = m + k \cdot \sqrt{B + m^2},$$
with the postulated value of the parameter $k = -0.1$ for OCR applications. As stated in the paper [48], this value of $k$ leaves the characters “crispy and unbroken” at the price of some noisy pixels. The window size originally proposed in the paper [48] was 19 × 19 pixels. However, suitable parameters depend on the image size, as well as on the font size, and may be adjusted for specific documents. After experimental verification, the optimal choice for the test dataset used in this paper was a 15 × 15 pixel window with the “original” Niblack parameter $k = -0.2$.
Since most OCR engines utilize predefined thresholding methods integrated into their pre-processing procedures, the input images should be binarized before the OCR software is used, to prevent the impact of this “built-in” thresholding. The well-known commercial ABBYY FineReader uses the adaptive Bradley’s method. The freeware Tesseract engine, originally developed by HP and further developed by Google after its source code was released [89], employs global Otsu binarization. In this case, forced prior thresholding replaces the internal default methods of the OCR software.

4. Discussion of the Results

The experimental verification of the proposed combined image binarization method for OCR purposes should be conducted using a database of unevenly illuminated document images for which the ground-truth text data are known. Unfortunately, the currently available image databases used for the performance analysis of image binarization methods, such as the DIBCO [4], Bickley Diary [90], or Nabuco datasets [87], usually contain handwritten text (in some cases also machine-printed text). This text is subjected to distortions such as ink fading, stains, or other local degradations.
Hence, a dedicated dataset was developed, containing 176 document images photographed with a Nikon N70 DSLR camera at a 70 mm focal length. Each image shows the well-known Lorem ipsum text (563 words) set in one of five typefaces, also with style modifications, under various types of non-uniform illumination. Since the most popular typefaces were used, namely Arial, Times New Roman, Calibri, Verdana, and Courier, the obtained document images may be considered representative of typical OCR applications. Three sample images from the dataset are shown in Figure 1. The whole dataset, referred to as the WEZUT OCR Dataset, is publicly available free of charge at http://okarma.zut.edu.pl/index.php?id=dataset&L=1.
All images were binarized using several individual methods, as well as their combinations based on the proposed pixel voting of 3, 5, and 7 methods. The resulting binary images were used as input data for three OCR engines: Tesseract (Version 4 with leptonica-1.76.0), MATLAB’s R2018a built-in OCR procedure (also originating from Tesseract), and GNU Ocrad (Version 0.27), which is based on a feature extraction method (software release available at: https://www.gnu.org/software/ocrad/). Cloud solutions, e.g., those provided by Google or Amazon, are usually paid, and their availability may be limited in practical applications. We therefore focused on two representative freeware OCR engines and MATLAB’s ocr function, none of which apply additional text operations such as dictionary-based or semantic analysis.
Each recognized text was compared with the ground-truth data (the original Lorem ipsum text) using three measures. The first is the Levenshtein distance, i.e., the minimum number of single-character edits (insertions, deletions, or substitutions) needed to change one text string into another. The other two are the F-measure and accuracy, typically used in classification tasks. The F-measure is the harmonic mean of precision (the ratio of true positives to the sum of true and false positives) and recall (the ratio of true positives to the sum of true positives and false negatives). Accuracy is the ratio of the sum of true positives and true negatives to the total number of samples.
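The Levenshtein distance described above can be computed with the classic dynamic-programming recurrence, keeping only two rows of the edit-distance table in memory. The sketch below also writes out the F-measure and accuracy formulas from raw counts. This section does not specify how true/false positives and negatives are counted when an OCR output is compared with the ground-truth text, so the counts are simply taken as inputs here. Function names are illustrative.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions or substitutions turning a into b."""
    prev = list(range(len(b) + 1))          # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]                          # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def f_measure(tp: int, fp: int, fn: int) -> float:
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)  # harmonic mean


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    return (tp + tn) / (tp + tn + fp + fn)


# A typical OCR confusion: "m" read as "rn" costs two edits
print(levenshtein("Lorem ipsum", "Lorern ipsum"))  # 2
```

The two-row formulation needs O(len(b)) memory, which is ample for a page of about 3500 characters. The time cost remains O(len(a)·len(b)).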
To verify the applicability of various combinations of methods, the proposed pixel voting approach was tested with many different sets of individual methods. Only the best results are presented in the paper and compared with those of the individual thresholding methods. Most of the individual methods were implemented in MATLAB, some of them partially based on code available from MATLAB Central File Exchange (Jan Motl) and GitHub (Doxa project by Brandon M. Petty). The initial idea was to combine the three recently proposed approaches described in [18,19,20]; hence, the first voting combination (Method #37 in Table 1) was applied to these three algorithms, similarly to the OR and AND operations shown as Methods #35 and #36 in Table 1. During further experiments, however, better results were obtained by replacing the resampling-based method [19] with the NICK algorithm [48]. To illustrate the importance of an appropriate choice of individual methods for the voting procedure, some of the worse results (Methods #39–#41) are also presented in Tables 1–3. Further experiments that additionally applied some other recent methods led to even better results.
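The pixel voting idea, together with the OR and AND combinations mentioned above, can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation. All input binary images are assumed to have the same size, and True is assumed to mark a text (foreground) pixel; with the opposite polarity, OR and AND exchange roles. Majority voting is restricted to an odd number of methods (3, 5, or 7 in the experiments) so that no ties can occur. The function name `combine_binary` is hypothetical.

```python
import numpy as np


def combine_binary(images, mode="vote"):
    """Combine same-sized binary images pixel by pixel (True = text pixel)."""
    stack = np.stack([np.asarray(im, dtype=bool) for im in images])
    n = stack.shape[0]
    if mode == "vote":
        if n % 2 == 0:
            raise ValueError("majority voting needs an odd number of methods")
        # Text pixel if more than half of the methods mark it as text
        return stack.sum(axis=0) > n // 2
    if mode == "or":
        return stack.any(axis=0)    # text if any method marks it
    if mode == "and":
        return stack.all(axis=0)    # text only if all methods agree
    raise ValueError(f"unknown mode: {mode!r}")


# Usage with three hypothetical 1 x 5 binarization results
a = np.array([[1, 1, 0, 0, 1]])
b = np.array([[1, 0, 1, 0, 0]])
c = np.array([[1, 1, 0, 0, 0]])
print(combine_binary([a, b, c]).astype(int))         # [[1 1 0 0 0]]
print(combine_binary([a, b, c], "or").astype(int))   # [[1 1 1 0 1]]
print(combine_binary([a, b, c], "and").astype(int))  # [[1 0 0 0 0]]
```

OR keeps every pixel that any method marks as text, so it also accumulates each method's noise. AND keeps only unanimous pixels and tends to break thin strokes. Majority voting lies between the two: an error made by a single method is outvoted as long as most of the combined methods agree.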
Table 1 presents the results obtained for the whole dataset using Tesseract OCR, together with the rank positions of each method. The overall rank was calculated from the rank positions achieved by each method according to the three measures. Method #21 is a modification of Method #20 [20] that uses the Monte Carlo method to speed up the calculations by decreasing the number of analyzed pixels. However, even faster calculations were achieved by applying integral images in Methods #14–#20. The results obtained for MATLAB’s built-in OCR and GNU Ocrad are presented in Table 2 and Table 3, respectively. Table 4 compares the processing times relative to Otsu’s method. The reference time for Otsu’s method was 1.77 ms, measured on a computer with a Core i7-4810MQ processor (four cores/eight threads), 16 GB of RAM, and an SSD.
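This section does not spell out how the three rank positions are merged into the overall rank. One plausible reading is to sum each method's three per-measure rank positions and then rank the sums in ascending order. That reading reproduces, for example, the relative order of Sauvola, Wolf, and the region-based variants in Table 1. The sketch below implements this assumption only. Ties are broken by input order, which may differ from the tie handling used for the published tables. Function names are illustrative.

```python
def rank_positions(values, higher_is_better=True):
    """1-based rank of each value (1 = best); ties broken by input order."""
    order = sorted(range(len(values)), key=lambda i: values[i],
                   reverse=higher_is_better)
    ranks = [0] * len(values)
    for position, index in enumerate(order, start=1):
        ranks[index] = position
    return ranks


def overall_rank(per_measure_ranks):
    """Assumed aggregation: sum the rank positions, then rank the sums (lower = better)."""
    totals = [sum(r) for r in zip(*per_measure_ranks)]
    return rank_positions(totals, higher_is_better=False)


# Rank positions (F-measure, Levenshtein distance, accuracy) copied from Table 1
methods = ["Sauvola", "Wolf", "Region (4 layers)", "Region (6 layers)", "Region (8 layers)"]
f_rank = [27, 33, 31, 30, 34]
lev_rank = [35, 41, 31, 32, 29]
acc_rank = [27, 30, 32, 31, 35]
print(overall_rank([f_rank, lev_rank, acc_rank]))  # [1, 5, 3, 2, 4]
```

The output order (Sauvola, Region 6, Region 4, Region 8, Wolf) matches the overall ranks 28, 29, 30, 33, and 36 listed for these methods. For raw scores, the Levenshtein distance must be ranked with `higher_is_better=False`.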
The results in Tables 1–3 clearly show that the best results were achieved with Tesseract OCR, so the results obtained for the two remaining OCR programs should be considered supplementary. Particularly poor results were observed for GNU Ocrad. Most of the voting-based combinations achieved much better results than the individual binarization methods regardless of the applied OCR engine, confirming the advantages of the proposed approach. Among the best results, the combination of only three methods (#58 in Table 1) provided the best F-measure and accuracy and the second-best Levenshtein distance. In terms of F-measure and accuracy, it outperformed even the voting approaches with five or seven individual algorithms. Its Levenshtein distance was only slightly worse than that of pixel voting with seven algorithms (#61). For the weaker OCR engines, some other combinations led to better results, especially for GNU Ocrad, for which the seven-method combination #61 was not even among the top 10 methods. Therefore, Table 4 presents the final aggregated rank positions for all three OCR engines, together with the computation time normalized to that of Otsu’s thresholding.
Not all results of the tested combinations are reported in Tables 1–4. Nevertheless, the most successful combinations, which reached the best aggregated rank positions in Table 4, contained one of the variants of the multi-layered stack of regions (#20) or the resampling method (#22), as well as an entropy-based method (#27). This confirms the usefulness of these recent approaches in combination with other algorithms. When processing time is also taken into account, another reasonable choice is the combination of Methods #22 and #27 with the recent ISauvola algorithm (#34), listed as #53. It provides very good Levenshtein distances for each of the tested OCR engines.
Example binarization results for sample documents from the dataset are presented in Figures 2–4. They reveal significant differences between some of the methods, as well as the relatively high quality of the binary images obtained using the proposed approach.

5. Concluding Remarks

Binarization of non-uniformly illuminated images acquired by camera sensors, especially those mounted in mobile devices, under unknown lighting conditions is still a challenging task. Real-time analysis of binary images captured by vision sensors has potential applications not only in OCR but also, e.g., in mobile robotics or the recognition of QR codes in natural images. For such uses, the proposed approach may be an attractive option, providing reasonable accuracy for various types of illumination.
Future research may extend the presented experimental results by analyzing the applicability of the proposed methods and their combinations to automatic text recognition in even more challenging images, e.g., of metallic plates with embossed serial numbers. Another direction for further research is the investigation of fuzzy methods [83,84]. These may be useful, e.g., for combining an even number of algorithms or for assigning different weights to each combined method.

Author Contributions

H.M. worked under the supervision of K.O. H.M. prepared the data and sample document images. H.M. and K.O. designed the concept and methodology and proposed the algorithm. H.M. implemented the method, performed the calculations, and prepared the data visualization. K.O. validated the results and wrote the final version of the paper. All authors read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Acknowledgments

The authors would like to thank the anonymous reviewers for their helpful comments, which supported us in improving the current version of the paper. We also thank all researchers who made the code of their algorithms, and the datasets used for their preliminary verification, publicly available.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
ADAS: Advanced Driver-Assistance System
BHT: Balanced Histogram Thresholding
DIBCO: Document Image Binarization Competition
DSLR: Digital Single Lens Reflex
DRD: Distance Reciprocal Distortion
FAIR: Fast Algorithm for document Image Restoration
GPU: Graphics Processing Unit
GT: Ground Truth
H-DIBCO: Handwritten Document Image Binarization Competition
ICDAR: International Conference on Document Analysis and Recognition
ICFHR: International Conference on Frontiers in Handwriting Recognition
OCR: Optical Character Recognition
QR: Quick Response
SSD: Solid-State Drive
SSP: Structural Symmetric Pixels
SVM: Support Vector Machines

References

1. Okarma, K.; Lech, P. Fast statistical image binarization of color images for the recognition of the QR codes. Elektron. Ir Elektrotech. 2015, 21, 58–61.
2. Chen, R.; Yu, Y.; Xu, X.; Wang, L.; Zhao, H.; Tan, H.Z. Adaptive Binarization of QR Code Images for Fast Automatic Sorting in Warehouse Systems. Sensors 2019, 19, 5466.
3. Guizzo, E. Superfast Scanner Lets You Digitize Book by Flipping Pages. Available online: https://spectrum.ieee.org/automaton/robotics/robotics-software/book-flipping-scanning (accessed on 20 May 2020).
4. Pratikakis, I.; Zagoris, K.; Karagiannis, X.; Tsochatzidis, L.; Mondal, T.; Marthot-Santaniello, I. ICDAR 2019 Competition on Document Image Binarization (DIBCO 2019). In Proceedings of the 15th IAPR International Conference on Document Analysis and Recognition (ICDAR), Sydney, Australia, 20–25 September 2019; pp. 1547–1556.
5. Pratikakis, I.; Zagoris, K.; Kaddas, P.; Gatos, B. ICFHR 2018 Competition on Handwritten Document Image Binarization (H-DIBCO 2018). In Proceedings of the 16th International Conference on Frontiers in Handwriting Recognition (ICFHR), Niagara Falls, NY, USA, 5–8 August 2018; pp. 489–493.
6. Chaki, N.; Shaikh, S.H.; Saeed, K. Exploring Image Binarization Techniques. In Studies in Computational Intelligence; Springer: New Delhi, India, 2014; Volume 560.
7. Lins, R.D.; Kavallieratou, E.; Smith, E.B.; Bernardino, R.B.; de Jesus, D.M. ICDAR 2019 Time-Quality Binarization Competition. In Proceedings of the 15th IAPR International Conference on Document Analysis and Recognition (ICDAR), Sydney, Australia, 20–25 September 2019; pp. 1539–1546.
8. Niblack, W. An Introduction to Digital Image Processing; Prentice Hall: Englewood Cliffs, NJ, USA, 1986.
9. Wolf, C.; Jolion, J.M. Extraction and recognition of artificial text in multimedia documents. Form. Pattern Anal. Appl. 2004, 6, 309–326.
10. Lins, R.; e Silva, G.P.; Gomes e Silva, A.R. Assessing and Improving the Quality of Document Images Acquired with Portable Digital Cameras. In Proceedings of the 9th International Conference on Document Analysis and Recognition (ICDAR), Parana, Brazil, 23–26 September 2007; Volume 2, pp. 569–573.
11. Alqudah, M.K.; Bin Nasrudin, M.F.; Bataineh, B.; Alqudah, M.; Alkhatatneh, A. Investigation of binarization techniques for unevenly illuminated document images acquired via handheld cameras. In Proceedings of the International Conference on Computer, Communications, and Control Technology (I4CT), Kuching, Malaysia, 21–23 April 2015; pp. 524–529.
12. Lins, R.D.; Bernardino, R.B.; de Jesus, D.M.; Oliveira, J.M. Binarizing Document Images Acquired with Portable Cameras. In Proceedings of the 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), Kyoto, Japan, 9–15 November 2017; pp. 45–50.
13. Pereira, G.; Lins, R.D. PhotoDoc: A Toolbox for Processing Document Images Acquired Using Portable Digital Cameras. In Proceedings of the 2nd International Workshop on Camera-Based Document Analysis and Recognition (CBDAR), Curitiba, Brazil, 22 September 2007; pp. 107–115.
14. Liang, J.; Doermann, D.; Li, H. Camera-based analysis of text and documents: A survey. Int. J. Doc. Anal. Recognit. 2005, 7, 84–104.
15. Ntirogiannis, K.; Gatos, B.; Pratikakis, I. Performance evaluation methodology for historical document image binarization. IEEE Trans. Image Process. 2013, 22, 595–609.
16. Sokolova, M.; Lapalme, G. A systematic analysis of performance measures for classification tasks. Inf. Process. Manag. 2009, 45, 427–437.
17. Lu, H.; Kot, A.; Shi, Y. Distance-reciprocal distortion measure for binary document images. IEEE Signal Process. Lett. 2004, 11, 228–231.
18. Michalak, H.; Okarma, K. Improvement of Image Binarization Methods Using Image Preprocessing with Local Entropy Filtering for Alphanumerical Character Recognition Purposes. Entropy 2019, 21, 286.
19. Michalak, H.; Okarma, K. Fast Binarization of Unevenly Illuminated Document Images Based on Background Estimation for Optical Character Recognition Purposes. J. Univ. Comput. Sci. 2019, 25, 627–646.
20. Michalak, H.; Okarma, K. Adaptive Image Binarization Based on Multi-layered Stack of Regions. In Computer Analysis of Images and Patterns; Vento, M., Percannella, G., Eds.; Springer International Publishing: Cham, Switzerland, 2019; Volume 11679, pp. 281–293.
21. dos Anjos, A.; Shahbazkia, H.R. Bi-Level Image Thresholding-A Fast Method. In Proceedings of the 1st International Conference on Biomedical Electronics and Devices (BIOSIGNALS), Funchal, Madeira, Portugal, 28–31 January 2008; pp. 70–76.
22. Kittler, J.; Illingworth, J. Minimum error thresholding. Pattern Recognit. 1986, 19, 41–47.
23. Cho, S.; Haralick, R.; Yi, S. Improvement of Kittler and Illingworth’s minimum error thresholding. Pattern Recognit. 1989, 22, 609–617.
24. Otsu, N. A threshold selection method from gray-level histograms. IEEE Trans. Syst. Man Cybern. 1979, 9, 62–66.
25. Kapur, J.; Sahoo, P.; Wong, A. A new method for gray-level picture thresholding using the entropy of the histogram. Comput. Vis. Gr. Image Process. 1985, 29, 273–285.
26. Lech, P.; Okarma, K.; Wojnar, D. Binarization of document images using the modified local-global Otsu and Kapur algorithms. Przegląd Elektrotech. 2015, 91, 71–74.
27. Ridler, T.; Calvard, S. Picture Thresholding Using an Iterative Selection Method. IEEE Trans. Syst. Man Cybern. 1978, 8, 630–632.
28. Xue, J.H.; Zhang, Y.J. Ridler and Calvard’s, Kittler and Illingworth’s and Otsu’s methods for image thresholding. Pattern Recognit. Lett. 2012, 33, 793–797.
29. Rosin, P.L. Unimodal thresholding. Pattern Recognit. 2001, 34, 2083–2096.
30. Coudray, N.; Buessler, J.L.; Urban, J.P. Robust threshold estimation for images with unimodal histograms. Pattern Recognit. Lett. 2010, 31, 1010–1019.
31. Moghaddam, R.F.; Cheriet, M. AdOtsu: An adaptive and parameterless generalization of Otsu’s method for document image binarization. Pattern Recognit. 2012, 45, 2419–2431.
32. Chou, C.H.; Lin, W.H.; Chang, F. A binarization method with learning-built rules for document images produced by cameras. Pattern Recognit. 2010, 43, 1518–1530.
33. Xiong, W.; Xu, J.; Xiong, Z.; Wang, J.; Liu, M. Degraded historical document image binarization using local features and support vector machine (SVM). Optik 2018, 164, 218–223.
34. Michalak, H.; Okarma, K. Region based adaptive binarization for optical character recognition purposes. In Proceedings of the International Interdisciplinary PhD Workshop (IIPhDW), Świnoujście, Poland, 9–12 May 2018; pp. 361–366.
35. Michalak, H.; Okarma, K. Fast adaptive image binarization using the region based approach. In Artificial Intelligence and Algorithms in Intelligent Systems; Silhavy, R., Ed.; Springer: New York, NY, USA, 2019; Volume 764, pp. 79–90.
36. Pun, T. A new method for grey-level picture thresholding using the entropy of the histogram. Signal Process. 1980, 2, 223–237.
37. Pun, T. Entropic thresholding, a new approach. Comput. Gr. Image Process. 1981, 16, 210–239.
38. Tian, X.; Hou, X. A Tsallis-entropy image thresholding method based on two-dimensional histogram obique segmentation. In Proceedings of the 2009 WASE International Conference on Information Engineering, Taiyuan, Shanxi, China, 10–11 July 2009; Volume 1, pp. 164–168.
39. Le, T.H.N.; Bui, T.D.; Suen, C.Y. Ternary entropy-based binarization of degraded document images using morphological operators. In Proceedings of the 11th IAPR International Conference on Document Analysis and Recognition (ICDAR), Beijing, China, 18–21 September 2011; pp. 114–118.
40. Fan, J.; Wang, R.; Zhang, L.; Xing, D.; Gan, F. Image sequence segmentation based on 2D temporal entropic thresholding. Pattern Recognit. Lett. 1996, 17, 1101–1107.
41. Abutaleb, A.S. Automatic thresholding of gray-level pictures using two-dimensional entropy. Comput. Vis. Gr. Image Process. 1989, 47, 22–32.
42. Tang, K.; Yuan, X.; Sun, T.; Yang, J.; Gao, S. An improved scheme for minimum cross entropy threshold selection based on genetic algorithm. Knowl.-Based Syst. 2011, 24, 1131–1138.
43. Li, J.; Tang, W.; Wang, J.; Zhang, X. A multilevel color image thresholding scheme based on minimum cross entropy and alternating direction method of multipliers. Optik 2019, 183, 30–37.
44. Bernsen, J. Dynamic thresholding of grey-level images. In Proceedings of the 8th International Conference on Pattern Recognition (ICPR), Paris, France, 27–31 October 1986; pp. 1251–1255.
45. Yang, L.; Feng, Q. The Improvement of Bernsen Binarization Algorithm for QR Code Image. In Proceedings of the 5th International Conference on Cloud Computing and Intelligence Systems (CCIS), Nanjing, China, 23–25 November 2018; pp. 931–934.
46. Bradley, D.; Roth, G. Adaptive thresholding using the integral image. J. Gr. Tools 2007, 12, 13–21.
47. Shafait, F.; Keysers, D.; Breuel, T.M. Efficient implementation of local adaptive thresholding techniques using integral images. In Proceedings of the Document Recognition and Retrieval XV, San Jose, CA, USA, 27–31 January 2008; Volume 6815.
48. Khurshid, K.; Siddiqi, I.; Faure, C.; Vincent, N. Comparison of Niblack inspired binarization methods for ancient documents. In Document Recognition and Retrieval XVI; SPIE: Bellingham, WA, USA, 2009; Volume 7247, pp. 7247–7249.
49. Sauvola, J.; Pietikäinen, M. Adaptive document image binarization. Pattern Recognit. 2000, 33, 225–236.
50. Feng, M.L.; Tan, Y.P. Adaptive binarization method for document image analysis. In Proceedings of the 2004 IEEE International Conference on Multimedia and Expo (ICME), Taipei, Taiwan, 27–30 June 2004; Volume 1, pp. 339–342.
51. Lazzara, G.; Géraud, T. Efficient multiscale Sauvola’s binarization. Int. J. Doc. Anal. Recognit. 2014, 17, 105–123.
52. Gatos, B.; Pratikakis, I.; Perantonis, S. Adaptive degraded document image binarization. Pattern Recognit. 2006, 39, 317–327.
53. Singh, T.R.; Roy, S.; Singh, O.I.; Sinam, T.; Singh, K.M. A New Local Adaptive Thresholding Technique in Binarization. IJCSI Int. J. Comput. Sci. Issues 2011, 8, 271–277.
54. Hadjadj, Z.; Meziane, A.; Cherfa, Y.; Cheriet, M.; Setitra, I. ISauvola: Improved Sauvola’s Algorithm for Document Image Binarization. In Image Analysis and Recognition; Campilho, A., Karray, F., Eds.; Springer International Publishing: Cham, Switzerland, 2016; Volume 9730, pp. 737–745.
55. He, Y.; Yang, Y. An Improved Sauvola Approach on QR Code Image Binarization. In Proceedings of the 11th International Conference on Advanced Infocomm Technology (ICAIT), Jinan, China, 18–20 October 2019; pp. 6–10.
56. Azani Mustafa, W.; Kader, M.M.M.A. Binarization of Document Image Using Optimum Threshold Modification. J. Phys. Conf. Ser. 2018, 1019, 012022.
57. Kulyukin, V.; Kutiyanawala, A.; Zaman, T. Eyes-free barcode detection on smartphones with Niblack’s binarization and Support Vector Machines. In Proceedings of the 16th International Conference on Image Processing, Computer Vision, and Pattern Recognition (IPCV’2012), Las Vegas, NV, USA, 16–19 July 2012; Volume 1, pp. 284–290.
58. Samorodova, O.A.; Samorodov, A.V. Fast implementation of the Niblack binarization algorithm for microscope image segmentation. Pattern Recognit. Image Anal. 2016, 26, 548–551.
59. Bataineh, B.; Abdullah, S.N.H.S.; Omar, K. An adaptive local binarization method for document images based on a novel thresholding method and dynamic windows. Pattern Recognit. Lett. 2011, 32, 1805–1813.
60. Mysore, S.; Gupta, M.K.; Belhe, S. Complex and degraded color document image binarization. In Proceedings of the 3rd International Conference on Signal Processing and Integrated Networks (SPIN), Noida, India, 11–12 February 2016; pp. 157–162.
61. Leedham, G.; Yan, C.; Takru, K.; Tan, J.H.N.; Mian, L. Comparison of some thresholding algorithms for text/background segmentation in difficult document images. In Proceedings of the 7th International Conference on Document Analysis and Recognition (ICDAR), Edinburgh, UK, 6 August 2003; pp. 859–864.
62. Sezgin, M.; Sankur, B. Survey over image thresholding techniques and quantitative performance evaluation. J. Electron. Imaging 2004, 13, 146–165.
63. Shrivastava, A.; Srivastava, D.K. A review on pixel-based binarization of gray images. In ICICT 2015; Springer: Singapore, 2016; Volume 439, pp. 357–364.
64. Saxena, L.P. Niblack’s binarization method and its modifications to real-time applications: A review. Artif. Intell. Rev. 2017, 1–33.
65. Mustafa, W.A.; Kader, M.M.M.A. Binarization of document images: A comprehensive review. J. Phys. Conf. Series 2018, 1019, 012023.
66. Sulaiman, A.; Omar, K.; Nasrudin, M.F. Degraded historical document binarization: A review on issues, challenges, techniques, and future directions. J. Imaging 2019, 5, 48.
67. Su, B.; Lu, S.; Tan, C.L. Robust document image binarization technique for degraded document images. IEEE Trans. Image Process. 2013, 22, 1408–1417.
68. Bag, S.; Bhowmick, P. Adaptive-interpolative binarization with stroke preservation for restoration of faint characters in degraded documents. J. Vis. Commun. Image Represent. 2015, 31, 266–281.
69. Howe, N.R. A Laplacian energy for document binarization. In Proceedings of the 11th IAPR International Conference on Document Analysis and Recognition (ICDAR), Beijing, China, 18–21 September 2011; pp. 6–10.
70. Lu, S.; Su, B.; Tan, C.L. Document image binarization using background estimation and stroke edges. Int. J. Doc. Anal. Recognit. 2010, 13, 303–314.
71. Erol, B.; Antúnez, E.R.; Hull, J.J. HOTPAPER: Multimedia interaction with paper using mobile phones. In Proceedings of the 16th International Conference on Multimedia 2008, Vancouver, BC, Canada, 26–31 October 2008; pp. 399–408.
72. Okamoto, A.; Yoshida, H.; Tanaka, N. A binarization method for degraded document images with morphological operations. In Proceedings of the 13th IAPR International Conference on Machine Vision Applications (MVA), Kyoto, Japan, 20–23 May 2013; pp. 294–297.
73. Khitas, M.; Ziet, L.; Bouguezel, S. Improved degraded document image binarization using median filter for background estimation. Elektron. Ir Elektrotech. 2018, 24, 82–87.
74. Wen, J.; Li, S.; Sun, J. A new binarization method for non-uniform illuminated document images. Pattern Recognit. 2013, 46, 1670–1690.
75. Mitianoudis, N.; Papamarkos, N. Document image binarization using local features and Gaussian mixture modeling. Image Vis. Comput. 2015, 38, 33–51.
76. Chen, Y.; Wang, L. Broken and degraded document images binarization. Neurocomputing 2017, 237, 272–280.
77. Lelore, T.; Bouchara, F. Super-resolved binarization of text based on the FAIR algorithm. In Proceedings of the 11th IAPR International Conference on Document Analysis and Recognition (ICDAR), Beijing, China, 18–21 September 2011; pp. 839–843.
78. Yazid, H.; Arof, H. Gradient based adaptive thresholding. J. Vis. Commun. Image Represent. 2013, 24, 926–936.
79. Feng, S. A novel variational model for noise robust document image binarization. Neurocomputing 2019, 325, 288–302.
80. Almeida, M.; Lins, R.D.; Bernardino, R.; Jesus, D.; Lima, B. A New Binarization Algorithm for Historical Documents. J. Imaging 2018, 4, 27.
81. Tensmeyer, C.; Martinez, T. Document image binarization with fully convolutional neural networks. In Proceedings of the 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), Kyoto, Japan, 9–15 November 2017; pp. 99–104.
82. Vo, Q.N.; Kim, S.H.; Yang, H.J.; Lee, G. Binarization of degraded document images based on hierarchical deep supervised network. Pattern Recognit. 2018, 74, 568–586.
83. Bogiatzis, A.; Papadopoulos, B. Producing fuzzy inclusion and entropy measures and their application on global image thresholding. Evol. Syst. 2018, 9, 331–353.
84. Bogiatzis, A.; Papadopoulos, B. Global Image Thresholding Adaptive Neuro-Fuzzy Inference System Trained with Fuzzy Inclusion and Entropy Measures. Symmetry 2019, 11, 286.
85. Jia, F.; Shi, C.; He, K.; Wang, C.; Xiao, B. Document Image Binarization Using Structural Symmetry of Strokes. In Proceedings of the 15th International Conference on Frontiers in Handwriting Recognition (ICFHR), Shenzhen, China, 23–26 October 2016; pp. 411–416.
86. Jia, F.; Shi, C.; He, K.; Wang, C.; Xiao, B. Degraded document image binarization using structural symmetry of strokes. Pattern Recognit. 2018, 74, 225–240.
87. Lins, R.D.; Bernardino, R.B.; de Jesus, D.M. A Quality and Time Assessment of Binarization Algorithms. In Proceedings of the 15th IAPR International Conference on Document Analysis and Recognition (ICDAR), Sydney, Australia, 20–25 September 2019; pp. 1444–1450.
88. Yoon, Y.; Ban, K.D.; Yoon, H.; Lee, J.; Kim, J. Best combination of binarization methods for license plate character segmentation. ETRI J. 2013, 35, 491–500.
89. Smith, R. An Overview of the Tesseract OCR Engine. In Proceedings of the 9th International Conference on Document Analysis and Recognition (ICDAR), Parana, Brazil, 23–26 September 2007; Volume 2, pp. 629–633.
90. Deng, F.; Wu, Z.; Lu, Z.; Brown, M.S. Binarizationshop: A user assisted software suite for converting old documents to black-and-white. In Proceedings of the Annual Joint Conference on Digital Libraries, Gold Coast, Queensland, Australia, 21–25 June 2010; pp. 255–258.
91. Wellner, P.D. Adaptive Thresholding for the DigitalDesk; Technical Report EPC 1993-110; Rank Xerox Ltd.: Cambridge, UK, July 1993.
Figure 1. Three sample unevenly illuminated images from the dataset used in experiments. (a) with strongly illuminated bottom part; (b) with regular shadows; (c) with strongly illuminated right side.
Figure 2. Binarization results obtained for a sample unevenly illuminated image from the dataset used in the experiments shown in Figure 1a for various methods: (a) Otsu, (b) Niblack, (c) Sauvola, (d) Bradley (mean), (e) Bernsen, (f) Meanthresh, (g) NICK, (h) stack of regions (16 layers), and (i) proposed (#51).
Figure 3. Binarization results obtained for a sample unevenly illuminated image from the dataset used in the experiments shown in Figure 1b for various methods: (a) Otsu, (b) Niblack, (c) Sauvola, (d) Bradley (mean), (e) Bernsen, (f) Meanthresh, (g) NICK, (h) stack of regions (16 layers), and (i) proposed (#51).
Figure 4. Binarization results obtained for a sample unevenly illuminated image from the dataset used in experiments shown in Figure 1c for various methods: (a) Otsu, (b) Niblack, (c) Sauvola, (d) Bradley (mean), (e) Bernsen, (f) Meanthresh, (g) NICK, (h) stack of regions (16 layers), and (i) proposed (#51).
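Several of the local methods compared in Figures 2–4 set a separate threshold for each pixel from the mean m and standard deviation s of the grey levels in a window around it: Niblack uses T = m + k·s, and Sauvola uses T = m·(1 + k·(s/R − 1)). The NumPy sketch below computes both thresholds with integral images. The window size of 15 pixels, k = −0.2 for Niblack, and k = 0.5 with R = 128 for Sauvola are commonly quoted defaults assumed here for illustration, not the settings used in the experiments, and the function names are hypothetical.

```python
import numpy as np


def local_mean_std(img: np.ndarray, window: int = 15):
    """Local mean and standard deviation in a window x window neighbourhood.

    Uses integral images (summed-area tables), so the cost does not depend on
    the window size. `window` must be odd; borders are handled by reflection.
    """
    if window % 2 == 0:
        raise ValueError("window must be odd")
    img = img.astype(np.float64)
    r = window // 2
    padded = np.pad(img, r, mode="reflect")
    # Summed-area tables with a leading row and column of zeros.
    s1 = np.pad(padded.cumsum(0).cumsum(1), ((1, 0), (1, 0)))
    s2 = np.pad((padded ** 2).cumsum(0).cumsum(1), ((1, 0), (1, 0)))
    h, w = img.shape

    def box_sum(s: np.ndarray) -> np.ndarray:
        # Sum over the window centred on each original pixel.
        return (s[window:window + h, window:window + w]
                - s[:h, window:window + w]
                - s[window:window + h, :w]
                + s[:h, :w])

    n = window * window
    mean = box_sum(s1) / n
    var = np.maximum(box_sum(s2) / n - mean ** 2, 0.0)  # clamp rounding noise
    return mean, np.sqrt(var)


def niblack_text_mask(img, window=15, k=-0.2):
    """True where a pixel is darker than Niblack's threshold T = m + k*s."""
    m, s = local_mean_std(img, window)
    return img < m + k * s


def sauvola_text_mask(img, window=15, k=0.5, R=128.0):
    """True where a pixel is darker than Sauvola's T = m * (1 + k * (s/R - 1))."""
    m, s = local_mean_std(img, window)
    return img < m * (1.0 + k * (s / R - 1.0))
```

Near a dark stroke the local standard deviation grows and Sauvola's threshold settles between the stroke and paper grey levels, while in flat background regions s ≈ 0 and the threshold drops to (1 − k)·m, so the paper itself is not labelled as text even when its brightness drifts across the page.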
Table 1. Comparison of the average F-measure, Levenshtein distance, and Optical Character Recognition (OCR) accuracy values obtained for various binarization methods using the Tesseract OCR engine for 176 document images (three best results shown in bold format).
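| # | Binarization Method | F-Measure | Rank | Levenshtein Distance | Rank | Accuracy | Rank | Overall Rank |
|---|---|---|---|---|---|---|---|---|
| 1 | Otsu [24] | 0.6808 | 60 | 1469.88 | 60 | 0.5179 | 60 | 60 |
| 2 | Chou [32] | 0.8032 | 57 | 944.68 | 58 | 0.6575 | 57 | 57 |
| 3 | Kittler [22] | 0.6173 | 61 | 1889.86 | 61 | 0.3911 | 61 | 61 |
| 4 | Niblack [8] | 0.8838 | 48 | 243.39 | 47 | 0.7906 | 48 | 48 |
| 5 | Sauvola [49] | 0.9428 | 27 | 96.79 | 35 | 0.8955 | 27 | 28 |
| 6 | Wolf [9] | 0.9342 | 33 | 142.43 | 41 | 0.8800 | 30 | 36 |
| 7 | Bradley (mean) [46] | 0.9019 | 43 | 245.98 | 48 | 0.8217 | 43 | 45 |
| 8 | Bradley (Gaussian) [46] | 0.8490 | 51 | 557.98 | 54 | 0.7319 | 52 | 51 |
| 9 | Feng [50] | 0.7438 | 59 | 950.16 | 59 | 0.5908 | 59 | 59 |
| 10 | Bernsen [44] | 0.7673 | 58 | 724.68 | 57 | 0.6104 | 58 | 58 |
| 11 | Meanthresh | 0.8203 | 55 | 464.19 | 52 | 0.6885 | 55 | 54 |
| 12 | NICK [48] | 0.9551 | 24 | 43.20 | 25 | 0.9144 | 25 | 25 |
| 13 | Wellner [91] | 0.9134 | 40 | 275.10 | 50 | 0.8450 | 40 | 42 |
| 14 | Region (1 layer) [20] | 0.8858 | 46 | 174.98 | 42 | 0.7956 | 45 | 44 |
| 15 | Region (2 layers) [20] | 0.9236 | 38 | 105.19 | 36 | 0.8588 | 38 | 38 |
| 16 | Region (4 layers) [20] | 0.9344 | 31 | 92.14 | 31 | 0.8774 | 32 | 30 |
| 17 | Region (6 layers) [20] | 0.9359 | 30 | 93.24 | 32 | 0.8798 | 31 | 29 |
| 18 | Region (8 layers) [20] | 0.9341 | 34 | 88.88 | 29 | 0.8769 | 35 | 33 |
| 19 | Region (12 layers) [20] | 0.9343 | 32 | 93.33 | 33 | 0.8771 | 34 | 34 |
| 20 | Region (16 layers) [20] | 0.9339 | 35 | 90.65 | 30 | 0.8767 | 36 | 35 |
| 21 | Region (16 layers + MC) [20] | 0.9079 | 42 | 117.16 | 39 | 0.8315 | 42 | 41 |
| 22 | Resampling [19] | 0.9557 | 22 | 37.13 | 24 | 0.9156 | 23 | 23 |
| 23 | Entropy + Otsu [18] | 0.8418 | 53 | 618.51 | 56 | 0.7291 | 55 | 54 |
| 24 | Entropy + Niblack [18] | 0.8086 | 56 | 491.88 | 53 | 0.6758 | 56 | 56 |
| 25 | Entropy + Bradley(Mean) [18] | 0.9115 | 41 | 94.08 | 34 | 0.8405 | 41 | 39 |
| 26 | Entropy + Bradley(Gauss) [18] | 0.8908 | 44 | 188.71 | 43 | 0.8057 | 44 | 43 |
| 27 | Entropy + Meanthresh [18] | 0.9404 | 16 | 46.93 | 14 | 0.8899 | 17 | 15 |
| 28 | SSP [85,86] | 0.9402 | 28 | 111.99 | 27 | 0.8915 | 29 | 27 |
| 29 | Gatos [52] | 0.6808 | 49 | 1469.88 | 49 | 0.5179 | 49 | 50 |
| 30 | Su [67] | 0.9332 | 36 | 62.21 | 28 | 0.9772 | 33 | 32 |
| 31 | Singh [53] | 0.8945 | 25 | 245.57 | 23 | 0.8046 | 24 | 24 |
| 32 | Bataineh [59] | 0.3905 | 52 | 2578.68 | 51 | 0.1860 | 54 | 51 |
| 33 | WAN [56] | 0.9504 | 50 | 45.39 | 44 | 0.9080 | 50 | 49 |
| 34 | ISauvola [54] | 0.9459 | 26 | 80.53 | 26 | 0.8955 | 26 | 26 |
| 35 | OR (#20,#22,#23) | 0.9294 | 37 | 110.91 | 37 | 0.8698 | 37 | 37 |
| 36 | AND (#20,#22,#23) | 0.8408 | 54 | 615.75 | 55 | 0.7337 | 51 | 53 |
| 37 | Voting (#20,#22,#23) | 0.9576 | 18 | 30.44 | 17 | 0.9192 | 18 | 19 |
| 38 | Voting (#5,#12,#22) | 0.9585 | 16 | 31.35 | 22 | 0.9207 | 16 | 20 |
| 39 | Voting (#4,#7,#11) | 0.8863 | 45 | 236.19 | 45 | 0.7950 | 46 | 46 |
| 40 | Voting (#4,#11,#22) | 0.8844 | 47 | 238.95 | 46 | 0.7915 | 47 | 47 |
| 41 | Voting (#7,#20,#23) | 0.9206 | 39 | 141.19 | 40 | 0.8544 | 39 | 40 |
| 42 | Voting (#7,#12,#20,#22,#23) | 0.9568 | 20 | 29.05 | 12 | 0.9177 | 20 | 18 |
| 43 | Voting (#12,#20,#23) | 0.9617 | 8 | 26.82 | 8 | 0.9263 | 8 | 7 |
| 44 | Voting (#12,#22,#27) | 0.9586 | 15 | 30.88 | 19 | 0.9208 | 15 | 17 |
| 45 | Voting (#12,#18,#20,#22,#27) | 0.9576 | 19 | 31.04 | 21 | 0.9188 | 19 | 21 |
| 46 | Voting (#5,#6,#12,#18,#20,#22,#27) | 0.9617 | 7 | 27.11 | 9 | 0.9264 | 7 | 6 |
| 47 | Voting (#16,#22,#23) | 0.9556 | 23 | 30.93 | 20 | 0.9156 | 22 | 22 |
| 48 | Voting (#12,#16,#23) | 0.9605 | 10 | 27.95 | 10 | 0.9243 | 10 | 11 |
| 49 | Voting (#7,#12,#16,#22,#23) | 0.9580 | 17 | 29.52 | 13 | 0.9200 | 17 | 16 |
| 50 | Voting (#20, #23, #34) | 0.9630 | 3 | 26.39 | 7 | 0.9289 | 3 | 3 |
| 51 | Voting (#20, #27, #31) | 0.9602 | 12 | 23.31 | 5 | 0.9289 | 3 | 4 |
| 52 | Voting (#20, #23, #28) | 0.9623 | 6 | 25.52 | 6 | 0.9238 | 12 | 7 |
| 53 | Voting (#22, #27, #34) | 0.9597 | 13 | 22.43 | 3 | 0.9277 | 6 | 5 |
| 54 | Voting (#22, #23, #31) | 0.9560 | 21 | 28.60 | 11 | 0.9229 | 14 | 15 |
| 55 | Voting (#22, #27, #28) | 0.9630 | 4 | 23.11 | 4 | 0.9168 | 21 | 10 |
| 56 | Voting (#28, #31, #34) | 0.9597 | 14 | 30.82 | 18 | 0.9232 | 13 | 14 |
| 57 | Voting (#20, #31, #34) | 0.9603 | 11 | 30.44 | 16 | 0.9243 | 11 | 13 |
| 58 | Voting (#12, #28, #34) | 0.9660 | 1 | 20.44 | 2 | 0.9346 | 1 | 1 |
| 59 | Voting (#22, #31, #34) | 0.9611 | 9 | 30.02 | 15 | 0.9258 | 9 | 12 |
| 60 | Voting (#4, #7, #28, #31, #34) | 0.9626 | 5 | 29.88 | 14 | 0.9285 | 5 | 7 |
| 61 | Voting (#12, #20, #22, #23, #28, #31, #34) | 0.9653 | 2 | 18.51 | 1 | 0.9333 | 2 | 2 |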
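Rows #35–#61 combine the outputs of the methods whose row numbers are given in parentheses. The sketch below shows pixel-wise OR, AND and majority voting, assuming every binarization result is a Boolean NumPy array of the same shape in which True marks a text pixel; that polarity and the function names are assumptions for illustration, not details taken from the article (with the opposite polarity, OR and AND would swap roles). Every voting variant in the tables combines an odd number of methods (three, five or seven), so a strict majority always exists and no tie-breaking rule is needed.

```python
from typing import Sequence

import numpy as np


def combine_or(masks: Sequence[np.ndarray]) -> np.ndarray:
    """Text wherever at least one method marks text."""
    return np.logical_or.reduce(np.stack(masks))


def combine_and(masks: Sequence[np.ndarray]) -> np.ndarray:
    """Text only where every method marks text."""
    return np.logical_and.reduce(np.stack(masks))


def combine_vote(masks: Sequence[np.ndarray]) -> np.ndarray:
    """Text where more than half of the methods mark text (majority voting)."""
    votes = np.stack(masks).astype(np.int32).sum(axis=0)
    return 2 * votes > len(masks)
```

For example, row #37, Voting (#20,#22,#23), would correspond to `combine_vote` applied to the outputs of the 16-layer stack of regions (#20), resampling (#22) and Entropy + Otsu (#23).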
Table 2. Comparison of the average F-measure, Levenshtein distance, and OCR accuracy values obtained for various binarization methods using MATLAB’s built-in OCR engine for 176 document images (three best results shown in bold format).
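| # | Binarization Method | F-Measure | Rank | Levenshtein Distance | Rank | Accuracy | Rank | Overall Rank |
|---|---|---|---|---|---|---|---|---|
| 1 | Otsu [24] | 0.6306 | 60 | 1618.53 | 60 | 0.4368 | 60 | 60 |
| 2 | Chou [32] | 0.7351 | 54 | 1097.47 | 57 | 0.5495 | 55 | 56 |
| 3 | Kittler [22] | 0.5799 | 61 | 2027.23 | 61 | 0.3234 | 61 | 61 |
| 4 | Niblack [8] | 0.7395 | 52 | 455.53 | 45 | 0.5787 | 51 | 50 |
| 5 | Sauvola [49] | 0.8672 | 27 | 267.34 | 34 | 0.7655 | 28 | 29 |
| 6 | Wolf [9] | 0.8512 | 30 | 312.68 | 39 | 0.7433 | 30 | 31 |
| 7 | Bradley (mean) [46] | 0.8008 | 41 | 549.89 | 49 | 0.6554 | 41 | 43 |
| 8 | Bradley (Gaussian) [46] | 0.7554 | 47 | 856.21 | 55 | 0.5819 | 50 | 51 |
| 9 | Feng [50] | 0.6607 | 58 | 1041.55 | 56 | 0.4683 | 57 | 57 |
| 10 | Bernsen [44] | 0.6640 | 57 | 1194.11 | 58 | 0.4533 | 59 | 58 |
| 11 | Meanthresh | 0.7039 | 56 | 663.48 | 51 | 0.5212 | 56 | 55 |
| 12 | NICK [48] | 0.8589 | 29 | 208.87 | 26 | 0.7593 | 29 | 28 |
| 13 | Wellner [91] | 0.8268 | 32 | 470.24 | 46 | 0.6985 | 32 | 38 |
| 14 | Region (1 layer) [20] | 0.7136 | 55 | 455.26 | 44 | 0.5515 | 54 | 52 |
| 15 | Region (2 layers) [20] | 0.7852 | 43 | 286.34 | 37 | 0.6499 | 42 | 41 |
| 16 | Region (4 layers) [20] | 0.8059 | 38 | 249.19 | 33 | 0.6790 | 38 | 37 |
| 17 | Region (6 layers) [20] | 0.8095 | 37 | 249.01 | 32 | 0.6841 | 37 | 35 |
| 18 | Region (8 layers) [20] | 0.8151 | 35 | 239.34 | 29 | 0.6919 | 35 | 31 |
| 19 | Region (12 layers) [20] | 0.8141 | 36 | 241.31 | 30 | 0.6908 | 36 | 33 |
| 20 | Region (16 layers) [20] | 0.8160 | 34 | 242.50 | 31 | 0.6932 | 33 | 30 |
| 21 | Region (16 layers + MC) [20] | 0.7819 | 44 | 305.28 | 38 | 0.6429 | 43 | 42 |
| 22 | Resampling [19] | 0.8655 | 28 | 159.55 | 18 | 0.7677 | 27 | 26 |
| 23 | Entropy + Otsu [18] | 0.7734 | 46 | 786.19 | 54 | 0.6161 | 46 | 49 |
| 24 | Entropy + Niblack [18] | 0.6372 | 59 | 1211.35 | 59 | 0.4600 | 58 | 59 |
| 25 | Entropy + Bradley(Mean) [18] | 0.8212 | 33 | 363.03 | 41 | 0.6929 | 34 | 36 |
| 26 | Entropy + Bradley(Gauss) [18] | 0.7869 | 42 | 525.62 | 48 | 0.6398 | 44 | 44 |
| 27 | Entropy + Meanthresh [18] | 0.8790 | 21 | 149.66 | 15 | 0.7879 | 22 | 21 |
| 28 | SSP [85,86] | 0.8766 | 26 | 235.88 | 28 | 0.7802 | 26 | 27 |
| 29 | Gatos [52] | 0.7544 | 48 | 477.35 | 47 | 0.5936 | 47 | 46 |
| 30 | Su [67] | 0.8053 | 39 | 283.09 | 35 | 0.6763 | 39 | 39 |
| 31 | Singh [53] | 0.8779 | 23 | 185.99 | 24 | 0.7822 | 25 | 25 |
| 32 | Bataineh [59] | 0.8779 | 53 | 185.99 | 50 | 0.7822 | 53 | 54 |
| 33 | WAN [56] | 0.7461 | 50 | 742.53 | 52 | 0.5757 | 52 | 53 |
| 34 | ISauvola [54] | 0.8835 | 14 | 216.48 | 27 | 0.7890 | 20 | 23 |
| 35 | OR (#20,#22,#23) | 0.8049 | 40 | 285.87 | 36 | 0.6759 | 40 | 40 |
| 36 | AND (#20,#22,#23) | 0.7787 | 45 | 765.26 | 53 | 0.6269 | 45 | 47 |
| 37 | Voting (#20,#22,#23) | 0.8799 | 18 | 136.13 | 9 | 0.7899 | 18 | 14 |
| 38 | Voting (#5,#12,#22) | 0.8767 | 25 | 150.70 | 16 | 0.7854 | 24 | 24 |
| 39 | Voting (#4,#7,#11) | 0.7467 | 49 | 437.98 | 42 | 0.5887 | 48 | 45 |
| 40 | Voting (#4,#11,#22) | 0.7442 | 51 | 442.27 | 43 | 0.5840 | 49 | 47 |
| 41 | Voting (#7,#20,#23) | 0.8310 | 31 | 359.44 | 40 | 0.7067 | 31 | 33 |
| 42 | Voting (#7,#12,#20,#22,#23) | 0.8847 | 13 | 134.78 | 8 | 0.7977 | 11 | 8 |
| 43 | Voting (#12,#20,#23) | 0.8810 | 17 | 138.27 | 11 | 0.7924 | 15 | 12 |
| 44 | Voting (#12,#22,#27) | 0.8792 | 20 | 145.94 | 13 | 0.7888 | 21 | 20 |
| 45 | Voting (#12,#18,#20,#22,#27) | 0.8788 | 22 | 130.88 | 7 | 0.7892 | 19 | 18 |
| 46 | Voting (#5,#6,#12,#18,#20,#22,#27) | 0.8900 | 8 | 124.80 | 4 | 0.8064 | 7 | 5 |
| 47 | Voting (#16,#22,#23) | 0.8798 | 19 | 138.04 | 10 | 0.7902 | 17 | 16 |
| 48 | Voting (#12,#16,#23) | 0.8778 | 24 | 139.63 | 12 | 0.7868 | 23 | 22 |
| 49 | Voting (#7,#12,#16,#22,#23) | 0.8835 | 15 | 129.94 | 6 | 0.7953 | 13 | 10 |
| 50 | Voting (#20, #23, #34) | 0.8966 | 6 | 164.73 | 19 | 0.8112 | 6 | 7 |
| 51 | Voting (#20, #27, #31) | 0.8993 | 2 | 118.30 | 3 | 0.8185 | 2 | 1 |
| 52 | Voting (#20, #23, #28) | 0.8882 | 10 | 148.10 | 14 | 0.8002 | 9 | 9 |
| 53 | Voting (#22, #27, #34) | 0.8966 | 5 | 116.29 | 2 | 0.8141 | 5 | 4 |
| 54 | Voting (#22, #23, #31) | 0.8825 | 16 | 173.11 | 20 | 0.7905 | 16 | 19 |
| 55 | Voting (#22, #27, #28) | 0.8983 | 3 | 114.48 | 1 | 0.8178 | 3 | 1 |
| 56 | Voting (#28, #31, #34) | 0.8894 | 9 | 189.82 | 25 | 0.7991 | 10 | 13 |
| 57 | Voting (#20, #31, #34) | 0.8877 | 11 | 182.86 | 22 | 0.7971 | 12 | 14 |
| 58 | Voting (#12, #28, #34) | 0.8982 | 4 | 153.56 | 17 | 0.8163 | 4 | 6 |
| 59 | Voting (#22, #31, #34) | 0.8852 | 12 | 181.83 | 21 | 0.7932 | 14 | 17 |
| 60 | Voting (#4, #7, #28, #31, #34) | 0.8916 | 7 | 185.45 | 23 | 0.8025 | 8 | 11 |
| 61 | Voting (#12, #20, #22, #23, #28, #31, #34) | 0.9014 | 1 | 129.62 | 5 | 0.8209 | 1 | 1 |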
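The Levenshtein distance columns count the minimum number of single-character insertions, deletions and substitutions that turn one string into another, so lower values are better (rank 1 goes to the smallest distance). Presumably it is measured between the OCR output and the ground-truth text of each image and then averaged over the 176 images. A standard dynamic-programming sketch follows; the function name is illustrative, and any text normalization applied in the experiments (case, whitespace, line breaks) is not reproduced.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    if len(a) < len(b):
        a, b = b, a  # the distance is symmetric; keep the rolling row short
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution or match
        prev = curr
    return prev[-1]
```

Per-image distances can then be averaged, e.g. with `statistics.mean`, to obtain a dataset-level figure of the kind reported in the tables.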
Table 3. Comparison of the average F-measure, Levenshtein distance, and OCR accuracy values obtained for various binarization methods using GNU Ocrad for 176 document images (three best results shown in bold format).
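| # | Binarization Method | F-Measure | Rank | Levenshtein Distance | Rank | Accuracy | Rank | Overall Rank |
|---|---|---|---|---|---|---|---|---|
| 1 | Otsu [24] | 0.5622 | 60 | 2414.45 | 59 | 0.2231 | 60 | 59 |
| 2 | Chou [32] | 0.6013 | 56 | 1884.73 | 54 | 0.3316 | 56 | 56 |
| 3 | Kittler [22] | 0.5641 | 59 | 2487.22 | 60 | 0.2019 | 61 | 60 |
| 4 | Niblack [8] | 0.6639 | 43 | 953.84 | 36 | 0.4531 | 44 | 42 |
| 5 | Sauvola [49] | 0.7001 | 35 | 1136.26 | 44 | 0.4938 | 36 | 39 |
| 6 | Wolf [9] | 0.7068 | 32 | 1009.59 | 40 | 0.5083 | 34 | 36 |
| 7 | Bradley (mean) [46] | 0.6074 | 53 | 1786.23 | 52 | 0.3633 | 54 | 53 |
| 8 | Bradley (Gaussian) [46] | 0.5745 | 57 | 2151.78 | 58 | 0.2909 | 58 | 58 |
| 9 | Feng [50] | 0.6050 | 54 | 1943.19 | 55 | 0.3894 | 52 | 54 |
| 10 | Bernsen [44] | 0.5020 | 61 | 2969.76 | 61 | 0.2263 | 59 | 61 |
| 11 | Meanthresh | 0.6576 | 46 | 1075.15 | 42 | 0.4366 | 47 | 44 |
| 12 | NICK [48] | 0.7226 | 22 | 872.76 | 28 | 0.5362 | 21 | 21 |
| 13 | Wellner [91] | 0.6796 | 38 | 1214.53 | 46 | 0.4638 | 40 | 43 |
| 14 | Region (1 layer) [20] | 0.6183 | 51 | 1057.44 | 41 | 0.4038 | 50 | 47 |
| 15 | Region (2 layers) [20] | 0.6749 | 39 | 800.10 | 24 | 0.4780 | 38 | 34 |
| 16 | Region (4 layers) [20] | 0.6995 | 36 | 735.78 | 22 | 0.5098 | 33 | 30 |
| 17 | Region (6 layers) [20] | 0.7069 | 31 | 727.24 | 21 | 0.5198 | 28 | 25 |
| 18 | Region (8 layers) [20] | 0.7079 | 30 | 721.76 | 19 | 0.5210 | 27 | 24 |
| 19 | Region (12 layers) [20] | 0.7065 | 33 | 724.13 | 20 | 0.5195 | 29 | 28 |
| 20 | Region (16 layers) [20] | 0.7094 | 29 | 720.84 | 18 | 0.5237 | 24 | 21 |
| 21 | Region (16 layers + MC) [20] | 0.6661 | 41 | 824.98 | 26 | 0.4641 | 39 | 36 |
| 22 | Resampling [19] | 0.7331 | 18 | 690.13 | 16 | 0.5556 | 17 | 17 |
| 23 | Entropy + Otsu [18] | 0.6422 | 49 | 1420.09 | 49 | 0.4247 | 49 | 50 |
| 24 | Entropy + Niblack [18] | 0.6615 | 45 | 1962.02 | 56 | 0.4586 | 41 | 47 |
| 25 | Entropy + Bradley(Mean) [18] | 0.6247 | 50 | 1608.03 | 50 | 0.3917 | 51 | 51 |
| 26 | Entropy + Bradley(Gauss) [18] | 0.6142 | 52 | 1699.44 | 51 | 0.3740 | 53 | 52 |
| 27 | Entropy + Meanthresh [18] | 0.7558 | 8 | 573.63 | 9 | 0.5889 | 8 | 8 |
| 28 | SSP [85,86] | 0.7171 | 25 | 884.47 | 31 | 0.5225 | 25 | 27 |
| 29 | Gatos [52] | 0.6559 | 47 | 1178.52 | 45 | 0.4373 | 46 | 46 |
| 30 | Su [67] | 0.7020 | 34 | 814.40 | 25 | 0.5122 | 32 | 30 |
| 31 | Singh [53] | 0.7180 | 24 | 644.81 | 35 | 0.5219 | 26 | 29 |
| 32 | Bataineh [59] | 0.6041 | 55 | 1869.63 | 53 | 0.3609 | 55 | 55 |
| 33 | WAN [56] | 0.5695 | 58 | 2103.19 | 57 | 0.3109 | 57 | 57 |
| 34 | ISauvola [54] | 0.7109 | 28 | 1089.10 | 43 | 0.5068 | 35 | 36 |
| 35 | OR (#20,#22,#23) | 0.6879 | 37 | 883.67 | 30 | 0.4897 | 37 | 35 |
| 36 | AND (#20,#22,#23) | 0.6493 | 48 | 1390.91 | 48 | 0.4342 | 48 | 49 |
| 37 | Voting (#20,#22,#23) | 0.7565 | 7 | 558.05 | 6 | 0.5910 | 6 | 6 |
| 38 | Voting (#5,#12,#22) | 0.7422 | 14 | 675.09 | 14 | 0.5683 | 14 | 14 |
| 39 | Voting (#4,#7,#11) | 0.6665 | 40 | 937.72 | 33 | 0.4577 | 42 | 39 |
| 40 | Voting (#4,#11,#22) | 0.6648 | 42 | 965.40 | 37 | 0.4540 | 43 | 41 |
| 41 | Voting (#7,#20,#23) | 0.6636 | 44 | 1262.76 | 47 | 0.4492 | 45 | 45 |
| 42 | Voting (#7,#12,#20,#22,#23) | 0.7615 | 4 | 552.84 | 5 | 0.5980 | 4 | 4 |
| 43 | Voting (#12,#20,#23) | 0.7551 | 9 | 588.52 | 11 | 0.5883 | 9 | 10 |
| 44 | Voting (#12,#22,#27) | 0.7419 | 15 | 673.88 | 13 | 0.5679 | 15 | 15 |
| 45 | Voting (#12,#18,#20,#22,#27) | 0.7520 | 11 | 584.39 | 10 | 0.5846 | 11 | 11 |
| 46 | Voting (#5,#6,#12,#18,#20,#22,#27) | 0.7608 | 5 | 552.19 | 4 | 0.5968 | 5 | 5 |
| 47 | Voting (#16,#22,#23) | 0.7529 | 10 | 564.89 | 8 | 0.5853 | 10 | 9 |
| 48 | Voting (#12,#16,#23) | 0.7494 | 13 | 595.03 | 12 | 0.5799 | 12 | 12 |
| 49 | Voting (#7,#12,#16,#22,#23) | 0.7567 | 6 | 559.50 | 7 | 0.5906 | 7 | 7 |
| 50 | Voting (#20, #23, #34) | 0.7316 | 19 | 875.44 | 29 | 0.5412 | 19 | 19 |
| 51 | Voting (#20, #27, #31) | 0.7673 | 2 | 530.68 | 1 | 0.6050 | 2 | 2 |
| 52 | Voting (#20, #23, #28) | 0.7394 | 16 | 715.07 | 17 | 0.5587 | 16 | 16 |
| 53 | Voting (#22, #27, #34) | 0.7679 | 1 | 531.49 | 2 | 0.6061 | 1 | 1 |
| 54 | Voting (#22, #23, #31) | 0.7273 | 20 | 841.39 | 27 | 0.5386 | 20 | 19 |
| 55 | Voting (#22, #27, #28) | 0.7661 | 3 | 537.62 | 3 | 0.6037 | 3 | 3 |
| 56 | Voting (#28, #31, #34) | 0.7159 | 27 | 978.32 | 39 | 0.5174 | 31 | 33 |
| 57 | Voting (#20, #31, #34) | 0.7235 | 21 | 935.28 | 32 | 0.5287 | 22 | 23 |
| 58 | Voting (#12, #28, #34) | 0.7351 | 17 | 784.12 | 23 | 0.5504 | 18 | 18 |
| 59 | Voting (#22, #31, #34) | 0.7218 | 23 | 937.92 | 34 | 0.5268 | 23 | 25 |
| 60 | Voting (#4, #7, #28, #31, #34) | 0.7170 | 26 | 973.15 | 38 | 0.5189 | 30 | 32 |
| 61 | Voting (#12, #20, #22, #23, #28, #31, #34) | 0.7512 | 12 | 676.03 | 15 | 0.5761 | 13 | 13 |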
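The Rank columns order the 61 methods separately for each measure, with rank 1 for the highest F-measure or accuracy and for the lowest Levenshtein distance. Tied values appear to share a rank and leave a gap: in Table 1, methods #50 and #51 both have accuracy 0.9289 and rank 3, and the next method (#60) is ranked 5. This is standard competition ranking, sketched below for illustration; the function name is hypothetical, and how the Overall Rank column combines the three per-measure ranks is not reproduced here.

```python
from typing import List, Sequence


def competition_rank(values: Sequence[float], higher_is_better: bool = True) -> List[int]:
    """Standard competition ranking ("1224"): tied values share the best rank,
    and the following rank is skipped by the number of tied entries."""
    ranks = []
    for v in values:
        if higher_is_better:
            better = sum(1 for u in values if u > v)
        else:
            better = sum(1 for u in values if u < v)
        ranks.append(better + 1)
    return ranks
```

Applied to the five best Tesseract accuracies in Table 1 (#58, #61, #50, #51, #60), `competition_rank([0.9346, 0.9333, 0.9289, 0.9289, 0.9285])` gives `[1, 2, 3, 3, 5]`, matching the published ranks; Levenshtein distances would be ranked with `higher_is_better=False`.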
Table 4. Comparison of the overall rank scores for 3 OCR engines and average computational time relative to Otsu’s method obtained for 176 document images.
| # | Binarization Method | Final Aggregated Rank | Computation Time (Relative) |
|---|---|---|---|
| 1 | Otsu [24] | 60 | 1.00 |
| 2 | Chou [32] | 57 | 5.74 |
| 3 | Kittler [22] | 61 | 23.30 |
| 4 | Niblack [8] | 46 | 75.11 |
| 5 | Sauvola [49] | 33 | 73.73 |
| 6 | Wolf [9] | 36 | 76.36 |
| 7 | Bradley (mean) [46] | 47 | 19.62 |
| 8 | Bradley (Gaussian) [46] | 54 | 241.61 |
| 9 | Feng [50] | 58 | 215.20 |
| 10 | Bernsen [44] | 59 | 197.14 |
| 11 | Meanthresh | 51 | 39.93 |
| 12 | NICK [48] | 25 | 70.81 |
| 13 | Wellner [91] | 41 | 187.90 |
| 14 | Region (1 layer) [20] | 49 | 29.84 |
| 15 | Region (2 layers) [20] | 38 | 50.23 |
| 16 | Region (4 layers) [20] | 34 | 92.39 |
| 17 | Region (6 layers) [20] | 31 | 145.49 |
| 18 | Region (8 layers) [20] | 30 | 211.87 |
| 19 | Region (12 layers) [20] | 32 | 325.05 |
| 20 | Region (16 layers) [20] | 29 | 441.84 |
| 21 | Region (16 layers + MC) [20] | 40 | 1232.01 |
| 22 | Resampling [19] | 24 | 12.48 |
| 23 | Entropy + Otsu [18] | 51 | 664.87 |
| 24 | Entropy + Niblack [18] | 56 | 755.11 |
| 25 | Entropy + Bradley(Mean) [18] | 42 | 706.57 |
| 26 | Entropy + Bradley(Gauss) [18] | 45 | 932.92 |
| 27 | Entropy + Meanthresh [18] | 21 | 736.67 |
| 28 | SSP [85,86] | 27 | 4542.24 |
| 29 | Gatos [52] | 48 | 2413.68 |
| 30 | Su [67] | 35 | 6016.56 |
| 31 | Singh [53] | 26 | 59.78 |
| 32 | Bataineh [59] | 54 | 44.58 |
| 33 | WAN [56] | 53 | 400.98 |
| 34 | ISauvola [54] | 27 | 113.69 |
| 35 | OR (#20,#22,#23) | 37 | 1138.64 |
| 36 | AND (#20,#22,#23) | 50 | 1134.25 |
| 37 | Voting (#20,#22,#23) | 12 | 1136.87 |
| 38 | Voting (#5,#12,#22) | 22 | 159.17 |
| 39 | Voting (#4,#7,#11) | 43 | 137.30 |
| 40 | Voting (#4,#11,#22) | 44 | 130.64 |
| 41 | Voting (#7,#20,#23) | 39 | 1143.63 |
| 42 | Voting (#7,#12,#20,#22,#23) | 9 | 1224.28 |
| 43 | Voting (#12,#20,#23) | 7 | 1191.77 |
| 44 | Voting (#12,#22,#27) | 18 | 817.40 |
| 45 | Voting (#12,#18,#20,#22,#27) | 15 | 1455.67 |
| 46 | Voting (#5,#6,#12,#18,#20,#22,#27) | 4 | 1600.90 |
| 47 | Voting (#16,#22,#23) | 14 | 793.58 |
| 48 | Voting (#12,#16,#23) | 13 | 858.17 |
| 49 | Voting (#7,#12,#16,#22,#23) | 11 | 892.77 |
| 50 | Voting (#20, #23, #34) | 7 | 1249.60 |
| 51 | Voting (#20, #27, #31) | 1 | 1247.58 |
| 52 | Voting (#20, #23, #28) | 10 | 5662.15 |
| 53 | Voting (#22, #27, #34) | 2 | 887.61 |
| 54 | Voting (#22, #23, #31) | 19 | 801.04 |
| 55 | Voting (#22, #27, #28) | 3 | 5286.12 |
| 56 | Voting (#28, #31, #34) | 23 | 4584.69 |
| 57 | Voting (#20, #31, #34) | 15 | 745.37 |
| 58 | Voting (#12, #28, #34) | 6 | 4572.60 |
| 59 | Voting (#22, #31, #34) | 20 | 190.31 |
| 60 | Voting (#4, #7, #28, #31, #34) | 15 | 4656.10 |
| 61 | Voting (#12, #20, #22, #23, #28, #31, #34) | 4 | 5880.53 |
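Table 4 expresses each method's average computation time relative to Otsu's method (Otsu = 1.00). One way to obtain such figures is to time every method on the same images and divide its mean time by Otsu's mean time; the sketch below does this with `time.perf_counter`. Whether the article divides mean times or averages per-image ratios is not stated in this section, so the ratio-of-means choice, like the function names, is an assumption.

```python
import time
from statistics import mean
from typing import Callable, Dict, Iterable, List


def time_methods(methods: Dict[str, Callable], images: Iterable) -> Dict[str, List[float]]:
    """Wall-clock time in seconds of each method on each image."""
    images = list(images)
    timings: Dict[str, List[float]] = {name: [] for name in methods}
    for img in images:
        # Interleave methods per image so slow drifts in machine load
        # are spread across all methods rather than hitting the last one.
        for name, fn in methods.items():
            start = time.perf_counter()
            fn(img)
            timings[name].append(time.perf_counter() - start)
    return timings


def relative_times(timings: Dict[str, List[float]], reference: str = "Otsu") -> Dict[str, float]:
    """Mean time of each method divided by the mean time of the reference method."""
    ref = mean(timings[reference])
    return {name: mean(t) / ref for name, t in timings.items()}
```

With hypothetical timings `{"Otsu": [1.0, 3.0], "Sauvola": [70.0, 80.0]}`, `relative_times` returns 1.0 for Otsu and 37.5 for Sauvola.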

Share and Cite

MDPI and ACS Style

Michalak, H.; Okarma, K. Robust Combined Binarization Method of Non-Uniformly Illuminated Document Images for Alphanumerical Character Recognition. Sensors 2020, 20, 2914. https://doi.org/10.3390/s20102914

AMA Style

Michalak H, Okarma K. Robust Combined Binarization Method of Non-Uniformly Illuminated Document Images for Alphanumerical Character Recognition. Sensors. 2020; 20(10):2914. https://doi.org/10.3390/s20102914

Chicago/Turabian Style

Michalak, Hubert, and Krzysztof Okarma. 2020. "Robust Combined Binarization Method of Non-Uniformly Illuminated Document Images for Alphanumerical Character Recognition" Sensors 20, no. 10: 2914. https://doi.org/10.3390/s20102914

APA Style

Michalak, H., & Okarma, K. (2020). Robust Combined Binarization Method of Non-Uniformly Illuminated Document Images for Alphanumerical Character Recognition. Sensors, 20(10), 2914. https://doi.org/10.3390/s20102914
