Document Image Processing

A special issue of Journal of Imaging (ISSN 2313-433X).

Deadline for manuscript submissions: closed (15 December 2017) | Viewed by 103467

Printed Edition Available!
A printed edition of this Special Issue is available.

Special Issue Editors


Dr. Ergina Kavallieratou
Guest Editor
Department of Information and Communication Systems Engineering, University of the Aegean, Samos, Greece
Interests: document image; document image processing; historical document images; document analysis; deep learning; machine learning

Dr. Laurence Likforman-Sulem
Guest Editor
Telecom ParisTech/TSI, Paris, France
Interests: handwriting recognition with Markovian methods (HMMs, Bayesian networks) and recurrent neural networks (BLSTMs); document analysis of historical documents; information extraction from degraded documents; Web documents; automatic detection of cognitive disorders from the handwriting signal

Special Issue Information

Dear Colleagues,

Document image processing allows systems such as OCR, writer identification, writer recognition, check processing, and historical document processing to extract useful information from document images. To succeed, these systems often require several preprocessing tasks: document skew detection and correction, slant removal, binarization, segmentation, and other normalization steps.
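
As a concrete taste of one such preprocessing step, the minimal sketch below estimates page skew by rotating a binarized page over candidate angles and keeping the angle that sharpens the horizontal projection profile. It is a generic, illustrative implementation, not a method from any paper in this issue.

```python
import numpy as np
from scipy.ndimage import rotate

def estimate_skew(binary_img, angle_range=5.0, step=0.1):
    """Estimate page skew (in degrees) by maximizing the variance of the
    horizontal projection profile over candidate rotations. Illustrative
    sketch only; ink pixels are assumed to be 1, background 0."""
    best_angle, best_score = 0.0, -1.0
    for angle in np.arange(-angle_range, angle_range + step, step):
        rotated = rotate(binary_img, angle, reshape=False, order=0)
        profile = rotated.sum(axis=1)   # ink count per row
        score = np.var(profile)         # peaks sharpen when text lines align
        if score > best_score:
            best_angle, best_score = angle, score
    return best_angle

# usage: deskewed = rotate(img, -estimate_skew(img), reshape=False)
```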

The intent of this Special Issue is to collect the experience of leading scientists in the field, and also to serve as an assessment tool for those who are new to document image processing.

This Special Issue intends to cover the following topics, but is not limited to them:

  • Document Image Analysis
  • Document Understanding
  • Document Analysis Systems
  • Document Processing
  • Camera-based Document Processing
  • Document Databases and Digital Libraries
  • Mining Document Image Collections
  • Document Forensics
  • Historical Documents
  • Segmentation and Restoration
  • Performance Evaluation
  • Camera and Scene Text Understanding
  • Machine Learning for Document Analysis
  • Human-Document Interaction
  • Novel Applications

Indeed, any work concerning the use of document image processing, as well as the development of new applications and procedures, may fall within the scope of this Special Issue. Papers must, of course, present novel results or advance previously published work, and the subject must be treated with scientific rigor.

Dr. Ergina Kavallieratou
Dr. Laurence Likforman-Sulem
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to the website. Once registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers are published continuously in the journal (as soon as accepted) and listed together on the Special Issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Journal of Imaging is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1800 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • document image
  • document image processing
  • historical document images
  • document analysis
  • deep learning
  • machine learning

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies is available on the MDPI website.

Published Papers (13 papers)

2 pages, 150 KiB  
Editorial
Document Image Processing
by Laurence Likforman-Sulem and Ergina Kavallieratou
J. Imaging 2018, 4(7), 84; https://doi.org/10.3390/jimaging4070084 - 22 Jun 2018
Cited by 2 | Viewed by 4047
14 pages, 7116 KiB  
Article
Non-Local Sparse Image Inpainting for Document Bleed-Through Removal
by Muhammad Hanif, Anna Tonazzini, Pasquale Savino and Emanuele Salerno
J. Imaging 2018, 4(5), 68; https://doi.org/10.3390/jimaging4050068 - 9 May 2018
Cited by 17 | Viewed by 5475
Abstract
Bleed-through is a frequent, pervasive degradation in ancient manuscripts, caused by ink that has seeped through from the opposite side of the sheet. Bleed-through, appearing as extra interfering text, hinders document readability and makes it difficult to decipher the information content. Digital image restoration techniques have been successfully employed to remove or significantly reduce this distortion. This paper proposes a two-step restoration method for documents affected by bleed-through, exploiting information from the recto and verso images. First, the bleed-through pixels are identified, based on a non-stationary, linear model of the two texts overlapped in the recto-verso pair. In the second step, a dictionary learning-based sparse image inpainting technique, with non-local patch grouping, is used to reconstruct the bleed-through-contaminated image information. An overcomplete sparse dictionary is learned from the bleed-through-free image patches and is then used to estimate a befitting fill-in for the identified bleed-through pixels. Non-local patch similarity is employed in the sparse reconstruction of each patch to enforce local similarity. Thanks to the intrinsic image sparsity and non-local patch similarity, the natural texture of the background is well reproduced in the bleed-through areas, and even a possible overestimation of the bleed-through pixels is effectively corrected, so that the original appearance of the document is preserved. We evaluate the proposed method on the images of a popular database of ancient documents, and the results validate its performance compared to the state of the art.
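
As a rough illustration of the two-step idea (identify bleed-through pixels, then fill them in), the sketch below flags mid-gray recto pixels that coincide with strong ink on the mirrored verso and repairs them with OpenCV's generic inpainting. The paper's non-stationary recto-verso model and dictionary-based sparse inpainting are not reproduced here, and the thresholds are assumptions.

```python
import cv2
import numpy as np

def remove_bleed_through(recto, verso, ink_thresh=128, bleed_margin=20):
    """Toy two-step pipeline on uint8 grayscale images: (1) flag recto
    pixels in a mid-gray band (assumed bleed-through intensity range)
    that face strong ink on the flipped verso, (2) inpaint them.
    cv2.inpaint is a simple stand-in for the paper's sparse inpainting."""
    verso_flipped = cv2.flip(verso, 1)          # mirror to align with recto
    verso_ink = verso_flipped < ink_thresh      # strong ink on the verso side
    recto_faint = ((recto > ink_thresh - bleed_margin) &
                   (recto < ink_thresh + bleed_margin))
    mask = (verso_ink & recto_faint).astype(np.uint8) * 255
    return cv2.inpaint(recto, mask, inpaintRadius=3, flags=cv2.INPAINT_TELEA)
```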

12 pages, 2874 KiB  
Article
A New Binarization Algorithm for Historical Documents
by Marcos Almeida, Rafael Dueire Lins, Rodrigo Bernardino, Darlisson Jesus and Bruno Lima
J. Imaging 2018, 4(2), 27; https://doi.org/10.3390/jimaging4020027 - 23 Jan 2018
Cited by 15 | Viewed by 8113
Abstract
Monochromatic documents require far less bandwidth for network transmission, and far less storage space, than their color or even grayscale equivalents. The binarization of historical documents is far more complex than that of recent ones, as paper aging, color, texture, translucency, stains, back-to-front interference, the kind and color of ink used in handwriting, the printing process, and the digitization process are among the factors that affect binarization. This article presents a new binarization algorithm for historical documents. The proposed global filter is performed in four steps: filtering the image using a bilateral filter, splitting the image into its RGB components, decision-making for each RGB channel based on an adaptive binarization method inspired by Otsu's method with a choice of the threshold level, and classification of the binarized images to decide which of the RGB components best preserved the document information in the foreground. A quantitative and qualitative assessment against 23 binarization algorithms on three sets of "real world" documents showed very good results.
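
A minimal sketch of the four-step pipeline, assuming OpenCV, follows. The channel-selection criterion used here (largest foreground/background mean gap) is an assumption standing in for the paper's classification step.

```python
import cv2

def binarize_historical(bgr_img):
    """Sketch: bilateral smoothing, RGB split, per-channel Otsu, and
    selection of the channel that best separates text from background."""
    smooth = cv2.bilateralFilter(bgr_img, d=9, sigmaColor=75, sigmaSpace=75)
    best_bin, best_score = None, -1.0
    for channel in cv2.split(smooth):            # B, G, R channels
        _, binary = cv2.threshold(channel, 0, 255,
                                  cv2.THRESH_BINARY + cv2.THRESH_OTSU)
        fg, bg = channel[binary == 0], channel[binary == 255]  # dark text / light paper
        if fg.size == 0 or bg.size == 0:
            continue
        score = abs(float(fg.mean()) - float(bg.mean()))  # crude separability
        if score > best_score:
            best_bin, best_score = binary, score
    return best_bin
```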

16 pages, 6387 KiB  
Article
Slant Removal Technique for Historical Document Images
by Ergina Kavallieratou, Laurence Likforman-Sulem and Nikos Vasilopoulos
J. Imaging 2018, 4(6), 80; https://doi.org/10.3390/jimaging4060080 - 12 Jun 2018
Cited by 3 | Viewed by 9297
Abstract
Slanted text has been demonstrated to be a salient feature of handwriting. Its estimation is a necessary preprocessing task in many document image processing systems, in order to improve the required training. This paper describes and evaluates a new technique for removing the slant from historical document pages that avoids segmenting the page into text lines and words. The proposed technique first relies on slant angle detection over an accurate selection of fragments; a slant removal step is then applied. The presented slant removal technique may, however, be combined with any other slant detection algorithm. Experimental results are provided for four document image databases: two historical document databases, the TrigraphSlant database (the only database dedicated to slant removal), and a printed database used to check the precision of the proposed technique.
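
For orientation, the sketch below shows the standard shear-based deslanting scheme: search candidate shear angles for the one that best aligns vertical strokes with image columns, then apply that shear. It illustrates the general technique, not the paper's fragment-selection method.

```python
import numpy as np
from scipy.ndimage import affine_transform

def deslant(binary_img, angles=np.deg2rad(np.arange(-45, 46, 1))):
    """Shear-based deslanting sketch: the vertical projection profile
    is sharpest when strokes are upright, so pick the shear angle that
    maximizes its variance. Ink pixels are assumed to be 1."""
    def shear(img, angle):
        # input_col = output_col + tan(angle) * row  -> horizontal shear
        matrix = np.array([[1.0, 0.0], [np.tan(angle), 1.0]])
        return affine_transform(img, matrix, order=0)
    best = max(angles, key=lambda a: np.var(shear(binary_img, a).sum(axis=0)))
    return shear(binary_img, best)
```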

15 pages, 15101 KiB  
Article
Text/Non-Text Separation from Handwritten Document Images Using LBP Based Features: An Empirical Study
by Sourav Ghosh, Dibyadwati Lahiri, Showmik Bhowmik, Ergina Kavallieratou and Ram Sarkar
J. Imaging 2018, 4(4), 57; https://doi.org/10.3390/jimaging4040057 - 12 Apr 2018
Cited by 19 | Viewed by 7363
Abstract
Isolating non-text components from the text components present in handwritten document images is an important but little-explored research area. Addressing this issue, this paper presents an empirical study on the applicability of various Local Binary Pattern (LBP) based texture features to this problem. The paper also proposes a minor modification to one of the variants of the LBP operator to achieve better performance in the text/non-text classification problem. The feature descriptors are then evaluated, using five well-known classifiers, on a database made up of images from 104 handwritten laboratory copies and class notes from various engineering and science branches. The classification results reflect the effectiveness of LBP-based feature descriptors in text/non-text separation.
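
A minimal sketch of an LBP-based text/non-text pipeline, assuming scikit-image and scikit-learn, follows; the patch set, the labels, and the choice of an SVM are placeholders, and the paper's modified LBP variant is not reproduced.

```python
import numpy as np
from skimage.feature import local_binary_pattern
from sklearn.svm import SVC

def lbp_histogram(gray_patch, points=8, radius=1):
    """Uniform LBP histogram of a grayscale patch, the kind of texture
    descriptor the study evaluates for text/non-text separation."""
    lbp = local_binary_pattern(gray_patch, points, radius, method="uniform")
    hist, _ = np.histogram(lbp, bins=points + 2, range=(0, points + 2),
                           density=True)
    return hist

# sketch of use (patches/labels stand in for a real annotated set):
# X = np.array([lbp_histogram(p) for p in patches])
# clf = SVC().fit(X, labels)   # one of several classifiers one might try
```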

1550 KiB  
Article
A Holistic Technique for an Arabic OCR System
by Farhan M. A. Nashwan, Mohsen A. A. Rashwan, Hassanin M. Al-Barhamtoshy, Sherif M. Abdou and Abdullah M. Moussa
J. Imaging 2018, 4(1), 6; https://doi.org/10.3390/jimaging4010006 - 27 Dec 2017
Cited by 22 | Viewed by 7485
Abstract
Analytical approaches in Optical Character Recognition (OCR) systems can suffer from a significant number of segmentation errors, especially when dealing with cursive languages, such as Arabic, that exhibit frequent overlapping between characters. Holistic approaches that consider whole words as single units were introduced as an effective way to avoid such segmentation errors. Still, the main challenge for these approaches is their computational complexity, especially when dealing with large-vocabulary applications. In this paper, we introduce a computationally efficient, holistic Arabic OCR system. A lexicon reduction approach based on clustering similarly shaped words is used to reduce recognition time. Using global word-level Discrete Cosine Transform (DCT) based features in combination with local block-based features, the proposed approach manages to generalize to new font sizes that were not included in the training data. Evaluation results for the approach, using different test sets from modern and historical Arabic books, are promising compared with state-of-the-art Arabic OCR systems.
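
The sketch below illustrates what a global word-level DCT descriptor can look like: resize the word image, take a 2D DCT, and keep the low-frequency block. The sizes are assumptions, and the paper's local block-based features and lexicon-reduction clustering are not shown.

```python
import cv2
import numpy as np
from scipy.fftpack import dct

def dct_word_features(gray_word, size=64, keep=10):
    """Holistic descriptor sketch: normalize the word image to a fixed
    size, apply a separable 2D DCT, and keep the top-left (low-frequency)
    keep x keep coefficient block as the feature vector."""
    img = cv2.resize(gray_word, (size, size)).astype(np.float64)
    coeffs = dct(dct(img, axis=0, norm="ortho"), axis=1, norm="ortho")
    return coeffs[:keep, :keep].flatten()
```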

16 pages, 414 KiB  
Article
Efficient Query Specific DTW Distance for Document Retrieval with Unlimited Vocabulary
by Gattigorla Nagendar, Viresh Ranjan, Gaurav Harit and C. V. Jawahar
J. Imaging 2018, 4(2), 37; https://doi.org/10.3390/jimaging4020037 - 8 Feb 2018
Cited by 1 | Viewed by 4725
Abstract
In this paper, we improve the performance of the recently proposed Direct Query Classifier (DQC). The DQC is a classifier-based retrieval method, and such methods have in general been shown to be superior to OCR-based solutions for retrieval in many practical document image datasets. In the DQC, classifiers are trained for a set of frequent queries and seamlessly extended to rare and arbitrary queries, which extends the classifier-based retrieval paradigm to the unlimited number of classes (words) present in a language. The DQC requires indexing cut portions (n-grams) of the word image, and the DTW distance has been used for this indexing. However, DTW is computationally slow and therefore limits the performance of the DQC. We introduce a query-specific DTW distance, which enables the effective computation of global principal alignments for novel queries. Since the proposed query-specific DTW distance is a linear approximation of the DTW distance, it enhances the performance of the DQC. Unlike previous approaches, the proposed query-specific DTW distance uses both the class mean vectors and the query information to compute the global principal alignments for the query. Since the proposed method computes the global principal alignments using n-grams, it works well for both frequent and rare queries. We also use query expansion (QE) to further improve the performance of our query-specific DTW. This also allows us to seamlessly adapt our solution to new fonts, styles, and collections. We demonstrate the utility of the proposed technique on three different datasets, where the proposed query-specific DTW performs well compared with previous DTW approximations.
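
For reference, the baseline that the query-specific approximation speeds up is plain dynamic time warping; a straightforward O(nm) implementation is sketched below. The paper's linear approximation and principal-alignment machinery are not reproduced.

```python
import numpy as np

def dtw_distance(seq_a, seq_b):
    """Plain DTW between two feature sequences (each row a per-column
    descriptor of a word image). This is the slow baseline; the paper
    replaces it with a query-specific linear approximation."""
    n, m = len(seq_a), len(seq_b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(seq_a[i - 1] - seq_b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1],
                                 cost[i - 1, j - 1])
    return cost[n, m]
```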

14 pages, 1171 KiB  
Article
Handwritten Devanagari Character Recognition Using Layer-Wise Training of Deep Convolutional Neural Networks and Adaptive Gradient Methods
by Mahesh Jangid and Sumit Srivastava
J. Imaging 2018, 4(2), 41; https://doi.org/10.3390/jimaging4020041 - 13 Feb 2018
Cited by 75 | Viewed by 9835
Abstract
Handwritten character recognition is currently receiving the attention of researchers because of possible applications in assistive technology for blind and visually impaired users, human-robot interaction, automatic data entry for business documents, etc. In this work, we propose a technique to recognize handwritten Devanagari characters using deep convolutional neural networks (DCNNs), one of the recent techniques adopted from the deep learning community. We experimented with the ISIDCHAR database, provided by the Indian Statistical Institute (ISI), Kolkata, and the V2DMDCHAR database, using six different DCNN architectures to evaluate performance, and also investigated the use of six recently developed adaptive gradient methods. A layer-wise training technique for DCNNs has been employed, which helped to achieve the highest recognition accuracy and a faster convergence rate. The results of the layer-wise-trained DCNN compare favorably with those achieved by a shallow technique using handcrafted features and by a standard DCNN.
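
A minimal PyTorch sketch of the layer-wise idea follows: stack one convolutional block at a time and retrain before the next is added. The channel widths and the 47-class head are assumptions, not the six architectures evaluated in the paper.

```python
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch):
    return nn.Sequential(nn.Conv2d(in_ch, out_ch, 3, padding=1),
                         nn.ReLU(), nn.MaxPool2d(2))

# Layer-wise scheme: grow the network block by block, training at each stage.
blocks = []
for in_ch, out_ch in [(1, 32), (32, 64), (64, 128)]:   # widths are assumed
    blocks.append(conv_block(in_ch, out_ch))
    model = nn.Sequential(*blocks, nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                          nn.Linear(out_ch, 47))        # 47 classes: assumption
    optimizer = torch.optim.Adam(model.parameters())    # one adaptive method studied
    # ... train `model` for a few epochs here before the next block is stacked ...
```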

27 pages, 7742 KiB  
Article
Benchmarking of Document Image Analysis Tasks for Palm Leaf Manuscripts from Southeast Asia
by Made Windu Antara Kesiman, Dona Valy, Jean-Christophe Burie, Erick Paulus, Mira Suryani, Setiawan Hadi, Michel Verleysen, Sophea Chhun and Jean-Marc Ogier
J. Imaging 2018, 4(2), 43; https://doi.org/10.3390/jimaging4020043 - 22 Feb 2018
Cited by 34 | Viewed by 9167
Abstract
This paper presents a comprehensive test of the principal tasks in document image analysis (DIA), starting with binarization, text line segmentation, and isolated character/glyph recognition, and continuing on to word recognition and transliteration, for a new and challenging collection of palm leaf manuscripts from Southeast Asia. The research is performed on a complete dataset collection of Southeast Asian palm leaf manuscripts containing three different scripts: Khmer script from Cambodia, and Balinese and Sundanese scripts from Indonesia. The binarization task is evaluated over many methods, up to the most recent entries in binarization competitions. The seam carving method is evaluated for the text line segmentation task, compared with a recently proposed text line segmentation method for palm leaf manuscripts. For the isolated character/glyph recognition task, evaluations are reported for a handcrafted feature extraction method, a neural network with unsupervised feature learning, and a Convolutional Neural Network (CNN) based method. Finally, a Recurrent Neural Network-Long Short-Term Memory (RNN-LSTM) based method is used to analyze the word recognition and transliteration tasks for the palm leaf manuscripts. The results of all experiments provide the latest findings and a quantitative benchmark for palm leaf manuscript analysis for researchers in the DIA community.
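
To make the seam carving step concrete, the sketch below computes a minimum-energy horizontal seam by dynamic programming, the core of seam-carving-based text line separation (a seam through low-ink regions can split two adjacent lines). The benchmarked method adds manuscript-specific refinements not shown here.

```python
import numpy as np

def min_energy_seam(energy):
    """Left-to-right path of minimum cumulative energy through an
    energy map (e.g., ink density); returns one row index per column.
    Generic seam-carving DP, not the paper's full segmentation method."""
    h, w = energy.shape
    cum = energy.astype(float).copy()
    for x in range(1, w):
        for y in range(h):
            lo, hi = max(y - 1, 0), min(y + 2, h)
            cum[y, x] += cum[lo:hi, x - 1].min()
    seam = [int(np.argmin(cum[:, -1]))]          # backtrack from last column
    for x in range(w - 2, -1, -1):
        y = seam[-1]
        lo, hi = max(y - 1, 0), min(y + 2, h)
        seam.append(lo + int(np.argmin(cum[lo:hi, x])))
    return seam[::-1]
```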

22 pages, 1223 KiB  
Article
Transcription of Spanish Historical Handwritten Documents with Deep Neural Networks
by Emilio Granell, Edgard Chammas, Laurence Likforman-Sulem, Carlos-D. Martínez-Hinarejos, Chafic Mokbel and Bogdan-Ionuţ Cîrstea
J. Imaging 2018, 4(1), 15; https://doi.org/10.3390/jimaging4010015 - 11 Jan 2018
Cited by 29 | Viewed by 9956
Abstract
The digitization of historical handwritten document images is important for the preservation of cultural heritage. Moreover, the transcription of the text images obtained from digitization is necessary to provide efficient access to the content of these documents. Handwritten Text Recognition (HTR), which allows us to obtain transcriptions from text images, has become an important research topic in the areas of image and computational language processing. State-of-the-art HTR systems are, however, far from perfect. One difficulty is that they have to cope with image noise and handwriting variability. Another difficulty is the presence of a large number of Out-Of-Vocabulary (OOV) words in ancient historical texts. A solution to this problem is to use external lexical resources, but such resources might be scarce or unavailable given the nature and age of such documents. This work proposes a solution that avoids this limitation. It consists of associating a powerful optical recognition system, which copes with image noise and variability, with a language model based on sub-lexical units, which models OOV words. Such a language modeling approach reduces the size of the lexicon while increasing lexicon coverage. Experiments are first conducted on the publicly available Rodrigo dataset, which contains the digitization of an ancient Spanish manuscript, with a recognizer based on Hidden Markov Models (HMMs). They show that sub-lexical units outperform word units in terms of Word Error Rate (WER), Character Error Rate (CER), and OOV word accuracy rate. This approach is then applied to deep net classifiers, namely Bidirectional Long Short-Term Memory networks (BLSTMs) and Convolutional Recurrent Neural Networks (CRNNs). The results show that CRNNs outperform HMMs and BLSTMs, reaching the lowest WER and CER for this image dataset and significantly improving OOV recognition.
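
The sketch below illustrates the sub-lexical idea in its simplest form: decompose words into overlapping character n-grams so that an out-of-vocabulary word is still covered by known units. The paper's actual sub-lexical units and language model are richer than this stand-in.

```python
from itertools import chain

def to_sublexical(word, n=3):
    """Split a word into overlapping character n-grams with boundary
    markers; a simple stand-in for sub-lexical units, under which any
    OOV word decomposes into units seen in training."""
    padded = f"#{word}#"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]

vocab = {"reino", "reina"}                   # toy lexicon
units = set(chain.from_iterable(to_sublexical(w) for w in vocab))
print(to_sublexical("reinos"))               # OOV word, mostly covered by `units`
```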

21 pages, 2203 KiB  
Article
A Study of Different Classifier Combination Approaches for Handwritten Indic Script Recognition
by Anirban Mukhopadhyay, Pawan Kumar Singh, Ram Sarkar and Mita Nasipuri
J. Imaging 2018, 4(2), 39; https://doi.org/10.3390/jimaging4020039 - 13 Feb 2018
Cited by 13 | Viewed by 6447
Abstract
Script identification is an essential step in document image processing, especially when the environment is multi-script/multilingual. To date, researchers have developed several methods for this problem. For such complex pattern recognition problems, it is always difficult to decide which classifier would be the best choice. Moreover, different classifiers offer complementary information about the patterns to be classified, so combining classifiers in an intelligent way can be more beneficial than using any single classifier. Keeping these facts in mind, in this paper, the information provided by one shape-based and two texture-based features is combined using classifier combination techniques for word-level script recognition in handwritten document images. The experiments are performed on CMATERdb8.4.1, which contains 7200 handwritten word samples belonging to 12 Indic scripts (600 per script) and is freely available at https://code.google.com/p/cmaterdb/. The word samples from this database are classified based on the confidence scores provided by the Multi-Layer Perceptron (MLP) classifier. Major classifier combination techniques, including majority voting, Borda count, the sum rule, the product rule, the max rule, the Dempster-Shafer (DS) rule of combination, and secondary classifiers, are evaluated for this pattern recognition problem. A maximum accuracy of 98.45% is achieved on the validation set, an improvement of 7% over the best-performing individual classifier.
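
A minimal sketch of the fixed combination rules compared in the paper follows, operating on per-classifier class-probability rows; Borda count, Dempster-Shafer, and the secondary-classifier scheme need more machinery and are omitted.

```python
import numpy as np

def combine(prob_matrix, rule="sum"):
    """Combine classifier outputs (shape: n_classifiers x n_classes)
    with a fixed rule and return the winning class index."""
    if rule == "sum":
        scores = prob_matrix.sum(axis=0)
    elif rule == "product":
        scores = prob_matrix.prod(axis=0)
    elif rule == "max":
        scores = prob_matrix.max(axis=0)
    elif rule == "vote":                  # majority voting on per-classifier argmax
        scores = np.bincount(prob_matrix.argmax(axis=1),
                             minlength=prob_matrix.shape[1])
    else:
        raise ValueError(f"unknown rule: {rule}")
    return int(np.argmax(scores))

probs = np.array([[0.6, 0.3, 0.1],        # three toy classifiers, three scripts
                  [0.2, 0.5, 0.3],
                  [0.5, 0.4, 0.1]])
print([combine(probs, r) for r in ("sum", "product", "max", "vote")])
```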

25492 KiB  
Article
DocCreator: A New Software for Creating Synthetic Ground-Truthed Document Images
by Nicholas Journet, Muriel Visani, Boris Mansencal, Kieu Van-Cuong and Antoine Billy
J. Imaging 2017, 3(4), 62; https://doi.org/10.3390/jimaging3040062 - 11 Dec 2017
Cited by 43 | Viewed by 10031
Abstract
Most digital libraries that provide user-friendly interfaces, enabling quick and intuitive access to their resources, are based on Document Image Analysis and Recognition (DIAR) methods. Such DIAR methods need ground-truthed document images to be evaluated and compared and, in some cases, trained. Especially with the advent of deep learning-based approaches, the required size of annotated document datasets seems to be ever-growing. Manually annotating real documents has many drawbacks, which often leads to small, reliably annotated datasets. In order to circumvent those drawbacks and enable the generation of massive ground-truthed data with high variability, we present DocCreator, a multi-platform, open-source software tool able to create many synthetic document images with controlled ground truth. DocCreator has been used in various experiments, demonstrating the value of such synthetic images for enriching the training stage of DIAR tools.
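
In the spirit of DocCreator (though not using its actual code, which is a standalone open-source tool), the toy function below degrades a clean image whose ground truth is already known, so the annotation survives the synthesis step; the defect models here are deliberately simple.

```python
import numpy as np

def degrade(clean_img, noise_sigma=12.0, ink_fade=0.9, seed=0):
    """Apply controlled, reproducible defects (ink fading plus Gaussian
    noise) to a clean uint8 image. DocCreator itself models many more
    degradations, such as holes, blur, and bleed-through."""
    rng = np.random.default_rng(seed)
    img = clean_img.astype(float) * ink_fade + (1 - ink_fade) * 255
    img += rng.normal(0.0, noise_sigma, clean_img.shape)
    return np.clip(img, 0, 255).astype(np.uint8)
```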

19 pages, 5686 KiB  
Article
Open Datasets and Tools for Arabic Text Detection and Recognition in News Video Frames
by Oussama Zayene, Sameh Masmoudi Touj, Jean Hennebert, Rolf Ingold and Najoua Essoukri Ben Amara
J. Imaging 2018, 4(2), 32; https://doi.org/10.3390/jimaging4020032 - 31 Jan 2018
Cited by 10 | Viewed by 9343
Abstract
Recognizing text in video is more complex than in other environments, such as scanned documents. Video text appears in various colors and in unknown fonts and sizes, and it is often affected by compression artifacts and low quality. In contrast to Latin text, there are no publicly available datasets covering all aspects of the Arabic video OCR domain. This paper describes a new, well-defined and annotated Arabic-Text-in-Video dataset called AcTiV 2.0. The dataset is dedicated especially to building and evaluating Arabic video text detection and recognition systems. AcTiV 2.0 contains 189 video clips serving as raw material for creating 4063 key frames for the detection task and 10,415 cropped text images for the recognition task. AcTiV 2.0 is also distributed with its annotation and evaluation tools, which are made open-source for standardization and validation purposes. The paper also reports on the evaluation of several systems tested under the proposed detection and recognition protocols.
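
As a hedged illustration of what a text-detection protocol typically measures, the snippet below computes the intersection-over-union of two boxes; the actual AcTiV 2.0 matching rules live in the dataset's open-source evaluation tools.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes, the usual
    core of detection evaluation protocols."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / float(area_a + area_b - inter) if inter else 0.0

# a detection commonly counts as correct when iou(det, gt) >= 0.5 (assumed)
```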
