Visual Sensors and Machine Learning Techniques for Handwritten Text and Document Recognition

A special issue of Sensors (ISSN 1424-8220). This special issue belongs to the section "Sensing and Imaging".

Deadline for manuscript submissions: 15 January 2025

Special Issue Editor


Dr. Silvia Cascianelli
Guest Editor
Department of Engineering "Enzo Ferrari", Università degli Studi di Modena e Reggio Emilia, 41125 Modena, Italy
Interests: embodied AI; handwritten text recognition; image captioning

Special Issue Information

Dear Colleagues,

For this Special Issue, we invite submissions of high-quality, original research on the theory, application, and development of automated systems for the recognition and analysis of handwritten text and documents. This Special Issue seeks to explore the intersection of computer vision, pattern recognition, and machine learning, bridging the gap between theoretical advances and practical applications in order to improve the accuracy, robustness, and scalability of handwritten text recognition (HTR) across diverse scenarios.

Topics of interest include, but are not limited to, the following:

  • Novel visual sensor technologies for capturing handwritten text and documents, including advancements in lighting, camera characteristics, and multi-modal sensing.
  • Advanced techniques for cost-effective training, such as data augmentation and synthesis.
  • Machine learning algorithms for HTR and document image analysis (DIA), encompassing deep learning architectures, feature extraction techniques, and language models for character recognition, text segmentation, layout analysis, and writer identification.
  • Pre-processing and post-processing methods for improving recognition accuracy, including noise reduction, segmentation, and error correction.
  • Applications of HTR and DIA in various domains, such as historical document processing, legal document analysis, and form recognition.
  • Evaluation methodologies for benchmarking HTR and DIA systems, including the development of new datasets and performance metrics.

Dr. Silvia Cascianelli
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Sensors is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • handwritten text recognition
  • document image analysis
  • document digitization
  • structured document recognition

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (1 paper)

Research

14 pages, 1168 KiB  
Article
CLIP-Llama: A New Approach for Scene Text Recognition with a Pre-Trained Vision-Language Model and a Pre-Trained Language Model
by Xiaoqing Zhao, Miaomiao Xu, Wushour Silamu and Yanbing Li
Sensors 2024, 24(22), 7371; https://doi.org/10.3390/s24227371 - 19 Nov 2024
Abstract
This study focuses on Scene Text Recognition (STR), which plays a crucial role in various applications of artificial intelligence such as image retrieval, office automation, and intelligent transportation systems. Currently, pre-trained vision-language models have become the foundation for various downstream tasks. CLIP exhibits robustness in recognizing both regular (horizontal) and irregular (rotated, curved, blurred, or occluded) text in natural images. As research in scene text recognition requires substantial linguistic knowledge, we introduce the pre-trained vision-language model CLIP and the pre-trained language model Llama. Our approach builds upon CLIP’s image and text encoders, featuring two encoder–decoder branches: one visual branch and one cross-modal branch. The visual branch provides initial predictions based on image features, while the cross-modal branch refines these predictions by addressing the differences between image features and textual semantics. We incorporate the large language model Llama2-7B in the cross-modal branch to assist in correcting erroneous predictions generated by the decoder. To fully leverage the potential of both branches, we employ a dual prediction and refinement decoding scheme during inference, resulting in improved accuracy. Experimental results demonstrate that CLIP-Llama achieves state-of-the-art performance on 11 STR benchmark tests, showcasing its robust capabilities. We firmly believe that CLIP-Llama lays a solid and straightforward foundation for future research in scene text recognition based on vision-language models.
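
To make the dual-branch design concrete, the sketch below illustrates the predict-then-refine scheme the abstract describes: a visual branch decodes an initial hypothesis from CLIP image features, and a cross-modal branch re-decodes against both the image features and a text embedding of that hypothesis. This is a minimal illustration, not the authors' implementation; the class name, layer counts, query scheme, and dimensions are all assumptions, and the Llama2-7B correction step is noted but not reproduced.

# Minimal sketch of a dual prediction-and-refinement STR decoder.
# NOT the paper's code: names, sizes, and wiring are illustrative
# assumptions; only the two-branch structure follows the abstract.
import torch
import torch.nn as nn

class DualBranchSTR(nn.Module):
    def __init__(self, clip_image_encoder, clip_text_encoder,
                 vocab_size, d_model=512, max_len=25):
        super().__init__()
        # Pre-trained CLIP encoders, assumed to return sequence
        # features of shape [batch, seq_len, d_model].
        self.image_encoder = clip_image_encoder
        self.text_encoder = clip_text_encoder
        # Learned positional queries, one per output character slot.
        self.queries = nn.Parameter(torch.randn(max_len, d_model))
        # Visual branch: decodes an initial hypothesis from image features.
        self.visual_decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True),
            num_layers=3)
        # Cross-modal branch: reconciles the hypothesis with the image.
        self.cross_decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True),
            num_layers=3)
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, images):
        img_feats = self.image_encoder(images)
        q = self.queries.unsqueeze(0).expand(images.size(0), -1, -1)
        # 1) Visual branch: initial prediction from image features alone.
        initial_logits = self.head(self.visual_decoder(q, img_feats))
        # 2) Cross-modal branch: embed the initial hypothesis with the
        #    text encoder and decode again over both modalities
        #    (argmax cuts gradients here; training details are paper-specific).
        txt_feats = self.text_encoder(initial_logits.argmax(dim=-1))
        memory = torch.cat([img_feats, txt_feats], dim=1)
        refined_logits = self.head(self.cross_decoder(q, memory))
        # 3) In the paper, Llama2-7B further corrects the cross-modal
        #    branch's predictions; that fusion step is omitted here.
        return initial_logits, refined_logits

At inference, one would keep the refined branch's output; per the abstract, it is this cross-modal refinement, together with the language-model correction, that yields the reported accuracy gains.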