Journal of Imaging

14 pages, 2761 KiB

Open AccessArticle

Validation of Novel Image Processing Method for Objective Quantification of Intra-Articular Bleeding During Arthroscopic Procedures

by Olgar Birsel, Umut Zengin, Ilker Eren, Ali Ersen, Beren Semiz and Mehmet Demirhan

J. Imaging 2025, 11(2), 40; https://doi.org/10.3390/jimaging11020040 - 31 Jan 2025

Abstract

Visual clarity is crucial for shoulder arthroscopy, directly influencing surgical precision and outcomes. Despite advances in imaging technology, intraoperative bleeding remains a significant obstacle to optimal visibility, with subjective evaluation methods lacking consistency and standardization. This study proposes a novel image processing system [...] Read more.

Visual clarity is crucial for shoulder arthroscopy, directly influencing surgical precision and outcomes. Despite advances in imaging technology, intraoperative bleeding remains a significant obstacle to optimal visibility, with subjective evaluation methods lacking consistency and standardization. This study proposes a novel image processing system to objectively quantify bleeding and assess surgical effectiveness. The system uses color recognition algorithms to calculate a bleeding score based on pixel ratios by incorporating multiple color spaces to enhance accuracy and minimize errors. Moreover, 200 three-second video clips from prior arthroscopic rotator cuff repairs were evaluated by three senior surgeons trained on the system’s color metrics and scoring process. Assessments were repeated two weeks later to test intraobserver reliability. The system’s scores were compared to the average score given by the surgeons. The average surgeon-assigned score was 5.10 (range: 1–9.66), while the system scored videos from 1 to 9.46, with an average of 5.08. The mean absolute error between system and surgeon scores was 0.56, with a standard deviation of 0.50, achieving agreement ranging from [0.96,0.98] with 96.7% confidence (ICC = 0.967). This system provides a standardized method to evaluate intraoperative bleeding, enabling the precise detection of blood variations and supporting advanced technologies like autonomous arthropumps to enhance arthroscopy and surgical outcomes. Full article

(This article belongs to the Section Medical Imaging)

► Show Figures

Figure 1

20 pages, 2884 KiB

Open AccessArticle

Dimensional Accuracy Assessment of Medical Anatomical Models Produced by Hospital-Based Fused Deposition Modeling 3D Printer

by Kevin Wendo, Catherine Behets, Olivier Barbier, Benoit Herman, Thomas Schubert, Benoit Raucent and Raphael Olszewski

J. Imaging 2025, 11(2), 39; https://doi.org/10.3390/jimaging11020039 - 30 Jan 2025

Abstract

As 3D printing technology expands rapidly in medical disciplines, the accuracy evaluation of 3D-printed medical models is required. However, no established guidelines to assess the dimensional error of anatomical models exist. This study aims to evaluate the dimensional accuracy of medical models 3D-printed [...] Read more.

As 3D printing technology expands rapidly in medical disciplines, the accuracy evaluation of 3D-printed medical models is required. However, no established guidelines to assess the dimensional error of anatomical models exist. This study aims to evaluate the dimensional accuracy of medical models 3D-printed using a hospital-based Fused Deposition Modeling (FDM) 3D printer. Two dissected cadaveric right hands were marked with Titanium Kirshner wires to identify landmarks on the heads and bases of all metacarpals and proximal and middle phalanges. Both hands were scanned using a Cone Beam Computed Tomography scanner. Image post-processing and segmentation were performed on 3D Slicer software. Hand models were 3D-printed using a professional hospital-based FDM 3D printer. Manual measurements of all landmarks marked on both pairs of cadaveric and 3D-printed hands were taken by two independent observers using a digital caliper. The Mean Absolute Difference (MAD) and Mean Dimensional Error (MDE) were calculated. Our results showed an acceptable level of dimensional accuracy. The overall study’s MAD was 0.32 mm (±0.34), and its MDE was 1.03% (±0.83). These values fall within the recommended range of errors. A high level of dimensional accuracy of the 3D-printed anatomical models was achieved, suggesting their reliability and suitability for medical applications. Full article

(This article belongs to the Section Medical Imaging)

► Show Figures

Figure 1

19 pages, 1172 KiB

Open AccessReview

Machine Learning-Based Approaches for Breast Density Estimation from Mammograms: A Comprehensive Review

by Khaldoon Alhusari and Salam Dhou

J. Imaging 2025, 11(2), 38; https://doi.org/10.3390/jimaging11020038 - 26 Jan 2025

Abstract

Breast cancer, as of 2022, is the most prevalent type of cancer in women. Breast density—a measure of the non-fatty tissue in the breast—is a strong risk factor for breast cancer that can be estimated from mammograms. The importance of studying breast density [...] Read more.

Breast cancer, as of 2022, is the most prevalent type of cancer in women. Breast density—a measure of the non-fatty tissue in the breast—is a strong risk factor for breast cancer that can be estimated from mammograms. The importance of studying breast density is twofold. First, high breast density can be a factor in lowering mammogram sensitivity, as dense tissue can mask tumors. Second, higher breast density is associated with an increased risk of breast cancer, making accurate assessments vital. This paper presents a comprehensive review of the mammographic density estimation literature, with an emphasis on machine-learning-based approaches. The approaches reviewed can be classified as visual, software-, machine learning-, and segmentation-based. Machine learning methods can be further broken down into two categories: traditional machine learning and deep learning approaches. The most commonly utilized models are support vector machines (SVMs) and convolutional neural networks (CNNs), with classification accuracies ranging from 76.70% to 98.75%. Major limitations of the current works include subjectivity and cost-inefficiency. Future work can focus on addressing these limitations, potentially through the use of unsupervised segmentation and state-of-the-art deep learning models such as transformers. By addressing the current limitations, future research can pave the way for more reliable breast density estimation methods, ultimately improving early detection and diagnosis. Full article

(This article belongs to the Section Medical Imaging)

► Show Figures

Figure 1

17 pages, 5511 KiB

Open AccessArticle

Semantic-Guided Transformer Network for Crop Classification in Hyperspectral Images

by Weiqiang Pi, Tao Zhang, Rongyang Wang, Guowei Ma, Yong Wang and Jianmin Du

J. Imaging 2025, 11(2), 37; https://doi.org/10.3390/jimaging11020037 - 26 Jan 2025

Abstract

The hyperspectral remote sensing images of agricultural crops contain rich spectral information, which can provide important details about crop growth status, diseases, and pests. However, existing crop classification methods face several key limitations when processing hyperspectral remote sensing images, primarily in the following [...] Read more.

The hyperspectral remote sensing images of agricultural crops contain rich spectral information, which can provide important details about crop growth status, diseases, and pests. However, existing crop classification methods face several key limitations when processing hyperspectral remote sensing images, primarily in the following aspects. First, the complex background in the images. Various elements in the background may have similar spectral characteristics to the crops, and this spectral similarity makes the classification model susceptible to background interference, thus reducing classification accuracy. Second, the differences in crop scales increase the difficulty of feature extraction. In different image regions, the scale of crops can vary significantly, and traditional classification methods often struggle to effectively capture this information. Additionally, due to the limitations of spectral information, especially under multi-scale variation backgrounds, the extraction of crop information becomes even more challenging, leading to instability in the classification results. To address these issues, a semantic-guided transformer network (SGTN) is proposed, which aims to effectively overcome the limitations of these deep learning methods and improve crop classification accuracy and robustness. First, a multi-scale spatial–spectral information extraction (MSIE) module is designed that effectively handle the variations of crops at different scales in the image, thereby extracting richer and more accurate features, and reducing the impact of scale changes. Second, a semantic-guided attention (SGA) module is proposed, which enhances the model’s sensitivity to crop semantic information, further reducing background interference and improving the accuracy of crop area recognition. By combining the MSIE and SGA modules, the SGTN can focus on the semantic features of crops at multiple scales, thus generating more accurate classification results. Finally, a two-stage feature extraction structure is employed to further optimize the extraction of crop semantic features and enhance classification accuracy. The results show that on the Indian Pines, Pavia University, and Salinas benchmark datasets, the overall accuracies of the proposed model are 98.24%, 98.34%, and 97.89%, respectively. Compared with other methods, the model achieves better classification accuracy and generalization performance. In the future, the SGTN is expected to be applied to more agricultural remote sensing tasks, such as crop disease detection and yield prediction, providing more reliable technical support for precision agriculture and agricultural monitoring. Full article

(This article belongs to the Section Color, Multi-spectral, and Hyperspectral Imaging)

► Show Figures

Figure 1

18 pages, 18455 KiB

Open AccessArticle

iForal: Automated Handwritten Text Transcription for Historical Medieval Manuscripts

by Alexandre Matos, Pedro Almeida, Paulo L. Correia and Osvaldo Pacheco

J. Imaging 2025, 11(2), 36; https://doi.org/10.3390/jimaging11020036 - 25 Jan 2025

Abstract

The transcription of historical manuscripts aims at making our cultural heritage more accessible to experts and also to the larger public, but it is a challenging and time-intensive task. This paper contributes an automated solution for text layout recognition, segmentation, and recognition to [...] Read more.

The transcription of historical manuscripts aims at making our cultural heritage more accessible to experts and also to the larger public, but it is a challenging and time-intensive task. This paper contributes an automated solution for text layout recognition, segmentation, and recognition to speed up the transcription process of historical manuscripts. The focus is on transcribing Portuguese municipal documents from the Middle Ages in the context of the iForal project, including the contribution of an annotated dataset containing Portuguese medieval documents, notably a corpus of 67 Portuguese royal charter data. The proposed system can accurately identify document layouts, isolate the text, segment, and transcribe it. Results for the layout recognition model achieved 0.98 [email protected] and 0.98 precision, while the text segmentation model achieved 0.91 [email protected], detecting 95% of the lines. The text recognition model achieved 8.1% character error rate (CER) and 25.5% word error rate (WER) on the test set. These results can then be validated by palaeographers with less effort, contributing to achieving high-quality transcriptions faster. Moreover, the automatic models developed can be utilized as a basis for the creation of models that perform well for other historical handwriting styles, notably using transfer learning techniques. The contributed dataset has been made available on the HTR United catalogue, which includes training datasets to be used for automatic transcription or segmentation models. The models developed can be used, for instance, on the eSriptorium platform, which is used by a vast community of experts. Full article

(This article belongs to the Section Document Analysis and Processing)

28 pages, 49034 KiB

Open AccessArticle

Revealing Gender Bias from Prompt to Image in Stable Diffusion

by Yankun Wu, Yuta Nakashima and Noa Garcia

J. Imaging 2025, 11(2), 35; https://doi.org/10.3390/jimaging11020035 - 24 Jan 2025

Abstract

Social biases in generative models have gained increasing attention. This paper proposes an automatic evaluation protocol for text-to-image generation, examining how gender bias originates and perpetuates in the generation process of Stable Diffusion. Using triplet prompts that vary by gender indicators, we trace [...] Read more.

Social biases in generative models have gained increasing attention. This paper proposes an automatic evaluation protocol for text-to-image generation, examining how gender bias originates and perpetuates in the generation process of Stable Diffusion. Using triplet prompts that vary by gender indicators, we trace presentations at several stages of the generation process and explore dependencies between prompts and images. Our findings reveal the bias persists throughout all internal stages of the generating process and manifests in the entire images. For instance, differences in object presence, such as different instruments and outfit preferences, are observed across genders and extend to overall image layouts. Moreover, our experiments demonstrate that neutral prompts tend to produce images more closely aligned with those from masculine prompts than with their female counterparts. We also investigate prompt-image dependencies to further understand how bias is embedded in the generated content. Finally, we offer recommendations for developers and users to mitigate this effect in text-to-image generation. Full article

(This article belongs to the Section AI in Imaging)

30 pages, 3389 KiB

Open AccessArticle

GCNet: A Deep Learning Framework for Enhanced Grape Cluster Segmentation and Yield Estimation Incorporating Occluded Grape Detection with a Correction Factor for Indoor Experimentation

by Rubi Quiñones, Syeda Mariah Banu and Eren Gultepe

J. Imaging 2025, 11(2), 34; https://doi.org/10.3390/jimaging11020034 - 24 Jan 2025

Abstract

Object segmentation algorithms have heavily relied on deep learning techniques to estimate the count of grapes which is a strong indicator for the yield success of grapes. The issue with using object segmentation algorithms for grape analytics is that they are limited to [...] Read more.

Object segmentation algorithms have heavily relied on deep learning techniques to estimate the count of grapes which is a strong indicator for the yield success of grapes. The issue with using object segmentation algorithms for grape analytics is that they are limited to counting only the visible grapes, thus omitting hidden grapes, which affect the true estimate of grape yield. Many grapes are occluded because of either the compactness of the grape bunch cluster or due to canopy interference. This introduces the need for models to be able to estimate the unseen berries to give a more accurate estimate of the grape yield by improving grape cluster segmentation. We propose the Grape Counting Network (GCNet), a novel framework for grape cluster segmentation, integrating deep learning techniques with correction factors to address challenges in indoor yield estimation. GCNet incorporates occlusion adjustments, enhancing segmentation accuracy even under conditions of foliage and cluster compactness, and setting new standards in agricultural indoor imaging analysis. This approach improves yield estimation accuracy, achieving a R² of 0.96 and reducing mean absolute error (MAE) by 10% compared to previous methods. We also propose a new dataset called GrapeSet which contains visible imagery of grape clusters imaged indoors, along with their ground truth mask, total grape count, and weight in grams. The proposed framework aims to encourage future research in determining which features of grapes can be leveraged to estimate the correct grape yield count, equip grape harvesters with the knowledge of early yield estimation, and produce accurate results in object segmentation algorithms for grape analytics. Full article

(This article belongs to the Special Issue Deep Learning in Image Analysis: Progress and Challenges)

► Show Figures

Figure 1

18 pages, 1716 KiB

Open AccessArticle

Investigating the Potential of Latent Space for the Classification of Paint Defects

by Doaa Almhaithawi, Alessandro Bellini, Georgios C. Chasparis and Tania Cerquitelli

J. Imaging 2025, 11(2), 33; https://doi.org/10.3390/jimaging11020033 - 24 Jan 2025

Abstract

Defect detection methods have greatly assisted human operators in various fields, from textiles to surfaces and mechanical components, by facilitating decision-making processes and reducing visual fatigue. This area of research is widely recognized as a cross-industry concern, particularly in the manufacturing sector. Nevertheless, [...] Read more.

Defect detection methods have greatly assisted human operators in various fields, from textiles to surfaces and mechanical components, by facilitating decision-making processes and reducing visual fatigue. This area of research is widely recognized as a cross-industry concern, particularly in the manufacturing sector. Nevertheless, each specific application brings unique challenges that require tailored solutions. This paper presents a novel framework for leveraging latent space representations in defect detection tasks, focusing on improving explainability while maintaining accuracy. This work delves into how latent spaces can be utilized by integrating unsupervised and supervised analyses. We propose a hybrid methodology that not only identifies known defects but also provides a mechanism for detecting anomalies and dynamically adapting to new defect types. This dual approach supports human operators, reducing manual workload and enhancing interpretability. Full article

(This article belongs to the Section AI in Imaging)

► Show Figures

Figure 1

23 pages, 5040 KiB

Open AccessArticle

Optimizing Deep Learning Models for Climate-Related Natural Disaster Detection from UAV Images and Remote Sensing Data

by Kim VanExel, Samendra Sherchan and Siyan Liu

J. Imaging 2025, 11(2), 32; https://doi.org/10.3390/jimaging11020032 - 24 Jan 2025

Abstract

This research study utilized artificial intelligence (AI) to detect natural disasters from aerial images. Flooding and desertification were two natural disasters taken into consideration. The Climate Change Dataset was created by compiling various open-access data sources. This dataset contains 6334 aerial images from [...] Read more.

This research study utilized artificial intelligence (AI) to detect natural disasters from aerial images. Flooding and desertification were two natural disasters taken into consideration. The Climate Change Dataset was created by compiling various open-access data sources. This dataset contains 6334 aerial images from UAV (unmanned aerial vehicles) images and satellite images. The Climate Change Dataset was then used to train Deep Learning (DL) models to identify natural disasters. Four different Machine Learning (ML) models were used: convolutional neural network (CNN), DenseNet201, VGG16, and ResNet50. These ML models were trained on our Climate Change Dataset so that their performance could be compared. DenseNet201 was chosen for optimization. All four ML models performed well. DenseNet201 and ResNet50 achieved the highest testing accuracies of 99.37% and 99.21%, respectively. This research project demonstrates the potential of AI to address environmental challenges, such as climate change-related natural disasters. This study’s approach is novel by creating a new dataset, optimizing an ML model, cross-validating, and presenting desertification as one of our natural disasters for DL detection. Three categories were used (Flooded, Desert, Neither). Our study relates to AI for Climate Change and Environmental Sustainability. Drone emergency response would be a practical application for our research project. Full article

(This article belongs to the Section AI in Imaging)

► Show Figures

Figure 1

19 pages, 7485 KiB

Open AccessArticle

Design of an Optimal Convolutional Neural Network Architecture for MRI Brain Tumor Classification by Exploiting Particle Swarm Optimization

by Sofia El Amoury, Youssef Smili and Youssef Fakhri

J. Imaging 2025, 11(2), 31; https://doi.org/10.3390/jimaging11020031 - 24 Jan 2025

Abstract

The classification of brain tumors using MRI scans is critical for accurate diagnosis and effective treatment planning, though it poses significant challenges due to the complex and varied characteristics of tumors, including irregular shapes, diverse sizes, and subtle textural differences. Traditional convolutional neural [...] Read more.

The classification of brain tumors using MRI scans is critical for accurate diagnosis and effective treatment planning, though it poses significant challenges due to the complex and varied characteristics of tumors, including irregular shapes, diverse sizes, and subtle textural differences. Traditional convolutional neural network (CNN) models, whether handcrafted or pretrained, frequently fall short in capturing these intricate details comprehensively. To address this complexity, an automated approach employing Particle Swarm Optimization (PSO) has been applied to create a CNN architecture specifically adapted for MRI-based brain tumor classification. PSO systematically searches for an optimal configuration of architectural parameters—such as the types and numbers of layers, filter quantities and sizes, and neuron numbers in fully connected layers—with the objective of enhancing classification accuracy. This performance-driven method avoids the inefficiencies of manual design and iterative trial and error. Experimental results indicate that the PSO-optimized CNN achieves a classification accuracy of 99.19%, demonstrating significant potential for improving diagnostic precision in complex medical imaging applications and underscoring the value of automated architecture search in advancing critical healthcare technology. Full article

(This article belongs to the Special Issue Learning and Optimization for Medical Imaging)

► Show Figures

Figure 1

17 pages, 6472 KiB

Open AccessArticle

A Method for Estimating Fluorescence Emission Spectra from the Image Data of Plant Grain and Leaves Without a Spectrometer

by Shoji Tominaga, Shogo Nishi, Ryo Ohtera and Hideaki Sakai

J. Imaging 2025, 11(2), 30; https://doi.org/10.3390/jimaging11020030 - 21 Jan 2025

Abstract

This study proposes a method for estimating the spectral images of fluorescence spectral distributions emitted from plant grains and leaves without using a spectrometer. We construct two types of multiband imaging systems with six channels, using ordinary off-the-shelf cameras and a UV light. [...] Read more.

This study proposes a method for estimating the spectral images of fluorescence spectral distributions emitted from plant grains and leaves without using a spectrometer. We construct two types of multiband imaging systems with six channels, using ordinary off-the-shelf cameras and a UV light. A mobile phone camera is used to detect the fluorescence emission in the blue wavelength region of rice grains. For plant leaves, a small monochrome camera is used with additional optical filters to detect chlorophyll fluorescence in the red-to-far-red wavelength region. A ridge regression approach is used to obtain a reliable estimate of the spectral distribution of the fluorescence emission at each pixel point from the acquired image data. The spectral distributions can be estimated by optimally selecting the ridge parameter without statistically analyzing the fluorescence spectra. An algorithm for optimal parameter selection is developed using a cross-validation technique. In experiments using real rice grains and green leaves, the estimated fluorescence emission spectral distributions by the proposed method are compared to the direct measurements obtained with a spectroradiometer and the estimates obtained using the minimum norm estimation method. The estimated images of fluorescence emissions are presented for rice grains and green leaves. The reliability of the proposed estimation method is demonstrated. Full article

(This article belongs to the Special Issue Color in Image Processing and Computer Vision)

► Show Figures

Figure 1

13 pages, 3880 KiB

Open AccessArticle

Remote Sensing Target Tracking Method Based on Super-Resolution Reconstruction and Hybrid Networks

by Hongqing Wan, Sha Xu, Yali Yang and Yongfang Li

J. Imaging 2025, 11(2), 29; https://doi.org/10.3390/jimaging11020029 - 21 Jan 2025

Abstract

Remote sensing images have the characteristics of high complexity, being easily distorted, and having large-scale variations. Moreover, the motion of remote sensing targets usually has nonlinear features, and existing target tracking methods based on remote sensing data cannot accurately track remote sensing targets. [...] Read more.

Remote sensing images have the characteristics of high complexity, being easily distorted, and having large-scale variations. Moreover, the motion of remote sensing targets usually has nonlinear features, and existing target tracking methods based on remote sensing data cannot accurately track remote sensing targets. And obtaining high-resolution images by optimizing algorithms will save a lot of costs. Aiming at the problem of large tracking errors in remote sensing target tracking by current tracking algorithms, this paper proposes a target tracking method combined with a super-resolution hybrid network. Firstly, this method utilizes the super-resolution reconstruction network to improve the resolution of remote sensing images. Then, the hybrid neural network is used to estimate the target motion after target detection. Finally, identity matching is completed through the Hungarian algorithm. The experimental results show that the tracking accuracy of this method is 67.8%, and the recognition identification F-measure (IDF1) value is 0.636. Its performance indicators are better than those of traditional target tracking algorithms, and it can meet the requirements for accurate tracking of remote sensing targets. Full article

► Show Figures

Figure 1

17 pages, 5156 KiB

Open AccessArticle

Plant Detection in RGB Images from Unmanned Aerial Vehicles Using Segmentation by Deep Learning and an Impact of Model Accuracy on Downstream Analysis

by Mikhail V. Kozhekin, Mikhail A. Genaev, Evgenii G. Komyshev, Zakhar A. Zavyalov and Dmitry A. Afonnikov

J. Imaging 2025, 11(1), 28; https://doi.org/10.3390/jimaging11010028 - 20 Jan 2025

Abstract

Crop field monitoring using unmanned aerial vehicles (UAVs) is one of the most important technologies for plant growth control in modern precision agriculture. One of the important and widely used tasks in field monitoring is plant stand counting. The accurate identification of plants [...] Read more.

Crop field monitoring using unmanned aerial vehicles (UAVs) is one of the most important technologies for plant growth control in modern precision agriculture. One of the important and widely used tasks in field monitoring is plant stand counting. The accurate identification of plants in field images provides estimates of plant number per unit area, detects missing seedlings, and predicts crop yield. Current methods are based on the detection of plants in images obtained from UAVs by means of computer vision algorithms and deep learning neural networks. These approaches depend on image spatial resolution and the quality of plant markup. The performance of automatic plant detection may affect the efficiency of downstream analysis of a field cropping pattern. In the present work, a method is presented for detecting the plants of five species in images acquired via a UAV on the basis of image segmentation by deep learning algorithms (convolutional neural networks). Twelve orthomosaics were collected and marked at several sites in Russia to train and test the neural network algorithms. Additionally, 17 existing datasets of various spatial resolutions and markup quality levels from the Roboflow service were used to extend training image sets. Finally, we compared several texture features between manually evaluated and neural-network-estimated plant masks. It was demonstrated that adding images to the training sample (even those of lower resolution and markup quality) improves plant stand counting significantly. The work indicates how the accuracy of plant detection in field images may affect their cropping pattern evaluation by means of texture characteristics. For some of the characteristics (GLCM mean, GLRM long run, GLRM run ratio) the estimates between images marked manually and automatically are close. For others, the differences are large and may lead to erroneous conclusions about the properties of field cropping patterns. Nonetheless, overall, plant detection algorithms with a higher accuracy show better agreement with the estimates of texture parameters obtained from manually marked images. Full article

(This article belongs to the Special Issue Imaging Applications in Agriculture)

► Show Figures

Figure 1

15 pages, 3743 KiB

Open AccessArticle

Blink Detection Using 3D Convolutional Neural Architectures and Analysis of Accumulated Frame Predictions

by George Nousias, Konstantinos K. Delibasis and Georgios Labiris

J. Imaging 2025, 11(1), 27; https://doi.org/10.3390/jimaging11010027 - 19 Jan 2025

Abstract

Blink detection is considered a useful indicator both for clinical conditions and drowsiness state. In this work, we propose and compare deep learning architectures for the task of detecting blinks in video frame sequences. The first step is the training and application of [...] Read more.

Blink detection is considered a useful indicator both for clinical conditions and drowsiness state. In this work, we propose and compare deep learning architectures for the task of detecting blinks in video frame sequences. The first step is the training and application of an eye detector that extracts the eye regions from each video frame. The cropped eye regions are organized as three-dimensional (3D) input with the third dimension spanning time of 300 ms. Two different 3D convolutional neural networks are utilized (a simple 3D CNN and 3D ResNet), as well as a 3D autoencoder combined with a classifier coupled to the latent space. Finally, we propose the usage of a frame prediction accumulator combined with morphological processing and watershed segmentation to detect blinks and determine their start and stop frame in previously unseen videos. The proposed framework was trained on ten (9) different participants and tested on five (8) different ones, with a total of 162,400 frames and 1172 blinks for each eye. The start and end frame of each blink in the dataset has been annotate by specialized ophthalmologist. Quantitative comparison with state-of-the-art blink detection methodologies provide favorable results for the proposed neural architectures coupled with the prediction accumulator, with the 3D ResNet being the best as well as the fastest performer. Full article

(This article belongs to the Special Issue Deep Learning in Biomedical Image Segmentation and Classification: Advancements, Challenges and Applications)

► Show Figures

Figure 1

17 pages, 7356 KiB

Open AccessArticle

Increasing Neural-Based Pedestrian Detectors’ Robustness to Adversarial Patch Attacks Using Anomaly Localization

by Olga Ilina, Maxim Tereshonok and Vadim Ziyadinov

J. Imaging 2025, 11(1), 26; https://doi.org/10.3390/jimaging11010026 - 17 Jan 2025

Abstract

Object detection in images is a fundamental component of many safety-critical systems, such as autonomous driving, video surveillance systems, and robotics. Adversarial patch attacks, being easily implemented in the real world, provide effective counteraction to object detection by state-of-the-art neural-based detectors. It poses [...] Read more.

Object detection in images is a fundamental component of many safety-critical systems, such as autonomous driving, video surveillance systems, and robotics. Adversarial patch attacks, being easily implemented in the real world, provide effective counteraction to object detection by state-of-the-art neural-based detectors. It poses a serious danger in various fields of activity. Existing defense methods against patch attacks are insufficiently effective, which underlines the need to develop new reliable solutions. In this manuscript, we propose a method which helps to increase the robustness of neural network systems to the input adversarial images. The proposed method consists of a Deep Convolutional Neural Network to reconstruct a benign image from the adversarial one; a Calculating Maximum Error block to highlight the mismatches between input and reconstructed images; a Localizing Anomalous Fragments block to extract the anomalous regions using the Isolation Forest algorithm from histograms of images’ fragments; and a Clustering and Processing block to group and evaluate the extracted anomalous regions. The proposed method, based on anomaly localization, demonstrates high resistance to adversarial patch attacks while maintaining the high quality of object detection. The experimental results show that the proposed method is effective in defending against adversarial patch attacks. Using the YOLOv3 algorithm with the proposed defensive method for pedestrian detection in the INRIAPerson dataset under the adversarial attacks, the mAP50 metric reaches 80.97% compared to 46.79% without a defensive method. The results of the research demonstrate that the proposed method is promising for improvement of object detection systems security. Full article

(This article belongs to the Section Image and Video Processing)

► Show Figures

Figure 1

23 pages, 22211 KiB

Open AccessArticle

A Local Adversarial Attack with a Maximum Aggregated Region Sparseness Strategy for 3D Objects

by Ling Zhao, Xun Lv, Lili Zhu, Binyan Luo, Hang Cao, Jiahao Cui, Haifeng Li and Jian Peng

J. Imaging 2025, 11(1), 25; https://doi.org/10.3390/jimaging11010025 - 13 Jan 2025

Abstract

The increasing reliance on deep neural network-based object detection models in various applications has raised significant security concerns due to their vulnerability to adversarial attacks. In physical 3D environments, existing adversarial attacks that target object detection (3D-AE) face significant challenges. These attacks often [...] Read more.

The increasing reliance on deep neural network-based object detection models in various applications has raised significant security concerns due to their vulnerability to adversarial attacks. In physical 3D environments, existing adversarial attacks that target object detection (3D-AE) face significant challenges. These attacks often require large and dispersed modifications to objects, making them easily noticeable and reducing their effectiveness in real-world scenarios. To maximize the attack effectiveness, large and dispersed attack camouflages are often employed, which makes the camouflages overly conspicuous and reduces their visual stealth. The core issue is how to use minimal and concentrated camouflage to maximize the attack effect. Addressing this, our research focuses on developing more subtle and efficient attack methods that can better evade detection in practical settings. Based on these principles, this paper proposes a local 3D attack method driven by a Maximum Aggregated Region Sparseness (MARS) strategy. In simpler terms, our approach strategically concentrates the attack modifications to specific areas to enhance effectiveness while maintaining stealth. To maximize the aggregation of attack-camouflaged regions, an aggregation regularization term is designed to constrain the mask aggregation matrix based on the face-adjacency relationships. To minimize the attack camouflage regions, a sparseness regularization is designed to make the mask weights tend toward a U-shaped distribution and limit extreme values. Additionally, neural rendering is used to obtain gradient-propagating multi-angle augmented data and suppress the model’s detection to locate universal critical decision regions from multiple angles. These technical strategies ensure that the adversarial modifications remain effective across different viewpoints and conditions. We test the attack effectiveness of different region selection strategies. On the CARLA dataset, the average attack efficiency of attacking the YOLOv3 and v5 series networks reaches 1.724, which represents an improvement of 0.986 (134%) compared to baseline methods. These results demonstrate a significant enhancement in attack performance, highlighting the potential risks to real-world object detection systems. The experimental results demonstrate that our attack method achieves both stealth and aggressiveness from different viewpoints. Furthermore, we explore the transferability of the decision regions. The results indicate that our method can be effectively combined with different texture optimization methods, with the average precision decreasing by 0.488 and 0.662 across different networks, which indicates a strong attack effectiveness. Full article

► Show Figures

Figure 1

22 pages, 11474 KiB

Open AccessArticle

LittleFaceNet: A Small-Sized Face Recognition Method Based on RetinaFace and AdaFace

by Zhengwei Ren, Xinyu Liu, Jing Xu, Yongsheng Zhang and Ming Fang

J. Imaging 2025, 11(1), 24; https://doi.org/10.3390/jimaging11010024 - 13 Jan 2025

Abstract

For surveillance video management in university laboratories, issues such as occlusion and low-resolution face capture often arise. Traditional face recognition algorithms are typically static and rely heavily on clear images, resulting in inaccurate recognition for low-resolution, small-sized faces. To address the challenges of [...] Read more.

For surveillance video management in university laboratories, issues such as occlusion and low-resolution face capture often arise. Traditional face recognition algorithms are typically static and rely heavily on clear images, resulting in inaccurate recognition for low-resolution, small-sized faces. To address the challenges of occlusion and low-resolution person identification, this paper proposes a new face recognition framework by reconstructing Retinaface-Resnet and combining it with Quality-Adaptive Margin (adaface). Currently, although there are many target detection algorithms, they all require a large amount of data for training. However, datasets for low-resolution face detection are scarce, leading to poor detection performance of the models. This paper aims to solve Retinaface’s weak face recognition capability in low-resolution scenarios and its potential inaccuracies in face bounding box localization when faces are at extreme angles or partially occluded. To this end, Spatial Depth-wise Separable Convolutions are introduced. Retinaface-Resnet is designed for face detection and localization, while adaface is employed to address low-resolution face recognition by using feature norm approximation to estimate image quality and applying an adaptive margin function. Additionally, a multi-object tracking algorithm is used to solve the problem of moving occlusion. Experimental results demonstrate significant improvements, achieving an accuracy of 96.12% on the WiderFace dataset and a recognition accuracy of 84.36% in practical laboratory applications. Full article

(This article belongs to the Section Computer Vision and Pattern Recognition)

► Show Figures

Figure 1

20 pages, 7090 KiB

Open AccessArticle

An Infrared and Visible Image Alignment Method Based on Gradient Distribution Properties and Scale-Invariant Features in Electric Power Scenes

by Lin Zhu, Yuxing Mao, Chunxu Chen and Lanjia Ning

J. Imaging 2025, 11(1), 23; https://doi.org/10.3390/jimaging11010023 - 13 Jan 2025

Abstract

In grid intelligent inspection systems, automatic registration of infrared and visible light images in power scenes is a crucial research technology. Since there are obvious differences in key attributes between visible and infrared images, direct alignment is often difficult to achieve the expected [...] Read more.

In grid intelligent inspection systems, automatic registration of infrared and visible light images in power scenes is a crucial research technology. Since there are obvious differences in key attributes between visible and infrared images, direct alignment is often difficult to achieve the expected results. To overcome the high difficulty of aligning infrared and visible light images, an image alignment method is proposed in this paper. First, we use the Sobel operator to extract the edge information of the image pair. Second, the feature points in the edges are recognised by a curvature scale space (CSS) corner detector. Third, the Histogram of Orientation Gradients (HOG) is extracted as the gradient distribution characteristics of the feature points, which are normalised with the Scale Invariant Feature Transform (SIFT) algorithm to form feature descriptors. Finally, initial matching and accurate matching are achieved by the improved fast approximate nearest-neighbour matching method and adaptive thresholding, respectively. Experiments show that this method can robustly match the feature points of image pairs under rotation, scale, and viewpoint differences, and achieves excellent matching results. Full article

(This article belongs to the Topic Computer Vision and Image Processing, 2nd Edition)

► Show Figures

Graphical abstract

19 pages, 4635 KiB

Open AccessArticle

ZooCNN: A Zero-Order Optimized Convolutional Neural Network for Pneumonia Classification Using Chest Radiographs

by Saravana Kumar Ganesan, Parthasarathy Velusamy, Santhosh Rajendran, Ranjithkumar Sakthivel, Manikandan Bose and Baskaran Stephen Inbaraj

J. Imaging 2025, 11(1), 22; https://doi.org/10.3390/jimaging11010022 - 13 Jan 2025

Abstract

Pneumonia, a leading cause of mortality in children under five, is usually diagnosed through chest X-ray (CXR) images due to its efficiency and cost-effectiveness. However, the shortage of radiologists in the Least Developed Countries (LDCs) emphasizes the need for automated pneumonia diagnostic systems. [...] Read more.

Pneumonia, a leading cause of mortality in children under five, is usually diagnosed through chest X-ray (CXR) images due to its efficiency and cost-effectiveness. However, the shortage of radiologists in the Least Developed Countries (LDCs) emphasizes the need for automated pneumonia diagnostic systems. This article presents a Deep Learning model, Zero-Order Optimized Convolutional Neural Network (ZooCNN), a Zero-Order Optimization (Zoo)-based CNN model for classifying CXR images into three classes, Normal Lungs (NL), Bacterial Pneumonia (BP), and Viral Pneumonia (VP); this model utilizes the Adaptive Synthetic Sampling (ADASYN) approach to ensure class balance in the Kaggle CXR Images (Pneumonia) dataset. Conventional CNN models, though promising, face challenges such as overfitting and have high computational costs. The use of ZooPlatform (ZooPT), a hyperparameter finetuning strategy, on a baseline CNN model finetunes the hyperparameters and provides a modified architecture, ZooCNN, with a 72% reduction in weights. The model was trained, tested, and validated on the Kaggle CXR Images (Pneumonia) dataset. The ZooCNN achieved an accuracy of 97.27%, a sensitivity of 97.00%, a specificity of 98.60%, and an F1 score of 97.03%. The results were compared with contemporary models to highlight the efficacy of the ZooCNN in pneumonia classification (PC), offering a potential tool to aid physicians in clinical settings. Full article

► Show Figures

Figure 1

Journal Description

Journal of Imaging

Latest Articles

Journal Menu

Journal Browser

Highly Accessed Articles

Latest Books

E-Mail Alert

News

Topics

Conferences

Special Issues

Further Information

Guidelines

MDPI Initiatives

Follow MDPI