Systematic Review

A Systematic Review and Comparative Analysis Approach to Boom Gate Access Using Plate Number Recognition

by Asaju Christine Bukola *, Pius Adewale Owolawi *, Chuling Du and Etienne Van Wyk
Computer Systems Engineering, Tshwane University of Technology, Pretoria 0001, South Africa
*
Authors to whom correspondence should be addressed.
Computers 2024, 13(11), 286; https://doi.org/10.3390/computers13110286
Submission received: 14 August 2024 / Revised: 18 October 2024 / Accepted: 20 October 2024 / Published: 4 November 2024
(This article belongs to the Special Issue Machine Learning Applications in Pattern Recognition)

Abstract:
Security has been paramount to many organizations for many years, with access control being one of the critical measures used to ensure it. Among the various approaches to access control, vehicle plate number recognition has received wide attention; however, its application to boom gate access has not been adequately explored. This study proposes a method for boom gate access built on optimized vehicle plate number recognition. Given the speed and accuracy of the YOLO (You Only Look Once) object detection algorithm, this study proposes using YOLO deep learning models for plate number detection to control access to a boom gate. To identify the gap and the most suitable YOLO variant, the study systematically surveyed publication databases for peer-reviewed articles published between 2020 and 2024 on plate number recognition using different YOLO versions. In addition, experiments were performed on four YOLO versions, YOLOv5, YOLOv7, YOLOv8, and YOLOv9, focusing on vehicle plate number recognition. The experiments, using an open-source dataset with 699 samples in total, reported accuracies of 81%, 82%, 83%, and 73% for YOLOv5, v7, v8, and v9, respectively. This comparative analysis aims to determine the most appropriate YOLO version for the task, optimizing both security and efficiency in boom gate access control systems. By leveraging the capabilities of advanced YOLO algorithms, the proposed method seeks to improve the reliability and effectiveness of access control through precise and rapid plate number recognition. The analysis reveals that each YOLO version has distinct advantages depending on the application's specific requirements. Under complex detection conditions with changing lighting and shadows, YOLOv8 performed better in terms of reduced loss rates and increased precision and recall metrics.

1. Introduction

Object detection is a key technology in the ever-evolving field of computer vision, providing computers with a remarkable ability to recognize and locate a wide range of objects in visual input [1]. Its applications are numerous and revolutionary, ranging from surveillance systems that monitor public areas [2] and self-driving cars traversing busy streets [3] to face detection, face recognition, pedestrian counting, security systems, and vehicle detection [4]. Object detection surpasses the confines of basic image recognition by not only identifying the presence of objects but also indicating their specific locations in the visual field [5]. This capability is made possible through the deployment of algorithms, most notably convolutional neural networks (CNNs) and deep learning methodologies. These algorithms undergo rigorous training on diverse datasets, enabling them to recognize the patterns, shapes, and features that define individual objects.
Based on the number of times the input image is passed through a network, object detection methods are separated into one-shot and two-shot algorithms. Two-shot algorithms were proposed earlier, including R-CNN (Region-based Convolutional Neural Networks) [6], Fast R-CNN, and Faster R-CNN [7]. Many optimization approaches followed Faster R-CNN, such as enhanced feature networks, an improved RPN, improved ROI classification, sample post-processing, and larger mini-batches [8], resulting in today's well-known R-FCN and Mask R-CNN algorithms [8]. Two-shot object detection uses two passes over the input image to determine the existence and location of objects: the first pass produces a set of predictions or candidate object positions, which are refined in the second pass to yield the final predictions. Although this method is computationally more expensive than single-shot object detection (Figure 1), it is also more accurate [9].
Single-shot object detection makes a single pass over the input image to estimate the presence and location of objects. Because these methods process the whole image in a single pass, they are computationally efficient [9] and can be used for real-time object detection in environments with limited resources. A comparison of various object detection models is presented in Table 1, which takes into account the models' architecture, speed, accuracy, and training time. YOLO is the classic one-shot model. SSD is another significant work that followed YOLO, in turn followed by R-SSD, DSSD, DSOD, and FSSD. Other one-stage detection techniques exist; however, YOLO is typically faster and more accurate than the others. We investigated the use of YOLO algorithms to identify license plate numbers due to their computational efficiency. This study focuses on the identification of license plate numbers, an important task for enhancing organizational security through vehicle access control. Controlled access must be ensured, since any shortcoming in the access system could seriously jeopardize the security structure. The research places particular emphasis on license plate recognition as a crucial element of access control systems that aid in the management and improvement of security in restricted areas. The objective is to improve security procedures and offer more effective and secure alternatives for access control by concentrating on this type of technology.
The major contributions of this paper are as follows.
  • Systematically review articles published between 2020 and 2024 on the use of YOLO for license plate recognition.
  • Identify obstacles, limitations, and gaps in the literature on using YOLO for license plate number recognition.
  • Perform a comparative analysis of YOLO versions for license plate detection: a key contribution lies in the experimental comparison of four recent versions of YOLO (versions 5, 7, 8, and 9) for the specific task of license plate number detection. This analysis aims to determine the most efficient YOLO version, offering practical insights for real-world applications.
  • Application of YOLO object detection to boom gate access.
The remainder of this work is organized as follows. Section 2 reviews, using a systematic review approach, the different works that have been carried out using one or more versions of YOLO for the task of detecting license plate numbers; this review made it possible to identify the gap in the existing research. Section 3 discusses the basic architecture of YOLO. Section 4 describes the inception of the YOLO versions to date, with an emphasis on the four versions used for analysis. Section 5 discusses the experiments carried out with all four versions of YOLO for license plate recognition. Section 6 presents and discusses the results reported from the experiments. The conclusion of the study is found in Section 10.

2. Related Work: A Systematic Review Approach

This review was performed in accordance with the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines. Different authors have applied different versions of YOLO and reported remarkable results on license plate number detection tasks. In this section, a systematic review approach is employed to survey the advent and usage of YOLO versions by authors in recent years. The review considers works published between 2020 and 2024. Since older studies may not have incorporated the most recent methods or addressed the issues raised by newer technology and datasets, concentrating on articles published after 2020 guarantees that the review covers the most recent research and reflects current developments in the area. Consequently, the scope was purposefully restricted to the most recent and pertinent methods of license plate identification. The documents considered include conference papers, articles, and review papers, drawn from sources such as journals and conference proceedings. Articles in their final publishing stages were selected for review, and all selected articles were open-access and in English. The search was carried out on 17 May 2024 using two major databases: Scopus and Google Scholar. The Scopus search combined the keywords "YOLO" and "plate number recognition" in a query structured with the "AND" and "OR" operators (an illustrative query shape is given after this paragraph); for Google Scholar, plain keywords were used. The search revealed, specifically in the Scopus database, that a large percentage of the work on YOLO for plate number recognition was carried out in the computer science field (31%), while 23.2% was conducted in engineering and 10.4% in mathematics. The remaining works were conducted in the fields of physics, medicine, agriculture, biochemistry, energy, and others. The percentage distribution is shown in Figure 2.
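For illustration, a Scopus query of the following shape matches the structure described (the exact search string is our assumption, as it is not reproduced verbatim here):

    TITLE-ABS-KEY ( "YOLO" AND ( "plate number recognition" OR "license plate recognition" ) ) AND PUBYEAR > 2019 AND PUBYEAR < 2025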
The study also analyzed the search based on the type of document. The analysis revealed that the highest percentage of authors presented their work through conference papers (58.9%). In total, 39.9% were published as articles, while the remaining were book chapters, reviews, books, letters, and data papers. Our analysis also revealed that 0.1% of the works were retracted. This indicates a need for this review, as reviews only made up 0.4% of the works since 2020.
Figure 3 shows the percentage distribution in the form of a pie chart.
Considering the year of the various studies, this study breaks down the total findings based on the year of publication, as shown in Table 2.
The publication of documents increased significantly from 2020 to 2023, with the maximum found in 2023. The documents show a sharp decline in 2024, significantly lower than 2023 yet still higher than the counts reported in 2021 and even 2020. From 2020 to 2021, 46.34% more documents were published, an increase of 196 documents. From 2021 to 2022, 74.31% more documents were published, an increase of 460. From 2022 to 2023, 43.56% more documents were published, an increase of 470. From 2023 to 2024, 60.87% fewer documents were published, a decline of 943. We observed that 2023 reported the maximum number of documents, which might reflect high research activity or more activity in terms of data collection. The decline in 2024 might indicate a trend reversal or diminishing research activity; however, it may simply be because the year has not yet ended, and the count will most likely increase as the year progresses. Overall, the trend from 2020 to 2024 speaks to growth in the number of documents, indicating the need for more research in this area. Table 2 is represented as a graph in Figure 4 to clearly depict the changes over the years.
Based on different screening criteria, the relevant articles most related to the study topic were reduced to a total of 19 articles from Scopus and 4 articles from Google Scholar. Of the 23 articles selected for review, only 15 were relevant to this study; therefore, a total of 15 articles were reviewed. This indicates that there is considerable room for further research on the YOLO algorithm for plate number recognition. The PRISMA flow chart in Figure 5 depicts the search flow for the articles included in this review.

Review of Selected Articles

The final articles that were screened and included are reviewed in this subsection. These are discussed as follows:
The authors of [14] presented a technique for better surveillance and enforcement of traffic rules, particularly in places with high vehicle density, such as traffic signals, military bases, and government headquarters. The authors proposed an ANPR system based on YOLO object detection and optical character recognition powered by the Pytesseract engine. YOLO was utilized to process detected images of moving vehicles and capture number plates, and textual information was extracted using OCR. A dataset of 600 images of Indian vehicles was used to train and test the system. The system achieved a license plate detection accuracy of 97%, with high precision, recall, and mAP values. Several factors were described that make the proposed model particularly good at detecting number plates under various conditions; however, recognition under varying lighting and weather circumstances, as well as compatibility with various number plate formats, still needs to be improved.
The authors of [15] addressed the challenge of monitoring traffic offences of two-wheelers in real-time. Their method used a framework that combined YOLOv7 for object detection and a CNN model for optical character recognition (OCR). The study used a custom dataset of 1069 motorcycle images obtained from five places in Goa, India, as well as a Kaggle character image collection. Their model outperformed prior models, with 92.6% accuracy for license plate detection and 99% for OCR. More improvements are required for efficiency and to address a variety of traffic infractions, such as violating passenger limitations on motorbikes.
The manual enforcement of helmet laws among motorcyclists was considered inefficient and resource-intensive in [16]. Therefore, they proposed an automated system using YOLO for helmet detection and OCR for number plate recognition. The dataset used for the experiment included self-made videos from various locations in Lahore, Pakistan, and images from online repositories. The system reported an accuracy of 97.69% in detecting helmet violations and recognizing license plates. Further development is required to improve real-time performance and handle occlusion and diverse lighting conditions.
The authors of [17] studied the economic and logistical issues of manually enforcing helmet laws. They therefore proposed a system that utilized YOLOv8 for helmet detection and EasyOCR for number plate identification. Their study used a collection of 500 images of vehicles and helmets to train and test their model. The model reported a 100% detection accuracy and 80% number plate recognition accuracy, indicating increased speed and accuracy for real-time applications.
Traditional license plate recognition systems struggle with accuracy and speed in changing conditions, resulting in a high false-positive rate. The study in [18] aims to address this. The authors combined YOLO, a rapid and accurate object detection model, and blockchain technology to improve the efficiency and security of license plate identification in intelligent transportation systems. The system demonstrated higher accuracy and a large reduction in false positives. The integration with blockchain enabled safe and tamper-proof data management. Specific findings included higher detection rates; however, exact percentages were not provided. The system requires more improvement to distinguish between license plates and comparable objects such as billboards and traffic signs, as its current performance in complicated situations is poor.
Existing methods, such as Faster R-CNN and SSD, have limitations in real-time license plate recognition due to slower speeds and lower accuracy, as identified by the authors in [19]. Their work proposed employing YOLOv8 to improve detection speed and accuracy, comparing it to Faster R-CNN and SSD. YOLOv8 outperformed both, with higher precision and recall rates. The accuracy for small datasets was 70%, whereas for large datasets it was 41.9%, with correspondingly significant mAP values. Optimization is required for detecting very small license plates and those in low-light settings, and comprehensive datasets are necessary to assess the algorithm's robustness in a variety of settings.
According to the authors of [20], misuse of allocated accessible parking spaces is a significant issue, and typical LPR systems struggle with real-time enforcement. The authors developed a Shine system that employs the YOLOv7 model to detect motor vehicles, license plates, and disability badges while authenticating permitted use through a central server. Shine obtained a mean average precision (mAP) of 92.16%, successfully recognizing and checking permitted vehicles in real-time. Their technology currently does not provide real-time availability information for accessible parking spaces. Future improvements could address this issue, increasing total parking management efficiency in metropolitan settings.
The problem highlighted in [21] is the real-time localization of Bhutanese license plates, which is difficult due to the variety of plate patterns. The proposed method suggested the use of a YOLO-based deep learning model for accurate detection. The dataset used includes real-world images of Bhutanese license plates. The results reveal an overall mean Average Precision (mAP) of 98.6% and a training loss of 0.0231, demonstrating a great detection accuracy. However, the study identified a deficiency in the variety of datasets, indicating the need for more diverse training data to improve resilience.
In [22], the issue of building an automatic license plate recognition (ALPR) system for unrestricted real-time environments was addressed, with an emphasis on handling missing or incorrect characters caused by noise. The proposed approach included a convolutional neural network (CNN) that was specially designed for bilingual text recognition and a deep learning-based system that used YOLOv5 for license plate detection. A novel method for recovering lost characters affected by noise was developed without requiring an additional deep learning model. To train and test their proposed model, they used a proprietary dataset of 2600 frames and 200 artificially manufactured plates from real-time traffic recordings in Saudi Arabia. The system outperformed commercial systems like OpenALPR, Plate Recognizer, and Sighthound, reporting 99.5% accuracy in character recognition and 97% accuracy in plate identification.
The challenge of detecting multinational and multilingual license plates in real-world circumstances was examined in [23]. A YOLOv2 detector with ResNet50 for feature extraction was used to detect license plates. A custom CNN was then used to determine the license plates’ country, language, and layout. The study used the LPDC 2020 dataset, which includes 9021 images from the United States, Europe, Turkey, the United Arab Emirates, and Saudi Arabia, as well as additional benchmark datasets. The technology demonstrated remarkable accuracy in detecting and classifying license plates from various regions. The system handled varied plate sizes and layouts with a minimum IOU of 0.625 and an average IOU of 0.85. The study emphasized the lack of complete international license plate datasets and the necessity for additional research into integrating and testing combined datasets to optimize worldwide ALPR systems.
The authors of [24] focused on improving license plate identification by directing attention mechanisms to relevant aspects in complex scenarios. Their study proposed an upgraded YOLOv5-CBAM (Convolutional Block Attention Module) network for detection and a Light-MutiCNN for identification, which uses attention techniques to improve feature representation. For training and testing, they used the CCPD dataset as well as other publicly available data sets. YOLOv5-CBAM reported a detection accuracy of 99.742%, whereas Light-MutiCNN with CBAM enhanced recognition accuracy by 2% over standard approaches while keeping the model size at 1.8 MB, making it acceptable for real-time applications. The study highlighted the need for additional development in identifying plates under varying situations and proposed expanding the technology to detect license plates on electric bicycles for broader applicability.
The development of a fully automated and generalizable ALPR system was proposed in [25]. The study uses YOLOv4 for vehicle and license plate detection and ResNet50 for vehicle categorization. The three steps involved were character recognition, license plate detection, and vehicle detection. The system was tested on five datasets from distinct geographies, each with varied lighting conditions and backgrounds. The approach produced competitive results, with a total of 18 FPS for single-vehicle frames. However, it performed less accurately under certain conditions and required more data for robustness. The study highlighted the need for better handling of real-world scenarios and improving character detection accuracy.
Real-time license plate identification using edge AI to overcome latency and bandwidth constraints was proposed in [26]. For improved performance, the authors utilized an embedded system with M-YOLOv4, which includes advanced convolutional layers, batch normalization, and particular activation functions such as Mish and Leaky ReLU. The datasets used were real-time images taken in various settings to ensure robustness and applicability. The proposed system demonstrated increased precision and recall, reporting a 97% recognition rate during the daytime and 95% at night, with an mAP of 96.1% and 93.7%, respectively, thereby minimizing false detections and improving real-time processing capabilities. The study emphasized the need for further improvement of the model to handle various environmental conditions and enhance detection rates.
The authors in [27] addressed the subject of improving efficiency and accuracy in smart parking systems using real-time license plate recognition. A single-stage deep learning approach based on the YOLO algorithm is proposed, with a focus on end-to-end detection and recognition. A custom dataset was created with modifications to cover many real-world settings, including difficult parking conditions. It achieved great precision and real-time processing capabilities, demonstrating its suitability for smart parking applications. Future research should focus on improving the model’s robustness and generalizability across different license plate types and environmental situations.
The issue of identifying irregular license plates in real-world circumstances was addressed in [28]. The authors proposed using a deep CNN for key point estimation and YOLO for character detection, both with perspective correction. The model was tested and evaluated using the CCPD and AOLP datasets. The results showed improved recognition accuracy and outstanding inference speed, making the approach well suited for real-time applications; however, high-end GPU requirements restricted practical deployment, and future research should focus on reducing hardware dependence. This review classifies the findings of the studied literature into key areas based on the methods and applications, as shown in Table 3.
Based on a review of the relevant literature, this study has identified a gap in the application of YOLO-based methods for license plate recognition, particularly in the context of boom gate access systems within enclosed or controlled environments. While previous studies have explored the use of various YOLO versions for recognizing license plates in a variety of scenarios, none have specifically focused on using these methods for automated entry systems where vehicles gain access via boom gates, such as in parking facilities, secured compounds, or gated communities. This presents a unique challenge, as boom gate systems require highly accurate and real-time recognition of license plates to ensure smooth and secure access control.
Furthermore, despite the advanced results reported by various researchers who have used different versions of YOLO in their studies, identifying the most effective version for achieving the highest level of accuracy and efficiency in this particular context remains a significant concern. Each YOLO version has its strengths, but the challenge lies in finding the right balance between speed and precision, especially in applications where real-time performance is critical. Accurate license plate detection and recognition are essential in such systems to minimize delays, avoid false positives or negatives, and ensure that only authorized vehicles have access.
Among the numerous object detection algorithms, the YOLO family of models has consistently demonstrated its superior performance in terms of both speed and accuracy. YOLO’s ability to detect objects in real-time, combined with its relatively low computational demands, makes it a highly suitable candidate for tasks that require rapid and reliable decision-making, such as boom gate access control. Its real-time capabilities mean that it can process video streams quickly, identify license plates with high accuracy, and allow for seamless vehicle entry without causing traffic congestion or delays.
As a result, this study conducts a series of experiments with four versions of the YOLO algorithm: YOLOv5, YOLOv7, YOLOv8, and YOLOv9, to evaluate their performance in the specific task of license plate recognition for automatic boom gate access. By comparing these versions, the study aims to determine which one offers the optimal combination of speed, accuracy, and efficiency for this application. The results of these experiments will provide valuable insights into best practices for implementing YOLO-based systems in real-world environments where automated vehicle access control is required.
Each of the four YOLO versions to be tested (v5, v7, v8, and v9) offers distinct advantages and has been fine-tuned to improve object detection capabilities over previous iterations. A brief overview of these versions, along with their key differences and performance characteristics, will be discussed in Section 3.

3. YOLO Network Architecture

The YOLO architecture is an approach to object detection that predicts spatially separated bounding boxes and class probabilities [29]. By reframing object detection as a regression problem, a single neural network predicts bounding boxes and class probabilities directly from entire images in a single evaluation. Because the whole pipeline can be optimized end-to-end directly on detection performance, it is less likely to predict false positives in the background. YOLO is preferred over other conventional object detection techniques due to its speed [29]. When trained on the PASCAL VOC dataset, the YOLO model processes images in real-time at 45 frames per second [30,31], with a mean average precision of 63.4%; however, its accuracy is lower than state-of-the-art methods such as Faster R-CNN (71% mAP) [32]. Figure 6 presents the YOLO architecture, as presented by the authors in [29].
The YOLO detection network consists of 24 convolutional layers, with the feature space reduced by alternating 1 × 1 layers. The convolutional layers were pre-trained at half resolution on the ImageNet classification task, with the resolution doubled for detection. The fundamental concept of YOLOv1 involved covering an image with an $s \times s$ grid of cells, with each cell predicting $B$ bounding boxes together with their dimensions and confidence scores. The confidence score indicates the presence or absence of an object in the bounding box and can be expressed as Equation (1):
$$\text{confidence score} = p(\text{object}) \cdot \mathrm{IoU}_{\text{pred}}^{\text{truth}}, \tag{1}$$

where $p(\text{object})$ indicates the likelihood of the object being present, with a range of 0–1, and $\mathrm{IoU}_{\text{pred}}^{\text{truth}}$ represents the intersection-over-union of the predicted bounding box with respect to the ground-truth bounding box. The components of each bounding box are $x$, $y$, $w$, $h$, and the confidence score, where $(x, y)$ are the bounding box's center coordinates and $w$ and $h$ are its width and height [32].
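As a minimal sketch of Equation (1) in code (the function names and the corner-coordinate box format are our own assumptions, not taken from the original implementation):

def iou(box_pred, box_true):
    # Boxes are (x1, y1, x2, y2) corner coordinates.
    x1 = max(box_pred[0], box_true[0])
    y1 = max(box_pred[1], box_true[1])
    x2 = min(box_pred[2], box_true[2])
    y2 = min(box_pred[3], box_true[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_p = (box_pred[2] - box_pred[0]) * (box_pred[3] - box_pred[1])
    area_t = (box_true[2] - box_true[0]) * (box_true[3] - box_true[1])
    return inter / (area_p + area_t - inter)

def confidence_score(p_object, box_pred, box_true):
    # Equation (1): confidence = p(object) * IoU(pred, truth), both in [0, 1].
    return p_object * iou(box_pred, box_true)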
There were a few obvious shortcomings that needed to be fixed, like the architecture’s increased localization error and relatively worse recall when compared to Faster R-CNN. Furthermore, the design had trouble identifying objects in close proximity because each grid cell could only have two bounding box proposals. The subsequent versions of YOLO were inspired by the shortcomings of the original concept. Different versions of YOLO have evolved, with one version providing an improvement over the other. These include, after YOLOv1 [10], YOLOv2 [33], YOLOv3 [34], YOLOv4 [35], YOLOv5 [36], YOLOv6 [37], YOLOv7 [38], and YOLOv8 [39].

4. Inception of YOLO Versions

Since its release in 2015 by Redmon and colleagues [10], the YOLO (You Only Look Once) framework has experienced several updates, each significantly enhancing its real-time object detection capabilities. The initial version was followed by YOLOv2 in 2016 [10], which improved bounding box prediction, and YOLOv3 in 2018, developed by Redmon and Farhadi [40], which added feature pyramid networks. The efficiency and accuracy of the model evolved through these versions, which also included important additions such as anchor boxes and the Darknet-19 architecture with fully convolutional predictions in v2, the Darknet-53 design with multi-scale predictions in v3, and more.
In 2020, Joseph Redmon withdrew from computer vision research, citing ethical concerns regarding the military use of the technology. This departure paved the way for new groups to advance the YOLO framework, making the research more widely available. Alexey Bochkovskiy and his team published the YOLOv4 paper in 2020 [35], which refined the network hyperparameters and introduced an IOU-based loss function. Ultralytics then emerged with YOLOv5, improving the anchor finding methodology and building on their previous adaptations of YOLOv3 [41]. YOLOR (You Only Learn One Representation) [42], which focuses on multi-task learning for classification, detection, and pose estimation, was introduced in 2021 by the same team that produced YOLOv4. Additionally, Megvii Technology released YOLOX (Exceeding YOLO Series) later that year, featuring a return to an anchor-free design [38]. Figure 7 shows the different features of different YOLO versions.
The features shown in the bar chart are anchor boxes, bounding box prediction, feature pyramid network, SPP (Spatial Pyramid Pooling), PANet (Path Aggregation Network), activation function, data augmentation, and optimizer. Each bar represents the presence (1) or absence (0) of a feature, except for bounding box prediction, which is represented by the number of bounding boxes per scale.
Speed is another aspect in which each of these versions improved. The graph in Figure 8 illustrates the relationship between release year and speed (in frames per second) for the various YOLO versions: YOLOv1 (2016), 45 fps; YOLOv2 (2017), 67 fps; YOLOv3 (2018), 80 fps; YOLOv4 (2020), 65 fps; YOLOv5 (2020), 140 fps; YOLOv6 (2022), 120 fps; YOLOv7 (2022), 150 fps; YOLOv8 (2023), 160 fps; and YOLOv9 (2023), 180 fps. There is a general trend of increasing speed over the years as the YOLO versions progress, reflecting improvements in efficiency and performance.
Notable increases in speed can be seen between YOLOv3 and YOLOv4, and between YOLOv4 and YOLOv5, suggesting major optimizations and enhancements in these versions. The most recent versions, YOLOv8 and YOLOv9, show the highest speeds, indicating a continued focus on improving processing speed along with accuracy. This graph gives a visual representation of the advancements in speed of the YOLO object detection models over the years.
The pie chart in Figure 9 represents the percentage distribution of each of the described characteristics in each of the YOLO versions.
This study considers four versions of YOLO, namely YOLOv5, YOLOv7, YOLOv8, and YOLOv9, for vehicle license plate detection (specifically, the v5 large, v7, v8 large, and v9 compact variants), chosen for their improvements in accuracy, speed, and feature enhancements. YOLOv5 is ideal for real-time object detection tasks, with its smaller model size and efficient inference times. YOLOv7 offers optimizations in speed and accuracy, improving performance in complex environments. YOLOv8 improves feature extraction, anchor-free design, and generalization, allowing for more accurate detection under diverse lighting conditions. YOLOv9 builds on these strengths, offering cutting-edge performance. The architectures of these versions are described in Section 4.1, Section 4.2, Section 4.3, and Section 4.4, respectively.

4.1. YOLOv5 Architecture

YOLOv5 was released by Ultralytics and was not an official update from the original creators of YOLO. It is a widely used object detection algorithm that comes in four main variants with progressively higher accuracy rates: small (s), medium (m), large (l), and extra large (x); each variant also has a different training time. YOLOv5's architecture is made up of three main components: the backbone, which extracts features using CSPDarknet; the neck, which fuses features utilizing PANet; and the head (the YOLO layer), which produces the detection results, providing class labels, confidence scores, and object locations and sizes [43]. This structured architecture enables YOLOv5 to perform object detection tasks with high levels of accuracy and efficiency [44]. Figure 10 shows the architecture of YOLOv5.

4.2. YOLOv7 Architecture

The YOLOv7 network advances the state-of-the-art in object detection to new heights by inferring more quickly and accurately than its predecessor, the YOLOv5 network [45]. The primary components of YOLOv7's structure are the input, the backbone feature extraction network, the improved feature extraction network, and the prediction part. The input image is initially resized to 640 × 640 by YOLOv7, after which it is fed into the backbone network. Three feature maps of varying sizes are then produced through the head network, and the prediction results are finally generated via RepConv [46]. Figure 11 shows the YOLOv7 architecture.

4.3. YOLOv8 Architecture

YOLOv8 is available in five versions categorized by parameter size: nano (n), small (s), medium (m), large (l), and extra-large (x). These variants can be applied to tasks such as classification, object detection, and segmentation [47]. YOLOv8 introduces several significant enhancements, including mosaic data augmentation, anchor-free detection, a C2f module, a decoupled head, and a modified loss function. As in YOLOv4, mosaic data augmentation is employed to combine four images, enriching contextual information for the model; however, YOLOv8 adjusts this strategy by discontinuing augmentation in the last ten training epochs to improve performance. By directly estimating an object's center and minimizing dependency on predefined anchor boxes, anchor-free detection improves generalization and speeds up Non-Max Suppression (NMS). Gradient flow is improved and computational cost decreased by the C2f module, which substitutes the C3 module and concatenates the outputs of all bottleneck modules. The decoupled head separates classification and regression tasks, although this may lead to loss misalignment, which is addressed by introducing a task alignment score. This score chooses the top k positive samples, combining binary cross-entropy (BCE) for the classification loss with Complete IoU (CIoU) and Distributional Focal Loss (DFL) for the regression loss, optimizing bounding box borders to reduce false negatives. The head, neck, and backbone make up the three main parts of the convolutional neural network used by YOLOv8, as seen in Figure 12. YOLOv8 is based on a modified version of the CSPDarknet53 architecture [48].
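To make the regression term concrete, the following is a minimal sketch of the CIoU computation described above, a generic rendering of the published CIoU formula rather than YOLOv8's actual source code (all names are ours):

import math

def ciou(box_p, box_t):
    # Boxes are (x1, y1, x2, y2). CIoU = IoU - rho^2/c^2 - alpha*v.
    xi1, yi1 = max(box_p[0], box_t[0]), max(box_p[1], box_t[1])
    xi2, yi2 = min(box_p[2], box_t[2]), min(box_p[3], box_t[3])
    inter = max(0.0, xi2 - xi1) * max(0.0, yi2 - yi1)
    area_p = (box_p[2] - box_p[0]) * (box_p[3] - box_p[1])
    area_t = (box_t[2] - box_t[0]) * (box_t[3] - box_t[1])
    iou = inter / (area_p + area_t - inter + 1e-9)
    # Squared center distance over squared diagonal of the enclosing box.
    cpx, cpy = (box_p[0] + box_p[2]) / 2, (box_p[1] + box_p[3]) / 2
    ctx, cty = (box_t[0] + box_t[2]) / 2, (box_t[1] + box_t[3]) / 2
    rho2 = (cpx - ctx) ** 2 + (cpy - cty) ** 2
    cw = max(box_p[2], box_t[2]) - min(box_p[0], box_t[0])
    ch = max(box_p[3], box_t[3]) - min(box_p[1], box_t[1])
    c2 = cw ** 2 + ch ** 2 + 1e-9
    # Aspect-ratio consistency term.
    wp, hp = box_p[2] - box_p[0], box_p[3] - box_p[1]
    wt, ht = box_t[2] - box_t[0], box_t[3] - box_t[1]
    v = (4 / math.pi ** 2) * (math.atan(wt / ht) - math.atan(wp / hp)) ** 2
    alpha = v / (1 - iou + v + 1e-9)
    return iou - rho2 / c2 - alpha * v  # the loss is typically 1 - CIoU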

4.4. YOLOv9 Architecture

YOLOv9, introduced in [50], is an improvement on YOLOv7; both were produced by the same authors. YOLOv7 placed a heavy emphasis on architectural enhancements to the training procedure, referred to as the trainable bag-of-freebies, which lower training costs while raising object detection accuracy. Nevertheless, it failed to tackle the loss of information from the source data caused by the many downscaling steps of the feedforward process, commonly referred to as the information bottleneck [50]. Consequently, YOLOv9 presented two novel methods that push the boundaries of object detection efficiency and accuracy while addressing the information bottleneck problem: the Generalized Efficient Layer Aggregation Network (GELAN) and Programmable Gradient Information (PGI).
Four key components make up this method: reversible functions, Programmable Gradient Information (PGI), the information bottleneck idea, and GELAN, or a generalized efficient layer aggregation network.
Lost information and data changes in a neural network are explained by the concept of an information bottleneck. The information bottleneck equation explains the loss of shared information between the transformed and real data as they go through the deep network stages. The following is the information bottleneck equation:
$$I(J, J) \ge I(J, f_\theta(J)) \ge I(J, g_\phi(f_\theta(J)))$$

Here, $I$ denotes mutual information, and $f_\theta$ and $g_\phi$ represent transformation functions with parameters $\theta$ and $\phi$, respectively. As the data $J$ pass through the layers ($f_\theta$ and $g_\phi$) of a deep neural network, significant information that is necessary for accurate predictions is lost. Inconsistent gradients and insufficient model convergence can arise from this loss. One option is to increase the model's size in order to enhance data transformation performance and retain more information; however, this method does not address the issue of unreliable gradients in very deep networks [51].
In view of the information bottleneck, a reversible function is a logical solution. Reversible functions in neural networks ensure that no information is lost during data transformation. The network outputs can effectively reconstitute the input data due to these functions that enable the changes to be reversed [51]. Here is the equation for the reversible function:
$$J = v_\zeta(r_\psi(J))$$

The forward and inverse transformations are represented by $r$ and $v$, with parameters $\psi$ and $\zeta$, respectively [51]. The bottleneck problem can therefore be addressed via the following reversible-function equation:

$$I(J, J) = I(J, r_\psi(J)) = I(J, v_\zeta(r_\psi(J)))$$
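As a toy illustration of reversibility (a generic additive-coupling block, not YOLOv9's actual PGI code; all names are ours), the inverse transform recovers the input exactly, so no mutual information is lost:

import torch

def forward(j, f):
    j1, j2 = j.chunk(2, dim=-1)        # split channels into two halves
    y2 = j2 + f(j1)                    # only one half is transformed
    return torch.cat([j1, y2], dim=-1)

def inverse(y, f):
    y1, y2 = y.chunk(2, dim=-1)
    j2 = y2 - f(y1)                    # subtract the same transform to invert
    return torch.cat([y1, j2], dim=-1)

f = torch.nn.Linear(4, 4)
j = torch.randn(2, 8)
assert torch.allclose(inverse(forward(j, f), f), j, atol=1e-6)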
Using the transformation function r and its inverse v, the mutual information I moves through all the levels while maintaining its initial input J. Under-parameterization in lightweight models restricts their ability to process substantial volumes of unprocessed data, leading to significant information loss and affecting the performance [51]. To address this, a new training method called Programmable Gradient Information (PGI) was developed in [50]. PGI includes a main branch for inference, accurate gradient computation, an auxiliary reversible branch, and multilevel auxiliary information, effectively addressing deep supervision concerns without increasing inference costs. Figure 13 shows the PGI architecture.
The GELAN architecture is designed to interact with the PGI framework, enhancing the ability of the model to interpret and extract information from data. While PGI addresses the challenge of preserving significant data throughout deep neural networks, GELAN builds on this foundation by offering a flexible and efficient framework that can manage a variety of processing blocks [51]. The GELAN architecture is shown in Figure 14.
The YOLOv9 model combines the PGI and GELAN architectures.

4.5. Table of Comparison

Subsequently, this study further compares the four YOLO versions discussed in light of key features such as year, speed, robustness, architectural components, key elements of the version, variants, and special features of each version. Table 4 shows this.

5. Experiments: YOLOv5, YOLOv7, YOLOv8, and YOLOv9

In light of the differences in features and architecture among the four versions presented in this study, comparative experiments were performed on each of them, namely YOLOv5 (large), YOLOv7, YOLOv8 (large), and YOLOv9 (compact). These variants were selected because they are more memory-efficient than the other variants of their respective versions, requiring fewer resources to achieve similar results. This study carried out training and evaluation experiments on the four versions for plate number recognition applicable to boom gate access and compared the results to determine the most efficient version for the boom gate access problem.

5.1. Proposed Model

In this section, the training and validation experiments conducted using the proposed model are discussed. The experiments were carried out on a PC with an Intel Core i5 processor and an NVIDIA RTX 8070 GPU (manufactured by TSMC, Hsinchu Science Park, Hsinchu, Taiwan); the PC was sourced in Pretoria, South Africa. Python 3.12.8 was used for the implementation. The parameters used include a batch size of 4 and 100 epochs for each of the experiments, with the default hyperparameters (hyp.scratch-low.yaml). The proposed model is illustrated in Figure 15.
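A minimal sketch of one training run with these settings is given below; the dataset YAML and weight file names are illustrative, with YOLOv8 shown via the Ultralytics API (the YOLOv5, v7, and v9 runs used the equivalent arguments in their respective repositories' training scripts):

from ultralytics import YOLO

# Train the large YOLOv8 variant with the stated settings:
# 100 epochs, batch size 4, 640 x 640 input.
# "license_plate.yaml" is a hypothetical dataset configuration file
# pointing at the training and validation image folders.
model = YOLO("yolov8l.pt")
model.train(data="license_plate.yaml", epochs=100, batch=4, imgsz=640)
metrics = model.val()  # precision, recall, and mAP on the validation split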

5.2. Data for Training and Validation

To train and validate the proposed models, an open-source license plate dataset provided by the author in [52] was utilized. The selected dataset is a good choice for training and evaluating license plate recognition and classification models due to its diversity, high-quality annotations, and real-world complexity, while its smaller volume makes it well suited for quick model iteration and experimentation. The dataset consists of 433 images of 400 × 279 pixels with bounding box annotations of car license plates within images of cars. The annotations follow the PASCAL VOC format and were converted to YOLO format for training using the Roboflow API [53]. This dataset was used for all four versions analyzed in this study. Sample images from the dataset are shown in Figure 16.
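Although the conversion was performed with the Roboflow API, the following sketch shows the equivalent transformation by hand (file paths and the single-class ID are assumptions): YOLO labels store "class x_center y_center width height", all normalized to [0, 1], whereas PASCAL VOC stores absolute corner coordinates in XML.

import xml.etree.ElementTree as ET

def voc_to_yolo(xml_path, class_id=0):
    # Read the image size and each object's corner box from the VOC XML.
    root = ET.parse(xml_path).getroot()
    w = float(root.find("size/width").text)
    h = float(root.find("size/height").text)
    lines = []
    for obj in root.findall("object"):
        b = obj.find("bndbox")
        xmin, ymin = float(b.find("xmin").text), float(b.find("ymin").text)
        xmax, ymax = float(b.find("xmax").text), float(b.find("ymax").text)
        # Convert corners to normalized center/size, as YOLO expects.
        xc = (xmin + xmax) / 2 / w
        yc = (ymin + ymax) / 2 / h
        bw = (xmax - xmin) / w
        bh = (ymax - ymin) / h
        lines.append(f"{class_id} {xc:.6f} {yc:.6f} {bw:.6f} {bh:.6f}")
    return "\n".join(lines)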

5.3. Data Preprocessing

Each image, with an original size of 400 × 279 pixels, was scaled during the pre-processing stage to comply with the YOLO specification, which calls for an input size of 640 × 640 pixels; this scaling step keeps the input size to the neural network uniform [54]. Furthermore, augmentation techniques were applied to increase the diversity of the dataset and improve the model's robustness [55]. To generate different orientation variations, the images were rotated 90 degrees clockwise and counter-clockwise and flipped upside down. With 699 images generated overall via these augmentation techniques, the dataset was able to encompass a greater variety of viewpoints and orientations, helping the model recognize license plates in a wider range of situations and generalize better. The data were then split into training and validation sets in a 70%:30% ratio.
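A plain-PIL sketch of these orientation augmentations is shown below (the function name is ours; note that the corresponding bounding-box labels must be transformed in the same way, which the Roboflow pipeline handles automatically):

from PIL import Image

def orientation_variants(img: Image.Image) -> dict:
    # Generate the three orientation variants described above.
    return {
        "rot90_ccw": img.rotate(90, expand=True),   # 90 degrees counter-clockwise
        "rot90_cw":  img.rotate(-90, expand=True),  # 90 degrees clockwise
        "flip_ud":   img.rotate(180),               # upside down
    }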

5.4. YOLOv5 Experiment: Training and Validation

In this experiment, the input image was resized to 640 × 640 pixels. Image features were extracted using CNNs, the core technology employed by YOLOv5. Before transmitting these features to the head, the model's neck mixes them (feature fusion); following the fusion, the model head analyzes the gathered features and makes the predictions.

5.5. YOLOv7 Experiment: Training and Validation

The architecture of YOLOv7 consists of the input layer, the backbone, the feature pyramid network, and the detection head. To extract features, the backbone receives the 640 × 640 image of the vehicle and its license plate. The CBS module comprises convolution layers with various kernel sizes (k) and strides (s): the convolution with k = 1 and s = 1 is mainly utilized for adjusting the number of channels; the convolution with k = 3 and s = 1 is primarily used for feature extraction; and the convolution with k = 3 and s = 2 is mostly used for down-sampling. Following extraction, three feature maps (80 × 80 × 512, 40 × 40 × 1024, and 20 × 20 × 1024) are produced. The feature pyramid network then fuses the three extracted feature layers, incorporating feature data from various scales.
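A generic sketch of such a Conv-BatchNorm-SiLU ("CBS") block is given below (our own minimal PyTorch rendering, not the YOLOv7 source):

import torch.nn as nn

class CBS(nn.Module):
    # Convolution -> BatchNorm -> SiLU, parameterized by kernel size k and stride s.
    def __init__(self, c_in, c_out, k=1, s=1):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out, k, s, padding=k // 2, bias=False)
        self.bn = nn.BatchNorm2d(c_out)
        self.act = nn.SiLU()

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

channel_adjust = CBS(256, 128, k=1, s=1)  # k=1, s=1: change the channel count
extract = CBS(128, 128, k=3, s=1)         # k=3, s=1: feature extraction
downsample = CBS(128, 256, k=3, s=2)      # k=3, s=2: halve the spatial resolution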
By using max pooling, the SPPCSPC module produces distinct receptive fields that are more effective at differentiating between large and small objects. In addition, the network makes use of the efficient convergent ELAN module to manage the shortest and longest gradient paths, which enhances the network's capacity to learn new features and increases its robustness. Finally, the feature layer is passed to the model's reparameterization module. Model reparameterization technology separates the module into two modes: training and detection. During training, the module is split into several branch structures, which increases the number of features the network gathers and improves accuracy; during detection, the multi-branch structures are merged into a fully equivalent module, increasing detection speed.

5.6. YOLOv8 Experiment: Training and Validation

A key part of the YOLOv8 architecture designed to extract features from input images is the backbone network [56,57]. The backbone employs deep convolutional neural network (CNN) layers to capture hierarchical image representations. In the YOLOv8 model, the input image, resized to 640 × 640, is processed through the backbone network to extract important features: CSPDarknet53, a convolutional neural network based on the DarkNet-53 architecture, acts as the backbone for plate number detection. It uses the CSPNet method, which divides the feature map of the base layer into two components and then combines them at different stages; this split-and-merge approach optimizes gradient flow throughout the network [58]. The detection module typically consists of convolutional layers followed by a YOLO layer that produces the object detection outputs, such as confidence scores, bounding box coordinates, and class probabilities. During training, the model computes a loss function based on the disparities between predicted and actual values; the loss function directs the optimization process that updates the model parameters and improves performance.
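The split-and-merge idea can be sketched as follows (a simplified, C2f-flavored illustration; module names and sizes are ours, not the Ultralytics implementation):

import torch
import torch.nn as nn

class SplitMergeBlock(nn.Module):
    # Split the feature map into two halves, pass one half through a chain of
    # bottleneck blocks, and concatenate every intermediate output before fusing.
    def __init__(self, channels, n_bottlenecks=2):
        super().__init__()
        half = channels // 2
        self.blocks = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(half, half, 3, padding=1, bias=False),
                nn.BatchNorm2d(half),
                nn.SiLU(),
            )
            for _ in range(n_bottlenecks)
        )
        self.fuse = nn.Conv2d(half * (n_bottlenecks + 2), channels, 1)

    def forward(self, x):
        a, b = x.chunk(2, dim=1)   # split along the channel dimension
        outs = [a, b]
        for block in self.blocks:
            b = block(b)
            outs.append(b)         # keep every intermediate output
        return self.fuse(torch.cat(outs, dim=1))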

5.7. YOLOv9 Experiment: Training and Validation

The PGI module in the YOLOv9 model draws features from input data. Traditional deep learning methods extract features from input data, which reduces the detail and complexity of the original dataset. PGI thus introduces a paradigmatic shift by integrating network-based processes that retain and use all aspects of input information via programs. This is accomplished through its programmable gradient pathways, which allow for dynamic changes based on the precise needs of the current job. This allows the network to access more information when computing gradients for backpropagation, resulting in a more accurate and informed update of the model’s weight [59]. The extracted feature sets are then sent to the Generalized Efficient Layer Aggregation Network (GELAN) module, where detection training and validation takes place.

6. Results and Discussions

The results reported during training and validation of the YOLOv5, YOLOv7, YOLOv8, and YOLOv9 models are presented in this section. Accuracy, precision, recall, F1 score, mAP, and confusion matrices were used to evaluate the proposed models.
A confusion matrix is a table that displays and summarizes the performance of a classification algorithm [60]. Accuracy measures how frequently a machine learning model correctly predicts an outcome [61]; it is determined by dividing the number of correct predictions by the total number of predictions. Stated differently, accuracy answers the question, "How often is the model correct?" Precision is a metric of a machine learning model's performance that refers to the quality of the model's positive predictions [62]. Recall, on the other hand, indicates how frequently the model correctly detects positive instances (true positives) among all of the actual positive samples in a dataset [62]. The F1 score is the harmonic mean of precision and recall. The mean average precision (mAP) is a metric used to quantify the performance of object detection models [63]; the results here are based on mAP across various IoU thresholds [64]. The confidence score, expressed as a percentage, represents the probability that the algorithm detects the image accurately.
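In code form, these definitions reduce to a few lines (a generic sketch over the four cells of a binary confusion matrix; names are ours):

def metrics(tp, fp, fn, tn):
    accuracy = (tp + tn) / (tp + fp + fn + tn)          # correct / total predictions
    precision = tp / (tp + fp)                          # quality of positive predictions
    recall = tp / (tp + fn)                             # coverage of actual positives
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return accuracy, precision, recall, f1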

6.1. Results: YOLOv5 Model

For the experiment of plate number recognition, the model that was trained and validated using the YOLOv5 algorithm reported an accuracy of 81% on validation data. The performance of the model while classifying license and background classes is presented using the confusion matrix. The result indicated that the model correctly predicted the license class, showing an accuracy value of 81%. License plates were incorrectly classified as background 19% of the time, as shown in Figure 17.
The curve in Figure 18 reveals that the F1 score increases as the confidence threshold increases from 0.0, eventually settling around an F1 value of 0.84 when the confidence is about 0.2. The curve remains very consistent and flat as the confidence level approaches 0.8. This indicates that the model maintains a constant balance of precision and recall within this confidence interval. After a confidence level of about 0.8, the F1 score declines considerably, indicating that raising the threshold over this point results in a loss of precision, recall, or both.
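The operating threshold can be read off such a curve by maximizing F1, as the following sketch shows (synthetic precision/recall curves stand in for the model's real ones):

import numpy as np

conf = np.linspace(0.0, 1.0, 101)                 # candidate thresholds
precision = np.clip(0.6 + 0.45 * conf, 0, 1)      # placeholder: rises with threshold
recall = np.clip(1.0 - 0.8 * conf ** 2, 0, 1)     # placeholder: falls with threshold
f1 = 2 * precision * recall / (precision + recall + 1e-9)
best = conf[np.argmax(f1)]                        # threshold with maximum F1
print(f"best threshold {best:.3f}, F1 {f1.max():.3f}")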
The precision–confidence curve in Figure 19 shows that at a confidence threshold of about 0.851, the precision reaches 1.00. This indicates that at this threshold, all the detections the model made were correct.
In Figure 20, the precision–recall curve begins with very high precision at lower recall levels, indicating that when the model does find "licence" objects, it is very precise. As recall increases (i.e., as the model attempts to capture more of the actual "licence" instances), precision gradually decreases, and as recall approaches 1.0, precision drops sharply, meaning that the model generates a greater number of false positives in order to recognize nearly all true "licence" instances. The label "licence 0.883" denotes that the mAP for this class at an IoU threshold of 0.5 is 0.883, a strong performance demonstrating a good balance of precision and recall across multiple thresholds.
As indicated in Figure 21, the model appears to detect the majority of true positives but probably a large number of false positives when the curve begins with a high recall at low confidence levels. Recall dramatically decreases with increasing confidence, especially beyond the 0.8 threshold, suggesting that the model has fewer false positives but misses more real positives. The annotation “all classes 0.94 at 0.000” shows that, with the confidence level at its lowest, the overall recall rate is 94%. With the use of these data, the confidence threshold can be adjusted to strike a compromise between obtaining every potential accurate detection and preserving detection quality by preventing an excessive number of false alarms.
Figure 22 summarizes the results. The graph shows significant reductions in box loss, object loss, and classification error during training, indicating effective learning without overfitting. Validation losses diminish and plateau, indicating effective generalization to new data. These losses are steady and close to training losses, indicating that there is no overfitting. The stability of the model in both the training and validation phases indicates its robustness and ability to generalize well to new data. The model performance is stable and consistent throughout the training and validation phases.

6.2. Results: YOLOv7 Model

The results of training and validating the plate number recognition model using the YOLOv7 algorithm are reported here. The confusion matrix shows the performance of the model in classifying the two classes, license and background. The result indicated that the model correctly predicted the license class with an accuracy of 82%. License plates were incorrectly classified as background 19% of the time, as shown in Figure 23.
The F1 curve indicating the progress of training is shown in Figure 24. The F1 score of the model starts low at a confidence threshold of 0.0, indicating poor performance when the model is indiscriminate. As the confidence threshold approaches 0.2, the F1 score rises, indicating improved precision and recall, and the model then maintains a plateau up to a threshold of about 0.8, showing consistent performance. However, as the confidence threshold exceeds 0.8, the F1 score drops sharply, indicating high precision but low recall. The ideal balance between precision and recall, which maximizes the F1 score, is found at a confidence threshold of about 0.517, with an F1 score of approximately 0.81, ensuring high accuracy while minimizing false positives and negatives.
The precision–confidence curve, presented in Figure 25, plots the model's precision against the confidence threshold and reaches a precision of 100%. Starting from a confidence threshold of 0, the precision begins relatively low. As the confidence threshold increases, there is a sharp rise in precision, which quickly stabilizes at a high level. This high level of precision is maintained as the confidence threshold continues to increase up to about 0.8, indicating that the model performs consistently well across a broad range of thresholds. Beyond a threshold of approximately 0.8, the precision remains high but shows some variability, peaking at certain points and reaching a precision of 1.00 at a confidence of 0.916. This indicates perfect precision at that threshold, suggesting that the model is highly accurate, though possibly at the expense of missing some true positives (a trade-off between precision and recall).
The recall is reported in Figure 26. The model's recall remains stable even with slightly increased confidence, indicating a high detection rate without many false positives. However, as the confidence threshold rises, particularly past 0.6, the recall declines sharply, indicating more conservative predictions. The model achieves a 93% recall when considering all classes at the lowest confidence threshold (0.000), indicating that it captures nearly all the positives it can detect. Tuning the confidence threshold is crucial to balance recall and precision and to determine the optimal confidence setting for specific application needs.
A precision–recall graph is also presented in Figure 27. The curve plots precision (Y-axis) against recall (X-axis), both ranging from 0 to 1.0. It starts with high precision at low recall levels, which is typical: when the model is highly selective, it returns far more correct positive predictions than incorrect ones. The curve shows a relatively stable plateau of high precision while recall is between 0.0 and about 0.6, meaning that up to this point the model maintains high accuracy without compromising the rate of true positive detection. As recall increases beyond roughly 0.6, precision begins to decline steeply: as the model attempts to capture more true positives (raising recall), it produces more false positives. The curve is annotated "license 0.804" and "all classes 0.804 mAP@0.5", indicating that the model achieved an mAP of 0.804 at an IoU threshold of 0.5 for license plate recognition.
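For readers reproducing such curves, the sketch below shows one common way an average precision such as the reported mAP@0.5 is computed: the area under the precision–recall curve after enforcing a monotonically non-increasing precision envelope. The arrays are illustrative placeholders, not the study's measured values:

```python
import numpy as np

recall = np.array([0.0, 0.2, 0.4, 0.6, 0.8, 1.0])         # placeholder PR curve
precision = np.array([1.0, 0.98, 0.95, 0.90, 0.60, 0.10])

# Enforce a non-increasing precision envelope from right to left.
envelope = np.maximum.accumulate(precision[::-1])[::-1]
ap = np.trapz(envelope, recall)                            # area under the smoothed curve
print(f"AP@0.5 ≈ {ap:.3f}")
```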
A detailed result of the training and validation of the YOLOv7 model is shown in Figure 28, indicating consistent improvement in detecting and classifying the given dataset, with high precision and improving recall. As the number of iterations rises, the box loss exhibits a steady reduction, and the object loss shows a decreasing trend, suggesting growing confidence in the model's detections. The classification loss is stable and effective, hovering close to zero with minimal fluctuation. The precision metric, evaluated at IoU = 0.5, remains consistently high, with a large proportion of positive identifications being correct. The recall starts low but increases significantly as iterations progress, particularly visible in the mAP@0.5 graph, showing how well the model comes to identify all true positives.

6.3. Results: YOLOv8 Model

The outcome of YOLOv8's training and validation for license plate recognition is presented and discussed in this section. The confusion matrix in Figure 29 shows the model's performance on the two classes, license and background: the model correctly predicted the license class with an accuracy of 83%, while license plates were incorrectly predicted as background 17% of the time.
Figure 30 depicts the F1 score during the training/validation process as a function of the YOLOv8 model's confidence threshold. The F1 score remains relatively high and stable across thresholds from approximately 0.0 to 0.8, reaching its maximum of about 0.85 for all classes combined at a confidence threshold of 0.510. Beyond this threshold, the F1 score decreases sharply, indicating that higher confidence thresholds produce more false negatives, likely because the model becomes overly cautious and discards valid detections. This graph helps determine an optimal confidence threshold for deploying the model in practice, ensuring the best balance between precision and recall.
The YOLOv8 model's precision–confidence curve in Figure 31 shows a rapid rise in precision from a low confidence threshold. The model maintains high precision across varying confidence thresholds, with a high plateau around 0.8 and above. The model's maximum precision, 1.00, is achieved at a confidence threshold of 0.907. This point is important for practical applications, as it marks the trade-off between missing true positives and falsely recognizing non-plates as plates.
The recall–confidence curve in Figure 32 shows high recall at low confidence thresholds: recall for all classes combined is 0.90 at a confidence of 0.000, approaching 1.0 when confidence is nearly zero. This indicates that the model can detect most positive instances when little confidence is demanded of its predictions. As the confidence threshold rises, recall declines, since fewer objects are detected with the required certainty, and recall drops sharply as confidence nears 1.0, where very few predictions are made with near-perfect confidence and many positives are missed. This curve helps identify an optimal threshold where recall remains high while the predictions retain a reasonable level of confidence.
In Figure 33, the precision–recall curve is shown, shedding light on the relationship between recall (the model's capacity to locate all relevant instances in the dataset) and precision (the accuracy of its positive predictions). The precision is almost constant at the highest level until it drops sharply as recall approaches 1.0: as the model tries to include nearly every relevant license plate detection, it starts to make more false positive errors, which is typical of detection models pushed to maximize recall. The model reported an mAP of 0.855 at an IoU threshold of 0.5, a good indicator that it detects license plates reliably, with close agreement between the predicted bounding boxes and the ground truth.
A comprehensive result is presented in Figure 34. The plots show the evolution of the loss components and performance metrics across epochs, providing insight into the learning process and the effectiveness of the model. The YOLOv8 model shows consistent decreases in box loss, class loss, and DFL (Distribution Focal Loss), indicating improved accuracy in locating license plates. The validation losses show greater variability and somewhat higher values than the training losses, reflecting the model adjusting to unseen patterns over time. Precision and recall remain stable across epochs, with recall improving its coverage of the relevant license plates in the dataset. The model's mAP scores are consistently high at IoU 0.5 and show an increasing trend over the stricter 0.5–0.95 range. This suggests robust training and potential for deployment in environments requiring accurate license plate recognition.
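For reproducibility, the following is a minimal sketch, assuming the Ultralytics Python package, of the kind of training run summarized in Figure 34; the dataset YAML name and hyperparameters are illustrative assumptions, not the exact configuration used in this study:

```python
from ultralytics import YOLO

model = YOLO("yolov8n.pt")              # pretrained nano weights as a starting point
model.train(
    data="license_plates.yaml",         # hypothetical dataset config (train/val paths, class names)
    epochs=100,
    imgsz=640,
)
metrics = model.val()                   # produces precision/recall/mAP curves like those plotted above
print(metrics.box.map50)                # mAP at IoU threshold 0.5
```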

6.4. Results: YOLOv9 Model

The YOLOv9 experiment reported an accuracy of 73%. The confusion matrix in Figure 35 displays the model's performance on the same two classes, license and background, as in the other experiments: the model correctly predicted the license class 73% of the time and incorrectly classified license plates as background the remaining 27% of the time.
The F1 score during the training and validation process is presented in Figure 36; here, the F1 score serves as the primary measure of the model's accuracy across a range of confidence thresholds. It starts low and ascends rapidly as the confidence threshold increases, peaking at 0.70 around a threshold of 0.41. This peak marks the optimal balance between precision and recall, allowing the model to recognize license plates without significant trade-offs, and it represents the most effective operating point for the YOLOv9 model in practical scenarios.
Figure 37 shows the precision–confidence curve from training the YOLOv9 model. Precision increases with the confidence level, with a steep rise between 0.2 and 0.6 and a smooth progression thereafter, reaching a peak of 1.0 around the 0.8 (80%) confidence level. The model achieves perfect precision slightly above 0.8, indicating an optimal confidence threshold for practical applications. Following precision, the model was also evaluated using recall, as shown in Figure 38.
In Figure 38, the model has a high initial recall near 1.0, demonstrating its capacity to recognize most positives at low confidence. As the confidence threshold increases, recall decreases gradually, then declines sharply at high confidence levels, where very few predictions are made with near-perfect confidence. The graph helps find a suitable balance between recall and confidence, informing whether to maximize recall or to raise the confidence threshold when false positives are too frequent. Overall, the training results demonstrate the model's capability to recognize license-related features, but the confidence threshold must be configured carefully to meet specific operational requirements.
As shown by the precision–recall curve in Figure 39, the model initially maintains high precision over a limited recall range. As recall increases, precision decreases, reflecting the accumulation of false positives. The overall performance of the model is summarized by an mAP of 0.705 at an IoU threshold of 0.5, indicating a reasonable balance between precision and recall.
The overall performance of the model is presented through the graphs in Figure 40, covering box loss, classification loss, and distribution focal loss (DFL). The training box loss decreases steadily, indicating improved localization of the license plates over time, and the classification loss also trends downward, indicating improved classification of objects. The training DFL loss decreases as well, indicating better bounding-box regression. The model's precision and recall show an increasing proportion of correct identifications, and the mean average precision across Intersection over Union (IoU) thresholds rises accordingly. Overall, the model steadily improves its capability and its generalization to new, unseen data, which is essential for real-world use.
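Because the mAP figures above are all conditioned on an IoU threshold, a self-contained sketch of the Intersection over Union computation may be useful; the example boxes are arbitrary:

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)       # overlap area
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter + 1e-16)

# Two arbitrary plate-sized boxes with heavy overlap: IoU ≈ 0.77.
print(iou((100, 100, 300, 160), (110, 105, 310, 165)))
```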

7. Comparative Analysis of Results

The results reported for the four object detection models, namely YOLOv5, v7, v8, and v9, are discussed here. Table 5 presents the F1 score, recall, precision, accuracy, and mAP at a threshold of 0.5 for further analysis.
From Table 5, it is clear that YOLOv8 recorded the best F1 score and highest detection accuracy, while YOLOv9 reported the lowest accuracy of detection and F1 score.
The 100% precision across all models indicates that when an object is detected, it is correctly identified with no false positives, which is crucial for applications where false alarms must be minimized. The variance in F1 scores and accuracy among the models suggests different strengths and trade-offs in detection capability and reliability. The higher F1 scores of YOLOv5 and YOLOv8 indicate a better balance between detecting all relevant objects and minimizing false negatives and false positives. YOLOv5 is a fast and efficient detector for real-time tasks like license plate recognition; its compact architecture with CSPDarknet and PANet ensures fast inference times. YOLOv8 improves on it with an anchor-free design, mosaic data augmentation, and a decoupled head for better detection under varied conditions.
The decrease in performance in YOLOv9, especially reflected in its lower mAP and accuracy, could be due to several factors such as model complexity, overfitting, or less robust handling of diverse or challenging detection scenarios.
Choosing between these models depends on specific needs. YOLOv5 and YOLOv8 offer robust performance across metrics, making them suitable for general use; YOLOv7 offers similar benefits but with slightly lower overall effectiveness; and YOLOv9, despite its high recall and precision, may be less desirable due to its lower overall detection reliability. For the task of license plate detection, the experiments carried out here show YOLOv8 to be the most suitable. This analysis clarifies the advantages and disadvantages of each YOLO version, helping to choose the best model for a particular object detection task. In general, the YOLO object detection algorithms report performance comparable to other state-of-the-art methods: while they may not always lead in accuracy, YOLO is known for its real-time speed. Table 6 shows this comparison.
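To make the comparison workflow concrete, the snippet below is a hedged sketch that validates several checkpoints on the same held-out split through the Ultralytics API; the weight file names and dataset YAML are illustrative, and YOLOv7 is omitted because it is evaluated with its own repository's scripts rather than this API:

```python
from ultralytics import YOLO

for weights in ("yolov5su.pt", "yolov8n.pt", "yolov9c.pt"):   # illustrative checkpoints
    model = YOLO(weights)
    metrics = model.val(data="license_plates.yaml")           # same held-out split for every model
    print(f"{weights}: mAP@0.5 = {metrics.box.map50:.3f}")
```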

8. Testing

This study also tested the trained models in real-world scenarios using a vehicle sample that was not included in the training or validation datasets. The vehicle belonged to one of the researchers involved in the study, so ethical clearance was not required; only a single vehicle sample was used at this stage, and ethics clearance will be sought for larger samples in future work. Videos recording the car plate number were taken with a Samsung Galaxy A14 phone camera, which has a 50 MP AI camera, and were then used as input for the trained models. The results of this live test showed plate number recognition with IoU values of 0.91, 0.89, 0.93, and 0.70 for YOLOv5, v7, v8, and v9, respectively, demonstrating that the models, especially YOLOv5 and YOLOv8, perform well in a real-world context. Figure 41, Figure 42, Figure 43 and Figure 44 show the output generated by the models during this test under various lighting conditions, weather conditions, and vehicle orientations.
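The live video test can be approximated with the following sketch, assuming Ultralytics-style weights; the checkpoint path and video file name are placeholders:

```python
from ultralytics import YOLO

model = YOLO("runs/detect/train/weights/best.pt")      # trained checkpoint (illustrative path)
for result in model.predict(source="car_video.mp4", stream=True, conf=0.5):
    for box in result.boxes:                           # one entry per detected plate
        print(box.xyxy[0].tolist(), float(box.conf[0]))   # box coordinates and confidence
```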

9. Working Application: Boom Gate Access

For practical use, cameras running the developed model were installed at an entrance controlled by a boom gate to identify approaching vehicles. When the camera detects a vehicle, the captured images are processed and passed to the system, which holds a database of permitted number plates. The system reads the detected license plate number and, depending on whether the plate appears in the school's database, allows or denies access, expediting entry for approved vehicles. Figure 45 provides an example of the model in operation. This part of the study is a work in progress.
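The gate logic itself reduces to a detect–read–lookup loop. The sketch below is a minimal illustration rather than the deployed system: the plate whitelist, camera source, and open_gate() hook are hypothetical, and EasyOCR stands in for the character reader, as in related ALPR work [17]:

```python
import easyocr
from ultralytics import YOLO

ALLOWED_PLATES = {"ABC123GP", "XYZ987GP"}       # hypothetical whitelist database
model = YOLO("best.pt")                         # trained plate detector (illustrative path)
reader = easyocr.Reader(["en"])                 # OCR engine for the cropped plate

def open_gate():
    # Placeholder for the relay/PLC call that raises the physical boom.
    print("Boom gate opened")

for result in model.predict(source=0, stream=True, conf=0.5):   # camera at the gate
    for box in result.boxes:
        x1, y1, x2, y2 = map(int, box.xyxy[0].tolist())
        plate_crop = result.orig_img[y1:y2, x1:x2]              # plate region of the frame
        text = "".join(t for _, t, _ in reader.readtext(plate_crop))
        if text.upper().replace(" ", "") in ALLOWED_PLATES:
            open_gate()
```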

10. Conclusions

This study systematically investigated the application of YOLO object detection algorithm variants applicable to automatic license plate recognition systems, specifically within the context of applying it to boom gate access. Through an experimental comparative analysis of four recent YOLO versions (YOLOv5, YOLOv7, YOLOv8, and YOLOv9), this work aimed to identify the most suitable YOLO model that optimizes accuracy, speed, and computational efficiency for real-world implementation in secure access control systems.
Our findings demonstrated that each YOLO version has distinct advantages depending on the particular specifications of the application. YOLOv5 and YOLOv7 showed substantial improvements in speed and accuracy over previous versions, making them suitable for environments where real-time processing is crucial. YOLOv8 introduced enhancements such as the integration of mosaic data augmentation and anchor-free detection, which significantly improved detection precision under various operational conditions. YOLOv9, the latest in the series, incorporated advanced features such as the Generalized Efficient Layer Aggregation Network (GELAN) and Programmable Gradient Information (PGI), pushing the boundaries of object detection capabilities further by addressing the information bottleneck problem, thus enhancing model robustness and inference efficiency.
The experimental results revealed that while all models tested performed well, YOLOv8 showed superior performance in terms of lower loss rates and higher precision and recall metrics, particularly in challenging detection scenarios characterized by variable lighting and occlusions. Such advancements are critical in enhancing the reliability of access control systems, where the accuracy of vehicle identification directly impacts security and operational efficiency.
This comprehensive review and experimental analysis not only underline the capabilities of YOLO algorithms in enhancing automated license plate recognition systems but also provide practical insights that could assist in the development of more sophisticated and reliable automated systems for managing vehicle access. Further research could explore the integration of these advanced detection models with other technological elements such as AI-powered surveillance and blockchain technology for improved data security and management.
Ultimately, the continuous evolution of the YOLO models, as demonstrated in this study, highlights the potential for significant improvements in automated systems used in security-sensitive environments. Future work in this research could focus on custom YOLO models tailored to specific environmental conditions or operational demands, potentially incorporating real-time adaptive learning capabilities to further enhance their effectiveness and reliability in dynamic real-world settings.

Limitations of the Study

This study noted some shortcomings of the dataset, emphasizing that it may not accurately reflect the wide variety of conditions found in real-world settings; more research is therefore needed to assess the robustness and scalability of the proposed method on larger datasets and more varied deployments. Furthermore, the YOLO object detection models have certain design limitations, especially when recognizing broken or deteriorating license plates. These limitations are more noticeable in adverse conditions, such as rain, snow, fog, or nighttime, which can seriously hinder the system's ability to detect and identify license plates. Future studies should look at ways to make the model adaptable and effective in challenging environments, as well as at improving the training datasets to better reflect real-world conditions. Doing so will make the object detection system more robust, reliable, and capable of operating well in a variety of situations.

Author Contributions

Conceptualization, A.C.B. and P.A.O.; methodology, A.C.B.; software, A.C.B. and P.A.O.; validation, A.C.B. and P.A.O.; formal analysis, A.C.B.; investigation, A.C.B.; resources, C.D.; data curation, A.C.B.; writing—original draft preparation, A.C.B.; writing—review and editing, A.C.B.; visualization, C.D. and E.V.W.; supervision, P.A.O.; project administration, C.D. and E.V.W.; funding acquisition, C.D. and E.V.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Datasets used for this study can be found at: https://www.kaggle.com/datasets/andrewmvd/car-plate-detection, accessed on 23 June 2024.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Challa, N. Artificial Intelligence for Object Detection and its Metadata. Int. J. Artif. Intell. Mach. Learn. (IJAIML) 2023, 2, 121–133. [Google Scholar] [CrossRef]
  2. Mirzaei, B.; Nezamabadi-Pour, H.; Raoof, A.; Derakhshani, R. Small Object Detection and Tracking: A Comprehensive Review. Sensors 2023, 23, 6887. [Google Scholar] [CrossRef] [PubMed]
  3. Charroud, A.; El Moutaouakil, K.; Palade, V.; Yahyaouy, A.; Onyekpe, U.; Eyo, E.U. Localization and Mapping for Self-Driving Vehicles: A Survey. Machines 2024, 12, 118. [Google Scholar] [CrossRef]
  4. Kaur, J.; Singh, W. Tools, techniques, datasets and application areas for object detection in an image: A review. Multimed. Tools Appl. 2022, 81, 38297–38351. [Google Scholar] [CrossRef]
  5. Kanjee, R. The Remarkable Impact of Object Detection in Artificial Intelligence and Computer Vision. 2023. Available online: https://www.linkedin.com/pulse/remarkable-impact-object-detection-artificial-computer-ritesh-kanjee (accessed on 19 June 2024).
  6. Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 580–587. [Google Scholar]
  7. Girshick, R. Fast R-CNN. arXiv 2015, arXiv:1504.08083. [Google Scholar]
  8. Chen, Y.; Li, L.; Li, W.; Guo, Q.; Du, Z.; Xu, Z. AI Computing Systems: An Application Driven Perspective; Elsevier: Amsterdam, The Netherlands, 2022. [Google Scholar]
  9. Viswanatha, V.; Chandana, R.K.; Ramachandra, A.C. IoT based smart mirror using Raspberry Pi 4 and YOLO algorithm: A novel framework for interactive display. Indian J. Sci. Technol. 2022, 15, 2011–2020. [Google Scholar] [CrossRef]
  10. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788. [Google Scholar]
  11. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. arXiv 2015, arXiv:1506.01497. [Google Scholar] [CrossRef]
  12. Vishwakarma, N. Real-Time Object Detection with SSDs: Single Shot MultiBox Detectors. 2023. Available online: https://www.analyticsvidhya.com/blog/2023/11/real-time-object-detection-with-ssds-single-shot-multibox-detectors/ (accessed on 19 June 2024).
  13. Lin, T.Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal Loss for Dense Object Detection. arXiv 2017, arXiv:1708.02002. [Google Scholar]
  14. Rawat, A.S.; Devrani, H.; Yaduvanshi, A.; Bohra, M.; Kumar, I.; Singh, T. Surveillance System using Moving Vehicle Number Plate Recognition. In Proceedings of the 2023 2nd International Conference on Edge Computing and Applications (ICECAA), Namakkal, India, 19–21 July 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 940–945. [Google Scholar]
  15. Dias, A.; Almeida, A.M.D.; Fernandes, D.S.; Fernandes, J.; Fernandes, S.; Aswale, S. Automatic Two Wheeler License Plate Recognition Using Deep Learning Techniques. In Proceedings of the 2023 3rd International Conference on Technological Advancements in Computational Sciences (ICTACS), Tashkent, Uzbekistan, 1–3 November 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 1–7. [Google Scholar]
  16. Pandiaraja, P.; Abisheck, S.; Mohan, A.; Ramanikanth, M. Survey on Traffic Violation Prediction using Deep Learning Based on Helmets with Number Plate Recognition. In Proceedings of the 2024 5th International Conference on Intelligent Communication Technologies and Virtual Mobile Networks (ICICV), Tirunelveli, India, 11–12 March 2024; IEEE: Piscataway, NJ, USA; pp. 234–239. [Google Scholar]
  17. Patil, S.S.; Patil, S.H.; Pawar, A.M.; Bewoor, M.S.; Kadam, A.K.; Patkar, U.C.; Wadare, K.; Sharma, S. Vehicle Number Plate Detection using YoloV8 and EasyOCR. In Proceedings of the 2023 14th International Conference on Computing Communication and Networking Technologies (ICCCNT), Delhi, India, 6–8 July 2023; IEEE: Piscataway, NJ, USA; 2023; pp. 1–4. [Google Scholar]
  18. Alharbi, F.; Alshahrani, R.; Zakariah, M.; Aldweesh, A.; Alghamdi, A.A. YOLO and Blockchain Technology Applied to Intelligent Transportation License Plate Character Recognition for Security. Comput. Mater. Contin. 2023, 77, 3689–3722. [Google Scholar] [CrossRef]
  19. Shyaa, T.A.; Hashim, A.A. Superior Use of YOLOv8 to Enhance Car License Plates Detection Speed and Accuracy. Rev. D’Intelligence Artif. 2024, 38, 139–145. [Google Scholar] [CrossRef]
  20. Neupane, D.; Bhattarai, A.; Aryal, S.; Bouadjenek, M.R.; Seok, U.; Seok, J. Shine: A deep learning-based accessible parking management system. Expert Syst. Appl. 2024, 238, 122205. [Google Scholar] [CrossRef]
  21. Jamtsho, Y.; Riyamongkol, P.; Waranusast, R. Real-time Bhutanese license plate localization using YOLO. ICT Express 2020, 6, 121–124. [Google Scholar] [CrossRef]
  22. Khan, I.R.; Ali, S.T.A.; Siddiq, A.; Shim, S.O. Multi-string missing characters restoration for automatic license plate recognition system. Int. J. Adv. Comput. Sci. Appl. 2023, 14, 835–843. [Google Scholar] [CrossRef]
  23. Salemdeeb, M.; Erturk, S. Multi-national and multi-language license plate detection using convolutional neural networks. Eng. Technol. Appl. Sci. Res. 2020, 10, 5979–5985. [Google Scholar] [CrossRef]
  24. Wang, L.; Cao, C.; Zou, B.; Ye, J.; Zhang, J. License Plate Recognition via Attention Mechanism. CMC-Comput. Mater. Contin. 2023, 75, 1801–1814. [Google Scholar] [CrossRef]
  25. Al-Batat, R.; Angelopoulou, A.; Premkumar, S.; Hemanth, J.; Kapetanios, E. An end-to-end automated license plate recognition system using YOLO based vehicle and license plate detection with vehicle classification. Sensors 2022, 22, 9477. [Google Scholar] [CrossRef]
  26. Lin, C.J.; Chuang, C.C.; Lin, H.Y. Edge-ai-based real-time automated license plate recognition system. Appl. Sci. 2022, 12, 1445. [Google Scholar] [CrossRef]
  27. Lina, Y.; Shaokun, L. A Single-Stage Deep Learning-based Approach for Real-Time License Plate Recognition in Smart Parking System. Int. J. Adv. Comput. Sci. Appl. 2023, 14, 1142–1150. [Google Scholar]
  28. Nguyen, H. A High-Performance Approach for Irregular License Plate Recognition in Unconstrained Scenarios. Int. J. Adv. Comput. Sci. Appl. 2023, 14, 339–346. [Google Scholar] [CrossRef]
  29. Koylu, C.; Zhao, C.; Shao, W. Deep neural networks and kernel density estimation for detecting human activity patterns from geo-tagged images: A case study of birdwatching on flickr. ISPRS Int. J.-Geo-Inf. 2019, 8, 45. [Google Scholar] [CrossRef]
  30. Everingham, M.; Van Gool, L.; Williams, C.K.; Winn, J.; Zisserman, A. The pascal visual object classes (voc) challenge. Int. J. Comput. Vis. 2010, 88, 303–338. [Google Scholar] [CrossRef]
  31. Shetty, S. Application of convolutional neural network for image classification on Pascal VOC challenge 2012 dataset. arXiv 2016, arXiv:1607.03785. [Google Scholar]
  32. Hussain, M. YOLO-v1 to YOLO-v8, the rise of YOLO and its complementary nature toward digital manufacturing and industrial defect detection. Machines 2023, 11, 677. [Google Scholar] [CrossRef]
  33. Parupalli, S.; Akhsitha, S.; Naval, D.; Kasam, P.; Yavagiri, S. Performance evaluation of YOLOv2 and modified YOLOv2 using face mask detection. Multimed. Tools Appl. 2023, 83, 30167–30180. [Google Scholar] [CrossRef]
  34. Tsang, S. Review: YOLOv3-You Only Look Once (Object Detection). 2019. Available online: https://towardsdatascience.com/review-yolov3-you-only-look-once-object-detection-eab75d7a1ba6 (accessed on 20 June 2024).
  35. Bochkovskiy, A.; Wang, C.Y.; Liao, H.Y.M. YOLOv4: Optimal speed and accuracy of object detection. arXiv 2020, arXiv:2004.10934. [Google Scholar]
  36. Jocher, G.; Stoken, A.; Borovec, J.; Changyu, L.; Hogan, A.; Diaconu, L.; Poznanski, J.; Yu, L.; Rai, P.; Ferriday, R.; et al. Ultralytics/yolov5: v3. 0. Zenodo 2020. Available online: https://ui.adsabs.harvard.edu/abs/2020zndo...4154370J/abstract (accessed on 22 June 2024).
  37. Li, C.; Li, L.; Geng, Y.; Jiang, H.; Cheng, M.; Zhang, B.; Ke, Z.; Xu, X.; Chu, X. YOLOv6 v3.0: A Full-Scale Reloading. arXiv 2023, arXiv:2301.05586. [Google Scholar]
  38. Wang, C.Y.; Bochkovskiy, A.; Liao, H.Y.M. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. arXiv 2022, arXiv:2207.02696. [Google Scholar]
  39. YOLOv8: A New State-of-the-Art Computer Vision Model. Available online: https://yolov8.com/ (accessed on 23 June 2024).
  40. Redmon, J.; Farhadi, A. YOLOv3: An Incremental Improvement. arXiv 2018, arXiv:1804.02767. [Google Scholar]
  41. Ultralytics. YOLOv5. 2020. Available online: https://github.com/ultralytics/yolov5 (accessed on 23 June 2024).
  42. Wang, C.Y.; Yeh, I.H.; Liao, H.Y.M. You only learn one representation: Unified network for multiple tasks. arXiv 2021, arXiv:2105.04206. [Google Scholar]
  43. Xu, R.; Lin, H.; Lu, K.; Cao, L.; Liu, Y. A Forest Fire Detection System Based on Ensemble Learning. Forests 2021, 12, 217. [Google Scholar] [CrossRef]
  44. Vijayakumar, A.; Vairavasundaram, S. YOLO-based Object Detection Models: A Review and its Applications. Multimed. Tools Appl. 2024, 83, 83535–83574. [Google Scholar] [CrossRef]
  45. Wang, Y.; Wang, H.; Xin, Z. Efficient Detection Model of Steel Strip Surface Defects Based on YOLO-V7. IEEE Access 2022, 10, 133936–133944. [Google Scholar] [CrossRef]
  46. Ding, X.; Zhang, X.; Ma, N.; Han, J.; Ding, G.; Sun, J. Repvgg: Making vgg-style convnets great again. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 13733–13742. [Google Scholar]
  47. Boesch, G. A Guide to YOLOv8 in 2024. 2024. Available online: https://viso.ai/deep-learning/yolov8-guide (accessed on 23 June 2024).
  48. Davy, M.K.; Banda, P.J.; Hamweendo, A. Automatic vehicle number plate recognition system. Phys. Astron Int. J. 2023, 7, 69–72. [Google Scholar] [CrossRef]
  49. Gao, C.; Zhao, G.; Gao, S.; Du, S.; Kim, E.; Shen, T. Advancing architectural heritage: Precision decoding of East Asian timber structures from Tang dynasty to traditional Japan. Herit. Sci. 2024, 12, 219. [Google Scholar] [CrossRef]
  50. Chien, C.T.; Ju, R.Y.; Chou, K.Y.; Chiang, J.S. YOLOv9 for Fracture Detection in Pediatric Wrist Trauma X-ray Images. arXiv 2024, arXiv:2403.11249. [Google Scholar] [CrossRef]
  51. What is YOLOv9: An In-Depth Exploration of the Internal Features of the Next-Generation Object Detector. arXiv 2024, arXiv:2409.07813.
  52. Car License Plate Detection. 2020. Available online: https://www.kaggle.com/datasets/andrewmvd/car-plate-detection (accessed on 2 March 2024).
  53. Pavithra, M.; Karthikesh, P.S.; Jahnavi, B.; Navyalokesh, M.; Krishna, K.L. Implementation of Enhanced Security System using Roboflow. In Proceedings of the 2024 11th International Conference on Reliability, Infocom Technologies and Optimization (Trends and Future Directions)(ICRITO), Noida, India, 14–15 March 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 1–5. [Google Scholar]
  54. Qing, Y.; Liu, W.; Feng, L.; Gao, W. Improved Yolo network for free-angle remote sensing target detection. Remote Sens. 2021, 13, 2171. [Google Scholar] [CrossRef]
  55. Lu, Y.; Zhang, L.; Xie, W. YOLO-compact: An efficient YOLO network for single category real-time object detection. In Proceedings of the 2020 Chinese Control and Decision Conference (CCDC), Hefei, China, 22–24 August 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 1931–1936. [Google Scholar]
  56. Aqaileh, T.; Alkhateeb, F. Automatic Jordanian License Plate Detection and Recognition System Using Deep Learning Techniques. J. Imaging 2023, 9, 201. [Google Scholar] [CrossRef]
  57. Batra, P.; Hussain, I.; Ahad, M.A.; Casalino, G.; Alam, M.A.; Khalique, A.; Hassan, S.I. A novel memory and time-efficient ALPR system based on YOLOv5. Sensors 2022, 22, 5283. [Google Scholar] [CrossRef]
  58. Huang, L.; Huang, W. RD-YOLO: An effective and efficient object detector for roadside perception system. Sensors 2022, 22, 8097. [Google Scholar] [CrossRef]
  59. Sun, X.; Ren, X.; Ma, S.; Wei, B.; Li, W.; Wang, H. Training Simplification and Model Simplification for Deep Learning: A Minimal Effort Back Propagation Method. IEEE Trans. Knowl. Data Eng. 2017, 32, 374–387. [Google Scholar] [CrossRef]
  60. Narkhede, S. Understanding Confusion Matrix–Towards Data Science. Medium, 9 May 2018. Available online: https://towardsdatascience.com/understanding-confusion-matrix-a9ad42dcfd62 (accessed on 20 June 2024).
  61. Handelman, G.S.; Kok, H.K.; Chandra, R.V.; Razavi, A.H.; Huang, S.; Brooks, M.; Lee, M.J.; Asadi, H. Peering into the black box of artificial intelligence: Evaluation metrics of machine learning methods. Am. J. Roentgenol. 2019, 212, 38–43. [Google Scholar] [CrossRef] [PubMed]
  62. Naidu, G.; Zuva, T.; Sibanda, E.M. A review of evaluation metrics in machine learning algorithms. In Proceedings of the Computer Science On-Line Conference; Springer: Berlin/Heidelberg, Germany, 2023; pp. 15–25. [Google Scholar]
  63. Biswas, S.; Riba, P.; Lladós, J.; Pal, U. Beyond document object detection: Instance-level segmentation of complex layouts. Int. J. Doc. Anal. Recognit. (IJDAR) 2021, 24, 269–281. [Google Scholar] [CrossRef]
  64. Padilla, R.; Netto, S.L.; Da Silva, E.A. A survey on performance metrics for object-detection algorithms. In Proceedings of the 2020 International Conference on Systems, Signals and Image Processing (IWSSIP), Niteroi, Brazil, 1–3 July 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 237–242. [Google Scholar]
  65. Tan, M.; Pang, R.; Le, Q.V. Efficientdet: Scalable and efficient object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 10781–10790. [Google Scholar]
  66. He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2961–2969. [Google Scholar]
Figure 1. One- and two-stage detectors adapted from [9].
Figure 2. Distribution of works in YOLO for plate number recognition by subject area.
Figure 3. Distribution of works in YOLO for plate number recognition by research output.
Figure 4. Distribution of works in YOLO for plate number recognition documents by year.
Figure 5. PRISMA flow diagram.
Figure 6. YOLO architecture adapted from [29].
Figure 7. Differences in YOLO features across different versions.
Figure 8. YOLO versions: release year vs. speed (fps).
Figure 9. Pie chart representation of the different features.
Figure 10. YOLOv5 architecture adapted from [43].
Figure 11. YOLOv7 architecture adapted from [45].
Figure 12. YOLOv8 architecture adapted from [49].
Figure 13. PGI architecture adapted from [50].
Figure 14. GELAN architecture adapted from [51].
Figure 15. Proposed model.
Figure 16. Dataset sample from [52].
Figure 17. Confusion matrix for YOLOv5 validation result.
Figure 18. F1 curve for YOLOv5 model.
Figure 19. Precision curve for YOLOv5 model.
Figure 20. Precision–recall curve for YOLOv5 model.
Figure 21. Recall curve for YOLOv5 model.
Figure 22. Summary of training and validation performance for YOLOv5 model.
Figure 23. Confusion matrix for YOLOv7 validation result.
Figure 24. F1 curve for YOLOv7 training.
Figure 25. Precision curve for YOLOv7 training.
Figure 26. Recall curve for YOLOv7 training.
Figure 27. Precision–recall curve for YOLOv7 training.
Figure 28. Overall results for YOLOv7 training.
Figure 29. Confusion matrix for YOLOv8 result.
Figure 30. F1 curve for YOLOv8 model.
Figure 31. Precision curve for YOLOv8 model.
Figure 32. Recall curve for YOLOv8 model.
Figure 33. Precision–recall curve for YOLOv8 model.
Figure 34. Comprehensive results for YOLOv8 model.
Figure 35. Confusion matrix for YOLOv9 model training/validation.
Figure 36. F1-score curve for YOLOv9 model training/validation.
Figure 37. Precision curve for YOLOv9 model training/validation.
Figure 38. Recall curve for YOLOv9 model training/validation.
Figure 39. Precision–recall curve for YOLOv9 model training/validation.
Figure 40. Comprehensive results for YOLOv9 model training/validation.
Figure 41. Live test output for YOLOv5 model.
Figure 42. Live test output for YOLOv7 model.
Figure 43. Live test output for YOLOv8 model.
Figure 44. Live test output for YOLOv9 model.
Figure 45. Application framework: boom gate access.
Table 1. Table of comparison of object detection models.

Model | Architecture | Speed | Accuracy | Training Time
YOLO [10] | One-stage detector | Fast | High | Faster
Faster R-CNN [11] | Two-stage detector | Slower | High | Slower
SSD [12] | One-stage detector | Moderate | Moderately | Data
RetinaNet [13] | One-stage detector | Slow | High | Moderate
Table 2. Document distribution based on year of publication.

Year | Number of Documents
2024 | 606
2023 | 1549
2022 | 1079
2021 | 619
2020 | 423
Table 3. Insights from the studied literature.

Model | Methods Used | Application
YOLO-based Models | YOLOv3, YOLOv4, YOLOv5, YOLOv8; real-time object detection models widely used for license plate detection because of their speed and accuracy [17,19,21,28]. | Traffic flow monitoring [19,21]; crime prevention: assists in detecting stolen vehicles and monitoring traffic violations [18,19].
Convolutional Neural Network (CNN) | CNN for feature extraction [17,21,28]. | ALPR systems used in toll systems, smart parking, and vehicle identification [15,27].
Transfer Learning | Pre-trained models applied for fine-tuning models with limited regional datasets [20,22]. | Regional ALPR systems: used to adapt ALPR models for specific regions with distinct license plate formats [21,22].
Optical Character Recognition (OCR) | OCR integrated to recognize characters from detected license plates, improving accuracy in environments with varying lighting conditions and image quality [15,17,28]. | Toll collection, traffic law enforcement [15,17,28].
Blockchain Integration | Blockchain + YOLO: combines blockchain with YOLO to ensure secure data storage and management in ALPR systems [18,28]. | Data privacy and security: ensures secure transmission and storage of sensitive vehicle data for smart city and ITS applications [18,28].
Multi-Stage Detection Systems | YOLO with multiple stages: detects vehicles first, then license plates, followed by character recognition to enhance robustness [15,16]. | Urban parking management: effectively manages parking lots and enforces traffic regulations, including parking violations [15,16].
Helmet Detection and Traffic Violations | YOLO-based helmet detection: integrated helmet and license plate detection for monitoring compliance with helmet laws, especially in two-wheeler traffic [15,16]. | Helmet law enforcement: detects non-compliance with helmet laws and captures license plates for issuing fines [15,16].
Lightweight Models for Mobile Use | YOLOv8n and CNN for mobile: optimized versions of YOLO models used in mobile and resource-constrained devices for real-time license plate detection [19,28]. | Mobile ALPR: used in mobile apps to assist in vehicle monitoring, security patrols, and parking management [27,28].
Real-Time ALPR Systems | YOLO-based real-time detection: YOLO models (e.g., YOLOv8) enable real-time detection and recognition of vehicles moving at high speeds [17,19,27]. | Toll booth and parking enforcement: real-time license plate recognition helps in toll collection, reducing congestion, and monitoring parking [18,19,28].
Table 4. Comparison of YOLOv5, YOLOv7, YOLOv8, and YOLOv9 architectures.

Feature | YOLOv5 | YOLOv7 | YOLOv8 | YOLOv9
Author | Ultralytics | Wang, Bochkovskiy, and Liao [38] | Various authors | Chien, Ju, and Chou [50]
Year | 2020 | 2022 | 2023 | 2024
Speed | 140 fps | 150 fps | 160 fps | 180 fps
Robustness | High | Higher than YOLOv5 | Higher than YOLOv7 | Highest among these
Architectural Components | CSPDarknet backbone, PANet neck, YOLO layer head | Input, backbone feature extraction, RepConv prediction | Modified CSPDarknet53, C2f module, decoupled head | GELAN, PGI, reversible functions
Key Enhancements | Efficient object detection | Faster and more accurate than YOLOv5 | Mosaic data augmentation, anchor-free detection | Addresses information bottleneck, enhanced efficiency
Variants | Small (s), Medium (m), Large (l), Extra Large (x) | V7, tiny, V7-W6 | Nano (n), Small (s), Medium (m), Large (l), XL (x) | Nano (n), Small (s), Medium (m), Compact (c), Extended (e)
Special Features | High accuracy and efficiency | Trainable bag-of-freebies | Task alignment score, Distribution Focal Loss (DFL) | Generalized Efficient Layer Aggregation Network (GELAN)
Table 5. Table of results of the four models.

YOLO Model | F1 Score | Recall | Precision | Accuracy | mAP @ 0.5
YOLOv5 | 84% | 94% | 100% | 81% | 0.888
YOLOv7 | 81% | 93% | 100% | 82% | 0.804
YOLOv8 | 85% | 90% | 100% | 83% | 0.855
YOLOv9 | 70% | 94% | 100% | 73% | 0.705
Table 6. Comparison of results with other state-of-the-art object detection works.

Author | Model Used for Detection | Results Reported | Speed of Detection
Shaoqing Ren et al. [11] | Faster R-CNN | PASCAL VOC 2007: Precision 99%, Recall 99%, mAP 99.9% | 5 fps (on GPU)
Tsung-Yi Lin et al. [13] | RetinaNet | COCO test-dev AP 39.1% | 5 fps (ResNet-101-FPN backbone)
Mingxing Tan et al. [65] | EfficientDet | COCO test-dev AP 55.1% (EfficientDet-D7) | 4x–11x faster than previous detectors
Kaiming He et al. [66] | Mask R-CNN | PASCAL VOC 2012: mAP 99.7% | 5 fps (on GPU)
Current study | YOLOv5, v7, v8, and v9 | 81%, 82%, 83%, and 73% accuracy | 140 fps, 150 fps, 160 fps, 180 fps on GPU