Article

Two-Stage Efficient Parking Space Detection Method Based on Deep Learning and Computer Vision

The School of Mechanical and Electrical Engineering, Hainan University, Haikou 570228, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2025, 15(3), 1004; https://doi.org/10.3390/app15031004
Submission received: 13 December 2024 / Revised: 15 January 2025 / Accepted: 15 January 2025 / Published: 21 January 2025

Abstract

One of the basic requirements of an automated parking system is to detect parking spaces quickly and accurately. With the development of deep convolutional neural networks, parking space detection systems are becoming increasingly mature. However, existing detection systems are often overly complicated, resulting in long detection times. To address this problem, this paper proposes a two-stage parking space detection system that combines the advantages of deep learning and traditional computer vision and treats parking space detection as a two-stage process. First, we take the vehicle's surround-view image as input and extract the target boxes of the parking space key points from the global image through the YoloV11 network. Then, using the special convolution kernel proposed in this paper, each local image is convolved to obtain the direction information of the parking space. Finally, the correct parking space is deduced from its characteristic parameters. The proposed method achieved a detection accuracy of 98.24% on the public dataset ps2.0 (Parking-slot 2.0), and the parking space inference time for a single image was only 12.3 ms. Compared with existing methods, the two-stage parking space detection system proposed in this paper not only demonstrates higher robustness but also significantly reduces the time required for parking space inference.

1. Introduction

During vehicle driving, the main strategies for parking space detection are free-space-based methods, vision-based methods, and methods that combine the two [1]. The free-space-based approach determines suitable parking locations by identifying and measuring the space between adjacent vehicles. This method is widely used because it can be implemented with a variety of ranging sensors. Methods that combine free-space and vision-based strategies offer good accuracy and stability. However, they require a significant amount of hardware, which presents certain limitations in the modern automotive industry, where lightweight design is increasingly prioritized.
The vision-based approach comprehensively perceives the surrounding environment through image acquisition, feature extraction, obstacle detection, and other technologies. In particular, it analyzes the gap between the ground markings and the vehicle, effectively identifying parking spaces and assisting in parking path planning. In most cases, vision-based methods can provide more accurate parking information than free-space-based methods [2,3]. At the same time, most automobile manufacturers have begun to produce vehicles equipped with wide-field imaging sensors, usually for the around view monitor (AVM) system composed of fisheye cameras [4,5]. For these reasons, vision-based parking space detection methods have received a great deal of attention.
The two main automated parking space detection methods based on visual feature extraction are the line-based method and the corner-based method. Jung et al. [6] assumed that parking space lines had a fixed width and identified these parking lines in Hough space [7] through peak detection and clustering techniques. They then used T-template matching to identify the separate parking lines. Similarly, Wang et al. [8] proposed a parking line detection method based on Radon space [9]; they argued that, compared with the Hough transform, the Radon transform performs better in noise resistance and robustness. However, a common disadvantage of these two methods [8,10] is that they are sensitive to changes in the width of the parking space lines. Hamada et al. [11] extracted all line segments using the probabilistic Hough transform [12] after obtaining the edge map of the surround-view image and deduced the effective parking space locations based on geometric constraints. Although these methods have advantages in the detection of line segments, their adaptability is limited to a certain extent by changes in the width of parking lines. Suhr and Jung [13] studied automatic parking space detection and tracking in underground and indoor environments. They used the RANSAC algorithm for guide line detection to ensure that the guide line of a parking space can be effectively identified in noisy environments. The limitation of this method is that it is only suitable for vertical parking spaces and requires clearly visible guide lines. Different from the line-based detection methods, there are also corner-based detection methods, with the work of Suhr and Jung [14,15] being representative. They used the Harris corner detector [16] to detect significant corner points in parking spaces. The Harris corner detector is a classical corner detection algorithm: by calculating the local structure matrix of the image, it identifies image regions with significant changes, which are usually corners or boundary intersections. After corner detection, Suhr and Jung combined these corners to form so-called nodes. Based on these nodes and geometric clues in the parking lot, the complete parking space contour was deduced by inference.
With the development of deep learning in recent years, input images are fed into task-specific neural networks and, through continuous iterative learning, weak classifiers are gradually trained into strong classifiers that obtain increasingly accurate results for specific tasks.
Deep convolutional neural networks (DCNNs) have emerged as a fundamental architecture within neural networks, widely applied in domains such as computer vision, image recognition, object detection, and natural language processing. Unlike traditional image processing methods that rely on manually designed feature extraction techniques, DCNNs automatically learn essential features directly from images via a multi-layer network. Typically, DCNNs are composed of three key types of layers: convolutional layers, pooling layers, and fully connected layers. Convolutional layers are responsible for detecting low-level features, such as edges and textures, which are progressively integrated to form complex patterns. Pooling layers reduce the dimensionality of the data, enhancing computational efficiency. Finally, fully connected layers consolidate the extracted features to generate predictions or classification outputs.
The development of DCNNs can be traced back to the 1980s, but the real breakthrough came in 2012, when AlexNet [17] achieved a landmark result in the ImageNet competition, greatly advancing research on deep learning and convolutional neural networks. Since then, many advanced DCNN architectures have been introduced, such as VGG [18,19,20,21], ResNet [22,23,24], and DenseNet [25,26,27]. DCNNs can automatically learn and extract useful features from raw data, reducing the dependence on manual feature engineering; through parameter sharing across the spatial dimensions of the convolution kernel, they greatly reduce the number of parameters in the model and improve computational efficiency. DCNN-based methods have made great contributions in many fields, and the object detection field discussed in this paper also benefits from them.
The application of DCNNs in object detection tasks began with the groundbreaking R-CNN study by Girshick et al. [28]. R-CNN is a multi-stage detection framework. First, for the input image, a region proposal algorithm generates candidate bounding boxes that are likely to contain objects. Then, a standard DCNN is used as a feature extractor to compute features for each candidate bounding box, and finally, a classifier determines the category of the object in each box. On the basis of the R-CNN framework, many researchers have proposed a variety of improvements, among which the representative ones include Fast R-CNN [29], Faster R-CNN [30], and Mask R-CNN [31].
Such parking space detection methods are effective but still have shortcomings: judging the parking space direction after locating the key points is too complicated, making it difficult to balance accuracy and real-time performance. In view of the existing parking space detection methods, this paper therefore divides parking space detection into two stages. In the first stage, key point detection is carried out with YoloV11 on the complex global image. YoloV11, an enhanced deep learning-based object detection model, enables rapid and accurate identification of critical points within images. In the second stage, the rotation direction of the parking space is identified using traditional computer vision methods combined with the p × p convolution kernel proposed in this paper. After obtaining the key point information and the rotation direction of the parking space, the position of the real parking space in the global image can be inferred from the known information.
The experimental results demonstrate that this two-stage parking space detection system is highly robust and significantly reduces inference time. This method is particularly effective in automated parking environments, where it expedites the detection process, optimizing vehicle scheduling and minimizing overall parking time.

2. Research Content

2.1. Research Content and Innovation

The parking space detection problem is divided into two stages. In the first stage, deep neural networks are used to find the key points of parking spaces, which can be regarded as an object detection problem for convolutional neural networks. Thanks to the rapid development of object detection algorithms in recent years, many algorithms with excellent performance have emerged, such as the two-stage R-CNN series and SPPNet [32,33], the one-stage YOLO series [34,35] and SSD series [36,37], and Transformer-based methods such as the DETR series [38]. DETR integrates the Transformer architecture with object detection, utilizing self-attention mechanisms that allow the model to analyze all parts of the image simultaneously, understanding the global context and the relationships between objects. Unlike traditional object detection methods, DETR does not rely on predefined regions or anchor boxes; instead, it uses learnable object queries to directly predict the bounding boxes and categories of objects in the image. This end-to-end approach has shown superior performance and accuracy, especially in tasks requiring fine-grained detection, such as parking space key point detection. However, the main drawback of DETR is its high computational cost, particularly due to the computation-heavy self-attention mechanism, which requires significant processing power and memory. This makes it less suitable for real-time applications in vehicles. The YOLO series is based on the classic convolutional neural network (CNN) architecture and adopts a one-stage object detection approach. It treats object detection as a single regression problem: the entire image is input into the network at once, and the network directly outputs the bounding boxes and class labels for the detected objects. The design of YOLO is simple, with relatively low computational cost, making it highly suitable for scenarios requiring a rapid response. Based on this comparison and comprehensive consideration, together with Figure 1 and Table 1, we choose the YOLO series as the method for finding the key points of the parking space in key point detection.
After obtaining the location information of the parking space key points, the classic approach is to use the simplified version of VGGNet proposed by Zhang et al. [42] to classify the key points, and subsequent work has largely followed this approach. Kumar et al. integrate global and local information to improve the accuracy of parking space detection [10]. Min et al. first detect marking points in the stitched surround view, use the detected points as nodes of a graph, and then adopt a Transformer-based approach to perform attention-based information aggregation and positional encoding of the marking points; finally, a multi-layer perceptron (MLP) makes a binary judgment on whether a pair of marking points can form a parking space [39]. However, once the key point locations are known, we argue that, after simple image processing, the resulting local image has very distinct features and little noise interference, so traditional computer vision can be used to process the image extracted by the YOLO network. We construct a convolution kernel whose size matches that of the extracted key point image. This kernel is used to identify whether the parking space is clockwise or counterclockwise; the principle that key points occur in pairs is then used to determine the orientation of the parking space, and finally the characteristics of the parking space are used to complete its localization.
Compared with the previous methods, the proposed method combines the advantages of deep learning and traditional computer vision, and greatly simplifies the calculation and hardware requirements while ensuring performance, providing a feasible scheme for the industrialization of automatic parking.

2.2. Parking Space Detection

In this section, we introduce the proposed two-stage parking space detection method in detail. The whole parking space detection system takes the surround-view (panoramic) image as input. Figure 1a shows the schematic layout of an ideal vertical parking space, while Figure 1b illustrates an ideal horizontal parking space. The key point areas of the ideal parking spaces are highlighted with blue circles, and the dashed blue lines represent the entrance lines of the ideal parking spaces. In Figure 2, all key points are drawn in black, the area where each key point is located is marked with a red box, and the parking space entrance line (the line between two valid key points) is marked with a yellow line. To discuss the whole parking space recognition process in detail, it is divided into three parts: key point detection, local image discrimination, and parking space inference. The overall process is shown in Figure 3.

2.3. Key Point Detection

In order to accurately identify key points, we first need to train a key point detector D. Through a literature review, we find that YoloV11 [40,41] is currently among the most advanced general-purpose DCNN-based object detectors: it achieves high detection accuracy while running very fast. Based on a comprehensive consideration of accuracy and speed, our key point detector is therefore built on YoloV11.
In order to train the detector, we need to prepare training samples. Fortunately, the ps2.0 training set is a large dataset specifically designed for detecting parking space key points; it includes 2290 training images in five categories: indoor parking lot, outdoor normal daylight, outdoor rainy, outdoor shadow, and outdoor street light. It covers essentially all of the scenes that can be encountered in practice. For each marked point $P_i$, a $p \times p$ box centered on $P_i$ is selected as its ground-truth bounding box. The target box used for YoloV11 is 56 pixels long and 56 pixels wide.
We want the detector to recognize key points at any rotation angle and at any time of day. To achieve this goal, we rotate each training image by $j$ degrees at a time and scale the image brightness to $\lambda_i$ times that of the original image, $i = 1, 2, 3, \ldots, n$. In particular, parking space key points with different brightness levels and rotation angles are covered by applying these rotation and brightness transformations to the original data. These enhancements not only expand the training dataset but also improve robustness against overfitting. After data augmentation, the available data are $n \times \frac{360}{j}$ times the size of the original data. Figure 2a shows the original image, while Figure 2b shows the image rotated by 30 degrees with its brightness reduced to 50% of the original. The key points are shown in black, and the regions where the key points are located are highlighted in red.
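To make the augmentation step concrete, the sketch below rotates one image in steps of $j$ degrees and scales its brightness by the factors $\lambda_i$. The OpenCV calls and the random dummy image are illustrative assumptions rather than the authors' actual training code; in practice, the key point annotations would be transformed with the same rotation matrix so that the labels stay aligned with the augmented images.

```python
import cv2
import numpy as np

# A dummy 432 x 432 image stands in for a real surround-view training image.
image = np.random.randint(0, 256, (432, 432, 3), dtype=np.uint8)

j = 20                              # rotation step in degrees (Section 2.3)
lambdas = np.linspace(0.2, 1.8, 9)  # brightness factors lambda_i, i = 1..n

h, w = image.shape[:2]
center = (w / 2, h / 2)

augmented = []
for angle in range(0, 360, j):
    # Rotate the image about its center by the current angle.
    M = cv2.getRotationMatrix2D(center, angle, 1.0)
    rotated = cv2.warpAffine(image, M, (w, h))
    for lam in lambdas:
        # Scale pixel intensities by lambda_i (values are clipped to [0, 255]).
        augmented.append(cv2.convertScaleAbs(rotated, alpha=float(lam), beta=0))

print(len(augmented), "augmented images from one original")  # n * 360 / j
```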
When training the key point detector, the input label data add one extra class to the original ps2.0 annotations: the detector looks for the bounding boxes of vehicles as well as the bounding boxes of parking space key points. The purpose is to exclude parking spaces already occupied by vehicles in the subsequent parking space inference. Figure 2c shows a training image. Label category 0 represents a parking space key point and label category 1 represents a vehicle; the target box of each instance is stored as four values, which respectively represent the target center x-coordinate, the target center y-coordinate, the target width, and the target height.
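This label layout corresponds to the usual YOLO text format, one object per line, with all coordinates normalized to the image size. The snippet below is a minimal illustration using two rows from Table 1; the image size is an arbitrary value chosen purely for demonstration, not a value from the paper.

```python
# One object per line: class, normalized center x, center y, width, height.
# Class 0 = parking space key point, class 1 = vehicle (values from Table 1).
label_lines = [
    "0 0.212963 0.197090 0.042328 0.042328",
    "1 0.089286 0.472884 0.175926 0.284392",
]

img_w, img_h = 1024, 1024  # illustrative image size, not a value from the paper
for line in label_lines:
    cls, cx, cy, bw, bh = (float(v) for v in line.split())
    # Recover the pixel-space box as (x_min, y_min, width, height).
    box = ((cx - bw / 2) * img_w, (cy - bh / 2) * img_h, bw * img_w, bh * img_h)
    print(int(cls), box)
```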
For the YoloV11-based key point detector D, the corresponding training parameters are set as follows: the rotation step $j$ applied to the original images is set to 20 degrees; the image brightness varies over $\lambda_i \in [0.2, 1.8]$; the batch size is set to 64; the learning rate starts from 0.0001 and is divided by 10 every 50,000 training iterations.
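For concreteness, a training and inference call along these lines could look as follows. This is a sketch that assumes the ultralytics Python API for YOLOv11 and a hypothetical parking.yaml dataset file describing the augmented ps2.0 images with the two classes; it is not the authors' actual script, and the step decay of the learning rate every 50,000 iterations would require a custom schedule on top of it.

```python
from ultralytics import YOLO

# Fine-tune a pretrained YOLOv11 model as the key point detector D.
model = YOLO("yolo11n.pt")
model.train(
    data="parking.yaml",  # hypothetical dataset description (classes 0 and 1)
    epochs=100,           # illustrative value, not specified in the paper
    batch=64,             # batch size from Section 2.3
    lr0=0.0001,           # initial learning rate from Section 2.3
)

# Inference: keep detections whose confidence exceeds delta = 71.5% (Table 2).
results = model.predict("surround_view.jpg", conf=0.715)
for box in results[0].boxes:
    print(int(box.cls), box.conf.item(), box.xywh.tolist())
```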

2.4. Local Image Discrimination

Once the key points of an image are obtained, we save the result for each image in a Python list. The structure of the list is ["Imgname1.jpg", num of bounding boxes, [bounding boxes], "Imgname2.jpg", num of bounding boxes, [bounding boxes], ...]. The list records the number of key points found by the key point detector D in each image, as well as the identification box of the area where each key point is located. The center point of each identification box is the parking space key point we need.
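A minimal sketch of how detections might be flattened into that list is given below; the image names, boxes, and confidences are placeholders, not values from the experiments.

```python
# Each detection box is (x_min, y_min, x_max, y_max, confidence).
detections = {
    "Imgname1.jpg": [(102, 88, 158, 144, 0.93), (301, 90, 357, 146, 0.88)],
    "Imgname2.jpg": [(75, 210, 131, 266, 0.91)],
}

results = []
for name, boxes in detections.items():
    results.extend([name, len(boxes), list(boxes)])

# results == ["Imgname1.jpg", 2, [...], "Imgname2.jpg", 1, [...]]
# The center of each box is the parking space key point.
centers = [((x1 + x2) / 2, (y1 + y2) / 2)
           for boxes in detections.values()
           for (x1, y1, x2, y2, _) in boxes]
print(results)
print(centers)
```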
If the confidence of a key point in the image is greater than δ, the region where the key point $P_i$ is located is considered a target region. Assuming that $P_1$ and $P_2$ are key points with confidence greater than δ in the same image, we need to determine whether they can form a valid entrance line. First, as shown in Figure 4a,b, (a) represents a correct vertical parking space entrance line, while (b) represents a correct horizontal parking space entrance line. If $P_1$ and $P_2$ form a vertical parking space entrance line, then they should satisfy the length constraint $t_1 \le |P_1P_2| \le t_2$; if $P_1$ and $P_2$ form a horizontal parking space entrance line, then they should satisfy the length constraint $t_3 \le |P_1P_2| \le t_4$, where $t_1$, $t_2$, $t_3$, and $t_4$ are hyperparameters that depend on the actual length of the entrance line of the real parking space and the resolution of the panoramic camera.
As shown in Figure 4c,d, (c) represents the first type of erroneous inference of the parking space entrance line, while (d) represents the second type. The reason for these two types of errors is that, in order to identify the entrance line of the parking space more comprehensively, the values of $t_1$, $t_2$, $t_3$, and $t_4$ are appropriately increased.
Even if the entrance line formed by two key points satisfies the length constraints, it may still fail to be a valid entrance line. For example, in Figure 4c, the length constraint of the parking space entrance line is satisfied, but the two points still cannot form a valid entrance line because there is an additional key point $P_i$ between them satisfying the formula
$$\frac{x_i - x_1}{x_2 - x_1} = \frac{y_i - y_1}{y_2 - y_1} \quad (1)$$
At the same time, $x_i \in (x_1, x_2)$ and $y_i \in (y_1, y_2)$, where the coordinates of $P_1$ on the image are $(x_1, y_1)$, the coordinates of $P_2$ are $(x_2, y_2)$, and the coordinates of $P_i$ are $(x_i, y_i)$.
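The check for an intervening key point can be written as a small helper. The pixel tolerance used for the collinearity test below is an implementation assumption, since exact equality rarely holds for integer pixel coordinates.

```python
import math

def keypoint_between(p1, p2, pi, tol=3.0):
    """Return True if key point pi lies on the candidate entrance line p1-p2.

    Implements the collinearity condition of Formula (1) together with
    x_i in (x_1, x_2) and y_i in (y_1, y_2); `tol` is a pixel tolerance
    chosen for illustration, not a value from the paper.
    """
    (x1, y1), (x2, y2), (xi, yi) = p1, p2, pi
    # Perpendicular distance of pi from the line through p1 and p2.
    cross = abs((xi - x1) * (y2 - y1) - (yi - y1) * (x2 - x1))
    dist = cross / math.hypot(x2 - x1, y2 - y1)
    within_x = min(x1, x2) < xi < max(x1, x2)
    within_y = min(y1, y2) < yi < max(y1, y2)
    return dist <= tol and within_x and within_y

# A third key point halfway along the segment invalidates the entrance line.
print(keypoint_between((0, 0), (200, 100), (100, 50)))  # True
```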
Even when the key points identified by the detector satisfy the above two conditions simultaneously, we still cannot simply conclude that $P_1$ and $P_2$ form a valid parking space entrance line. For example, in Figure 4d, the two key points $P_1$ and $P_2$ satisfy conditions 1 and 2 but cannot form a feasible entrance line because they lie on opposite sides of the target vehicle. Therefore, we must also determine whether the parking space is clockwise or counterclockwise. Figure 5a shows a clockwise vertical parking space, Figure 5b a counterclockwise vertical parking space, and Figure 5c a clockwise horizontal parking space. The purpose of distinguishing these four categories of parking spaces is to prevent errors like the one in Figure 4d; in the subsequent parking space inference, the rotation direction must be known to determine the direction of the normal vector perpendicular to the entrance line. To identify these four types of parking spaces, we examine the target box containing each point $P_i$. In this paper, a $p \times p$ convolution kernel $K$ with the same size as the ground-truth box is constructed; it is shown in Figure 6a. The values of the left half of the kernel are all −1, and the values of the right half are all 1. It is worth noting that the values 1 and −1 are chosen freely by the authors: 1 can be replaced by any positive number $A$ and −1 by $-A$. After such a modification, the overall result differs from the result obtained with 1 and −1 only by a factor of $A$; its sign is unchanged, and so is the identified rotation direction of the parking space. However, the two values must be negatives of each other to keep the result meaningful. Before detecting the rotation direction, the image obtained by the key point detector D must be converted to grayscale, as shown in Figure 6b; before the grayscale conversion, the original 56 × 56 image is resized to a standard size of 112 × 112 using linear interpolation to achieve better results. Noise may appear at the boundaries of the parking space, so average filtering is applied to the grayscale image to remove irregularities around the parking lines and smooth the image, as shown in Figure 6c; specifically, an average filter with a 7 × 7 kernel is applied to the image in Figure 6b. Then, the grayscale image is transformed into a binary image $I$ using the Otsu method, as shown in Figure 6d. Finally, the convolution operation $J_{i,j} = (I \ast K)(i,j)$ is carried out using the defined convolution kernel $K$ and the binarized image $I$.
Figure 6d represents a clockwise rotation, Figure 6e represents a counterclockwise rotation, and Figure 6f simulates the binary image of Figure 6e using a 10 × 10 matrix. Convolving the matrix in Figure 6f with the kernel used to determine the rotation direction gives a final result of 15 × 255, which is positive and leads to the conclusion that Figure 6e exhibits counterclockwise rotation. Once the two key points that form a parking space entrance line and their rotation direction are obtained, we can connect the two key points and use the rotation direction to determine the normal vector of the entrance line. The normal vector should point toward the inside of the parking space, allowing the parking space to be localized accurately.
$$L_{i,j} = \begin{cases} -1, & J(i,j) < 0 \\ \phantom{-}1, & J(i,j) \ge 0 \end{cases} \quad (2)$$
When $J(i,j) \ge 0$, the parking space is in the counterclockwise direction; when $J(i,j) < 0$, the parking space is in the clockwise direction. The function $L_{i,j}$ is the identification function of the parking space rotation direction.
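The whole direction test reduces to a few lines of code. The sketch below mirrors the preprocessing chain (grayscale, resize, 7 × 7 mean filter, Otsu binarization) and the single convolution with the ±1 kernel; the OpenCV helper and the synthetic 10 × 10 patch standing in for a real crop are illustrative assumptions rather than the authors' exact implementation. In practice, `preprocess` would be applied to the crop extracted by the key point detector before `rotation_direction` is called.

```python
import cv2
import numpy as np

def preprocess(local_bgr, size=112):
    """Grayscale -> resize -> 7 x 7 mean filter -> Otsu binarization."""
    gray = cv2.cvtColor(local_bgr, cv2.COLOR_BGR2GRAY)
    gray = cv2.resize(gray, (size, size), interpolation=cv2.INTER_LINEAR)
    gray = cv2.blur(gray, (7, 7))
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return binary

def rotation_direction(binary):
    """Convolve once with the kernel K (left half -1, right half +1).

    Returns +1 (counterclockwise) when the response J >= 0 and -1 (clockwise)
    otherwise, i.e. the identification function L defined above.
    """
    p = binary.shape[0]
    kernel = np.ones((p, p), dtype=np.float32)
    kernel[:, : p // 2] = -1.0
    J = float(np.sum(kernel * binary.astype(np.float32)))
    return 1 if J >= 0 else -1

# Synthetic 10 x 10 binary patch with more white pixels on the right half,
# loosely imitating the matrix simulated in Figure 6f.
patch = np.zeros((10, 10), dtype=np.uint8)
patch[:, 5:7] = 255      # vertical stroke on the right half
patch[4:6, 2:7] = 255    # horizontal stroke reaching toward the left half
print(rotation_direction(patch))  # 1 -> counterclockwise
```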
The whole local image discrimination involves three steps. First, the convolution kernel $K$ is used to classify each local image by rotation direction. Then, key points with the same rotation direction are combined pairwise and their distance is checked: if $t_1 \le |P_nP_m| \le t_2$ or $t_3 \le |P_nP_m| \le t_4$, the key points $P_n$ and $P_m$ satisfy the length constraint. Finally, it is checked whether any other key point $P_i$ lies between $P_n$ and $P_m$; if none is found, $P_n$ and $P_m$ can form the entrance line of a parking space.

2.5. Parking Space Inference

When $P_n$ and $P_m$ can form the entrance of a parking space, assume that their coordinates on the original image are $(x_n, y_n)$ and $(x_m, y_m)$, respectively, and that the vector $\overrightarrow{P_nP_m}$ is a feasible parking space entrance line. A local coordinate system is constructed with its origin at the midpoint $\left(\frac{x_n + x_m}{2}, \frac{y_n + y_m}{2}\right)$ of $P_n$ and $P_m$. The unit vector $\frac{\overrightarrow{P_nP_m}}{|\overrightarrow{P_nP_m}|}$ is one basis vector of the local coordinate system, expressed as
$$\mathbf{a} = \left( \frac{x_m - x_n}{\sqrt{(x_m - x_n)^2 + (y_m - y_n)^2}},\ \frac{y_m - y_n}{\sqrt{(x_m - x_n)^2 + (y_m - y_n)^2}} \right).$$
The local Cartesian coordinate system is completed by taking the unit normal of $\mathbf{a}$,
$$\mathbf{b} = \left( \frac{-(y_m - y_n)}{\sqrt{(x_m - x_n)^2 + (y_m - y_n)^2}},\ \frac{x_m - x_n}{\sqrt{(x_m - x_n)^2 + (y_m - y_n)^2}} \right),$$
as the other basis vector. The normal vector passing through point $P_n$ and perpendicular to $\mathbf{a}$ is expressed as follows:
$$\mathbf{c} = L_{i,j} \times \left( \left( x_n - \frac{y_m - y_n}{\sqrt{(x_m - x_n)^2 + (y_m - y_n)^2}} \right) - x_n,\ \left( y_n + \frac{x_m - x_n}{\sqrt{(x_m - x_n)^2 + (y_m - y_n)^2}} \right) - y_n \right) \quad (3)$$
By the same token, the expression for a normal vector passing through P m and perpendicular to the vector a is
$$\mathbf{d} = L_{i,j} \times \left( \left( x_m - \frac{y_m - y_n}{\sqrt{(x_m - x_n)^2 + (y_m - y_n)^2}} \right) - x_m,\ \left( y_m + \frac{x_m - x_n}{\sqrt{(x_m - x_n)^2 + (y_m - y_n)^2}} \right) - y_m \right) \quad (4)$$
where $L_{i,j}$ specifies the direction of vectors $\mathbf{c}$ and $\mathbf{d}$: if the parking space is counterclockwise, the vectors point to the left; if clockwise, to the right. According to prior knowledge, the length of the entrance line of a horizontal parking space is $\lambda_1$, the width of a horizontal parking space is $\lambda_2$, the length of the entrance line of a vertical parking space is $\mu_1$, and the width of a vertical parking space is $\mu_2$, so each parking space can be represented by four vectors. The entrance line of the horizontal parking space is the vector $\lambda_1 \mathbf{a}$, originating from $P_n$. The width of the horizontal parking space is represented by the vectors $\lambda_2 \mathbf{c}$ and $\lambda_2 \mathbf{d}$, originating from $P_n$ and $P_m$, respectively. In the local coordinate system, the coordinates of $P_n$ and $P_m$ are $\left(-\frac{\sqrt{(x_m - x_n)^2 + (y_m - y_n)^2}}{2}, 0\right)$ and $\left(\frac{\sqrt{(x_m - x_n)^2 + (y_m - y_n)^2}}{2}, 0\right)$, respectively. In reality, the parking space is bounded by two parallel lines, and the line parallel to the entrance line can be expressed in the local coordinate system by the starting point $\zeta_a = \left(-\frac{\sqrt{(x_m - x_n)^2 + (y_m - y_n)^2}}{2},\ L_{i,j}\lambda_1\right)$ and the endpoint $\zeta_b = \left(\frac{\sqrt{(x_m - x_n)^2 + (y_m - y_n)^2}}{2},\ L_{i,j}\lambda_1\right)$ of the vector $\overrightarrow{\zeta_a\zeta_b}$. The vector $\overrightarrow{\zeta_a\zeta_b}$ can be transformed into the vector $\vec{\zeta_\lambda}$ in the global coordinate system through the coordinate transformation matrix $T$, i.e., $\vec{\zeta_\lambda} = T\,\overrightarrow{\zeta_a\zeta_b}$. For any parking space entrance line, as long as the rotation identification function $L_{i,j}$ and the key points $P_n$ and $P_m$ are known, the horizontal parking space can be represented by the four vectors $\overrightarrow{P_nP_m}$, $\lambda_2\mathbf{c}$, $\lambda_2\mathbf{d}$, and $\vec{\zeta_\lambda}$; a vertical parking space can likewise be represented by the four vectors $\overrightarrow{P_nP_m}$, $\mu_2\mathbf{c}$, $\mu_2\mathbf{d}$, and $\vec{\zeta_\mu}$. The coordinates of all points and the hyperparameter settings are in pixels. The upper left corner of the image is the coordinate origin, with the rightward direction being the positive x-axis and the downward direction the positive y-axis; the point $(x_n, y_n)$ is located $x_n$ pixels to the right of the origin and $y_n$ pixels below it.
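As a worked illustration of this construction, the sketch below recovers the four corners of one parking space from the two entrance-line key points, the identification function L, and the parking space width. The sign convention for the unit normal follows the reconstruction of vector b above, so whether L = +1 corresponds to "left" or "right" in a particular image depends on that convention and should be checked against real data; the example values are illustrative.

```python
import math

def parking_space_corners(p_n, p_m, L, depth):
    """Four corners of a parking space in image (pixel) coordinates.

    p_n, p_m : the two entrance-line key points
    L        : rotation identification function, +1 or -1
    depth    : parking space width (lambda_2 for horizontal, mu_2 for vertical)
    """
    (xn, yn), (xm, ym) = p_n, p_m
    length = math.hypot(xm - xn, ym - yn)
    # Unit vector a along the entrance line and its unit normal b.
    ax, ay = (xm - xn) / length, (ym - yn) / length
    bx, by = -ay, ax
    # c and d are the normals through P_n and P_m, oriented by L.
    cx, cy = L * bx, L * by
    far_n = (xn + depth * cx, yn + depth * cy)
    far_m = (xm + depth * cx, ym + depth * cy)
    return [p_n, p_m, far_m, far_n]

# Example: entrance line 200 px long, parking space width mu_2 = 83 px (Table 2).
print(parking_space_corners((100, 300), (300, 300), L=1, depth=83))
```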
When there are only two parking space key points in the image, the parking space can be inferred directly using the method above. However, an image may contain multiple feasible parking spaces, as shown in Figure 5b, where three feasible parking spaces are present. Therefore, when YoloV11 detects multiple parking space key points, the first step is to determine the rotation direction of each; key points with the same rotation direction in the same image are grouped together. Then, all possible combinations of parking space key points are formed: if there are $N$ key points with the same rotation direction, there are $C_N^2$ possible combinations. Afterward, the combinations that meet the conditions related to Formula (1) and the length constraints are identified to form valid parking space entrance lines. Finally, the correct parking spaces are inferred in sequence.
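The enumeration over the $C_N^2$ combinations can be sketched as follows. The length thresholds and the 3-pixel collinearity tolerance used here are illustrative assumptions; in practice, the values from Table 2 would be used and the vertical/horizontal band matched by each pair would also be recorded.

```python
from itertools import combinations
import math

def candidate_entrance_lines(keypoints, t1, t2, t3, t4, tol=3.0):
    """Enumerate key point pairs with the same rotation direction, keep those
    satisfying either length constraint, and reject pairs with another key
    point lying on their segment.

    keypoints: list of (x, y, L) with L = +1 / -1 from the rotation test.
    """
    pairs = []
    for (x1, y1, L1), (x2, y2, L2) in combinations(keypoints, 2):
        if L1 != L2:
            continue
        d = math.hypot(x2 - x1, y2 - y1)
        if not (t1 <= d <= t2 or t3 <= d <= t4):
            continue
        blocked = any(
            min(x1, x2) < xi < max(x1, x2)
            and min(y1, y2) < yi < max(y1, y2)
            and abs((xi - x1) * (y2 - y1) - (yi - y1) * (x2 - x1)) / d <= tol
            for (xi, yi, _) in keypoints
            if (xi, yi) not in ((x1, y1), (x2, y2))
        )
        if not blocked:
            pairs.append(((x1, y1), (x2, y2), L1))
    return pairs

# Three collinear counterclockwise key points: the two adjacent pairs are kept,
# while the outer pair is rejected because the middle key point lies between them.
kps = [(100, 100, 1), (250, 175, 1), (400, 250, 1)]
print(candidate_entrance_lines(kps, t1=150, t2=400, t3=60, t4=120))
```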

3. Result Analysis

In this study, we compared the performance of several different algorithms, covering accuracy analysis, positioning error analysis, and an overall performance analysis.

3.1. Accuracy Analysis

In our proposed parking space detection method, there are several hyperparameters that need to be determined. First, it should be noted that the YoloV11 framework we use specifies the size of the input image; therefore, when obtaining the panoramic image of the vehicle, we first need to resize all input images to a uniform size. All hyperparameters are set empirically to be compatible with this resolution. The specific values of these hyperparameters are given in Table 2, and their meanings follow from the context above.
The parameter values in this paper are the best values obtained in the actual experiments. For example, $p$ represents the size of the target box detected by YoloV11, and $j$ represents the rotation step applied to each image during data augmentation. These parameters are selected according to the hardware of the computer and the convenience of subsequent operations; for example, $t_1$, $t_2$, and $t_3$ are determined according to the size of real parking spaces.
In parking space detection, key point detection is a crucial step. To complete this task, we propose a method based on YoloV11. In this experiment, its performance was evaluated on the ps2.0 dataset and compared with other classical object detection algorithms: HOG+SVM, ACF+Boosting, Faster R-CNN, SSD, and DETR.
In Figure 7, we present statistics on the false negatives (FN) and false positives per image (FPPI) of two classical machine learning algorithms and four deep learning-based algorithms, including YoloV11, in the identification of parking space key points. We found that, in the parking space key point detection task, the deep learning-based algorithms generally performed better on the validation set; in particular, YoloV11 and DETR were very close in accuracy.
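For reference, the two quantities plotted in Figure 7 can be computed as below. This is a generic sketch of the standard definitions (with FN expressed as a miss rate), not the authors' evaluation script, and the counts are placeholders.

```python
def detection_statistics(true_positives, false_negatives, false_positives, num_images):
    """Miss rate (share of ground-truth key points not detected) and
    false positives per image (FPPI)."""
    miss_rate = false_negatives / (true_positives + false_negatives)
    fppi = false_positives / num_images
    return miss_rate, fppi

# Placeholder counts, not results from the paper.
print(detection_statistics(true_positives=980, false_negatives=20,
                           false_positives=35, num_images=500))
```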

3.2. Positioning Error Analysis

In addition to accuracy, positioning error is an important indicator of system performance. We assume that the key points of the parking spaces are located at the center of the local feature map when inferring parking spaces, so positioning error directly affects the success rate of parking. In Table 3, we record the mean and standard deviation of the offsets of all detected true-positive targets relative to the real key points as the measure of positioning error. In this comparison, the deep learning-based algorithms again perform better in terms of positioning error, although Faster R-CNN performs relatively poorly because the prior boxes it uses are large, whereas key point detection is a small-target detection task relative to the whole image. Although DETR has the best performance in terms of accuracy, it generates target boxes automatically from the network; while this greatly simplifies the intermediate process, the original DETR paper [38] also notes that its performance on small-target detection tasks needs improvement. Compared with DETR, YoloV11 has a smaller positioning error. The reason is that YoloV11 adopts multiple YOLO heads as outputs, and different heads have different receptive fields; the optimal receptive field is selected as the output according to the size of the detection target. As a result, YoloV11 achieves more accurate positioning. At the same time, Table 4 lists the time cost of each deep model for detecting the marked points in one image. It can be seen from Table 4 that, among the deep learning-based models compared, YoloV11 requires the least computation time for the parking space key points.

3.3. Overall Performance Analysis of Different Algorithms

The YoloV11 algorithm achieves outstanding performance because, over many iterations from YoloV1 in 2015 to YoloV11 in 2024, its network structure has been adjusted repeatedly, giving YoloV11 stronger feature extraction ability and more efficient use of parameters than previous versions: accuracy is improved while the number of parameters is reduced, and NMS is replaced with a more efficient algorithm. In object detection tasks, DETR is a truly end-to-end algorithm; it does not need predefined prior boxes, and the final bounding boxes are generated entirely by the network. DETR is effective because the added Transformer enlarges the receptive field to cover the whole image, and the self-attention mechanism improves accuracy, but the localization accuracy of the target boxes is reduced at the same time. Therefore, weighing all of these advantages and disadvantages, this paper considers YoloV11 to be the best choice for detecting parking space key points.
In processing the global image and deducing the parking frame, we compared four methods, including the method proposed in this paper. Table 5 clearly shows that the accuracy of parking space inference is above 98% regardless of which method is used. The idea proposed in this paper inherits the Yolo-based key point detector and abandons the decision-tree-based marking point detector of PSD_L. The relative novelty is mainly reflected in two aspects. First, we use the latest YoloV11 as the key point detector instead of the previously used YoloV2. Second, when processing the local images, instead of using a deep learning-based method to judge the parking space category, we propose a brand-new approach: building a custom convolution kernel achieves a similar effect. In terms of computation, the deep learning approach needs five convolution operations with 3 × 3 kernels, three max-pooling operations with 3 × 3 pooling kernels, and two fully connected layers with 4096 neurons to obtain the parking space category, whereas the adopted method requires only a single convolution operation with the proposed kernel to obtain the correct result. In this paper, the final size of the binary image is 56 × 56, so in the example given here, judging the rotation direction of each parking space key point requires only 3136 multiplications and 3135 additions, which is one of the reasons why the method is ahead of the other algorithms in computational cost.
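To make the operation count explicit, a single pass of the proposed $p \times p$ kernel over the $p \times p$ binary image (with $p = 56$) costs

```latex
\text{multiplications} = p^{2} = 56^{2} = 3136, \qquad
\text{additions} = p^{2} - 1 = 3135 .
```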
Table 5 shows the time and accuracy required from obtaining the global image to inferring the parking space. It can be seen that the accuracy of the various methods all exceeds 98%, but there is a large difference in time consumption. The reason is that the method used in this paper does not continue to use a neural network to assist in inferring the parking space after obtaining the local image, as the other algorithms do; instead, a hand-designed convolution kernel replaces the second convolutional neural network, which greatly reduces the computation time required.
The reason for such a large difference in computing time is that, although the various convolutional neural networks differ in architecture, they all perform multiple rounds of feature extraction, and in order for the network to fully learn the image features, a large number of parameters is generally required. Many of these parameters contribute little at inference time, especially for simple tasks. Conversely, if the image is clear, has little noise, and its features are very distinct, traditional computer vision obtains better results. Therefore, when inferring parking spaces, this paper abandons convolutional neural networks: a single fixed-size convolution kernel is used to infer the rotation direction of the parking space directly, which is equivalent to one convolution operation in a convolutional neural network with kernel size $p$, stride $p$, and no padding. The complex neural network is thus reduced to an operation with only one convolutional layer, which greatly improves the running speed. In other words, if a function $f(A, \alpha)$ can be constructed by hand to approximate the function $g(A, \beta)$ learned by the neural network, i.e., $f(A, \alpha) \approx g(A, \beta)$, where $A$ is the input image matrix and $\alpha$ and $\beta$ are $n$-dimensional parameter vectors, then this paper argues that choosing $f(A, \alpha)$ can often save significant computation time while providing approximately the same result.

3.4. Experimental Hardware

The parking space inference was performed on both a laptop platform and a desktop platform. The laptop platform used an Intel i5-12450H CPU at 2.0 GHz with 16 GB RAM, equipped with an NVIDIA RTX 4060 GPU with 8 GB of VRAM; for inference on 100 images with a resolution of 432 × 432, the average time per image was 16.82 ms. The desktop platform used an Intel Core i7-8700 CPU at 3.2 GHz with 32 GB RAM, equipped with an NVIDIA RTX 4060 GPU with 8 GB of VRAM; for inference on 100 images with a resolution of 432 × 432, the average time per image was 12.3 ms. All experimental data in Table 5 are based on the results obtained on the laptop platform described above.

4. Conclusions

This paper presents a two-stage parking space detection system. The core contribution of this method is that it integrates deep learning and traditional computer vision and proposes a special convolution kernel to judge the direction of the parking space. The method maintains the accuracy of parking space inference while greatly shortening the inference time. The experimental results on the ps2.0 dataset show that the proposed method is a staged, fully functional parking space detection algorithm that achieves accuracy comparable to DeepPS [42]. Compared with various state-of-the-art methods, the proposed method has a shorter computing time and lower hardware requirements, making it more suitable for deployment on embedded devices.
To fully consider the impact of complex environments on parking space detection, the system is designed as a two-stage detection method. In the first stage, the YOLOv11 network with its complex structure was used to locate key points of the parking spaces. Extensive experiments on the ps2.0 dataset showed good results in five different scenarios: indoor parking lot, outdoor normal daylight, outdoor rainy, outdoor shadow, and outdoor streetlight. In the second stage, the processed images were resized to 56 × 56 pixels, retaining only the necessary information for detection. By first reducing the complex image to a smaller size and localizing relevant features, the algorithm avoids interference from complex environments. A simple convolution kernel is then used to infer the parking space.
However, the performance of parking space inference decreases in more complex environments, such as when parking spaces are occluded. Additionally, the algorithm struggles to accurately detect specific parking spaces, such as inclined spaces. For non-standard parking space scenarios, it is first necessary to determine whether the parking space is a standard type or a special type based on the angle of the parking space. If it is a standard type parking space, the angle will be 90°; if it is a non-standard type parking space, the angle will not be 90°. To design a convolution kernel suitable for all types of parking spaces, an additional step must be added during parking space inference. Specifically, computer vision techniques should be used on the binary image to calculate the angle of the parking space, and the convolution kernel used to detect the parking space’s orientation should be dynamically adjusted based on the calculated angle. This will be addressed in future research.

Author Contributions

Conceptualization, C.Q. and Z.X.; methodology, Z.X., W.K. and J.J.; validation, W.K. and Z.X.; formal analysis, W.K.; investigation, W.K., J.J. and R.T.; data curation, R.T. and Z.X.; writing—original draft preparation, W.K.; writing—review and editing, R.T. and Z.X.; visualization, C.Q. and Z.X.; supervision, R.T. and C.Q.; project administration, J.J.; funding acquisition, R.T. and C.Q. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Hainan University Science Startup Foundation under Grant RZ2300002364 and the National Natural Science Foundation of China under Grant 32060413. The authors would also like to express their sincere thanks to the professors who provided guidance for this research.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data used in this study can be obtained from the corresponding authors.

Acknowledgments

The authors want to thank the editor and the anonymous reviewers for their valuable suggestions for improving this paper.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. De Luelmo, S.P.; Garcia-Espinosa, F.J.; Montemayor, A.S.; José Pantrigo, J. Combining deep learning methods and rule-based systems for automatic parking space detection. Integr. Comput. Aided Eng. 2025, 32, 95–106.
2. Al-Absi, H.R.; Devaraj, J.D.D.; Sebastian, P.; Voon, Y.V. Vision-based automated parking system. In Proceedings of the 10th International Conference on Information Science, Signal Processing and their Applications (ISSPA 2010), Kuala Lumpur, Malaysia, 10–13 May 2010; IEEE: Piscataway, NJ, USA, 2010; pp. 757–760.
3. Wong, G.S.; Goh, K.O.M.; Tee, C.; Md Sabri, A.Q. Review of vision-based deep learning parking slot detection on surround view images. Sensors 2023, 23, 6869.
4. Li, L.; Li, C.; Zhang, Q.; Guo, T.; Miao, Z. Automatic parking slot detection based on around view monitor (AVM) systems. In Proceedings of the 9th International Conference on Wireless Communications and Signal Processing (WCSP), Nanjing, China, 11–13 October 2017; pp. 1–6.
5. Jo, Y.G.; Hong, S.H.; Hwang, S.S.; Ha, J.M. Fisheye Lens Camera based Autonomous Valet Parking System. arXiv 2021, arXiv:2104.13119.
6. Jung, H.G.; Kim, D.S.; Yoon, P.J.; Kim, J. Parking slot markings recognition for automatic parking assist system. In Proceedings of the IEEE Intelligent Vehicles Symposium, Tokyo, Japan, 13–15 June 2006; pp. 106–113.
7. Sonka, M.; Hlavac, V.; Boyle, R. Image Processing, Analysis, and Machine Vision; Thomson-Engineering: Atlanta, GA, USA, 2008.
8. Wang, C.; Zhang, H.; Yang, M.; Wang, X.; Ye, L.; Guo, C. Automatic parking based on a bird’s eye view vision system. Adv. Mech. Eng. 2014, 6, 847406.
9. Deans, S.R. The Radon Transform and Some of Its Applications; Dover: New York, NY, USA, 1983.
10. Kumar, V.R.; Eising, C.; Witt, C.; Yogamani, S.K. Surround-view fisheye camera perception for automated driving: Overview, survey & challenges. IEEE Trans. Intell. Transp. Syst. 2023, 24, 3638–3659.
11. Hamada, K.; Hu, Z.; Fan, M.; Chen, H. Surround view based parking lot detection and tracking. In Proceedings of the IEEE Intelligent Vehicles Symposium, Seoul, Republic of Korea, 28 June–1 July 2015; pp. 1106–1111.
12. Matas, J.; Galambos, C.; Kittler, J. Robust detection of lines using the progressive probabilistic Hough transform. Comput. Vis. Image Understand. 2000, 78, 119–137.
13. Suhr, J.K.; Jung, H.G. Automatic parking space detection and tracking for underground and indoor environments. IEEE Trans. Ind. Electron. 2016, 63, 5687–5698.
14. Suhr, J.K.; Jung, H.G. Full-automatic recognition of various parking slot markings using a hierarchical tree structure. Opt. Eng. 2013, 52, 037203.
15. Suhr, J.K.; Jung, H.G. Sensor fusion-based vacant parking slot detection and tracking. IEEE Trans. Intell. Transp. Syst. 2014, 15, 21–36.
16. Harris, C.; Stephens, M. A combined corner and edge detector. In Proceedings of the 4th Alvey Vision Conference, Manchester, UK, 31 August–2 September 1988; pp. 147–151.
17. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. In Proceedings of the Advances in Neural Information Processing Systems, Lake Tahoe, NV, USA, 3–6 December 2012; pp. 1097–1105.
18. Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. In Proceedings of the International Conference on Learning Representations, San Diego, CA, USA, 7–9 May 2015; pp. 1–14.
19. Vedaldi, A.; Zisserman, A. VGG Convolutional Neural Networks Practical; Department of Engineering Science, University of Oxford: Oxford, UK, 2016; p. 66.
20. Rajinikanth, V.; Joseph Raj, A.N.; Thanaraj, K.P.; Naik, G.R. A customized VGG19 network with concatenation of deep and handcrafted features for brain tumor detection. Appl. Sci. 2020, 10, 3429.
21. Thakur, N.; Bhattacharjee, E.; Jain, R.; Acharya, B.; Hu, Y.C. Deep learning-based parking occupancy detection framework using ResNet and VGG-16. Multimed. Tools Appl. 2024, 83, 1941–1964.
22. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
23. Koonce, B.; Koonce, B.E. Convolutional Neural Networks with Swift for Tensorflow: Image Recognition and Dataset Categorization; Apress: New York, NY, USA, 2021.
24. Wu, Z.; Shen, C.; Van Den Hengel, A. Wider or deeper: Revisiting the ResNet model for visual recognition. Pattern Recognit. 2019, 90, 119–133.
25. Zhu, Y.; Newsam, S. DenseNet for dense flow. In Proceedings of the 2017 IEEE International Conference on Image Processing (ICIP), Beijing, China, 17–20 September 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 790–794.
26. Huang, G.; Liu, S.; Van der Maaten, L.; Weinberger, K.Q. CondenseNet: An efficient DenseNet using learned group convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 2752–2761.
27. Rajyalakshmi, V.; Lakshmanna, K. Detection of car parking space by using Hybrid Deep DenseNet Optimization algorithm. Int. J. Netw. Manag. 2024, 34, e2228.
28. Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 580–587.
29. Girshick, R. Fast R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 1440–1448.
30. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149.
31. Yasaswi, B. AOA based masked region-CNN model for detection of parking space in IoT environment. Int. Res. J. Multidiscip. Technovation 2024, 6, 97–108.
32. Zhu, Y.M.; Abdalla, A.; Tang, Z.; Cen, H.Y. Improving rice nitrogen stress diagnosis by denoising strips in hyperspectral images via deep learning. Biosyst. Eng. 2022, 219, 165–176.
33. Chaudhry, M.; Maurya, S.; Singh, S.P.; Yadav, S. An Analysis of Deep Learning-Based Studies on Object Detection. In Proceedings of the 2023 3rd International Conference on Technological Advancements in Computational Sciences (ICTACS), Tashkent, Uzbekistan, 1–3 November 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 134–138.
34. Wang, C.Y.; Bochkovskiy, A.; Liao, H.Y.M. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 7464–7475.
35. Huang, Y.; Huang, H.; Qin, F.; Chen, Y.; Zou, J.; Liu, B.; Qiao, X. YOLO-IAPs: A Rapid Detection Method for Invasive Alien Plants in the Wild Based on Improved YOLOv9. Agriculture 2024, 14, 2201.
36. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. SSD: Single shot multibox detector. In Proceedings of the Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, 11–14 October 2016; Springer International Publishing: Amsterdam, The Netherlands, 2016; pp. 21–37.
37. Zhai, S.; Shang, D.; Wang, S.; Dong, S. DF-SSD: An improved SSD object detection algorithm based on DenseNet and feature fusion. IEEE Access 2020, 8, 24344–24357.
38. Carion, N.; Massa, F.; Synnaeve, G.; Usunier, N.; Kirillov, A.; Zagoruyko, S. End-to-end object detection with transformers. In Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; pp. 213–229.
39. Min, C.; Xu, J.; Xiao, L.; Zhao, D.; Nie, Y.; Dai, B. Attentional graph neural network for parking-slot detection. IEEE Robot. Autom. Lett. 2021, 6, 3445–3450.
40. Khanam, R.; Hussain, M. YOLOv11: An overview of the key architectural enhancements. arXiv 2024, arXiv:2410.17725.
41. Alif, M.A.R. YOLOv11 for Vehicle Detection: Advancements, Performance, and Applications in Intelligent Transportation Systems. arXiv 2024, arXiv:2410.22898.
42. Zhang, L.; Huang, J.; Li, X.; Xiong, L. Vision-based parking-slot detection: A DCNN-based approach and a large-scale benchmark dataset. IEEE Trans. Image Process. 2018, 27, 5350–5364.
Figure 1. (a) Vertical parking space (b) Horizontal parking space. (c) The input image.
Figure 2. (a) The original image; (b) image generated by rotating it 60 degrees and increasing the brightness to 1.5 times that of the original image. (c) training image.
Figure 3. Flow chart.
Figure 4. Schematic diagram of key points of parking space. (Note: the target box of the key point of the parking space identified by the YoloV11 detector is marked with a red square box, the black dot is the key point of the parking space, the key point of the parking space is in the center of the red box, and the yellow dashed line represents the entrance line of the parking space.) (a) Correct vertical parking space entrance line; (b) correct level parking space entrance line. (c) The first erroneous inference; (d) the second erroneous inference.
Figure 5. Schematic diagram of clockwise and counterclockwise layouts. (a) A clockwise vertical parking space; (b) a counterclockwise vertical parking space; (c) a clockwise horizontal parking space; (d) a counterclockwise horizontal parking space.
Figure 6. (a) The convolution kernel proposed for judging the rotation direction of the parking space; (b) the target box containing the key points extracted by YoloV11 and enlarged by linear interpolation to a resolution of 126 × 126, with noise disturbance at the parking space boundary; (c) the result of 7 × 7 average filtering applied to (b), which makes the image smoother; (d) binary image of a clockwise-rotated vertical parking space; (e) binary image of a counterclockwise-rotated vertical parking space; (f) a 10 × 10 matrix simulating the binary image in (e).
Figure 7. Performance comparison of multiple target detection algorithms. (a) Performance comparison of four target detection algorithms based on neural networks. (b) Two traditional machine learning algorithms.
Figure 7. Performance comparison of multiple target detection algorithms. (a) Performance comparison of four target detection algorithms based on neural networks. (b) Two traditional machine learning algorithms.
Applsci 15 01004 g007
Table 1. Actual data label table corresponding to the input image (Figure 1c).

| Target Classification | Target Center X | Target Center Y | Target Width | Target Height |
|---|---|---|---|---|
| 0 | 0.212963 | 0.197090 | 0.042328 | 0.042328 |
| 0 | 0.212963 | 0.328042 | 0.042328 | 0.042328 |
| 0 | 0.212963 | 0.609127 | 0.042328 | 0.042328 |
| 1 | 0.089286 | 0.472884 | 0.175926 | 0.284392 |
Table 2. Parameter assignment table.

| Parameter | Value | Parameter | Value |
|---|---|---|---|
| $p$ | 56 Pixel | $\lambda_2$ | 30° |
| $\delta$ | 71.5% | $t_1$ | 195 Pixel |
| $\mu_2$ | 83 Pixel | $t_3$ | 160 Pixel |
| $t_2$ | 279 Pixel | — | 86 Pixel |
| $t_4$ | 139 Pixel | $J$ | 0.5–1.5 |
Table 3. Comparison of the offset between the key point and the actual label.

| Method | Localization Error (in Pixel) | Localization Error (in cm) |
|---|---|---|
| HoG+SVM | 4.03 ± 1.98 | 6.72 ± 3.30 |
| ACF+Boosting | 2.86 ± 1.54 | 4.77 ± 2.57 |
| Faster-RCNN | 3.67 ± 2.32 | 6.12 ± 3.87 |
| SSD | 1.51 ± 1.17 | 2.52 ± 1.95 |
| DETR | 1.76 ± 1.43 | 3.52 ± 2.16 |
| YoloV11-based | 1.12 ± 1.05 | 1.46 ± 1.75 |
Table 4. Comparison of calculation efficiency.

| Method | Time Cost (ms) |
|---|---|
| Faster-RCNN | 63.7 |
| SSD | 27.1 |
| DETR | 106.4 |
| YoloV11 | 9.3 |
Table 5. A comparison of the time and accuracy required to deduce the parking space.

| Method | Precision Rate | Time Cost (ms) |
|---|---|---|
| PSD_L | 98.55% | 269.4 |
| PSD_LLIP | 98.19% | 138.1 |
| DeepPS | 99.54% | 95.6 |
| SecondStagePS (proposed) | 98.24% | 12.3 |