This section describes approaches for predicting inference latency during object detection in edge computing. In an edge computing environment consisting of a client, an edge, and a cloud, the two proposed methods approximately predict the inference latency of a task across devices and object detection algorithms by using statistical information. Through experiments on object detection, we found that the inference latency is not proportional to the image size but is instead determined by the characteristics of the image. In addition, we confirmed that the trend of the inference latency varies depending on the object detection algorithm in use. Therefore, it is difficult to predict the inference latency of the edge and cloud using only the image size. Through various analyses, we observed a statistical similarity between the inference latencies of different devices and of different object detection algorithms. Using this characteristic, the client can predict the inference latency of the edge and cloud based on their statistical information on inference latency. The predicted inference latency can then be used in a scheduling step that determines whether to perform the task locally or remotely on the edge or cloud. For visualization, the inference latency of object detection was measured for 50 image samples selected from the 5000 images, and the results are presented in the figures below.
4.1. Inference Latency According to Image Size
This section presents the relationship between image size and inference latency when using the RFCN and SSD-MobileNet models for object detection. In [21], an approach was considered in which the inference latency increases in proportion to the image size. However, this assumption does not hold for all applications, and an approach suited to the characteristics of each specific application should be considered.
We measured the inference latency for images of various sizes to analyze the relationship between image size and inference time for the RFCN and SSD-MobileNet models. As shown in Figure 4, image size and inference latency are uncorrelated, which shows that it is difficult to predict inference latency from image size alone in object detection.
For this reason, the client cannot predict the inference latency required for object detection on the edge and cloud devices and therefore cannot determine whether to execute the task locally or remotely on the edge or cloud. In other words, when exploiting object detection services in edge computing, an additional process for predicting the inference latency at the edge and cloud should be considered.
4.3. Proposed Inference Latency Prediction between Devices
This section introduces the first method (Method 1), wherein the client predicts the inference latency of the edge and cloud for a given inference model. To analyze the similarity in inference latency between devices, we compare the inference latencies of the devices by using the normalized inference latency, which standardizes the distribution of the inference latency to zero mean and unit variance. Let $X$ be the distribution of the inference latency for each device. Then, the distribution of the normalized inference latency $Z$ is obtained as

$$Z = \frac{X - \mu}{\sigma}, \tag{1}$$

where $\mu$ and $\sigma$ are the mean and standard deviation of the inference latency of each device over the 5000 images. The notations used in this paper are given in Table 2. From now on, the device is indicated as a subscript under the symbols of the mean and standard deviation; for example, $\mu_{\mathrm{cloud}}$ and $\sigma_{\mathrm{cloud}}$ are the mean and standard deviation for the cloud device, respectively. The statistical information is listed in Table 3 and Table 4 for the RFCN and SSD-MobileNet models, respectively.
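For concreteness, the normalization in Equation (1) can be sketched in a few lines of Python; the latency samples and helper names below are illustrative assumptions, not part of the measurement framework used in this paper.

```python
import numpy as np

def latency_stats(latencies):
    """Compute the mean and standard deviation used in Equation (1)."""
    samples = np.asarray(latencies, dtype=float)
    return samples.mean(), samples.std()

def normalize(latencies, mu, sigma):
    """Convert raw inference latencies X into normalized latencies Z = (X - mu) / sigma."""
    return (np.asarray(latencies, dtype=float) - mu) / sigma

# Hypothetical per-device latency samples (seconds) measured over a calibration set.
client_latency = [0.82, 0.91, 0.78, 1.05, 0.88]
mu_client, sigma_client = latency_stats(client_latency)
z_client = normalize(client_latency, mu_client, sigma_client)
print(f"mu={mu_client:.3f}, sigma={sigma_client:.3f}, Z={z_client.round(2)}")
```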
We compare the similarities of the normalized inference latency values by reflecting the $\mu$ and $\sigma$ of the inference latency for the client, edge, and cloud devices. Figure 7 and Figure 8 show the normalized inference latency with the RFCN and SSD-MobileNet models for the client, edge, and cloud devices, respectively. As shown in Figure 7 and Figure 8, the normalized inference latency values are similar across the client, edge, and cloud devices. Moreover, the similarity between the devices is revealed more clearly with REST than with gRPC. Therefore, the inference latency of the client can be utilized to predict the inference latency of the edge and cloud.
Considering the similarity in the normalized inference latency between the client, edge, and cloud devices, we propose a method for predicting the inference latency at the edge and cloud from the client. The predicted inference latency for the edge and cloud in object detection can be obtained by using the normalized inference latency of the client and the statistical information of the edge and cloud, as shown in Figure 9.

Let $Z_{\mathrm{client}}$ be the distribution of the normalized inference latency for the client device. This distribution can easily be obtained by executing object detection on the client. Then, the distributions of the predicted inference latency for the edge and cloud, $\hat{X}_{\mathrm{edge}}$ and $\hat{X}_{\mathrm{cloud}}$, can be calculated as

$$\hat{X}_{\mathrm{edge}} = \sigma_{\mathrm{edge}} Z_{\mathrm{client}} + \mu_{\mathrm{edge}}, \tag{2}$$
$$\hat{X}_{\mathrm{cloud}} = \sigma_{\mathrm{cloud}} Z_{\mathrm{client}} + \mu_{\mathrm{cloud}}. \tag{3}$$

We assume that the client is informed of the $\mu$ and $\sigma$ of the edge and cloud in advance. To employ the proposed technique, a process of calculating the $\mu$ and $\sigma$ for the edge and cloud according to the application and dataset is required.
From Equations (2) and (3), the client can predict the inference latency of the edge and cloud. The predicted inference latencies for the edge and cloud serve as important parameters in determining whether the client executes the object detection task locally or remotely. In latency-sensitive applications such as autonomous driving, both the communication latency and the inference latency must be taken into account. Therefore, the proposed inference latency prediction technique is expected to play an important role in various applications.
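A minimal Python sketch of Equations (2) and (3) follows; the `predict_remote_latency` helper and all numeric statistics are hypothetical placeholders standing in for the values a client would receive from the edge and cloud.

```python
def predict_remote_latency(z_client: float, mu_remote: float, sigma_remote: float) -> float:
    """Equations (2) and (3): X_hat = sigma_remote * Z_client + mu_remote."""
    return sigma_remote * z_client + mu_remote

# Hypothetical statistics received from the edge and cloud in advance.
z_client = 0.6  # normalized latency of the current workload, measured on the client
x_hat_edge = predict_remote_latency(z_client, mu_remote=0.35, sigma_remote=0.04)
x_hat_cloud = predict_remote_latency(z_client, mu_remote=0.21, sigma_remote=0.03)
print(f"predicted edge latency:  {x_hat_edge:.3f} s")
print(f"predicted cloud latency: {x_hat_cloud:.3f} s")
```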
Additionally, the inference latency of the client and cloud can be predicted at the edge. Let $Z_{\mathrm{edge}}$ be the distribution of the normalized inference latency for the edge device. Then, the distributions of the predicted inference latency for the client and cloud, $\hat{X}_{\mathrm{client}}$ and $\hat{X}_{\mathrm{cloud}}$, can be computed as

$$\hat{X}_{\mathrm{client}} = \sigma_{\mathrm{client}} Z_{\mathrm{edge}} + \mu_{\mathrm{client}}, \tag{4}$$
$$\hat{X}_{\mathrm{cloud}} = \sigma_{\mathrm{cloud}} Z_{\mathrm{edge}} + \mu_{\mathrm{cloud}}. \tag{5}$$
When it is difficult for the client to determine which device should execute the task, the edge and cloud can likewise predict the inference latency of the other devices for scheduling.
The steps of the proposed methods are as follows. First, the client searches for nearby edge and cloud devices (Step 1) and receives their statistical information on inference latency (Step 2). Then, the client predicts the inference latency from $Z_{\mathrm{client}}$ and the received statistical information of the edge and cloud devices using Equations (2) and (3) (Step 3). Considering the predicted inference latency and the communication latency of the edge and cloud devices, the client derives an offloading policy that determines which device to offload to, as in [21] (Step 4). Finally, the workload is offloaded to the selected device, which executes object detection (Step 5) and updates its statistics based on the actual inference latency (Step 6).
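The decision logic of Steps 3 and 4 might look like the following sketch, assuming device discovery, statistics exchange, and the actual offloading are handled elsewhere; the device names, statistics, and communication latencies are illustrative.

```python
def choose_device(z_client, stats, comm_latency):
    """Pick the device minimizing predicted inference latency plus communication latency.

    stats:        {device: (mu, sigma)} inference latency statistics (Step 2)
    comm_latency: {device: round-trip communication latency in seconds}
    """
    best_device, best_total = None, float("inf")
    for device, (mu, sigma) in stats.items():
        total = sigma * z_client + mu + comm_latency.get(device, 0.0)  # Step 3
        if total < best_total:
            best_device, best_total = device, total
    return best_device, best_total  # Step 4: offload the task to best_device

stats = {"client": (0.95, 0.10), "edge": (0.35, 0.04), "cloud": (0.21, 0.03)}
comm = {"client": 0.0, "edge": 0.02, "cloud": 0.12}
device, predicted_total = choose_device(z_client=0.6, stats=stats, comm_latency=comm)
print(f"offload target: {device} (predicted total latency {predicted_total:.3f} s)")
```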
4.4. Proposed Inference Latency Prediction between Object Detection Algorithms
This section describes the second method (Method 2), wherein the client with the SSD-MobileNet model predicts the inference latency of the RFCN model for the client, edge, and cloud. Predicting the inference latency of the RFCN model through the SSD-MobileNet model is very efficient because the SSD-MobileNet model runs much faster than the RFCN model, as explained in Section 3.2. To analyze the relationship between the inference latencies of the RFCN and SSD-MobileNet models, we compare the correlation coefficients $\rho$ between the normalized inference latency of the SSD-MobileNet model for the client, $Z_{\mathrm{client}}^{\mathrm{SSD}}$, and the normalized inference latencies of the RFCN model for the client, edge, and cloud, i.e., $Z_{\mathrm{client}}^{\mathrm{RFCN}}$, $Z_{\mathrm{edge}}^{\mathrm{RFCN}}$, and $Z_{\mathrm{cloud}}^{\mathrm{RFCN}}$, as listed in Table 5. The two protocols show somewhat different characteristics. In gRPC, $Z_{\mathrm{client}}^{\mathrm{SSD}}$ is observed to have a negative correlation with $Z_{\mathrm{client}}^{\mathrm{RFCN}}$, $Z_{\mathrm{edge}}^{\mathrm{RFCN}}$, and $Z_{\mathrm{cloud}}^{\mathrm{RFCN}}$, whereas in REST, $Z_{\mathrm{client}}^{\mathrm{SSD}}$ is observed to have a negative correlation with $Z_{\mathrm{client}}^{\mathrm{RFCN}}$ and $Z_{\mathrm{edge}}^{\mathrm{RFCN}}$ and a positive correlation with $Z_{\mathrm{cloud}}^{\mathrm{RFCN}}$. Therefore, it is possible to predict the inference latency of the RFCN model through that of the SSD-MobileNet model by exploiting these correlation characteristics.
We compared the relationship between the normalized inference latencies of the SSD-MobileNet and RFCN models by considering a factor, $\alpha$, to reflect the sign of the correlation. For the positive and negative correlations, $\alpha$ was set to +1 and −1, respectively. Then, the normalized inference latencies of the RFCN model for the client, edge, and cloud, $\hat{Z}_{\mathrm{client}}^{\mathrm{RFCN}}$, $\hat{Z}_{\mathrm{edge}}^{\mathrm{RFCN}}$, and $\hat{Z}_{\mathrm{cloud}}^{\mathrm{RFCN}}$, are predicted as $\alpha Z_{\mathrm{client}}^{\mathrm{SSD}}$. Therefore, with the gRPC protocol, the normalized inference latencies of the RFCN model for the client, edge, and cloud are all predicted as $-Z_{\mathrm{client}}^{\mathrm{SSD}}$.
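As an illustration, the sign factor could be derived from calibration measurements as in the following sketch; the sample values and the use of NumPy are assumptions for exposition, not the paper's implementation.

```python
import numpy as np

# Hypothetical calibration samples of normalized latencies for the two models.
z_ssd_client = np.array([0.4, -1.2, 0.9, 0.1, -0.3])
z_rfcn_edge = np.array([-0.5, 1.1, -0.8, -0.2, 0.4])

rho = np.corrcoef(z_ssd_client, z_rfcn_edge)[0, 1]  # correlation coefficient (cf. Table 5)
alpha = 1.0 if rho >= 0 else -1.0                   # +1 / -1 according to the sign of rho
z_rfcn_edge_hat = alpha * z_ssd_client              # predicted normalized RFCN latency
print(f"rho={rho:.2f}, alpha={alpha:+.0f}")
```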
Figure 10 shows the normalized inference latencies of the RFCN and SSD-MobileNet models for the client under the REST protocol. It can be observed that the normalized inference latencies of the client for the RFCN and SSD-MobileNet models have a negative correlation, as shown in Figure 10a. By setting $\alpha$ to −1, the normalized inference latencies of the client become similar for the RFCN and SSD-MobileNet models, as shown in Figure 10b.
Figure 11 shows the normalized inference latencies of the RFCN model for the edge device and the SSD-MobileNet model for the client device under the REST protocol. As with the client device, the normalized inference latencies of the RFCN model for the edge device and the SSD-MobileNet model for the client device have a negative correlation, as shown in Figure 11a. Therefore, the normalized inference latency of the RFCN model for the edge device can be approximately predicted by multiplying the normalized inference latency of the SSD-MobileNet model for the client by −1, as shown in Figure 11b. Similarly, the normalized inference latency of the RFCN model for the cloud device can be predicted from the normalized inference latency of the SSD-MobileNet model for the client device.
Considering these similar characteristics of the normalized inference latency between devices, the actual inference latency of the RFCN model for the client, edge, and cloud can be predicted based on their statistical information, $\mu^{\mathrm{RFCN}}$ and $\sigma^{\mathrm{RFCN}}$. First, the SSD-MobileNet model is executed on the client to obtain $Z_{\mathrm{client}}^{\mathrm{SSD}}$, and $\alpha$ is determined by considering the correlation coefficients of the normalized inference latency between the RFCN and SSD-MobileNet models. Then, the distributions of the predicted inference latency for the client, edge, and cloud, $\hat{X}_{\mathrm{client}}^{\mathrm{RFCN}}$, $\hat{X}_{\mathrm{edge}}^{\mathrm{RFCN}}$, and $\hat{X}_{\mathrm{cloud}}^{\mathrm{RFCN}}$, can be calculated as

$$\hat{X}_{\mathrm{client}}^{\mathrm{RFCN}} = \sigma_{\mathrm{client}}^{\mathrm{RFCN}} \, \alpha Z_{\mathrm{client}}^{\mathrm{SSD}} + \mu_{\mathrm{client}}^{\mathrm{RFCN}}, \tag{6}$$
$$\hat{X}_{\mathrm{edge}}^{\mathrm{RFCN}} = \sigma_{\mathrm{edge}}^{\mathrm{RFCN}} \, \alpha Z_{\mathrm{client}}^{\mathrm{SSD}} + \mu_{\mathrm{edge}}^{\mathrm{RFCN}}, \tag{7}$$
$$\hat{X}_{\mathrm{cloud}}^{\mathrm{RFCN}} = \sigma_{\mathrm{cloud}}^{\mathrm{RFCN}} \, \alpha Z_{\mathrm{client}}^{\mathrm{SSD}} + \mu_{\mathrm{cloud}}^{\mathrm{RFCN}}, \tag{8}$$

where $\mu^{\mathrm{RFCN}}$ and $\sigma^{\mathrm{RFCN}}$ are the mean and standard deviation of the inference latency of the RFCN model for each device. It is assumed that the $\mu^{\mathrm{RFCN}}$ and $\sigma^{\mathrm{RFCN}}$ for all devices are known in advance on the client and are updated periodically. The correlation characteristics of the inference latency for each inference model and the analysis and delivery of the statistical information are outside the scope of this paper; we will study these topics further in the future. Because the inference latency of the RFCN model for the client, edge, and cloud devices can be predicted from the normalized inference latency of the SSD-MobileNet model on the client device, the comparatively long inference latency of the RFCN model can be predicted at the cost of only a short SSD-MobileNet inference.
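A minimal sketch of Equations (6)-(8) is given below, assuming the per-device RFCN statistics are already known on the client; all names and numeric values are illustrative placeholders.

```python
def predict_rfcn_latency(z_ssd_client, alpha, mu_rfcn, sigma_rfcn):
    """Equations (6)-(8): X_hat = sigma_RFCN * (alpha * Z_SSD_client) + mu_RFCN."""
    return sigma_rfcn * alpha * z_ssd_client + mu_rfcn

# Hypothetical RFCN statistics (mu, sigma) per device, known in advance on the client.
rfcn_stats = {"client": (3.10, 0.40), "edge": (1.20, 0.15), "cloud": (0.80, 0.10)}
alpha = -1.0        # all three correlations are negative with gRPC, so alpha = -1
z_ssd_client = 0.6  # normalized latency from one fast SSD-MobileNet inference

predictions = {device: predict_rfcn_latency(z_ssd_client, alpha, mu, sigma)
               for device, (mu, sigma) in rfcn_stats.items()}
print(predictions)  # predicted RFCN inference latency per device (seconds)
```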