Author Contributions
Conceptualization, X.W., L.C. and S.Z.; Methodology, X.W.; Software, X.W.; Formal analysis, X.W., L.C., S.Z., Y.J., L.T. and Y.Z.; Data curation, X.W.; Writing—original draft preparation, X.W.; Writing—review and editing, X.W., L.C., S.Z., Y.J., L.T. and Y.Z.; Visualization, X.W.; Supervision, L.C. and S.Z. All authors have read and agreed to the published version of the manuscript.
Figure 1.
Four fire indicators for China: fire data from recent years, including the number of accidents, economic losses, deaths, and injuries.
Figure 2.
The YOLOv5 structure follows the same overall framework as the rest of the YOLO series. It comprises the Input terminal, Backbone, Neck, and Output terminal.
Figure 3.
Diagram of the FPN+PAN sampling structure.
Figure 4.
Block diagram of IoU.
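As a quick reference for the quantity in Figure 4, here is a minimal sketch of the IoU computation for two axis-aligned boxes (the function name and the (x1, y1, x2, y2) box format are illustrative):

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    # Intersection rectangle (empty if the boxes do not overlap)
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    # Union = area A + area B - intersection
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# Two partially overlapping boxes: intersection 25, union 175
print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # ~0.143
```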
Figure 5.
The overall network structure of CAGSA-YOLO. White blocks are parts of the original model that remain unchanged; gray blocks are the parts improved in this study.
Figure 6.
The overall framework of CARAFE (Content-Aware Reassembly of Features). CARAFE consists of a kernel prediction module and a content-aware reassembly module.
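To make the two modules concrete, the following is a minimal PyTorch sketch of CARAFE at the usual scale factor of 2. It follows the published formulation (channel compression, kernel prediction with an encoder of kernel size k_enc, softmax normalization, and reassembly over a k_up × k_up neighborhood), but the class and argument names are illustrative rather than the authors' exact implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CARAFE(nn.Module):
    """Sketch of CARAFE: kernel prediction + content-aware reassembly."""
    def __init__(self, c, c_mid=64, scale=2, k_enc=3, k_up=5):
        super().__init__()
        self.scale, self.k_up = scale, k_up
        # Kernel prediction module: compress channels, then predict one
        # k_up x k_up reassembly kernel for every upsampled position.
        self.compress = nn.Conv2d(c, c_mid, kernel_size=1)
        self.encoder = nn.Conv2d(c_mid, (scale * k_up) ** 2,
                                 kernel_size=k_enc, padding=k_enc // 2)

    def forward(self, x):
        b, c, h, w = x.shape
        s, k = self.scale, self.k_up
        # Predict and normalize the reassembly kernels.
        ker = F.pixel_shuffle(self.encoder(self.compress(x)), s)  # (b, k^2, sh, sw)
        ker = F.softmax(ker, dim=1)
        # Gather the k x k source neighborhood around each position ...
        nbr = F.unfold(x, k, padding=k // 2).view(b, c * k * k, h, w)
        # ... and repeat it to the target resolution (nearest mapping).
        nbr = F.interpolate(nbr, scale_factor=s, mode="nearest")
        nbr = nbr.view(b, c, k * k, h * s, w * s)
        # Content-aware reassembly: weighted sum over each neighborhood.
        return (ker.unsqueeze(1) * nbr).sum(dim=2)  # (b, c, sh, sw)

# Example: upsample a 20 x 20 feature map to 40 x 40.
y = CARAFE(c=64)(torch.randn(1, 64, 20, 20))
print(y.shape)  # torch.Size([1, 64, 40, 40])
```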
Figure 7.
Ghost convolution frame diagram. (a) Ordinary convolution; (b) Ghost convolution. In Ghost convolution, part of the feature map is generated by an ordinary convolution with fewer kernels, the remaining part is generated by cheap linear operations, and the two parts are concatenated, increasing the channels and expanding the features at low cost.
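A minimal PyTorch sketch of this idea, assuming the common YOLOv5-style variant in which half of the output channels come from the primary convolution and the cheap operation is a 5 × 5 depthwise convolution (names are illustrative):

```python
import torch
import torch.nn as nn

class GhostConv(nn.Module):
    """Sketch of Ghost convolution: primary conv + cheap depthwise 'ghosts'."""
    def __init__(self, c_in, c_out, k=1, s=1):
        super().__init__()
        c_ = c_out // 2  # intrinsic channels from the primary convolution
        self.primary = nn.Sequential(
            nn.Conv2d(c_in, c_, k, s, k // 2, bias=False),
            nn.BatchNorm2d(c_), nn.SiLU())
        # Cheap operation: a 5x5 depthwise conv generates the ghost maps
        self.cheap = nn.Sequential(
            nn.Conv2d(c_, c_, 5, 1, 2, groups=c_, bias=False),
            nn.BatchNorm2d(c_), nn.SiLU())

    def forward(self, x):
        y = self.primary(x)
        # Concatenate intrinsic and ghost feature maps to reach c_out channels
        return torch.cat([y, self.cheap(y)], dim=1)

# Example: expand 64 -> 128 channels at roughly half the usual convolution cost
print(GhostConv(64, 128)(torch.randn(1, 64, 40, 40)).shape)  # (1, 128, 40, 40)
```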
Figure 8.
Overall framework of SA (shuffle attention). The channel dimension is first grouped into multiple sub-features; the sub-features are then processed by the complementary channel and spatial attention modules and recombined through the shuffle unit.
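A minimal PyTorch sketch of this grouping-and-shuffle scheme, following the published SA formulation with an assumed group count of 8 (class and parameter names are illustrative):

```python
import torch
import torch.nn as nn

class ShuffleAttention(nn.Module):
    """Sketch of SA: per-group channel + spatial attention, then shuffle."""
    def __init__(self, channels, groups=8):
        super().__init__()
        self.groups = groups
        c = channels // (2 * groups)  # channels per branch within a group
        self.cw = nn.Parameter(torch.zeros(1, c, 1, 1))  # channel-branch scale
        self.cb = nn.Parameter(torch.ones(1, c, 1, 1))   # channel-branch bias
        self.sw = nn.Parameter(torch.zeros(1, c, 1, 1))  # spatial-branch scale
        self.sb = nn.Parameter(torch.ones(1, c, 1, 1))   # spatial-branch bias
        self.gn = nn.GroupNorm(c, c)                     # spatial statistics
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        b, c, h, w = x.shape
        x = x.view(b * self.groups, -1, h, w)
        xc, xs = x.chunk(2, dim=1)  # split each group into two branches
        # Channel attention: global average pooling -> affine -> sigmoid gate
        xc = xc * self.sigmoid(xc.mean((2, 3), keepdim=True) * self.cw + self.cb)
        # Spatial attention: group norm -> affine -> sigmoid gate
        xs = xs * self.sigmoid(self.gn(xs) * self.sw + self.sb)
        x = torch.cat([xc, xs], dim=1).view(b, c, h, w)
        # Channel shuffle so information can flow across groups
        return x.view(b, 2, c // 2, h, w).transpose(1, 2).reshape(b, c, h, w)

# Example: attention over a 64-channel feature map, shape is preserved
y = ShuffleAttention(64)(torch.randn(1, 64, 40, 40))
print(y.shape)  # torch.Size([1, 64, 40, 40])
```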
Figure 9.
Common fire safety tools. These tools are added to the fire detection dataset and integrated into a new fire safety dataset.
Figure 10.
Category labeling. The fire safety dataset was expanded to eight categories.
Figure 11.
LabelImg software annotation example.
Figure 12.
Number of labels and label borders.
Figure 13.
Label position and label size.
Figure 14.
Comparison of the model before and after improvement. (a) shows that although the mAP of CAGSA-YOLO rises more slowly than that of the original model early in training, it reaches a higher final accuracy. (b) shows that the loss of the improved model decreases faster and converges to a smaller value, indicating more accurate detection.
Figure 15.
The original YOLOv5s model for detection.
Figure 16.
Improved CAGSA-YOLO model for detection.
Table 1.
Main symbols’ abbreviations and definitions.
| Symbol | Definition |
| --- | --- |
| P | Precision |
| R | Recall |
| mAP | Mean average precision |
| Param | Number of model parameters |
| GFLOPs | Giga floating-point operations per second |
| CARAFE | Content-Aware Reassembly of Features |
| $k_{encoder}$ | Controls the kernel size of the encoder part |
| $k_{up}$ | Controls the kernel size of the reassembly (recombiner) part |
| SA | Shuffle attention |
| CAGSA-YOLO | YOLOv5s + CARAFE + scale + Ghost + SA |
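For reference, the standard definitions behind P, R, and mAP, with TP, FP, and FN the numbers of true positives, false positives, and false negatives, and N the number of classes:

```latex
P = \frac{TP}{TP + FP}, \qquad
R = \frac{TP}{TP + FN}, \qquad
AP = \int_0^1 P(R)\,\mathrm{d}R, \qquad
mAP = \frac{1}{N}\sum_{i=1}^{N} AP_i
```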
Table 2.
Anchor assignment.
| Feature map size | P3/8 (80 × 80) | P4/16 (40 × 40) | P5/32 (20 × 20) |
| --- | --- | --- | --- |
| Receptive field size | Medium | Large | Larger |
| Anchor frame sizes | [10, 13], [16, 30], [33, 23] | [30, 61], [62, 45], [59, 119] | [116, 90], [156, 198], [373, 326] |
Table 3.
Comparison of upsampling methods.
| Method | AP | AP50 | AP75 | APS | APM | APL |
| --- | --- | --- | --- | --- | --- | --- |
| **Faster R-CNN** | | | | | | |
| Nearest | 36.5 | 58.4 | 39.3 | 21.3 | 40.3 | 47.2 |
| Bilinear | 36.7 | 58.7 | 39.7 | 21.0 | 40.5 | 47.5 |
| Deconv | 36.4 | 58.2 | 39.2 | 21.3 | 39.9 | 46.5 |
| CARAFE | 37.8 | 60.1 | 40.8 | 23.2 | 41.2 | 48.2 |
| **Mask R-CNN** | | | | | | |
| Nearest | 32.7 | 55.0 | 34.8 | 17.7 | 35.9 | 44.4 |
| Bilinear | 34.2 | 55.9 | 36.4 | 18.5 | 37.5 | 46.2 |
| Deconv | 34.2 | 55.5 | 36.3 | 17.6 | 37.8 | 46.7 |
| CARAFE | 34.7 | 56.2 | 37.1 | 18.2 | 37.9 | 47.5 |
Table 4.
Anchor assignment with N = 12: the model is divided into four detection layers, and the original 9 anchors are expanded to 12.
| Feature map size | 160 × 160 | 80 × 80 | 40 × 40 | 20 × 20 |
| --- | --- | --- | --- | --- |
| Receptive field size | Small | Medium | Large | Larger |
| Anchor frame sizes (N = 12) | [5, 6], [8, 14], [15, 11] | [10, 13], [16, 30], [33, 23] | [30, 61], [62, 45], [59, 119] | [116, 90], [156, 198], [373, 326] |
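For illustration, the 12 anchors of Table 4 written as (width, height) pairs per detection layer; in YOLOv5 these would be listed under the anchors entry of the model YAML. Assuming the usual 640 × 640 input, the 160 × 160 map corresponds to a stride of 4:

```python
# Anchor boxes (width, height in pixels) for the four detection layers.
anchors = [
    [(5, 6), (8, 14), (15, 11)],          # 160 x 160: smallest objects
    [(10, 13), (16, 30), (33, 23)],       # 80 x 80:  medium objects
    [(30, 61), (62, 45), (59, 119)],      # 40 x 40:  large objects
    [(116, 90), (156, 198), (373, 326)],  # 20 x 20:  largest objects
]
```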
Table 5.
Comparison of different attention mechanisms. The data, partially quoted from Zhang et al.'s article, reflect the superiority of SA.
| Attention method | Backbone | Param | GFLOPs |
| --- | --- | --- | --- |
| SE-Net [26] | ResNet-50 | 28.088 M | 4.130 |
| CBAM [27] | ResNet-50 | 28.090 M | 4.139 |
| ECA-Net [28] | ResNet-50 | 25.557 M | 4.127 |
| SA-Net [25] | ResNet-50 | 25.557 M | 4.125 |
Table 6.
Initialization parameter settings. In training, the batch size (bs) is set to 32, the number of epochs to 300, the initial learning rate (lr0) to 0.01, the final OneCycleLR learning rate (lrf) to 0.2, the SGD momentum to 0.937, and the optimizer weight decay (weight_decay) to 0.0005.
| bs | Epoch | lr0 | lrf | Momentum | Weight_decay |
| --- | --- | --- | --- | --- | --- |
| 32 | 300 | 0.01 | 0.2 | 0.937 | 0.0005 |
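As a sketch of how Table 6 maps onto YOLOv5's configuration (lr0, lrf, momentum, and weight_decay are keys in the hyperparameter YAML, while batch size and epochs are training arguments):

```python
# Table 6 settings as the matching YOLOv5 hyperparameter keys.
hyp = {
    "lr0": 0.01,             # initial learning rate (SGD)
    "lrf": 0.2,              # final OneCycleLR learning rate factor
    "momentum": 0.937,       # SGD momentum
    "weight_decay": 0.0005,  # optimizer weight decay
}
train_args = {"batch_size": 32, "epochs": 300}
```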
Table 7.
Study of the CARAFE parameters. The upsampling in YOLOv5s was replaced with CARAFE, and test experiments were performed for $k_{encoder}$ and $k_{up}$ under different parameter settings.
| $k_{encoder}$ | $k_{up}$ | P | R | mAP | Param | GFLOPs |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | 3 | 90.9% | 80.2% | 83.8% | 7,102,173 | 16.4 |
| 1 | 5 | 92.3% | 79.5% | 84.1% | 7,110,493 | 16.4 |
| 1 | 7 | 91.7% | 79.9% | 83.4% | 7,122,973 | 16.4 |
| 3 | 3 | 92.5% | 79.2% | 83.9% | 7,139,037 | 16.5 |
| 3 | 5 | 93.0% | 78.8% | 83.5% | 7,212,893 | 16.6 |
| 3 | 7 | 90.3% | 80.2% | 83.4% | 7,323,677 | 16.8 |
| 5 | 5 | 91.2% | 79.8% | 83.2% | 7,417,693 | 17.0 |
| 5 | 7 | 92.8% | 78.9% | 83.7% | 7,725,085 | 17.6 |
| 7 | 7 | 92.2% | 79.8% | 83.7% | 8,327,197 | 18.8 |
Table 8.
Comparison of attention mechanisms. The improved YOLOv5s model is combined with each attention mechanism in turn to verify the superiority of SA.
| Attention method | P | R | mAP | Param | GFLOPs |
| --- | --- | --- | --- | --- | --- |
| SE-Net [26] | 88.8% | 79.5% | 84.8% | 6,680,552 | 18.9 |
| CBAM [27] | 88.4% | 79.8% | 84.0% | 6,680,650 | 18.9 |
| ECA-Net [28] | 88.3% | 80.0% | 84.2% | 6,648,045 | 19.0 |
| SA-Net [25] | 89.7% | 80.1% | 85.1% | 6,647,976 | 18.9 |
Table 9.
Comparison of algorithm experiments. The improved algorithm achieves a mean average precision 0.9% higher than YOLOv3, 8% higher than YOLOv3-tiny, 11.1% higher than YOLOv4-tiny, and 1.7% higher than YOLOv5s.
| Model | Size (MB) | P | R | mAP | Param | GFLOPs |
| --- | --- | --- | --- | --- | --- | --- |
| YOLOv3 | 123.5 | 90.1% | 81.1% | 84.2% | 61,535,125 | 154.7 |
| YOLOv3-tiny | 17.4 | 79.3% | 76.0% | 77.1% | 8,682,862 | 12.9 |
| YOLOv4-tiny | 23.6 | 57.2% | 77.1% | 74.0% | 5,890,286 | 16.2 |
| YOLOv5s | 14.4 | 93.7% | 78.5% | 83.4% | 7,072,789 | 16.3 |
| CAGSA-YOLO | 13.8 | 89.7% | 80.1% | 85.1% | 6,647,976 | 18.9 |
Table 10.
Comparison of ablation experiments. The improvements are added step by step to verify that each method is effective and that the model is genuinely improved.
| Model | Size (MB) | P | R | mAP | Param | GFLOPs |
| --- | --- | --- | --- | --- | --- | --- |
| YOLOv5s | 14.4 | 93.7% | 78.5% | 83.4% | 7,072,789 | 16.3 |
| YOLOv5s + CARAFE | 14.5 | 92.3% | 79.5% | 84.1% | 7,110,493 | 16.4 |
| YOLOv5s + CARAFE + scale | 15.0 | 89.7% | 79.7% | 84.5% | 7,265,704 | 19.3 |
| YOLOv5s + CARAFE + scale + Ghost | 13.8 | 87.7% | 81.4% | 84.5% | 6,647,784 | 18.9 |
| CAGSA-YOLO (YOLOv5s + CARAFE + scale + Ghost + SA) | 13.8 | 89.7% | 80.1% | 85.1% | 6,647,976 | 18.9 |