Fire Segmentation with an Optimized Weighted Image Fusion Method
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
This work proposes an optimized weighted image fusion method for fire segmentation.
This research direction is very meaningful, and using visible light and infrared images to identify fire targets is a reasonable research approach. In real-world scenarios, these two modalities often work together to detect fires (such as with drone cameras).
1. The capitalization in the title should be consistent to avoid looking random.
2. Figures should preferably use vector graphics, such as Figures 1 and 3, as some variables are not clear.
3. In real-world scenarios, visible light and infrared images require matching and alignment. How is this issue addressed?
4. The writing in the article needs improvement, as there are numerous spelling errors, such as "vsible" in Figure 3.
5. The contributions of this work should be summarized to facilitate reader understanding.
6. Details of the dataset should be presented.
7. It is recommended to supplement the results related to model scale, such as FLOPs and inference time.
8. While combining the two types of images is beneficial, how does your proposed model architecture demonstrate superiority? Are there similar baselines that can be compared?
9. Some highly relevant papers on fire recognition are missing, including some dual-frame methods. It is recommended to cite these papers:
[1] QuasiVSD: efficient dual-frame smoke detection
[2] EFFNet: Enhanced Feature Foreground Network for Video Smoke Source Prediction and Detection
[3] CNN-Transformer Hybrid Architecture for Early Fire Detection
Comments on the Quality of English Language
The presentation needs to be improved.
Author Response
Response to Reviewer 1 Comments
We are very grateful to Reviewer 1 for their comments, and we provide the following responses:
Point 1:
The capitalization in the title should be consistent to avoid looking random.
Response 1:
The correction has been made. The title now reads:
Fire Segmentation with an Optimized Weighted Image Fusion Method
Point 2:
Figures should preferably use vector graphics, such as Figures 1 and 3, as some variables are not clear.
Response 2:
The correction has been made; we have used the .emf file format.
Point 3:
In real-world scenarios, visible light and infrared images require matching and alignment. How is this issue addressed?
Response 3:
For the dataset used, the multi-modal image alignment is based on a homography matrix transform.
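As an illustration only, the following minimal sketch shows this kind of homography-based alignment with OpenCV; the file names and the matrix values are placeholders, not the actual calibration used for the dataset.

```python
import cv2
import numpy as np

# Hypothetical example: align an IR frame to its visible counterpart using a
# precomputed 3x3 homography matrix H (e.g., estimated once per camera rig).
visible = cv2.imread("visible_frame.png")                     # H x W x 3
ir = cv2.imread("ir_frame.png", cv2.IMREAD_GRAYSCALE)         # H x W

# Placeholder homography; in practice H would come from calibration or
# feature matching between the two modalities.
H = np.array([[1.02, 0.01, -5.0],
              [0.00, 1.01,  3.0],
              [0.00, 0.00,  1.0]], dtype=np.float64)

# Warp the IR image into the visible image's coordinate frame so the two
# modalities are pixel-aligned before fusion.
h, w = visible.shape[:2]
ir_aligned = cv2.warpPerspective(ir, H, (w, h))
```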
Point 4:
The writing in the article needs improvement, as there are numerous spelling errors, such as "vsible" in Figure 3.
Response 4:
The correction has been made.
Point 5:
The contributions of this work should be summarized to facilitate reader understanding.
Response 5:
We have summarized the contributions in the abstract, the introduction, and Section 4.4 (Table 7).
Point 6:
Details of the dataset should be presented.
Response 6:
The dataset details are thoroughly covered in Section 2.4, titled "Data Presentation." In this section, we have included comprehensive information about the dataset, including its sources, structure, and relevant characteristics.
Point 7:
It is recommended to supplement the results related to model scale, such as FLOPs and inference time.
Response 7:
The inference time for a single image was 100 ms, indicating a relatively quick processing capability suitable for real-time applications.
Point 8:
While combining the two types of images is beneficial, how does your proposed model architecture demonstrate superiority? Are there similar baselines that can be compared?
Response 8:
The proposed model architecture demonstrates clear superiority through significant improvements in segmentation performance metrics compared to the established baselines. Specifically, it achieves a higher IoU (94.52%), accuracy (99.84%), precision (96.62%), specificity (99.88%), recall (97.44%), and F1 score (97.09%). These improvements result from combining the visible and IR images with LatLRR after the optimized image weighting. The baselines include "Segmentation of visible images only," "Segmentation of IR images only," and "Segmentation of fused images with classical LatLRR." The model's superior performance across these metrics confirms its effectiveness and reliability in accurate image segmentation tasks, making it a robust choice for advanced applications.
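For reference, these metrics can be derived from the confusion counts of the predicted and ground-truth masks; the sketch below is a minimal NumPy illustration (the function and variable names are ours, not the evaluation code used in the paper).

```python
import numpy as np

def segmentation_metrics(pred: np.ndarray, truth: np.ndarray) -> dict:
    """Compute IoU, accuracy, precision, specificity, recall, and F1
    from two binary masks of the same shape (1 = fire, 0 = background)."""
    pred = pred.astype(bool)
    truth = truth.astype(bool)
    tp = np.logical_and(pred, truth).sum()
    tn = np.logical_and(~pred, ~truth).sum()
    fp = np.logical_and(pred, ~truth).sum()
    fn = np.logical_and(~pred, truth).sum()

    iou = tp / (tp + fp + fn)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    specificity = tn / (tn + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"IoU": iou, "accuracy": accuracy, "precision": precision,
            "specificity": specificity, "recall": recall, "F1": f1}
```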
Point 9:
Some highly relevant papers on fire recognition are missing, including some dual-frame methods. It is recommended to cite these papers:
[1] QuasiVSD: efficient dual-frame smoke detection
[2] EFFNet: Enhanced Feature Foreground Network for Video Smoke Source Prediction and Detection
[3] CNN-Transformer Hybrid Architecture for Early Fire Detection
Response 9:
All the suggested references have been included:
[50] Cao, Y.; Tang, Q.; Xu, S.; Li, F.; Lu, X. QuasiVSD: Efficient Dual-Frame Smoke Detection. Neural Computing and Applications 2022, 34. https://doi.org/10.1007/s00521-021-06606-2.
[51] Cao, Y.; Tang, Q.; Wu, X.; Lu, X. EFFNet: Enhanced Feature Foreground Network for Video Smoke Source Prediction and Detection. IEEE Transactions on Circuits and Systems for Video Technology 2022, 32(4), 1820–1833. https://doi.org/10.1109/TCSVT.2021.3083112.
[52] Yang, C.; Pan, Y.; Cao, Y.; Lu, X. CNN-Transformer Hybrid Architecture for Early Fire Detection. In Proceedings of the Artificial Neural Networks and Machine Learning—ICANN 2022: 31st International Conference on Artificial Neural Networks, Bristol, UK, 6–9 September 2022; Part IV; Springer: Berlin, Germany, 2022; pp. 570–581.
Author Response File: Author Response.pdf
Reviewer 2 Report
Comments and Suggestions for Authors
The paper's topic is highly relevant to current weather events in particular and to climate change in general, so its contribution is up-to-date and valuable.
I have minor technical comments. Overall, the authors should read the paper again with great care, and remove all the technical ambiguities that could distract the reader.
For example (the list is not complete; it is up to the authors, not the reviewer, to polish the final version of the paper):
1. Figure 3 is not mentioned in the text.
2. Figure 4 shows a great distinction between the SCD with and without the optimization for images from 400 to 500 (approximate numbers). A comment including examples of images with almost no distinction between the SCDs (e.g. images from 150 to 200) and images with great distinction should be included.
3. Ambiguities with LRR abbreviations:
Line 101: LaLRR appears without the definition.
Line 105: "Latent low-rank representation (LRR)" is defined.
Line 118: LatLRR appears without the definition.
Line 143: "Latent Low-Rank Representation (LatLRR)" is defined with capital letters.
Line 184: LLRR appears without the definition.
Line 192: "Latent Low-Rank Representation (LLRR)" is defined.
4. Line 282: a “dot” (“full-stop”) is used for a multiplication sign, which is improper. It also appears elsewhere (Table 7 for example) so check the paper for more.
5. Check the capital letters, for example in line 270: “Bright daylight … Low-light … night”. Why a small letter in “night” or why capital letter in “Low”?
6. “Et al” should be “Et al.”
Check for more similar technical issues.
Author Response
Response to Reviewer 2 Comments
We are very grateful to Reviewer 2 for their comments, and we provide the following responses.
Point 1:
Figure 3 is not mentioned in the text.
Response 1:
The correction has been made. Lines 177-178: "In their work, Hui et al. [14] introduced a straightforward yet robust image fusion technique based on LatLRR (see Figure 3)."
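For context, the scheme of [14] decomposes each source image into a low-rank (base) part and a salient part before recombining them. The sketch below is a minimal illustration under that assumption, using a hypothetical latlrr_decompose helper and the classical average/sum fusion rules rather than the optimized weights proposed in our work.

```python
import numpy as np

def latlrr_decompose(image: np.ndarray):
    """Hypothetical placeholder for a LatLRR solver that splits an image
    (normalized to [0, 1]) into a low-rank (base) part and a salient part."""
    raise NotImplementedError("plug in an actual LatLRR implementation here")

def fuse_latlrr(visible: np.ndarray, ir: np.ndarray,
                w_vis: float = 0.5, w_ir: float = 0.5) -> np.ndarray:
    """Fuse a visible/IR pair: base parts are combined with weights
    (classically a plain average), while salient parts are summed."""
    base_vis, salient_vis = latlrr_decompose(visible)
    base_ir, salient_ir = latlrr_decompose(ir)
    fused_base = w_vis * base_vis + w_ir * base_ir
    fused_salient = salient_vis + salient_ir
    # Reconstruct the fused image and keep it in the valid intensity range.
    return np.clip(fused_base + fused_salient, 0.0, 1.0)
```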
Point 2:
Figure 4 shows a great distinction between the SCD with and without the optimization for images from 400 to 500 (approximate numbers). A comment including examples of images with almost no distinction between the SCDs (e.g. images from 150 to 200) and images with great distinction should be included.
Response 2:
As shown in Tables 2, 3, and 8, we have performed experiments under different lighting conditions and with flames of varying sizes.
Point 3:
Ambiguities with LRR abbreviations:
Line 101: LaLRR appears without the definition.
Line 105: ”Latent low-rank representation (LRR)” is defined.
Line 118: LatLRR appears without the definition.
Line 143: ”Latent Low-Rank Representation (LatLRR)” is defined with capital letters.
Line 184: LLRR appears without the definition.
Line 192: ”Latent Low-Rank Representation (LLRR)” is defined.
Response 3:
All corrections have been made; please refer to the comments given above.
Point 4:
Line 282: a “dot” (“full-stop”) is used for a multiplication sign, which is improper. It also appears elsewhere (Table 7 for example) so check the paper for more.
Response 4:
The correction has been made.
Point 5:
Check the capital letters, for example in line 270: “Bright daylight … Low-light … night”. Why a small letter in “night” or why capital letter in “Low”?
Response 5:
The correction has been made (lines 279-280).
Point 6:
“Et al” should be “Et al.”
Response 6:
The correction has been made.
Point 7:
Check for more similar technical issues.
Response 7:
The corrections have been made.
The content of Figure 4 and the mathematical formulas (Tables 1 and 7) were checked, and several corrections were introduced.
Author Response File: Author Response.pdf
Reviewer 3 Report
Comments and Suggestions for Authors
Tables 1 and 7 are too descriptive, so they should be simplified or converted into information within the text. Citation 50 is missing.
Author Response
Response to Reviewer 3 Comments
We are very grateful to Reviewer 3 for their comments, and we provide the following responses:
Point 1:
Tables 1 and 7 are too descriptive, so they should be simplified or converted into information within the text.
Response 1:
Because of the large variety of variables and parameters in the fusion evaluation criteria used, and to improve readability, we have summarized all the details in Table 1.
Table 7: The additional details have been removed; only the formulas have been retained.
Point 2:
Citation 50 is missing.
Response 2:
The correction has been made.
Author Response File: Author Response.pdf