Egyptian Shabtis Identification by Means of Deep Neural Networks and Semantic Integration with Europeana
Round 1
Reviewer 1 Report
The authors have applied deep neural networks to the classification of shabtis, the ancient Egyptian figurines. The trained neural network was deployed in a complete online system integrated with the semantic metadata aggregator Europeana, which allows the classified shabtis to be linked with examples of figurines from museums. The authors reviewed related research and proposed their own solution based on neural networks. The implemented online system, which is based on ontologies, and the applied neural networks are presented in full detail. The experimental data are also presented and analyzed. However, this is the weakest part of the paper; please see below for suggestions on improving it.
The authors should address the following issues and improve the paper:
1. The analysis of the neural network learning process and classification results is not sufficient. A more detailed analysis of the results should be provided, using selected metrics such as accuracy, precision, recall, F1-score, the ROC curve, and the precision-recall curve.
2. The English language should be improved – there are some grammar and style errors.
Author Response
Dear reviewer,
Thank you for your comments. We think they have helped us to significantly improve the article. Your comments about providing a more detailed analysis of the neural networks were very helpful. We have included the precision-recall curves of the three models: FN, HN and FN + HN. It was not possible to include the ROC curves because our problem is a multi-class classification problem, which would have required displaying 158 graphs, one per class. We have also computed the Average Precision. Our text:
“A Precision-Recall (PR) curve has been obtained to evaluate the FN, HN and FN + HN models. The Average Precision (AP) has been computed by integrating the curve, therefore computing the Area Under the Curve (AUC). Figure 13a shows the PR curve for the FN network with an AP = 0.73. Figure 13b shows the PR curve for the HN network with an AP = 0.61. Figure 13c shows the PR curve for the FN + HN model with an AP = 0.79. As previously explained, the combination of both models, FN and HN, improves the model, since the system is able to detect the shabtis either by their shape or by their name.”
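The computation described in the quoted passage can be illustrated with a minimal sketch. It assumes that each detection has a confidence score and a flag saying whether it was correct, and that the curve is integrated with a simple step-wise (rectangle) rule; the function names and the sample data are hypothetical and are not taken from the paper.

```python
def pr_curve(scores, labels):
    """Return (recall, precision) lists, one point per detection,
    with detections ranked by descending confidence score."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    total_pos = sum(1 for _, is_correct in ranked if is_correct)
    if total_pos == 0:
        raise ValueError("at least one positive example is required")
    recall, precision = [], []
    true_pos = 0
    for k, (_, is_correct) in enumerate(ranked, start=1):
        if is_correct:
            true_pos += 1
        precision.append(true_pos / k)        # correct detections among the top k
        recall.append(true_pos / total_pos)   # share of all positives found so far
    return recall, precision


def average_precision(scores, labels):
    """Area under the PR curve, using a step-wise sum:
    AP = sum over points of (recall increase) * (precision at that point)."""
    recall, precision = pr_curve(scores, labels)
    ap, prev_recall = 0.0, 0.0
    for r, p in zip(recall, precision):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


# Hypothetical example: four detections, two of them correct.
example_ap = average_precision([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0])
```

Other integration rules (for example, trapezoidal or interpolated precision) give slightly different AP values, so the rule used should be stated alongside the reported figures.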
In addition, we have made many other changes to improve the text. Regarding the English, our paper has been checked by a native speaker, and we have corrected the wording, sentences, and style where they were wrong.
Thank you again,
Best regards,
The authors.
Reviewer 2 Report
The manuscript presents a methodology to recognize and classify Egyptian shabtis by means of Convolutional Neural Networks (CNNs). Two different YOLOv3 networks are used to detect the figure itself and the hieroglyphic names, and a comparison with previous methods is included.
The manuscript is well written, and the results are discussed in detail. The contribution could be a useful addition. In the opinion of this reviewer, it is recommended for publication considering the following comments.
- The names have been manually labeled to read the hieroglyphs. How can the introduced CNN be modified to read hieroglyphic language? Please explain and discuss.
- What are the configuration features of the developed YOLO, i.e. number of layers, the connection types, etc., and how was this structure/architecture designed? What is the effect of the structure/architecture on the accuracy and the computational burden? Please discuss.
- The size of the database of the FN is 1,111 images for 151 different shabti owners, while the HN has been developed based on 201 names for 60 different shabtis. How can these differences in the database produce a robust prediction methodology? Could the authors comment?
- It is recommended to plot the curves of receiver operating characteristic (ROC) to evaluate the performance of the CNN.
- It is worth referring to recent applications of DNN in:
1- Computers, Materials & Continua (2019) Vol.59, No.1, pp. 79-87, DOI:10.32604/cmc.2019.05882.
2- Finite Elements in Analysis and Design (2019) 165: 21-30.
Author Response
Dear reviewer,
Thank you for your comments. We think they have helped us to significantly improve the article. We hope we have addressed all of your comments. Our changes are as follows:
“- The names have been manually labeled to read the hieroglyphs. How can the introduced CNN be modified to read hieroglyphic language? Please explain and discuss.”
We have included: “The names found on two shabtis of the same person tend to be very similar, either because they were created through a mold, or because they were written in the same way, using the same hieroglyphics in the same position. In some cases, the name may appear in a different form or in different material on several groups of shabtis from the same tomb. However, the different cases of names have been included during the training.”
- What are the configuration features of the developed YOLO, i.e. number of layers, the connection types, etc., and how was this structure/architecture designed? What is the effect of the structure/architecture on the accuracy and the computational burden? Please discuss.
We have included: “Some parameters established in both networks (FN and HN) have been: batch=64, subdivisions=16, width=416, height=416, channels=3, momentum=0.9, decay=0.0005, angle=0, saturation = 1.5, exposure = 1.5, hue=.1, learning rate=0.001. YOLOv3 is composed of 75 convolutional layers, 23 shortcut layers after CNN-layers to propagate gradients further and allow efficient training, 4 route layers to merge precedent layers into one layer, 2 up-sample layers used for deconvolution and 3 Yolo layers responsible for calculating the loss at three different scales.”
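As a quick check of the figures quoted above, the following sketch restates them in Python and derives two quantities: the total layer count and the number of images processed per forward pass. It is a hypothetical summary, not the authors' configuration file, and it assumes the Darknet convention that a batch is split into `subdivisions` equal parts.

```python
# Training parameters as quoted in the response (same values for FN and HN).
train_params = {
    "batch": 64,
    "subdivisions": 16,
    "width": 416,
    "height": 416,
    "channels": 3,
    "momentum": 0.9,
    "decay": 0.0005,
    "learning_rate": 0.001,
}

# Layer counts by type as quoted in the response.
layer_counts = {
    "convolutional": 75,
    "shortcut": 23,
    "route": 4,
    "upsample": 2,
    "yolo": 3,
}

# Total number of layers in the network description.
total_layers = sum(layer_counts.values())

# Assumption: each batch is processed in `subdivisions` equal chunks,
# so this many images pass through the network at once.
images_per_pass = train_params["batch"] // train_params["subdivisions"]

print(f"total layers: {total_layers}, images per pass: {images_per_pass}")
```

Under these assumptions the network description contains 107 layers and processes 4 images per pass, which is what bounds the memory needed during training.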
In conclusions:
“The CNN used can work with up to 1,000 classes offering good results. However, in our experience, their behavior worsens when the number of classes is too high. For this reason, if our system grows much further, different networks could be used for groups of classes, so that the winner among different networks would be the one that offers more confidence, similar to what is done between the FN and HN network.”
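The "winner by confidence" rule mentioned in this passage can be sketched as follows. This is a minimal illustration under the assumption that each network returns either nothing or a single (class label, confidence) pair; the function name and the sample labels are hypothetical, and the paper's actual combination logic may differ.

```python
def combine_predictions(*predictions):
    """Pick the prediction with the highest confidence.

    Each prediction is either None (the network detected nothing)
    or a (class_label, confidence) tuple.
    """
    valid = [p for p in predictions if p is not None]
    if not valid:
        return None  # no network produced a detection
    return max(valid, key=lambda p: p[1])


# Hypothetical outputs of the FN (figure) and HN (name) networks.
fn_prediction = ("owner_A", 0.62)
hn_prediction = ("owner_B", 0.91)
winner = combine_predictions(fn_prediction, hn_prediction)
```

Because the function accepts any number of predictions, the same rule extends to the case described in the text, where several networks each cover a group of classes.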
- The size of the database of the FN is 1,111 images for 151 different shabti owners, while the HN has been developed based on 201 names for 60 different shabtis. How can these differences in the database produce a robust prediction methodology? Could the authors comment?
We have included: “The main objective of the HN network is to detect shabtis by name, so even though there are less training data and fewer classes in the network, it allows to filter those classes in a more unequivocal way. However, many times the names are not clear or have not been represented, so the combination with the FN network is necessary. The combination of FN and HN improves the success of both networks, ...”
- It is recommended to plot the curves of receiver operating characteristic (ROC) to evaluate the performance of the CNN.
Your comments about providing a more detailed analysis of the neural networks were very helpful. We have included the precision-recall curves of the three models: FN, HN and FN + HN. It was not possible to include the ROC curves because our problem is a multi-class classification problem, which would have required displaying 158 graphs, one per class. We have also computed the Average Precision.
“A Precision-Recall (PR) curve has been obtained to evaluate the FN, HN and FN + HN models. The Average Precision (AP) has been computed by integrating the curve, therefore computing the Area Under the Curve (AUC). Figure 13a shows the PR curve for the FN network with an AP = 0.73. Figure 13b shows the PR curve for the HN network with an AP = 0.61. Figure 13c shows the PR curve for the FN + HN model with an AP = 0.79. As previously explained, the combination of both models, FN and HN, improves the model, since the system is able to detect the shabtis either by their shape or by their name.”
- It is worth referring to recent applications of DNN in:
1- Computers, Materials & Continua (2019) Vol.59, No.1, pp. 79-87, DOI:10.32604/cmc.2019.05882.
2- Finite Elements in Analysis and Design (2019) 165: 21-30.
We have included: “DNNs have been used for very different problems, such as recently the material modeling [25,26]. However, CNNs are a concrete type of DNNs that have represented a major advance in image detection and classification over the last few years.”
25. Hamdia, K.M.; Ghasemi, H.; Zhuang, X.; Alajlan, N.; Rabczuk, T. Computational machine learning representation for the flexoelectricity effect in truncated pyramid structures. Computers, Materials & Continua 2019, 59(1), 79–87.
26. Hamdia, K.M.; Ghasemi, H.; Bazi, Y.; AlHichri, H.; Alajlan, N.; Rabczuk, T. A novel deep learning based method for the computational material design of flexoelectric nanostructures with topology optimization. Finite Elements in Analysis and Design 2019, 165, 21–30.
In addition, regarding the English, our paper has been checked by a native speaker, and we have corrected the wording, sentences, and style where they were wrong.
Thank you again,
Best regards,
The authors.
Round 2
Reviewer 1 Report
The authors have addressed the most important issues, so in my opinion, the paper can be accepted.