PRIVAFRAME: A Frame-Based Knowledge Graph for Sensitive Personal Data
Round 1
Reviewer 1 Report (Previous Reviewer 1)
This paper discusses identifying categories of personal data in informal English sentences by adopting a new logical-symbolic approach. The paper is well organized with a proper structure. Specifically, the technical terms are explained in detail and the topic of the paper is clear and understandable. The presented methodology and the results are clearly communicated. The authors have addressed all the comments mentioned in my previous report. The paper can be accepted.
Author Response
Thank you very much for the report and the attention given to our paper.
During this round of minor revisions, we also checked the English language.
Reviewer 2 Report (Previous Reviewer 2)
The article deals with the problem of automating the detection of sensitive personal information in online texts. It presents PRIVAFRAME, a knowledge graph devoted to sensitive information detection, and proposes a hybrid model for detecting sensitive information for further study.
The authors evaluate PRIVAFRAME using custom datasets, explaining why the available datasets are poorly suited to the task. The significance of the content is somewhat diminished by the fact that the article provides no performance figures for other state-of-the-art models on the same dataset for comparison, but it still remains a solid contribution.
My main concern at this stage is a set of minor problems in the presentation of the experimental results that make the article less clear:
1. Table 7 uses the unclear term Performance. When dealing with classification tasks, it is good to use well-defined measures like Precision, Recall, and F-measure. Another approach is to provide the formula for Performance.
2. The font in Figure 3 is too small; it cannot be read without magnification, whereas the article's body text can be read comfortably.
3. In the important Error Analysis section (starting at line 449), statements like "label is missing" appear often. However, this can be ambiguous: it can concern either manually created labels in the datasets (a problem with the dataset) or automatically placed labels (a problem with the proposed model or its implementation). It would be better to make clear which kind of label is meant.
The quality of English has improved since the first draft, but some problems remain in the newly added paragraphs; please check their English as well.
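For reference, the well-defined measures mentioned in point 1 can be sketched as follows. This is a minimal, self-contained illustration with hypothetical counts; it does not use values from the article under review.

```python
def classification_measures(tp, fp, fn):
    """Compute Precision, Recall, and F-measure from raw counts.

    tp: true positives, fp: false positives, fn: false negatives.
    The guards avoid division by zero on degenerate inputs.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if (precision + recall) else 0.0)
    return precision, recall, f_measure

# Hypothetical example: 80 correct detections, 20 spurious, 40 missed.
p, r, f = classification_measures(tp=80, fp=20, fn=40)
```

With these counts, precision is 80/100 = 0.8, recall is 80/120, and the F-measure is their harmonic mean.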
Author Response
Thank you very much for the report and the suggestions to further improve the paper.
Our efforts to address the concerns are summarized point by point:
1. "Table 7 uses unclear term Performance. When dealing with classification tasks, it is good to use well-defined measures like Precision, Recall, and F-measure. Another approach is providing the formula for Performance."
The terminology in Table 7 has been clarified, indicating that the scores are reported in terms of accuracy (both in the text and in the table).
2. "Font in Figure 3 is too small; it cannot be read without magnification (when the article's text can be read comfortably)"
The figure has been replaced with a larger font figure to make it more readable.
3. "In the important section Error Analysis (starts at the line 449), the facts like "label is missing" are often mentioned. However, this can be ambiguous: it can concern either manually created labels in the datasets (problem with the dataset) or automatically-placed labels (problem with the proposed model or its implementation). It is better to be more clear which kind of label it was."
We addressed this problem by introducing the types of errors that can occur (a. FRED failure on frame extraction; b. failure on compositional frame modeling; c. complex or non-discriminatory sample sentence structure) and by adding, to the explanation of each critical category, a label for the type of error.
As suggested, we also checked the English language.
Thank you again for the attention given to our paper.
Reviewer 3 Report (Previous Reviewer 3)
SECOND REVIEW
The paper is much improved now. You have addressed all the issues that I raised in my initial review (reproduced below), and as a result, I believe that your paper is now a valuable addition to the body of research and as such is worthy of publication.
Congratulations on revising your submission so thoroughly in such a short time.
I look forward to seeing your paper in press.
INITIAL REVIEW
Overall
This paper describes a method of identifying personal information using a logical-symbolic approach rather than a deep learning approach. The approach was tested on a labeled corpus and experimental results were obtained. The abstract does not state what the results were, which I suspect will lead readers to assume that the results were not so good. On reading the whole paper, I now understand why the results are not reported. A new dataset and a new method were used, so there is no way to work out whether these results are close to the state of the art, since no other experiments are reported.
The research is based on the May 2022 release of PDCs and so it is highly topical. The discussion of the selection of the corpora is logical and easy to follow. However, there are scant details on the method and the results. A reasonable attempt is made at discussing the results. Although I can find some novelty in the approach, there is no way to evaluate the results since no details of other experiments or benchmarks are provided.
Major Issues
1. Abstract
Details of the experimental results should be included in the abstract to encourage readers to look at the full text.
2. Lines 73-74
Your claim is that PRIVAFRAME is substantially different to references [10-13]. However, no detail is included about [10] and there are only sparse details regarding the others. What is not clear is why the PRIVAFRAME approach was selected. Was this supported in the literature? Why is or could this approach be better?
3. Table 1
The width of the table exceeds the printable area. Tables with only three columns, such as Table 1, could easily be adjusted to fit into the printable area.
4. Lines 270-276; 284-290 FrameNet and Framester
Not all readers will be familiar with Fillmore’s theory, FrameNet, and Framester, and given their centrality to the study, adding a couple of sentences to help readers come up to speed would be helpful. WordNet, I believe, is much more widely known and so does not need to be elaborated further.
5. Results section, Lines 344-367
This section also includes methods, and perhaps Experiment is a better heading (although I would prefer to read Experiments and see this experiment compared to another). Both the method and results take just over 20 lines. When comparing these 20 lines to the body of the paper, which is 439 lines, it appears that less than 5% of the total word count is dedicated to these sections. I am left with the feeling that the researchers are salami slicing the work and trying to report one experiment in each paper. How can readers judge these results with nothing to compare them against?
6. Method subsection
There are scant details, meaning that readers would find it difficult to reconstruct your experiment. For example, the test set of 3671 sentences is mentioned. This number is not mentioned earlier in the paper, and so the reader is left to guess how these sentences were obtained.
7. Experimental results subsection
The numerical results may or may not be good, but it is difficult to assess given the lack of comparative experiments and lack of benchmark.
Minor issues
1. Lines 18-28 Paragraphing
I see no reason to justify the two one-sentence paragraphs and suggest joining the three paragraphs into one.
2. Line 23 “GDPR”
The full form should be given on first usage of an abbreviation.
3. Line 30 “citation style”
There are three citations. They should be listed within the same set of square brackets.
4. Line 58-59 “The first”
I assume that this refers to the literature and so it is necessary to add a citation.
5. Lines 61 and 67 “uni-grams vs unigrams”
Consistency. I see no need for a hyphen.
6. Lines 74-76
I cannot understand the meaning of the sentence beginning “If these…”
7. Line 258
Errant single bracket needs to be deleted.
Author Response
Thank you very much for the report and the attention given to our paper.
During this round of minor revisions, we also checked the English language.
This manuscript is a resubmission of an earlier submission. The following is a list of the peer review reports and author responses from that submission.
Round 1
Reviewer 1 Report
This paper discusses identifying categories of personal data in informal English sentences by adopting a new logical-symbolic approach. The paper is well organized with a proper structure. Specifically, the technical terms are explained in detail and the topic of the paper is clear and understandable. The presented methodology and the results are clearly communicated. My comments are as follows:
Strengths:
(+) The paper is well-written.
(+) The problem is important.
(+) The problem is well-motivated.
(+) The problem is well-defined.
(+) The references are appropriate; however, some new references need to be added, as the majority are old.
(+) The proposed method is well-explained.
(+) The experiments are convincing; however, some figures that explain the results need to be added.
(+) The experiments can be replicated.
(+) The figures are appropriate.
Weaknesses:
(-) There are English issues.
(-) The related work section must be enhanced.
==== ENGLISH ====
The pronoun 'we' is used too many times in the paper. Generally, 'we' is appropriate for discussing future work in the conclusion, but apart from that it should be used sparingly.
==== RELATED WORK ====
The authors should explain clearly what the differences are between the prior work and the solution presented in this paper.
Finally, I believe the paper has merit.
Reviewer 2 Report
The article is concerned with describing PRIVAFRAME, a knowledge graph that extends Framester in order to identify sensitive personal data in natural-language text. This is an important topic, given the current widespread gathering, processing, and sharing of information. The authors deal with the problem in an original manner, using symbolic reasoning over a frame-based model.
The presented results look original and promising, but the article could be more interesting and impactful if it concentrated more on its particular contribution: the knowledge graph and its evaluation. In particular, the following might be recommended:
1. The article would gain a lot from at least a preliminary analysis and discussion of the accuracy of PDC detection. What led to some PDCs being identified worse than others? Is it caused by more varied sentence structures, a poorly defined rule, or problems in frame identification by the underlying software? Also, we do not learn whether there were any false positives. Some discussion of this would be very helpful.
2. The discussion and/or conclusions sections could be improved by discussing possible ways of using the developed knowledge graph.
3. The discussion section contains a proposal of a hybrid model for detecting sensitive personal data, but it does not provide a detailed rationale for the proposed model. Why should sensitivity be detected with machine learning while PDCs are detected with the knowledge graph? A more extensive discussion would improve the impact of the proposition.
The article also has a significant number of grammar errors and hard-to-understand sentences (e.g., "To guarantee the subject’s right to privacy and to avoid the leakage of private content, even before the treatment methods of sensitive information (e.g., obfuscation), it is necessary to investigate the automatic identification of such information"). It is recommended to seek the help of a native-speaking colleague or a professional proofreading service to improve the English.
The article has a series of minor errors and deficiencies that should be fixed:
1. Footnotes 7 and 8 are missing.
2. Figure 3 is really a table and can be presented as a table.
3. There is no introductory sentence to begin the list of frame resources on line 270. The article jumps from "we have built a frame-based knowledge graph that could satisfy this need." to the list of available frame resources.
4. Lines 297 and 299 mention ApartmentOwned category, but the subsequent example is actually about CarOwned
5. The last sentence in the conclusion section (line 439) references the same section (Section 7); I cannot find where the mentioned extension to the ontological level is envisaged in this section. It seems to be in Section 6.
6. Table A1 can be improved by adding the percentage of correctly detected labels. It is also not that big and could be included directly in the article.
7. Some references lack necessary information (e.g., 24, 25, and 26 do not contain the conference name, publisher, etc.).
8. The DOI link for reference 23 does not work.
Fixing these problems will increase the interest and impact of this article.
Reviewer 3 Report
Overall
This paper describes a method of identifying personal information using a logical-symbolic approach rather than a deep learning approach. The approach was tested on a labeled corpus and experimental results were obtained. The abstract does not state what the results were, which I suspect will lead readers to assume that the results were not so good. On reading the whole paper, I now understand why the results are not reported. A new dataset and a new method were used, so there is no way to work out whether these results are close to the state of the art, since no other experiments are reported.
The research is based on the May 2022 release of PDCs and so it is highly topical. The discussion of the selection of the corpora is logical and easy to follow. However, there are scant details on the method and the results. A reasonable attempt is made at discussing the results. Although I can find some novelty in the approach, there is no way to evaluate the results since no details of other experiments or benchmarks are provided.
Major Issues
1. Abstract
Details of the experimental results should be included in the abstract to encourage readers to look at the full text.
2. Lines 73-74
Your claim is that PRIVAFRAME is substantially different to references [10-13]. However, no detail is included about [10] and there are only sparse details regarding the others. What is not clear is why the PRIVAFRAME approach was selected. Was this supported in the literature? Why is or could this approach be better?
3. Table 1
The width of the table exceeds the printable area. Tables with only three columns, such as Table 1, could easily be adjusted to fit into the printable area.
4. Lines 270-276; 284-290 FrameNet and Framester
Not all readers will be familiar with Fillmore’s theory, FrameNet, and Framester, and given their centrality to the study, adding a couple of sentences to help readers come up to speed would be helpful. WordNet, I believe, is much more widely known and so does not need to be elaborated further.
5. Results section, Lines 344-367
This section also includes methods, and perhaps Experiment is a better heading (although I would prefer to read Experiments and see this experiment compared to another). Both the method and results take just over 20 lines. When comparing these 20 lines to the body of the paper, which is 439 lines, it appears that less than 5% of the total word count is dedicated to these sections. I am left with the feeling that the researchers are salami slicing the work and trying to report one experiment in each paper. How can readers judge these results with nothing to compare them against?
6. Method subsection
There are scant details, meaning that readers would find it difficult to reconstruct your experiment. For example, the test set of 3671 sentences is mentioned. This number is not mentioned earlier in the paper, and so the reader is left to guess how these sentences were obtained.
7. Experimental results subsection
The numerical results may or may not be good, but it is difficult to assess given the lack of comparative experiments and lack of benchmark.
Minor issues
1. Lines 18-28 Paragraphing
I see no reason to justify the two one-sentence paragraphs and suggest joining the three paragraphs into one.
2. Line 23 “GDPR”
The full form should be given on first usage of an abbreviation.
3. Line 30 “citation style”
There are three citations. They should be listed within the same set of square brackets.
4. Line 58-59 “The first”
I assume that this refers to the literature and so it is necessary to add a citation.
5. Lines 61 and 67 “uni-grams vs unigrams”
Consistency. I see no need for a hyphen.
6. Lines 74-76
I cannot understand the meaning of the sentence beginning “If these…”
7. Line 258
Errant single bracket needs to be deleted.
Reviewer 4 Report
The authors present a frame-based knowledge graph built for personal data categories, which is useful for privacy protection. The method sounds novel, but the authors did not articulate the implementation process of the proposed framework. The results of the experiments do not strongly support the conclusions stated in the manuscript, especially without a comparison with other methods. The manuscript could be improved through more scientific writing. I would like to encourage the authors to make further improvements to meet the publishing standard of the BDCC journal. Some examples are provided for the authors' reference below.
Line 11: The knowledge graph been tested on a sensitive labeled corpus and we obtain the first experimental results.
Is this the first test result on this corpus, or is the experimental performance first-rate?
Line 13: can be very interesting if combined with neural networks approaches.
This is not a scientific statement without supporting evidence.
Line 18: In virtual environments and online conversations, we are used to sharing personal information.
Please use an objective tone to articulate the observations.