The Faceted and Exploratory Search for Test Knowledge
Round 1
Reviewer 1 Report
Using KGs to integrate heterogeneous test cases seems like a relevant and useful aim, although I am not a test researcher. However, I think this presentation of your work has serious flaws.
The paper builds on the authors' previous works, but it is not clear exactly how and by how much it extends earlier work. I have trouble with this sentence: "Such an approach was first developed by Rasche [6, 7] and is extended here to the latest version of Generic SCXML" 1. It comes way too late. An admission like this belongs in the introduction. 2. The delta over [6, 7] is too unspecific, and perhaps too small. 3. You mention only Rasche, but authors of the present paper also co-authored [6, 7]. So just "Rasche" is wrong. "Rasche et al." would be better but still misleading. You need to write something like "In previous work [6, 7], the authors have ...". (Also, one of the authors' names is misspelled in the reference list.)
The paper should explain its ultimate goal better - how will script integration into a KG improve life for testers, developers, and other stakeholders? This is the technical goal you address: "The goal is the storage of test knowledge in a knowledge graph and the subsequent provision of the faceted and explorative search in the supply chain's test processes." But I would like to see more explanation of why and how it is important and useful.
The abstract promises "translation capabilities", but only offers translations from scripts to state models, not script-to-script translations.
This is essentially a technical "show-and-tell" paper, but there is too much "tell" and too little "show". Cut down the long-winded explanations and demonstrate your approach with a running example instead. Your evaluation can be the start of such an example, but it needs to be elaborated.
The evaluation is limited to a demonstrative example and does not touch all sides of the proposal (the querying). It does not go into detail of the generated KG.
There are many unclear or wrong sentences and claims. Examples:
- "into tools—for example, by Wikipedia and DBPedia": why Wikipedia here?
- "SUT", "ISTQB": spell it out when first used
- "labelled edges define the relations between states": not only states
- "ontologies: a combination of property graphs and heterogeneous graphs": I do not accept this explanation/definition of ontologies
- "The intended usage of knowledge graphs can be, for example, automate creating the monitor's definition and the stimuli's that currently are specified manually by reading the System Requirement Document (SRD), developing a basic understanding of the system and then, it takes several failed test runs of stimulating the SUT": unclear
- "Table 1": where do these claims about general programming languages come from? Need clear ref. But better to cut if not needed later in the paper.
- "integrate test cases from the integrator on the supplier's test bench,": unclear what you mean
- "A test process-specific statement is a language feature": a statement is not a feature
- "Summary In general, a knowledge graph is suitable to represent knowledge for a domain.": this cannot be a summary, because 2.4 says (almost) nothing about KGs
Author Response
Dear reviewer,
Thank you for the review and for pointing out the possible improvements. I have revised the respective chapters according to your instructions.
In the following, I give a detailed point-by-point response:
- I have trouble with this sentence: "Such an approach was first developed by Rasche [6, 7] and is extended here to the latest version of Generic SCXML" 1. It comes way too late. An admission like this belongs in the introduction.
- Response: I updated the introduction and step 1
- The delta over [6, 7] is too unspecific, and perhaps too small. 3. You mention only Rasche, but authors of the present paper also co-authored [6, 7]. So just "Rasche" is wrong. "Rasche et al." would be better but still misleading. You need to write something like "In previous work [6, 7], the authors have ...". (Also, one of the authors' name is misspelled in the ref list.)
- Response: I updated the introduction and step 1
- The paper should explain its ultimate goal better - how will script integration into a KG improve life for testers, developers, and other stakeholders? This is the technical goal you address: "The goal is the storage of test knowledge in a knowledge graph and the subsequent provision of the faceted and explorative search in the supply chain's test processes." But I would like to see more explanation of why and how it is important and useful.
- Response: I updated the abstract, introduction and example
- The abstract promises "translation capabilities", but only offers translations from scripts to state models, not script-to-script translations.
- Response: I updated the abstract
- This is essentially a technical "show-and-tell" paper, but there is too much "tell" and too little "show". Cut down the long-winded explanations and demonstrate your approach with a running example instead. Your evaluation can be the start of such an example, but it needs to be elaborated.
- Response: I extended the section Evaluation and gave a concrete example
- The evaluation is limited to a demonstrative example and does not touch all sides of the proposal (the querying). It does not go into detail of the generated KG.
- Response: I extended the section Evaluation and gave a concrete example
- There are many unclear or wrong sentences and claims. Examples:
- "into tools—for example, by Wikipedia and DBPedia": why Wikipedia here?
- Response: I updated the paragraph
- "SUT", "ISTQB": spell it out when first used
- Response: I updated it
- "labelled edges define the relations between states": not only states
- Response: I updated 2.2
- "ontologies: a combination of property graphs and heterogeneous graphs": I do not accept this explanation/definition of ontologies
- Response: I updated 2.2
- "The intended usage of knowledge graphs can be, for example, automate creating the monitor's definition and the stimuli's that currently are specified manually by reading the System Requirement Document (SRD), developing a basic understanding of the system and then, it takes several failed test runs of stimulating the SUT": unclear
- Response: I introduced "ramp" as a term and updated the paragraph.
- "Table 1": where do these claims about general programming languages come from? Need clear ref. But better to cut if not needed later in the paper.
- Response: I deleted it
- "integrate test cases from the integrator on the supplier's test bench,": unclear what you mean
- Response: I removed it. The reader doesn’t need to know this detail of test case execution.
- "A test process-specific statement is a language feature": a statement is not a feature
- Response: I cannot find the sentence anymore. Probably, it was already updated together with the previous change.
- "Summary In general, a knowledge graph is suitable to represent knowledge for a domain.": this cannot be a summary, because 2.4 says (almost) nothing about KGs
- Response: Added the link to the ontologies
Reviewer 2 Report
The paper describes a method for transposing test processes into KG entities, in order to allow the search and exploration of similar patterns.
I found the text quite hard to read, mostly because of the following elements:
- there are never-expanded acronyms, i.e. SUT, TA (figure 8), HIL
- there should be a clear definition of statement, state, stimulus, block, ramp
- some notation is chaotically repeated (e.g. Z, P), without showing some examples of possible values
- the logic description of the graph should be more solid, determining what can be a node, and what an edge, without the found ambiguity and imprecision, reported later in the detailed comments.
Actually, the description of the graph is the core contribution of the paper and needs to be better explained to make this paper in a condition to be judged.
In addition, I have the feeling that the authors are considering as given some domain-specific information, which is not necessarily known by the reader. For example, why in Figure 1 and in the text are we interested in provoking a failure? Should the test instead try to see if the behavior is correct (success) or wrong (failure)? Moreover, many more examples (with real data) would help understanding, e.g. for the SCXML statement on page 11 or about some valid graph entries (in particular about P and N values)
Even if the cited references [12] is dating back to 1998 (and for sure there are more recent and updated works to cite on the topic), I believe that the definition of "ontology" reported in the referenced paper (as "constituted by a specific vocabulary used to describe a certain reality") is still valid. However, the paper seems to use the word "ontology" in place of "graph organized according to an ontology". This is the case of sentences like "Most graphs of the Semantic web are ontologies". These mentions should be corrected in order to respect the definition in [12].
The evaluation looks promising, but also here the process is unclear. How is this similarity computed? Where the evaluation of the first step has been done (see line 560)?
The reference section should be improved by including missing authors and years (i.e. [13], [18]) and respecting the spelling (DBpedia in [14]). Citations [20] and [21] should be corrected: they are possibly duplicated, "Scribd" is probably not the author, there is no year, probably a better reference can be provided.
Detailed comments:
- Sec 1. "ontologies [...] apply schema.org". I am not sure that the authors were actually meaning that, but ontologies are not applications of schema.org (at least not in general)
- Sec 1. "KGs were developed and integrated into tools -- for example by Wikipedia and DBPedia". I consider this sentence imprecise, given that it is mixing Wikipedia (tool ?), which indeed integrated WikiData, and DBpedia (check spelling), which is a proper kg.
- Sec 1. "comparable capabilities to the Semantic Web". Which capabilities are you referring to exactly? I can indeed name a few SemWeb capabilities which are not covered by your approach, namely interlinking with external data, being both machine- and human-readable, ...
- Sec 2.1 "The graph here". Which graph?
- Where are sections 2.2 and 2.3?
- Sec. 2.4.1. ISTQB and IEEE 829 should be properly referenced (citation or footnote link)
- Figure 2 is not readable when printed in black&white (and the blocks are big enough to make it possible to integrate the legend into them, instead of only using colors)
- Sec 2.4.2. Usually, we speak about statically-/dynamically-typed languages or static/dynamic type checking. It should be clearer that the static/dynamic division is about the check on the types (rather than the full program)
- Figure 4 is hard to understand. Why the shape of a "V"? Why is "System Verification" repeated twice? Again, this is not black&white-friendly
- Figure 5 is again hard to understand. What are the 3 circles (in particular the middle one)? The caption is not helping at all
- Sec 2.4.5 should instead be 2.5 (also because of the absence of 2.4.3 and 2.4.4)
- Sec 3. "Provide test relevant serviceS" (missing "S")
- Figure 6 has a magnifier that covers the text. Moreover, what is it pointing to? The arrow between the 2 upper blocks? How?
- Figure 7 "ModelL" (extra "L")
- Sec 3.1 ends with ".." (extra full stop)
- Figure 9 has quite a hard notation, not explained in the caption. What is each block? Why is the statement (I guess) "logic_p0_c0_s0" repeated in both states (again, I guess) of Design Pattern 1? What is that triangle warning? What is fsm-editor?
- Sec 3.2 "Design pattern 1 is that only A signal is stimulated with A new value". Those A should be replaced with ONE, otherwise, I do not understand the sentence.
- Sec 3.2. "The grammar shows that ...". I actually do not see the role of XML element tree and XML attribute ID in the grammar. To which do they correspond?
- Sec 3.2. "The paper aims to create..." From this point onward, the topic of the discourse changed from <parsing/manipulating the XML> (as in the last 2 pages) to <creating a graph> (as in the next 3 pages). Please consider splitting this into 2 separate paragraphs
- Sec 3.2. "basic requirements shown in Figure 2". Figure 2 does not contain any requirements (neither does Figure 1).
- Sec 3.2. "The usage of this label is restricted". To what? How?
- Sec 3.2. It is not clear if a node v denotes a statement OR a state, or if the state itself is considered a statement. I believe that the first option is the correct one. Therefore, corrections should be done in line 403 ("i.e." => "or") and in the rest of the text, in which the 2 concepts are confused and should probably be better referred to as just "nodes"
- Sec 3.2. Some sentences have to be read as if the process is happening now. I recommend getting rid of the time dependencies at line 405 ("current" => "same") and 410 ("immediately" => "simultaneously"). The suggested corrections make sure that …
- "verifytolerance" contains several different typos in the different mentions in the text
- The equation id progression (e.g. (1), (4)) is skipping numbers and restarting at a certain point
- Sec 3.2. What is the name of a statement? Can you give some examples?
- Equation (7) (and following tables/text) is confusing V (set of vertices) with v_i (a single vertex in V).
- Table 3 includes vertex as {'any_label', P}, while in the text a vertex should be "an ORDERED pair (p_i, l_i)". The order is wrong, P should be p_i. Not clear how P is valued.
- Table 4. The labels (e.g. `simple`, `parallel_statement`, ...) should be uppercase as in (4). It is not clear if the `blocking_statement` edges are anyhow instantiated. In the last row, the tags are opening and closing in a not XML-valid way (e.g. xsd:complexType) and there is a pending "<". I believe that the understanding of the process would largely benefit from examples (with values) for each row (also for Table 3). Actually, Tables 3 and 4 can be moved to an appendix, for having more space
- Figure 10. Why do some nodes have no label? Where are P1, P2, and Z represented? I think that a graph showing all contributions is needed here
- "children of" => "linked to". The graph is not representing nodes in a hierarchy
- Sec 3.3. "For him, a stimulus is not a node in the graph". It is not clear if for us (authors + reader) it is. Probably a better definition of stimulus may help.
- Figures 11 and 12. What are those snippets doing? Can you provide natural language versions of these queries? Are these instructions supposed to be parallel or blocking? Are those ":" wrong? Really, was there not a better way to include them in the paper than a screenshot of the text editor?
- Sec 4.3. "fewer" => "lower"
- Sec. 5.4. Excel sheets and natural language CAN be imported in KGs. Simply, they can't with the described method
- Sec 5.4. To which copyright are you referring? I guess every company owns its own data (the only ones it is interested to work on), so I am a bit confused
Author Response
Dear reviewer,
Thank you very much for the excellent review and for pointing out the needed improvements. I have revised the respective chapters according to your instructions.
In the following, I give a detailed point-by-point response:
- I found the text quite hard to read, mostly because of the following elements:
- there are never-expanded acronyms, i.e. SUT, TA (figure 8), HIL
- Response: Introduced acronyms throughout the complete document and in figure 8
- there should be a clear definition of Statement, state, stimulus, block, ramp
- Response: Added in Section 2 as background
- some notation is chaotically repeated (e.g. Z, P), without showing some examples of possible values
- Response: Reworked the graph definition and gave a more detailed example.
- the logic description of the graph should be more solid, determining what can be a node, and what an edge, without the found ambiguity and imprecision, reported later in the detailed comments.
- Response: Reworked the graph definition and gave a more detailed example.
- Actually, the description of the graph is the core contribution of the graph and needs to be better explained to make this paper in condition to be judged.
- Response: Reworked the graph definition and gave a more detailed example
- In addition, I have the feeling that the authors are considering as given some domain-specific information, which is not necessarily known by the reader. For example, why in Figure 1 and in the text we are interested in provoking a failure? Should instead the test try to see if the behavior is correct (success) or wrong (failure)? Moreover, quite more examples (with real data) would help understanding, e.g. for the SCXML statement on page 11 or about some valid graph entries (in particular about P and N values)
- Response: Added a clarification and examples in the graph definition and in the evaluation
- Even if the cited references [12] is dating back to 1998 (and for sure there are more recent and updated works to cite on the topic), I believe that the definition of "ontology" reported in the referenced paper (as "constituted by a specific vocabulary used to describe a certain reality") is still valid. However, the paper seems to use the word "ontology" in place of "graph organized according to an ontology". This is the case of sentences like "Most graphs of the Semantic web are ontologies". These mentions should be corrected in order to respect the definition in [12].
- Response: I improved the paragraph and updated the introduction
- The evaluation looks promising, but also here the process is unclear. How is this similarity computed? Where the evaluation of the first step has been done (see line 560)?
- Response: Added a definition of similarity and mentioned AGILE_VT as reference
- The reference section should be improved by including missing authors and years (i.e. [13], [18]) and respecting the spelling (DBpedia in [14]). Citations [20] and [21] should be corrected: they are possibly duplicated, "Scribd" is probably not the author, there is no year, probably a better reference can be provided.
- Response: I improved the reference section
- Detailed comments:
- Sec 1. "ontologies [...] apply schema.org". I am not sure that the authors were actually meaning that, but ontologies are not applications of schema.org (at least not in general)
- Response: I updated the introduction
- Sec 1. "KGs were developed and integrated into tools -- for example by Wikipedia and DBPedia". I consider this sentence imprecise, given that it is mixing Wikipedia (tool ?), which indeed integrated WikiData, and DBpedia (check spelling), which is a proper kg.
- Response: Updated the introduction and added a clarification
- Sec 1. "comparable capabilities to the Semantic Web". Which capabilities are you referring to exactly? I can indeed name a few SemWeb capabilities which are not covered by your approach, namely interlinking with external data, being both machine- and human-readable, ...
- Response: I updated the introduction
- Sec 2.1 "The graph here". Which graph?
- Response: I clarified that graph G is the graph which is being developed
- Where are sections 2.2 and 2.3?
- Response: I corrected wrong section numbers
- 2.4.1. ISTQB and IEEE 829 should be properly referenced (citation or footnote link)
- Response: I added the references
- Figure 2 is not readable when printed in black&white (and blocks are enough big to be possible to integrate the legend into them, instead of only using colors)
- Response: I updated figure 2
- Sec 2.4.2. Usually, we speak about Statically- / Dynamically- typed languages or Statical / Dynamic type checking. It should be more clear that the Static/Dynamic division is about the check on the types (rather the full program)
- Response: I added the restriction
- Figure 4 is hard to understand. Why the shape of a "V"? Why "System Verification" is repeated twice? Again, this is not black&white-friendly
- Response: I removed it to focus more on the key points
- Figure 5 is again hard to understand. What are the 3 circles (in particular the middle one)? The caption is not helping at all
- Response: Added a legend, improved the label, and added an introduction
- Sec 2.4.5 should instead be 2.5 (also because of the absence of 2.4.3 and 2.4.4)
- Response: I updated the section numbers
- Sec 3. "Provide test relevant serviceS" (missing "S")
- Response: Updated
- Figure 6 has a magnifier that covers the text. Moreover, what is it pointing to? The arrow between the 2 upper blocks? How?
- Response: Figure has been updated
- Figure 7 "ModelL" (extra "L")
- Response: Figure has been updated
- Sec 3.1 ends with ".." (extra full stop)
- Response: Updated
- Figure 9 has quite a hard notation, not explained in the caption. What is each block? Why is the statement (i guess) "logic_p0_c0_s0" in repeated both states (again, I guess) of Design Pattern 1? What is that triangle warning? What is fsm-editor?
- Response: I added a legend and redesigned the figure
- Sec 3.2 "Design pattern 1 is that only A signal is stimulated with A new value". Those A should be replaced with ONE, otherwise, I do not understand the sentence
- Response: Updated
- Sec 3.2. "The grammar shows that ...". I actually do not see the role of XML element tree and XML attribute ID in the grammar. To which do they correspond?
- Response: Thanks for the comment. The grammar missed the correct syntax for ID; I corrected it
- Sec 3.2. "The paper aims to create..." From this point onward, the topic of the discourse changed from <parsing/manipulating the XML> (as in the last 2 pages) to <creating a graph> (as in the next 3 pages). Please consider splitting this into 2 separate paragraphs
- Response: Updated
- Sec 3.2. "basic requirements shown in Figure 2". Figure 2 does not contain any requirements (neither does Figure 1).
- Response: Clarified the misunderstanding
- Sec 3.2. "The usage of this label is restricted". To what? How?
- Response: I updated the text. The labels define the type of a node and thus its usage is restricted
- Sec 3.2. It is not clear if a node v denotes a statement OR a state, or if the state itself is considered a statement. I believe that the first option is the correct one. Therefore, corrections should be done in line 403 ("i.e." => "or") and in the rest of the text, in which the 2 concepts are confused and should probably be better referred as just "nodes"
- Response: I introduced the concept of a micro and macro state to describe it better
- Sec 3.2. Some sentences have to be read as the process is happening now. I recommend getting rid of the time dependencies at line 405 ("current" => "same") and 410 ("immediately" => "simultaneously"). The suggested corrections make sure that …
- Response: I updated it.
- "verifytolerance" contains several different typos in the different mentions in the text
- Response: I corrected the spelling
- The equation numbering (e.g. (1), (4)) is skipping numbers and restarting at a certain point
- Response: I corrected the numbers, thanks a lot for mentioning it
- Sec 3.2. What is the name of a statement? Can you give some examples?
- Response: Gave an example of the TASCXML:set
- Equation (7) (and following tables/text) is confusing V (set of vertices) with v_i (a single vertex in V).
- Response: I added an explanation and assigned the right variables related to graph G
- Table 3 includes vertex as {'any_label', P}, while in the text a vertex should be "an ORDERED pair (p_i, l_i)". The order is wrong, P should be p_i. Not clear how P is valued.
- Response: I updated Table 3 and also the corresponding definition of P
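To illustrate the corrected ordering, a minimal Python sketch of a vertex as the ordered pair (p_i, l_i). Only the ordering follows the paper's definition; the concrete property values and labels below are invented for illustration:

```python
# A vertex is the ORDERED pair (p_i, l_i): properties first, then the
# label. All concrete values here are hypothetical examples.

def make_vertex(p_i: dict, l_i: str) -> tuple:
    """Build a vertex as the ordered pair (p_i, l_i)."""
    return (p_i, l_i)

v0 = make_vertex({"id": "s0", "signal": "sig1"}, "STATE")

# Because the pair is ordered, swapping its components yields a
# different (malformed) vertex:
print(v0 == ({"id": "s0", "signal": "sig1"}, "STATE"))  # True
print(v0 == ("STATE", {"id": "s0", "signal": "sig1"}))  # False
```
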
- Table 4. The labels (e.g. `simple`, `parallel_statement`, ...) should be uppercase as in (4). It is not clear if the `blocking_statement` edges are anyhow instantiated. In the last row, the tags are opening and closing in a not XML-valid way (e.g. xsd:complexType) and there is a pending "<". I believe that the understanding of the process would largely benefit from examples (with values) for each row (also for Table 3). Actually, Tables 3 and 4 can be moved to an appendix, for having more space
- Response: I added examples and improved the XML snippets
- Figure 10. Why do some nodes have no label? Where are P1, P2, and Z represented? I think that a graph showing all contributions is needed here
- Response: I added examples and created a graph for a concrete example showing v, e, l, and P
- "children of" => "linked to". The graph is not representing nodes in a hierarchy
- Response: Updated the graph definition completely
- Sec 3.3. "For him, a stimulus is not a node in the graph". It is not clear if for us (authors + reader) it is. Probably a better definition of stimulus may help.
- Response: Updated the paragraph
- Figures 11 and 12. What are those snippets doing? Can you provide natural language versions of these querying? Are these instructions supposed to be parallel or blocking? Are those ":" wrong? Really there was not a better way to include them in the paper that a screenshot of the text editor?
- Response: I updated the figure to the common look and feel. Moreover, the terminal symbol ENDCharacter: ';' defines the end of a row
- Sec 4.3. "fewer" => "lower"
- Response: I exchanged the word
- Sec 5.4. Excel sheets and natural language CAN be imported in KGs. Simply, they can't with the described method
- Response: I clarified that our own graph and approach are meant
- Sec 5.4. To which copyright are you referring? I guess every company owns its own data (the only ones it is interested to work on), so I am a bit confused
- Response: I updated the section. The vision is that the OEM and suppliers upload their test cases. After uploading, the consideration of the IPR is difficult.
Round 2
Reviewer 2 Report
I thank the authors for the improvements to the paper, which make this work more understandable and, finally, evaluable.
However, there are still some elements that are critical.
The claim "ontologies are graphs" is repeated twice, but I doubt this claim, for the following reasons. 1. The support references do not contain such a claim, almost never mentioning graphs at all (and [13] not really speaking about ontologies). 2. Ontologies can also involve information structures other than graphs. 3. I believe that "Graphs can be modeled according to an ontology" or "Ontologies can be visualized with a graph" are more correct versions of the statement.
The definition of "ontology" was one of the points I raised in the last review; I think the authors should not "force" the definition to fit their paper content, but respect this definition (as in the papers they reference) and use the word with its proper meaning.
The reference section has still some important improvement points (see comments below)
My biggest criticism is about the evaluation part.
First, large parts of the text can also be found in a paper from 2 years ago [34]; this makes me doubt whether the evaluation is referring to this new approach or to the old one.
Then, the authors are not sharing any details about the test set (is this available in order to replicate the experiment? some example?) nor the exact results ("lower than 100%" is too generic).
Finally, no comparison with other systems is presented, nor is the one tested in [34].
This lack of details makes me doubt the effectiveness of the usage of a knowledge graph for this specific problem, given that it is impossible to evaluate if this is improving or not the existing (non-graph-based) systems.
Other detailed comments:
Figure 3: What is "type safety"? I would doubt that Python is "type safe". Moreover, the indentation is bizarre, as is the presence of ending semicolons (not required in Python). Moreover, the keyword "new" does not exist in Python (basically, that code is not working).
In the code on page 12, a lot of "DIVISION" appeared in place of "/". Are these errors?
Sec 5.4. about intellectual property. I think it is reasonable to think that the presented system is the generic framework for performing this kind of operation (search), but of course, all proprietary languages have to provide their mappings to make this work with their system. Then they do not upload the test case in their format but in the shared KG format. What am I missing?
Typos
Section 1. "the authors focusSES"
Table 1. "an statement"
Table 2 would be more readable if removing all these "Statements for" in the second column
References. "Scribd Inc" is a website for sharing documents. I strongly doubt that they are also the authors of the cited document [25]
References. Are [6] and [28] the same document or not?
References. Authors missing at [13], [17], [18]. Links are wrong in [13], [27]. In [35] there are weird repetitions (e.g. Apr.)
References. I think you should credit for reference [17] Michael K. Bergman, who wrote the article called A Common Sense View of Knowledge Graphs on his personal blog "AI3"
Author Response
Dear reviewer,
Thank you very much for the excellent review and for pointing out the needed improvements. I have revised the respective chapters according to your instructions.
In the following, I give a detailed point-by-point response:
- The claim "ontologies are graphs" is repeated twice, but I doubt this claim, for the following reasons. 1. The support references do not contain such a claim, almost never mentioning graphs at all (and [13] not really speaking about ontologies). 2. Ontologies can also involve information structures other than graphs. 3. I believe that "Graphs can be modeled according to an ontology" or "Ontologies can be visualized with a graph" are more correct versions of the statement. The definition of "ontology" was one of the points I raised in the last review; I think the authors should not "force" the definition to fit their paper content, but respect this definition (as in the papers they reference) and use the word with its proper meaning.
- Response: I updated the paragraph and removed the reference to [13], because there was no added value to mention it.
- The reference section has still some important improvement points (see comments below)
- Response: I updated the references
- My biggest criticism is about the evaluation part. First, that large parts of the text can be also found in a paper from 2 years ago [34], this makes me doubt if the evaluation is referring to this new approach or to the old one.
- Response: I added an explanation inside the evaluation and explained why the evaluation is still valid
- Then, the authors are not sharing any details about the test set (is this available in order to replicate the experiment? some example?) nor the exact results ("lower than 100%" is too generic).
- Response: The test set is confidential and cannot be shared. More details about the test set and the result are presented in the evaluation. Moreover, access to the example test case in SCXML and to the pattern search is granted.
- Finally, no comparison with other systems is presented, nor is the one tested in [34]. This lack of details makes me doubt the effectiveness of the usage of a knowledge graph for this specific problem, given that it is impossible to evaluate if this is improving or not the existing (non-graph-based) systems.
- Response: I explained why this isn't possible inside the evaluation. A summary: The structure of test cases which stimulates a set of statements within the current time frame needs to be considered inside the pattern search. That means two different sequences of statements are semantically similar if they include the same set of TASCXML commands. This requirement can only be handled efficiently using a graph.
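As an illustration of this set-based notion of similarity, a minimal Python sketch (this is a hypothetical sketch, not the paper's implementation; the `command(args)` string format and the example signal values are invented, while the command names follow the TASCXML statements mentioned in the paper):

```python
# Hypothetical sketch: two blocks of statements stimulated within the
# same time frame count as semantically similar if they contain the
# same SET of TASCXML commands, regardless of statement order.

def command_set(block):
    """Extract the set of command names from a block of statements."""
    return {stmt.split("(", 1)[0] for stmt in block}

def semantically_similar(block_a, block_b):
    """Order-insensitive comparison of two statement blocks."""
    return command_set(block_a) == command_set(block_b)

# Two orderings of the same stimuli compare as similar:
a = ["TASCXML:set(sig1, 5)", "TASCXML:verifytolerance(sig2, 1)"]
b = ["TASCXML:verifytolerance(sig2, 1)", "TASCXML:set(sig1, 5)"]
print(semantically_similar(a, b))  # True
print(semantically_similar(a, ["TASCXML:set(sig1, 5)"]))  # False
```
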
- Other detailed comments:
- Figure 3: What is "type safety"? I would doubt that Python is "type safe". Moreover, the indentation is bizarre, as is the presence of ending semicolons (not required in Python). Moreover, the keyword "new" does not exist in Python (basically, that code does not work).
- Response: I updated the source code inside figure 3 and checked the syntax with the Python interpreter. I also updated the labels to clarify which type safety is meant.
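For reference, idiomatic Python needs neither a `new` keyword nor trailing semicolons; a minimal sketch of syntactically valid object creation (the class and method names here are illustrative, not taken from the paper's Figure 3):

```python
class TestCase:
    """Minimal container for a test script and its statements."""

    def __init__(self, name):
        self.name = name
        self.statements = []

    def add_statement(self, statement):
        self.statements.append(statement)


# Instantiation is a plain call: no `new` keyword, no semicolons.
case = TestCase("ExampleHILTest")
case.add_statement("SET voltage 12.0")
print(len(case.statements))  # 1
```

Running such a snippet through the Python interpreter, as the response describes, is the simplest way to confirm the figure's code is valid.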
- In the code on page 12, a lot of "DIVISION" appears in place of "/". Are these errors?
- Response: You are right. It is a misleading name for "/". I replaced it with SEPERATOR and checked programmatically whether the grammar is still valid
- Sec 5.4 about intellectual property. I think it is reasonable to assume that the presented system is a generic framework for performing this kind of operation (search), but of course all proprietary languages have to provide their mappings to make this work with their systems. They then do not upload the test case in their own format but in the shared KG format. What am I missing?
- Response: I added an additional sentence and explained that uploaded test cases are still assigned to their IPR, and this needs to be handled.
- Typos
- Section 1. "the authors focusSES"
- Table 1. "an statement"
Table 2 would be more readable if all these "Statements for" were removed from the second column
- Response: Thanks a lot. I found them and fixed them.
- "Scribd Inc" is a website for sharing documents. I strongly doubt that they are also the authors of the cited document [25]
- Response: I deleted the reference and replaced it by another reference
- Are [6] and [28] the same document or not?
- Response: I deleted reference [28]
- Authors are missing at [13] (Schema.org), [17] (A Common Sense View of Knowledge Graphs), [18] (Knowledge graphs: New directions for ...). Links are wrong in [13] (schema) and [27] (Types and Programming Languages). In [35] there are weird repetitions (e.g. "Apr.") (uri shami)
- Response: I checked the need and updated them
- I think you should credit for reference [17] Michael K. Bergman, which wrote the article called A Common Sense View of Knowledge Graphs on his personal blog "AI3"
- Response: I added his name in the reference and cited him within the section "Knowledge Graph in the scope of test knowledge"
Round 3
Reviewer 2 Report
Even if the paper has been largely improved since its first version, my concerns about crucial parts of the work stand, namely the evaluation.
The evaluation -- still giving the same results as Franke et al. SSRN (2020) [30] -- makes me wonder how this work improves on the previous one.
The confidentiality of even the test cases makes me doubt whether this approach is really generalisable, or whether it can only be applied to the specific company studied in this work. In addition, we have very little information about these test cases: how many are there? What are their max, min, and avg lengths in number of statements?
From what is written, we can't say whether the right tests are correctly retrieved among the top K retrieved ones, e.g. by applying a Mean Average Precision at K (MAP@K) metric.
Even without a test against a competitor, a test against a baseline can be performed. The baseline can be as simple as "pick the most similar textual representation of the test". Another possible alternative is to perform a user evaluation.
Minor:
In 2.2 two test ontologies have been mentioned without references.
I don't understand what this SEPERETOR (maybe sepArator?) is and why "/" have been replaced by it.
Author Response
Dear reviewer,
Thank you very much for the excellent review and for pointing out the needed improvements. I have revised the respective chapters according to your instructions.
In the following, I give a detailed point-by-point response:
- The evaluation -- still giving the same results as Franke et al. SSRN (2020) [30] -- makes me wonder how this work is improving the previous one.
- Response: The work is a straightforward improvement with respect to the coverage of TASCXML commands and the extended statement types inside the pattern language. The core is preliminary work, and the evaluation result is still valid for the extended version. For that reason, the existing evaluation has been referenced, and no new evaluation has been carried out.
- The confidentiality of even the test cases makes me doubt whether this approach is really generalisable, or whether it can only be applied to the specific company studied in this work.
- Response: I added a paragraph inside the application scenario to explain that the results are applicable to all hardware-in-the-loop (HIL) tests on the integration and system level.
- In addition, we have very little information about these test cases: how many are there? What are their max, min, and avg lengths in number of statements?
- Response: I added a table inside the evaluation defining the applied test cases in more detail.
- From what is written, we can't say whether the right tests are correctly retrieved among the top K retrieved ones, e.g. by applying a Mean Average Precision at K (MAP@K) metric.
- Response: I added a clarification inside the evaluation and future work that this kind of evaluation is needed but is not part of this article.
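For reference, the MAP@K metric the reviewer suggests can be sketched in plain Python under its usual definition (this is a generic illustration, not tied to the paper's confidential test set):

```python
def average_precision_at_k(retrieved, relevant, k):
    """Average precision over the top-k retrieved items for one query."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    hits, score = 0, 0.0
    for i, item in enumerate(retrieved[:k]):
        if item in relevant:
            hits += 1
            score += hits / (i + 1)  # precision at this rank
    return score / min(len(relevant), k)


def map_at_k(queries, k):
    """Mean of AP@k over a list of (retrieved, relevant) pairs."""
    return sum(average_precision_at_k(r, rel, k) for r, rel in queries) / len(queries)


# One query: top-3 retrieved tests, of which t1 and t3 are relevant.
print(average_precision_at_k(["t1", "t2", "t3"], ["t1", "t3"], 3))
```

Applying such a metric would require relevance judgments for the retrieved test cases, which is presumably why the response defers it to future work.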
- Even without a test against a competitor, a test against a baseline can be performed. The baseline can be as simple as "pick the most similar textual representation of the test". Another possible alternative is to perform a user evaluation.
- Response: I extended the explanation inside the evaluation of why any keyword-based search (independent of the selected approach) is not applicable. The authors still believe that a search over the test script language syntax cannot be the baseline for a pattern search.
Minor:
- In 2.2 two test ontologies have been mentioned without references.
- Response: I added the references
- I don't understand what this SEPERETOR (maybe sepArator?) is and why "/" have been replaced by it.
- Response: I corrected the spelling error.