Towards a Refined Heuristic Evaluation: Incorporating Hierarchical Analysis for Weighted Usability Assessment
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
The current manuscript adopts the so-called Analytic Hierarchy Process and a tailored algorithm to modify the Granollers heuristic evaluation method and to streamline the evaluation process. The paper is generally well written and detailed. However,
- The data acquisition procedure is described in detail in section 3.4; however, the data examination or refinement is not described in comparable detail.
- In section 4.1, the authors state that "This smart approach can significantly reduce the necessary number of comparisons" and mention some figures there; however, it is not clear how that reduction relates to N (the number of criteria) in general (see the note after this list).
- Is there any comparison to another candidate heuristic evaluation framework?
- The number of references is relatively large; it could be reduced to around 40-50 of the most relevant journal papers, plus any highly relevant conference papers.
- Make sure that there are no redundant parts in the revised paper.
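For context on the comparison-count question above (an illustrative note, assuming the standard AHP setup and a single transitive chain, which may differ from the authors' exact algorithm): a full pairwise matrix over N criteria requires N(N-1)/2 judgments, whereas a chain completed by transitivity requires only N-1. For example, with N = 15 heuristics that is 105 versus 14 comparisons.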
Comments on the Quality of English Language
The paper is generally well written; however, proofreading is required. For example, why write "Weightened" instead of "weighted" in section 4.2?
Author Response
Thank you very much for taking the time to review this manuscript. Please find the detailed responses in the attached file; the corresponding revisions/corrections are highlighted in blue in the re-submitted files.
Author Response File: Author Response.pdf
Reviewer 2 Report
Comments and Suggestions for Authors
The present manuscript focuses on UI heuristic prioritization using AHP, exploiting transitive properties for pairwise comparisons. In my view, the approach has fundamental flaws.
First, I think the authors should be very clear in the manuscript about what they focus on: is it actually "usability" or "user interface"? The terms are not identical. Please see here: https://www.interaction-design.org/literature/topics/usability. This work addresses *UI* heuristics, as I understood it. But, for instance, the authors refer to the well-known "Nielsen, J.; Molich, R. Heuristic evaluation of *user interfaces*" as a method of *usability* evaluation. Usability evaluation goes way beyond UI heuristics, although UI design is indeed crucial to attaining usability. I think terminology is an important issue within the manuscript and should therefore be clarified/fixed.
But the most important flaw comes from the usage of AHP, let alone *relying on AHP* for prioritizing heuristics. First: why weight/prioritize heuristics at all? After all, *heuristics* are mental shortcuts or rules of thumb, not mathematical rules.
Second: why AHP? The authors state that AHP "excels in evaluating complex scenarios with multiple criteria", but they do not provide any reference to support this. AHP has significant drawbacks, such as not considering the dependence and interaction between criteria, and it is rather controversial in the field of MCDM. Although he never admitted it, even Saaty tried to improve the method by proposing ANP (arguably flawed too, as it is based on AHP). Consider the manuscript's context: how can one prove that UI heuristics are decoupled?
Third (and this one goes directly against the AHP pairwise-comparison idea): if you use transitivity to reduce the number of pairwise comparisons, isn't this against the very idea of "diffusing" the inherent subjectivity of a pairwise comparison? With transitivity, you merely amplify the subjectivity of the "source" pair comparison(s) instead of "compensating" for it (them) with the subjectivity of all the other "human-made" pair comparisons. Indeed, the Consistency Index will look "great", but a consistent AHP matrix does not imply objectivity (see the sketch below).
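To make this concrete, here is a minimal sketch (hypothetical Python, not the authors' implementation) showing that a pairwise matrix completed purely by transitivity is perfectly consistent by construction, so Saaty's Consistency Ratio cannot reveal anything about the subjectivity of the source judgments:

import numpy as np

def transitive_matrix(chain):
    # chain[i] = judged importance of criterion i relative to criterion i+1;
    # all remaining cells follow from the transitive rule a_ik = a_ij * a_jk
    n = len(chain) + 1
    w = np.ones(n)
    for i in range(n - 2, -1, -1):
        w[i] = chain[i] * w[i + 1]   # implied priority scale
    return np.outer(w, 1.0 / w)     # a_ij = w_i / w_j

def consistency_ratio(A):
    # Saaty: CI = (lambda_max - n) / (n - 1), CR = CI / RI
    n = A.shape[0]
    lam = max(np.linalg.eigvals(A).real)
    ci = (lam - n) / (n - 1)
    ri = {3: 0.58, 4: 0.90, 5: 1.12, 6: 1.24, 7: 1.32}[n]  # Saaty's random indices
    return ci / ri

A = transitive_matrix([3, 2, 5, 2])   # only 4 human judgments for a 5x5 matrix
print(consistency_ratio(A))           # ~0.0, however biased the 4 inputs were

A transitively filled matrix is rank one, so lambda_max = n exactly and CR = 0 whatever values the evaluator chose for the chain; the index measures internal coherence, not objectivity.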
It’s true that transitivity simplifies things, and it also assumes that the relative importance or preference of criteria or alternatives is consistent across all levels of the hierarchy. However, evaluators may have different preferences or judgments at different levels of the hierarchy, and these differences may not be fully captured by transitivity.
And there may be even more to it: if the evaluation scenario involves relationships between "alternatives" (i.e., what we are "judging" – the UI screens, in this case), AHP ignores these relationships. ANP considers them, but only their existence, not – most importantly – the quantitative influence of one over the other. And such relationships (between the different parts of a UI) do exist.
Bottom line: I think AHP is totally unsuited to (and incorrectly used in) this research.
Some other issues:
- the introduction does not properly highlight the research goal
- (row 42) "the approaches such as SUS capture user feedback quantitatively" – SUS evaluates the perceived usability of a system, so I doubt it's quantitative
- (row 336) the means of aggregating the weights is not explained (see the sketch after this list)
- (row 356) what does "[blind review]" mean?
- references like 11, 12, 13, or 23 are too old – there's newer research on these topics
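Regarding the weight-aggregation remark above, a hedged sketch of what AHP studies typically do (the manuscript may well differ): individual judgment matrices are usually combined with an element-wise geometric mean (the AIJ approach of Aczel and Saaty), which preserves the reciprocal property a_ji = 1/a_ij:

import numpy as np

def aggregate_judgments(matrices):
    # element-wise geometric mean of k reciprocal pairwise matrices;
    # an arithmetic mean would break reciprocity, the geometric mean keeps it
    stack = np.stack(matrices)                  # shape (k, n, n)
    return np.exp(np.log(stack).mean(axis=0))   # geometric mean per cell

A1 = np.array([[1.0, 3.0], [1/3, 1.0]])    # evaluator 1
A2 = np.array([[1.0, 5.0], [1/5, 1.0]])    # evaluator 2
print(aggregate_judgments([A1, A2]))       # off-diagonal sqrt(15) ~ 3.873, still reciprocal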
Comments on the Quality of English Language
Only minor issues observed: for instance, on row 134: "showed acceptable and robustness."
Or on row 104: "principles for crafting engaging user experience."
Author Response
Thank you very much for taking the time to review this manuscript. Please find the detailed responses in the attached file; the corresponding revisions/corrections are highlighted in blue in the re-submitted files.
Author Response File: Author Response.pdf
Reviewer 3 Report
Comments and Suggestions for Authors
This paper introduces a standardized approach to heuristic evaluation in usability testing in the field of human-computer interaction, integrating the Analytic Hierarchy Process with a customized algorithm that uses transitive properties for pairwise comparisons. The evaluation process has been significantly simplified. Besides reducing the complexity and workload associated with the traditional prioritization process, this method can contribute significantly to the accuracy of usability heuristic test results. In short, prioritizing heuristics based on their importance as determined by the Usability Testing Leader, rather than relying solely on the number of items, scale, or number of heuristics, ensures that the approach focuses evaluations on the most critical aspects of usability. In this sense, it is a very important contribution to the literature.
However, additional expertise and training may be required to apply the method effectively. It should also be noted that relying on expert judgments may limit the inclusiveness of the usability testing process and may ignore views from a variety of user perspectives. Therefore, effective implementation of this method may require ongoing support and training for usability professionals, details of which should be provided in the paper. The discussion section of the paper should be enriched. For a clearer understanding of Figures 4 and 5, the visualization should be enhanced and additional explanations provided.
Author Response
Thank you very much for taking the time to review this manuscript. Please find the detailed responses in the attached file; the corresponding revisions/corrections are highlighted in blue in the re-submitted files.
Author Response File: Author Response.pdf
Round 2
Reviewer 2 Report
Comments and Suggestions for Authors
I really appreciate the effort to improve the manuscript. At the same time, I notice that there are no fundamental changes; therefore, the problems that I initially highlighted still remain. Consequently, so do my objections to the proposed methodology. Here are my "updated" remarks.
1. Terminology: In my opinion, the context of the work should be narrowed down to 'heuristic UI assessment'. I can understand that, historically, Nielsen proposed his heuristics as a tool for 'UI *usability* assessment', but this was back in the 1990s. Since then, the perception of usability has evolved. Please see here, for instance: https://www.researchgate.net/publication/233479827_Usability_A_Critical_Analysis_and_a_Taxonomy, even if the reference is not necessarily new.
2. Research goal: I would keep it narrowed down to something like "the need of easily / objectively assessing UIs along the implementation phase of a software application", and not "comprehensively examines usability evaluation within human-computer Interaction, encapsulating its historical context, significance, advancements, and the challenges confronting this domain" - the manuscript does not cover that! Also, it's stated that "This investigation seeks to enhance the discourse in HCI by presenting a methodologically rigorous and empirically validated approach to usability evaluation", but it's actually UI evaluation (criteria weighting) that the research is about.
3. Methodology: I still think AHP is the worst choice here :) - why not TOPSIS or another MCDM tool? But even so: my feeling is that the relative relevance of each heuristic is not static: it depends on the specific lifecycle stage of the interface (a living prototype's UI may be judged differently from the beta version's UI, which may in turn be judged differently from the final release's UI), and also on the user's background and cognitive particularities. AHP does not handle this, but the problem should at least be discussed.
3a. Pairwise comparisons: To quote Prof. Munier: “consistency only means that there is a transitivity between all the DM estimates, which, by the way, does not correspond to reality”. You kind of “shortcut” the pairwise comparisons, “to reduce bias due to exhaustiveness”. In my opinion, this is methodologically “double wrong”. Please consult his book: Uses and Limitations of the AHP Method (it’s relatively new: 2021).
Bottom line: Despite the major flaws of the manuscript in its present form, I strongly encourage the authors to reconsider the methodology and resubmit it as a new manuscript.
Comments on the Quality of English Language
Minor language issues.
Author Response
Thank you very much for taking the time to review this manuscript. Please find the detailed responses in the attached file; the corresponding revisions/corrections are highlighted in blue in the re-submitted files.
Author Response File: Author Response.pdf
Round 3
Reviewer 2 Report
Comments and Suggestions for Authors
Dear Authors,
I’m aware that changing the methodology implies a consistent rework, which means a lot of effort. Still, in its present form, the methodology is in my opinion seriously flawed. But it is ultimately up to you and the Editor whether the manuscript gets published in this form.
Please recheck the manuscript as some minor issues still exist – for instance “Table ??” on row 242.
Kind regards!
Author Response
Dear Reviewer,
We deeply appreciate your efforts to encourage us to enhance our research and improve the quality of our paper. We understand your concerns regarding the methodology and acknowledge that implementing those changes would require substantial rework. While we have chosen to maintain the current methodology for this manuscript, we have considered all your suggestions and are committed to implementing these improvements in future works. This will help us continue growing as HCI and heuristic UI assessment researchers.
Additionally, we have rechecked the manuscript and corrected the minor issues, including the broken cross-reference "Table ??" on row 242.
Thank you again for your invaluable feedback and support.
Kind regards,